QA Chapter 1 2 3 4 5
The information provided in this document is intended solely for you. Please do not freely distribute.
Licensed to Rajvi Sampat at rajvi.sampat@gmail.com. Downloaded August 4, 2019.
Define and distinguish between the probability density function, the cumulative
distribution function and the inverse cumulative distribution function, and calculate
probabilities based on each of these functions.
Define joint probability, describe a probability matrix and calculate joint probabilities
using probability matrices.
Define and calculate a conditional probability, and distinguish between conditional and
unconditional probabilities.
A continuous random variable (X) can take on an infinite number of values within an interval:
P(r1 < X < r2) = ∫ f(x) dx, integrating from r1 to r2
A discrete random variable (X) assumes a value among a finite set including x1, x2, x3 and so
on. The probability function is expressed by:
P(X = x) = f(x)
Comparison
    Continuous               Discrete
    Are measured             Are counted
    Infinite                 Finite
Applications in Finance
    Distance, time (e.g.)    Default (1/0) (e.g.)
    Severity of loss (e.g.)  Frequency of loss (e.g.)
    Asset returns (e.g.)
For example:
    Continuous                      Discrete
    Normal                          Bernoulli (0/1)
    Student's t                     Binomial (series of i.i.d. Bernoullis)
    Chi-square                      Poisson
    F distribution                  Logarithmic
    Lognormal
    Exponential
    Gamma, Beta
    EVT distributions (GPD, GEV)
The four "sampling distributions" (normal, Student's t, chi-square, and F) are used to test sample statistics; e.g., the Student's t is used to test the sample mean.
“… we will be working with both discrete and continuous random variables. Discrete random
variables can take on only a countable number of values—for example, a coin, which can only
be heads or tails, or a bond, which can only have one of several letter ratings (AAA, AA, A,
BBB, etc.).
… In contrast to a discrete random variable, a continuous random variable can take on any
value within a given range. A good example of a continuous random variable is the return of a
stock index. If the level of the index can be any real number between zero and infinity, then the
return of the index can be any real number greater than −1.
Even if the range that the continuous variable occupies is finite, the number of values that it can
take is infinite. For this reason, for a continuous variable, the probability of any specific value
occurring is zero.
Even though we cannot talk about the probability of a specific value occurring, we can talk about
the probability of a variable being within a certain range. Take, for example, the return on a
stock market index over the next year. We can talk about the probability of the index return
being between 6% and 7%, but talking about the probability of the return being exactly 6.001%
is meaningless. Even between 6.0% and 7.0% there are an infinite number of possible values.
The probability of any one of those infinite values occurring is zero.”
The probability density function answers a "local" question. If the random variable is discrete,
the pdf (more precisely called the probability mass function, pmf) gives the probability that the
variable assumes an exact value x; i.e., f(x) = P(X = x).
If the random variable is continuous, the pdf tells us the likelihood of outcomes occurring on
an interval between any two points. Given our continuous random variable, X, with a probability
p of being between r1 and r2, we can define our pdf, f(x), such that:
∫ f(x) dx = p, integrating from r1 to r2
The pdf functions are illustrated on the top row in the graph below with continuous (left-hand)
and discrete (right-hand) functions:
The cumulative distribution function (CDF) associates with either a pmf or a pdf (i.e., the CDF can
apply to either a discrete or a continuous random variable). The CDF gives the probability that the
random variable will be less than, or equal to, some value: F(x) = P(X ≤ x).
The continuous (left-hand) and discrete (right-hand) cumulative distribution functions are:
Continuous: F(a) = ∫ f(x) dx over (−∞, a] = P[X ≤ a]
Discrete: F(a) = Σ f(x) over x ≤ a = P[X ≤ a]
If F(x) is a cumulative distribution function, then we define F−1(p), the inverse cumulative
distribution, as follows:
F(x) = p ⇔ F⁻¹(p) = x, s.t. 0 ≤ p ≤ 1
We can examine the inverse cumulative distribution function by applying it to the standard
normal distribution, N(0,1). For example, F⁻¹(95%) = 1.645 because, at +1.645 standard
deviations to the right of the mean in a standard normal distribution, 95% of the area under
the curve lies to the left.
The inverse cumulative distribution function (CDF) is also called the quantile function;
see http://en.wikipedia.org/wiki/Quantile_function. We can now see why Dowd says that
“VaR is just a quantile [function]”. For example, the 95% VaR corresponds to the inverse CDF at
p = 5% (or, depending on sign convention, p = 95%); for the standard normal, NORM.S.INV(5%) = −1.645.
Standard Normal Distribution
    p         F⁻¹(p)
    1.0%      −2.326
    5.0%      −1.645
    50.0%      0.000
    84.13%     1.000
    90.0%      1.282
    95.0%      1.645   (must know)
    97.72%     2.000
    99.0%      2.326   (must know)
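The table values can be reproduced with Python's standard library; statistics.NormalDist.inv_cdf is the standard normal quantile function, the counterpart of Excel's NORM.S.INV (a minimal sketch):

```python
from statistics import NormalDist

# Standard normal N(0,1); inv_cdf is the inverse CDF (quantile function),
# equivalent to Excel's NORM.S.INV.
z = NormalDist(mu=0, sigma=1)

for p in (0.01, 0.05, 0.50, 0.95, 0.99):
    print(f"F^-1({p:.0%}) = {z.inv_cdf(p):+.3f}")

# The 95% VaR of a standard normal position is the negated 5% quantile:
var_95 = -z.inv_cdf(0.05)
print(f"95% VaR (standard normal) = {var_95:.3f}")  # 1.645
```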
A single variable (univariate) probability distribution is concerned with only a single random
variable; e.g., roll of a die, default of a single obligor.
A multivariate probability density function concerns the outcome of an experiment with more
than one random variable. This includes the simplest case of two variables (i.e., a bivariate
distribution).
             Density                       Cumulative
Univariate   f(x) = P(X = x)               F(x) = P(X ≤ x)
Bivariate    f(x, y) = P(X = x, Y = y)     F(x, y) = P(X ≤ x, Y ≤ y)
P(X = x) = (number of outcomes that produce x) / (total number of outcomes)
For example, consider a craps roll of two six-sided dice. What is the probability of rolling a
seven? i.e., P[X=7]. There are six outcomes that generate a roll of seven: 1+6, 2+5, 3+4, 4+3,
5+2, and 6+1. Further, there are 36 total outcomes. Therefore, the probability is 6/36 =1/6.
In this case, the outcomes need to be mutually exclusive, equally likely, and collectively
exhaustive (i.e., all possible outcomes are included in the total). A key property of a probability
distribution is that the sum of the probabilities across all (discrete) outcomes equals 1.0.
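The a priori calculation can be checked by brute-force enumeration; a small sketch (not from the source):

```python
from itertools import product

# Enumerate all 36 equally likely outcomes of two six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))
assert len(outcomes) == 36

# P[X = 7]: count the outcomes that sum to seven (1+6, 2+5, ..., 6+1).
sevens = [o for o in outcomes if sum(o) == 7]
p_seven = len(sevens) / len(outcomes)
print(p_seven)  # 6/36 = 1/6 ≈ 0.1667

# The probabilities over all totals are collectively exhaustive: they sum to 1.
totals = {t: sum(1 for o in outcomes if sum(o) == t) / 36 for t in range(2, 13)}
assert abs(sum(totals.values()) - 1.0) < 1e-12
```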
Relative frequency is based on an actual number of historical observations (or Monte Carlo
simulations). For example, here is a simulation (produced in Excel) of one hundred (100) rolls of
a single six-sided die:
Empirical Distribution
Roll Freq. %
1 11 11%
2 17 17%
3 18 18%
4 21 21%
5 18 18%
6 15 15%
Total 100 100%
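A simulation analogous to the Excel one can be run in a few lines; the seed here is hypothetical, and the exact frequencies will differ from the table, which is precisely the point about sampling variation:

```python
import random
from collections import Counter

random.seed(42)  # hypothetical seed; any sample exhibits sampling variation
rolls = [random.randint(1, 6) for _ in range(100)]
freq = Counter(rolls)

for face in range(1, 7):
    print(f"{face}: {freq[face]:3d}  {freq[face] / 100:.0%}")

# Each empirical frequency varies around the a priori probability of 1/6 ≈ 16.7%.
assert sum(freq.values()) == 100
```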
This relates also to sampling variation. The a priori probability is based on population properties;
in this case, the a priori probability of rolling any number is clearly 1/6th. However, a sample of
100 trials will exhibit sampling variation: the number of threes (3s) rolled above varies from the
parametric probability of 1/6th. We do not expect the sample to produce 1/6th perfectly for each
outcome.
For a given random variable, the probability of any of two mutually exclusive events occurring is
just the sum of their individual probabilities. In statistical notation, we can write:
P[A ∪ B] = P[A] + P[B], if mutually exclusive
This equality is true only for mutually exclusive events. This property of mutually exclusive
events can be extended to any number of events. The probability that any of n mutually
exclusive events occurs is the sum of the probabilities of (each of) those n events.
The random variables X and Y are independent if the conditional distribution of Y given X equals
the marginal distribution of Y. Since independence implies P (Y=y | X=x) = P(Y=y):
P(Y = y | X = x) = P(X = x, Y = y) / P(X = x)
Statistical independence is when the value taken by one variable has no effect on the value
taken by the other variable. If the variables are independent, their joint probability will equal
the product of their marginal probabilities. If they are not independent, they are dependent.
That is, random variables X and Y are independent if their joint distribution is equal to the
product of their marginal distributions. For example, when rolling two dice, the outcome of the
second one will be independent of the first. This independence implies that the probability of
rolling double-sixes is equal to the product of P(rolling one six) and P(rolling one six). So, if the
two dice are independent, then:
P (first roll = 6, second roll = 6) = P (rolling a six) × P (rolling a six) = (1/6) × (1/6) = 1/36
“The joint probability distribution of two discrete random variables, say X and Y, is the probability
that the random variables simultaneously take on certain values, say x and y. The probabilities
of all possible (x, y) combinations sum to 1. The joint probability distribution can be written as
the function P (X = x, Y = y).” — Stock & Watson
When dealing with the joint probabilities of two variables, it is convenient to summarize the
various probabilities in a probability matrix (a.k.a., probability table).
Example: (Consider this same example for calculating joint, unconditional and conditional
probabilities.) In Miller’s example, we assume a company that issues both bonds and stock.
The bonds can either be downgraded, be upgraded, or have no change in rating. The stock can
either outperform the market or underperform the market.
In the probability matrix above, the interior cells represent the joint probabilities. In this table
below, the joint probabilities are highlighted and the calculation of joint probabilities using the
probability matrix is shown.
For example, the joint probability of both the company's stock outperforming the market
and the bonds being upgraded is 15%.
Similarly, the joint probability of the stock underperforming the market and the bonds
having no change in rating is 25%.
Importantly, all of the joint probabilities add to 100%. Given all the possible events, one
of them must happen.
We can also derive the joint probability when conditional probabilities (refer to the next section)
are given, like in the table below:
For example, the joint probability of both the company's stock outperforming the market and
the bonds being upgraded can be calculated as the conditional probability of the bonds being
upgraded given that the stock outperforms, multiplied by the marginal (unconditional)
probability of the stock outperforming: 30.0% × 50.0% = 15.0%.
“The marginal probability distribution of a random variable Y is just another name for its
probability distribution. This term distinguishes the distribution of Y alone (marginal distribution)
from the joint distribution of Y and another random variable. The marginal distribution of Y can
be computed from the joint distribution of X and Y by adding up the probabilities of all possible
outcomes for which Y takes on a specified value”— Stock & Watson
We can get the unconditional (a.k.a., marginal) probabilities, by adding across a row or down a
column as exhibited in the exterior cells of the probability matrix.
For example, as calculated in the table above, the probability of the bonds being
downgraded, irrespective of the stock's performance, is 25%.
Similarly, the probability of the equity outperforming the market is 50%.
Conditional probability function: The conditional probability is the probability of an outcome
given (or conditional on) another outcome.
Conditional probability is: P(Y = y | X = x) = P(X = x, Y = y) / P(X = x)

P(B | A) = P(A ∩ B) / P(A) ⇒ P(A) × P(B | A) = P(A ∩ B)
Again, using the same probability matrix, conditional probability can be calculated.
For example, the conditional probability of a bond upgrade given stocks are
outperforming is the joint probability of a bond upgrade and stock outperformance
divided by the unconditional or marginal probability of stock outperformance =
15.0%/50.0%= 30.0%.
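The matrix arithmetic can be sketched in Python. The 15% and 25% interior cells and the 50% and 25% marginals are stated in the text; the remaining interior cells below are assumed purely for illustration, chosen to be consistent with those stated values:

```python
# Joint probability matrix P(equity outcome, bond outcome). Only the 15% and
# 25% cells (and the 50%/25% marginals) are given in the text; the other
# interior cells are illustrative assumptions consistent with those values.
joint = {
    ("outperform",   "upgrade"):   0.15,
    ("outperform",   "no change"): 0.30,  # assumed
    ("outperform",   "downgrade"): 0.05,  # assumed
    ("underperform", "upgrade"):   0.05,  # assumed
    ("underperform", "no change"): 0.25,
    ("underperform", "downgrade"): 0.20,  # assumed
}
assert abs(sum(joint.values()) - 1.0) < 1e-9  # all joint probabilities sum to 100%

# Marginal (unconditional) probabilities: add across a row or down a column.
p_outperform = sum(p for (e, b), p in joint.items() if e == "outperform")
p_downgrade = sum(p for (e, b), p in joint.items() if b == "downgrade")
print(round(p_outperform, 2), round(p_downgrade, 2))  # 0.5 0.25

# Conditional probability = joint / marginal.
p_upgrade_given_out = joint[("outperform", "upgrade")] / p_outperform
print(round(p_upgrade_given_out, 2))  # 0.3
```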
Miller: “The concept of independence is closely related to the concept of conditional probability.
Rather than trying to determine the probability of the market being up and having rain, we can
ask, ‘What is the probability that the stock market is up given that it is raining?’ We can write this
as a conditional probability: P [ market up | rain].
The vertical bar tells us the probability of the first argument is conditional on the second. We
read this as “The probability of ‘market up’ given ‘rain’ is equal to p.” If the weather and the
stock market are independent, then the probability of the market being up on a rainy day is the
same as the probability of the market being up on a sunny day.
If the weather somehow affects the stock market, however, then the conditional probabilities
might not be equal. We could have a situation where: P[market up | rain] ≠ P[market up | no
rain] … In this case, the weather and the stock market are no longer independent. We can no
longer multiply their probabilities together to get their joint probability.”
Chapter Summary
Essential distribution perspectives (terms):
A random variable is described with a probability distribution. A continuous random
variable (X) has an infinite number of values within an interval:
P(r1 < X < r2) = ∫ f(x) dx, integrating from r1 to r2
A discrete random variable (X) assumes a value among a finite set including x1, x2, x3
and so on. The probability function is expressed by:
P(X = x) = f(x)
The probability density function (pdf) answers a “local” question: If the random
variable is discrete, the pdf is the probability the variable will assume an exact value of x.
If the variable is continuous, the pdf tells the likelihood of outcomes occurring on an
interval between any two points.
The cumulative distribution function gives the probability the random variable will be
less than, or equal to, some value. If F(x) is a cumulative distribution function, then
F−1(p), the inverse cumulative distribution, is defined as:
F(x) = p ⇔ F⁻¹(p) = x, s.t. 0 ≤ p ≤ 1
A single variable (univariate) probability distribution is concerned with only a single
random variable. A multivariate probability density function concerns the outcome of
an experiment with more than one random variable.
a) 25.750%
b) 28.300%
c) 31.250%
d) 44.667%
300.2. Assume the probability density function (pdf) of the final value (at maturity) of a zero-
coupon bond with a notional value of $5.00 is given by f(x) = (3/125)*x^2 on the domain [0,5]
where x is the price of the bond:
f(x) = (3/125)x², s.t. 0 ≤ x ≤ 5, where x = bond price
Although the mean of this distribution is $3.75, assume the expected final payoff is a return of
the full par of $5.00. If we apply the inverse cumulative distribution function and find the price of
the bond (i.e., the value of x) such that 5.0% of the distribution is less than or equal to x, let this
price be represented by q(0.05); in other words, a 5% quantile function. If the 95.0% VaR is given
by −[q(0.05) − 5], or equivalently [5 − q(0.05)], which is nearest to this 95.0% VaR?
a) $1.379
b) $2.842
c) $2.704
d) $3.158
301.1. A random variable is given by the discrete probability function f(x) = P[X = x(i)] = a*X^3
such that x(i) is a member of {1, 2, 3} and (a) is a constant. That is, X has only three discrete
outcomes. What is the probability that X will be greater than its mean? (bonus: what is the
distribution's variance?)
f(x) = a·x³, x ∈ {1, 2, 3}
a) 45.8%
b) 50.0%
c) 62.3%
d) 75.0%
301.2. A credit asset has a principal value of $6.0 with probability of default (PD) of 3.0% and a
loss given default (LGD) characterized by the following continuous probability density function
(pdf): f(x) = x/18 such that 0 ≤ x ≤ $6. Let expected loss (EL) = E[PD*LGD]. If PD and LGD are
independent, what is the asset's expected loss? (note: why does independence matter?)
f(x) = x/18, s.t. 0 ≤ x ≤ 6
a) $0.120
b) $0.282
c) $0.606
d) $1.125
302.1. There is a prior (unconditional) probability of 20.0% that the Fed will initiate Quantitative
Easing 4 (QE 4). If the Fed announces QE 4, then Macro Hedge Fund will outperform the
market with a 70% probability. If the Fed does not announce QE 4, there is only a 40%
probability that Macro will outperform (and a 60% that Acme will under-perform; like the Fed's
announcement, there are only two outcomes). If we observe that Macro outperforms the market,
which is nearest to the posterior probability that the Fed announced QE 4?
a) 20.0%
b) 27.9%
c) 30.4%
d) 41.6%
Answers:
300.1. C. 31.250%
(Question to ponder: how can this be a CDF if, as a function, it does not appear to start at zero
and end at 1.0?)
300.2. D. $3.158
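The 300.2 answer can be verified directly from the CDF implied by the pdf; since f(x) = (3/125)x², the CDF is F(x) = x³/125, and inverting it gives the 5% quantile:

```python
# 300.2: pdf f(x) = (3/125)x^2 on [0,5] implies CDF F(x) = x^3/125.
# Inverse CDF: F(q) = p  ->  q = (125p)^(1/3).
p = 0.05
q = (125 * p) ** (1 / 3)   # 5% quantile of the bond price, ≈ 1.842
var_95 = 5 - q             # 95% VaR = par − q(0.05)
print(round(q, 3), round(var_95, 3))  # 1.842 3.158
```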
301.1. D. 75.0%
Because it is a probability function, a*1^3 + a*2^3 + a*3^3 = 1.0; i.e., 1a + 8a + 27a = 1.0, such
that a = 1/36.
Mean = 1*(1/36) + 2*(8/36) + 3*(27/36) = 2.722.
The P[X > 2.722] = P[X = 3] = (1/36)×3^3 = 27/36 = 75.0%
Bonus: Variance = (1 -2.722)^2*(1/36) + (2 -2.722)^2*(8/36) + (3 -2.722)^2*(27/36) = 0.2562,
with standard deviation = SQRT(0.2562) = 0.506135
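The 301.1 arithmetic, including the bonus variance, can be replayed in a few lines:

```python
# 301.1: f(x) = a*x^3 on {1,2,3}; probabilities must sum to one, so a = 1/36.
a = 1 / (1**3 + 2**3 + 3**3)  # 1/36
probs = {x: a * x**3 for x in (1, 2, 3)}

mean = sum(x * p for x, p in probs.items())                   # ≈ 2.722
p_above_mean = sum(p for x, p in probs.items() if x > mean)   # = P[X = 3]
var = sum(p * (x - mean) ** 2 for x, p in probs.items())
print(round(mean, 3), round(p_above_mean, 2), round(var, 4))
# 2.722 0.75 0.2562
```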
301.2. A. $0.120
If PD and LGD are not independent, then E[PD×LGD] ≠ E(PD) × E(LGD); for example, if they
are positively correlated, then E[PD×LGD] > E(PD) × E(LGD).
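The 301.2 result follows because, under independence, EL = PD × E[LGD], where E[LGD] = ∫ x·f(x) dx = ∫ x²/18 dx over [0, 6] = 4.0 analytically; a simple Riemann-sum check:

```python
# 301.2: E[LGD] = ∫ x * (x/18) dx over [0,6]; analytically (1/18) * 6^3/3 = 4.0.
# Checked here with a left Riemann sum (n steps is an arbitrary choice).
n = 100_000
dx = 6 / n
e_lgd = sum((i * dx) * ((i * dx) / 18) * dx for i in range(n))

pd_ = 0.03
el = pd_ * e_lgd  # independence lets us multiply: E[PD*LGD] = E[PD]*E[LGD]
print(round(e_lgd, 2), round(el, 3))  # 4.0 0.12
```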
302.1. C. 30.4%
Per Bayes, P(QE 4 | Macro Outperforms) = Joint Prob (QE 4, Outperforms) / Unconditional Prob
(Outperforms) = (20%*70%)/(20%*70% + 80%*40%) = 14%/46% = 30.435%
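The same Bayes calculation, expressed in code:

```python
# 302.1 via Bayes: P(QE4 | Macro outperforms).
p_qe = 0.20            # prior P(QE 4)
p_out_given_qe = 0.70  # P(outperform | QE 4)
p_out_given_no = 0.40  # P(outperform | no QE 4)

# Unconditional P(outperform) by the law of total probability.
p_out = p_qe * p_out_given_qe + (1 - p_qe) * p_out_given_no  # 0.46
posterior = (p_qe * p_out_given_qe) / p_out                  # 14% / 46%
print(round(posterior, 4))  # 0.3043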
You are invested in two hedge funds. The probability that hedge fund Alpha generates positive
returns in any given year is 60%. The probability that hedge fund Omega generates positive
returns in any given year is 70%. Assume the returns are independent. What is the probability
that both funds generate positive returns in a given year? What is the probability that both funds
lose money?
Answer:
42% and 12%. Because the returns are independent, P(both positive) = 60% × 70% = 42%, and
P(both lose money) = 40% × 30% = 12%.
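Because the funds are independent, each joint probability is simply the product of the marginal probabilities; a quick check in Python:

```python
# Independent funds: joint probability = product of marginals.
p_alpha_up, p_omega_up = 0.60, 0.70

p_both_up = p_alpha_up * p_omega_up                # 0.60 * 0.70 = 42%
p_both_down = (1 - p_alpha_up) * (1 - p_omega_up)  # 0.40 * 0.30 = 12%
print(round(p_both_up, 2), round(p_both_down, 2))  # 0.42 0.12
```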
Question 2:
Corporation ABC issues $100 million of bonds. The bonds are rated BBB. The probability that
the rating on the bonds is upgraded within the year is 8%. The probability of a downgrade is 4%.
What is the probability that the rating remains unchanged?
Answer:
88%. The sum of all three events—upgrade, downgrade, and no change—must sum to one.
There is no other possible outcome. 88% + 8% + 4% = 100%.
Question 3:
Stock XYZ has a 20% chance of losing more than 10% in a given month. There is also a 30%
probability that XYZ gains more than 10%. What is the probability that stock XYZ either loses
more than 10% or gains more than 10%?
Answer:
50%. The outcomes are mutually exclusive; therefore, 20% + 30% = 50%.
Question 4:
There is a 30% chance that oil prices will increase over the next six months. If oil prices
increase, there is a 60% chance that the stock market will be down. What is the probability that
oil prices increase and the stock market is down over the next six months?
Answer:
[oil up ∩ stock market down] = [stock market down|oil up] ∙ [oil up]
[oil up ∩ stock market down] = 60% ∙ 30% = 18%
Question 5:
f(x) = c(100 − x²) for −10 ≤ x ≤ 10
     = 0 otherwise
Calculate the value of c.
Answer:
Given the density function, we can find c by noting that the sum of probabilities must be equal to
one:
∫ f(x) dx = ∫ c(100 − x²) dx = c[100x − x³/3], evaluated from −10 to 10, = c(4,000/3) = 1

c = 3/4,000
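The value c = 3/4,000 can be sanity-checked by numerically integrating the density over its support and confirming a total probability of one:

```python
# Q5 check: with c = 3/4000, f(x) = c(100 - x^2) should integrate to 1 on [-10, 10].
c = 3 / 4000
n = 100_000
dx = 20 / n  # step of a left Riemann sum over [-10, 10]
total = sum(c * (100 - (-10 + i * dx) ** 2) * dx for i in range(n))
print(round(total, 4))  # 1.0
```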
Question 6:
F(x) = (x/100)(20 − x)
Check that this is a valid CDF; that is, show that F(0) = 0 and F(10) = 1. Calculate the
probability density function, f(x).
Answer:
First we check that this is a valid CDF, by calculating the value of the CDF for the minimum and
maximum values of x:
F(0) = (0/100)(20 − 0) = 0

F(10) = (10/100)(20 − 10) = 1
Next we calculate the PDF by taking the first derivative of the CDF:
f(x) = F′(x) = 20/100 − 2x/100 = (1/50)(10 − x)
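Both steps of Question 6 are easy to verify numerically: the CDF endpoints, and that a numerical derivative of F recovers the stated pdf:

```python
# Q6 check: F(x) = (x/100)(20 - x) on [0, 10]; claimed pdf f(x) = (10 - x)/50.
F = lambda x: (x / 100) * (20 - x)
f = lambda x: (10 - x) / 50

assert F(0) == 0 and F(10) == 1  # valid CDF endpoints

# Central-difference derivative of F matches f at a few interior points.
h = 1e-6
for x in (1.0, 5.0, 9.0):
    numeric = (F(x + h) - F(x - h)) / (2 * h)
    assert abs(numeric - f(x)) < 1e-6
print("f(x) matches F'(x)")
```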
Question 7:
f(x) = c/x
where 1 ≤ x ≤ e. Calculate the cumulative distribution function, F(x), and solve for the
constant c.
Answer:
F(x) = ∫ f(u) du = ∫ (c/u) du, from 1 to x, = c[ln u] from 1 to x = c ln(x)
We first try to find c using the fact that the CDF is zero at the minimum value of x, x = 1:
F(1) = c ln(1) = c · 0 = 0
As it turns out, any value of c will satisfy this constraint, and we cannot use it to determine c.
If we instead use the fact that the CDF is 1 at the maximum value of x, x = e, we find that c = 1:
F(e) = c ln(e) = c · 1 = c
∴ c = 1
F(x) = ln(x)
Question 8:
You own two bonds. Both bonds have a 30% probability of defaulting. Their default probabilities
are statistically independent. What is the probability that both bonds default? What is the
probability that only one bond defaults? What is the probability that neither bond defaults?
Answer:
P(both default) = 30% × 30% = 9%, and P(neither defaults) = 70% × 70% = 49%. For the second
part of the question, remember that there are two scenarios in which only one bond defaults:
either the first defaults and the second does not, or the second defaults and the first does not,
so P(only one defaults) = 2 × 30% × 70% = 42%.
Question 9:
The following table is a one-year ratings transition matrix. Given a bond’s rating now, the matrix
gives the probability associated with the bond having a given rating in a year’s time. For
example, a bond that starts the year with an A rating has a 90% chance of maintaining that
rating and an 8% chance of migrating to a B rating. Given a B-rated bond, what is the probability
that the bond defaults (D rating) over one year? What is the probability that the bond defaults
over two years?
Answer:
The probability that a B-rated bond defaults over one year is 2%. This can be read directly from
the last column of the second row of the ratings transition matrix.
The probability of default over two years is 4.8%. During the first year, a B-rated bond can either
be upgraded to an A rating, stay at B, be downgraded to C, or default. From the transition
matrix, we know that the probability of these events is 10%, 80%, 8%, and 2%, respectively. If
the bond is upgraded to A, then there is zero probability of default in the second year (the last
column of the first row of the matrix is 0%). If it remains at B, there is a 2% probability of default
in the second year, the same as in the first year. If it is downgraded to C, there is a 15%
probability of default in the second year. Finally, if a bond defaulted in the first year, it stays
defaulted (the last column of the last row is 100%). Putting all this together, we have:
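Putting the conditioning argument into code (the transition probabilities below are the ones quoted in the answer; the full matrix itself is not reproduced here):

```python
# Two-year default of a B-rated bond, conditioning on the year-1 rating.
year1 = {"A": 0.10, "B": 0.80, "C": 0.08, "D": 0.02}      # P(rating after year 1)
pd_year2 = {"A": 0.00, "B": 0.02, "C": 0.15, "D": 1.00}   # P(default in year 2 | rating)

# Law of total probability across the four year-1 states.
p_default_2y = sum(year1[r] * pd_year2[r] for r in year1)
print(round(p_default_2y, 3))  # 0.048
```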
Question 10:
Your firm forecasts that there is a 50% probability that the market will be up significantly next
year, a 20% probability that the market will be down significantly next year, and a 30%
probability that the market will be flat, neither up or down significantly. You are asked to
evaluate the prospects of a new portfolio manager. The manager has a long bias and is likely to
perform better in an up market. Based on past data, you believe that the probability that the
manager will be up if the market is up significantly is 80%, and that the probability that the
manager will be up if the market is down significantly is only 10%. If the market is flat, the
manager is just as likely to be up as to be down. What is the unconditional probability that the
manager is up next year?
Answer:
Using M to represent the market and X to represent the portfolio manager, we are given the
following information:
P[M = up] = 50%
P[M = down] = 20%
P[M = flat] = 30%
P[X = up | M = up] = 80%
P[X = up | M = down] = 10%
P[X = up | M = flat] = 50%
The unconditional probability that the manager is up next year, P[X = up], is then 57%:
P[X = up] = P[X = up | M = up]·P[M = up] + P[X = up | M = down]·P[M = down] + P[X = up | M = flat]·P[M = flat]
P[X = up] = 80% × 50% + 10% × 20% + 50% × 30%
P[X = up] = 40% + 2% + 15% = 57%
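The law-of-total-probability sum above, in code:

```python
# Q10: unconditional P(manager up) by the law of total probability.
p_market = {"up": 0.50, "down": 0.20, "flat": 0.30}   # P(market state)
p_up_given = {"up": 0.80, "down": 0.10, "flat": 0.50} # P(manager up | state)

p_manager_up = sum(p_market[s] * p_up_given[s] for s in p_market)
print(round(p_manager_up, 2))  # 0.57
```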
Calculate the mean, standard deviation, and variance of a discrete random variable.
Calculate and interpret the covariance and correlation between two random variables.
Interpret the skewness and kurtosis of a statistical distribution, and interpret the
concepts of coskewness and cokurtosis.
Expected value exists when we have a parametric (e.g., normal, binomial) distribution or
probabilities. Expected value is the weighted average of possible values.
E(X) = p1·x1 + p2·x2 + ... + pn·xn = Σ pi·xi

E(X) = Σ xi·f(xi)
If we have a complete data set, then the mean is a population mean, which implies that the
mean is exactly the true (and only) mean:
μ = (1/n) Σ xi = (1/n)(x1 + x2 + ... + xn)
However, in practice, we typically do not have the population. Rather, more often we have only
a subset of the population or a dataset that cannot realistically be considered comprehensive;
e.g., the most recent year of equity returns. A mean of such a dataset, which is much more
likely in practice, is called the sample mean. The sample mean, of course, uses the same
formula:
μ̂ = (1/n) Σ xi = (1/n)(x1 + x2 + ... + xn)
But the difference between a population parameter (e.g., population mean) and a sample
estimate (e.g., sample mean) is essential to statistics:
Each sample will produce a different sample mean, which is likely to be near the “true”
population mean but different depending on the sample.
We use the sample estimate to infer something about the unobserved population
parameter.
Variance
Variance (and standard deviation) is the second moment, the most common measures of
dispersion. The variance of a discrete random variable Y is given by:
σ² = variance(Y) = E[(Y − μ)²] = Σ pi·(yi − μ)²
Variance is also expressed as the difference between the expected value of Y2 and the square
of the expected value of Y. This is the more useful variance formula:
σ² = E[(Y − μ)²] = E(Y²) − [E(Y)]²
For example, if the probability of loan default (PD) is a Bernoulli trial, what is the variance of
PD?
As E[X²] = p and (E[X])² = p²:
E[X²] − (E[X])² = p − p² = p × (1 − p)
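The p(1 − p) identity can be confirmed by direct enumeration of the two Bernoulli outcomes:

```python
# Bernoulli variance check: Var(X) = E[X^2] - (E[X])^2 = p - p^2 = p(1 - p).
for p in (0.03, 0.25, 0.50):
    e_x = p * 1 + (1 - p) * 0        # E[X] = p
    e_x2 = p * 1**2 + (1 - p) * 0**2 # E[X^2] = p (since 1^2 = 1, 0^2 = 0)
    var = e_x2 - e_x**2
    assert abs(var - p * (1 - p)) < 1e-12
print("Var(Bernoulli) = p(1-p) verified")
```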
Properties of variance
σ²(c) = 0, where c is a constant
σ²(X + Y) = σ²(X) + σ²(Y); only if independent
σ²(X − Y) = σ²(X) + σ²(Y); only if independent
σ²(X + c) = σ²(X)
σ²(cX) = c²·σ²(X)
σ²(aX + b) = a²·σ²(X)
σ²(X1 + ... + Xn) = σ²(X1) + ... + σ²(Xn); only if independent
σ²(X) = E(X²) − [E(X)]²
Standard deviation
σ = √var(Y) = √E[(Y − μ)²] = √[Σ pi·(yi − μ)²]
Example 1: Variance of a Bernoulli distribution: for example, a fair coin toss, where p = 1 − p = 1/2 = 0.50.
Example 2: A derivative has a 50/50 chance of being worth either +10 or −10 at expiry. The
variance and the standard deviation of the derivative’s value are calculated below.
The expected value of a single six-sided die is 3.5 (the average outcome or mean).
First, we need to solve for the expected value of X-squared, E[X2]. This is given by:
E[X²] = (1/6)(1²) + (1/6)(2²) + (1/6)(3²) + (1/6)(4²) + (1/6)(5²) + (1/6)(6²) = 91/6
Then, we need to square the expected value of X, [E(X)]2 such that the variance of a
single six-sided die is given by:
Var(X) = E(X²) − [E(X)]² = 91/6 − (3.5)² ≅ 2.92
The standard deviation is therefore the square root of variance =√2.92 = 1.708
Variance of the total of two six-sided dice cast together: it is simply the variance of X (≈2.92)
plus the variance of Y (≈2.92), or about 5.83. The reason we can simply add them together is
that they are independent random variables.
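The die arithmetic above, replayed in code:

```python
# Single six-sided die: E[X], E[X^2], variance, and the variance of two dice.
faces = range(1, 7)
mean = sum(faces) / 6                # 3.5
e_x2 = sum(x**2 for x in faces) / 6  # 91/6 ≈ 15.167
var_one = e_x2 - mean**2             # ≈ 2.92

# Independent variances add, so two dice cast together:
var_two = 2 * var_one                # ≈ 5.83
print(round(var_one, 2), round(var_two, 2))  # 2.92 5.83
```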
Sample Variance
σ̂² = [1/(k − 1)] Σ (xk − μ̂)²
If the mean is known, or we are calculating the population variance, then we divide by k; but if
instead the mean is being estimated, then we divide by k − 1, as shown here.
The above sample variance is used by Hull, for example, to calculate the historical variance (and
volatility) of asset returns. Specifically, he employs a sample variance (which divides by k − 1,
or n − 1) to compute historical volatility. Admittedly, because the variable is daily returns, Hull
subsequently makes two simplifying assumptions, including reversion to division by n (or k).
However, the point remains: when computing the volatility (standard deviation) of an historical
set of returns, the square root of the above sample variance is typically appropriate: it gives
an unbiased estimate (of the variance, at least).
Sample standard deviation
This is merely the square root of the sample variance. The unbiased estimate of the sample
standard deviation is given by:
σ̂ = √{[1/(n − 1)] Σ (xi − x̄)²}
This formula is important because it is technically the safe way to calculate sample
volatility; i.e., when in doubt, you are rarely mistaken to employ the (n − 1) or (k − 1) divisor.
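The n versus n − 1 distinction is exactly the difference between Python's statistics.pvariance and statistics.variance; a small sketch using hypothetical returns:

```python
import statistics

returns = [0.02, -0.01, 0.03, 0.00, -0.02]  # hypothetical daily returns

# Population variance divides by n; sample variance divides by n - 1.
pop_var = statistics.pvariance(returns)
smp_var = statistics.variance(returns)

n = len(returns)
# The two differ only by the factor n / (n - 1).
assert abs(smp_var - pop_var * n / (n - 1)) < 1e-12
print(pop_var, smp_var)  # the sample variance is slightly larger
```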
Example: Assume that the mean of daily Standard & Poor’s (S&P) 500 Index returns is zero.
You observe the returns (Daily Return) given in the table over the course of 10 days. Estimate
the standard deviation of daily S&P 500 Index returns.
Note: We were told to assume the mean was known, so we divide by n = 10, not n−1 = 9.
The variance is: σ̂² = (1/10) · Σ_{i=1}^{10} (r_i − 0)² = 0.009630, such that the estimated
standard deviation (volatility) is √0.009630 ≈ 9.81%.
Mean: The mean of a discrete random variable is a special case of the weighted mean,
where the outcomes are weighted by their probabilities, and the sum of the weights is
equal to one. For a random variable, X, with possible values, x_i, and corresponding
probabilities p_i, we define the mean, μ, as:
μ = Σ_{i=1}^{n} p_i · x_i
Median: The median of a discrete random variable is the value such that the probability
that a value is less than or equal to the median is equal to 50%. Working from the other end
of the distribution, we can also define the median such that 50% of the values are greater than
or equal to the median. For a random variable, X, if we denote the median as m, we have:
P[X ≥ m] = P[X ≤ m] = 0.50
Mode: For a discrete random variable, the mode is the value associated with the highest
probability. As with population and sample data sets, the mode of a discrete random variable
need not be unique.
Example: At the start of the year, a bond portfolio consists of two bonds, each worth $100. At
the end of the year, if a bond defaults, it will be worth $20. If it does not default, the bond will be
worth $100. The probability that both bonds default is 20%. The probability that neither bond
defaults is 45%. What are the mean, median, and mode of the year-end portfolio value?
Solution: We are given the probabilities for two outcomes for the value of the portfolio at the
end of the year:
P[V = $40] = 20%
P[V = $200] = 45%
The third and only alternate outcome is the scenario where one bond defaults whereas the
other one does not. In this case, the value of the portfolio will be $120 (= $100 + $20).
At year-end, the value of the portfolio, V, can have only one of three values, and all of the
probabilities must sum to 100% (a property of a discrete random variable).
This allows us to calculate the final probability: P[V = $120] = 100% − 20% − 45% = 35%
The mean of V is then $140: μ = 0.20 × $40 + 0.35 × $120 + 0.45 × $200 = $140
The median of the distribution is $120; the 50th percentile falls at $120.
The mode of the distribution is $200; this is the most likely single outcome because its
probability is highest at 45%.
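The three summary statistics from this example can be sketched in Python (the distribution is exactly the one given above):

```python
# Year-end portfolio value: P[V=$40]=20%, P[V=$120]=35%, P[V=$200]=45%
dist = {40: 0.20, 120: 0.35, 200: 0.45}

# Mean: probability-weighted average of the outcomes
mean = sum(v * p for v, p in dist.items())   # 140.0

# Median: smallest value at which cumulative probability reaches 50%
cum, median = 0.0, None
for v in sorted(dist):
    cum += dist[v]
    if cum >= 0.50:
        median = v
        break

# Mode: the single most likely outcome
mode = max(dist, key=dist.get)               # 200
```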
Covariance is analogous to variance, but instead of looking at the deviation from the mean of
one variable, we look at the relationship between the deviations of two variables. Put
another way, the covariance is the average cross-product: if the means of both variables, X and
Y, are known, we can use the following formula for covariance, which might be called a
population covariance:
σ_XY = (1/n) · Σ_{i=1}^{n} (x_i − μ_X)(y_i − μ_Y)
If the true means are unknown and we know only the sample means, we calculate a sample
covariance instead. The sample covariance divides the sum of cross-products by (n − 1) rather
than n:
s_XY = 1/(n − 1) · Σ_{i=1}^{n} (x_i − μ̂_X)(y_i − μ̂_Y)
What is the covariance of a variable with itself, i.e., what is covariance (X,X)? It is the
variance of X. It will be helpful to keep in mind that a variable’s covariance with itself is its
variance. For example, knowing this, we realize that the diagonal in a covariance matrix is
populated with variances, because variance is a special case of covariance!
Properties of covariance
Correlation
Correlation is a key measure in the FRM and is typically denoted by Greek rho (ρ). Correlation
is the covariance between two variables divided by the product of their respective standard
deviations (a.k.a., volatilities).
ρ_XY = σ_XY / (σ_X · σ_Y), where σ_XY = Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)]
The correlation coefficient translates covariance into a unitless metric that runs from -1.0
to +1.0.
We may also refer to the computationally similar sample correlation, which is the sample
covariance divided by the product of the sample standard deviations:
ρ̂_XY = s_XY / (s_X · s_Y)
Correlation (or dependence) is not causation. For example, in a basket credit default swap,
the correlation (dependence) between the obligors is a key input. But we do not assume there is
mutual causation (e.g., that one default causes another). Rather, more likely, different obligors
are similarly sensitive to economic conditions. So, economic deterioration may be the external
cause that all obligors have in common. Consequently, their defaults exhibit dependence. But
the causation is not internal.
Example: Below we illustrate the application of the covariance and the correlation coefficient.
Given in the table below are the growth projections of two products, Gold (G) and Bitcoin (B)
and their joint probabilities. For both products, we have three scenarios (bad, medium, and
good). Probabilities are assigned to each growth scenario:
20% chance of gold growing at 3.0% and bitcoin growing at 5.0%
60% chance of gold growing at 9.0% and bitcoin growing at 7.0%
20% chance of gold growing at 12.0% and bitcoin growing at 9.0%
As seen in the table above, solving this requires the calculation of expected values: E(G),
E(B), E(GB), E(G²) and E(B²). Make sure you can replicate the following two steps:
The covariance is equal to E(GB) − E(G)E(B) = 62.40 − (8.40 × 7.00) = 3.60.
The correlation coefficient is equal to Cov(G,B) divided by the product of the
standard deviations: 3.60 / (2.94 × 1.26) = 0.968 ≈ 97%.
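A sketch that replicates both steps in Python, using the three scenarios given above:

```python
import math

# (probability, gold growth %, bitcoin growth %)
scenarios = [(0.20, 3.0, 5.0), (0.60, 9.0, 7.0), (0.20, 12.0, 9.0)]

e_g  = sum(p * g for p, g, b in scenarios)        # E(G) = 8.40
e_b  = sum(p * b for p, g, b in scenarios)        # E(B) = 7.00
e_gb = sum(p * g * b for p, g, b in scenarios)    # E(GB) = 62.40

cov  = e_gb - e_g * e_b                           # 62.40 - 58.80 = 3.60
sd_g = math.sqrt(sum(p * g * g for p, g, b in scenarios) - e_g ** 2)  # ~2.94
sd_b = math.sqrt(sum(p * b * b for p, g, b in scenarios) - e_b ** 2)  # ~1.26
corr = cov / (sd_g * sd_b)                        # ~0.968
```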
Another example: Here is another example with the same products, Gold (G) and Bitcoin (B)
but with different scenarios (lower and higher growth) and their joint probabilities. The joint
probabilities are: 30% chance of gold growing at 4.0% and bitcoin growing at 3.0%; 15% chance
of gold growing at 4.0% and bitcoin growing at 5.0 %; 20% chance of gold growing at 9.0% and
bitcoin growing at 3.0 %; and 35% chance of gold growing at 9.0% and bitcoin growing at 5.0 %:
The table below shows the calculation of covariance (calculated in two different ways) and
correlation. Note that the variance does not divide the sum by (n − 1) because these are
probability-weighted outcomes, not samples.
Covariance method #1 (M1) is Σ p_i·(G_i − μ_G)(B_i − μ_B). For example, the first value
is: 0.83 = (4.0 − 6.75) × (3.0 − 4.00) × 30.0%. Likewise, all four such values are summed
to give the covariance of 0.750.
Covariance method #2 (M2) is equivalent to E[GB] − E[G]·E[B]: 27.750 − (6.750 × 4.000)
= 0.750
Correlation (≈30%) is Cov(G,B) divided by the product of the standard deviations:
0.750 / (2.487 × 1.000) ≈ 0.3015.
E(X + Y + Z) = E(X) + E(Y) + E(Z)
Variance
Regarding the sum of correlated variables, the variance of correlated variables is given by the
following equations. Please know this substitution: σ_XY = ρ_XY · σ_X · σ_Y.
σ²_(X+Y) = σ²_X + σ²_Y + 2σ_XY or σ²_(X+Y) = σ²_X + σ²_Y + 2ρ_XY·σ_X·σ_Y
With reference to the difference between correlated variables, the variance is given by:
σ²_(X−Y) = σ²_X + σ²_Y − 2σ_XY or σ²_(X−Y) = σ²_X + σ²_Y − 2ρ_XY·σ_X·σ_Y
If we need to determine the variance of a portfolio of securities, all we need to know is the
variance of the underlying securities and their respective correlations. If the portfolio is a linear
combination of X_A and X_B, such that P = a·X_A + b·X_B, then:
σ²_P = a²σ²_A + 2ab·σ_AB + b²σ²_B
Expressed per unit of asset A, with hedge ratio (h) in asset B: σ²_P = σ²_A + 2h·σ_AB + h²σ²_B
The minimum variance hedge ratio is the ratio which achieves the portfolio with the least
variance. It is found by taking the derivative of the equation for the portfolio variance with
respect to h, and setting it equal to zero, such that:
h* = −σ_AB/σ²_B = −ρ_AB · (σ_A/σ_B)
Example: A portfolio manager owns (has a long position in) $100 of Security A which has a
volatility of 12.0%. She wants to hedge with Security B which has a volatility of 30.0%. The
correlation between the securities is 0.60. What position in Security B utilizes the minimum
variance hedge ratio to create a portfolio with the minimum dollar ($) standard deviation?
The minimum variance hedge ratio in this case is −0.24, such that the trade is short $0.24
in Security B for each $1.00 in Security A. Therefore, −0.24 × $100 in Security A = short $24.00
in Security B. So, asset B is short $24.00, such that the net portfolio size happens to end up at
net long $76.00 (= 100 − 24).
The portfolio's standard deviation is the minimum standard deviation, which is:
√[($100 × 12%)² + (−$24 × 30%)² + 2 × 0.60 × $100 × (−$24) × 12% × 30%] = $9.60
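The hedge in this example can be sketched in Python:

```python
import math

sigma_a, sigma_b, rho = 0.12, 0.30, 0.60
position_a = 100.0                        # long $100 of Security A

# Minimum variance hedge ratio: h* = -rho * sigma_a / sigma_b
h_star = -rho * sigma_a / sigma_b         # -0.24
position_b = h_star * position_a          # short $24 of Security B

# Dollar variance of the hedged portfolio, then its standard deviation
var_usd = ((position_a * sigma_a) ** 2
           + (position_b * sigma_b) ** 2
           + 2 * rho * position_a * position_b * sigma_a * sigma_b)
sd_usd = math.sqrt(var_usd)               # $9.60
```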
The below graph illustrates the smallest variance that can be achieved with this minimum
variance hedge ratio calculated in the example.
m_k = E[X^k]
We refer to m_k as the kth moment of X. But this is a raw moment, and we are generally
more concerned with central moments; central moments are “moments about the mean”
or sometimes “moments around the mean.”
The kth moment about the mean (μ), or kth central moment, is given by:
kth central moment = (1/n) · Σ_{i=1}^{n} (x_i − μ)^k
or equivalently,
μ_k = E[(X − μ)^k]
In this way, the difference of each data point from the mean is raised to a power (k=1, k=2, k=3,
and k=4). These are the four moments of the distribution:
If k=1, this refers to the first moment about zero: the mean.
If k=2, this refers to the second moment about the mean: the variance.
With respect to skewness and kurtosis, it is common to standardize the moment, such that:
If k=3, then the third moment about the mean divided by the cube of the standard
deviation returns the skewness.
If k=4, then the fourth moment about the mean divided by the square of the variance
(i.e., standard deviation⁴) returns the kurtosis; a.k.a., tail density, peakedness.
Skewness = μ₃/σ³ = E[(X − μ)³] / σ³
Note: If you aren't using a probability distribution (but rather a sample), you need to take the
average of the sum of (x_i − μ)³ to get the E[(X − μ)³]. But that's a "raw" or un-
standardized 3rd central moment. To standardize it, it is then divided by the cube of the
standard deviation.
So, skewness is not actually the (raw) third moment, or even the third moment about the mean.
Skewness is the standardized central third moment: the third moment about the mean
divided by the cube of the standard deviation. For example, the gamma distribution has positive
skew (skew > 0) as seen in the figure below.
[Figure: Gamma distribution pdf, illustrating positive (right) skew, plotted over x ∈ [0.0, 5.0]
for three parameterizations: (alpha=1, beta=1), (alpha=2, beta=0.5), and (alpha=4, beta=0.25)]
Kurtosis
Kurtosis = μ₄/σ⁴ = E[(X − μ)⁴] / σ⁴
Please note that kurtosis is not actually the (raw) fourth moment, or even the fourth moment
about the mean. Kurtosis is the standardized central fourth moment: the fourth moment
about the mean divided by square of the variance (or the fourth power of standard deviation).
A normal distribution has relative skewness of zero and kurtosis of three (or the same
idea put another way: excess kurtosis of zero).
Relative skewness > 0 indicates positive skewness (a longer right tail) and relative
skewness < 0 indicates negative skewness (a longer left tail).
Kurtosis greater than three (>3), which is the same thing as saying “excess kurtosis > 0,”
indicates high peaks and fat tails (leptokurtic). Kurtosis less than three (<3), which is the
same thing as saying “excess kurtosis < 0,” indicates lower peaks.
Kurtosis is thus a measure of tail weight (heavy, normal, or light-tailed) and
“peakedness”: kurtosis > 3.0 (or excess kurtosis > 0) implies heavy tails. Financial asset
returns are typically considered leptokurtic (i.e., heavy or fat-tailed). For example, the logistic
distribution exhibits leptokurtosis (heavy tails; kurtosis > 3.0):
[Figure: Logistic distribution pdf, illustrating heavy tails (excess kurtosis > 0), for
parameterizations (alpha=0, beta=1), (alpha=2, beta=1), and (alpha=0, beta=3), compared
against the standard normal N(0,1)]
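Skewness and kurtosis as standardized central moments can be sketched in Python (the five-point series is hypothetical, chosen to make the symmetry visible):

```python
import math

def skew_kurtosis(xs):
    """Population skewness (mu3/sigma^3) and kurtosis (mu4/sigma^4)."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n   # variance
    m3 = sum((x - mu) ** 3 for x in xs) / n   # 3rd central moment
    m4 = sum((x - mu) ** 4 for x in xs) / n   # 4th central moment
    sigma = math.sqrt(m2)
    return m3 / sigma ** 3, m4 / sigma ** 4

# A perfectly symmetric series has zero skew; compare kurtosis to the
# normal benchmark of 3.0
skew, kurt = skew_kurtosis([-2.0, -1.0, 0.0, 1.0, 2.0])
```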
Example - Miller EOC Problem #6: Calculate the skewness and kurtosis of each of the
following two series (X and Y), given in the table below.
The skewness of Y can be calculated as: Skew(Y) = E[(Y − μ)³]/σ³ = −37,128 / 39³ = −0.63
The kurtosis of Y is: Kurt(Y) = E[(Y − μ)⁴]/σ⁴ = 4,133,697 / 39⁴ = 1.787
Just as we generalized the concept of mean and variance to moments and central moments, we
can generalize the concept of covariance to cross central moments. The third and fourth
standardized cross central moments are referred to as coskewness and cokurtosis,
respectively.
For two random variables, there are two non-trivial coskewness statistics.
S_XXY = E[(X − μ_X)²(Y − μ_Y)] / (σ²_X · σ_Y)
S_XYY = E[(X − μ_X)(Y − μ_Y)²] / (σ_X · σ²_Y)
In general, for (n) random variables, the number of non-trivial cross-central moments of
order (m) is given by:
K = (n + m − 1)! / [m!(n − 1)!] − n
In this case, nontrivial means that we have excluded the cross moments that involve only one
variable (i.e., standard skewness and kurtosis). To include those trivial moments, we would
simply add n to this result. For coskewness (m = 3), the number of nontrivial cross-central
moments is therefore:
K = n(n + 1)(n + 2)/6 − n
Example: Continuing with our earlier example of Gold and Bitcoin and given their probability
matrix, we find the coskewness and cokurtosis.
                 X
                 4.0      9.0
Y     3.0        30%      20%      50%
      5.0        15%      35%      50%
                 45%      55%      100%

X    Y    Prob[X,Y]  X·p    Y·p   | X central moments (2nd, 3rd, 4th) | Y central moments (2nd, 3rd, 4th)
4.0  3.0   30.0%     1.20   0.90  |   2.27    -6.24    17.16          |   0.30    -0.30    0.30
4.0  5.0   15.0%     0.60   0.75  |   1.13    -3.12     8.58          |   0.15     0.15    0.15
9.0  3.0   20.0%     1.80   0.60  |   1.01     2.28     5.13          |   0.20    -0.20    0.20
9.0  5.0   35.0%     3.15   1.75  |   1.77     3.99     8.97          |   0.35     0.35    0.35
Sum:                 6.75   4.00  |   6.188   -3.094   39.832         |   1.000    0.000   1.000

Standard deviations: σ_X = √6.188 = 2.487; σ_Y = √1.000 = 1.000
Standardized 3rd central moments: Skew(X) = -0.20, Skew(Y) = 0.00
Standardized 4th central moments: Kurt(X) = 1.040, Kurt(Y) = 1.000

Co-Skew             Co-Kurtosis
S(XXY)   S(XYY)     K(XXXY)  K(XXYY)  K(XYYY)
(2.27)   (0.83)      6.24     2.27     0.83
 1.13    (0.41)     (3.12)    1.13    (0.41)
(1.01)    0.45      (2.28)    1.01    (0.45)
 1.77     0.79       3.99     1.77     0.79
Cross-central moments (CCM; sum):        -0.3750  0.0000   4.8281   6.1875   0.7500
Standardized CCM (co-skew, co-kurt):     -0.0606  0.0000   0.3137   1.0000   0.3015
From the table, e.g., when calculating coskewness S_XXY: initially, to find the cross
central moment, the first value is found as: −2.27 = (4.0 − 6.75)² × (3.0 − 4.00) × 30.0%.
Likewise, all four such values are summed to give the cross central moment of −0.3750.
Since S_XXY = E[(X − μ_X)²(Y − μ_Y)]/(σ²_X · σ_Y), we divide the cross central moment by
the square of the standard deviation of X times the standard deviation of Y to get the
standardized cross central moment of −0.0606 = −0.3750 / (2.487² × 1.000).
Similarly, coskewness S_XYY is found to be 0.0000, and cokurtosis K_XXXY, K_XXYY and
K_XYYY are 0.3137, 1.0000 and 0.3015, respectively.
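The coskewness S_XXY calculation above can be sketched in Python, using the probability matrix from the example:

```python
import math

# (probability, x, y) from the joint probability matrix
joint = [(0.30, 4.0, 3.0), (0.15, 4.0, 5.0), (0.20, 9.0, 3.0), (0.35, 9.0, 5.0)]

mu_x = sum(p * x for p, x, y in joint)    # 6.75
mu_y = sum(p * y for p, x, y in joint)    # 4.00
sd_x = math.sqrt(sum(p * (x - mu_x) ** 2 for p, x, y in joint))  # ~2.487
sd_y = math.sqrt(sum(p * (y - mu_y) ** 2 for p, x, y in joint))  # 1.000

# Cross central moment E[(X - mu_x)^2 (Y - mu_y)], then standardize
ccm_xxy = sum(p * (x - mu_x) ** 2 * (y - mu_y) for p, x, y in joint)  # -0.3750
s_xxy = ccm_xxy / (sd_x ** 2 * sd_y)      # ~ -0.0606
```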
μ̂ = (1/n) · Σ_{i=1}^{n} Y_i
In the Stock & Watson example, the average (mean) wage among 200 people is $22.64 as
shown below in the table:
Please note:
The average wage of n = 200 observations is $22.64
The standard deviation of this sample is $18.14
The standard error of the sample mean is $1.28 because $18.14/SQRT(200) = $1.28
The degrees of freedom (d.f.) in this case are 199 = 200 – 1
In the above example, the sample mean is an estimator of the unknown, true population mean
(in this case, the sample mean estimator gives an estimate of $22.64).
“An estimator is a recipe for obtaining an estimate of a population parameter. A simple analogy
explains the core idea: An estimator is like a recipe in a cook book; an estimate is like a cake
baked according to the recipe.” - Barreto & Howland, Introductory Econometrics
Chapter Summary
The expectation of the random variable is often called the mean or arithmetic mean.
Expected value is the weighted average of possible values. In the case of a discrete random
variable, expected value is given by:
E(X) = p₁x₁ + p₂x₂ + ... + p_n·x_n = Σ_{i=1}^{n} p_i·x_i
E(X) = Σ_x x·f(x)
If we have a complete data set, then the mean is a population mean which implies that the
mean is exactly the true (and only true) mean:
μ = (1/n) · Σ_{i=1}^{n} x_i = (1/n)·(x₁ + x₂ + ... + x_n)
Variance(Y): σ² = E[(Y − μ)²] = Σ_i p_i·(y_i − μ)²
σ² = E[(Y − μ)²] = E(Y²) − [E(Y)]²
σ² = var(Y) = E[(Y − μ)²] = (1/n) · Σ_{i=1}^{n} (y_i − μ)²
Sample variance: s² = 1/(n − 1) · Σ_{i=1}^{n} (y_i − ȳ)²
Population covariance: σ_XY = (1/n) · Σ_{i=1}^{n} (x_i − μ_X)(y_i − μ_Y)
Sample covariance: s_XY = 1/(n − 1) · Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ)
Correlation is the covariance between two variables divided by the product of their respective
standard deviations:
ρ_XY = σ_XY / (σ_X · σ_Y)
The correlation coefficient translates covariance into a unitless metric that runs from −1.0 to
+1.0.
With regard to the sum of correlated variables, the variance of correlated variables is:
σ²_(X+Y) = σ²_X + σ²_Y + 2ρ_XY·σ_X·σ_Y
With regard to the difference between correlated variables, the variance is:
σ²_(X−Y) = σ²_X + σ²_Y − 2ρ_XY·σ_X·σ_Y
With regard to the sum of variables scaled by constants (a) and (b), the variance of the sum
includes the covariance (X,Y):
var(aX + bY) = a²σ²_X + 2ab·σ_XY + b²σ²_Y
The kth moment about the mean (μ), or kth central moment, is given by:
μ_k = (1/n) · Σ_{i=1}^{n} (x_i − μ)^k or μ_k = E[(X − μ)^k]
If k=1, this refers to the first moment about zero: the mean.
If k=2, this refers to the second moment about the mean: the variance.
If k=3, then the third moment divided by the cube of the standard deviation returns the
skewness
If k=4, then the fourth moment about the mean divided by the square of the variance (or
the fourth power of the standard deviation) returns the kurtosis; a.k.a., tail density,
peakedness.
Skewness = μ₃/σ³ = E[(X − μ)³] / σ³
Kurtosis = μ₄/σ⁴ = E[(X − μ)⁴] / σ⁴
A normal distribution has relative skewness of zero and kurtosis of three (or excess
kurtosis of zero). Relative skewness > 0 indicates positive skewness. Kurtosis greater than
three (>3) indicates high peaks and fat tails (leptokurtic).
Coskewness and cokurtosis: The third and fourth standardized cross central moments are
referred to as coskewness and cokurtosis, respectively. In general, for (n) random variables, the
number of non-trivial cross-central moments of order (m) is given by:
K = (n + m − 1)! / [m!(n − 1)!] − n
If the sample is random (i.i.d.), the sample mean is the Best Linear Unbiased Estimator
(BLUE). The sample mean is consistent, and the most efficient among all linear unbiased
estimators of the population mean.
303.1. A random variable X has the probability density function f(x) = a·x, s.t. 0 ≤ x ≤ 12.
What is the mean of X?
304.1. Two assets, X and Y, produce only three joint outcomes: Prob[X = -3.0%, Y = -2.0%] =
30%, Prob[X = +1.0%, Y = +2.0%] = 50%, and Prob[X = +5.0%, Y = +3.0%] = 20%:
What is the correlation between X & Y? (Bonus question: if we removed the probabilities and
instead simply treated the three sets of returns as a small, [tiny actually!] historical sample,
would the sample correlation be different?)
a) 0.6330
b) 0.7044
c) 0.8175
d) 0.9286
305.1. A two-asset portfolio contains a long position in commodity (T) with volatility of 10.0%
and a long position in stock (S) with volatility of 30.0%. The assets are uncorrelated: rho(T,S) =
zero (0). What weight (0 to 100%) of the portfolio should be allocated to the commodity if the
goal is a minimum variance portfolio (in percentage terms, as no dollars are introduced)?
a) 62.5%
b) 75.0%
c) 83.3%
d) 90.0%
306.1. In credit risk (Part 2) of the FRM, a single-factor credit risk model is introduced. This
model gives a firm's asset return, r(i), by the following sum of two components:
r_i = a_i·F + √(1 − a_i²)·e_i, where F, e_i ~ N(0,1)
In this model, a(i) is a constant, while (F) and epsilon (e) are random variables. Specifically, (F)
and (e) are standard normal deviates with, by definition, mean of zero and variance of one ("unit
variance"). If the value of a(i) is 0.750 and the covariance[F,e(i)] is 0.30, which is nearest to
variance of the asset return, variance[r(i)]?
a) 0.15
b) 1.30
c) 1.47
d) 1.85
307.1. A bond has a default probability of 5.0%. Which is nearest, respectively, to the skew (S)
and kurtosis (K) of the distribution?
a) S = 0.0, K = 2.8
b) S = 0.8, K = -7.5
c) S = 4.1, K = 18.1
d) S = 18.9, K = 4.2
Answers:
303.1. C. 8.0
If this is a valid probability (pdf) then a*(1/2)*x^2 evaluated over [0,12] must equal one:
a*(1/2)*12^2 = 1.0, and a = 1/72.
Therefore, the pdf function is given by f(x) = x/72 over the domain of [0,12].
The mean = Integral of x*f(x) = x*(1/72)*x = Integral of x^2/72 over [0,12] = x^3/216 over [0,12] =
12^3/216 = 8.0
304.1. D. 0.9286
If we removed the probabilities and treated the returns as a (very) small historical sample, the
sample correlation is different at 0.945. There are two reasons:
1. The historical sample (by default) treats the observations as equally weighted; and,
2. A sample correlation divides the sample covariance by sample standard deviations,
where (n − 1) is used in the denominator instead of (n).
In this way the sample covariance is larger, ceteris paribus, than a population-type covariance,
and so are the sample standard deviations.
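Both correlations in this answer can be sketched in Python:

```python
import math

# Probability-weighted (population) correlation
outcomes = [(0.30, -3.0, -2.0), (0.50, 1.0, 2.0), (0.20, 5.0, 3.0)]
e_x = sum(p * x for p, x, y in outcomes)                              # 0.60
e_y = sum(p * y for p, x, y in outcomes)                              # 1.00
cov = sum(p * x * y for p, x, y in outcomes) - e_x * e_y              # 5.20
sd_x = math.sqrt(sum(p * x * x for p, x, y in outcomes) - e_x ** 2)   # 2.80
sd_y = math.sqrt(sum(p * y * y for p, x, y in outcomes) - e_y ** 2)   # 2.00
corr = cov / (sd_x * sd_y)                                            # ~0.9286

# Same three pairs treated as an equally weighted sample: (n - 1) divisors
xs, ys = [-3.0, 1.0, 5.0], [-2.0, 2.0, 3.0]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
s_cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (n - 1)
s_sdx = math.sqrt(sum((a - mx) ** 2 for a in xs) / (n - 1))
s_sdy = math.sqrt(sum((b - my) ** 2 for b in ys) / (n - 1))
s_corr = s_cov / (s_sdx * s_sdy)                                      # ~0.945
```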
305.1. D. 90.0%
We want the value of (w) that minimizes the portfolio variance, so we take the first derivative
with respect to w:
dPVARP/dw = d[w^2*0.01 + 0.09*(1 - 2*w + w^2)]/dw = d[w^2*0.01 + 0.09 - 0.18*w +
0.09*w^2)]/dw
= 0.02*w - 0.18 + 0.18*w = 0.20*w - 0.18. To find the local minimum, we set the first derivative
equal to zero, and solve for w: let 0 = 0.20*w - 0.18, such that w = 0.18/0.20 = 90.0%.
A portfolio with 90% weight in the commodity and 10% in the stock will have the lowest variance
at 0.0090, which is equal to standard deviation of SQRT(0.0090) = 9.486%;
i.e., lower than either of the asset volatilities. Of course, this optimal mix is variant to changes in
the correlation.
The first derivative can be taken of the generic two-asset portfolio variance, such that the
minimum variance weight is given by:
w* = (σ²_B − ρ·σ_A·σ_B) / (σ²_A + σ²_B − 2ρ·σ_A·σ_B)
306.1. B. 1.30
var(x+y) = var(x) + var(y) + 2*cov(x,y). In this case, x = a*F and y = sqrt(1-a^2)*e, such that:
var[a*F + sqrt(1-a^2)*e] = var(a*F) + var[sqrt(1-a^2)*e] + 2*cov[a*F, sqrt(1-a^2)*e]
= a^2*var(F) + (1-a^2)*var(e) + 2*cov[a*F, sqrt(1-a^2)*e], and since var(F) and var(e) = 1.0, this
is equal to:
= a^2*1.0 + (1-a^2)*1.0 + 2*cov[a*F, sqrt(1-a^2)*e]
= a^2 + 1-a^2 + 2*cov[a*F, sqrt(1-a^2)*e]
= 1.0 + 2*cov[a*F, sqrt(1-a^2)*e]
= 1.0 + 2*a*sqrt(1-a^2)*cov[F,e(i)]; and per cov(a*x,b*y) = a*b*cov(x,y):
= 1.0 + 2*0.75*SQRT(1-0.75^2)*0.30 = 1.2976
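The same arithmetic in a short Python sketch:

```python
import math

a = 0.750
cov_F_e = 0.30

# var(a*F + b*e) = a^2*var(F) + b^2*var(e) + 2*a*b*cov(F, e)
# where b = sqrt(1 - a^2) and var(F) = var(e) = 1
b = math.sqrt(1 - a ** 2)
var_r = a ** 2 + b ** 2 + 2 * a * b * cov_F_e   # ~1.2976
```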
Question 1:
Compute the mean and the median of the following series of returns:
Answer:
Question 2:
Compute the sample mean and the standard deviation of the following returns:
Answer:
Question 3:
Prove that Equation 3.2 is an unbiased estimator of the mean. That is, show that [ ̂ ] = .
Equation 3.2:
μ̂ = (1/n) · Σ_{i=1}^{n} r_i = (1/n)·(r₁ + r₂ + ... + r_n)
Answer:
μ̂ = (1/n) · Σ_{i=1}^{n} r_i = (1/n)·(r₁ + r₂ + ... + r_n)
E[μ̂] = E[(1/n) · Σ r_i] = (1/n) · Σ E[r_i] = (1/n) · n · μ = μ
Question 4:
What is the standard deviation of the estimator in Equation 3.2? Assume the various data points
are i.i.d.
Equation 3.2:
μ̂ = (1/n) · Σ_{i=1}^{n} r_i = (1/n)·(r₁ + r₂ + ... + r_n)
Answer:
Using the results of question 3, we first calculate the variance of the estimator of the mean:
E[(μ̂ − μ)²] = E[((1/n) · Σ_i r_i − μ)²] = E[((1/n) · Σ_i (r_i − μ))²]
= (1/n²) · E[(Σ_i (r_i − μ))²]
= (1/n²) · E[Σ_i (r_i − μ)² + Σ_{i≠j} (r_i − μ)(r_j − μ)]
= (1/n²) · [Σ_i E[(r_i − μ)²] + Σ_{i≠j} E[(r_i − μ)(r_j − μ)]]
= (1/n²) · [n·σ² + 0]
E[(μ̂ − μ)²] = σ²/n
where σ is the standard deviation of r. In the second to last line, we rely on the fact that,
because the data points are i.i.d., the covariance between different data points is zero. We
obtain the final answer by taking the square root of the variance of the estimator:
σ_μ̂ = √(σ²/n) = σ/√n
Question 5
Answer:
Question 6
Calculate the population mean, standard deviation, and skewness of each of the following two
series:
Answer:
Question 7
Calculate the population mean, standard deviation, and kurtosis for each of the following two
series:
Answer:
Question 8
Given the probability density function f(x) = x/18 for 0 ≤ x ≤ 6, find the mean and variance of X.
Answer:
The mean, μ, is:
μ = ∫₀⁶ x·(x/18) dx = [x³/(3·18)]₀⁶ = 6³/(3·18) − 0 = 216/54 = 4
The variance is then:
σ² = ∫₀⁶ (x − 4)²·(x/18) dx = (1/18) · ∫₀⁶ (x³ − 8x² + 16x) dx
= (1/18) · [x⁴/4 − (8/3)x³ + 8x²]₀⁶ = (36/18)·(9 − 16 + 8) = 2(1) = 2
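The two integrals can be checked numerically with a simple midpoint Riemann sum in Python (a sketch, not a formal proof):

```python
def integrate(g, lo, hi, steps=100_000):
    """Midpoint Riemann sum approximation of the definite integral."""
    dx = (hi - lo) / steps
    return sum(g(lo + (i + 0.5) * dx) for i in range(steps)) * dx

f = lambda x: x / 18.0                                      # the pdf
total = integrate(f, 0.0, 6.0)                              # ~1.0 (valid pdf)
mean = integrate(lambda x: x * f(x), 0.0, 6.0)              # ~4.0
var = integrate(lambda x: (x - 4.0) ** 2 * f(x), 0.0, 6.0)  # ~2.0
```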
Question 9
Show that the following estimator of the variance is unbiased:
σ̂² = 1/(n − 1) · Σ_{i=1}^{n} (x_i − μ̂)²
That is, show that:
E[σ̂²] = σ²
Answer:
σ̂² = 1/(n − 1) · Σ_{i=1}^{n} (x_i − μ̂)² = 1/(n − 1) · [(n − 1)/n · Σ_i x_i² − (1/n) · Σ_{i≠j} x_i·x_j]
= (1/n) · Σ_i x_i² − 1/(n(n − 1)) · Σ_{i≠j} x_i·x_j
Assuming that all the different values of X are uncorrelated with each other, we can use the
following two relationships:
E[x_i²] = μ² + σ²
Cov[x_i, x_j] = E[x_i·x_j] − μ² = 0 ∀ i ≠ j
Then:
E[σ̂²] = (1/n) · n·(μ² + σ²) − 1/(n(n − 1)) · n(n − 1)·μ² = σ²
Question 10
Given two random variables, X_A and X_B, with corresponding means μ_A and μ_B and standard
deviations σ_A and σ_B, prove that the variance of X_A plus X_B is:
Var[X_A + X_B] = σ²_A + σ²_B + 2ρ_AB·σ_A·σ_B
Answer:
First we note that the expected value of X_A plus X_B is just the sum of the means:
E[X_A + X_B] = E[X_A] + E[X_B] = μ_A + μ_B
Var[X_A + X_B] = E[(X_A + X_B − E[X_A + X_B])²] = E[((X_A − μ_A) + (X_B − μ_B))²]
Var[X_A + X_B] = E[(X_A − μ_A)² + (X_B − μ_B)² + 2(X_A − μ_A)(X_B − μ_B)]
Var[X_A + X_B] = E[(X_A − μ_A)²] + E[(X_B − μ_B)²] + 2·E[(X_A − μ_A)(X_B − μ_B)]
Var[X_A + X_B] = σ²_A + σ²_B + 2·Cov[X_A, X_B]
Var[X_A + X_B] = σ²_A + σ²_B + 2ρ_AB·σ_A·σ_B
Question 11
A $100 notional, zero coupon bond has one year to expiry. The probability of default is 10%. In
the event of default, assume that the recovery rate is 40%. The continuously compounded
discount rate is 5%. What is the present value of this bond?
Answer:
If the bond does not default, you will receive $100. If the bond does default, you will receive
40% × $100 = $40. The future value, i.e., the expected value of the bond at the end of the year,
is then $94:
E[V] = 90% × $100 + 10% × $40 = $94
Discounting at the continuously compounded rate of 5%, the present value is
$94 × e^(−0.05) ≈ $89.42.
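The present value step in a short Python sketch:

```python
import math

notional, p_default, recovery, r = 100.0, 0.10, 0.40, 0.05

# Expected year-end value: 90% x $100 + 10% x $40 = $94
fv = (1 - p_default) * notional + p_default * recovery * notional

# Discount one year at the continuously compounded rate
pv = fv * math.exp(-r)    # ~ $89.42
```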
Describe the central limit theorem and the implications it has when combining i.i.d.
random variables.
Describe a mixture distribution and explain the creation and characteristics of mixture
distributions.
If the random variable, X, is discrete, then the uniform distribution is given by the following
probability mass function (pmf):
f(x) = 1/n
This is an extremely simple distribution. Common examples of discrete uniform distributions are:
A coin, where n=2, such that the probability: P[heads] = 1/2 and P[tails] = 1/2; or
A six-sided die, where for example, probability of rolling a one is: P[rolling a one] = 1/6
If the random variable, X, is continuous, the uniform distribution is given by the following
probability density function (pdf):
f(x) = 1/(b₂ − b₁) for b₁ ≤ x ≤ b₂
f(x) = 0 for x < b₁ or x > b₂
Using this pdf, the mean, μ, is calculated as the average of the start and end values of the
distribution. Similarly, the variance, σ², is calculated as shown below:
μ = (1/2)·(b₁ + b₂)
σ² = (1/12)·(b₂ − b₁)²
The uniform distribution is characterized by the following cumulative distribution function (CDF):
P[X ≤ c] = (c − b₁) / (b₂ − b₁)
Bernoulli distribution
A random variable X is called Bernoulli distributed with parameter (p) if it has only two
possible outcomes, often encoded as 1 (“success” or “survival”) or 0 (“failure” or “default”), and
if the probability for realizing “1” equals p and the probability for “0” equals 1 – p. The classic
example for a Bernoulli-distributed random variable is the default event of a company.
A Bernoulli variable is discrete and has two possible outcomes:
X = 1 with probability p, or X = 0 with probability (1 − p)
Binomial distribution
A binomial distributed random variable is the sum of (n) independent and identically distributed
(i.i.d.) Bernoulli-distributed random variables. The probability of observing (k) successes is:
P(X = k) = C(n, k) · p^k · (1 − p)^(n−k), where C(n, k) = n! / [k!(n − k)!]
The mean of this random variable is n·p and the variance of a binomial distribution is n·p·(1 − p).
The below exhibit shows binomial distribution with p = 0.10, for n = 10, 50, and 100.
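The binomial pmf and its moments can be sketched in Python:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for the sum of n i.i.d. Bernoulli(p) variables."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# With n = 10 and p = 0.10 (as in the exhibit):
p_one = binomial_pmf(1, 10, 0.10)   # 10 * 0.1 * 0.9^9 ~ 0.3874
mean = 10 * 0.10                    # n*p = 1.0
variance = 10 * 0.10 * (1 - 0.10)   # n*p*(1-p) = 0.9
```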
Poisson distribution
In the Poisson distribution, the random number of events that occur during an interval of time,
(e.g., losses/ year, failures/ day) is given by:
P(N = k) = (λ^k / k!) · e^(−λ)
If the rate at which events occur over time is constant, and the probability of any one event
occurring is independent of all other events, then the events follow a Poisson process, where t
is the amount of time elapsed (i.e., the expected number of events before time t is equal to λt):
P(N = k) = ((λt)^k / k!) · e^(−λt)
In Poisson, lambda is both the expected value (the mean) and the variance!
The exhibit below represents Poisson distributions for λ = 2, 4, and 10.
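A Poisson pmf sketch in Python, confirming that the mean and the variance both equal lambda:

```python
import math

def poisson_pmf(k, lam):
    """P(N = k) when events arrive at an average rate lambda per interval."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

p_two = poisson_pmf(2, 2.0)    # 2^2 * e^-2 / 2! ~ 0.2707

# Mean and variance (truncating the infinite sum; the tail is negligible)
mean = sum(k * poisson_pmf(k, 2.0) for k in range(60))
var = sum((k - mean) ** 2 * poisson_pmf(k, 2.0) for k in range(60))
```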
Normal distribution
The normal or Gaussian distribution is often referred to as the bell curve because of the shape
of its probability density function. Characteristics of the normal distribution include:
The middle of the distribution, mu (µ), is the mean (and median). This first moment is
also called the “location”.
Standard deviation and variance are measures of dispersion (a.k.a., shape). Variance is
the second-moment; typically, variance is denoted by sigma-squared such that standard
deviation is sigma.
The distribution is symmetric around µ. In other words, the normal has skewness = 0
The normal has kurtosis = 3 or “excess kurtosis” = 0
For a random variable X, the probability density function for the normal distribution is:
f(x) = 1/(σ√(2π)) · exp[ −(1/2) · ((x − µ)/σ)² ]
Conventionally, this is written as X is normally distributed with a mean of µ and variance of σ²:
X ~ N(µ, σ²)
The normal distribution is commonplace for at least three (or four) reasons:
The central limit theorem (CLT) says that the sampling distribution of sample means tends
to be normal (i.e., converges toward a normally shaped distribution) regardless of the
shape of the underlying distribution; this explains much of the “popularity” of the normal
distribution.
The normal is economical (elegant) because it only requires two parameters (mean
and variance). The standard normal is even more economical: it requires no
parameters.
The normal is tractable: it is easy to manipulate (especially in regard to closed-form
equations like the Black-Scholes)
Parsimony: It requires (or is fully described by) only two parameters: mean and
variance
It is common to retrieve an historical dataset such as a series of monthly returns and compute
the mean and standard deviation of the series. In some cases, the analyst will stop at that point,
having determined the first and second moments of the data.
Oftentimes, the user is implicitly “imposing normality” by assuming the data is normally
distributed. For example, the user might multiply the standard deviation of the dataset by 1.645
or 2.33 (i.e., normal distribution deviates) in order to estimate a value-at-risk. But notice what
happens in this case: without a test (or a QQ-plot, for example) the analyst is merely assuming
normality because the normal distribution is conveniently summarized by only the first two
moments! Many other non-normal distributions also have first (aka, location) and second
(aka, scale or shape) moments.
In this way, it is not uncommon to see the normal distribution used merely for the sake of
convenience: when we only have the first two distributional moments, the normal is
implied perhaps merely because they are the only moments that have been computed.
A normal distribution is fully specified by two parameters, mean and variance (or standard
deviation). We can transform a normal into a unit or standardized variable:
Standard normal has mean = 0, and variance = 1
No parameters required!
This unit or standardized variable, Z = (X − µ)/σ, is normally distributed with zero mean and variance of
one. Its standard deviation is also one (variance = 1.0 and standard deviation = 1.0). This is
written as:
Z ~ N(0,1)
Key locations on the normal distribution are noted below. In the FRM curriculum, the choice of
one-tailed 5% significance and 1% significance (i.e., 95% and 99% confidence) is
common, so please pay particular attention to the yellow highlights:
Memorize the two common critical values: 1.65 and 2.33. These correspond to
confidence levels, respectively, of 95% and 99% for a one-tailed test. For VAR, the one-
tailed test is relevant because we are concerned only about losses (left-tail) not gains (right-tail).
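These one-tailed deviates can be recovered from the inverse CDF (quantile function) of the standard normal; a sketch using Python's statistics module, with a hypothetical 2.0% daily volatility for illustration:

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: N(0, 1)

# One-tailed critical values (inverse CDF, i.e., quantile function)
print(round(z.inv_cdf(0.95), 3))   # 1.645 for 95% confidence
print(round(z.inv_cdf(0.99), 3))   # 2.326 for 99% confidence

# Hypothetical example: 95% normal VaR with 2.0% daily volatility, zero mean
daily_var_95 = z.inv_cdf(0.95) * 0.02
```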
Lognormal
The lognormal is common in finance: if an asset’s continuously compounded return (r) is normally
distributed, the future asset price level (or ratio of prices; i.e., the wealth ratio) is
lognormal. Expressed in reverse, if a variable is lognormal, its natural log is normal. Here is an
exhibit of the lognormal distribution for µ = 10 at various levels of σ (0.25, 0.5 and 1).
The lognormal distribution is extremely common in finance because it is often the distribution
assumed for asset prices (e.g., stock prices). Specifically, it is common to assume that log
(i.e., continuously compounded) asset returns are normally distributed such that, by
definition, asset prices have a lognormal distribution.
Miller: “If a variable has a lognormal distribution, then the log of that variable has a normal
distribution. So, if log returns are assumed to be normally distributed, then one plus the
standard return will be lognormally distributed.
Unlike the normal distribution, which ranges from negative infinity to positive infinity, the
lognormal distribution is undefined, or zero, for negative values. Given an asset with a standard
return, R, if we model (1 +R) using the lognormal distribution, then R will have a minimum value
of –100%. This feature, which we associate with limited liability, is common to most financial
assets. Using the lognormal distribution provides an easy way to ensure that we avoid returns
less than –100%.”
Chi-squared distribution
The chi-squared distribution is the distribution of the sum of the squares of k independent standard
normal random variables. The variable k is referred to as the degrees of freedom. The below exhibit
shows the probability density functions for chi-squared distributions with different values of
k (1, 2 and 3).
Using a chi-square distribution, we can compare an observed sample variance to a hypothetical
population variance: the test statistic (n − 1)s²/σ² has a chi-square distribution with (n − 1) d.f.
Example (Google’s stock return variance): Google’s sample variance over 30 days is
0.0263%. We can test the hypothesis that the population variance (Google’s “true” variance) is
0.02%. The chi-square variable = (30 − 1) × 0.0263%/0.02% = 38.14.
With 29 degrees of freedom (d.f.), 38.14 corresponds to a p-value of about 11.93% (i.e., to the left
of 0.10 on the lookup table). Therefore, we could reject the null with only 88% confidence; i.e., at
the usual 95% confidence level we fail to reject the hypothesis that the true variance is 0.02%.
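The chi-square statistic in this example is just (n − 1) multiplied by the ratio of sample to hypothesized variance; a sketch:

```python
n = 30                   # sample size (30 days)
sample_var = 0.000263    # sample variance of 0.0263%
hypo_var = 0.000200      # hypothesized "true" variance of 0.02%

chi2 = (n - 1) * sample_var / hypo_var
print(round(chi2, 2))    # ~38.14, compared against a chi-square with 29 d.f.
```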
The Student’s t distribution (t distribution) is among the most commonly used distributions. As
the degrees of freedom (d.f.) increase, the t-distribution converges toward the normal distribution.
It is similar to the normal, except that it exhibits slightly heavier tails (the lower the d.f., the heavier
the tails). The below exhibit shows the basic shape of the Student’s t distribution and how it
changes with k (specifically the shape of its tails).
t = (x̄ − µ) / (s / √n)
Example: Google’s average periodic return over a ten-day sample period was
+0.02% with a sample standard deviation of 1.54%. Here are the statistics:
The sample mean is a random variable. If we know the population variance, we assume the
sample mean is normally distributed. But if we do not know the population variance (typically the
case!), the sample mean is a random variable following a student’s t distribution. In the
above example, we can use this to construct a confidence (random) interval:
x̄ ± t · s/√n
We need the critical (lookup) t value. The critical t value is a function of the degrees of freedom and the desired significance level:
The critical-t is just a lookup (reference to) the student's t distribution as opposed to a computed
t-statistic, aka t-ratio. In this way, a critical t is an inverse CDF (quantile function) just like, for a
normal distribution, the "critical one-tailed value" at 1% is -2.33 and at 5% is -1.645. In this case
we want the critical t for (n-1) degrees of freedom and two-tailed 5% significance (= one tailed
2.5%). We can find 2.262 on the student's t lookup table where column = 2-tail 0.05 and d.f. = 9.
In Excel, 2.262 = T.INV.2T(5%, 9). The 95% confidence interval can then be computed. The upper limit is given by:
0.02% + 2.262 × 1.54%/√10 = +1.12%
And the lower limit is given by:
0.02% − 2.262 × 1.54%/√10 = −1.08%
Please make sure you can take a sample
standard deviation, compute the critical t
value and construct the confidence
interval.
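The construction above can be sketched in Python; the critical t of 2.262 is taken from the lookup table rather than computed:

```python
from math import sqrt

mean, s, n = 0.0002, 0.0154, 10   # +0.02% mean, 1.54% sample std dev, 10 days
t_crit = 2.262                    # lookup: two-tailed 5% significance, d.f. = 9

half_width = t_crit * s / sqrt(n)
lower, upper = mean - half_width, mean + half_width
print(round(lower, 4), round(upper, 4))   # ~(-0.0108, 0.0112)
```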
Both the normal (Z) and student’s t (t) distribution characterize the sampling distribution of the
sample mean. The difference is that the normal is used when we know the population variance;
the student’s t is used when we must rely on the sample variance. In practice, we don’t know
the population variance, so the student’s t is typically appropriate.
z = (x̄ − µ) / (σ/√n)          t = (x̄ − µ) / (s/√n)
F-Distribution
The F distribution is also called the variance ratio distribution (it may be helpful to think of it as
the variance ratio!). The F ratio is the ratio of sample variances, with the greater sample
variance in the numerator:
Properties of F distribution:
Example: Based on two 10-day samples, we calculated the sample variance of Google and
Yahoo. Google’s variance was 0.0237% and Yahoo’s was 0.0084%. Find the F ratio.
GOOG YHOO
=VAR() 0.0237% 0.0084%
=COUNT() 10 10
F ratio 2.82
Confidence 90%
Significance 10%
=FINV() 2.44
The F ratio, therefore, is 2.82 (divide higher variance by lower variance; the F ratio
must be greater than, or equal to, 1.0).
At 10% significance, with (10-1) and (10-1) degrees of freedom, the critical F value is
2.44. Because our F ratio of 2.82 is greater than (>) 2.44, we reject the null (i.e., that the
population variances are the same).
We conclude the population variances are different.
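A sketch of the F-ratio test above; the critical value 2.44 is a lookup, not computed here:

```python
var_goog = 0.000237   # GOOG sample variance of 0.0237%
var_yhoo = 0.000084   # YHOO sample variance of 0.0084%

# The greater sample variance goes in the numerator, so the F ratio >= 1.0
f_ratio = max(var_goog, var_yhoo) / min(var_goog, var_yhoo)
print(round(f_ratio, 2))   # ~2.82

critical_f = 2.44          # lookup: 10% significance with (9, 9) d.f.
reject_null = f_ratio > critical_f   # True: conclude the variances differ
```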
Triangular Distribution
The triangular distribution is a distribution whose PDF is a triangle, say with a minimum of a, a
maximum of b, and a mode of c. Like the uniform distribution, it has a finite range, but being
only slightly more complex than a uniform distribution, it has more flexibility. The triangular
distribution has a unique mode, and can be symmetric, positively skewed, or negatively skewed.
Its PDF is described by the following two-part function:
f(x) = 2(x − a) / [(b − a)(c − a)]   for a ≤ x ≤ c
f(x) = 2(b − x) / [(b − a)(b − c)]   for c ≤ x ≤ b
The exhibit shows a triangular distribution where a, b, and c are 0.0, 1.0, and 0.8, respectively.
[Exhibit: triangular distribution PDF with three parameters, a = 0, b = 1 and c (mode) = 0.8]
The PDF is zero at both a and b, and the value of f(x) reaches a maximum, 2/(b − a), at c.
The mean and variance are given by:
mean = (a + b + c) / 3
variance = (a² + b² + c² − ab − ac − bc) / 18
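Plugging the exhibit's parameters (a = 0, b = 1, c = 0.8) into these formulas; a sketch with our own helper name:

```python
def triangular_mean_var(a: float, b: float, c: float):
    """Mean and variance of a triangular distribution: min a, max b, mode c."""
    mean = (a + b + c) / 3
    var = (a*a + b*b + c*c - a*b - a*c - b*c) / 18
    return mean, var

mean, var = triangular_mean_var(0.0, 1.0, 0.8)
print(round(mean, 2), round(var, 4))   # 0.6 and ~0.0467
```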
Beta distribution
The beta distribution has two parameters: alpha (“center”) and beta (“shape”). The beta
distribution is very flexible, and popular for modeling default and recovery rates.
Example: The beta distribution is often used to model recovery rates. Here are two examples:
one beta distribution to model a junior class of debt (i.e., lower mean recovery) and another for
a senior class of debt (i.e., lower loss given default):
Junior Senior
alpha (center) 2.0 4.0
beta (shape) 6.0 3.3
Mean recovery 25% 55%
[Exhibit: beta distributions for recovery/LGD, Senior versus Junior, with recovery (residual value) on the horizontal axis from 0% to 98%]
Exponential
The exponential distribution is popular in queuing theory. It is used to model the time we have
to wait until a certain event takes place.
[Exhibit: exponential distributions for parameter values 0.5, 1 and 2]
According to the text, examples include “the time until the next client enters the store, the time
until a certain company defaults or the time until some machine has a defect.” The exponential
density is nonzero only for positive values:
f(x) = λ · e^(−λx), where λ = 1/β and x > 0
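A sketch of the exponential waiting-time probability; the parameterization (mean waiting time β, so λ = 1/β) follows the formula above, and the time-to-default numbers are hypothetical:

```python
from math import exp

def exponential_cdf(x: float, beta: float) -> float:
    """P(waiting time <= x) with mean waiting time beta (lambda = 1/beta)."""
    return 1.0 - exp(-x / beta)

# Hypothetical example: mean time-to-default of 2 years;
# probability the company defaults within 1 year:
print(round(exponential_cdf(1.0, 2.0), 4))   # ~0.3935
```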
Weibull
Weibull is a generalized exponential distribution; i.e., the exponential is a special case of the
Weibull where the alpha parameter equals 1.0.
F(x) = 1 − e^(−(x/β)^α), x > 0
[Exhibit: Weibull distributions for (α = 0.5, β = 1), (α = 2, β = 1) and (α = 2, β = 2)]
The main difference between the exponential distribution and the Weibull is that, under the
Weibull, the default intensity depends upon the point in time t under consideration. This allows
us to model the aging effect or teething troubles:
For α > 1—also called the “light-tailed” case—the default intensity is monotonically
increasing with increasing time, which is useful for modeling the “aging effect” as it
happens for machines: The default intensity of a 20-year old machine is higher than the
one of a 2-year old machine.
For α < 1—the “heavy-tailed” case—the default intensity decreases with increasing
time. That means we have the effect of “teething troubles,” a figurative explanation for
the effect that after some trouble at the beginning things work well, as it is known from
new cars. The credit spread on noninvestment-grade corporate bonds provides a good
example: Credit spreads usually decline with maturity. The credit spread reflects the
default intensity and, thus, we have the effect of “teething troubles.” If the company
survives the next two years, it will survive for a longer time as well, which explains the
decreasing credit spread.
For α = 1, the Weibull distribution reduces to an exponential distribution with parameter β.
Gamma distribution
The family of Gamma distributions forms a two-parameter probability distribution family with pdf:
f(x) = x^(α−1) · e^(−x/β) / (β^α · Γ(α)), x > 0
[Exhibit: Gamma distributions for (α = 1, β = 1), (α = 2, β = 0.5) and (α = 4, β = 0.25)]
Logistic
[Exhibit: logistic distributions for (α = 0, β = 1), (α = 2, β = 1) and (α = 0, β = 3), compared to the standard normal N(0,1)]
Measures of central tendency and dispersion (variance, volatility) are impacted more by
observations near the mean than outliers. The problem is that, typically, we are concerned with
outliers; we want to size the likelihood and magnitude of low frequency, high severity (LFHS)
events. Extreme value theory (EVT) solves this problem by fitting a separate distribution to
the extreme tail loss. EVT uses only the tail of the distribution, not the entire dataset.
In applying extreme value theory (EVT), the two general approaches are:
Block maxima (BM): The classic approach
Peaks over threshold (POT): The modern approach that is often preferred.
Block maxima
The dataset is parsed into (m) identical, consecutive and non-overlapping periods called blocks.
The length of the block should be greater than the periodicity; e.g., if the returns are daily,
blocks should be weekly or more. Block maxima partitions the set into time-based intervals. It
requires that observations be independently and identically distributed (i.i.d.).
Generalized extreme value (GEV) fits block maxima. The Generalized extreme value (GEV)
distribution is given by:
H(y) = exp[ −(1 + ξy)^(−1/ξ) ]   if ξ ≠ 0
H(y) = exp( −e^(−y) )            if ξ = 0
The (ξ) parameter is the “tail index;” it represents the fatness of the tails: a higher tail index
corresponds to fatter tails.
Per the (unassigned) Jorion reading on EVT, the key thing to know here is that (1) among
the three classes of GEV distributions (Gumbel, Frechet, and Weibull), we only care
about the Frechet because it fits to fat-tailed distributions, and (2) the shape parameter
determines the fatness of the tails (higher shape → fatter tails)
Peaks over threshold (POT) collects the dataset of losses above (or in excess of) some
threshold.
The cumulative distribution function here refers to the probability that the “excess loss” (i.e., the
loss, X, in excess of the threshold, u, is less than some value, y, conditional on the loss
exceeding the threshold):
F_u(y) = P(X − u ≤ y | X > u)
The generalized Pareto distribution (GPD) fits the excess losses:
G_ξ,β(x) = 1 − (1 + ξx/β)^(−1/ξ)   if ξ ≠ 0
G_ξ,β(x) = 1 − exp(−x/β)           if ξ = 0
Block maxima is time-based (i.e., blocks of time), traditional, less sophisticated and more
restrictive in its assumptions (i.i.d.) while peaks over threshold (POT) is more modern,
has at least three variations (semi-parametric, unconditional parametric and conditional
parametric) and is more flexible.
EVT Highlights: Both GEV and GPD are parametric distributions used to model heavy-tails.
GEV (Block Maxima)
Has three parameters: location, scale and tail index
If tail > 0: Frechet
GPD (peaks over threshold, POT)
Has two parameters: scale and tail (or shape)
But must select threshold (u)
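As a sketch (our own helper, using the standard GPD form), the role of the tail parameter ξ can be illustrated: ξ = 0 recovers an exponential tail, while ξ > 0 produces a fatter, Pareto-type tail:

```python
from math import exp

def gpd_cdf(x: float, xi: float, beta: float) -> float:
    """Generalized Pareto CDF for excess losses x over the chosen threshold u.
    xi is the tail (shape) parameter; beta is the scale parameter."""
    if xi == 0.0:
        return 1.0 - exp(-x / beta)
    return 1.0 - (1.0 + xi * x / beta) ** (-1.0 / xi)

print(round(gpd_cdf(1.0, 0.0, 1.0), 4))   # exponential tail: ~0.6321
print(round(gpd_cdf(1.0, 0.5, 1.0), 4))   # lower: fatter tail keeps more mass beyond 1
```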
Describe the central limit theorem and the implications it has when
combining i.i.d. random variables.
In brief:
Law of large numbers: Under general conditions, the sample mean will be near the
population mean.
Central limit theorem (CLT): As the sample size increases, regardless of the
underlying distribution, the sampling distribution approximates (tends toward) normal.
We assume a population with a known mean and finite variance, but not necessarily a normal
distribution (we may not know the distribution!). Random samples of size (n) are then
drawn from the population. The expected value of each sample mean is the population’s
mean. Further, the variance of each sample mean is equal to the population’s variance divided
by n (note: this is equivalent to saying the standard deviation of each sample mean is equal
to the population’s standard deviation divided by the square root of n).
The central limit theorem says that this random variable (i.e., the mean of a sample of size n,
drawn from the population) is itself approximately normally distributed, regardless of the shape of
the underlying population. Given a population described by any probability distribution having
mean (µ) and finite variance (σ²), the distribution of the sample mean computed from samples
(where each sample equals size n) will be approximately normal. Generally, if the size of the
sample is at least 30 (n ≥ 30), then we can assume the sample mean is approximately normal!
Each sample has a sample mean. There are many sample means. The sample means
have variation: a sampling distribution. The central limit theorem (CLT) says the
sampling distribution of sample means is asymptotically normal.
We assume a population with a known mean and finite variance, but not necessarily a
normal distribution.
Random samples (size n) drawn from the population.
The expected value of each random variable is the population mean
The distribution of the sample mean computed from samples (where each sample
equals size n) will be approximately (asymptotically) normal.
The variance of each random variable is equal to population variance divided by n
(equivalently, the standard deviation is equal to the population standard deviation
divided by the square root of n).
When we draw from (or take) a sample, the sample is a random variable with its own
characteristics. The “standard deviation of a sampling distribution” is called the standard error.
The mean of the sample or the sample mean is a random variable defined by:
x̄ = (X₁ + X₂ + … + Xₙ) / n
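A small simulation illustrates the theorem: sample means drawn from a skewed (exponential) population cluster around the population mean with standard error σ/√n. The setup (seed, sample sizes) is our own choice:

```python
import random
from statistics import mean, stdev

random.seed(42)

# Skewed population: exponential with mean 1.0 and standard deviation 1.0
def sample_mean(n: int) -> float:
    return mean(random.expovariate(1.0) for _ in range(n))

# Many sample means of size n = 30 form the sampling distribution
means = [sample_mean(30) for _ in range(2000)]
print(round(mean(means), 2))    # near the population mean of 1.0
print(round(stdev(means), 2))   # near the standard error 1/sqrt(30), ~0.18
```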
f(x) = Σᵢ wᵢ · fᵢ(x), where Σᵢ wᵢ = 1
where fi(x)’s are the component distributions, and wi’s are the mixing proportions or weights.
For example, consider a stock whose log returns follow a normal distribution with low volatility
90% of the time, and a normal distribution with high volatility 10% of the time. Most of the time
the stock just bounces along, but occasionally the stock’s behavior may be more extreme. In
Miller’s example, the mixture distribution is:
f(x) = 0.90 · f₁(x) + 0.10 · f₂(x), where f₁ is the low-volatility and f₂ the high-volatility normal density
According to Miller, “Mixture distributions are extremely flexible. In a sense they occupy a
realm between parametric distributions and non-parametric distributions. In a typical mixture
distribution, the component distributions are parametric but the weights are based on empirical
(non-parametric) data. Just as there is a trade-off between parametric distributions and non-
parametric distributions, there is a trade-off between using a low number and a high number of
component distributions. By adding more and more component distributions, we can
approximate any data set with increasing precision. At the same time, as we add more and
more component distributions, the conclusions that we can draw become less and less general
in nature.”
A mixture distribution is extremely flexible. If two normal distributions with the same mean but
different variances are mixed, they produce a mixture distribution with leptokurtosis (heavy
tails). More generally, mixtures are almost infinitely flexible.
So, just by adding two normal distributions together, we can develop a large number of
interesting distributions. For example, if we combine two normal distributions with the same
mean but different variances, we can get a symmetrical mixture distribution that displays excess
kurtosis.
By shifting the mean of one distribution, we can also create a distribution with positive or
negative skew. Finally, if we move the means far enough apart, the resulting mixture
distribution will be bimodal. The exhibit below shows that we have a PDF with two distinct
maxima.
Chapter Summary
A parametric distribution can be described by a mathematical function, for example, the
normal distribution. A nonparametric distribution cannot be summarized by a mathematical
formula; in its simplest form it is “just a collection of data.”
The (continuous) uniform distribution, for example, has PDF and CDF:
f(x) = 1/(b − a)   for a ≤ x ≤ b
f(x) = 0           for x < a or x > b
F(x) = (x − a)/(b − a)
A random variable X is called Bernoulli distributed with parameter (p) if it has only two
possible outcomes.
A binomial distributed random variable is the sum of (n) independent and identically distributed
(i.i.d.) Bernoulli-distributed random variables. The probability of observing (k) successes is given
by:
P(X = k) = C(n, k) · p^k · (1 − p)^(n − k), where C(n, k) = n! / [(n − k)! · k!]
The Poisson distribution depends upon only one parameter, lambda λ. The random number of
events that occur during an interval of time, (e.g., losses/ year, failures/ day) is given by:
P(X = k) = λ^k · e^(−λ) / k!
In Poisson, lambda is both the expected value (the mean) and the variance.
A normal distribution can be transformed into a unit or standardized variable that has mean =
0, and variance = 1 and requires no parameters.
Common examples:
The Bernoulli distribution is used to characterize default.
The binomial distribution is commonly used to characterize a portfolio of credits.
The normal distribution is used to test the sample average asset return and to model
equity returns for short horizons.
The Poisson distribution is used to model the time of default in credit risk models and
to calculate operational loss frequency.
If a variable has a lognormal distribution, then the log of that variable has a normal
distribution.
The exponential distribution is used to model the time we have to wait until a certain
event takes place.
Weibull is a generalized exponential distribution; i.e., the exponential is a special case of the
Weibull where the alpha parameter equals 1.0.
( )= 1− , >0
The family of Gamma distributions forms a two-parameter probability distribution family with
the density function (pdf) given by:
1 /
( )= , >0
Γ( )
The beta distribution has two parameters: alpha (“center”) and beta (“shape”). The beta
distribution is popular for modeling recovery rates.
Extreme value theory (EVT) fits a separate distribution to the extreme loss tail. EVT uses
only the tail of the distribution, not the entire dataset. In applying extreme value theory (EVT),
two general approaches are:
Block maxima (BM). The classic approach
Peaks over threshold (POT). The modern approach that is often preferred.
Both GEV and GPD are parametric distributions used to model heavy-tails.
Central limit theorem (CLT): As the sample size increases, regardless of the underlying
distribution, the sampling distribution approximates (tends toward) the normal. CLT says the
sampling distribution of sample means is asymptotically normal.
f(x) = Σᵢ wᵢ · fᵢ(x), where the fᵢ(·) are the component distributions
309.2. At the start of the year, a stock price is $100.00. A twelve-step binomial model describes
the stock price evolution such that each month the price will either jump up from
S(t) to S(t)*u with 60.0% probability or down to S(t)*d with 40.0% probability. The up jump (u) =
1.1 and the down jump (d) = 1/1.1; note these (u) and (d) parameters correspond to an annual
volatility of about 33% as exp[33%*SQRT(1/12)] ~= 1.10. At the end of the year, which is
nearest to the probability that the stock price will be exactly $121.00?
a) 0.33%
b) 3.49%
c) 12.25%
d) 22.70%
310.1. A large bond portfolio contains 100 obligors. The average default rate is 4.0%. Analyst
Joe assumes defaults follow a Poisson distribution but his colleague Mary assumes the defaults
instead follow a binomial distribution. If they each compute the probability of exactly four (4)
defaults, which is nearest to the difference between their computed probabilities?
a) 0.40%
b) 1.83%
c) 3.55%
d) 7.06%
311.1. George the analyst creates a model, displayed below, which generates two series of
random but correlated asset returns. Both asset prices begin at a price of $10.00 with a periodic
mean return of +1.0%. Series #1 has periodic volatility of 10.0% while Series #2 has periodic
volatility of 20.0%. The desired correlation of the simulated series is 0.80. Each series steps
according to a discrete version of geometric Brownian motion (GBM) where price(t+1) = price (t)
+ price(t)*(mean + volatility*standard random normal). Two standard random normals are
generated at each step, X(1) and X(2), but X(2) is transformed into correlated Y(1) with Y(1) =
rho*X(1) + SQRT(1 - rho^2)*X(2), such that Y(1) informs Series #2. The first five steps are
displayed below:
At the fourth step, when the Series #1 Price = $10.81, what is Y(1) and the Series #2 Price [at
Step 4], both of which cells are highlighted in orange above?
a) -0.27 and $9.08
b) +0.55 and $9.85
c) +0.99 and $11.33
d) +2.06 and $12.40
312.1. A random variable X has a density function that is a normal mixture with two independent
components: the first normal component has an expectation (mean) of 4.0 with variance of 16.0;
the second normal component has an expectation (mean) of 6.0 with variance of 9.0. The
probability weight on the first component is 0.30 such that the weight on the second component
is 0.70. What is the probability that X is less than zero; i.e., Prob [X<0]?
a) 0.015%
b) 1.333%
c) 6.352%
d) 12.487%
Answers:
309.2. D. 22.70%
There are 13 outcomes at the end of the 12-step binomial, with $100 as the outcome that must
correspond to six up jumps and six down jumps. Therefore, $121.0 must be the outcome due to
seven up jumps and five down jumps: $100*1.1^7*(1/1.1)^5 = $121.00
Such that we want the binomial probability given by:
Binomial Prob [X = 7 | n = 12, p = 60%] = 22.70%.
310.1. A. 0.40%
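Both answers can be verified directly; a sketch:

```python
from math import comb, exp, factorial

# 309.2: $121 requires 7 up jumps and 5 down jumps in 12 steps, p(up) = 0.60
p_121 = comb(12, 7) * 0.60**7 * 0.40**5
print(round(p_121, 4))   # ~0.2270

# 310.1: 100 obligors with a 4.0% default rate; P(exactly 4 defaults)
p_poisson = 4.0**4 * exp(-4.0) / factorial(4)    # lambda = 100 * 0.04 = 4
p_binomial = comb(100, 4) * 0.04**4 * 0.96**96
print(round(abs(p_binomial - p_poisson), 4))     # ~0.0042, nearest to 0.40%
```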
312.1. C. 6.352%
Because the normal mixture distribution function is a probability-weighted sum of its component
distribution functions, it is true that:
Prob(mixture)[X < 0] = 0.30*Prob(1st component)[X < 0] + 0.70*Prob(2nd component)[X < 0].
In regard to the 1st component, Z = (0-4)/sqrt(16) = -4/4 = -1.0.
In regard to the 2nd component, Z = (0-6)/sqrt(9) = -6/3 = -2.0. Such that:
Prob(mixture)[X<0] = 0.30*[Z < -1.0] + 0.70*Prob[Z < -2.0],
Prob(mixture)[X<0] = 0.30*15.87% + 0.70*2.28% = 6.352%.
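The same probability-weighted sum can be computed with Python's statistics module; a sketch:

```python
from statistics import NormalDist

comp1 = NormalDist(mu=4.0, sigma=4.0)   # first component: variance 16
comp2 = NormalDist(mu=6.0, sigma=3.0)   # second component: variance 9

prob = 0.30 * comp1.cdf(0.0) + 0.70 * comp2.cdf(0.0)
print(round(prob, 5))   # ~0.06352
```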
XYZ Corporation announces its earnings four times per year. Based on historical data, you
estimate that in any given quarter the probability that XYZ Corporation’s earnings will exceed
consensus estimates is 30%. Also, the probability of exceeding the consensus in any one
quarter is independent of the outcome in any other quarter. What is the probability that XYZ
Corporation will exceed estimates three times in a given year?
Answer:
The number of times XYZ Corporation exceeds consensus estimates follows a binomial distribution; therefore:
$$P[K = 3] = \binom{4}{3}\,0.30^3\,0.70^1 = 4 \times 0.027 \times 0.70 = 7.56\%$$
Question 2
The market risk group at your firm has developed a value at risk (VaR) model. In Chapter 7 we
examine VaR models more closely. In the meantime, assume the probability of an exceedance
event on any given day is 5%, and the probability of an exceedance event occurring on any
given day is independent of an exceedance event having occurred on any previous day. What is
the probability that there are two exceedances over 20 days?
Answer:
The number of exceedance events follows a binomial distribution; therefore:
$$P[K = 2] = \binom{20}{2}\,0.05^2\,0.95^{18} = 190 \times 0.0025 \times 0.3972 = 18.87\%$$
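Both of these binomial answers can be sketched with a small helper function (standard library only; the helper name is mine):

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

q1 = binom_pmf(3, 4, 0.30)   # three beats in four quarters, about 7.56%
q2 = binom_pmf(2, 20, 0.05)  # two exceedances in twenty days, about 18.87%
```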
Question 3
Assume the annual returns of Fund A are normally distributed with a mean and standard
deviation of 30%. The annual returns of Fund B are also normally distributed, but with a mean
and standard deviation of 40%. The returns of both funds are independent of each other. What
is the mean and standard deviation of the difference of the returns of the two funds, Fund B
minus Fund A? At the end of the year, Fund B has returned 80%, and Fund A has lost 12%.
How likely is it that Fund B outperforms Fund A by this much or more?
Answer:
Because the annual returns of both funds are normally distributed and independent, the
difference in their returns is also normally distributed:
$$R_B - R_A \sim N\left(\mu_B - \mu_A,\ \sigma_B^2 + \sigma_A^2\right)$$
The mean of this distribution is 10%, and the standard deviation is 50%. At the end of the year, the realized difference in returns is 80% − (−12%) = 92%. This is 82% above the mean, or 1.64 standard deviations. Using Excel or consulting the table of confidence levels in the chapter, we see that this is a rare event: the probability of a move of more than 1.64 standard deviations is only 5%.
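A minimal sketch of this calculation in Python (independence is what lets the variances add):

```python
from math import sqrt
from statistics import NormalDist

mu = 0.40 - 0.30                     # mean of B minus A
sigma = sqrt(0.40**2 + 0.30**2)      # 0.50; variances add under independence
z = ((0.80 - (-0.12)) - mu) / sigma  # 1.64 standard deviations above the mean
p = 1 - NormalDist().cdf(z)          # about 5%
```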
Question 4
The number of defaults per month in a large bond portfolio follows a Poisson process. On
average, there are two defaults per month. The number of defaults is independent from one
month to the next. What is the probability that there are five defaults over five months? Ten
defaults? Fifteen defaults?
Answer:
Over five months, the expected number of defaults is λ = 2 × 5 = 10; therefore:
$$P[K = 5] = \frac{10^5 e^{-10}}{5!} = 3.78\%$$
$$P[K = 10] = \frac{10^{10} e^{-10}}{10!} = 12.51\%$$
$$P[K = 15] = \frac{10^{15} e^{-10}}{15!} = 3.47\%$$
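The three Poisson probabilities can be reproduced directly (standard library only; the function name is mine):

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P[K = k] for a Poisson distribution with intensity lam."""
    return lam**k * exp(-lam) / factorial(k)

lam = 2 * 5  # two defaults per month over five months
probs = {k: poisson_pmf(k, lam) for k in (5, 10, 15)}
```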
Question 5
The annual returns of an emerging markets bond fund have a mean return of 10% and a
standard deviation of 15%. Your firm invests $200 million into the fund. What is the probability of
losing more than $18.4 million? Assume the returns are normally distributed, and ignore the
limited liability constraint (i.e., the impossibility of losing more than the initial $200 million
investment).
Answer:
If the returns of the fund are normally distributed with a mean of 10% and a standard deviation
of 15%, then the returns of $200 million invested in the fund are also normally distributed, but
with an expected return of $20 million and a standard deviation of $30 million. A loss of $18.4
million represents a –1.28 standard deviation move:
$$z = \frac{-\$18.4 - \$20}{\$30} = -1.28$$
Roughly 10% of the normal distribution lies below −1.28 standard deviations, so the probability of losing more than $18.4 million is approximately 10%.
Question 6
The annual returns of an emerging markets exchange-traded fund (ETF) have an expected
return of 20.60% and a standard deviation of 30.85%. You are asked to estimate the likelihood
of extreme return scenarios. Assume the returns are normally distributed. What is the probability
that returns are worse than −30%?
Answer:
$$z = \frac{-30\% - 20.60\%}{30.85\%} = -1.64$$
According to the table of confidence intervals, 5% of the normal distribution lies below –1.64
standard deviations. The probability of a return less than –30% is then 5%.
Question 7
For a uniform distribution with a lower bound $x_1$ and an upper bound $x_2$, prove that the formulas for calculating the mean and variance are:
$$\mu = \frac{1}{2}(x_1 + x_2)$$
$$\sigma^2 = \frac{1}{12}(x_2 - x_1)^2$$
Answer:
$$\mu = \int_{x_1}^{x_2} \frac{x}{x_2 - x_1}\,dx = \frac{1}{x_2 - x_1}\left[\frac{1}{2}x^2\right]_{x_1}^{x_2} = \frac{x_2^2 - x_1^2}{2(x_2 - x_1)}$$
$$\mu = \frac{(x_2 - x_1)(x_2 + x_1)}{2(x_2 - x_1)} = \frac{1}{2}(x_1 + x_2)$$
For the variance, expand the square inside the integral:
$$\int (x - \mu)^2\,dx = \int \left(x^2 - 2\mu x + \mu^2\right)dx = \frac{1}{3}x^3 - \mu x^2 + \mu^2 x$$
so that
$$\sigma^2 = \frac{1}{x_2 - x_1}\left[\frac{1}{3}x^3 - \mu x^2 + \mu^2 x\right]_{x_1}^{x_2} = \frac{1}{3}\,\frac{x_2^3 - x_1^3}{x_2 - x_1} - \mu(x_2 + x_1) + \mu^2$$
Using $x_2^3 - x_1^3 = (x_2 - x_1)\left(x_2^2 + x_1 x_2 + x_1^2\right)$ and $\mu = \frac{1}{2}(x_1 + x_2)$:
$$\sigma^2 = \frac{1}{3}\left(x_2^2 + x_1 x_2 + x_1^2\right) - \frac{1}{2}(x_1 + x_2)^2 + \frac{1}{4}(x_1 + x_2)^2 = \frac{1}{3}\left(x_2^2 + x_1 x_2 + x_1^2\right) - \frac{1}{4}(x_1 + x_2)^2$$
$$\sigma^2 = \frac{4\left(x_2^2 + x_1 x_2 + x_1^2\right) - 3(x_1 + x_2)^2}{12} = \frac{x_2^2 - 2x_1 x_2 + x_1^2}{12} = \frac{1}{12}(x_2 - x_1)^2$$
12
Question 8
Prove that the normal distribution is a proper probability distribution. That is, show that:
$$\int_{-\infty}^{+\infty} \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = 1$$
You may find the following result useful:
$$\int_{-\infty}^{+\infty} e^{-x^2}\,dx = \sqrt{\pi}$$
Answer:
Use the substitution $u = \frac{x-\mu}{\sigma\sqrt{2}}$, so that $dx = \sigma\sqrt{2}\,du$:
$$\int_{-\infty}^{+\infty} \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = \frac{1}{\sigma\sqrt{2\pi}}\,\sigma\sqrt{2}\int_{-\infty}^{+\infty} e^{-u^2}\,du = \frac{1}{\sqrt{\pi}}\,\sqrt{\pi} = 1$$
Question 9
Prove that the mean of the normal distribution, as specified in Equation 4.12, is µ. That is, show
that:
$$\int_{-\infty}^{+\infty} x\,\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = \mu$$
Answer:
Use the substitution $u = \frac{x-\mu}{\sigma\sqrt{2}}$, so that $x = \sigma\sqrt{2}\,u + \mu$ and $dx = \sigma\sqrt{2}\,du$:
$$\int_{-\infty}^{+\infty} x\,\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{+\infty}\left(\sigma\sqrt{2}\,u + \mu\right)e^{-u^2}\,du$$
$$= \frac{\sigma\sqrt{2}}{\sqrt{\pi}}\int_{-\infty}^{+\infty} u\,e^{-u^2}\,du + \mu\,\frac{1}{\sqrt{\pi}}\int_{-\infty}^{+\infty} e^{-u^2}\,du$$
The first integral vanishes, $\int_{-\infty}^{+\infty} u\,e^{-u^2}\,du = \left[-\tfrac{1}{2}e^{-u^2}\right]_{-\infty}^{+\infty} = 0$, and by the previous question the second term equals $\mu$. Therefore:
$$\int_{-\infty}^{+\infty} x\,\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = \mu$$
Question 10
Prove that the variance of a normal distribution, as specified in Equation 4.12, is σ2. You may
find the following result useful:
$$\int_{-\infty}^{+\infty} u^2 e^{-u^2}\,du = \frac{1}{2}\sqrt{\pi}$$
Answer:
Use the substitution $u = \frac{x-\mu}{\sigma\sqrt{2}}$, so that $(x-\mu)^2 = 2\sigma^2 u^2$ and $dx = \sigma\sqrt{2}\,du$:
$$E[(X-\mu)^2] = \int_{-\infty}^{+\infty} (x-\mu)^2\,\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = \frac{2\sigma^2}{\sqrt{\pi}}\int_{-\infty}^{+\infty} u^2 e^{-u^2}\,du$$
$$E[(X-\mu)^2] = \frac{2\sigma^2}{\sqrt{\pi}}\cdot\frac{1}{2}\sqrt{\pi} = \sigma^2$$
Question 11
Given the random variables
$$X_A = \rho X_3 + \sqrt{1-\rho^2}\,X_1$$
$$X_B = \rho X_3 + \sqrt{1-\rho^2}\,X_2$$
where $X_1$, $X_2$, and $X_3$ are uncorrelated variables, each with mean zero and standard deviation one, what is the correlation between $X_A$ and $X_B$?
Answer:
$$E[X_A] = E\left[\rho X_3 + \sqrt{1-\rho^2}\,X_1\right] = \rho E[X_3] + \sqrt{1-\rho^2}\,E[X_1] = \rho \cdot 0 + \sqrt{1-\rho^2} \cdot 0 = 0$$
Similarly, the mean of $X_B$ is zero. Next, we want to calculate the variance. In order to do that, it will be useful to know two relationships. First we rearrange the equation for variance, Equation 3.20, to get:
$$E[X_i^2] = \sigma_i^2 + E[X_i]^2 = 1 + 0 = 1 \quad \text{for } i = 1, 2, 3$$
Similarly, we can rearrange our equation for covariance, Equation 3.26, to get:
$$E[X_i X_j] = \sigma_{i,j} + E[X_i]\,E[X_j] = 0 + 0 \cdot 0 = 0 \quad \forall\ i \neq j$$
With these results in hand, we now show that the variance of $X_A$ is one:
$$\mathrm{Var}[X_A] = E[X_A^2] - E[X_A]^2 = E[X_A^2]$$
$$E[X_A^2] = E\left[\rho^2 X_3^2 + 2\rho\sqrt{1-\rho^2}\,X_3 X_1 + (1-\rho^2)X_1^2\right]$$
$$E[X_A^2] = \rho^2 E[X_3^2] + 2\rho\sqrt{1-\rho^2}\,E[X_3 X_1] + (1-\rho^2)\,E[X_1^2]$$
$$E[X_A^2] = \rho^2 \cdot 1 + 2\rho\sqrt{1-\rho^2} \cdot 0 + (1-\rho^2) \cdot 1 = 1$$
By the same argument, the variance of $X_B$ is also one. For the covariance:
$$\mathrm{Cov}[X_A, X_B] = E[X_A X_B] - E[X_A]\,E[X_B] = E[X_A X_B]$$
$$E[X_A X_B] = E\left[\rho^2 X_3^2 + \rho\sqrt{1-\rho^2}\,X_3(X_1 + X_2) + (1-\rho^2)\,X_1 X_2\right]$$
$$E[X_A X_B] = \rho^2 E[X_3^2] + \rho\sqrt{1-\rho^2}\left(E[X_3 X_1] + E[X_3 X_2]\right) + (1-\rho^2)\,E[X_1 X_2]$$
$$E[X_A X_B] = \rho^2 \cdot 1 + \rho\sqrt{1-\rho^2}\,(0 + 0) + (1-\rho^2) \cdot 0 = \rho^2$$
Finally:
$$\mathrm{Corr}[X_A, X_B] = \frac{\mathrm{Cov}[X_A, X_B]}{\sqrt{\mathrm{Var}[X_A]\cdot\mathrm{Var}[X_B]}} = \frac{\rho^2}{\sqrt{1 \cdot 1}} = \rho^2$$
Question 12
Imagine we have two independent uniform distributions, A and B. A ranges between −2 and −1,
and is zero everywhere else. B ranges between +1 and +2, and is zero everywhere else. What
are the mean and standard deviation of a portfolio that consists of 50% A and 50% B? What are
the mean and standard deviation of a portfolio where the return is a 50/50 mixture distribution of
A and B?
Answer:
For the portfolio consisting of 50% A and 50% B, we can proceed two ways. The PDF of the portfolio is a triangle, from –0.5 to +0.5, with height of 2.0 at 0. We can argue that the mean is zero based on geometric arguments. Also, because both distributions are just standard uniform variables shifted by a constant, they must have variance of 1/12; 50% of each asset would have a variance of 1/4 this amount, and, only because the variables are independent, we can add the variances, giving us:
$$\sigma^2 = 2\cdot\frac{1}{4}\cdot\frac{1}{12} = \frac{1}{24}$$
$$\sigma = \sqrt{\frac{1}{24}} = \frac{1}{2\sqrt{6}}$$
Alternatively, we can integrate directly against the triangular PDF, which is $2 + 4x$ on $[-0.5, 0]$ and $2 - 4x$ on $[0, 0.5]$:
$$\mu = \int_{-0.5}^{0} x(2 + 4x)\,dx + \int_{0}^{0.5} x(2 - 4x)\,dx = \left[x^2 + \frac{4}{3}x^3\right]_{-0.5}^{0} + \left[x^2 - \frac{4}{3}x^3\right]_{0}^{0.5} = -\frac{1}{12} + \frac{1}{12} = 0$$
$$\sigma^2 = \int_{-0.5}^{0} x^2(2 + 4x)\,dx + \int_{0}^{0.5} x^2(2 - 4x)\,dx = \left[\frac{2}{3}x^3 + x^4\right]_{-0.5}^{0} + \left[\frac{2}{3}x^3 - x^4\right]_{0}^{0.5} = \frac{1}{48} + \frac{1}{48} = \frac{1}{24}$$
For the 50/50 mixture distribution, the PDF is bimodal and symmetrical around zero, giving a mean of zero:
$$\mu = 0.5\int_{-2}^{-1} x\,dx + 0.5\int_{1}^{2} x\,dx = 0.5\left[\frac{1}{2}x^2\right]_{-2}^{-1} + 0.5\left[\frac{1}{2}x^2\right]_{1}^{2} = \frac{1}{4}(1 - 4 + 4 - 1) = 0$$
Because the mean is zero, the variance is just the second moment:
$$\sigma^2 = 0.5\int_{-2}^{-1} x^2\,dx + 0.5\int_{1}^{2} x^2\,dx = 0.5\left[\frac{1}{3}x^3\right]_{-2}^{-1} + 0.5\left[\frac{1}{3}x^3\right]_{1}^{2} = \frac{1}{6}(-1 + 8 + 8 - 1) = \frac{7}{3}$$
$$\sigma = \sqrt{\frac{7}{3}} \approx 1.53$$
Notice that, while the mean is the same, the variance for the mixture distribution is significantly
higher.
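The contrast between the two variances can be verified with exact rational arithmetic; the mixture line below just plugs the antiderivative $x^3/3$ in at the interval endpoints:

```python
from fractions import Fraction

# 50/50 portfolio of independent A and B: each shifted uniform has
# variance 1/12, and half of an asset carries (1/2)^2 = 1/4 of that.
port_var = 2 * Fraction(1, 4) * Fraction(1, 12)   # 1/24

# 50/50 mixture: mean is 0, so Var = E[X^2]
# E[X^2] = 0.5*[x^3/3] from -2 to -1  +  0.5*[x^3/3] from 1 to 2
mix_var = Fraction(1, 2) * (Fraction(-1, 3) - Fraction(-8, 3)) \
        + Fraction(1, 2) * (Fraction(8, 3) - Fraction(1, 3))   # 7/3
```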
Apply Bayes’ theorem to scenarios with more than two possible outcomes.
$$P[A|B] = \frac{P[B|A]\cdot P[A]}{P[B]} = \frac{P[B|A]\cdot P[A]}{P[B|A]\cdot P[A] + P[B|A']\cdot P[A']}$$
For example: Assume two bonds, Bond A and Bond B, each with a 10% probability of
defaulting over the next year. Further assume that the probability that both bonds default is 6%,
and that the probability that neither bond defaults is 86%. The ensuing probability matrix shows
the probability that only Bond A or Bond B defaults (aka, exclusive or) is 4%:
Without the application of Bayes, the unconditional (marginal) probability that bond A
defaults is 10%. This is the probability without any prior information.
In Bayes terminology, this is called a prior probability. The prior is denoted P[A] in the formula
above. The prior is the unconditional probability which assumes no additional information or
evidence. Next, we will observe the evidence, denoted P[B] in the formula above, that allows us
to update from the prior probability to a posterior probability, denoted P[A|B].
In summary, Bayes uses the evidence P[B] to update from prior P[A] to the posterior P[A|B].
To apply Bayes Theorem: Assume in the example we are given additional information (aka,
evidence). Specifically, we are told that bond B has defaulted. Now, what is the probability that
Bond A defaults, given that Bond B has defaulted? Bayes’ Theorem solves for a conditional
probability. In this case, Bond B defaults in 10% of the scenarios, but the probability that both
Bond A and Bond B default is only 6%. In other words, Bond A defaults in 60% of the scenarios
in which Bond B defaults.
Because P(B) is itself the sum of two possible outcomes, the denominator can be expanded and we get the elaborate version of Bayes' formula:
$$P[A|B] = \frac{P[B|A]\cdot P[A]}{P[B|A]\cdot P[A] + P[B|A']\cdot P[A']} = \frac{60.0\% \times 10.0\%}{60.0\% \times 10.0\% + 4.4\% \times 90.0\%} = 60\%$$
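A sketch of this two-outcome update in Python; the 4.4% in the text is the rounded value of 4%/90%, so the exact ratio is used here (function and variable names are mine):

```python
def bayes(p_b_given_a: float, p_a: float, p_b_given_not_a: float) -> float:
    """P[A|B], expanding P[B] over the two outcomes A and A'."""
    numerator = p_b_given_a * p_a
    return numerator / (numerator + p_b_given_not_a * (1 - p_a))

# P[B defaults | A defaults] = 6%/10% = 60%
# P[B defaults | A survives] = 4%/90%
p_a_given_b = bayes(0.60, 0.10, 0.04 / 0.90)  # -> 60%
```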
Here is the two-step binomial tree for this example. Note that the four terminal nodes are mutually exclusive and cumulatively exhaustive. Specifically:

Bond A defaults (unconditional 10.0%):
    Bond B defaults (conditional 60.0%)
    Bond B survives (conditional 40.0%)
Bond A survives (unconditional 90.0%):
    Bond B defaults (conditional 4.4%)
    Bond B survives (conditional 95.6%)
In summary:
The prior is the unconditional (marginal) probability that bond A will default, P(A)= 10%
The posterior probability that bond A will default, conditional on the observation that
bond B defaulted, is P(A|B) = 60%
Miller’s Sample Problem #1: Imagine there is a disease that afflicts just 1 in every 100 people
in the population. A new test has been developed to detect the disease that is 99% accurate.
That is, for people with the disease, the test correctly indicates that they have the disease in
99% of cases. Similarly, for those who do not have the disease, the test correctly indicates that
they do not have the disease in 99% of cases. If a person takes the test and the result of the
test is positive, what is the probability that he or she actually has the disease?
Assuming 10,000 trials and using notations T+ and T- for those testing positive and negative for
the test respectively, we have the table below. If you check the numbers, you’ll see 1% of the
population with the disease, and 99% accuracy in each column.
The unconditional probability of a positive test is 1.98%, which is simply the probability of
a positive test being produced by somebody with the disease plus the probability of a
positive test being produced by somebody without the disease.
We can then calculate the probability of having the disease given a positive test,
P[H’|T+] using Bayes’ theorem. As shown in the calculations in the table there is only a
50.0% chance that the person who tests positive actually has the disease.
The exhibit below shows the binomial tree for this example.

Has disease (unconditional 1.0%):
    Tests positive (conditional 99.0%)
    Tests negative (conditional 1.0%)
Healthy (unconditional 99.0%):
    Tests positive (conditional 1.0%)
    Tests negative (conditional 99.0%)
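A sketch of the base-rate calculation behind the 50% result (variable names are mine):

```python
p_disease = 0.01            # prevalence: 1 in 100
p_pos_given_sick = 0.99     # test sensitivity
p_pos_given_healthy = 0.01  # false-positive rate

# Unconditional probability of a positive test, P[T+] = 1.98%
p_pos = p_pos_given_sick * p_disease + p_pos_given_healthy * (1 - p_disease)

# Posterior probability of having the disease given a positive test
post = p_pos_given_sick * p_disease / p_pos  # exactly 50%
```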
Sample Problem #2: Based on an examination of historical data, all fund managers of Astra
Fund of Funds fall into one of two groups: 1) Stars are the best managers and the probability
that a star will beat the market in any given year is 75%. 2) Ordinary, nonstar managers, by
contrast, are just as likely to beat the market as they are to underperform it.
For both types of managers, the probability of beating the market is independent from one year
to the next. Stars are rare and given a pool of managers, only 16% turn out to be stars. A new
manager was added to your portfolio three years ago. Since then, the new manager has beaten
the market every year.
1. What was the probability that the manager was a star when the manager was first
added to the portfolio?
2. What is the probability that this manager is a star now?
3. After observing the manager beat the market over the past three years, what is the
probability that the manager will beat the market next year?
The probability that a manager beats the market given that the manager is a star is 75% (= 12/16). The probability that a nonstar manager will beat the market is 50% (= 42/84).
To answer the first question: At the time the new manager was added to the portfolio,
the probability that the manager was a star was just the probability of any manager being
a star, 16%, the unconditional probability.
To answer the second question: For this, we first need the likelihood of the manager beating the market three years in a row, assuming that the manager was a star, P[3B|S]. This is the probability that a star beats the market in any given year raised to the third power: 0.75³ = 42.19%. We then need the probability that the manager is a star, given that the manager has beaten the market three years in a row. Using Bayes' theorem, and as shown in the table, this probability P[S|3B] is calculated as 39.13%.
To answer the last question: The probability that the manager beats the market next
year is just the probability that a star would beat the market plus the probability that a
nonstar would beat the market, weighted by our new beliefs. Our updated belief about
the manager being a star is 39.13%, so the probability that the manager is not a star
must be 60.87%. Using these, the probability P[B] is then calculated to be 59.78%
Summary: Thus, when using Bayes’ theorem to update beliefs, we often refer to prior and
posterior beliefs and probabilities. In this sample problem, the prior probability, that is, before
seeing the manager beat the market three times, our belief that the manager was a star was
16%. The posterior probability, that is, after seeing the manager beat the market three times,
our belief that the manager was a star was 39.13%.
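The star-manager update can be sketched in a few lines (variable names are mine):

```python
prior_star = 0.16
like_star = 0.75**3     # P[3B|star]: a star beats the market 3 years running
like_nonstar = 0.50**3  # P[3B|nonstar]

# Unconditional probability of the evidence, P[3B]
p_3b = like_star * prior_star + like_nonstar * (1 - prior_star)

post_star = like_star * prior_star / p_3b                # about 39.13%
p_beat_next = 0.75 * post_star + 0.50 * (1 - post_star)  # about 59.78%
```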
Bayes’ Theorem: Example #1, GARP’s 2011 Practice Exam, Part I, Question #5
Question: John is forecasting a stock’s performance in 2010 conditional on the state of the
economy of the country in which the firm is based. He divides the economy’s performance into
three categories of “GOOD”, “NEUTRAL” and “POOR” and the stock’s performance into three
categories of “increase”, “constant” and “decrease”.
He estimates:
The probability that the state of the economy is GOOD is 20%. If the state of the
economy is GOOD, the probability that the stock price increases is 80% and the
probability that the stock price decreases is 10%.
The probability that the state of the economy is NEUTRAL is 30%. If the state of the
economy is NEUTRAL, the probability that the stock price increases is 50% and the
probability that the stock price decreases is 30%.
If the state of the economy is POOR, the probability that the stock price increases is
15% and the probability that the stock price decreases is 70%.
Billy, his supervisor, asks him to estimate the probability that the state of the economy is
NEUTRAL given that the stock performance is constant. John’s best assessment of that
probability is closest to what?
Alternative approach to the same answer: This may be easier to follow if we represent the question's assumptions as a matrix of joint probabilities:
Bayes’ Theorem: Example #2, GARP’s 2011 Practice Exam, Part 2, Question #5
Question: John is forecasting a stock’s price in 2011 conditional on the progress of certain
legislation in the United States Congress. He divides the legislative outcomes into three
categories of “Passage”, “Stalled” and “Defeated” and the stock’s performance into three
categories of “increase”, “constant” and “decrease” and estimates the following events:
A portfolio manager would like to know the probability that the legislation passed, given that the stock price does not change in 2011. Based on John's estimates, this probability is:
Answer:
Take another example: a coin that you believed was fair, with a 50% chance of landing heads or tails when flipped. If you flip the coin 10 times and it lands heads each time, you might start to suspect that the coin is not fair. Ten heads in a row could happen, but the probability of seeing 10 heads in a row with a fair coin is only 1 in 1,024: (1/2)^10 = 1/1,024.
How do you update your beliefs after seeing 10 heads? If you believed there was a 90%
probability that the coin was fair before you started flipping, then after seeing 10 heads your
belief that the coin is fair should probably be somewhere between 0% and 90%. You believe it is
less likely that the coin is fair after seeing 10 heads (so less than 90%), but there is still some
probability that the coin is fair (so greater than 0%). Based on these priors (unconditional or
prior probabilities), Bayes’ theorem provides a framework for deciding exactly what our new
beliefs should be.
As seen in the coin example, the Bayesian approach, like the frequentist approach, also counts the number of positive results. The conclusion differs because the Bayesian approach starts with a prior belief about the probability and works toward deriving conditional probabilities.
Which approach is better? It is hard to say:
Proponents of Bayesian analysis point to the absurdity of the frequentist approach when
applied to small data sets. For instance, it is not justified to conclude the probability of a
positive result is 100% after observing three out of three positive (3/3) results.
Proponents of the frequentist approach point to the arbitrariness of Bayesian priors. How
did we arrive at our priors? In most cases the prior is either subjective or based on
frequentist analysis.
Most practitioners tend to take a more balanced view, realizing that there are situations
that lend themselves to frequentist analysis and others that lend themselves to Bayesian
analysis.
o When there is very little data, we tend to prefer Bayesian analysis. When we
have lots of data, the conclusions of frequentist and Bayesian analysis are often
similar, and the frequentist results are often easier to calculate.
o In risk management, performance analysis and stress testing are examples of
areas where we often have very little data, and the data we do have is very
noisy. These areas are likely to lend themselves to Bayesian analysis.
In the two previous sample problems, each variable could exist in only one of two states: a person either had the disease or did not; a manager was either a star or a nonstar. Bayesian analysis extends easily to any number of possible outcomes, as the examples below show.
Sample Problem #1: Suppose there are three types of managers: the underperformers beat
the market only 25% of the time, the in-line performers beat the market 50% of the time, and the
outperformers beat the market 75% of the time. Our prior belief is that a manager has a 60%
probability of being an in-line performer, a 20% chance of being an underperformer, and a 20%
chance of being an outperformer. If the manager beats the market two years in a row what
should our updated beliefs be?
Given these assumptions, the binomial tree diagram and associated probability matrix are
shown below:
Our prior beliefs can be summarized as: P[MO] = 20%, P[MI] = 60% and P[MU] = 20%.
To calculate the updated beliefs, we start by calculating the likelihoods, the probability of
beating the market two years in a row, for each type of manager as shown in the
calculations below. So, P[2B|MO] = 56.25%, P[2B|MI] = 25.00% and P[2B|MU] = 6.25%
Hence, the unconditional probability of observing the manager beat the market two years
in a row, given our prior beliefs is P[2B] =27.50%
$$P[2B] = P[2B|M_O] \times P[M_O] + P[2B|M_I] \times P[M_I] + P[2B|M_U] \times P[M_U]$$
Using Bayes’ theorem, for example, we can calculate our posterior belief that the
manager is an outperformer as 40.91%:
$$P[M_O|2B] = \frac{P[2B|M_O] \times P[M_O]}{P[2B]} = \frac{0.5625 \times 0.20}{0.275} = 40.91\%$$
Similarly, we can show that the posterior probability that the manager is an in-line
performer is 54.55% and that the posterior probability that the manager is an
underperformer is 4.55%.
As we would expect, given that the manager beat the market two years in a row, the
posterior probability that the manager is an outperformer has increased, from 20% to
40.91%, and the posterior probability that the manager is an underperformer has
decreased, from 20% to 4.55%.
Although the probabilities changed, the sum of the probabilities remains equal to 100%.
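The same update works for any number of hypotheses; a generic sketch (function and variable names are mine):

```python
def bayes_update(priors, likelihoods):
    """Posterior over hypotheses, given each one's likelihood of the evidence."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)  # unconditional probability of the evidence
    return [j / evidence for j in joint]

# Outperformer, in-line, underperformer beating the market two years in a row
priors = [0.20, 0.60, 0.20]
likelihoods = [0.75**2, 0.50**2, 0.25**2]
posteriors = bayes_update(priors, likelihoods)  # about [0.4091, 0.5455, 0.0455]
```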
Sample Problem #2: Using the same prior distributions as in the prior example, what would the
posterior probabilities be for an underperformer, an in-line performer, or an outperformer if
instead of beating the market two years in a row, the manager beat the market in 6 of the
next 10 years?
Our prior beliefs are P[MO] = 20%, P[MI] = 60% and P[MU] = 20%.
Using a shortcut method, for each type of manager, the likelihood of beating the market 6 times out of 10 can be determined using a binomial distribution:
$$P[6B|M] = \binom{10}{6} p^6 (1-p)^4$$
$$P[6B|M_O] = \binom{10}{6}\,0.75^6\,(1-0.75)^4 = 14.60\%$$
Similarly, for in-line performers and underperformers this probability is found to be 20.51% and 1.62%, respectively.
Hence, the unconditional probability of observing the manager beat the market six times out of ten, given our prior beliefs, is P[6B] = 15.55% (see learning XLS for details):
$$P[6B] = P[6B|M_O] \times P[M_O] + P[6B|M_I] \times P[M_I] + P[6B|M_U] \times P[M_U]$$
Using Bayes’ theorem, for example, we can calculate our posterior belief that the
manager is an outperformer as 18.78%:
$$P[M_O|6B] = \frac{P[6B|M_O] \times P[M_O]}{P[6B]} = \frac{0.146 \times 0.20}{0.1555} = 18.78\%$$
Similarly, we can show that the posterior probability that the manager is an in-line
performer is 79.13% and that the posterior probability that the manager is an
underperformer is 2.09%.
So, the probability that the manager is an in-line performer has increased from 60% to
79.13%. The probability that the manager is an outperformer decreased slightly from
20% to 18.78%. It now seems very unlikely that the manager is an underperformer
(2.09% probability compared to our prior belief of 20%).
Below is a summary of the update from prior probabilities (e.g., unconditional probability a
manager underperforms is 20.0%) to posterior probabilities (e.g., if a manager beats the market
six out of ten years then he/she has only a 2.0% probability of being an underperformer).
Question 1:
The probability that gross domestic product (GDP) decreases is 20%. The probability that
unemployment increases is 10%. The probability that unemployment increases given that GDP
has decreased is 40%. What is the probability that GDP decreases given that unemployment
has increased?
Answer:
$$P[\text{GDP down}|\text{unemployment up}] = \frac{40\% \cdot 20\%}{10\%} = 80\%$$
Question 2:
An analyst develops a model for forecasting bond defaults. The model is 90% accurate. In other
words, of the bonds that actually default, the model identifies 90% of them; likewise, of the
bonds that do not default, the model correctly predicts that 90% will not default. You have a
portfolio of bonds, each with a 5% probability of defaulting. Given that the model predicts that a
bond will default, what is the probability that it actually defaults?
Answer:
$$P[\text{actual} = D|\text{model} = D] = \frac{90\% \cdot 5\%}{90\% \cdot 5\% + 10\% \cdot 95\%} = 32.14\%$$
Even though the model is 90% accurate, 95% of the bonds don’t default, and of those 95% the
model predicts that 10% of them will default. Within the bond portfolio, the model identifies 9.5%
of the bonds as likely to default, even though they won’t. Of the 5% of bonds that actually
default, the model correctly identifies 90%, or 4.5% of the portfolio. This 4.5% correctly identified
is overwhelmed by the 9.5% incorrectly identified.
Question 3:
As a risk analyst, you are asked to look at EB Corporation, which has issued both equity and
bonds. The bonds can either be downgraded, be upgraded, or have no change in rating. The
stock can either outperform the market or underperform the market. You are given the following
probability matrix from an analyst who had worked on the company previously, but some of the
values are missing. Fill in the missing values. What is the conditional probability that the bonds
are downgraded given that the equity has underperformed?
Answer:
Summing across the first row: W + 5% = 15%, so W = 10%. Summing across the second row: 45% + X = 65%, so X = 20%. To calculate Y, we can sum down the first column, using our previously calculated value for W: W + 45% + Y = 60%, so Y = 5%. Using this result, we can sum across the third row to get Z: Y + 15% = 5% + 15% = Z, so Z = 20%. The completed probability matrix (rows: bond rating outcome; columns: equity performance) is:

              Outperform   Underperform   Total
Upgrade          10%            5%          15%
No change        45%           20%          65%
Downgrade         5%           15%          20%
Total            60%           40%         100%
The last part of the question asks us to find the conditional probability, which we can express
as:
[Downgrade|Underperform]
We can solve this by taking values from the completed probability matrix. The equity
underperforms in 40% of scenarios. The equity underperforms and the bonds are downgraded
in 15% of scenarios. Dividing, we get our final answer, 37.5%.
$$P[\text{Downgrade}|\text{Underperform}] = \frac{P[\text{Downgrade} \cap \text{Underperform}]}{P[\text{Underperform}]} = \frac{15\%}{40\%} = 37.5\%$$
Question 4:
Your firm is testing a new quantitative strategy. The analyst who developed the strategy claims
that there is a 55% probability that the strategy will generate positive returns on any given day.
After 20 days the strategy has generated a profit only 10 times. What is the probability that the
analyst is right and the actual probability of positive returns for the strategy is 55%? Assume that there are only two possible states of the world: either the analyst is correct, or the strategy is equally likely to gain or lose money on any given day. Your prior assumption was that these two states of the world were equally likely.
Answer:
$$P[p = 0.55] = 50\%$$
$$P[p = 0.50] = 50\%$$
The probability of the strategy generating 10 positive returns over 20 days if the analyst is correct is:
$$P[10+|p = 0.55] = \binom{20}{10}\,0.55^{10}\,0.45^{10} = 15.94\%$$
If instead the strategy is only break-even, the corresponding probability is:
$$P[10+|p = 0.50] = \binom{20}{10}\,0.50^{10}\,0.50^{10} = 17.62\%$$
To get our final answer, the probability that p = 0.55 given the 10 positive returns, we use Bayes' theorem:
$$P[p = 0.55|10+] = \frac{\binom{20}{10}\,0.55^{10}\,0.45^{10} \cdot 0.50}{\binom{20}{10} \cdot 0.50\left(0.55^{10}\,0.45^{10} + 0.50^{10}\,0.50^{10}\right)}$$
$$P[p = 0.55|10+] = \frac{0.55^{10}\,0.45^{10}}{0.55^{10}\,0.45^{10} + 0.50^{10}\,0.50^{10}}$$
$$P[p = 0.55|10+] = \frac{1}{1 + \left(\frac{100}{99}\right)^{10}} = 47.49\%$$
The final answer is 47.49%. The strategy generated a profit in only 10 out of 20 days, so our
belief in the analyst’s claim has decreased. That said, with only 20 data points, it is hard to tell
the difference between a strategy that generates profits 55% of the time and a strategy that
generates profits 50% of the time. Our belief decreased, but not by much.
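A quick check of the 47.49% posterior; the binomial coefficient cancels in the ratio, but it is kept here for clarity (variable names are mine):

```python
from math import comb

like_analyst = comb(20, 10) * 0.55**10 * 0.45**10  # evidence if p = 0.55
like_coinflip = comb(20, 10) * 0.50**20            # evidence if p = 0.50

# Equal 50/50 priors on the two states of the world
post = 0.5 * like_analyst / (0.5 * like_analyst + 0.5 * like_coinflip)
```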
Question 5:
Your firm has created two equity baskets. One is procyclical, and the other is countercyclical.
The procyclical basket has a 75% probability of being up in years when the economy is up, and
a 25% probability of being up when the economy is down or flat. The probability of the economy
being down or flat in any given year is only 20%. Given that the procyclical index is up, what is
the probability that the economy is also up?
Answer:
The final answer is 92.31%. Use + to signify the procyclical index being up, G to signify that the economy is up (growing), and G′ to signify that the economy is down or flat (not growing). We are given the following information:
$$P[+|G] = 75\%$$
$$P[+|G'] = 25\%$$
$$P[G'] = 20\%$$
By Bayes' theorem:
$$P[G|+] = \frac{P[+|G]\,P[G]}{P[+]}$$
We were not given P[G], but we know the economy must be either growing or not growing; therefore:
$$P[G] = 1 - P[G'] = 80\%$$
We can also calculate the unconditional probability that the index is up, P[+]:
$$P[+] = P[+|G]\,P[G] + P[+|G']\,P[G'] = 75\% \cdot 80\% + 25\% \cdot 20\% = 65\%$$
Putting these together:
$$P[G|+] = \frac{75\% \cdot 80\%}{65\%} = \frac{60\%}{65\%} = 92.31\%$$
Question 6:
You are an analyst at Astra Fund of Funds, but instead of believing that there are two or three
types of portfolio managers, your latest model classifies managers into four categories.
Managers can be underperformers, in-line performers, stars, or superstars. In any given year,
these managers have a 40%, 50%, 60%, and 80% chance of beating the market, respectively.
In general, you believe that managers are equally likely to be any one of the four types of
managers. After observing a manager beat the market in three out of five years, what do you
believe the probability is that the manager belongs in each of the four categories?
Answer:
The prior beliefs for beating the market in any given year are:
$$P[p = 0.40] = P[p = 0.50] = P[p = 0.60] = P[p = 0.80] = \frac{1}{4}$$
The probability of beating the market three out of five years is:
$$P[3B|p = p_i] = \binom{5}{3} p_i^3 (1 - p_i)^2$$
The posterior is proportional to the likelihood times the prior; writing the constant of proportionality as $c$:
$$P[p = p_i|3B] = c \cdot P[3B|p = p_i] \cdot P[p = p_i] = c \cdot \binom{5}{3} p_i^3 (1 - p_i)^2 \cdot \frac{1}{4}$$
Because the four posterior probabilities must sum to one:
$$\sum_i c\,\binom{5}{3} p_i^3 (1 - p_i)^2\,\frac{1}{4} = 1 \quad\Rightarrow\quad c = \frac{4}{\binom{5}{3}\sum_i p_i^3 (1 - p_i)^2}$$
Substituting back into the posterior:

P[p = pᵢ | 3B] = [4 / (C(5,3) ∙ Σⱼ pⱼ³ ∙ (1 − pⱼ)²)] ∙ C(5,3) ∙ pᵢ³ ∙ (1 − pᵢ)² ∙ (1/4)

P[p = pᵢ | 3B] = pᵢ³ ∙ (1 − pᵢ)² / Σⱼ pⱼ³ ∙ (1 − pⱼ)²
To get the final answer, we simply substitute in the four possible values for pᵢ. For example, the
posterior probability that the manager is an underperformer is:

P[p = 0.40 | 3B] = 0.40³ ∙ (1 − 0.40)² / Σⱼ pⱼ³ ∙ (1 − pⱼ)²

P[p = 0.40 | 3B] = 0.40³ ∙ 0.60² / (0.40³ ∙ 0.60² + 0.50³ ∙ 0.50² + 0.60³ ∙ 0.40² + 0.80³ ∙ 0.20²)

Multiplying every term by 10⁵:

P[p = 0.40 | 3B] = 4³ ∙ 6² / (4³ ∙ 6² + 5³ ∙ 5² + 6³ ∙ 4² + 8³ ∙ 2²)

P[p = 0.40 | 3B] = 2,304 / 10,933 = 21.1%
The other three probabilities can be found in a similar fashion. The final answer is that the
posterior probabilities of the manager being an underperformer, an in-line performer, a star, or a
superstar are 21.1%, 28.6%, 31.6%, and 18.7%, respectively. Interestingly, even though the
manager beat the market 60% of the time, the manager is almost as likely to be an
underperformer or an in-line performer (49.7% probability) as a star or a superstar (50.3%
probability).
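The four posterior probabilities can be reproduced with a short Python sketch (the dictionary-based layout is our own choice):

```python
from math import comb

# Posterior over the four manager types after 3 wins in 5 years.
priors = {0.40: 0.25, 0.50: 0.25, 0.60: 0.25, 0.80: 0.25}

# Likelihood of 3 wins in 5 years for skill level p: C(5,3) p^3 (1-p)^2
likelihood = {p: comb(5, 3) * p**3 * (1 - p)**2 for p in priors}

# Unnormalized posterior = likelihood * prior; normalize to sum to one.
unnorm = {p: likelihood[p] * priors[p] for p in priors}
total = sum(unnorm.values())
posterior = {p: w / total for p, w in unnorm.items()}

for p, post in posterior.items():
    print(f"p = {p:.2f}: {post:.1%}")
```

Note that the C(5,3) and 1/4 factors cancel in the normalization, which is why the text's final formula keeps only the p³(1 − p)² terms.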
Question 7:
You have a model that classifies Federal Reserve statements as either bullish or bearish. When
the Fed makes a bullish announcement, you expect the market to be up 75% of the time. The
market is just as likely to be up as it is to be down or flat, but the Fed makes bullish
announcements 60% of the time. What is the probability that the Fed made a bearish
announcement, given that the market was up?
Answer:
P[+ | Bull] = 75%
P[+] = 50%
P[Bull] = 60%
You are asked to find P[Bear | +]. A direct application of Bayes’ theorem will not work. Instead
we need to use the fact that the Federal Reserve’s statement must be either bearish or bullish,
no matter what the market does; therefore:
P[Bear | +] = 1 − P[Bull | +] = 1 − P[+ | Bull] ∙ P[Bull] / P[+]

P[Bear | +] = 1 − (75% ∙ 60%) / 50% = 1 − (9/20) / (1/2)

P[Bear | +] = 1 − 9/10 = 10%
Question 8:
You are monitoring a new strategy. Initially, you believed that the strategy was just as likely to
be up as it was to be down or flat on any given day, and that the probability of being up was
fairly close to 50%. More specifically, your initial assumption was that the probability of being up,
p, could be described by a beta distribution, β(4,4). Over the past 100 days, the strategy has
been up 60 times. What is your new estimate for the distribution of the parameter p? What is the
probability that the strategy will be up the next day?
Answer:
Because the prior distribution is a beta distribution and the likelihood can be described by a
binomial distribution, we know the posterior distribution must also be a beta distribution. Further,
we know that the parameters of the posterior distribution can be found by adding the number of
successes to the first parameter, and the number of failures to the second. In this problem the
initial distribution was β(4,4) and there were 60 successes (up days) and 100 − 60 = 40 failures.
Therefore, the final distribution is β(64,44). The mean of a beta distribution, β(a,b), is simply
a/(a + b). The mean of our posterior distribution is then:

µ = a/(a + b) = 64/(64 + 44) = 64/108 = 59.26%
We therefore believe there is a 59.26% probability that the strategy will be up tomorrow.
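The conjugate update is mechanical enough to sketch in Python:

```python
# Conjugate beta-binomial update from the question: prior beta(4,4),
# then 60 up days (successes) and 40 down days (failures) observed.
a_prior, b_prior = 4, 4
ups, downs = 60, 40

# Posterior parameters: add successes to a, failures to b.
a_post = a_prior + ups    # 64
b_post = b_prior + downs  # 44

# Mean of a beta(a, b) distribution is a / (a + b).
posterior_mean = a_post / (a_post + b_post)
print(round(posterior_mean, 4))  # 0.5926
```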
Question 9:
For the Bayesian network in Exhibit 6.9, each node can be in one of three states: up, down, or
no change. How many possible states are there for the entire network? What is the minimum
number of probabilities needed to completely define the network?
Answer:
There are 27 possible states for the network: 3^3 = 27. The minimum number of probabilities
needed to define the network is 22. As an example, we could define P[A =up], and P[A
=unchanged] for node A, which would allow us to calculate P[A =down] =1 − P[A =up] −P[A
=unchanged]. Similarly, we could define two probabilities for node B. For node C, there are nine
possible input combinations (each of three possible states for A can be combined with three
possible states from B). For each combination, we can define two conditional probabilities and
infer the third. For example, we could define P[C =up | A =up, B =up] and P[C =unchanged | A
=up, B =up], which would allow us to calculate P[C =down | A =up, B =up] =1 − P[C =up | A =up,
B =up] − P[C =unchanged | A =up, B =up]. This gives us a total of 22 probabilities that we need
to define: 2 + 2 + 9 × 2 = 22.
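The two counts can be verified with a trivial Python sketch:

```python
# Counting states and free parameters for the three-node network
# (A and B feed into C; each node has three states).
states_per_node = 3
n_nodes = 3
total_states = states_per_node ** n_nodes  # 3^3 = 27

# Each root node (A, B) needs (3 - 1) = 2 free probabilities; node C
# needs 2 free probabilities for each of the 3 x 3 input combinations.
free_params = 2 + 2 + (3 * 3) * 2
print(total_states, free_params)  # 27 22
```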
Question 10:
Calculate the correlation matrix for Network 1, the network on the left, in Exhibit 6.6. Start by
calculating the covariance matrix for the network.
Answer:
Question 11:
Calculate the correlation matrix for Network 2, the network on the right, in Exhibit 6.6. Start by
calculating the covariance matrix for the network.
Answer:
Define and construct an appropriate null and alternative hypothesis, and calculate the
test statistic.
Differentiate between a one-tailed and a two-tailed test and explain the circumstances in
which to use each test.
The sample mean is given by:

μ̂ = (1/n) ∙ Σᵢ xᵢ = Σᵢ (xᵢ/n)

The sample mean is equivalent to the sum of n i.i.d. random variables (the xᵢ/n terms), each
with a mean of µ/n and a standard deviation of σ/n.
The sample variance is given by:

σ̂² = Σᵢ (xᵢ − μ̂)² / (n − 1)

The mean of the sample mean is the true mean µ of the distribution: E[μ̂] = µ.

The variance of the sample mean, if σ² is the true variance of the data generating process, is:

Var[μ̂] = σ²/n

The variance of our sample mean doesn't just decrease with the sample size; it decreases in a
predictable way, in proportion to the sample size. It follows that the standard deviation of the
sample mean decreases with the square root of n.
The t-statistic is defined as:

t = (μ̂ − µ) / (σ̂/√n)
By looking up the appropriate values for the t distribution, we can establish the probability that
our t-statistic is contained within a certain range:
P[xL ≤ (μ̂ − µ)/(σ̂/√n) ≤ xU] = γ
where xL and xU are constants, which, respectively, define the lower and upper bounds of the
range within the t distribution, and γ is the probability or the confidence level that our t-statistic
will be found within that range.
Rather than working directly with the confidence level, we often work with (1 – γ), which is the
significance level and is often denoted by α. The smaller the confidence level is, the higher the
significance level.
The population mean is normally unknown, and we rearrange the equation so that:
P[μ̂ − t ∙ σ̂/√n ≤ µ ≤ μ̂ + t ∙ σ̂/√n] = γ

(for a symmetric interval, where xU = −xL = t)
This confidence interval gives the probability that the population mean will be contained
within the defined range.
Typically, the confidence interval uses the product of [standard error × critical "lookup" t].
μ̂ − t ∙ σ̂/√n ≤ µ ≤ μ̂ + t ∙ σ̂/√n
This confidence interval is a random interval. Why? Because it will vary randomly with each
sample, whereas we assume the population mean is static.
The confidence level is selected by the user; e.g., 95% (0.95) or 99% (0.99) and significance
= 1 – confidence level.
We don’t say the probability is 95% that the “true” population mean lies within this
interval. That implies the true mean is variable. Instead, we say the probability is 95% that
the random interval contains the true mean. See how the population mean is trusted to
be static and the interval varies?
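A minimal Python sketch of the interval construction, using a hypothetical sample (mean 5%, standard deviation 10%, n = 10) and the tabulated two-tailed 95% critical t of 2.262 for 9 degrees of freedom:

```python
import math

# 95% two-tailed confidence interval for the mean. The sample
# statistics below are hypothetical; the critical t is from a lookup
# table (two-tailed 95%, df = 9).
sample_mean = 0.05
sample_std = 0.10
n = 10
t_crit = 2.262

std_error = sample_std / math.sqrt(n)        # s / sqrt(n)
lower = sample_mean - t_crit * std_error
upper = sample_mean + t_crit * std_error
print(round(lower, 4), round(upper, 4))  # -0.0215 0.1215
```

A new sample would shift both endpoints, which is exactly the sense in which the interval, not the population mean, is random.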
H0: µ = c
H1: µ ≠ c
In practice, the test is often a significance test in which it is assumed both (i) that the test is
two-tailed and (ii) that the null hypothesis is that the estimate equals zero. Symbolically, then,
the following is a very common test:

H0: µ = 0
H1: µ ≠ 0
As Miller says, “in many scientific fields where positive and negative deviations are equally
important, two-tailed confidence levels are the more prevalent. In risk management, more often
than not, we are more concerned with the probability of bad outcomes, and this concern
naturally leads to one-tailed tests.”
A one-tailed test rejects the null only if the estimate deviates significantly in one specified
direction, either above or below. For example, the following null hypothesis is not rejected if the
estimate is greater than the value c; we are here concerned with deviations in one direction
only:
H0: µ ≥ c
H1: µ < c
Example - EOC Problem #1: Given the following data sample, how confident can we be that
the mean is greater than 40?
Example - Miller EOC Problem #2: You are given the following sample of annual returns for a
portfolio manager. If you believe that the distribution of returns has been stable over time and
will continue to be stable over time, how confident should you be that the portfolio manager will
continue to produce positive returns?
The desired condition is that the portfolio manager will continue to produce positive
returns. The null hypothesis is normally constructed such that the desired result is false.
In this case we would make the null hypothesis µ ≤ 0, so that if it gets rejected then the
expected mean returns are greater than 0 (or positive).
We use a t-test with n − 1 = 9 degrees of freedom. The critical t-value for a one-tailed
rejection at the 0.05 significance level with 9 d.f. is 1.833.
The calculated t-value is:
t = (μ̂ − µ₀) / (σ̂/√n) = 0.93
Since the calculated t-value is less than the critical t-value, we fail to reject the null
hypothesis that the returns are not positive, at the 0.05 significance level. That is, we
cannot say with a high level of confidence (95%) that the portfolio manager will continue
to produce positive returns. Explained in another way, we can reject the null hypothesis
that the returns are not positive with only (1 − p)%, in this case roughly 81%, confidence.
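The t-statistic can be reproduced from the summary statistics reported for this sample elsewhere in these notes (mean 6.9%, standard deviation 23.5%, n = 10), with a null mean of zero:

```python
import math

# One-sample t-statistic for the manager's returns:
# mean 6.9%, standard deviation 23.5%, n = 10, null mean 0.
sample_mean = 0.069
sample_std = 0.235
n = 10
mu_0 = 0.0

t_stat = (sample_mean - mu_0) / (sample_std / math.sqrt(n))
print(round(t_stat, 2))  # 0.93
```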
Three questions that apply the t-statistic from the Bionic Turtle question database
209.1. Nine (9) companies among a random sample of 60 companies defaulted. The companies
were each in the same highly speculative credit rating category: statistically, they represent a
random sample from the population of CCC-rated companies. The rating agency contends that
the historical (population) default rate for this category is 10.0%, in contrast to the 15.0% default
rate observed in the sample. Is there statistical evidence, with any high confidence, that the true
default rate is different than 10.0%; i.e., if the null hypothesis is that the true default rate is
10.0%, can we reject the null?
a) No, the t-statistic is 0.39
b) No, the t-statistic is 1.08
c) Yes, the t-statistic is 1.74
d) Yes, the t-statistic is 23.53
209.2. Over the last two years, a fund produced an average monthly return of +3.0% but with
monthly volatility of 10.0%. That is, assume the random sample size (n) is 24, with mean of
3.0% and sigma of 10.0%. Are the returns statistically significant; in other words, can we decide
the true mean return is greater than zero with 95% confidence?
a) No, the t-statistic is 0.85
b) No, the t-statistic is 1.47
c) Yes, the t-statistic is 2.55
d) Yes, the t-statistic is 3.83
209.3. Assume the frequency of internal fraud (an operational risk event type) occurrences per
year is characterized by a Poisson distribution. Among a sample of 43 companies, the mean
frequency is 11.0 with a sample standard deviation of 4.0. What is the 90% confidence interval
of the population's mean frequency?
a) 10.0 to 12.0
b) 8.8 to 13.2
c) 7.5 to 14.5
d) Need more information (Poisson parameter)
Answers:
209.1. B. No, the t-statistic is only 1.08. For a large sample, the distribution is normally
approximated, such that at 5.0% two-tailed significance, we reject if the abs(t-statistic)
exceeds 1.96.
We don't really need the lookup table or a calculator: the t-statistic tells us that the observed
sample mean is only 1.08 standard deviations (standard errors) away from the hypothesized
population mean.
A two-tailed 90% confidence interval implies 1.64 standard errors, so this (72.8%) is much less
confident than even 90%.
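A Python sketch of the same computation, using the sample-based standard error:

```python
import math

# Test statistic for the observed default rate (question 209.1):
# 9 defaults among 60 companies versus a hypothesized rate of 10%.
p_hat = 9 / 60     # observed default rate, 15%
p_null = 0.10      # hypothesized population default rate
n = 60

# Standard error based on the sample proportion.
std_error = math.sqrt(p_hat * (1 - p_hat) / n)
t_stat = (p_hat - p_null) / std_error
print(round(t_stat, 2))  # 1.08
```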
The central limit theorem (CLT) says, if the sample is random (i.i.d.), the sampling distribution of
the sample mean tends toward the normal REGARDLESS of the underlying distribution!
In the below table, the green shaded area represents values less than three (< 3.0). Think of
it as the “sweet spot.” For confidences less than 99% and d.f. > 13, the critical t is always less
than 3.0. So, for example, a computed t of 7 or 13 will generally be significant. Keep this in
mind because in many cases, you do not need to refer to the lookup table if the computed t is
large; you can simply reject the null.
The subsequent AIMs break down the following general hypothesis testing framework:

H0: µ = c
H1: µ ≠ c

In this case, extreme positive or negative values would cause us to reject the null
hypothesis. Thus, if we are concerned with both sides of the distribution (both tails), we
should choose a two-tailed test.
In risk management, more often than not we are more concerned with the probability of extreme
negative outcomes, and this concern naturally leads to one-tailed tests.
H0: µ ≥ c
H1: µ < c
In this case, we will reject only if the estimate of µ is significantly less than c. Thus, if we are
only concerned with deviations in one direction, we should use a one-tailed test.
As long as the null hypothesis is clearly stated, the choice of a one-tailed or two-tailed
confidence level should be obvious.
The null hypothesis always includes the equal sign (=), regardless of a one-tailed or two-
tailed test! The null cannot include only less than (<) or greater than (>).
The 95% confidence level is a very popular choice.
For a two-tailed test, a 95% confidence level is equivalent to approximately 1.96
standard deviations. That is, for a normal distribution, 95% of the mass is within ±1.96
standard deviations of the mean.

For a one-tailed test, though, the entire 5% rejection region sits in one tail: 95% of the
mass lies below +1.64 standard deviations (or, equivalently, above −1.64).
If we reject a hypothesis which is actually true, we have committed a Type I error. If, on the
other hand, we accept a hypothesis that should have been rejected, we have committed a Type
II error.
Type I error = significance level = α = P[reject H0 | H0 is true]
Type II error = β = P["accept" H0 | H0 is false]
We can reject the null with (1 − p)% confidence, where p is the p-value.
Under these circumstances, a Type I error is the following: we decide that the excess is
significant and the manager adds value, but actually the out-performance was random
(he did not add skill). In technical terms, we mistakenly rejected the null.

Under these circumstances, a Type II error is the following: we decide that the excess is
random (the out-performance reflects luck, not skill), but actually it was not random and
the manager did add value. In technical terms, we falsely accepted the null.
Example – Sample Problem: At the start of the year, you think the annualized volatility of XYZ
Corporation’s equity was 45%. At the end of the year, you have collected a year of daily returns,
256 business days’ worth. You calculate the standard deviation, annualize it, and come up with
a value of 48%. Can you reject the null hypothesis, H0: σ = 45%, at the 95% confidence level?
In this case, the appropriate test statistic is a chi-squared distribution with 255 degrees
of freedom:
(n − 1) ∙ σ̂²/σ² = (256 − 1) ∙ (0.48/0.45)² = 290.13 ~ χ²(255)
Notice that annualizing the standard deviation has no impact on the test statistic. The
same factor would appear in the numerator and the denominator, leaving the ratio
unchanged.
For a chi-squared distribution with 255 degrees of freedom, a value of 290.13 corresponds
to a one-tailed probability of 6.44%; the null can be rejected only at the 93.56% confidence
level.
Also, since the value of 48% falls within the confidence interval levels (44.2% to 52.6%)
we fail to reject the null hypothesis at the 95% confidence level.
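The test statistic itself can be reproduced in Python (the 6.44% p-value still requires a chi-squared CDF, e.g. a spreadsheet function):

```python
# Chi-squared test statistic for the volatility example:
# sample sigma 48%, hypothesized sigma 45%, n = 256 daily returns.
n = 256
sigma_hat = 0.48
sigma_0 = 0.45

# (n - 1) * sample variance / hypothesized variance
chi2_stat = (n - 1) * sigma_hat**2 / sigma_0**2
print(round(chi2_stat, 2))  # 290.13
```

Note the annualization factor cancels in the ratio, as the text observes.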
Chebyshev’s inequality
So far, we were working with sample statistics where the shape of the distribution was known.
However, even if we do not know the entire distribution of a random variable, we can form a
confidence interval, as long as we know the variance of the variable. For a random variable, X,
with a standard deviation of σ, the probability that X is n or more standard deviations from µ is
less than or equal to 1/n². This is a result known as Chebyshev's inequality:

P[|X − µ| ≥ nσ] ≤ 1/n²
For a given level of variance, Chebyshev’s inequality places an upper limit on the probability of
a variable being more than a certain distance from its mean. For a given distribution, the actual
probability may be considerably less.
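The bound is easy to tabulate; a short Python sketch:

```python
# Chebyshev's upper bound on P[|X - mu| >= n*sigma], valid for any
# distribution with finite variance.
def chebyshev_bound(n_std: float) -> float:
    """Upper bound on the probability of being n_std or more
    standard deviations from the mean."""
    return 1.0 / n_std**2

for n_std in (1.5, 2, 3):
    print(n_std, chebyshev_bound(n_std))
```

For n = 2 the bound is 25%, even though a normal variable is beyond two standard deviations only about 4.6% of the time, illustrating how loose the bound can be for a specific distribution.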
If an actual loss equals or exceeds the predicted VaR threshold, that event is known as an
exceedance. In the case of backtesting a one-day 95% VaR, there is a 5% chance of an
exceedance event each day, and a 95% chance of no exceedance. Because exceedance
events are independent, over the course of n days the distribution of exceedances follows a
binomial distribution as given below.
P[K = k] = C(n,k) ∙ p^k ∙ (1 − p)^(n−k)

where n is the number of periods that we are using in our backtest, k is the number of
exceedances, p is the probability of an exceedance on any given day, and (1 − p) is our VaR
confidence level.
Example - Sample problem: Consider a daily 95% VaR statistic for a large fixed income
portfolio. Over the past 100 days, there have been four exceedances. How many exceedances
should you have expected? What was the probability of exactly four exceedances during this
time? The probability of four or less? Four or more?
For a 95% VaR measure, over 100 days we would expect to see five exceedances: (1 –
95%) ×100 = 5. The probability of exactly four exceedances is 17.81%:
P[K = 4] = C(100,4) ∙ 0.05^4 ∙ 0.95^96 = 0.1781
4
The probability of four or fewer exceedances is 43.60%. Here we simply do the same
calculation again but for zero, one, two, three, and four exceedances. It’s important not
to forget zero as the case of no exceedance is also possible:
P[K ≤ 4] = Σ (k = 0 to 4) C(100,k) ∙ 0.05^k ∙ 0.95^(100−k) = 0.4360

The probability of four or more exceedances is one minus the probability of three or
fewer:

P[K ≥ 4] = 1 − P[K ≤ 3] = 1 − 25.78% = 74.22%
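The binomial arithmetic can be reproduced in Python with `math.comb`:

```python
from math import comb

# Binomial exceedance probabilities for a daily 95% VaR over 100 days.
n, p = 100, 0.05  # p = probability of an exceedance on any given day

def prob_exactly(k: int) -> float:
    """P[K = k] = C(n,k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_four = prob_exactly(4)
p_four_or_fewer = sum(prob_exactly(k) for k in range(5))
p_four_or_more = 1 - sum(prob_exactly(k) for k in range(4))
print(round(p_four, 4), round(p_four_or_fewer, 4), round(p_four_or_more, 4))
```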
Chapter Summary
The sample mean is given by:
μ̂ = (1/n) ∙ Σᵢ xᵢ
The sample variance is given by:

σ̂² = Σᵢ (xᵢ − μ̂)² / (n − 1)
The standard deviation of the sample mean, if σ² is the true variance, is given by:

σ_μ̂ = σ/√n
A confidence interval gives the probability that a population parameter is contained within a
defined range. A two-tailed null hypothesis takes the form:

H0: µ = 0
H1: µ ≠ 0
A one-tailed test rejects the null only if the estimate deviates significantly in one specified
direction.

H0: µ ≥ c
H1: µ < c
If, on the other hand, we accept a hypothesis that should have been rejected, we have
committed a Type II error.
The standard error is equal to the sample standard deviation divided by the square root of the
sample size:
SE = s/√n
Confidence intervals:
The lower limit of the confidence interval is given by: the sample mean minus the
critical t multiplied by the standard error
The upper limit of the confidence interval is given by: the sample mean plus the
critical t multiplied by the standard error
CI: μ̂ − t ∙ s/√n ≤ µ ≤ μ̂ + t ∙ s/√n
Backtesting is a process used to check the predicted outcome of a model against actual
data and to assess the probability of the losses exceeding the VaR for a given
confidence level:

P[K = k] = C(n,k) ∙ p^k ∙ (1 − p)^(n−k)
313.2. A random sample of 36 observations drawn from a normal population returns a sample
mean of 18.0 with sample variance of 16.0. Our hypothesis is: the population mean is 15.0 with
population variance of 10.0. Which are nearest, respectively, to the test statistics of the sample
mean and sample variance (given the hypothesized values, naturally)?
a) t-stat of 3.0 and chi-square stat of 44.3
b) t-stat of 4.5 and chi-square stat of 56.0
c) t-stat of 6.8 and chi-square stat of 57.6
d) t-stat of 9.1 and chi-square stat of 86.4
314.1. You are given the following sample of annual returns for a portfolio manager: -6.0%,
-3.0%, -2.0%, 0.0%, 1.0%, 2.0%, 4.0%, 5.0%, 7.0%, 10.0%. The sample mean of these ten (n =
10) returns is +1.80%. The sample standard deviation is 4.850%. The sample mean is positive,
but how confident are we that the population mean is positive? (note: this is a simplified version
of Miller's problem 5.2, since it provides the sample mean and standard deviation, but it
nevertheless does require calculations/lookup)
a) t-stat of 1.17 implies one-sided confidence of about 86.5%
b) t-stat of 1.29 implies two-sided confidence of about 88.3%
c) t-stat of 2.43 implies one-sided confidence of about 90.7%
d) t-stat of 3.08 implies two-sided confidence of about 97.4%
314.2. A sample of 25 money market funds shows an average return of 3.0% with standard
deviation also of 3.0%. Your colleague Peter conducted a significance test of the following
alternative hypothesis: the true (population) average return of such funds is GREATER THAN
the risk-free rate (Rf). He concludes that he can reject the null hypothesis with a confidence of
83.64%; i.e., there is a 16.36% chance (p value) that the true return is less than or equal to the
risk-free rate. What is the risk-free rate, Rf? (note: this requires lookup-calculation)
a) 1.00%
b) 1.90%
c) 2.00%
d) 2.40%
315.1. Roger collects a set of 61 daily returns over a calendar quarter for the stock of XYZ
corporation. He computes the sample's daily standard deviation, which is annualized in order to
generate a sample volatility of 27.0%. His null hypothesis is that the true (population) volatility is
30.0%. Can he reject the null with 95% confidence?
a) No, the test statistic is 1.59
b) No, the test statistic is 48.60
c) Yes, the test statistic is 24.03
d) Yes, the test statistic is 72.57
Answers:
313.1. C. 0.58
A Poisson distribution has both mean and variance equal to its only parameter, lambda. In this
case, the variance per month is therefore 4.
The variance of the sample mean = 4/n. In this case, with 12 observations (months), the
variance of the sample mean = 4/12.
The standard error (standard deviation) = SQRT(4/12) = 0.5774
Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-313-millers-
hypothesis-testing.7108
313.2. B. The t-statistic for the sample mean = (18.0 − 15.0)/(4.0/√36) = 3.0/0.667 = 4.5.
With a t-stat of 4.5, we can reject the null hypothesis that the population mean is 15.0
(the two-sided p-value is 0.007% such that we can reject with any confidence of 99.993% or
less).
As the population is normal, the test of the sample variance relies on the chi-square value = (n-
1)*(sample variance/hypothesized variance). In this case, the chi-square statistic = (36-1)*16/10
= 56.00, which follows a chi-square distribution with 35 degrees of freedom. (We could reject
null with 95% confidence but we fail to reject null with 99% confidence).
314.2. D. 2.40%
The one-tailed t-stat that is associated with 16.36% with 24 degrees of freedom is 1.00; e.g.,
T.INV(16.36%, 24) = -1.00 and T.DIST(-1.00, 24 df, true = CDF) = 16.36%.
Standard error (SE) of sample mean = 3.0%/SQRT(25) = 0.60%.
Since t-stat = 1.0, (3.0% - Rf)/0.60% = 1.0, such that Rf = 3.0% - 0.60% = 2.40%.
Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-314-millers-one-and-
two-tailed-hypotheses.7118
Question 1:
Given the following data sample, how confident can we be that the mean is greater than 40?
Answer:
Mean = 45.0; standard deviation = 29.3; standard deviation of the mean = 9.3. For the
hypothesis that the mean is greater than 40, the appropriate t-statistic has a value of 0.54. For a
one-sided t-test with 9 degrees of freedom, the associated probability is 70%. There is a 30%
chance that the true mean is found below 40, and a 70% chance that it is greater than 40.
Question 2:
You are given the following sample of annual returns for a portfolio manager. If you believe that
the distribution of returns has been stable over time and will continue to be stable over time,
how confident should you be that the portfolio manager will continue to produce positive
returns?
Answer:
The mean is 6.9% and the standard deviation of the returns is 23.5%, giving a standard
deviation of the mean of 7.4%. The t-statistic is 0.93. With 9 degrees of freedom, a one-sided t-
test produces a probability of 81%. In other words, even though the sample mean is positive,
there is a 19% chance that the true mean is negative.
Question 3:
You are presented with an investment strategy with a mean return of 20% and a standard
deviation of 10%. What is the probability of a negative return if the returns are normally
distributed? What if the distribution is symmetrical, but otherwise unknown?
Answer:
A negative return would be more than two standard deviations below the mean. For a normal
distribution, the probability (one-tailed) is approximately 2.28%. If we do not know the
distribution, then, by Chebyshev’s inequality, the probability of a negative return could be as
high as 12.5% = 1/2 × 1/(2²). There could be a 25% probability of a +/–2 standard deviation
event, but we’re interested only in the negative tail, so we multiply by ½. We can perform this
last step only because we were told the distribution is symmetrical.
Question 4:
Suppose you invest in a product whose returns follow a uniform distribution between −40% and
+60%. What is the expected return? What is the 95% VaR? The expected shortfall?
Answer:
The expected return is +10%. The 95% VaR is 35% (i.e., 5% of the returns are expected to be
worse than –35%). The expected shortfall is 37.5% (again the negative is implied).
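The uniform-distribution arithmetic can be sketched in Python (closed-form, no simulation needed):

```python
# VaR and expected shortfall for returns uniform on [-40%, +60%].
lo, hi = -0.40, 0.60

expected_return = (lo + hi) / 2          # midpoint of the interval
var_level = 0.05
quantile = lo + var_level * (hi - lo)    # 5th percentile: -35%
var_95 = -quantile                       # quoted as a positive loss: 35%

# Expected shortfall: mean of the tail [lo, quantile], itself uniform.
expected_shortfall = -(lo + quantile) / 2
print(round(expected_return, 2), round(var_95, 2), round(expected_shortfall, 3))
```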
Question 5:
You are the risk manager for a portfolio with a mean daily return of 0.40% and a daily standard
deviation of 2.3%. Assume the returns are normally distributed (not a good assumption to make,
in general). What is the 95% VaR?
Answer:
For a normal distribution, 5% of the weight is less than –1.64 standard deviations from the
mean. The 95% VaR can be found as: 0.40% − 1.64 ∙ 2.30% = −3.38%. Because of our quoting
convention for VaR, the final answer is VaR = 3.38%.
Question 6:
You are told that the log annual returns of a commodities index are normally distributed with a
standard deviation of 40%. You have 33 years of data, from which you calculate the sample
variance. What is the standard deviation of this estimate of the sample variance?
Answer:
We can use Equation 7.4 to calculate the variance of the sample variance. Because we are told
the underlying distribution is normal, the excess kurtosis can be assumed to equal zero and
n = 33; therefore:

E[(σ̂² − σ²)²] = σ⁴ ∙ (2/(n − 1) + κ_excess/n) = 0.40⁴ ∙ 2/32

so the standard deviation of the sample variance estimate is:

σ_σ̂² = 0.40² ∙ √(2/32) = 0.16 ∙ (1/4) = 0.04
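A one-line check in Python:

```python
import math

# Standard deviation of the sample variance estimate for a normal
# sample: sigma^2 * sqrt(2 / (n - 1)), with sigma = 40% and n = 33.
sigma, n = 0.40, 33

std_of_sample_variance = sigma**2 * math.sqrt(2 / (n - 1))
print(round(std_of_sample_variance, 2))  # 0.04
```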
Question 7:
In the previous question, you were told that the actual standard deviation was 40%. If, instead of
40%, the measured standard deviation turns out to be 50%, how confident can you be in the
initial assumption? State a null hypothesis and calculate the corresponding probability.
Answer:
An appropriate null hypothesis would be: H0: σ =40%. The appropriate test statistic is:
(n − 1) ∙ σ̂²/σ² = (33 − 1) ∙ (0.50/0.40)² = 32 ∙ 1.5625 = 50
Using a spreadsheet or other program, we calculate the corresponding probability for a chi-
squared distribution with 32 degrees of freedom. Only 2.23% of the distribution is greater than
50. At a 95% confidence level, we would reject the null hypothesis.
Question 8:
A hedge fund targets a mean annual return of 15% with a 10% standard deviation. Last year,
the fund returned –5%. What is the probability of a result this bad or worse happening, given the
target mean and standard deviation? Assume the distribution is symmetrical.
Answer:
P[|X − µ| ≥ nσ] ≤ 1/n²

P[|X − 15%| ≥ 2 ∙ 10%] ≤ 1/2² = 25%
Because the distribution of returns is symmetrical, half of these extreme returns are greater than
+2 standard deviations, and half are less than –2 standard deviations. This leads to the final
result, 12.5%.
Question 9:
A fund of funds has investments in 36 hedge funds. At the end of the year, the mean return of
the constituent hedge funds was 18%. The standard deviation of the funds’ returns was 12%.
The benchmark return for the fund of funds was 14%. Is the difference between the average
return and the benchmark return statistically significant at the 95% confidence level?
Answer:
σ_μ̂ = σ/√n = 12%/√36 = 2%
This makes the difference between the average fund return and the benchmark, 18% – 14%
=4%, a +2 standard deviation event. For a t distribution with 35 degrees of freedom, the
probability of being more than +2 standard deviations is just 2.67%. We can reject the null
hypothesis, H0: µ =14%, at the 95% confidence level. The difference between the average
return and the benchmark return is statistically significant.
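A sketch of this test in Python. Since the pure standard library has no Student's t CDF, the tail probability is estimated by numerically integrating the t density with Simpson's rule (the helper names are my own; a stats library would do equally well):

```python
import math

n, sample_mean, sample_std, benchmark = 36, 0.18, 0.12, 0.14

std_err = sample_std / math.sqrt(n)            # 2%
t_stat = (sample_mean - benchmark) / std_err   # +2 standard deviations

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_tail_prob(t, df, upper=60.0, steps=20000):
    """P(T > t) by composite Simpson's rule; the density is negligible beyond `upper`."""
    h = (upper - t) / steps
    total = t_pdf(t, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(t + i * h, df)
    return total * h / 3

p_value = t_tail_prob(t_stat, n - 1)   # about 0.0267, matching the 2.67% above
```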
Question 10:
The probability density function for daily profits at Box Asset Management can be described by the following function (see Exhibit 7.5):

f(π) = 1/200,  −100 ≤ π ≤ 100

What is the one-day 95% VaR for Box Asset Management?
Answer:
∫[−100, v] f(π) dπ = 0.05

Solving, we have:

∫[−100, v] (1/200) dπ = (v + 100)/200 = 0.05  ⇒  v = −90
The VaR is a loss of 90. Alternatively, we could have used geometric arguments to arrive at the
same conclusion. In this problem, the PDF describes a rectangle whose base is 200 units and
whose height is 1/200. As required, the total area under the PDF, base multiplied by height, is
equal to one. The leftmost fraction of the rectangle, from –100 to –90, is also a rectangle, with a
base of 10 units and the same height, giving an area of 1/20, or 5% of the total area. The edge
of this area is our VaR, as previously found by integration.
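The geometric argument above reduces to one line of arithmetic, sketched here in Python (variable names are my own):

```python
# Uniform density f = 1/200 on [-100, 100]: the 5th percentile v satisfies
# (v - lower) / (upper - lower) = alpha, i.e. (v + 100)/200 = 0.05.
alpha = 0.05
lower, upper = -100.0, 100.0

v = lower + alpha * (upper - lower)   # -90: the 5th percentile of profits
var_95 = -v                           # VaR reported as a positive loss of 90
```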
Question 11:
Continuing with our example of Box Asset Management, find the expected shortfall, using the
same PDF and the calculated VaR from the previous question.
Answer:
In the previous question we found that the VaR, v, was equal to –90. To find the expected
shortfall, we need to solve the following equation:
ES = (1/0.05) · ∫[−100, v] π·f(π) dπ

Solving, we find:

ES = (1/0.05) · ∫[−100, −90] (π/200) dπ = (1/0.05)·(1/200)·[π²/2] from −100 to −90 = (1/20)·((−90)² − (−100)²) = −95
The final answer, a loss of 95 for the expected shortfall, makes sense. The PDF in this problem
is a uniform distribution, with a minimum at –100. Because it is a uniform distribution, all losses
between the (negative) VaR, –90, and the minimum, –100, are equally likely; therefore, the
average loss, given a VaR exceedance, is halfway between –90 and –100.
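Both routes to the answer, the integral and the midpoint argument, can be sketched in Python (variable names are my own):

```python
# Expected shortfall for the uniform density f = 1/200 on [-100, 100]:
# ES = (1/alpha) * integral of pi/200 from -100 to the VaR cutoff v = -90.
alpha, lower, v = 0.05, -100.0, -90.0

# Closed-form integral: (1/alpha) * (1/200) * (v^2 - lower^2) / 2
es = (v**2 - lower**2) / (2 * 200 * alpha)   # -95
midpoint = (lower + v) / 2                   # same answer: tail is uniform
```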
Question 12:
The probability density function for daily profits at Pyramid Asset Management can be described
by the following functions (see Exhibit 7.6):
f(π) = 3/80 + π/400,  −15 ≤ π ≤ 5

f(π) = 5/80 − π/400,  5 < π ≤ 25
The density function is zero for all other values of π. What is the one-day 95% VaR for Pyramid
Asset Management?
Answer:
∫[−15, v] f(π) dπ = 0.05
By inspection, half the distribution is below 5, so we need only bother with the first half of the
function:
∫[−15, v] (3/80 + π/400) dπ = (3/80)·(v + 15) + (1/800)·(v² − 225) = 0.05

v² + 30v + 185 = 0
Solving the quadratic gives v = (−30 ± √(30² − 4·185))/2 = −15 ± 2√10. Because the distribution is not defined for π < −15, we can ignore the root −15 − 2√10, giving us the final answer, v = −15 + 2√10 ≈ −8.68.
The one-day 95% VaR for Pyramid Asset Management is approximately 8.68.
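The final step, solving the quadratic and discarding the root outside the support of the density, can be sketched in Python (variable names are my own):

```python
import math

# Solve v**2 + 30*v + 185 = 0 and keep the root with pi >= -15,
# since the density is zero below the left edge of its support.
a, b, c = 1.0, 30.0, 185.0

disc = b * b - 4 * a * c                                       # 160
roots = [(-b + s * math.sqrt(disc)) / (2 * a) for s in (1.0, -1.0)]
v = max(roots)                        # -15 + 2*sqrt(10), about -8.68
var_95 = -v                           # VaR as a positive loss, ~8.68
```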