
P1.T2. Quantitative Analysis

Miller, Mathematics & Statistics for Financial Risk Management

Bionic Turtle FRM Study Notes


By David Harper, CFA FRM CIPM and Deepa Raju
www.bionicturtle.com

Miller, Chapter 2: Probabilities

Describe and distinguish between continuous and discrete random variables.
Define and distinguish between the probability density function, the cumulative distribution function (CDF) and the inverse CDF, and calculate probabilities based on each of these functions.
Calculate the probability of an event given a discrete probability function.
Distinguish between independent and mutually exclusive events.
Define joint probability, describe a probability matrix and calculate joint probabilities using probability matrices.
Define and calculate a conditional probability, and distinguish between conditional and unconditional probabilities.
Chapter Summary
Questions & Answers

Miller, Chapter 3: Basic Statistics

Interpret the mean, standard deviation, and variance of a random variable.
Calculate the mean, standard deviation, and variance of a discrete random variable.
Interpret and calculate the expected value of a discrete random variable.
Calculate and interpret the covariance and correlation between two random variables.
Calculate the mean and variance of sums of larger variables.
Describe the four central moments of a statistical variable or distribution: mean, variance, skewness and kurtosis.
Interpret the skewness and kurtosis of a statistical distribution, and interpret the concepts of coskewness and cokurtosis.
Describe and interpret the best linear unbiased estimator (BLUE).
Chapter Summary
Questions & Answers

Miller, Chapter 4: Distributions

Describe the key properties of the uniform distribution, Bernoulli distribution, binomial distribution, Poisson distribution, normal distribution, lognormal distribution, chi-squared distribution, student’s t and F-distributions, and identify common occurrences of each distribution.
Additional distributions: not in syllabus but occasionally relevant
Describe the central limit theorem and the implications it has when combining i.i.d. random variables.
Describe independent and identically distributed (i.i.d.) random variables and the implications of the i.i.d. assumption when combining random variables.
Describe a mixture distribution and explain the creation and characteristics of mixture distributions.
Chapter Summary
Questions & Answers

Miller, Chapter 6 (pp. 113-124 only): Bayesian Analysis

Describe Bayes’ theorem and apply …
Compare the Bayesian approach to the frequentist approach.
Apply Bayes’ theorem to scenarios with more than two possible outcomes.

Miller, Chapter 7: Hypothesis Testing and Confidence Intervals

Calculate and interpret the sample mean and sample variance.
Define and construct a confidence interval.
Define and construct an appropriate null and alternative hypothesis, and calculate an appropriate test statistic.
Differentiate between a one-tailed and a two-tailed test and explain the circumstances in which to use each test.
Interpret the results of hypothesis tests with a specific level of confidence.
Demonstrate the process of backtesting VaR by calculating the number of exceedances.
Chapter Summary
Questions & Answers


Miller, Chapter 2: Probabilities


Describe and distinguish between continuous and discrete random variables.

Define and distinguish between the probability density function, the cumulative
distribution function and the inverse cumulative distribution function, and calculate
probabilities based on each of these functions.

Calculate the probability of an event given a discrete probability function.

Distinguish between independent and mutually exclusive events.

Define joint probability, describe a probability matrix and calculate joint probabilities
using probability matrices.

Define and calculate a conditional probability, and distinguish between conditional and
unconditional probabilities.

Selected key terms:

 Statistical or random experiment: An observation or measurement process with multiple but uncertain outcomes.
 Population or sample space: Set of all possible outcomes of an experiment.
 Sample point: Each member or outcome of the sample space.
 Outcome: The result of a single trial. For example, if we roll two dice, an outcome might
be a three (3) and a four (4); a different outcome might be a (5) and a (2).
 Event: The result that reflects none, one, or more outcomes in the sample space.
Events can be simple or compound. An event is a subset of the sample space. If we roll
two dice, an example of an event might be rolling a seven (7) in total.
 Random variable (or stochastic variable): A stochastic or random variable (r.v.) is a
“variable whose value is determined by the outcome of an experiment”.
 Discrete random variable: A random variable (r.v.) that can take a finite number of
values (or countably infinite). For example, coin, six-sided die, bond default (yes or no).
 Continuous random variable: A random variable (r.v.) that can take any value in some
interval; e.g., asset returns, time.
 Mutually exclusive events: Events which cannot simultaneously occur. If A and B are
mutually exclusive, the probability of (A and B) is zero. Put another way, their
intersection is the null set.
 Collectively exhaustive events (a.k.a., cumulatively exhaustive): Events that
cumulatively describe all possible outcomes.


Describe and distinguish between continuous and discrete random variables.
We characterize (describe) a random variable with a probability distribution. The random
variable can be discrete or continuous; in either the discrete or continuous case, the probability
can be local (pmf or pdf) or cumulative (CDF).

A random variable’s value is determined by the outcome of an experiment (a.k.a., stochastic variable). “A random variable is a numerical summary of a random outcome. The
number of times your computer crashes while you are writing a term paper is random and takes
on a numerical value, so it is a random variable.”—Stock & Watson

Continuous random variable

A continuous random variable (X) can take on an infinite number of values within an interval:

P(a < X < b) = ∫[a to b] f(x) dx


Discrete random variable

A discrete random variable (X) assumes a value among a finite set including x1, x2, x3 and so
on. The probability function is expressed by:

P(X = xi) = f(xi)

Continuous versus discrete random variables

 Discrete random variables can be counted. Continuous random variables must be measured.
 Examples of a discrete random variable include:
o Coin toss (head or tails, nothing in between) or roll of the dice (1, 2, 3, 4, 5, 6)
o Toggle: “did the fund beat the benchmark?” (yes, no).
o In risk, common discrete random variables are default/no default (0/1) and loss
frequency.
 Examples of a continuous random variable include distance and time.
o A common example of a continuous variable in risk is loss severity.
 Note the similarity between the summation (∑) under the discrete variable and the
integral (∫) under the continuous variable.
o The summation (∑) of all discrete outcomes must equal one.
o Similarly, the integral (∫) captures the area under the continuous distribution
function. The total area “under this curve,” from (-∞) to (∞), must equal one.
 All four of the so-called sampling distributions—each of which converges to the
normal—are continuous:
 normal, student’s t, chi-square, and F distribution.
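
As a quick check of the summation-versus-integral point above, here is a minimal Python sketch (assuming NumPy and SciPy are available) confirming that a fair die's pmf sums to one and that the standard normal pdf integrates to one:

    import numpy as np
    from scipy import integrate, stats

    # Discrete: the pmf of a fair six-sided die must sum to one
    die_pmf = {x: 1/6 for x in range(1, 7)}
    print(sum(die_pmf.values()))                           # 1.0

    # Continuous: the standard normal pdf must integrate to one
    area, _ = integrate.quad(stats.norm.pdf, -np.inf, np.inf)
    print(round(area, 6))                                  # 1.0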


Comparison:
  Continuous: are measured; infinite values
  Discrete: are counted; finite values

Applications in Finance:
  Continuous (e.g.): distance, time, severity of loss, asset returns
  Discrete (e.g.): default (1/0), frequency of loss

For example:
  Continuous: normal, student’s t, chi-square, F distribution, lognormal, exponential, gamma, beta, EVT distributions (GPD, GEV)
  Discrete: Bernoulli (0/1), binomial (series of i.i.d. Bernoullis), Poisson, logarithmic

The four sampling distributions (normal, student’s t, chi-square, and F) are all continuous; e.g., the student’s t is used to test the sample mean.

Miller on discrete versus continuous:

“… we will be working with both discrete and continuous random variables. Discrete random
variables can take on only a countable number of values—for example, a coin, which can only
be heads or tails, or a bond, which can only have one of several letter ratings (AAA, AA, A,
BBB, etc.).

… In contrast to a discrete random variable, a continuous random variable can take on any
value within a given range. A good example of a continuous random variable is the return of a
stock index. If the level of the index can be any real number between zero and infinity, then the
return of the index can be any real number greater than −1.
Even if the range that the continuous variable occupies is finite, the number of values that it can
take is infinite. For this reason, for a continuous variable, the probability of any specific value
occurring is zero.

Even though we cannot talk about the probability of a specific value occurring, we can talk about
the probability of a variable being within a certain range. Take, for example, the return on a
stock market index over the next year. We can talk about the probability of the index return
being between 6% and 7%, but talking about the probability of the return being exactly 6.001%
is meaningless. Even between 6.0% and 7.0% there are an infinite number of possible values.
The probability of any one of those infinite values occurring is zero.”


Define and distinguish between the probability density function, the cumulative distribution function (CDF) and the inverse CDF.
Probability density functions (pdf)

The probability density function answers a “local” question. If the random variable is discrete, the pdf (a.k.a., probability mass function, pmf) is the probability the variable will assume an exact value of x; i.e., f(x) = P(X = x).

If the random variable is continuous, the pdf tells us the likelihood of outcomes occurring on an interval between any two points. Given our continuous random variable, X, with a probability p of being between r1 and r2, we can define our pdf, f(x), such that:

∫[r1 to r2] f(x) dx = p

The continuous and discrete pdf functions, respectively, are:

P(r1 < X < r2) = ∫[r1 to r2] f(x) dx and f(x) = P(X = x)

Cumulative distribution functions (CDF)

The cumulative distribution function (CDF) associates with either a pmf or a pdf (i.e., the CDF can apply to either a discrete or a continuous random variable). The CDF gives the probability the random variable will be less than, or equal to, some value: F(x) = P(X ≤ x).

The continuous and discrete cumulative distribution functions, respectively, are:

F(a) = ∫[-∞ to a] f(x) dx = P(X ≤ a) and F(x) = P(X ≤ x)


Inverse cumulative distribution function

If F(x) is a cumulative distribution function, then we define F−1(p), the inverse cumulative
distribution, as follows:

F(x) = p ⇔ F⁻¹(p) = x, s.t. 0 ≤ p ≤ 1

We can examine the inverse cumulative distribution function by applying it to the standard
normal distribution, N(0,1). For example, F-1(95%) = 1.645 because at +1.645 standard
deviations to the right of mean in a standard normal distribution, 95% of the area is to the left
(under the curve).

The inverse cumulative distribution function (CDF) is also called the quantile function;
see http://en.wikipedia.org/wiki/Quantile_function. We can now see why Dowd says that
“VaR is just a quantile [function]”. For example, the 95% VaR is the inverse CDF for p = 5% or p
= 95%; for the standard normal, that is = NORM.S.INV(5%) = -1.645.

Here are common inverse CDFs for the standard normal distribution:

p          F⁻¹(p)
1.0%       -2.326
5.0%       -1.645
50.0%       0.000
84.13%      1.000
90.0%       1.282
95.0%       1.645  ← must know
97.72%      2.000
99.0%       2.326  ← must know
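
A short Python sketch (using SciPy's norm, the library analog of Excel's NORM.S.INV) reproduces the quantiles in this table; the probabilities chosen are just the table's rows:

    from scipy.stats import norm

    # Inverse CDF (quantile function) of the standard normal, F^-1(p)
    for p in [0.01, 0.05, 0.50, 0.8413, 0.90, 0.95, 0.9772, 0.99]:
        print(f"F^-1({p:.4f}) = {norm.ppf(p):+.3f}")

    # The VaR quantile discussed above: NORM.S.INV(5%) = -1.645
    print(norm.ppf(0.05))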

Univariate versus multivariate probability density functions

A single variable (univariate) probability distribution is concerned with only a single random
variable; e.g., roll of a die, default of a single obligor.

A multivariate probability density function concerns the outcome of an experiment with more
than one random variable. This includes the simplest case of two variables (i.e., a bivariate
distribution).

              Density                       Cumulative
Univariate    f(x) = P(X = x)               F(x) = P(X ≤ x)
Bivariate     f(x,y) = P(X = x, Y = y)      F(x,y) = P(X ≤ x, Y ≤ y)


Calculate the probability of an event given a discrete probability function.
Probability: Classical or “a priori” definition

The probability of outcome (A) is given by:

P(A) = (Number of outcomes favorable to A) / (Total number of outcomes)

For example, consider a craps roll of two six-sided dice. What is the probability of rolling a
seven? i.e., P[X=7]. There are six outcomes that generate a roll of seven: 1+6, 2+5, 3+4, 4+3,
5+2, and 6+1. Further, there are 36 total outcomes. Therefore, the probability is 6/36 =1/6.

In this case, the outcomes need to be mutually exclusive, equally likely, and “cumulatively
exhaustive” (i.e., all possible outcomes included in total). A key property of a probability is that
the sum of the probabilities for all (discrete) outcomes is 1.0.

Probability: Relative frequency or empirical definition

Relative frequency is based on an actual number of historical observations (or Monte Carlo
simulations). For example, here is a simulation (produced in Excel) of one hundred (100) rolls of
a single six-sided die:

Empirical Distribution
Roll Freq. %
1 11 11%
2 17 17%
3 18 18%
4 21 21%
5 18 18%
6 15 15%
Total 100 100%

Note the difference between an a priori probability and an empirical probability:


 The a priori (classical) probability of rolling a three (3) is 1/6.
 But the empirical frequency, based on this sample, is 18%. If we generate another
sample, we will produce a different empirical frequency.

This relates also to sampling variation. The a priori probability is based on population properties;
in this case, the a priori probability of rolling any number is clearly 1/6th. However, a sample of
100 trials will exhibit sampling variation: the number of threes (3s) rolled above varies from the
parametric probability of 1/6th. We do not expect the sample to produce 1/6th perfectly for each
outcome.
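
The simulation is easy to replicate; here is a minimal Python sketch (the seed is arbitrary, and the frequencies will differ from the table above because it is a different sample):

    import numpy as np

    rng = np.random.default_rng(seed=42)    # arbitrary seed for reproducibility
    rolls = rng.integers(1, 7, size=100)    # 100 rolls of a fair six-sided die

    # Compare each empirical frequency to the a priori probability of 1/6
    for face in range(1, 7):
        freq = np.mean(rolls == face)
        print(f"Face {face}: empirical {freq:.0%} vs. a priori {1/6:.1%}")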


Distinguish between independent and mutually exclusive events.


Mutually exclusive events (One random variable and two mutually exclusive events)

For a given random variable, the probability of any of two mutually exclusive events occurring is
just the sum of their individual probabilities. In statistical notation, we can write:

P(A ∪ B) = P(A) + P(B), if mutually exclusive

where A ∪ B is the union of A and B; i.e., the probability of either A or B occurring.

This equality is true only for mutually exclusive events. This property of mutually exclusive
events can be extended to any number of events. The probability that any of n mutually
exclusive events occurs is the sum of the probabilities of (each of) those n events.

Independent events (More than one random variable)

The random variables X and Y are independent if the conditional distribution of Y given X equals
the marginal distribution of Y. Since independence implies P (Y=y | X=x) = P(Y=y):

P(Y = y | X = x) = P(X = x, Y = y) / P(X = x)

Statistical independence is when the value taken by one variable has no effect on the value
taken by the other variable. If the variables are independent, their joint probability will equal
the product of their marginal probabilities. If they are not independent, they are dependent.

Thus, the most useful test of statistical independence is given by:

P(X = x, Y = y) = P(X = x) × P(Y = y)

That is, random variables X and Y are independent if their joint distribution is equal to the
product of their marginal distributions. For example, when rolling two dice, the outcome of the
second one will be independent of the first. This independence implies that the probability of
rolling double-sixes is equal to the product of P(rolling one six) and P(rolling one six). So, if the
two dice are independent, then:

P (first roll = 6, second roll = 6) = P (rolling a six) × P (rolling a six) = (1/6) × (1/6) = 1/36
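
A quick simulation (assuming NumPy) shows the joint frequency of double-sixes converging to the product of the marginal frequencies, i.e., to 1/36:

    import numpy as np

    rng = np.random.default_rng(7)          # arbitrary seed
    n = 1_000_000
    die1 = rng.integers(1, 7, size=n)
    die2 = rng.integers(1, 7, size=n)

    # Joint probability of double sixes vs. the product of the marginals
    joint = np.mean((die1 == 6) & (die2 == 6))
    product = np.mean(die1 == 6) * np.mean(die2 == 6)
    print(joint, product, 1/36)             # all approximately 0.0278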

Define joint probability, describe a probability matrix and calculate joint probabilities using probability matrices.
The joint probability is the probability that the random variables (in this case, two random
variables) take on certain values simultaneously. Thus, we refer to the probability of two events
occurring together as their joint probability.

“The joint probability distribution of two discrete random variables, say X and Y, is the probability
that the random variables simultaneously take on certain values, say x and y. The probabilities
of all possible (x, y) combinations sum to 1. The joint probability distribution can be written as
the function P (X = x, Y = y).” — Stock & Watson


When dealing with the joint probabilities of two variables, it is convenient to summarize the
various probabilities in a probability matrix (a.k.a., probability table).

Example: (Consider this same example for calculating joint, unconditional and conditional
probabilities.) In Miller’s example, we assume a company that issues both bonds and stock.
The bonds can either be downgraded, be upgraded, or have no change in rating. The stock can
either outperform the market or underperform the market.

In a probability matrix, the interior cells represent the joint probabilities, while the exterior cells hold the marginal probabilities. Using Miller’s matrix, the calculation of joint probabilities proceeds as follows.

 For example, the joint probability of both the company's stock outperforming the market
and the bonds being upgraded is 15%.
 Similarly, the joint probability of the stock underperforming the market and the bonds
having no change in rating is 25%.
 Importantly, all of the joint probabilities add to 100%. Given all the possible events, one
of them must happen.
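
A minimal sketch of such a probability matrix in Python follows. Only the cells cited above (15% and 25%) and the marginals (50% equity outperform, 25% bond downgrade) come from the text; the remaining interior cells are illustrative assumptions chosen to be consistent with those values, not necessarily Miller's:

    import numpy as np

    # Rows: bond upgrade / no change / downgrade
    # Columns: equity outperforms / underperforms
    joint = np.array([
        [0.15, 0.05],   # upgrade (15% joint with outperform is given)
        [0.30, 0.25],   # no change (25% joint with underperform is given)
        [0.05, 0.20],   # downgrade
    ])

    print(joint.sum())         # 1.0: the joint probabilities cover all outcomes
    print(joint.sum(axis=1))   # marginal bond probabilities: [0.20, 0.55, 0.25]
    print(joint.sum(axis=0))   # marginal equity probabilities: [0.50, 0.50]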


We can also derive the joint probability when conditional probabilities (refer to the next section) are given:

Joint probability = Conditional probability × Marginal Probability

 For example, the joint probability of both the company's stock outperforming the market and the bonds being upgraded can be calculated as the conditional probability of the bonds being upgraded, given that the stock outperforms, multiplied by the marginal (unconditional) probability of the stock outperforming: 30.0% × 50.0% = 15.0%.

Define and calculate a conditional probability, and distinguish between conditional and unconditional probabilities.
Marginal (a.k.a., unconditional) probability functions: A marginal probability is the simple
case; it is the probability that does not depend on a prior event or prior information. The
marginal probability is also called the unconditional probability.

The marginal (unconditional) probability is: P(Y = y) = Σx P(X = x, Y = y)

“The marginal probability distribution of a random variable Y is just another name for its
probability distribution. This term distinguishes the distribution of Y alone (marginal distribution)
from the joint distribution of Y and another random variable. The marginal distribution of Y can
be computed from the joint distribution of X and Y by adding up the probabilities of all possible
outcomes for which Y takes on a specified value”— Stock & Watson


We can get the unconditional (a.k.a., marginal) probabilities by adding across a row or down a column of the probability matrix (the exterior cells).
 For example, the probability of the bonds being downgraded, irrespective of the stock's performance, is 25%.
 Similarly, the probability of the equity outperforming the market is 50%.
Conditional probability function: The conditional probability is the probability of an outcome
given (or conditional on) another outcome.

Conditional probability = Joint Probability ÷ Marginal Probability

P(Y = y | X = x) = P(X = x, Y = y) / P(X = x)

“The distribution of a random variable Y conditional on another random variable X taking on a


specific value is called the conditional distribution of Y given X. The conditional probability that Y
takes on the value y when X takes on the value x is written: P (Y = y | X = x).” –S&W

What is the probability of B occurring, given that A has already occurred?

P(B | A) = P(A ∩ B) / P(A)  ⇒  P(A) × P(B | A) = P(A ∩ B)

Again, using the same probability matrix, conditional probability can be calculated.

 For example, the conditional probability of a bond upgrade given stocks are
outperforming is the joint probability of a bond upgrade and stock outperformance
divided by the unconditional or marginal probability of stock outperformance =
15.0%/50.0%= 30.0%.
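
Continuing the sketch above (same hypothetical matrix), the conditional probability is just a joint cell divided by a marginal total:

    # P(bond upgraded | equity outperforms) = joint / marginal
    p_upgrade_and_outperform = 0.15
    p_outperform = 0.50
    print(p_upgrade_and_outperform / p_outperform)   # 0.30, i.e., 30%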


Conditional and unconditional expectation: The conditional concept extends to expectations also:
 An unconditional expectation is the expected value of the variable without any
restrictions (or lacking any prior information)
 A conditional expectation is an expected value for the variable conditional on prior
information or some restriction (e.g., the value of a correlated variable). The conditional
expectation of Y, conditional on X = x, is given by E(Y | X= x)
Also note:
 We can also refer to a conditional variance of Y; conditional on X = x, it is given by variance(Y | X = x).
 The two-variable regression is an important conditional expectation. In this case, we say the expected Y is conditional on X: E(Y | X) = β0 + β1X

Miller: “The concept of independence is closely related to the concept of conditional probability.
Rather than trying to determine the probability of the market being up and having rain, we can
ask, ‘What is the probability that the stock market is up given that it is raining?’ We can write this
as a conditional probability: P [ market up | rain].

The vertical bar tells us the probability of the first argument is conditional on the second. We
read this as “The probability of ‘market up’ given ‘rain’ is equal to p.” If the weather and the
stock market are independent, then the probability of the market being up on a rainy day is the
same as the probability of the market being up on a sunny day.

If the weather somehow affects the stock market, however, then the conditional probabilities
might not be equal. We could have a situation where: P[market up | rain] ≠ P[market up | no
rain] … In this case, the weather and the stock market are no longer independent. We can no
longer multiply their probabilities together to get their joint probability.”


Chapter Summary
Essential distribution perspectives (terms):
 A random variable is described with a probability distribution. A continuous random
variable (X) has an infinite number of values within an interval:

P(a < X < b) = ∫[a to b] f(x) dx

 A discrete random variable (X) assumes a value among a finite set including x1, x2, x3
and so on. The probability function is expressed by:
P(X = xi) = f(xi)
 The probability density function (pdf) answers a “local” question: If the random
variable is discrete, the pdf is the probability the variable will assume an exact value of x.
If the variable is continuous, the pdf tells the likelihood of outcomes occurring on an
interval between any two points.
 The cumulative distribution function gives the probability the random variable will be
less than, or equal to, some value. If F(x) is a cumulative distribution function, then
F−1(p), the inverse cumulative distribution, is defined as:
F(x) = p ⇔ F⁻¹(p) = x, s.t. 0 ≤ p ≤ 1
 A single variable (univariate) probability distribution is concerned with only a single
random variable. A multivariate probability density function concerns the outcome of
an experiment with more than one random variable.

Classical or “a priori” probability: The probability of outcome (A) is given by:

P(A) = (Number of outcomes favorable to A) / (Total number of outcomes)

Relative frequency or empirical definition of probability is based on an actual number of historical observations.

Essential probability terms:


 For a given random variable, the probability of any of two mutually exclusive events
occurring is just the sum of their individual probabilities.
 Random variables X and Y are independent if their joint distribution is equal to the
product of their marginal distributions.
 The joint probability distribution of two discrete random variables is the probability
that the random variables simultaneously take on certain values. The probabilities of all
possible combinations sum to 1.
 A marginal (or unconditional) probability is the probability that does not depend on a
prior event or prior information.
 The conditional probability is the probability of an outcome given (or conditional on)
another outcome.


 Conditional probability = Joint Probability ÷ Marginal Probability

P(B | A) = P(A ∩ B) / P(A)  ⇒  P(A) × P(B | A) = P(A ∩ B)
 An unconditional expectation is the expected value of the variable without any
restrictions (or lacking any prior information).
 A conditional expectation is an expected value for the variable conditional on prior
information or some restriction.


Questions & Answers:


300.1. Assume the probability density function (pdf) of a zero-coupon bond with a notional value
of $10.00 is given by f(x) = x/8 - 0.75 on the domain [6,10] where x is the price of the bond:

f(x) = x/8 − 0.75, s.t. 6 ≤ x ≤ 10, where x = bond price
What is the probability that the price of the bond is between $8.00 and $9.00?

a) 25.750%
b) 28.300%
c) 31.250%
d) 44.667%

300.2. Assume the probability density function (pdf) of the final value (at maturity) of a zero-coupon bond with a notional value of $5.00 is given by f(x) = (3/125)*x^2 on the domain [0,5] where x is the price of the bond:

f(x) = (3/125)x², s.t. 0 ≤ x ≤ 5, where x = bond price

Although the mean of this distribution is $3.75, assume the expected final payoff is a return of the full par of $5.00. If we apply the inverse cumulative distribution function and find the price of the bond (i.e., the value of x) such that 5.0% of the distribution is less than or equal to (x), let this price be represented by q(0.05); in other words, a 5% quantile function. If the 95.0% VaR is given by -[q(0.05) - 5] or [5 - q(0.05)], which is nearest to this 95.0% VaR?

a) $1.379
b) $2.842
c) $2.704
d) $3.158

301.1. A random variable is given by the discrete probability function f(x) = P[X = x(i)] = a*X^3
such that x(i) is a member of {1, 2, 3} and (a) is a constant. That is, X has only three discrete
outcomes. What is the probability that X will be greater than its mean? (bonus: what is the
distribution's variance?)

f(x) = P[X = x(i)] = a·x³, x(i) ∈ {1, 2, 3}

a) 45.8%
b) 50.0%
c) 62.3%
d) 75.0%


301.2. A credit asset has a principal value of $6.0 with probability of default (PD) of 3.0% and a
loss given default (LGD) characterized by the following continuous probability density function
(pdf): f(x) = x/18 such that 0 ≤ x ≤ $6. Let expected loss (EL) = E[PD*LGD]. If PD and LGD are
independent, what is the asset's expected loss? (note: why does independence matter?)

f(x) = x/18, s.t. 0 ≤ x ≤ 6

a) $0.120
b) $0.282
c) $0.606
d) $1.125

302.1. There is a prior (unconditional) probability of 20.0% that the Fed will initiate Quantitative Easing 4 (QE 4). If the Fed announces QE 4, then Macro Hedge Fund will outperform the market with a 70% probability. If the Fed does not announce QE 4, there is only a 40% probability that Macro will outperform (and a 60% probability that Macro will underperform; like the Fed's announcement, there are only two outcomes). If we observe that Macro outperforms the market, which is nearest to the posterior probability that the Fed announced QE 4?
a) 20.0%
b) 27.9%
c) 30.4%
d) 41.6%


Answers:

300.1. C. 31.250%

The anti-derivative is F(x) = x^2/16 - 0.75*x + c.

We can confirm it is a probability by evaluating it on the domain [x = 6, x = 10]:

= [10^2/16 - 0.75*10] - [6^2/16 - 0.75*6] = -1.25 - (-2.25) = 1.0

(Question to ponder: how can this be a CDF if, as a function, it does not appear to start at zero and end at 1.0?)

Probability [8 <= x <= 9] = [9^2/16 - 0.75*9] - [8^2/16 - 0.75*8]
= -1.68750 - (-2.000) = 0.31250 = 31.250%

Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-300-probability-functions-miller.6728/

300.2. D. $3.158

As f(x) = 3/125*x^2, F(x) = 3/125*(1/3)*x^3 = p, such that:


p = F(x) = (3/125)*(1/3)*x^3 = x^3/125, solving for x:
x = (125*p)^(1/3) = 5*p^(1/3). For p = 5%, x = 5*5%^(1/3) = $1.8420.
As q(0.05) = $1.8420, 95% VaR = $5.00 - $1.8420 = $3.1580
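
As a numeric cross-check (assuming SciPy), we can verify the density and recover the same quantile by inverting the CDF numerically:

    from scipy import integrate, optimize

    pdf = lambda x: (3 / 125) * x**2

    # The pdf integrates to one over [0, 5], so it is a valid density
    print(integrate.quad(pdf, 0, 5)[0])              # 1.0

    # Solve F(x) = x^3/125 = 0.05 for the 5% quantile, then the 95% VaR
    q05 = optimize.brentq(lambda x: x**3 / 125 - 0.05, 0, 5)
    print(q05, 5 - q05)                              # 1.8420..., 3.1580...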

Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-300-probability-functions-miller.6728

301.1. D. 75.0%

Because it is a probability function, a*1^3 + a*2^3 + a*3^3 = 1.0; i.e., 1a + 8a + 27a = 1.0, such
that a = 1/36.
Mean = 1*(1/36) + 2*(8/36) + 3*(27/36) = 2.722.
The P[X > 2.722] = P[X = 3] = (1/36)*3^3 = 27/36 = 75.0%
Bonus: Variance = (1 -2.722)^2*(1/36) + (2 -2.722)^2*(8/36) + (3 -2.722)^2*(27/36) = 0.2562,
with standard deviation = SQRT(0.2562) = 0.506135

Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-301-millers-probability-matrix.6757


301.2. A. $0.120

If PD and LGD are not independent, then E[PD*LGD] ≠ E[PD]*E[LGD]; for example, if they are positively correlated, then E[PD*LGD] > E[PD]*E[LGD].

For the E[LGD], we integrate the pdf: if f(x) = x/18 s.t. 0 ≤ x ≤ $6, then F(x) = (1/18)*(1/2)*x^2 = x^2/36 (note this satisfies the definition of a probability over the domain [0, 6] as 6^2/36 = 1.0).

The mean of f(x) integrates x*f(x), where x*f(x) = x*x/18 = x^2/18, which integrates to (1/18)*(x^3/3) = x^3/54, so E[LGD] = 6^3/54 = $4.00.

Therefore, the expected loss = E[PD * LGD] = 3.0% * $4.00 = $0.120.

Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-301-millers-probability-matrix.6757

302.1. C. 30.4%

Per Bayes, P(QE 4 | Macro Outperforms) = Joint Prob (QE 4, Outperforms) / Unconditional Prob
(Outperforms) = (20%*70%)/(20%*70% + 80%*40%) = 14%/46% = 30.435%

Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-302-bayes-theorem-miller.6767


End of Chapter Questions & Answers


Question 1:

You are invested in two hedge funds. The probability that hedge fund Alpha generates positive
returns in any given year is 60%. The probability that hedge fund Omega generates positive
returns in any given year is 70%. Assume the returns are independent. What is the probability
that both funds generate positive returns in a given year? What is the probability that both funds
lose money?

Answer:

Probability that both generate positive returns = 60% × 70% = 42%.


Probability that both funds lose money = (1 – 60%) × (1 – 70%) = 40% × 30% = 12%.

Question 2:

Corporation ABC issues $100 million of bonds. The bonds are rated BBB. The probability that
the rating on the bonds is upgraded within the year is 8%. The probability of a downgrade is 4%.
What is the probability that the rating remains unchanged?

Answer:

88%. The sum of all three events—upgrade, downgrade, and no change—must sum to one.
There is no other possible outcome. 88% + 8% + 4% = 100%.

Question 3:

Stock XYZ has a 20% chance of losing more than 10% in a given month. There is also a 30%
probability that XYZ gains more than 10%. What is the probability that stock XYZ either loses
more than 10% or gains more than 10%?

Answer:

50%. The outcomes are mutually exclusive; therefore, 20% + 30% = 50%.

Question 4:

There is a 30% chance that oil prices will increase over the next six months. If oil prices
increase, there is a 60% chance that the stock market will be down. What is the probability that
oil prices increase and the stock market is down over the next six months?

Answer:

P[oil up ∩ stock market down] = P[stock market down | oil up] × P[oil up]
P[oil up ∩ stock market down] = 60% × 30% = 18%


Question 5:

Given the following density function:

f(x) = c(100 − x²) for −10 ≤ x ≤ 10; 0 otherwise

Calculate the value of c.

Answer:

Given the density function, we can find c by noting that the sum of probabilities must be equal to one:

1 = ∫[-10 to 10] c(100 − x²) dx = c[100x − x³/3] evaluated from −10 to 10 = c(4000/3)

c = 3/4,000

Question 6:

Given the following cumulative distribution function, F(x), for 0 ≤ x ≤ 10:

F(x) = (x/100)(20 − x)

Check that this is a valid CDF; that is, show that F(0) = 0 and F(10) = 1. Calculate the probability density function, f(x).

Answer:

First we check that this is a valid CDF, by calculating the value of the CDF for the minimum and maximum values of x:

F(0) = (0/100)(20 − 0) = 0
F(10) = (10/100)(20 − 10) = 1

Next we calculate the PDF by taking the first derivative of the CDF:

f(x) = dF(x)/dx = 20/100 − 2x/100 = (1/50)(10 − x)


Question 7:

Given the probability density function, f(x):

f(x) = c/x

where 1 ≤ x ≤ e. Calculate the cumulative distribution function, F(x), and solve for the constant c.

Answer:

We first calculate the CDF by integrating the PDF:

F(x) = ∫[1 to x] (c/t) dt = c[ln t] evaluated from 1 to x = c·ln(x)

We first try to find c using the fact that the CDF is zero at the minimum value of x, x = 1:

F(1) = c·ln(1) = c·0 = 0

As it turns out, any value of c will satisfy this constraint, and we cannot use this to determine c.

If we use the fact that the CDF is 1 for the maximum value of x, x = e, we find that c = 1:

F(e) = c·ln(e) = c·1 = c
∴ c = 1

The CDF can then be expressed simply as:

F(x) = ln(x)

Question 8:

You own two bonds. Both bonds have a 30% probability of defaulting. Their default probabilities
are statistically independent. What is the probability that both bonds default? What is the
probability that only one bond defaults? What is the probability that neither bond defaults?

Answer:

P[both bonds default] = 30% × 30% = 9%
P[only one defaults] = 2 × 30% × (1 − 30%) = 42%
P[neither defaults] = (1 − 30%) × (1 − 30%) = 49%

For the second part of the question, remember that there are two scenarios in which only one
bond defaults: Either the first defaults and the second does not, or the second defaults and the
first does not.


Question 9:

The following table is a one-year ratings transition matrix. Given a bond’s rating now, the matrix
gives the probability associated with the bond having a given rating in a year’s time. For
example, a bond that starts the year with an A rating has a 90% chance of maintaining that
rating and an 8% chance of migrating to a B rating. Given a B-rated bond, what is the probability
that the bond defaults (D rating) over one year? What is the probability that the bond defaults
over two years?

Answer:

The probability that a B-rated bond defaults over one year is 2%. This can be read directly from
the last column of the second row of the ratings transition matrix.

The probability of default over two years is 4.8%. During the first year, a B-rated bond can either
be upgraded to an A rating, stay at B, be downgraded to C, or default. From the transition
matrix, we know that the probability of these events is 10%, 80%, 8%, and 2%, respectively. If
the bond is upgraded to A, then there is zero probability of default in the second year (the last
column of the first row of the matrix is 0%). If it remains at B, there is a 2% probability of default
in the second year, the same as in the first year. If it is downgraded to C, there is a 15%
probability of default in the second year. Finally, if a bond defaulted in the first year, it stays
defaulted (the last column of the last row is 100%). Putting all this together, we have:

P[default] = 10% × 0% + 80% × 2% + 8% × 15% + 2% × 100% = 4.8%
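
The same two-year result falls out of squaring the transition matrix. In the sketch below, the B row, the cited A-row and D-row entries, and the C row's 15% default probability come from the answer above; the C row's other cells are illustrative assumptions (they do not affect the B-rated bond's two-year default probability):

    import numpy as np

    # One-year transition matrix; rows/columns ordered A, B, C, D (default)
    T = np.array([
        [0.90, 0.08, 0.02, 0.00],   # A: 90% stay, 8% to B, 0% default (given); 2% to C by residual
        [0.10, 0.80, 0.08, 0.02],   # B (given)
        [0.00, 0.10, 0.75, 0.15],   # C: 15% default given; other cells assumed
        [0.00, 0.00, 0.00, 1.00],   # D: default is absorbing (given)
    ])

    # Two-year default probability for a B-rated bond: row B, column D of T squared
    T2 = T @ T
    print(T2[1, 3])                 # 0.048 = 4.8%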


Question 10:

Your firm forecasts that there is a 50% probability that the market will be up significantly next
year, a 20% probability that the market will be down significantly next year, and a 30%
probability that the market will be flat, neither up or down significantly. You are asked to
evaluate the prospects of a new portfolio manager. The manager has a long bias and is likely to
perform better in an up market. Based on past data, you believe that the probability that the
manager will be up if the market is up significantly is 80%, and that the probability that the
manager will be up if the market is down significantly is only 10%. If the market is flat, the
manager is just as likely to be up as to be down. What is the unconditional probability that the
manager is up next year?

Answer:

Using M to represent the market and X to represent the portfolio manager, we are given the following information:

P[M up] = 50%
P[M down] = 20%
P[M flat] = 30%
P[X up | M up] = 80%
P[X up | M down] = 10%
P[X up | M flat] = 50%

The unconditional probability that the manager is up next year, P[X up], is then 57%:

P[X up] = P[X up | M up] × P[M up] + P[X up | M down] × P[M down] + P[X up | M flat] × P[M flat]
P[X up] = 80% × 50% + 10% × 20% + 50% × 30%
P[X up] = 40% + 2% + 15% = 57%


Miller, Chapter 3: Basic Statistics


Interpret and apply the mean, standard deviation, and variance of a random variable.

Calculate the mean, standard deviation, and variance of a discrete random variable.

Interpret and calculate the expected value of a discrete random variable.

Calculate and interpret the covariance and correlation between two random variables.

Calculate the mean and variance of sums of larger variables.

Describe the four central moments of a statistical variable or distribution: mean, variance, skewness and kurtosis.

Interpret the skewness and kurtosis of a statistical distribution, and interpret the concepts of coskewness and cokurtosis.

Describe and interpret the best linear unbiased estimator (BLUE).

Interpret and apply the mean, standard deviation, and variance of a random variable.
If we can characterize a random variable (e.g., if we know all the outcomes and that each outcome is equally likely, as is the case when you roll a single die), the expectation of the random variable is often called the mean or arithmetic mean.

Expected value (mean)

Expected value exists when we have a parametric (e.g., normal, binomial) distribution or
probabilities. Expected value is the weighted average of possible values.

In the case of a discrete random variable, expected value is given by:

E(X) = p(1)x(1) + p(2)x(2) + ... + p(n)x(n) = Σ p(i)x(i)

In the case of a continuous random variable, expected value is given by:

E(X) = ∫ x f(x) dx


Population mean versus sample mean

If we have a complete data set, then the mean is a population mean which implies that the
mean is exactly the true (and only true) mean:

μ = (1/n) Σ x(i) = (1/n)(x(1) + x(2) + ... + x(n))

However, in practice, we typically do not have the population. Rather, more often we have only
a subset of the population or a dataset that cannot realistically be considered comprehensive;
e.g., the most recent year of equity returns. A mean of such a dataset, which is much more
likely in practice, is called the sample mean. The sample mean, of course, uses the same
formula:

μ̂ = (1/n) Σ x(i) = (1/n)(x(1) + x(2) + ... + x(n))

But the difference between a population parameter (e.g., population mean) and a sample
estimate (e.g., sample mean) is essential to statistics:
 Each sample will produce a different sample mean, which is likely to be near the “true”
population mean but different depending on the sample.
 We use the sample estimate to infer something about the unobserved population
parameter.
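
A small simulation (assuming NumPy; here the population is standard normal by construction, so the true mean is zero) makes the point concrete: each sample yields a different sample mean near, but not equal to, the population mean:

    import numpy as np

    rng = np.random.default_rng(1)   # arbitrary seed
    true_mean = 0.0

    # Each draw is a different sample; each sample mean is a different estimate
    for _ in range(3):
        sample = rng.normal(loc=true_mean, scale=1.0, size=250)
        print(sample.mean())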

Variance

Variance is the second central moment; variance and standard deviation are the most common measures of dispersion. The variance of a discrete random variable Y is given by:

σ² = variance(Y) = E[(Y − μ)²] = Σ p(i)(y(i) − μ)²

Variance is also expressed as the difference between the expected value of Y² and the square of the expected value of Y. This is the more useful variance formula:

σ² = E[(Y − μ)²] = E(Y²) − [E(Y)]²

Please memorize this variance formula above: it comes in handy!

For example, if the probability of loan default (PD) is a Bernoulli trial, what is the variance of
PD?

Since E[X] = p and E[X²] = p:
σ² = E[X²] − (E[X])² = p − p² = p × (1 − p)


Properties of variance

σ²(c) = 0, where c is a constant
σ²(X + Y) = σ²(X) + σ²(Y); only if independent
σ²(X − Y) = σ²(X) + σ²(Y); only if independent
σ²(cX) = c²σ²(X)
σ²(X + c) = σ²(X)
σ²(cX + d) = c²σ²(X)
σ²(aX + bY) = a²σ²(X) + b²σ²(Y); only if independent
σ²(X) = E(X²) − [E(X)]²

Standard deviation

As variance is the square of standard deviation, standard deviation is the square root of variance. Standard deviation is given by:

σ = √variance(Y) = √E[(Y − μ)²]

Example 1: Variance of a Bernoulli distribution. For example, a coin toss where p = 1 − p = 0.50:

 Mean: µ = 0.50 × 0 + 0.50 × 1 = 0.50
 Variance: σ² = 0.50 × (0 − 0.50)² + 0.50 × (1 − 0.50)² = 0.25
 Standard deviation: σ = √0.25 = 0.50

Example 2: A derivative has a 50/50 chance of being worth either +10 or −10 at expiry. The
variance and the standard deviation of the derivative’s value are calculated below.


 Mean: µ = 0.50 × 10 + 0.50 × (−10) = 0
 Variance: σ² = 0.50 × (10 − 0)² + 0.50 × (−10 − 0)² = 100 and σ = √100 = 10

Example 3: Variance of a single six-sided die.

 The expected value of a single six-sided die is 3.5 (the average outcome or mean).
 First, we need to solve for the expected value of X-squared, E[X²]. This is given by:

E[X²] = (1/6)(1²) + (1/6)(2²) + (1/6)(3²) + (1/6)(4²) + (1/6)(5²) + (1/6)(6²) = 91/6

 Then, we need to square the expected value of X, [E(X)]², such that the variance of a single six-sided die is given by:

Var(X) = E(X²) − [E(X)]² = 91/6 − (3.5)² ≅ 2.92

 The standard deviation is therefore the square root of variance: √2.92 = 1.708

Variance of the total of two six-sided dice cast together: It is simply the variance of X (≈2.92) plus the variance of Y (≈2.92), or about 5.83. The reason we can simply add them together is that they are independent random variables.
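
A short Python sketch confirms the arithmetic for one die and for the sum of two independent dice:

    import numpy as np

    faces = np.arange(1, 7)
    p = np.full(6, 1/6)

    mean = np.sum(p * faces)                  # 3.5
    var = np.sum(p * faces**2) - mean**2      # E[X^2] - E[X]^2 = 91/6 - 3.5^2 ≈ 2.92
    print(mean, var, np.sqrt(var))            # 3.5, 2.9167, 1.7078

    # Sum of two independent dice: the variances simply add
    print(2 * var)                            # ≈ 5.83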


Sample Variance

The unbiased estimate of the sample variance is given by:

σ̂² = (1/(k−1)) Σ (x(i) − μ̂)²

If the mean is known or we are calculating the population variance, then we divide by k; but if instead the mean is being estimated, then we divide by (k − 1), as shown here.

The above sample variance is used by Hull, for example, to calculate the historical variance (and volatility) of asset returns. Specifically, he employs a sample variance (which divides by k−1 or n−1) to compute historical volatility. Admittedly, because the variable is daily returns, Hull subsequently makes two simplifying assumptions, including reversion to division by (n) or (k). However, the point remains: when computing the volatility (standard deviation) of an historical set of returns, the square root of the above sample variance is typically appropriate: it gives an unbiased estimate (of the variance, at least).

Sample Standard Deviation:

This is merely the square root of the sample variance. The unbiased estimate of the sample standard deviation is given by:

σ̂ = √[ (1/(k−1)) Σ (x(i) − μ̂)² ]

This formula is important because it is technically the safe way to calculate sample volatility; i.e., when in doubt, you are rarely mistaken to employ the (n−1) or (k−1) divisor.
Example: Assume that the mean of daily Standard & Poor’s (S&P) 500 Index returns is zero.
You observe the returns (Daily Return) given in the table over the course of 10 days. Estimate
the standard deviation of daily S&P 500 Index returns.


Day   Closing   Price Relative   Daily Return u(i)      Squared Return
      Price     S(i)/S(i-1)      = LN[S(i)/S(i-1)]      u(i)^2
0     $10.00
1     10.73     1.073             7.0%                  0.004900
2     10.30     0.961            -4.0%                  0.001600
3     11.50     1.116            11.0%                  0.012100
4     12.46     1.083             8.0%                  0.006400
5     12.84     1.030             3.0%                  0.000900
6     14.05     1.094             9.0%                  0.008100
7     11.39     0.811           -21.0%                  0.044100
8     12.59     1.105            10.0%                  0.010000
9     11.50     0.914            -9.0%                  0.008100
10    11.39     0.990            -1.0%                  0.000100
                        Average:  0.01300               0.009630
SQRT(avg[u(i)^2]), a.k.a. daily volatility: SQRT(0.009630) = 0.098133 = 9.813%

Note: We were told to assume the mean was known, so we divide by n = 10, not n − 1 = 9.

 Sample variance: σ̂² = (1/10) Σ (u(i) − 0)² = 0.009630
 So, the sample standard deviation is √0.009630 = 9.813%
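
The table's arithmetic is easy to replicate; a minimal NumPy sketch follows (dividing by n = 10 because the mean is assumed known and zero; tiny differences from the table reflect its returns being rounded to whole percents):

    import numpy as np

    prices = np.array([10.00, 10.73, 10.30, 11.50, 12.46, 12.84,
                       14.05, 11.39, 12.59, 11.50, 11.39])
    u = np.log(prices[1:] / prices[:-1])   # daily log returns, u(i)

    # Mean assumed known (zero), so divide by n rather than (n - 1)
    var = np.mean(u**2)
    print(var, np.sqrt(var))               # ≈ 0.00966, ≈ 9.8%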

Calculate the mean, standard deviation, and variance of a discrete random variable.

Interpret and calculate the expected value of a discrete random variable.
For a discrete random variable, we can also calculate the mean, median, and mode. Note that a discrete random variable can take only a finite number of values, and the sum of their probabilities is 1.0 (100%).

Mean: The mean of a discrete random variable is a special case of the weighted mean,
where the outcomes are weighted by their probabilities, and the sum of the weights is
equal to one. For a random variable, X, with possible values, xi, and corresponding
probabilities p(i), we define the mean, μ, as:

μ = Σ p(i)x(i)


Median: The median of a discrete random variable is the value such that the probability
that a value is less than or equal to the median is equal to 50%. Working from the other end
of the distribution, we can also define the median such that 50% of the values are greater than
or equal to the median. For a random variable, X, if we denote the median as m, we have:

P[X ≥ m] = P[X ≤ m] = 0.50

Mode: For a discrete random variable, the mode is the value associated with the highest
probability. As with population and sample data sets, the mode of a discrete random variable
need not be unique.

Example: At the start of the year, a bond portfolio consists of two bonds, each worth $100. At
the end of the year, if a bond defaults, it will be worth $20. If it does not default, the bond will be
worth $100. The probability that both bonds default is 20%. The probability that neither bond
defaults is 45%. What are the mean, median, and mode of the year-end portfolio value?

Solution: We are given the probability for two outcomes for the value of portfolio at the end of
the year:

 [ = $40] = 20%
 [ = $200] = 45%

The third and only other possible outcome is the scenario where one bond defaults and the other does not. In this case, the value of the portfolio will be:

 V = $20 + $100 = $120

At year-end, the value of the portfolio, V, can have only one of three values, and the probabilities must sum to 100% (a property of a discrete random variable).

This allows us to calculate the final probability: P[V = $120] = 100% − 20% − 45% = 35%

 The mean of V is then $140: μ = 0.20 × $40 + 0.35 × $120 + 0.45 × $200 = $140
 The median of the distribution is $120; the 50th percentile falls at $120.
 The mode of the distribution is $200; this is the most likely single outcome because its
probability is highest at 45%.
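
The mean, median, and mode of a discrete random variable are mechanical to compute; a minimal sketch for this bond portfolio:

    import numpy as np

    values = np.array([40.0, 120.0, 200.0])      # year-end portfolio values
    probs = np.array([0.20, 0.35, 0.45])         # probabilities (sum to one)

    mean = np.sum(probs * values)                # 140.0
    cdf = np.cumsum(probs)                       # [0.20, 0.55, 1.00]
    median = values[np.searchsorted(cdf, 0.50)]  # first value with CDF >= 50%: 120
    mode = values[np.argmax(probs)]              # highest-probability value: 200
    print(mean, median, mode)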


Calculate and interpret the covariance and correlation between two random variables.
Covariance

Covariance is analogous to variance, but instead of looking at the deviation from the mean of one variable, we look at the relationship between the deviations of two variables. Put another way, the covariance is the average cross-product: if the means of both variables, X and Y, are known, we can use the following formula for covariance, which might be called a population covariance.

σ(XY) = (1/n) Σ (x(i) − μ(X))(y(i) − μ(Y))

If the true means are unknown and we know only the sample means, we calculate a sample covariance. The sample covariance divides the sum of cross-products by (n − 1) rather than n.

σ̂(XY) = (1/(n−1)) Σ (x(i) − μ̂(X))(y(i) − μ̂(Y))

What is the covariance of a variable with itself, i.e., what is covariance (X,X)? It is the
variance of X. It will be helpful to keep in mind that a variable’s covariance with itself is its
variance. For example, knowing this, we realize that the diagonal in a covariance matrix is
populated with variances, because variance is a special case of covariance!

Properties of covariance

If X & Y are independent:

 cov(X,Y): σ(XY) = 0
 cov(a + bX, c + dY): σ(a+bX, c+dY) = b × d × σ(XY)
 cov(X,X): σ(XX) = σ²(X)

If X & Y are not independent:

 σ²(X + Y) = σ²(X) + σ²(Y) + 2σ(XY)
 σ²(X − Y) = σ²(X) + σ²(Y) − 2σ(XY)


Correlation

Correlation is a key measure in the FRM and is typically denoted by Greek rho (ρ). Correlation
is the covariance between two variables divided by the product of their respective standard
deviations (a.k.a., volatilities).

ρ(XY) = σ(XY) / [σ(X) × σ(Y)], where σ(XY) = cov(X,Y) = E[(X − μ(X))(Y − μ(Y))]

The correlation coefficient translates covariance into a unitless metric that runs from -1.0
to +1.0.

We may also refer to the computationally similar sample correlation, which is the sample covariance divided by the product of the sample standard deviations:

$$\hat{\rho}_{XY} = \frac{\hat{\sigma}_{XY}}{s_X s_Y}$$

Key properties of correlation:

 Correlation has the same sign (+/-) as covariance.


 Correlation measures the linear relationship between two variables.
 It runs between -1.0 and +1.0, inclusive.
 Correlation is a unit-less metric.
 Zero covariance → zero correlation (but the converse is not necessarily true: zero correlation does not rule out a nonlinear dependence; for example, Y = X² is uncorrelated with X yet fully dependent on it)

Correlation (or dependence) is not causation. For example, in a basket credit default swap,
the correlation (dependence) between the obligors is a key input. But we do not assume there is
mutual causation (e.g., that one default causes another). Rather, more likely, different obligors
are similarly sensitive to economic conditions. So, economic deterioration may be the external
cause that all obligors have in common. Consequently, their defaults exhibit dependence. But
the causation is not internal.

Further, note that (linear) correlation is a special case of dependence. Dependence is more general and includes non-linear relationships.

Example: Below we illustrate the application of the covariance and the correlation coefficient.
Given in the table below are the growth projections of two products, Gold (G) and Bitcoin (B)
and their joint probabilities. For both products, we have three scenarios (bad, medium, and
good). Probabilities are assigned to each growth scenario:
 20% chance of gold growing at 3.0% and bitcoin growing at 5.0%
 60% chance of gold growing at 9.0% and bitcoin growing at 7.0%
 20% chance of gold growing at 12.0% and bitcoin growing at 9.0%


For solving this, the calculation of expected values is required: E(G), E(B), E(GB), E(G²) and E(B²). Make sure you can replicate the following two steps:
 The covariance is equal to E(GB) − E(G)E(B) = 62.40 − (8.40 × 7.00) = 3.60.
 The correlation coefficient is equal to Cov(G,B) divided by the product of the standard deviations: 3.60/(2.94 × 1.26) = 0.968 ≈ 97%.
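
These two steps can be verified with a short NumPy sketch (our illustration of the scenario-weighted calculation, using the probabilities above):

    import numpy as np

    p = np.array([0.20, 0.60, 0.20])     # scenario probabilities
    g = np.array([3.0, 9.0, 12.0])       # gold growth per scenario
    b = np.array([5.0, 7.0, 9.0])        # bitcoin growth per scenario

    E_g, E_b = p @ g, p @ b              # 8.40 and 7.00
    cov = p @ (g * b) - E_g * E_b        # E(GB) - E(G)E(B) = 3.60
    sd_g = np.sqrt(p @ g**2 - E_g**2)    # ~2.94
    sd_b = np.sqrt(p @ b**2 - E_b**2)    # ~1.26
    print(cov / (sd_g * sd_b))           # ~0.968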

Another example: Here is another example with the same products, Gold (G) and Bitcoin (B)
but with different scenarios (lower and higher growth) and their joint probabilities. The joint
probabilities are: 30% chance of gold growing at 4.0% and bitcoin growing at 3.0%; 15% chance
of gold growing at 4.0% and bitcoin growing at 5.0 %; 20% chance of gold growing at 9.0% and
bitcoin growing at 3.0 %; and 35% chance of gold growing at 9.0% and bitcoin growing at 5.0 %:

The table below shows the calculation of covariance (calculated in two different ways) and
correlation. Note that variance does not divide sum by (n-1) because these are not samples.
 Covariance method #1 (M1) is E[(G − μ_G)(B − μ_B)]. For example, the first value is: 0.83 = (4.0 − 6.75) × (3.0 − 4.00) × 30.0%. Likewise, all four values are summed to give a covariance of 0.750.
 Covariance method #2 (M2) is equivalent to E[GB] − E[G]E[B]: 27.750 − (6.750 × 4.000) = 0.750
 Correlation (30%) is the Cov(G,B) divided by the product of the standard deviations.


Calculate the mean and variance of sums of variables.
Mean

$$E(X + Y + Z) = \mu_X + \mu_Y + \mu_Z$$

Variance

Regarding the sum of correlated variables, the variance of correlated variables is given by the following equations. Please know this substitution: $\sigma_{XY} = \rho_{XY}\sigma_X\sigma_Y$.

$$\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2 + 2\sigma_{XY} \quad \text{or} \quad \sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2 + 2\rho_{XY}\sigma_X\sigma_Y$$

With reference to the difference between correlated variables, the variance is given by:

$$\sigma_{X-Y}^2 = \sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY} \quad \text{or} \quad \sigma_{X-Y}^2 = \sigma_X^2 + \sigma_Y^2 - 2\rho_{XY}\sigma_X\sigma_Y$$

Variance with constants (a) and (b)

 Variance of a sum includes the covariance (X, Y):

$$\text{var}(aX + bY) = a^2\sigma_X^2 + 2ab\,\sigma_{XY} + b^2\sigma_Y^2$$

 If X and Y are independent, the covariance term drops out and the variances simply add:

$$\text{var}(aX + bY) = a^2\sigma_X^2 + b^2\sigma_Y^2$$


Application: Portfolio Variance and Hedging

If we need to determine the variance of a portfolio of securities, all we need to know is the
variance of the underlying securities and their respective correlations. If the portfolio is a linear
combination of $X_A$ and $X_B$, such that $P = aX_A + bX_B$, then:

$$\sigma_P^2 = a^2\sigma_A^2 + 2ab\,\sigma_{AB} + b^2\sigma_B^2$$

Suppose we have $1 of Security A, and we wish to hedge it with $h of Security B (if h is positive, we are buying the security; if h is negative, we are shorting the security). In other words, h is the hedge ratio. If P is our hedged portfolio, such that $P = X_A + hX_B$, then:

$$\sigma_P^2 = \sigma_A^2 + 2h\,\sigma_{AB} + h^2\sigma_B^2$$

The minimum variance hedge ratio is the ratio which achieves the portfolio with the least variance. It is found by taking the derivative of the equation for the portfolio variance with respect to h, and setting it equal to zero, such that:

$$h^* = -\frac{\sigma_{AB}}{\sigma_B^2} = -\rho_{AB}\frac{\sigma_A}{\sigma_B}$$

Example: A portfolio manager owns (has a long position in) $100 of Security A which has a
volatility of 12.0%. She wants to hedge with Security B which has a volatility of 30.0%. The
correlation between the securities is 0.60. What position in Security B utilizes the minimum
variance hedge ratio to create a portfolio with the minimum dollar ($) standard deviation?

 The minimum variance hedge ratio in this case is −0.24, such that the trade is short $0.24 for each $1.00 in Security A. Therefore, −0.24 × $100 in Security A = short $24.00 in Security B.
 So, asset B is short $24.00 such that the net portfolio size happens to end up at net long $76.00 (= 100 − 24).
 The portfolio's standard deviation is the minimum standard deviation, which is:

$$\sqrt{(\$100 \times 12\%)^2 + (-\$24 \times 30\%)^2 + 2(0.60)(\$100 \times 12\%)(-\$24 \times 30\%)} = \$9.60$$
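
The same numbers fall out of a few lines of Python (a sketch of this specific example, not a general hedging tool):

    import math

    sigma_a, sigma_b, rho = 0.12, 0.30, 0.60
    pos_a = 100.0                              # $100 long Security A

    h_star = -rho * sigma_a / sigma_b          # -0.24 per $1 of Security A
    pos_b = h_star * pos_a                     # short $24 of Security B

    # Dollar variance of the hedged portfolio, then its standard deviation
    var_p = (pos_a * sigma_a)**2 + (pos_b * sigma_b)**2 \
            + 2 * rho * (pos_a * sigma_a) * (pos_b * sigma_b)
    print(h_star, pos_b, math.sqrt(var_p))     # -0.24, -24.0, 9.6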


The below graph illustrates the smallest variance that can be achieved with this minimum
variance hedge ratio calculated in the example.

Describe the four central moments of a statistical variable or distribution: mean, variance, skewness and kurtosis.
Moments of a distribution

The kth moment of X is defined as:

$$m_k = E[X^k]$$

We refer to $m_k = E[X^k]$ as the kth raw moment of X. We are generally more concerned with central moments; central moments are “moments about the mean” or sometimes “moments around the mean.”

The kth moment about the mean (), or kth central moment, is given by:

∑ ( − )
moment =
or equivalently,

= [( − ) ]

In this way, the difference of each data point from the mean is raised to a power (k=1, k=2, k=3, and k=4). These are the four moments of the distribution:
 If k=1, this refers to the first moment about zero: the mean.
 If k=2, this refers to the second moment about the mean: the variance.


With respect to skewness and kurtosis, it is common to standardize the moment, such that:
 If k=3, then the third moment divided by the cube of the standard deviation returns the
skewness
 If k=4, then the fourth moment about the mean divided by the square of the variance (standard deviation^4) returns the kurtosis; a.k.a., tail density, peakedness.

Interpret the skewness and kurtosis of a statistical distribution, and interpret the concepts of coskewness and cokurtosis.
Skewness (asymmetry)

Skewness refers to whether a distribution is symmetrical about the mean. An asymmetrical distribution is skewed, either positively (to the right) or negatively (to the left). The measure of “relative skewness” is given by the equation below, where zero indicates symmetry (no skewness):

$$\text{Skewness} = s = \frac{E[(X - \mu)^3]}{\sigma^3}$$

Note: If you aren't using a probability distribution (but rather a sample), you need to take the average of the sum of $(x_i - \mu)^3$ to get $E[(X - \mu)^3]$. But that's a "raw" or un-standardized 3rd central moment. To standardize it, it is then divided by the cube of the standard deviation.

So, skewness is not actually the (raw) third moment, or even the third moment about the mean.
Skewness is the standardized central third moment: the third moment about the mean
divided by the cube of the standard deviation. For example, the gamma distribution has positive
skew (skew > 0) as seen in the figure below.

[Figure: Gamma distribution PDFs illustrating positive (right) skew, for parameters (alpha=1, beta=1), (alpha=2, beta=.5), and (alpha=4, beta=.25).]


Skewness is a measure of asymmetry:


 If a distribution is symmetrical, mean = median = mode.
 If a distribution has positive skew, then mean > median > mode.
 If a distribution has negative skew, then mean < median < mode.

Kurtosis

Kurtosis measures the degree of “peakedness” of the distribution, and consequently the “heaviness of the tails.” A value of three (3) indicates normal peakedness: the normal distribution has kurtosis of 3, such that “excess kurtosis” equals (kurtosis − 3).

$$\text{Kurtosis} = k = \frac{E[(X - \mu)^4]}{\sigma^4}$$

Please note that kurtosis is not actually the (raw) fourth moment, or even the fourth moment
about the mean. Kurtosis is the standardized central fourth moment: the fourth moment
about the mean divided by square of the variance (or the fourth power of standard deviation).

A normal distribution has relative skewness of zero and kurtosis of three (or the same
idea put another way: excess kurtosis of zero).
 Relative skewness > 0 indicates positive skewness (a longer right tail) and relative
skewness < 0 indicates negative skewness (a longer left tail).
 Kurtosis greater than three (>3), which is the same thing as saying “excess kurtosis > 0,”
indicates high peaks and fat tails (leptokurtic). Kurtosis less than three (<3), which is the
same thing as saying “excess kurtosis < 0,” indicates lower peaks.
Kurtosis is thus a measure of tail weight (heavy, normal, or light-tailed) and “peakedness”: kurtosis > 3.0 (or excess kurtosis > 0) implies heavy tails. Financial asset returns are typically considered leptokurtic (i.e., heavy- or fat-tailed). For example, the logistic distribution exhibits leptokurtosis (heavy tails; kurtosis > 3.0):

[Figure: Logistic distribution PDFs with heavy tails (excess kurtosis > 0), for parameters (alpha=0, beta=1), (alpha=2, beta=1), and (alpha=0, beta=3), compared against N(0,1).]


Example - Miller EOC Problem #6: Calculate the skewness and kurtosis of each of the
following two series (X and Y), given in the table below.

 The skewness of Y can be calculated as: $\dfrac{E[(Y - \mu_Y)^3]}{\sigma_Y^3} = \dfrac{-37{,}128}{39^3} = -0.63$
 The kurtosis of Y is: $\dfrac{E[(Y - \mu_Y)^4]}{\sigma_Y^4} = \dfrac{4{,}133{,}697}{39^4} = 1.787$

     X       Y   |  X central moments (2nd, 3rd, 4th)     |  Y central moments (2nd, 3rd, 4th)
 -51.0   -61.0   |  2,601.0  (132,651.0)   6,765,201.0    |  3,721.0  (226,981.0)  13,845,841.0
 -21.0    -7.0   |    441.0    (9,261.0)     194,481.0    |     49.0      (343.0)       2,401.0
  21.0    33.0   |    441.0     9,261.0      194,481.0    |  1,089.0    35,937.0    1,185,921.0
  51.0    35.0   |  2,601.0   132,651.0    6,765,201.0    |  1,225.0    42,875.0    1,500,625.0
Average: 0.00, 0.00 | 1,521.0        -      3,479,841.0    |  1,521.0  (37,128.0)    4,133,697.0
Std deviations: 39.00 (X) and 39.00 (Y)
Standardized 3rd central moments: Skew(X) = 0.00, Skew(Y) = -0.63
Standardized 4th central moments: Kurt(X) = 1.504, Kurt(Y) = 1.787
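
Series Y can be checked with a short NumPy sketch (population moments, dividing by n, consistent with the table above):

    import numpy as np

    y = np.array([-61.0, -7.0, 33.0, 35.0])
    mu = y.mean()                              # 0.0
    sigma = np.sqrt(np.mean((y - mu)**2))      # population std dev = 39.0

    skew = np.mean((y - mu)**3) / sigma**3     # -37,128 / 39^3 = -0.63
    kurt = np.mean((y - mu)**4) / sigma**4     # 4,133,697 / 39^4 = 1.787
    print(sigma, skew, kurt)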

Coskewness and Cokurtosis

Just as we generalized the concept of mean and variance to moments and central moments, we
can generalize the concept of covariance to cross central moments. The third and fourth
standardized cross central moments are referred to as coskewness and cokurtosis,
respectively.

For two random variables, there are two non-trivial coskewness statistics:

$$S_{XXY} = \frac{E[(X - \mu_X)^2 (Y - \mu_Y)]}{\sigma_X^2 \sigma_Y} \qquad S_{XYY} = \frac{E[(X - \mu_X)(Y - \mu_Y)^2]}{\sigma_X \sigma_Y^2}$$

In general, for (n) random variables, the number of non-trivial cross-central moments of
order (m) is given by:

$$K = \frac{(n + m - 1)!}{m!\,(n - 1)!} - n$$

In this case, nontrivial means that we have excluded the cross moments that involve only one variable (i.e., standard skewness and kurtosis). To include those trivial moments as well, we would simply add n to this result.

In the case of m=3, we have coskewness which is given by:

$$K = \frac{(n + 2)(n + 1)n}{6} - n$$


Example: Continuing with our earlier example of Gold and Bitcoin and given their probability
matrix, we find the coskewness and cokurtosis.

Assumptions: Probability Matrix

              X
            4.0     9.0
Y   3.0     30%     20%     50%
    5.0     15%     35%     50%
            45%     55%    100%
X central moments Y central moments
X Y Prob[X,Y] X Y 2 3 4 2 3 4
4.0 3.0 30.0% 1.20 0.90 2.27 -6.24 17.16 0.30 -0.30 0.30
4.0 5.0 15.0% 0.60 0.75 1.13 -3.12 8.58 0.15 0.15 0.15
9.0 3.0 20.0% 1.80 0.60 1.01 2.28 5.13 0.20 -0.20 0.20
9.0 5.0 35.0% 3.15 1.75 1.77 3.99 8.97 0.35 0.35 0.35
Avg: 6.50 4.00
Sum: 6.75 4.00 6.188 -3.094 39.832 1.000 0.000 1.000
Standard Deviations Std Dev: 2.487 1.000
Standardized 3rd central moments Skew(X) -0.20 Skew(Y) 0.00
Standardized 4th central moments Kurt(X) 1.040 Kurt(Y) 1.000

Co-Skew Co-Kurtosis
S(XXY) S(XYY) K(XXXY) K(XXYY) K(XYYY)
(2.27) (0.83) 6.24 2.27 0.83
1.13 (0.41) (3.12) 1.13 (0.41)
(1.01) 0.45 (2.28) 1.01 (0.45)
1.77 0.79 3.99 1.77 0.79
Cross-central Moments (CCM; sum) -0.3750 0.0000 4.8281 6.1875 0.7500
Standardized CCM; i.e., co-skew, co-kurt -0.0606 0.0000 0.3137 1.0000 0.3015

 From the table, for example, when calculating coskewness $S_{XXY}$, to find the cross central moment the first value is: −2.27 = (4.0 − 6.75)² × (3.0 − 4.00) × 30.0%. Likewise, all four such values are summed to give the cross central moment of −0.3750.
 Since $S_{XXY} = E[(X - \mu_X)^2(Y - \mu_Y)]/(\sigma_X^2 \sigma_Y)$, we divide the cross central moment by the square of the standard deviation of X times the standard deviation of Y to get the standardized cross central moment: −0.0606 = −0.3750 / (2.487² × 1.000).
 Similarly, coskewness $S_{XYY}$ is found to be 0.0000, and the cokurtosis statistics $K_{XXXY}$, $K_{XXYY}$ and $K_{XYYY}$ are 0.3137, 1.0000 and 0.3015, respectively.
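
Here is a minimal NumPy sketch of the same calculation (our illustration; the four joint outcomes and probabilities come from the matrix above):

    import numpy as np

    x = np.array([4.0, 4.0, 9.0, 9.0])
    y = np.array([3.0, 5.0, 3.0, 5.0])
    p = np.array([0.30, 0.15, 0.20, 0.35])

    mx, my = p @ x, p @ y                        # 6.75 and 4.00
    sx = np.sqrt(p @ (x - mx)**2)                # ~2.487
    sy = np.sqrt(p @ (y - my)**2)                # 1.000

    # Standardized cross central moments
    s_xxy = p @ ((x - mx)**2 * (y - my)) / (sx**2 * sy)          # ~-0.0606
    k_xxyy = p @ ((x - mx)**2 * (y - my)**2) / (sx**2 * sy**2)   # ~1.0000
    print(s_xxy, k_xxyy)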


Describe and interpret the best linear unbiased estimator (BLUE).
An estimator is a function of a sample of data drawn randomly from a population.
 An estimate is the numerical value of the estimator when it is actually computed using
data from a specific sample. An estimator is a random variable because of randomness
in selecting the sample, while an estimate is a nonrandom number.
For example, the sample mean is the best linear unbiased estimator (BLUE) as this provides
an unbiased estimate of the true mean.

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

In the Stock & Watson example, the average (mean) wage among 200 people is $22.64 as
shown below in the table:

Sample Mean $22.64


Sample Standard Deviation $18.14
Sample size (n) 200
Standard Error 1.28
H0: Population Mean = $20.00
Test t statistic 2.06
p value 4.09%

Please note:
 The average wage of n = 200 observations is $22.64
 The standard deviation of this sample is $18.14
 The standard error of the sample mean is $1.28 because $18.14/SQRT(200) = $1.28
 The degrees of freedom (d.f.) in this case are 199 = 200 – 1
In the above example, the sample mean is an estimator of the unknown, true population mean (in this case, the sample mean estimator gives an estimate of $22.64).
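
The standard error, test t statistic and p value in the table can be reproduced with a short sketch (assuming SciPy is available):

    import math
    from scipy import stats

    mean, sd, n, mu0 = 22.64, 18.14, 200, 20.00
    se = sd / math.sqrt(n)                      # 1.28
    t_stat = (mean - mu0) / se                  # ~2.06
    p_value = 2 * stats.t.sf(t_stat, df=n - 1)  # two-tailed, ~4.09%
    print(se, t_stat, p_value)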

“An estimator is a recipe for obtaining an estimate of a population parameter. A simple analogy
explains the core idea: An estimator is like a recipe in a cook book; an estimate is like a cake
baked according to the recipe.” - Barreto & Howland, Introductory Econometrics

What makes one estimator superior to another?


 Unbiased: The mean of the sampling distribution is the population mean (mu)
 Consistent. When the sample size is large, the uncertainty about the value arising from
random variations in the sample is very small.
 Variance and efficiency. Among all unbiased estimators, the estimator that has the
smallest variance is “efficient.”
If the sample is random (i.i.d.), the sample mean is the Best Linear Unbiased Estimator
(BLUE): The sample mean is consistent, and the most efficient among all linear unbiased
estimators of the population mean.


Chapter Summary
The expectation of the random variable is often called the mean or arithmetic mean.
Expected value is the weighted average of possible values. In the case of a discrete random
variable, expected value is given by:

$$E(X) = p_1 x_1 + p_2 x_2 + \ldots + p_n x_n = \sum_{i=1}^{n} p_i x_i$$

In the case of a continuous random variable, expected value is given by:

$$E(X) = \int x f(x)\,dx$$

If we have a complete data set, then the mean is a population mean which implies that the
mean is exactly the true (and only true) mean:

$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}(x_1 + x_2 + \ldots + x_n)$$

A mean of a subset of the population is called the sample mean.

The variance of a discrete random variable Y is given by:

$$\text{Variance}(Y) = \sigma^2 = E[(Y - \mu)^2] = \sum_{i=1}^{n} p_i (y_i - \mu)^2$$

Variance is also expressed as:

$$\sigma^2 = E[(Y - E(Y))^2] = E(Y^2) - [E(Y)]^2$$

The unbiased estimate of the sample variance is given by:


$$\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \hat{\mu})^2$$

Some of the properties of variance are listed below:


 $\text{var}(c) = 0$, where c is a constant
 $\text{var}(X + Y) = \sigma_X^2 + \sigma_Y^2$ only if independent
 $\text{var}(X - Y) = \sigma_X^2 + \sigma_Y^2$ only if independent
 $\text{var}(X + c) = \sigma_X^2$
 $\text{var}(cX) = c^2\sigma_X^2$
 $\text{var}(cX + d) = c^2\sigma_X^2$
 $\text{var}(aX + bY) = a^2\sigma_X^2 + b^2\sigma_Y^2$ only if independent
 $\text{var}(X) = E(X^2) - [E(X)]^2$


Standard deviation is given by:

$$\sigma = \sqrt{\text{var}(Y)} = \sqrt{E[(Y - \mu)^2]} = \sqrt{\sum_{i=1}^{n} p_i (y_i - \mu)^2}$$

As variance = standard deviation^2, standard deviation = Square Root [variance]

The unbiased estimate of the sample standard deviation is given by:

$$\hat{\sigma} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \hat{\mu})^2}$$

Population covariance can be calculated as:

$$\sigma_{XY} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu_X)(y_i - \mu_Y)$$

Sample covariance is calculated as:

$$\hat{\sigma}_{XY} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \hat{\mu}_X)(y_i - \hat{\mu}_Y)$$

Some of the properties of covariance are listed below:

 If X & Y are independent:

$$\text{cov}(X, Y) = \sigma_{XY} = 0$$
$$\text{cov}(a + bX, c + dY) = bd \cdot \text{cov}(X, Y)$$
$$\text{cov}(X, X) = \text{var}(X) = \sigma_X^2$$

 If X & Y are not independent:

$$\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2 + 2\sigma_{XY}$$
$$\sigma_{X-Y}^2 = \sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}$$

Correlation is the covariance between two variables divided by the product of their respective
standard deviations:

$$\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} \quad \text{where} \quad \sigma_{XY} = \text{cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$$

The correlation coefficient translates covariance into a unitless metric that runs from -1.0 to +1.0.

45
Licensed to Rajvi Sampat at rajvi.sampat@gmail.com. Downloaded August 4, 2019.
The information provided in this document is intended solely for you. Please do not freely distribute.

The mean of a sum of variables is given by $E(X + Y + Z) = \mu_X + \mu_Y + \mu_Z$.

With regard to the sum of correlated variables, the variance of correlated variables is:

$$\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2 + 2\sigma_{XY}, \quad \text{and given that } \sigma_{XY} = \rho_{XY}\sigma_X\sigma_Y:$$

$$\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2 + 2\rho_{XY}\sigma_X\sigma_Y$$

With regard to the difference between correlated variables, the variance is:

$$\sigma_{X-Y}^2 = \sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}, \quad \text{and given that } \sigma_{XY} = \rho_{XY}\sigma_X\sigma_Y:$$

$$\sigma_{X-Y}^2 = \sigma_X^2 + \sigma_Y^2 - 2\rho_{XY}\sigma_X\sigma_Y$$

With regard to the sum of variables with constants (a) and (b), the variance of the sum includes the covariance (X,Y):

$$\text{var}(aX + bY) = a^2\sigma_X^2 + 2ab\,\sigma_{XY} + b^2\sigma_Y^2$$

The kth moment about the mean (), or kth central moment, is given by:

=

=1( − )
or = [( − ) ]
 If k=1, this refers to the first moment about zero: the mean.
 If k=2, this refers to the second moment about the mean: the variance.
 If k=3, then the third moment divided by the cube of the standard deviation returns the
skewness
 If k=4, then the fourth moment about the mean divided by the square of the variance (or fourth power of standard deviation) returns the kurtosis; a.k.a., tail density, peakedness.

Skewness refers to whether a distribution is symmetrical. An asymmetrical distribution is skewed, either positively (to the right) or negatively (to the left).

$$\text{Skewness} = s = \frac{E[(X - \mu)^3]}{\sigma^3}$$

 If a distribution is symmetrical, mean = median = mode.


 If a distribution has positive skew, the mean > median > mode.
 If a distribution has negative skew, the mean < median < mode.

Kurtosis measures the degree of “peakedness” of the distribution, and consequently of the “heaviness of the tails.”

$$\text{Kurtosis} = k = \frac{E[(X - \mu)^4]}{\sigma^4}$$

A normal distribution has relative skewness of zero and kurtosis of three (or excess
kurtosis of zero). Relative skewness > 0 indicates positive skewness. Kurtosis greater than
three (>3) indicates high peaks and fat tails (leptokurtic).


Coskewness and cokurtosis: The third and fourth standardized cross central moments are
referred to as coskewness and cokurtosis, respectively. In general, for (n) random variables, the
number of non-trivial cross-central moments of order (m) is given by:

$$K = \frac{(n + m - 1)!}{m!\,(n - 1)!} - n$$

If the sample is random (i.i.d.), the sample mean is the Best Linear Unbiased Estimator
(BLUE). The sample mean is consistent, and the most efficient among all linear unbiased
estimators of the population mean.


Questions & Answers:


303.1. Assume a continuous probability density function (pdf) is given by f(x) =a*x such that 0 ≤
x ≤ 12, where (a) is a constant (we can retrieve this constant, knowing this is a probability
density function):

$$f(x) = ax \quad \text{s.t.} \quad 0 \leq x \leq 12$$

What is the mean of (x)?


a) 5.5
b) 6.0
c) 8.0
d) 9.3

304.1. Two assets, X and Y, produce only three joint outcomes: Prob[X = -3.0%, Y = -2.0%] =
30%, Prob[X = +1.0%, Y = +2.0%] = 50%, and Prob[X = +5.0%, Y = +3.0%] = 20%:

What is the correlation between X & Y? (Bonus question: if we removed the probabilities and
instead simply treated the three sets of returns as a small, [tiny actually!] historical sample,
would the sample correlation be different?)
a) 0.6330
b) 0.7044
c) 0.8175
d) 0.9286

305.1. A two-asset portfolio contains a long position in commodity (T) with volatility of 10.0%
and a long position in stock (S) with volatility of 30.0%. The assets are uncorrelated: rho(T,S) =
zero (0). What weight (0 to 100%) of the portfolio should be allocated to the commodity if the
goal is a minimum variance portfolio (in percentage terms, as no dollars are introduced)?
a) 62.5%
b) 75.0%
c) 83.3%
d) 90.0%


306.1. In credit risk (Part 2) of the FRM, a single-factor credit risk model is introduced. This
model gives a firm's asset return, r(i), by the following sum of two components:

$$r_i = a_i F + \sqrt{1 - a_i^2}\,\epsilon_i, \qquad \epsilon_i \sim N(0,1)$$

In this model, a(i) is a constant, while (F) and epsilon (e) are random variables. Specifically, (F)
and (e) are standard normal deviates with, by definition, mean of zero and variance of one ("unit
variance"). If the value of a(i) is 0.750 and the covariance[F,e(i)] is 0.30, which is nearest to
variance of the asset return, variance[r(i)]?
a) 0.15
b) 1.30
c) 1.47
d) 1.85

307.1. A bond has a default probability of 5.0%. Which is nearest, respectively, to the skew (S)
and kurtosis (K) of the distribution?
a) S = 0.0, K = 2.8
b) S = 0.8, K = -7.5
c) S = 4.1, K = 18.1
d) S = 18.9, K = 4.2


Answers:

303.1. C. 8.0
If this is a valid probability (pdf) then a*(1/2)*x^2 evaluated over [0,12] must equal one:
a*(1/2)*12^2 = 1.0, and a = 1/72.

Therefore, the pdf function is given by f(x) = x/72 over the domain of [0,12].

The mean = Integral of x*f(x) = x*(1/72)*x = Integral of x^2/72 over [0,12] = x^3/216 over [0,12] =
12^3/216 = 8.0
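
A numerical check (a sketch assuming SciPy) confirms both the constant and the mean:

    from scipy.integrate import quad

    a = 1.0 / 72.0                     # constant that makes the pdf integrate to one
    pdf = lambda x: a * x

    total, _ = quad(pdf, 0, 12)                   # 1.0, so the pdf is valid
    mean, _ = quad(lambda x: x * pdf(x), 0, 12)   # 8.0
    print(total, mean)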

Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-303-mean-and-variance-of-continuous-probability-density-functions-pdf.6783

304.1. D. 0.9286

As Covariance(X,Y) = 0.0520%, StdDev(X) = 2.8%, StdDev(Y) = 2.0%, correlation = 0.0520%/(2.8%*2.0%) = 0.9286. See snapshot below, some key points:
 Variances: first row of Variance(X) = (-3.0% - 0.60%)^2*30%, and variance(X) is sum of
the three probability-weighted squared deviations: 0.0784% =
0.0389%+0.0008%+0.0387%
 Covariance Method 1: first row of Covariance (Method 1) = (-3.0%-0.60%)*(-2.0%-
1.0%)*30%; then Covariance (M 1) is sum of three rows.
 Covariance Method 2: first row of Covariance (Method 2) = -3.0%*-2.0%*30% = 0.0180%, second row = 1.0%*2.0%*50% = 0.010%. Covariance (M2) = 0.0580% - 0.60%*1.0% = 0.0520%. This employs the highly useful Cov(x,y) = E[xy] - E[x]E[y], which includes the special case Cov(x,x) = Variance(x) = E[x^2] - (E[x])^2
Spreadsheet at https://www.dropbox.com/s/c841f0yftlpl4wj/T2.304.1_covariance.xlsx


If we removed probabilities and treated the returns as a (very) small historical sample, the sample correlation is different at 0.945. There are two reasons:
1. The historical sample (by default) treats the observations as equally-weighted; and,
2. A sample correlation divides the sample covariance by sample standard deviations, where (n-1) is used in the denominator instead of (n).

In this way the sample covariance is larger, ceteris paribus, than a population-type covariance, and so are the sample standard deviations.
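
Both numbers can be reproduced with NumPy (a sketch; np.corrcoef applies the equally-weighted sample treatment):

    import numpy as np

    x = np.array([-0.03, 0.01, 0.05])
    y = np.array([-0.02, 0.02, 0.03])
    p = np.array([0.30, 0.50, 0.20])

    # Probability-weighted (population-style) correlation: ~0.9286
    cov = p @ (x * y) - (p @ x) * (p @ y)
    rho = cov / np.sqrt((p @ x**2 - (p @ x)**2) * (p @ y**2 - (p @ y)**2))

    rho_sample = np.corrcoef(x, y)[0, 1]   # equally-weighted sample: ~0.945
    print(rho, rho_sample)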

Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-304-covariance-miller.6791

305.1. D. 90.0%

If w = weight in the commodity, the two-asset portfolio variance is VARP = w^2*10%^2 + (1-w)^2*30%^2 + 2*0*w*(1-w)*10%*30% = w^2*0.01 + (1-w)^2*0.09.

We want the value of (w) that minimizes the portfolio variance, so we take the first derivative with respect to w:
dVARP/dw = d[w^2*0.01 + 0.09*(1 - 2*w + w^2)]/dw = d[w^2*0.01 + 0.09 - 0.18*w + 0.09*w^2]/dw = 0.02*w - 0.18 + 0.18*w = 0.20*w - 0.18. To find the local minimum, we set the first derivative equal to zero and solve for w: let 0 = 0.20*w - 0.18, such that w = 0.18/0.20 = 90.0%.

A portfolio with 90% weight in the commodity and 10% in the stock will have the lowest variance
at 0.0090, which is equal to standard deviation of SQRT(0.0090) = 9.486%;
i.e., lower than either of the asset volatilities. Of course, this optimal mix is variant to changes in
the correlation.

The first derivative can be taken of the generic two-asset portfolio variance such that the weight which produces the minimum variance is given by:

w*(minimum variance) = (sigma2^2 - rho*sigma1*sigma2) / (sigma1^2 + sigma2^2 - 2*rho*sigma1*sigma2)

Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-305-minimum-variance-hedge-miller.6800


306.1. B. 1.30

var(x+y) = var(x) + var(y) + 2*cov(x,y). In this case, x = a*F and y = sqrt(1-a^2)*e, such that:
var[a*F + sqrt(1-a^2)*e] = var(a*F) + var[sqrt(1-a^2)*e] + 2*cov[a*F, sqrt(1-a^2)*e]
= a^2*var(F) + (1-a^2)*var(e) + 2*cov[a*F, sqrt(1-a^2)*e], and since var(F) and var(e) = 1.0, this
is equal to:
= a^2*1.0 + (1-a^2)*1.0 + 2*cov[a*F, sqrt(1-a^2)*e]
= a^2 + 1-a^2 + 2*cov[a*F, sqrt(1-a^2)*e]
= 1.0 + 2*cov[a*F, sqrt(1-a^2)*e]
= 1.0 + 2*a*sqrt(1-a^2)*cov[F,e(i)]; and per cov(a*x,b*y) = a*b*cov(x,y):
= 1.0 + 2*0.75*SQRT(1-0.75^2)*0.30 = 1.2976

Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-306-calculate-the-mean-and-variance-of-sums-of-variables.6810

307.1. C. S = 4.1, K = 18.1


Let X = 0 with prob 95.0% and X = 1 with prob 5.0%, such that mean (X) = 0.050:
3rd central moment = (1-0.05)^3*5% + (0-0.05)^3*95% = 0.04275, such that skew =
0.04275/(5%*95%)^(3/2) = 4.1295 (or -4.1295)
4th central moment = (1-0.05)^4*5% + (0-0.05)^4*95% = 0.04073, such that kurtosis =
0.04073/(5%*95%)^2= 18.053
i.e., excess kurtosis = 18.053 - 3.0 = 15.053
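
A few lines of Python verify these moments (our sketch of the 0/1 default indicator):

    p = 0.05                                        # default probability
    mu, var = p, p * (1 - p)                        # mean 0.05, variance 0.0475

    m3 = (1 - mu)**3 * p + (0 - mu)**3 * (1 - p)    # 3rd central moment = 0.04275
    m4 = (1 - mu)**4 * p + (0 - mu)**4 * (1 - p)    # 4th central moment = 0.04073

    print(m3 / var**1.5)          # skew ~4.13
    print(m4 / var**2)            # kurtosis ~18.05 (excess ~15.05)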

Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-307-skew-and-kurtosis-miller.6825/


End of Chapter Questions & Answers


Question 1:

Compute the mean and the median of the following series of returns:

Answer:

Mean =6.43%; median =5%.

Question 2:

Compute the sample mean and the standard deviation of the following returns:

Answer:

Mean =3%; standard deviation =6.84%.

Question 3:

Prove that Equation 3.2 is an unbiased estimator of the mean. That is, show that $E[\hat{\mu}] = \mu$.

Equation 3.2:

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} r_i = \frac{1}{n}(r_1 + r_2 + \ldots + r_n)$$

Answer:

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} r_i = \frac{1}{n}(r_1 + r_2 + \ldots + r_n)$$

$$E[\hat{\mu}] = E\left[\frac{1}{n}\sum_{i=1}^{n} r_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[r_i] = \frac{1}{n} \cdot n \cdot \mu = \mu$$


Question 4:

What is the standard deviation of the estimator in Equation 3.2? Assume the various data points
are i.i.d.

Equation 3.2:

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} r_i = \frac{1}{n}(r_1 + r_2 + \ldots + r_n)$$

Answer:

Using the results of question 3, we first calculate the variance of the estimator of the mean:

$$E[(\hat{\mu} - \mu)^2] = E\left[\left(\frac{1}{n}\sum_{i=1}^{n} r_i - \mu\right)^2\right] = E\left[\left(\frac{1}{n}\sum_{i=1}^{n}(r_i - \mu)\right)^2\right]$$

$$= \frac{1}{n^2} E\left[\sum_{i=1}^{n}(r_i - \mu)^2 + \sum_{i=1}^{n}\sum_{j \neq i}(r_i - \mu)(r_j - \mu)\right]$$

$$= \frac{1}{n^2}\sum_{i=1}^{n} E[(r_i - \mu)^2] + \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j \neq i} E[(r_i - \mu)(r_j - \mu)]$$

$$= \frac{1}{n^2} \cdot n\sigma^2 + 0 = \frac{\sigma^2}{n}$$

where σ is the standard deviation of r. In the second to last line, we rely on the fact that,
because the data points are i.i.d., the covariance between different data points is zero. We
obtain the final answer by taking the square root of the variance of the estimator:

$$\sigma_{\hat{\mu}} = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}$$


Question 5

Calculate the population covariance and correlation of the following series:

Answer:

Covariance =0.0487; correlation =82.40%.

Question 6

Calculate the population mean, standard deviation, and skewness of each of the following two
series:

Answer:

Series #1: Mean =0, standard deviation =39, skewness =0.


Series #2: Mean =0, standard deviation =39, skewness =–0.63.

Question 7

Calculate the population mean, standard deviation, and kurtosis for each of the following two
series:

Answer:

Series #1: Mean =0, standard deviation =17, kurtosis =1.69.


Series #2: Mean =0, standard deviation =17, kurtosis =1.


Question 8

Given the probability density function for a random variable X,

$$f(x) = \frac{x}{18} \quad \text{for } 0 \leq x \leq 6$$

find the variance of X.

Answer:

The mean, µ, is

$$\mu = \int_0^6 x \cdot \frac{x}{18}\,dx = \left[\frac{x^3}{3 \cdot 18}\right]_0^6 = \frac{6^3}{3 \cdot 18} - 0 = 4$$

The variance, σ2, is then:

$$\sigma^2 = \int_0^6 (x - 4)^2 \frac{x}{18}\,dx = \frac{1}{18}\int_0^6 (x^3 - 8x^2 + 16x)\,dx$$

$$= \frac{1}{18}\left[\frac{x^4}{4} - \frac{8x^3}{3} + 8x^2\right]_0^6 = \frac{1}{18}\left[\frac{6^4}{4} - \frac{8 \cdot 6^3}{3} + 8 \cdot 6^2\right]$$

$$= 2(9 - 16 + 8) = 2$$


Question 9

Prove that Equation 3.19, reproduced here, is an unbiased estimator of variance.

$$\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \hat{\mu})^2$$

$$E[\hat{\sigma}^2] = \sigma^2$$

Answer:

We start by expanding the sum of squared deviations around the sample mean:

$$\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \hat{\mu})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - n\hat{\mu}^2\right]$$

By carefully rearranging terms, we are left with:

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{j \neq i} x_i x_j$$

Assuming that all the different values of X are uncorrelated with each other, we can use the following two relationships:

$$E[x_i^2] = \sigma^2 + \mu^2$$

$$E[x_i x_j] = E[x_i]\,E[x_j] = \mu^2 \quad \forall\; i \neq j \;\text{ (zero covariance)}$$

Then:

$$E[\hat{\sigma}^2] = \frac{1}{n} \cdot n(\sigma^2 + \mu^2) - \frac{1}{n(n-1)} \cdot n(n-1)\mu^2 = \sigma^2 + \mu^2 - \mu^2 = \sigma^2$$


Question 10

Given two random variables, XA and XB, with corresponding means µA and µB and standard
deviations σA and σB, prove that the variance of XA plus XB is:

$$\text{Var}[X_A + X_B] = \sigma_A^2 + \sigma_B^2 + 2\rho_{AB}\sigma_A\sigma_B$$

where ρAB is the correlation between XA and XB.

Answer:

First we note that the expected value of XA plus XB is just the sum of the means:

$$E[X_A + X_B] = E[X_A] + E[X_B] = \mu_A + \mu_B$$

Substituting into our equation for variance, and rearranging, we get:

$$\text{Var}[X_A + X_B] = E[(X_A + X_B - E[X_A + X_B])^2] = E[((X_A - \mu_A) + (X_B - \mu_B))^2]$$

Expanding the squared term and solving:

$$\text{Var}[X_A + X_B] = E[(X_A - \mu_A)^2 + (X_B - \mu_B)^2 + 2(X_A - \mu_A)(X_B - \mu_B)]$$

$$= E[(X_A - \mu_A)^2] + E[(X_B - \mu_B)^2] + 2E[(X_A - \mu_A)(X_B - \mu_B)]$$

$$= \sigma_A^2 + \sigma_B^2 + 2\,\text{Cov}[X_A, X_B]$$

Using our definition of covariance, we arrive at our final answer:

$$\text{Var}[X_A + X_B] = \sigma_A^2 + \sigma_B^2 + 2\rho_{AB}\sigma_A\sigma_B$$

Question 11

A $100 notional, zero coupon bond has one year to expiry. The probability of default is 10%. In
the event of default, assume that the recovery rate is 40%. The continuously compounded
discount rate is 5%. What is the present value of this bond?

Answer:

If the bond does not default, you will receive $100. If the bond does default, you will receive
40% ×$100 =$40. The future value, the expected value of the bond at the end of the year, is
then $94:

$$E[V] = 0.90 \cdot \$100 + 0.10 \cdot \$40 = \$94$$

The present value of the bond is approximately $89.42:

$$PV = \$94 \cdot e^{-0.05} = \$89.42$$
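
In Python (a two-line sketch of the same arithmetic):

    import math

    ev = 0.90 * 100 + 0.10 * (0.40 * 100)   # expected year-end value = $94
    pv = ev * math.exp(-0.05)               # continuous discounting at 5%
    print(ev, round(pv, 2))                 # 94.0, 89.42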


Miller, Chapter 4: Distributions


Describe the key properties of the … and identify common occurrences of each
distribution.

Describe the central limit theorem and the implications it has when combining i.i.d.
random variables.

Describe the properties of independent and identically distributed (i.i.d.) random variables.

Describe a mixture distribution and explain the creation and characteristics of mixture
distributions.

Describe the key properties of the uniform distribution, Bernoulli distribution, Binomial distribution, Poisson distribution, normal distribution, lognormal distribution, Chi-squared distribution, Student’s t and F-distributions, and identify common occurrences of each distribution.
Uniform distribution

If the random variable, X, is discrete, then the uniform distribution is given by the following
probability mass function (pmf):

$$f(x) = \frac{1}{n}$$

This is an extremely simple distribution. Common examples of discrete uniform distributions are:
 A coin, where n=2, such that the probability: P[heads] = 1/2 and P[tails] = 1/2; or
 A six-sided die, where for example, probability of rolling a one is: P[rolling a one] = 1/6

If the random variable, X, is continuous, the uniform distribution is given by the following
probability density function (pdf):

$$f(x) = \begin{cases} \dfrac{1}{\beta_2 - \beta_1} & \text{for } \beta_1 \leq x \leq \beta_2 \\ 0 & \text{for } x < \beta_1 \text{ or } x > \beta_2 \end{cases}$$

Using this pdf, the mean, µ, is calculated as the average of the start and end values of the distribution. Similarly, the variance, σ², is calculated as shown below:

$$\mu = \frac{1}{2}(\beta_1 + \beta_2)$$

$$\sigma^2 = \frac{1}{12}(\beta_2 - \beta_1)^2$$


The uniform distribution is characterized by the following cumulative distribution function (CDF):

$$P[X \leq x] = \frac{x - \beta_1}{\beta_2 - \beta_1}$$

Bernoulli distribution

A random variable X is called Bernoulli distributed with parameter (p) if it has only two
possible outcomes, often encoded as 1 (“success” or “survival”) or 0 (“failure” or “default”), and
if the probability for realizing “1” equals p and the probability for “0” equals 1 – p. The classic
example for a Bernoulli-distributed random variable is the default event of a company.

A Bernoulli variable is discrete and has two possible outcomes:

$$X = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p \end{cases}$$

Binomial distribution

A binomial distributed random variable is the sum of (n) independent and identically distributed
(i.i.d.) Bernoulli-distributed random variables. The probability of observing (k) successes is:

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \quad \text{where} \quad \binom{n}{k} = \frac{n!}{(n-k)!\,k!}$$

The mean of this random variable is np and the variance of a binomial distribution is np(1 − p).

The below exhibit shows binomial distribution with p = 0.10, for n = 10, 50, and 100.
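
SciPy exposes the pmf and moments directly; for instance (a sketch with n = 50 and p = 0.10, one of the cases in the exhibit):

    from scipy.stats import binom

    n, p = 50, 0.10
    print(binom.pmf(5, n, p))    # P[exactly 5 successes (e.g., defaults) out of 50]
    print(binom.mean(n, p))      # np = 5.0
    print(binom.var(n, p))       # np(1-p) = 4.5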


Poisson distribution

A Poisson-distributed random variable is usually used to describe the random number of


events occurring over a certain time interval, for example, the number of bond defaults in a
portfolio or the number of crashes in equity markets. The Poisson distribution depends upon
only one parameter, lambda λ, and can be interpreted as an approximation to the binomial
distribution. The lambda parameter (λ) indicates the rate of occurrence of the random
events; i.e., it tells us how many events occur on average per unit of time and n is the number
of events that occur in an interval.

In the Poisson distribution, the random number of events that occur during an interval of time,
(e.g., losses/ year, failures/ day) is given by:

$$P(X = n) = \frac{\lambda^n}{n!} e^{-\lambda}$$

If the rate at which events occur over time is constant, and the probability of any one event occurring is independent of all other events, then the events follow a Poisson process, where t is the amount of time elapsed (i.e., the expected number of events before time t is equal to λt):

$$P(X = n) = \frac{(\lambda t)^n}{n!} e^{-\lambda t}$$

In Poisson, lambda is both the expected value (the mean) and the variance!

The exhibit below represents Poisson distributions for λ = 2, 4 and 10.
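
For instance (a SciPy sketch, taking a hypothetical λ = 4 events per year and asking about a two-year interval):

    from scipy.stats import poisson

    lam, t = 4.0, 2.0
    print(poisson.pmf(6, lam * t))                # P[exactly 6 events before time t]
    print(poisson.mean(lam), poisson.var(lam))    # mean and variance both equal lambda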


Normal distribution

The normal or Gaussian distribution is often referred to as the bell curve because of the shape
of its probability density function. Characteristics of the normal distribution include:
 The middle of the distribution, mu (µ), is the mean (and median). This first moment is
also called the “location”.
 Standard deviation and variance are measures of dispersion (a.k.a., shape). Variance is
the second-moment; typically, variance is denoted by sigma-squared such that standard
deviation is sigma.
 The distribution is symmetric around µ. In other words, the normal has skewness = 0
 The normal has kurtosis = 3 or “excess kurtosis” = 0

Properties of normal distribution:

 Location-scale invariance: Imagine random variable X, which is normally distributed


with the parameters µ and σ. Now consider random variable Y, which is a linear function
of X, such that: Y = aX + b. In general, the distribution of Y might substantially differ from
the distribution of X, but in the case where X is normally distributed, the random variable
Y is again normally distributed with parameters mean (= a*µ + b) and variance (= a^2*σ^2).
Specifically, we do not leave the class of normal distributions if we multiply the
random variable by a factor or shift the random variable.
 Summation stability: If you take the sum of several independent random variables,
which are all normally distributed with mean (µi) and standard deviation (σi), then the
sum will be normally distributed again.
 The normal distribution possesses a domain of attraction. The central limit theorem
(CLT) states that—under certain technical conditions—the distribution of a large sum of
random variables behaves necessarily like a normal distribution. The normal distribution
is not the only class of probability distributions having a domain of attraction. Actually,
three classes of distributions have this property: they are called stable distributions.
Below is an exhibit of a normal distribution for µ =10 and at various levels of σ (1, 2 and 3)


For a random variable X, the probability density function for the normal distribution is:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$

Conventionally, this is written as X is normally distributed with a mean of µ and variance of σ²:

$$X \sim N(\mu, \sigma^2)$$

The normal distribution is commonplace for at least three (or four) reasons:
 The central limit theorem (CLT) says that sampling distribution of sample means tends
to be normal (i.e., converges toward a normally shaped distributed) regardless of the
shape of the underlying distribution; this explains much of the “popularity” of the normal
distribution.
 The normal is economical (elegant) because it only requires two parameters (mean
and variance). The standard normal is even more economical: it requires no
parameters.
 The normal is tractable: it is easy to manipulate (especially in regard to closed-form
equations like the Black-Scholes)
 Parsimony: It requires (or is fully described by) only two parameters: mean and
variance
It is common to retrieve an historical dataset such as a series of monthly returns and compute
the mean and standard deviation of the series. In some cases, the analyst will stop at that point,
having determined the first and second moments of the data.

Oftentimes, the user is implicitly “imposing normality” by assuming the data is normally
distributed. For example, the user might multiply the standard deviation of the dataset by 1.645
or 2.33 (i.e., normal distribution deviates) in order to estimate a value-at-risk. But notice what
happens in this case: without a test (or QQ-plot, for example) the analyst is merely assuming
normality because the normal distribution is conveniently summarized by only the first two
moments! Many other non-normal distributions have first (aka, location) and second moments
(aka, scale or shape).

In this way, it is not uncommon to see the normal distribution used merely for the sake of
convenience: when we only have the first two distributional moments, the normal is
implied perhaps merely because they are the only moments that have been computed.


Standard normal distribution

A normal distribution is fully specified by two parameters, mean and variance (or standard
deviation). We can transform a normal into a unit or standardized variable:
 Standard normal has mean = 0, and variance = 1
 No parameters required!
This unit or standardized variable is normally distributed with zero mean and variance of
one. Its standard deviation is also one (variance = 1.0 and standard deviation = 1.0). This is
written as:
Variable Z is approximately (“asymptotically”) normally distributed: Z ~ N(0,1)

Standard normal distribution: Critical Z values

Key locations on the normal distribution are noted below. In the FRM curriculum, the choice of
one-tailed 5% significance and 1% significance (i.e., 95% and 99% confidence) is
common, so please pay particular attention to the yellow highlights:

Critical Two-sided One-sided


z values Confidence Significance
1.00 ~ 68% ~ 15.87%
1.645 (~1.65) ~ 90% ~ 5.0 %
1.96 ~ 95% ~ 2.5%
2.327(~2.33) ~ 98% ~ 1.0 %
2.58 ~ 99% ~ 0.5%

Memorize the two common critical values: 1.65 and 2.33. These correspond to
confidence levels, respectively, of 95% and 99% for a one-tailed test. For VAR, the one-
tailed test is relevant because we are concerned only about losses (left-tail) not gains (right-tail).
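
These critical values are just inverse-CDF lookups; a SciPy sketch:

    from scipy.stats import norm

    print(norm.ppf(0.95))    # ~1.645: one-tailed 5% significance / 95% confidence
    print(norm.ppf(0.99))    # ~2.326: one-tailed 1% significance / 99% confidence
    print(norm.ppf(0.975))   # ~1.960: two-sided 95% confidence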

Multivariate normal distributions

Normal can be generalized to a joint distribution of normals; e.g., the bivariate normal distribution. Properties include:
 If X and Y are bivariate normal, then aX + bY is normal;
any linear combination is normal.
 If a set of variables has a multivariate normal distribution,
the marginal distribution of each is normal
 If variables with a multivariate normal distribution have covariances that equal zero, then
the variables are independent.


Common examples of Bernoulli, binomial, normal and Poisson

In the FRM, these four distributions are quite common:


 The Bernoulli is invoked when there are only two outcomes. It is used to characterize
a default: an obligor or bond will either default or survive. Most bonds “survive” each
year, until perhaps one year they default. At any given point in time, or (for example)
during any given year, the bond will be in one of two states.
 The binomial is a series of independent and identically distributed (i.i.d.) Bernoulli variables, such that the binomial is commonly used to characterize a portfolio of credits.
 The normal distribution is the most common:
o Typically, the central limit theorem (CLT) will justify the significance test of the
sample average in a large sample - for example, to test the sample average
asset return or excess return.
o In many cases, due to convenience, the normal distribution is employed to model
equity returns for short horizons; typically this is an assumption made with the
understanding that it may not be realistic.
 The Poisson distribution has two very common purposes:
o Poisson is often used, as a generic stochastic process, to model the time of
default in some credit risk models.
o As a discrete distribution, the Poisson is arguably the most common distribution
employed for operational loss frequency (but not for loss severity, which wants a
continuous distribution).

                  Normal      Binomial        Poisson
Mean              μ           np              λ
Variance          σ²          np(1-p)         λ
Standard Dev.     σ           √(np(1-p))      √λ

 Bernoulli: default (0/1)
 Binomial: basket of credits; basket of credit default swaps (CDS)
 Normal: significance test of large sample average (CLT); short horizon equity returns
 Poisson: operational loss frequency


Lognormal

The lognormal is common in finance: if an asset return (r) is normally distributed, the continuously compounded future asset price level (or ratio of prices; i.e., the wealth ratio) is lognormal. Expressed in reverse, if a variable is lognormal, its natural log is normal. Here is an exhibit of the lognormal distribution for µ = 10 at various levels of σ (0.25, 0.5 and 1).

The lognormal distribution is extremely common in finance because it is often the distribution
assumed for asset prices (e.g., stock prices). Specifically, it is common to assume that log
(i.e., continuously compounded) asset returns are normally distributed such that, by
definition, asset prices have a lognormal distribution.

The density function of the lognormal distribution is given by:

$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{\ln x - \mu}{\sigma}\right)^2}$$

Miller: “If a variable has a lognormal distribution, then the log of that variable has a normal
distribution. So, if log returns are assumed to be normally distributed, then one plus the
standard return will be lognormally distributed.

Unlike the normal distribution, which ranges from negative infinity to positive infinity, the
lognormal distribution is undefined, or zero, for negative values. Given an asset with a standard
return, R, if we model (1 +R) using the lognormal distribution, then R will have a minimum value
of –100%. This feature, which we associate with limited liability, is common to most financial
assets. Using the lognormal distribution provides an easy way to ensure that we avoid returns
less than –100%.

It is convenient to be able to describe the returns of a financial instrument as being lognormally distributed, rather than having to say the log returns of that instrument are normally distributed. When it comes to modeling, though, even though they are equivalent, it is often easier to work with log returns and normal distributions than with standard returns and lognormal distributions.”


Chi-squared distribution

Chi-squared distribution is the sum of the squares of k independent standard normal random variables. The variable k is referred to as the degrees of freedom. The exhibit below shows the probability density functions for some chi-squared distributions with different values of k (1, 2 and 3).

Properties of the chi-squared distribution include:


 Nonnegative (>0), since it is a sum of squared values.
 Skewed right, but as d.f. increases it approaches normal.
 Expected value (mean) = k and variance = 2k, where k = degrees of freedom.
 The sum of two independent chi-square variables is also a chi-squared variable.

Using a chi-square distribution, we can observe a sample variance and compare it to a hypothetical population variance: the test statistic, (n-1) multiplied by the ratio of the sample variance to the population variance, has a chi-square distribution with (n-1) d.f.

Example (Google’s stock return variance): Google’s sample variance over 30 days is
0.0263%. We can test the hypothesis that the population variance (Google’s “true” variance) is
0.02%. The chi-square variable = 38.14:

Sample variance (30 days) 0.0263%


Degrees of freedom (d.f.) 29
Population variance? 0.0200%
Chi-square variable 38.14 = 0.0263%/0.02%*29
=CHIDIST() = p value 11.93% @ 29 d.f., P[.1] = 39.0875
Area under curve (1- ) 88.07%

With 29 degrees of freedom (d.f.), 38.14 corresponds to a p value of 11.93% (i.e., to the left of 0.10 on the lookup table). Therefore, we can reject the null with only 88% confidence; i.e., we are likely to accept the null hypothesis that the true variance is 0.02%.
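
The chi-square statistic and p value can be reproduced as follows (a SciPy sketch of the Google example):

    from scipy.stats import chi2

    s2, sigma2_null, n = 0.000263, 0.000200, 30   # 0.0263% and 0.02% as decimals
    stat = (n - 1) * s2 / sigma2_null             # ~38.14
    p_value = chi2.sf(stat, df=n - 1)             # ~11.9%, so cannot reject at 10%
    print(stat, p_value)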


Student’s t distribution (for large samples, approximates the normal)

The student’s t distribution (t distribution) is among the most commonly used distributions. As
the degrees of freedom (d.f.) increases, the t-distribution converges with the normal distribution.
It is similar to the normal, except that it exhibits slightly heavier tails (the lower the d.f., the heavier the tails). The below exhibit shows the basic shape of the student’s t distribution and how it changes with k (specifically the shape of its tail).

The student’s t variable is given by:

$$t = \frac{Z}{\sqrt{U/k}}$$

where Z is a standard normal variable and U is a chi-squared variable with k degrees of freedom.

Properties of the t-distribution:

 Like the normal, it is symmetrical


 Like the standard normal, it has mean of zero (mean = 0)
 Its variance for k >2 is k/(k-2) where k = degrees of freedom. Note, as k increases, the
variance approaches 1.0 and approximates the standard normal distribution.
 Always slightly heavy-tailed (kurtosis > 3.0) but converges to normal. Still, the student’s t is not considered a really heavy-tailed distribution.
In practice, the student’s t is the most commonly used distribution. When we test the
significance of regression coefficients, the central limit theorem (CLT) justifies the normal
distribution (because the coefficients are effectively sample means). But we rarely know the
population variance, such that the student’s t is the appropriate distribution. When the d.f. is
large (e.g., sample over ~30), as the student’s t approximates the normal, we can use the
normal as a proxy. In the assigned Stock & Watson, the sample sizes are large (e.g., 420
students), so they tend to use the normal.


Example: For example, Google’s average periodic return over a ten-day sample period was
+0.02% with sample standard deviation of 1.54%. Here are the statistics:

Sample Mean 0.02%


Sample Std Dev 1.54%
Days (n=10) 10
Confidence 95%
Significance 5%
Critical t 2.262
Lower limit -1.08%
Upper limit 1.12%

The sample mean is a random variable. If we know the population variance, we assume the
sample mean is normally distributed. But if we do not know the population variance (typically the
case!), the sample mean is a random variable following a student’s t distribution. In the
above example, we can use this to construct a confidence (random) interval:

$$\bar{x} \pm t_{critical} \cdot \frac{s}{\sqrt{n}}$$

We need the critical (lookup) t value. The critical t value is a function of:

 Degrees of freedom (d.f.); e.g., 10-1 =9 in this example


 Significance: 1-95% confidence = 5% in this example

How do we retrieve the critical-t value of 2.262?

The critical-t is just a lookup (reference to) the student's t distribution, as opposed to a computed t-statistic, aka t-ratio. In this way, a critical t is an inverse CDF (quantile function) just like, for a normal distribution, the "critical one-tailed value" at 1% is -2.33 and at 5% is -1.645. In this case we want the critical t for (n-1) degrees of freedom and two-tailed 5% significance (= one-tailed 2.5%). We can find 2.262 on the student's t lookup table where column = 2-tail 0.05 and d.f. = 9. In Excel, 2.262 = T.INV.2T(5%, 9). The 95% confidence interval can then be computed.

The upper limit is given by:

$$0.02\% + 2.262 \times \frac{1.54\%}{\sqrt{10}} = 1.12\%$$

And the lower limit is given by:

$$0.02\% - 2.262 \times \frac{1.54\%}{\sqrt{10}} = -1.08\%$$
Please make sure you can take a sample standard deviation, compute the critical t value, and construct the confidence interval.
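
The critical t and the interval can be reproduced with SciPy (a sketch of the ten-day Google example):

    import math
    from scipy.stats import t

    xbar, s, n = 0.0002, 0.0154, 10              # +0.02% mean, 1.54% std dev
    t_crit = t.ppf(1 - 0.05 / 2, df=n - 1)       # ~2.262 (two-tailed 5%, 9 d.f.)
    half = t_crit * s / math.sqrt(n)
    print(xbar - half, xbar + half)              # ~(-1.08%, +1.12%)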


Both the normal (Z) and student’s t (t) distribution characterize the sampling distribution of the
sample mean. The difference is that the normal is used when we know the population variance;
the student’s t is used when we must rely on the sample variance. In practice, we don’t know
the population variance, so the student’s t is typically appropriate.

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \qquad t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

F-Distribution

The F distribution is also called the variance ratio distribution (it may be helpful to think of it as the variance ratio!). The F ratio is the ratio of sample variances, with the greater sample variance in the numerator:

$$F = \frac{s_1^2}{s_2^2}, \quad s_1^2 \geq s_2^2$$
Properties of F distribution:

 Nonnegative (>0) and skewed to the right


 Like the chi-square distribution, as d.f. increases, it approaches normal
 The square of a variable with a t-distribution and k d.f. has an F distribution with (1, k) d.f.: X² ~ F(1, k)


Example: Based on two 10-day samples, we calculated the sample variance of Google and
Yahoo. Google’s variance was 0.0237% and Yahoo’s was 0.0084%. Find the F ratio.

GOOG YHOO
=VAR() 0.0237% 0.0084%
=COUNT() 10 10
F ratio 2.82
Confidence 90%
Significance 10%
=FINV() 2.44

 The F ratio, therefore, is 2.82 (divide higher variance by lower variance; the F ratio
must be greater than, or equal to, 1.0).
 At 10% significance, with (10-1) and (10-1) degrees of freedom, the critical F value is
2.44. Because our F ratio of 2.82 is greater than (>) 2.44, we reject the null (i.e., that the
population variances are the same).
 We conclude the population variances are different.
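
A SciPy sketch of the same test:

    from scipy.stats import f

    var_goog, var_yhoo, n1, n2 = 0.000237, 0.000084, 10, 10
    f_ratio = var_goog / var_yhoo                  # ~2.82, larger variance on top
    f_crit = f.ppf(1 - 0.10, n1 - 1, n2 - 1)       # ~2.44 at 10% significance
    print(f_ratio, f_crit, f_ratio > f_crit)       # True: reject equal variances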

Triangular Distribution

The triangular distribution is a distribution whose PDF is a triangle, say with a minimum of a, a
maximum of b, and a mode of c. Like the uniform distribution, it has a finite range, but being
only slightly more complex than a uniform distribution, it has more flexibility. The triangular
distribution has a unique mode, and can be symmetric, positively skewed, or negatively skewed.
Its PDF is described by the following two-part function:

⎧ 2( − )
≤ ≤
⎪( − )( − )
( )=
⎨ 2( − )
≤ ≤
⎪( − )( − )

The exhibit shows a triangular distribution where a, b, and c are 0.0, 1.0, and 0.8, respectively.


[Figure: Triangular distribution PDF with three parameters: a = 0, b = 1, and mode c = 0.8.]

 PDF is zero at both a and b, and the value of f(x) reaches a maximum, 2/(b − a), at c.
 The mean and variance are given by:

$$\mu = \frac{a + b + c}{3}$$

$$\sigma^2 = \frac{a^2 + b^2 + c^2 - ab - ac - bc}{18}$$
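
For the parameters in the exhibit (a = 0, b = 1, c = 0.8), a quick sketch:

    a, b, c = 0.0, 1.0, 0.8
    mean = (a + b + c) / 3.0                            # 0.60
    var = (a*a + b*b + c*c - a*b - a*c - b*c) / 18.0    # ~0.0467
    peak = 2.0 / (b - a)                                # pdf maximum of 2.0 at x = c
    print(mean, var, peak)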

Beta distribution

The beta distribution has two parameters: alpha (“center”) and beta (“shape”). The beta
distribution is very flexible, and popular for modeling default and recovery rates.


Example: The beta distribution is often used to model recovery rates. Here are two examples:
one beta distribution to model a junior class of debt (i.e., lower mean recovery) and another for
a senior class of debt (i.e., lower loss given default):

Junior Senior
alpha (center) 2.0 4.0
beta (shape) 6.0 3.3
Mean recovery 25% 55%
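
For a beta distribution, the mean is alpha / (alpha + beta). A minimal SciPy sketch confirming the table's mean recoveries:

from scipy.stats import beta

junior = beta(a=2.0, b=6.0)
senior = beta(a=4.0, b=3.3)
print(junior.mean())  # 0.25 -> 25% mean recovery (junior debt)
print(senior.mean())  # ~0.548 -> ~55% mean recovery (senior debt)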

[Exhibit: Beta distribution PDFs for recovery/LGD, plotted over recovery (residual value) from 0% to 98%; the junior distribution peaks at low recovery, the senior at higher recovery]

Additional distributions: not in syllabus but occasionally relevant


The following distributions are not explicitly assigned in this section (Miller), but have historically been relevant to the FRM to various degrees.

Exponential

The exponential distribution is popular in queuing theory. It is used to model the time we have
to wait until a certain event takes place.

[Exhibit: Exponential PDFs for parameter values 0.5, 1, and 2, plotted over x from 0 to about 4.8]


According to the text, examples include "the time until the next client enters the store, the time until a certain company defaults or the time until some machine has a defect." The exponential PDF, which is nonzero only for positive x, is:

f(x) = λ e^(−λx),  λ = 1/β,  x > 0

Weibull

Weibull is a generalized exponential distribution; i.e., the exponential is a special case of the
Weibull where the alpha parameter equals 1.0.

F(x) = 1 − e^(−(x/β)^α),  x > 0

[Exhibit: Weibull PDFs for (α = 0.5, β = 1), (α = 2, β = 1), and (α = 2, β = 2), plotted over x from 0 to about 4.8]

The main difference between the exponential distribution and the Weibull is that, under the
Weibull, the default intensity depends upon the point in time t under consideration. This allows
us to model the aging effect or teething troubles:

 For α > 1 (also called the "light-tailed" case), the default intensity is monotonically increasing with time, which is useful for modeling the "aging effect" as it happens for machines: the default intensity of a 20-year-old machine is higher than that of a 2-year-old machine.
 For α < 1—the “heavy-tailed” case—the default intensity decreases with increasing
time. That means we have the effect of “teething troubles,” a figurative explanation for
the effect that after some trouble at the beginning things work well, as it is known from
new cars. The credit spread on noninvestment-grade corporate bonds provides a good
example: Credit spreads usually decline with maturity. The credit spread reflects the
default intensity and, thus, we have the effect of “teething troubles.” If the company
survives the next two years, it will survive for a longer time as well, which explains the
decreasing credit spread.
 For α = 1, Weibull distribution reduces to an exponential distribution with parameter β.
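
A minimal SciPy sketch of the α = 1 special case; SciPy's weibull_min uses c for the shape α and scale for β:

from scipy.stats import weibull_min, expon

beta_param, x = 2.0, 1.5
print(weibull_min.cdf(x, c=1.0, scale=beta_param))  # Weibull with alpha = 1
print(expon.cdf(x, scale=beta_param))               # exponential: identical value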


Gamma distribution

The family of Gamma distributions forms a two-parameter probability distribution family with pdf:

f(x) = x^(α−1) e^(−x/β) / (β^α Γ(α)),  x > 0
[Exhibit: Gamma PDFs for (α = 1, β = 1), (α = 2, β = 0.5), and (α = 4, β = 0.25)]

 For alpha = 1, the Gamma distribution becomes the exponential distribution.
 For alpha = k/2 and beta = 2, the Gamma distribution becomes the chi-square distribution with k d.f. (both special cases are checked in the sketch below).
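
A minimal SciPy sketch of both special cases; SciPy's gamma uses a for the shape α and scale for β:

from scipy.stats import gamma, expon, chi2

x, k = 1.2, 4
print(gamma.cdf(x, a=1, scale=2), expon.cdf(x, scale=2))  # alpha = 1 -> exponential
print(gamma.cdf(x, a=k/2, scale=2), chi2.cdf(x, df=k))    # alpha = k/2, beta = 2 -> chi-square(k)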

Logistic

A logistic distribution has heavy tails.

[Exhibit: Logistic PDFs for (α = 0, β = 1), (α = 2, β = 1), and (α = 0, β = 3), compared against the standard normal N(0,1)]


Extreme Value Theory

Measures of central tendency and dispersion (variance, volatility) are impacted more by
observations near the mean than outliers. The problem is that, typically, we are concerned with
outliers; we want to size the likelihood and magnitude of low frequency, high severity (LFHS)
events. Extreme value theory (EVT) solves this problem by fitting a separate distribution to
the extreme tail loss. EVT uses only the tail of the distribution, not the entire dataset.

In applying extreme value theory (EVT), the two general approaches are:
 Block maxima (BM): The classic approach
 Peaks over threshold (POT): The modern approach that is often preferred.

Block maxima

The dataset is parsed into (m) identical, consecutive and non-overlapping periods called blocks.
The length of the block should be greater than the periodicity; e.g., if the returns are daily,
blocks should be weekly or more. Block maxima partitions the set into time-based intervals. It requires that observations be independent and identically distributed (i.i.d.).


Generalized extreme value (GEV) fits block maxima. The Generalized extreme value (GEV)
distribution is given by:

 1
  

exp  (1   y )    0
H ( y )    
 
 y
exp( e )  0

The  (xi) parameter is the “tail index;” it represents the fatness of the tails. In this expression, a
lower tail index corresponds to fatter tails.

[Exhibit: Generalized extreme value (GEV) density, plotted over 0 to 45]
Per the (unassigned) Jorion reading on EVT, the key thing to know here is that (1) among
the three classes of GEV distributions (Gumbel, Frechet, and Weibull), we only care
about the Frechet because it fits to fat-tailed distributions, and (2) the shape parameter
determines the fatness of the tails (higher shape → fatter tails)

Peaks over threshold (POT)

Peaks over threshold (POT) collects the dataset of losses above (or in excess of) some
threshold.


The cumulative distribution function here refers to the probability that the "excess loss" (i.e., the loss, X, in excess of the threshold, u) is less than some value, y, conditional on the loss exceeding the threshold:

F_u(y) = P(X − u ≤ y | X > u)

[Exhibit: loss distribution with the threshold u marked on the loss axis from −4 to +4; losses beyond u form the modeled tail]

The generalized Pareto distribution (GPD), which fits the excess losses, is given by:

G_(ξ,β)(x) = 1 − (1 + ξx/β)^(−1/ξ)   if ξ ≠ 0
G_(ξ,β)(x) = 1 − exp(−x/β)           if ξ = 0

[Exhibit: Generalized Pareto distribution (GPD) density, plotted over 0 to 4]

Block maxima is time-based (i.e., blocks of time), traditional, less sophisticated and more
restrictive in its assumptions (i.i.d.) while peaks over threshold (POT) is more modern,
has at least three variations (semi-parametric, unconditional parametric and conditional
parametric) and is more flexible.

EVT Highlights: Both GEV and GPD are parametric distributions used to model heavy-tails.
GEV (Block Maxima)
 Has three parameters: location, scale and tail index
 If tail > 0: Frechet
GPD (peaks over threshold, POT)
 Has two parameters: scale and tail (or shape)
 But must select threshold (u)


Describe the central limit theorem and the implications it has when
combining i.i.d. random variables.
In brief:
 Law of large numbers: Under general conditions, the sample mean will be near the
population mean.
 Central limit theorem (CLT): As the sample size increases, regardless of the
underlying distribution, the sampling distribution approximates (tends toward) normal.

Central limit theorem (CLT)

We assume a population with a known mean and finite variance, but not necessarily a normal distribution (we may not know the distribution!). Random samples of size (n) are then drawn from the population. The expected value of each sample mean is the population's mean. Further, the variance of each sample mean is equal to the population's variance divided by n (note: this is equivalent to saying the standard deviation of each sample mean is equal to the population's standard deviation divided by the square root of n).

The central limit theorem says that this sample mean (i.e., of sample size n, drawn from the population) is itself approximately normally distributed, regardless of the shape of the underlying population. Given a population described by any probability distribution having mean (μ) and finite variance (σ²), the distribution of the sample mean computed from samples (where each sample equals size n) will be approximately normal. Generally, if the size of the sample is at least 30 (n ≥ 30), then we can assume the sample mean is approximately normal!

Each sample has a sample mean. There are many sample means. The sample means
have variation: a sampling distribution. The central limit theorem (CLT) says the
sampling distribution of sample means is asymptotically normal.
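
A minimal Python sketch of the CLT: sample means drawn from a decidedly non-normal (exponential) population with n = 30 behave approximately normally:

import numpy as np

rng = np.random.default_rng(42)
n, trials = 30, 100_000

# Exponential population with mean 1 and standard deviation 1
sample_means = rng.exponential(scale=1.0, size=(trials, n)).mean(axis=1)

print(sample_means.mean())  # ~1.0, the population mean
print(sample_means.std())   # ~1/sqrt(30) = 0.183, the standard error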


Summary of central limit theorem (CLT):

 We assume a population with a known mean and finite variance, but not necessarily a
normal distribution.
 Random samples (size n) are drawn from the population.
 The expected value of each sample mean is the population mean.
 The distribution of the sample mean computed from samples (where each sample
equals size n) will be approximately (asymptotically) normal.
 The variance of each sample mean is equal to the population variance divided by n
(equivalently, its standard deviation is equal to the population standard deviation
divided by the square root of n).

Sample Statistics and Sampling Distributions

When we draw from (or take) a sample, the sample is a random variable with its own
characteristics. The “standard deviation of a sampling distribution” is called the standard error.
The mean of the sample or the sample mean is a random variable defined by:

x̄ = (x₁ + x₂ + ⋯ + xₙ) / n

Describe independent and identically distributed (i.i.d) random


variables and the implications of the i.i.d. assumption when
combining random variables.
A random sample is a sample of random variables that are independent and identically
distributed (i.i.d.)

Independent and identically distributed (i.i.d.) variables:


 Each random variable has the same (identical) probability distribution (PDF/PMF, CDF).
 Each random variable is drawn independently of the others: no serial or auto-correlation.

The concept of independent and identically distributed (i.i.d.) variables is a key


assumption we often encounter: to scale volatility by the square root of time requires
i.i.d. returns. If returns are not i.i.d., then scaling volatility by the square root of time will
give an incorrect answer (a worked example follows below).
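
A minimal sketch of square-root-of-time scaling under the i.i.d. assumption (252 trading days per year is the usual convention):

from math import sqrt

daily_vol = 0.01                    # assumed 1.0% daily volatility
annual_vol = daily_vol * sqrt(252)  # valid only if returns are i.i.d.
print(f"{annual_vol:.2%}")          # ~15.87%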


Describe a mixture distribution and explain the creation and


characteristics of mixture distributions.
A mixture distribution is a sum of other distribution functions, weighted by probabilities. The density function of a mixture distribution is, then, the probability-weighted sum of the component density functions:

f(x) = Σᵢ wᵢ fᵢ(x),  with Σᵢ wᵢ = 1

where the fᵢ(x) are the component distributions, and the wᵢ are the mixing proportions or weights.

Note: The sum of the component weights must equal one.

For example, consider a stock whose log returns follow a normal distribution with low volatility 90% of the time, and a normal distribution with high volatility 10% of the time. Most of the time the stock just bounces along but occasionally the stock's behavior is more extreme. In Miller's example, the mixture density is the weighted sum of the low- and high-volatility normal densities:

f(x) = 0.90 × f_low(x) + 0.10 × f_high(x)

According to Miller, “Mixture distributions are extremely flexible. In a sense they occupy a
realm between parametric distributions and non-parametric distributions. In a typical mixture
distribution, the component distributions are parametric but the weights are based on empirical
(non-parametric) data. Just as there is a trade-off between parametric distributions and non-
parametric distributions, there is a trade-off between using a low number and a high number of
component distributions. By adding more and more component distributions, we can
approximate any data set with increasing precision. At the same time, as we add more and
more component distributions, the conclusions that we can draw become less and less general
in nature.”

Normal mixture distribution

A mixture distribution is extremely flexible. If two normal distributions with the same mean (but different variances) are combined, they mix to produce a mixture distribution with leptokurtosis (heavy tails); more generally, mixtures can take on a very wide variety of shapes.

So, just by adding two normal distributions together, we can develop a large number of
interesting distributions. For example, if we combine two normal distributions with the same
mean but different variances, we can get a symmetrical mixture distribution that displays excess
kurtosis.
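
A minimal Python sketch of Miller's 90%/10% regime mixture; the 1% and 3% volatilities are illustrative assumptions, and the excess kurtosis comes out positive (heavy tails):

import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(7)
n = 1_000_000
low_vol, high_vol = 0.01, 0.03  # assumed low- and high-volatility regimes

calm = rng.random(n) < 0.90     # low-volatility regime 90% of the time
returns = np.where(calm, rng.normal(0, low_vol, n), rng.normal(0, high_vol, n))
print(kurtosis(returns))        # excess kurtosis > 0; a normal would be ~0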


By shifting the mean of one distribution, we can also create a distribution with positive or negative skew. Finally, if we move the means far enough apart, the resulting mixture distribution will be bimodal; that is, its PDF has two distinct maxima.


Chapter Summary
A parametric distribution can be described by a mathematical function, for example, the
normal distribution. A nonparametric distribution cannot be summarized by a mathematical
formula; in its simplest form it is “just a collection of data.”

If the random variable, X, is continuous, the uniform distribution is given by

f(x) = 1 / (x₂ − x₁)   for x₁ ≤ x ≤ x₂
f(x) = 0               for x < x₁ or x > x₂

If the random variable, X, is discrete, the uniform distribution is given by

f(x) = 1/n

A random variable X is called Bernoulli distributed with parameter (p) if it has only two
possible outcomes.

A binomial distributed random variable is the sum of (n) independent and identically distributed
(i.i.d.) Bernoulli-distributed random variables. The probability of observing (k) successes is given
by:

P(X = k) = C(n, k) pᵏ (1 − p)ⁿ⁻ᵏ,  where C(n, k) = n! / ((n − k)! k!)

The Poisson distribution depends upon only one parameter, lambda λ. The random number of
events that occur during an interval of time, (e.g., losses/ year, failures/ day) is given by:

P(X = n) = (λⁿ / n!) e^(−λ)

In Poisson, lambda is both the expected value (the mean) and the variance.

Characteristics of the normal distribution include:


 The middle of the distribution, mu (µ), is the mean (and median).
 Standard deviation and variance are measures of dispersion.
 The distribution is symmetric around µ.
 The normal has skew = 0
 The normal has kurtosis = 3 or “excess kurtosis” = 0

Properties of normal distribution include location-scale invariance, summation stability and


possessing a domain of attraction.


A normal distribution can be transformed into a unit or standardized variable that has mean =
0, and variance = 1 and requires no parameters.

Common examples:
 The Bernoulli distribution is used to characterize default.
 The binomial distribution is commonly used to characterize a portfolio of credits.
 The normal distribution is used to test the sample average asset return and to model
equity returns for short horizons.
 The Poisson distribution is used to model the time of default in credit risk models and
to calculate operational loss frequency.
 If a variable has a lognormal distribution, then the log of that variable has a normal
distribution.
 The exponential distribution is used to model the time we have to wait until a certain
event takes place.

Weibull is a generalized exponential distribution; i.e., the exponential is a special case of the
Weibull where the alpha parameter equals 1.0.

F(x) = 1 − e^(−(x/β)^α),  x > 0

The family of Gamma distributions forms a two-parameter probability distribution family with
the density function (pdf) given by:

f(x) = x^(α−1) e^(−x/β) / (β^α Γ(α)),  x > 0

The beta distribution has two parameters: alpha (“center”) and beta (“shape”). The beta
distribution is popular for modeling recovery rates.

Extreme value theory (EVT) fits a separate distribution to the extreme loss tail. EVT uses
only the tail of the distribution, not the entire dataset. In applying extreme value theory (EVT),
two general approaches are:
 Block maxima (BM). The classic approach
 Peaks over threshold (POT). The modern approach that is often preferred.
Both GEV and GPD are parametric distributions used to model heavy-tails.

GEV (Block Maxima)


 Has three parameters: location, scale and tail index
 If tail > 0: Frechet
GPD (peaks over threshold, POT)
 Has two parameters: scale and tail (or shape)
 But must select threshold (u)


Central limit theorem (CLT): As the sample size increases, regardless of the underlying distribution, the sampling distribution approximates (tends toward) the normal. CLT says the sampling distribution of sample means is asymptotically normal.

Independent and identically distributed (i.i.d.) variables:


 Each random variable has the same (identical) probability distribution (PDF/PMF, CDF)
 Each random variable is drawn independently of the others: no serial- or auto-correlation

A mixture distribution is a sum of other distribution functions but weighted by probabilities.


The density function of a mixture distribution is the probability-weighted sum of the component density functions:

f(x) = Σᵢ wᵢ fᵢ(x),  where the fᵢ(·) are the component densities and the weights wᵢ sum to one


Questions & Answers:


309.1. Next month, the short interest rate will be either 200 basis points with probability of
28.0%, or 300 basis points. What is nearest to the implied rate volatility?
a) 17.30 bps
b) 44.90 bps
c) 83.50 bps
d) 117.70 bps

309.2. At the start of the year, a stock price is $100.00. A twelve-step binomial model describes
the stock price evolution such that each month the extremely volatile price will either jump from
S(t) to S(t)*u with 60.0% probability or down to S(t)*d with 40.0% probability. The up jump (u) =
1.1 and the down jump (d) = 1/1.1; note these (u) and (d) parameters correspond to an annual
volatility of about 33% as exp[33%*SQRT(1/12)] ~= 1.10. At the end of the year, which is
nearest to the probability that the stock price will be exactly $121.00?
a) 0.33%
b) 3.49%
c) 12.25%
d) 22.70%

310.1. A large bond portfolio contains 100 obligors. The average default rate is 4.0%. Analyst
Joe assumes defaults follow a Poisson distribution but his colleague Mary assumes the defaults
instead follow a binomial distribution. If they each compute the probability of exactly four (4)
defaults, which is nearest to the difference between their computed probabilities?
a) 0.40%
b) 1.83%
c) 3.55%
d) 7.06%


311.1. George the analyst creates a model, displayed below, which generates two series of
random but correlated asset returns. Both asset prices begin at a price of $10.00 with a periodic
mean return of +1.0%. Series #1 has periodic volatility of 10.0% while Series #2 has periodic
volatility of 20.0%. The desired correlation of the simulated series is 0.80. Each series steps
according to a discrete version of geometric Brownian motion (GBM) where price(t+1) = price (t)
+ price(t)*(mean + volatility*standard random normal). Two standard random normals are
generated at each step, X(1) and X(2), but X(2) is transformed into correlated Y(1) with Y(1) =
rho*X(1) + SQRT(1 - rho^2)*X(2), such that Y(1) informs Series #2. The first five steps are
displayed below:

At the fourth step, when the Series #1 Price = $10.81, what is Y(1) and the Series #2 Price [at
Step 4], both of which cells are highlighted in orange above?
a) -0.27 and $9.08
b) +0.55 and $9.85
c) +0.99 and $11.33
d) +2.06 and $12.40

312.1. A random variable X has a density function that is a normal mixture with two independent
components: the first normal component has an expectation (mean) of 4.0 with variance of 16.0;
the second normal component has an expectation (mean) of 6.0 with variance of 9.0. The
probability weight on the first component is 0.30 such that the weight on the second component
is 0.70. What is the probability that X is less than zero; i.e., Prob [X<0]?
a) 0.015%
b) 1.333%
c) 6.352%
d) 12.487%


Answers:

309.1. B. 44.90 bps

Expected rate = 28%*200 + 72%*300 = 272, and


Variance = (200-272)^2*28% + (300-272)^2*72% = 2,016.0 bps^2, such that
Standard deviation = SQRT(2,016) = 44.90 basis points.

Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-309-probability-


distributions-i-miller-chapter-4.7025

309.2. D. 22.70%

There are 13 outcomes at the end of the 12-step binomial, with $100 as the outcome that must
correspond to six up jumps and six down jumps. Therefore, $121.0 must be the outcome due to
seven up jumps and five down jumps: $100*1.1^7*(1/1.1)^5 = $121.00
Such that we want the binomial probability given by:
Binomial Prob [X = 7 | n = 12, p = 60%] = 22.70%.

Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-309-probability-


distributions-i-miller-chapter-4.7025

310.1. A. 0.40%

Binomial Prob [X = 4 | n = 100 and p = 4%] = 19.939%, and


Poisson Prob [X = 4 | lambda = 100*4%] = 19.537%, such that difference = 0.4022%.

Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-310-probability-


distributions-ii-miller-chapter-4.7036

311.1. C. 0.99 and $11.33

Correlated Series #2 = 0.80*1.02 + SQRT(1-0.80^2)*0.28 = 0.99; i.e., the standard random


normal 0.28 is transformed into another, correlated standard random normal of 0.99.
The Series #2 Price [Step 4] = $9.38 + 9.38*(1% + 0.99*20%) = $11.33

Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-311-probability-


distributions-iii-miller.7066

312.1. C. 6.352%

Because the normal mixture distribution function is a probability-weighted sum of its component
distribution functions, it is true that:
Prob(mixture)[X < 0] = 0.30*Prob(1st component)[X < 0] + 0.70*Prob(2nd component)[X < 0].
In regard to the 1st component, Z = (0-4)/sqrt(16) = -4/4 = -1.0.
In regard to the 2nd component, Z = (0-6)/sqrt(9) = -6/3 = -2.0. Such that:
Prob(mixture)[X<0] = 0.30*[Z < -1.0] + 0.70*Prob[Z < -2.0],
Prob(mixture)[X<0] = 0.30*15.87% + 0.70*2.28% = 6.352%.

Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-312-mixture-


distributions.7103


End of Chapter Questions & Answers


Question 1:

XYZ Corporation announces its earnings four times per year. Based on historical data, you
estimate that in any given quarter the probability that XYZ Corporation’s earnings will exceed
consensus estimates is 30%. Also, the probability of exceeding the consensus in any one
quarter is independent of the outcome in any other quarter. What is the probability that XYZ
Corporation will exceed estimates three times in a given year?

Answer:
The number of times XYZ Corporation exceeds consensus estimates follows a binomial distribution; therefore:

P[X = 3] = C(4, 3) × 0.30³ × 0.70¹ = 4 × 0.027 × 0.70 = 7.56%

Question 2

The market risk group at your firm has developed a value at risk (VaR) model. In Chapter 7 we
examine VaR models more closely. In the meantime, assume the probability of an exceedance
event on any given day is 5%, and the probability of an exceedance event occurring on any
given day is independent of an exceedance event having occurred on any previous day. What is
the probability that there are two exceedances over 20 days?

Answer:
The number of exceedance events follows a binomial distribution; therefore:

P[X = 2] = C(20, 2) × 0.05² × 0.95¹⁸ = 18.87%

Question 3

Assume the annual returns of Fund A are normally distributed with a mean and standard
deviation of 30%. The annual returns of Fund B are also normally distributed, but with a mean
and standard deviation of 40%. The returns of both funds are independent of each other. What
is the mean and standard deviation of the difference of the returns of the two funds, Fund B
minus Fund A? At the end of the year, Fund B has returned 80%, and Fund A has lost 12%.
How likely is it that Fund B outperforms Fund A by this much or more?

Answer:
Because the annual returns of both funds are normally distributed and independent, the
difference in their returns is also normally distributed:

(R_B − R_A) ~ N(μ_B − μ_A, σ_A² + σ_B²)

The mean of this distribution is 10%, and the standard deviation is 50% (= √(30%² + 40%²)). At the end of the year, the realized difference in returns is 92%. This is 82% above the mean, or 1.64 standard deviations. Using Excel or consulting the table of confidence levels in the chapter, we see that this is a rare event. The probability of a move of more than 1.64 standard deviations is only 5%.


Question 4

The number of defaults per month in a large bond portfolio follows a Poisson process. On
average, there are two defaults per month. The number of defaults is independent from one
month to the next. What is the probability that there are five defaults over five months? Ten
defaults? Fifteen defaults?

Answer:

The average number of defaults over five months is λ = 10; therefore:

P[X = 5] = (10⁵ / 5!) e⁻¹⁰ = 3.78%

P[X = 10] = (10¹⁰ / 10!) e⁻¹⁰ = 12.51%

P[X = 15] = (10¹⁵ / 15!) e⁻¹⁰ = 3.47%

Question 5

The annual returns of an emerging markets bond fund have a mean return of 10% and a
standard deviation of 15%. Your firm invests $200 million into the fund. What is the probability of
losing more than $18.4 million? Assume the returns are normally distributed, and ignore the
limited liability constraint (i.e., the impossibility of losing more than the initial $200 million
investment).

Answer:

If the returns of the fund are normally distributed with a mean of 10% and a standard deviation
of 15%, then the returns of $200 million invested in the fund are also normally distributed, but
with an expected return of $20 million and a standard deviation of $30 million. A loss of $18.4
million represents a –1.28 standard deviation move:

z = (−$18.4 − $20) / $30 = −1.28

This is a one-tailed problem. By consulting the table of confidence intervals or using a


spreadsheet, we determine that just 10% of the normal distribution lies below –1.28 standard
deviations.


Question 6

The annual returns of an emerging markets exchange-traded fund (ETF) have an expected
return of 20.60% and a standard deviation of 30.85%. You are asked to estimate the likelihood
of extreme return scenarios. Assume the returns are normally distributed. What is the probability
that returns are worse than −30%?

Answer:

The return of −30% is approximately a −1.64 standard deviation event:

z = (−30% − 20.60%) / 30.85% = −1.64

According to the table of confidence intervals, 5% of the normal distribution lies below –1.64
standard deviations. The probability of a return less than –30% is then 5%.

Question 7

For a uniform distribution with a lower bound x₁ and an upper bound x₂, prove that the formulas for calculating the mean and variance are:

μ = (x₁ + x₂) / 2

σ² = (x₂ − x₁)² / 12

Answer:

For the mean:

μ = ∫ from x₁ to x₂ of c·x dx = c [x²/2] from x₁ to x₂ = (c/2)(x₂² − x₁²)

From a previous example, we know that c = 1/(x₂ − x₁); therefore:

μ = (x₂² − x₁²) / (2(x₂ − x₁)) = (x₂ − x₁)(x₂ + x₁) / (2(x₂ − x₁)) = (x₁ + x₂) / 2

For the variance:

σ² = ∫ from x₁ to x₂ of c·(x − μ)² dx = c ∫ (x² − 2μx + μ²) dx = c [x³/3 − μx² + μ²x] from x₁ to x₂

Substituting in for c and μ from above:

σ² = (1/(x₂ − x₁)) [ (1/3)(x₂³ − x₁³) − (1/2)(x₁ + x₂)(x₂² − x₁²) + (1/4)(x₁ + x₂)²(x₂ − x₁) ]

For the final step, we need to know that:

x₂³ − x₁³ = (x₂ − x₁)(x₂² + x₁x₂ + x₁²)

Substituting in and solving, we have:

σ² = (1/12)(x₂ − x₁)²
12

Question 8

Prove that the normal distribution is a proper probability distribution. That is, show that:

(1/(σ√(2π))) ∫ e^(−(x−μ)²/(2σ²)) dx = 1

You may find it necessary to use the Gaussian integral:

∫ e^(−u²) du = √π

Answer:

Use the substitution u = (x − μ)/(σ√2), so that du = dx/(σ√2) and dx = σ√2 du:

(1/(σ√(2π))) ∫ e^(−(x−μ)²/(2σ²)) dx = (1/(σ√(2π))) ∫ e^(−u²) σ√2 du = (1/√π) ∫ e^(−u²) du = (1/√π) × √π = 1


Question 9

Prove that the mean of the normal distribution, as specified in Equation 4.12, is µ. That is, show that:

(1/(σ√(2π))) ∫ x e^(−(x−μ)²/(2σ²)) dx = μ

Answer:

Using the same substitution as in the previous question, u = (x − μ)/(σ√2), so that x = σ√2·u + μ and dx = σ√2 du:

(1/(σ√(2π))) ∫ x e^(−(x−μ)²/(2σ²)) dx = (1/√π) ∫ (σ√2·u + μ) e^(−u²) du

= (σ√2/√π) ∫ u e^(−u²) du + (μ/√π) ∫ e^(−u²) du

The first integral is zero, because u·e^(−u²) is an odd function; the second is the Gaussian integral, √π. Therefore:

(1/(σ√(2π))) ∫ x e^(−(x−μ)²/(2σ²)) dx = (μ/√π) × √π = μ


Question 10

Prove that the variance of a normal distribution, as specified in Equation 4.12, is σ². You may find the following result useful:

∫ u² e^(−u²) du = (1/2)√π

Answer:

Using the same substitution as before, u = (x − μ)/(σ√2), so that (x − μ)² = 2σ²u² and dx = σ√2 du:

E[(x − μ)²] = (1/(σ√(2π))) ∫ (x − μ)² e^(−(x−μ)²/(2σ²)) dx = (2σ²/√π) ∫ u² e^(−u²) du

Using this result, we achieve the desired result:

E[(x − μ)²] = (2σ²/√π) × (1/2)√π = σ²


Question 11

Prove that the correlation between X_A and X_B is ρ, where:

X_A = √ρ·X₁ + √(1 − ρ)·X₂
X_B = √ρ·X₁ + √(1 − ρ)·X₃

and X₁, X₂, and X₃ are uncorrelated standard normal variables.

Answer:

First we note that the mean of X_A is zero:

E[X_A] = E[√ρ·X₁ + √(1 − ρ)·X₂] = √ρ·E[X₁] + √(1 − ρ)·E[X₂] = √ρ·0 + √(1 − ρ)·0 = 0

Similarly, the mean of X_B is zero. Next, we want to calculate the variance. In order to do that, it will be useful to know two relationships. First we rearrange the equation for variance, Equation 3.20, to get:

E[Xᵢ²] = σ² + E[Xᵢ]² = 1 + 0 = 1 for i = 1, 2, 3

Similarly, we can rearrange our equation for covariance, Equation 3.26, to get:

E[XᵢXⱼ] = Cov[Xᵢ, Xⱼ] + E[Xᵢ]·E[Xⱼ] = 0 + 0·0 = 0 ∀ i ≠ j

With these results in hand, we now show that the variance of X_A is one:

Var[X_A] = E[X_A²] − E[X_A]² = E[X_A²]

E[X_A²] = E[ρX₁² + 2√(ρ(1 − ρ))·X₁X₂ + (1 − ρ)X₂²]

E[X_A²] = ρ·E[X₁²] + 2√(ρ(1 − ρ))·E[X₁X₂] + (1 − ρ)·E[X₂²]

E[X_A²] = ρ·1 + 2√(ρ(1 − ρ))·0 + (1 − ρ)·1 = 1

The variance of X_B is similarly 1. Next we calculate the covariance of X_A and X_B:

Cov[X_A, X_B] = E[X_A·X_B] − E[X_A]·E[X_B] = E[X_A·X_B]

E[X_A·X_B] = E[ρX₁² + √(ρ(1 − ρ))·(X₁X₂ + X₁X₃) + (1 − ρ)X₂X₃]

E[X_A·X_B] = ρ·E[X₁²] + √(ρ(1 − ρ))·(E[X₁X₂] + E[X₁X₃]) + (1 − ρ)·E[X₂X₃]

E[X_A·X_B] = ρ·1 + √(ρ(1 − ρ))·(0 + 0) + (1 − ρ)·0 = ρ

Putting the last two results together completes the proof:

Corr[X_A, X_B] = Cov[X_A, X_B] / √(Var[X_A]·Var[X_B]) = ρ / √(1·1) = ρ


Question 12

Imagine we have two independent uniform distributions, A and B. A ranges between −2 and −1,
and is zero everywhere else. B ranges between +1 and +2, and is zero everywhere else. What
are the mean and standard deviation of a portfolio that consists of 50% A and 50% B? What are
the mean and standard deviation of a portfolio where the return is a 50/50 mixture distribution of
A and B?

Answer:

For the portfolio consisting of 50% A and 50% B, we can proceed two ways. The PDF of the portfolio is a triangle, from −0.5 to +0.5, with height of 2.0 at 0. We can argue that the mean is zero based on geometric arguments. Also, because both distributions are just standard uniform variables shifted by a constant, they must have variance of 1/12; 50% of each asset would have a variance of 1/4 this amount, and, only because the variables are independent, we can add the variances, giving us:

σ² = 2 × (1/4) × (1/12) = 1/24

σ = √(1/24) = 1/(2√6)

Alternatively, we could calculate the mean and variance by integration:


μ = ∫ from −0.5 to 0 of x(2 + 4x) dx + ∫ from 0 to 0.5 of x(2 − 4x) dx

μ = [x² + (4/3)x³] from −0.5 to 0 + [x² − (4/3)x³] from 0 to 0.5 = 0

σ² = ∫ from −0.5 to 0 of x²(2 + 4x) dx + ∫ from 0 to 0.5 of x²(2 − 4x) dx

σ² = [(2/3)x³ + x⁴] from −0.5 to 0 + [(2/3)x³ − x⁴] from 0 to 0.5 = 1/24

This confirms our earlier answer.

For the 50/50 mixture distribution, the PDF is bimodal and symmetrical around zero, giving a mean of zero:

μ = 0.5 ∫ from −2 to −1 of x dx + 0.5 ∫ from 1 to 2 of x dx = 0.5 [x²/2] from −2 to −1 + 0.5 [x²/2] from 1 to 2

μ = 0.5 × (1/2) × (1 − 4 + 4 − 1) = 0


For the variance we have:

σ² = 0.5 ∫ from −2 to −1 of x² dx + 0.5 ∫ from 1 to 2 of x² dx = 0.5 [x³/3] from −2 to −1 + 0.5 [x³/3] from 1 to 2

σ² = (1/6) × (−1 + 8 + 8 − 1) = 7/3

σ = √(7/3)

Notice that, while the mean is the same, the variance for the mixture distribution is significantly
higher.


Miller, Chapter 6 (pp. 113-124 only): Bayesian Analysis


Describe Bayes’ theorem and apply this theorem in the calculation of conditional
probabilities.

Compare the Bayesian approach to the frequentist approach.

Apply Bayes’ theorem to scenarios with more than two possible outcomes.

Describe Bayes’ theorem and apply this theorem in the calculation of


conditional probabilities
Bayes theorem shows how a conditional probability, denoted P(B|A), may be combined with the
prior unconditional probability P(A) to obtain the posterior probability P(A|B):

P[A|B] = (P[B|A] ⋅ P[A]) / P[B] = (P[B|A] ⋅ P[A]) / (P[B|A] ⋅ P[A] + P[B|A′] ⋅ P[A′])

For example: Assume two bonds, Bond A and Bond B, each with a 10% probability of
defaulting over the next year. Further assume that the probability that both bonds default is 6%,
and that the probability that neither bond defaults is 86%. The ensuing probability matrix shows that the probability that only Bond A defaults (the exclusive or) is 4%, and likewise for only Bond B.

Without the application of Bayes, the unconditional (marginal) probability that bond A
defaults is 10%. This is the probability without any prior information.

In Bayes terminology, this is called a prior probability. The prior is denoted P[A] in the formula
above. The prior is the unconditional probability which assumes no additional information or
evidence. Next, we will observe the evidence, denoted P[B] in the formula above, that allows us
to update from the prior probability to a posterior probability, denoted P[A|B].

In summary, Bayes uses the evidence P[B] to update from prior P[A] to the posterior P[A|B].


To apply Bayes Theorem: Assume in the example we are given additional information (aka,
evidence). Specifically, we are told that bond B has defaulted. Now, what is the probability that
Bond A defaults, given that Bond B has defaulted? Bayes’ Theorem solves for a conditional
probability. In this case, Bond B defaults in 10% of the scenarios, but the probability that both
Bond A and Bond B default is only 6%. In other words, Bond A defaults in 60% of the scenarios
in which Bond B defaults.

P[A|B] = P[A ∩ B] / P[B] = (P[B|A] ⋅ P[A]) / P[B] = (60.0% × 10.0%) / 10.0% = 60%

Because P(B) is itself the sum of two possible outcomes, the denominator can be expanded and we get the elaborate version of the Bayes' formula:

P[A|B] = (P[B|A] ⋅ P[A]) / (P[B|A] ⋅ P[A] + P[B|A′] ⋅ P[A′]) = (60.0% × 10.0%) / (60.0% × 10.0% + 4.4% × 90.0%) = 60%

Here is the two-step binomial tree for this example. Note that the four terminal nodes are
mutually exclusive and cumulatively exhaustive. Specifically:

10.0%*60.0% + 10.0%*40.0% + 90.0%*4.4% + 90.0%*95.6% = 6% + 4% + 4% + 86% = 100%.

Unconditional Conditional

60.0% B defaults
10.0% defaults
40.0% B survives

Bond A

4.4% B defaults
90.0% survives
95.6% B survives

In summary:
 The prior is the unconditional (marginal) probability that bond A will default, P(A)= 10%
 The posterior probability that bond A will default, conditional on the observation that
bond B defaulted, is P(A|B) = 60%
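
A minimal Python sketch of this update, computed both directly and via the expanded denominator:

p_a, p_b, p_ab = 0.10, 0.10, 0.06          # P(A), P(B), P(A and B)

p_b_given_a = p_ab / p_a                   # 60%
p_b_given_not_a = (p_b - p_ab) / (1 - p_a) # 4%/90% ~= 4.4%

direct = p_ab / p_b
expanded = (p_b_given_a * p_a) / (p_b_given_a * p_a + p_b_given_not_a * (1 - p_a))
print(direct, expanded)                    # both 0.60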

Miller’s Sample Problem #1: Imagine there is a disease that afflicts just 1 in every 100 people
in the population. A new test has been developed to detect the disease that is 99% accurate.
That is, for people with the disease, the test correctly indicates that they have the disease in
99% of cases. Similarly, for those who do not have the disease, the test correctly indicates that
they do not have the disease in 99% of cases. If a person takes the test and the result of the
test is positive, what is the probability that he or she actually has the disease?


Assuming 10,000 trials and using notations T+ and T- for those testing positive and negative for
the test respectively, we have the table below. If you check the numbers, you’ll see 1% of the
population with the disease, and 99% accuracy in each column.

 The unconditional probability of a positive test is 1.98%, which is simply the probability of
a positive test being produced by somebody with the disease plus the probability of a
positive test being produced by somebody without the disease.
 We can then calculate the probability of having the disease given a positive test,
P[H’|T+] using Bayes’ theorem. As shown in the calculations in the table there is only a
50.0% chance that the person who tests positive actually has the disease.
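
A minimal Python sketch of this calculation:

p_disease = 0.01
p_pos_given_disease = 0.99  # test accuracy for the sick
p_pos_given_healthy = 0.01  # false-positive rate for the healthy

p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)
print(p_pos)                                    # 1.98% unconditional positive rate
print(p_pos_given_disease * p_disease / p_pos)  # 0.50 posterior probability of disease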

The exhibit below shows the binomial tree for this example.

Unconditional Conditional

99.0% Positive
1.0% Disease
1.0% Negative

Actual

1.0% Positive
99.0% Healthy
99.0% Negative


Sample Problem #2: Based on an examination of historical data, all fund managers of Astra
Fund of Funds fall into one of two groups: 1) Stars are the best managers and the probability
that a star will beat the market in any given year is 75%. 2) Ordinary, nonstar managers, by
contrast, are just as likely to beat the market as they are to underperform it.

For both types of managers, the probability of beating the market is independent from one year
to the next. Stars are rare and given a pool of managers, only 16% turn out to be stars. A new
manager was added to your portfolio three years ago. Since then, the new manager has beaten
the market every year.
1. What was the probability that the manager was a star when the manager was first
added to the portfolio?
2. What is the probability that this manager is a star now?
3. After observing the manager beat the market over the past three years, what is the
probability that the manager will beat the market next year?


 The probability that a manager beats the market given that the manager is a star is
75%(=12/16). The probability that a nonstar manager will beat the market is
50%(=42/84)
 To answer the first question: At the time the new manager was added to the portfolio,
the probability that the manager was a star was just the probability of any manager being
a star, 16%, the unconditional probability.
 To answer the second question: For this, we first need the probability or likelihood of the
manager beating the market, assuming that the manager was a star, P[3B|S] which is
the probability that a star beats the market in any given year to the third power,
calculated as 42.19%. We finally need to find the probability that the manager is a star,
given that the manager has beaten the market three years in a row. Using Bayes
theorem and as shown in the table, this probability P[S|3B] is calculated as 39.13%.
 To answer the last question: The probability that the manager beats the market next
year is just the probability that a star would beat the market plus the probability that a
nonstar would beat the market, weighted by our new beliefs. Our updated belief about
the manager being a star is 39.13%, so the probability that the manager is not a star
must be 60.87%. Using these, the probability P[B] is then calculated to be 59.78%
Summary: Thus, when using Bayes’ theorem to update beliefs, we often refer to prior and
posterior beliefs and probabilities. In this sample problem, the prior probability, that is, before
seeing the manager beat the market three times, our belief that the manager was a star was
16%. The posterior probability, that is, after seeing the manager beat the market three times,
our belief that the manager was a star was 39.13%.
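
A minimal Python sketch reproducing all three answers:

p_star = 0.16
p_beat_star, p_beat_nonstar = 0.75, 0.50

lik_star = p_beat_star ** 3        # P[3B|S] = 42.19%
lik_nonstar = p_beat_nonstar ** 3  # P[3B|S'] = 12.50%

p_3b = lik_star * p_star + lik_nonstar * (1 - p_star)
posterior_star = lik_star * p_star / p_3b  # 39.13%
p_beat_next = posterior_star * p_beat_star + (1 - posterior_star) * p_beat_nonstar
print(posterior_star, p_beat_next)         # 0.3913, 0.5978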

Bayes’ Theorem: Example #1, GARP’s 2011 Practice Exam, Part I, Question #5

Question: John is forecasting a stock’s performance in 2010 conditional on the state of the
economy of the country in which the firm is based. He divides the economy’s performance into
three categories of “GOOD”, “NEUTRAL” and “POOR” and the stock’s performance into three
categories of “increase”, “constant” and “decrease”.

He estimates:
 The probability that the state of the economy is GOOD is 20%. If the state of the
economy is GOOD, the probability that the stock price increases is 80% and the
probability that the stock price decreases is 10%.
 The probability that the state of the economy is NEUTRAL is 30%. If the state of the
economy is NEUTRAL, the probability that the stock price increases is 50% and the
probability that the stock price decreases is 30%.
 If the state of the economy is POOR, the probability that the stock price increases is
15% and the probability that the stock price decreases is 70%.
Billy, his supervisor, asks him to estimate the probability that the state of the economy is
NEUTRAL given that the stock performance is constant. John’s best assessment of that
probability is closest to what?

Answer (given by GARP):

Use Bayes’ Theorem: P(NEUTRAL | Constant) = P(Constant | Neutral) * P(Neutral) /


P(Constant) = 0.2 * 0.3 / (0.1 * 0.2 + 0.2 * 0.3 + 0.15 * 0.5) = 0.387


Alternative approach to the same answer: This may be easier to follow if we represent the question's assumptions as a matrix of joint probabilities.

Bayes’ Theorem: Example #2, GARP’s 2011 Practice Exam, Part 2, Question #5

Question: John is forecasting a stock’s price in 2011 conditional on the progress of certain
legislation in the United States Congress. He divides the legislative outcomes into three
categories of “Passage”, “Stalled” and “Defeated” and the stock’s performance into three
categories of “increase”, “constant” and “decrease” and estimates the following events:

                                          Passage   Stalled   Defeated
Prob of legislative outcome                 20%       50%       30%
Prob of increase in stock price,
given legislative outcome                   10%       40%       70%
Prob of decrease in stock price,
given legislative outcome                   60%       30%       10%

A portfolio manager would like to know that if the stock price does not change in 2011, what the
probability that the legislation passed is. Based on John’s estimates, this probability is:


Answer:

The probability that the stock price is constant, given each legislative outcome, is 1 − P(increase) − P(decrease): 30% (Passage), 30% (Stalled), and 20% (Defeated). Using Bayes' theorem:

P(Passage | Constant) = P(Constant | Passage) × P(Passage) / P(Constant)
= (0.30 × 0.20) / (0.30 × 0.20 + 0.30 × 0.50 + 0.20 × 0.30) = 0.06 / 0.27 ≈ 22.2%

Compare the Bayesian approach to the frequentist approach.


Imagine you are an analyst given daily profit and loss (P&L) data for a fund. The fund had
positive returns for 560 of the past 1,000 trading days. What is the probability that the fund will
generate a positive return tomorrow? Without further instructions, it is tempting to say that the
probability is 56% (560/1,000 = 56%). This approach, which draws a conclusion only on the basis of the observed frequency (here, of positive results), is called the frequentist approach.

Take another example of a coin that you believed was fair, with a 50% chance of landing heads
or tails when flipped. If you flip the coin 10 times and it lands heads each time, you might start to
suspect that the coin is not fair. Ten heads in a row could happen, but the odds of seeing 10 heads in a row with a fair coin are only 1 in 1,024, since (1/2)¹⁰ = 1/1,024.

How do you update your beliefs after seeing 10 heads? If you believed there was a 90%
probability that the coin was fair before you started flipping, then after seeing 10 heads your
belief that the coin is fair should probably be somewhere between 0% and 90%. You believe it is
less likely that the coin is fair after seeing 10 heads (so less than 90%), but there is still some
probability that the coin is fair (so greater than 0%). Based on these priors (unconditional or
prior probabilities), Bayes’ theorem provides a framework for deciding exactly what our new
beliefs should be.

Like the frequentist approach, the Bayesian approach (as seen in the coin example) also counts the number of positive results. The conclusion differs because the Bayesian approach starts with a prior belief about the probability and updates it to arrive at conditional (posterior) probabilities.


Which approach is better – Bayesian or frequentist?

It is hard to say:
 Proponents of Bayesian analysis point to the absurdity of the frequentist approach when
applied to small data sets. For instance, it is not justified to conclude the probability of a
positive result is 100% after observing three out of three positive (3/3) results.
 Proponents of the frequentist approach point to the arbitrariness of Bayesian priors. How
did we arrive at our priors? In most cases the prior is either subjective or based on
frequentist analysis.
 Most practitioners tend to take a more balanced view, realizing that there are situations
that lend themselves to frequentist analysis and others that lend themselves to Bayesian
analysis.
o When there is very little data, we tend to prefer Bayesian analysis. When we
have lots of data, the conclusions of frequentist and Bayesian analysis are often
similar, and the frequentist results are often easier to calculate.
o In risk management, performance analysis and stress testing are examples of
areas where we often have very little data, and the data we do have is very
noisy. These areas are likely to lend themselves to Bayesian analysis.

Apply Bayes’ theorem to scenarios with more than two possible


outcomes.
Many-state problems

In the two previous sample problems, each variable could exist in only one of two states: a
person either had the disease or did not have the disease; a manager was either a star or a
nonstar. Bayesian analysis can be easily extended to any number of possible outcomes, as shown in the examples below; we are not limited to two outcomes.

Sample Problem #1: Suppose there are three types of managers: the underperformers beat
the market only 25% of the time, the in-line performers beat the market 50% of the time, and the
outperformers beat the market 75% of the time. Our prior belief is that a manager has a 60%
probability of being an in-line performer, a 20% chance of being an underperformer, and a 20%
chance of being an outperformer. If the manager beats the market two years in a row what
should our updated beliefs be?


Given these assumptions, the binomial tree diagram and associated probability matrix are
shown below:

 Our prior beliefs can be summarized as: P[MO] = 20%, P[MI] = 60% and P[MU] = 20%.
 To calculate the updated beliefs, we start by calculating the likelihoods, the probability of
beating the market two years in a row, for each type of manager as shown in the
calculations below. So, P[2B|MO] = 56.25%, P[2B|MI] = 25.00% and P[2B|MU] = 6.25%
 Hence, the unconditional probability of observing the manager beat the market two years
in a row, given our prior beliefs is P[2B] =27.50%
P[2B] = P[2B|MO] × P[MO] + P[2B|MI] × P[MI] + P[2B|MU] × P[MU]


 Using Bayes’ theorem, for example, we can calculate our posterior belief that the
manager is an outperformer as 40.91%:
P[MO|2B] = (P[2B|MO] × P[MO]) / P[2B] = (0.5625 × 0.20) / 0.275 = 40.91%
 Similarly, we can show that the posterior probability that the manager is an in-line
performer is 54.55% and that the posterior probability that the manager is an
underperformer is 4.55%.
 As we would expect, given that the manager beat the market two years in a row, the
posterior probability that the manager is an outperformer has increased, from 20% to
40.91%, and the posterior probability that the manager is an underperformer has
decreased, from 20% to 4.55%.
 Although the probabilities changed, the sum of the probabilities remains equal to 100%.
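
A minimal Python sketch of this three-state update:

priors = {"out": 0.20, "inline": 0.60, "under": 0.20}
p_beat = {"out": 0.75, "inline": 0.50, "under": 0.25}

liks = {m: p ** 2 for m, p in p_beat.items()}    # P[2B | manager type]
p_2b = sum(liks[m] * priors[m] for m in priors)  # 27.50%
posteriors = {m: liks[m] * priors[m] / p_2b for m in priors}
print(posteriors)  # {'out': 0.4091, 'inline': 0.5455, 'under': 0.0455}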

Sample Problem #2: Using the same prior distributions as in the prior example, what would the
posterior probabilities be for an underperformer, an in-line performer, or an outperformer if
instead of beating the market two years in a row, the manager beat the market in 6 of the
next 10 years?


 Our prior beliefs are P[MO] = 20%, P[MI] = 60% and P[MU] = 20%.
 Using a shortcut method, for each type of manager, the likelihood of beating the market 6 times out of 10 can be determined using a binomial distribution:

P[6B|M] = C(10, 6) × p⁶ × (1 − p)⁴

 P[6B|MO] = C(10, 6) × 0.75⁶ × (1 − 0.75)⁴ = 14.60%. Similarly, for in-line performers and underperformers this probability is found to be 20.51% and 1.62%, respectively.
 Hence, the unconditional probability of observing the manager beat the market in 6 of the 10 years, given our prior beliefs, is P[6B] = 15.55% (see learning XLS for details):

P[6B] = P[6B|MO] × P[MO] + P[6B|MI] × P[MI] + P[6B|MU] × P[MU]

 Using Bayes' theorem, for example, we can calculate our posterior belief that the manager is an outperformer as 18.78%:

P[MO|6B] = (P[6B|MO] × P[MO]) / P[6B] = (0.146 × 0.20) / 0.1555 = 18.78%
 Similarly, we can show that the posterior probability that the manager is an in-line
performer is 79.13% and that the posterior probability that the manager is an
underperformer is 2.09%.
 So, the probability that the manager is an in-line performer has increased from 60% to
79.13%. The probability that the manager is an outperformer decreased slightly from
20% to 18.78%. It now seems very unlikely that the manager is an underperformer
(2.09% probability compared to our prior belief of 20%).

Below is a summary of the update from prior probabilities (e.g., unconditional probability a
manager underperforms is 20.0%) to posterior probabilities (e.g., if a manager beats the market
six out of ten years then he/she has only a 2.0% probability of being an underperformer).


End of Chapter Questions & Answers


Question 1:

The probability that gross domestic product (GDP) decreases is 20%. The probability that
unemployment increases is 10%. The probability that unemployment increases given that GDP
has decreased is 40%. What is the probability that GDP decreases given that unemployment
has increased?

Answer:

P[GDP down | unemployment up] = (P[unemployment up | GDP down] ⋅ P[GDP down]) / P[unemployment up]

P[GDP down | unemployment up] = (40% ⋅ 20%) / 10% = 80%

Question 2:

An analyst develops a model for forecasting bond defaults. The model is 90% accurate. In other
words, of the bonds that actually default, the model identifies 90% of them; likewise, of the
bonds that do not default, the model correctly predicts that 90% will not default. You have a
portfolio of bonds, each with a 5% probability of defaulting. Given that the model predicts that a
bond will default, what is the probability that it actually defaults?

Answer:

32.14%. By applying Bayes’ theorem, we can calculate the result:

P[actual = D | model = D] = (P[model = D | actual = D] ⋅ P[actual = D]) / P[model = D]

P[actual = D | model = D] = (90% ⋅ 5%) / (90% ⋅ 5% + 10% ⋅ 95%) = 32.14%

Even though the model is 90% accurate, 95% of the bonds don’t default, and of those 95% the
model predicts that 10% of them will default. Within the bond portfolio, the model identifies 9.5%
of the bonds as likely to default, even though they won’t. Of the 5% of bonds that actually
default, the model correctly identifies 90%, or 4.5% of the portfolio. This 4.5% correctly identified
is overwhelmed by the 9.5% incorrectly identified.


Question 3:

As a risk analyst, you are asked to look at EB Corporation, which has issued both equity and
bonds. The bonds can either be downgraded, be upgraded, or have no change in rating. The
stock can either outperform the market or underperform the market. You are given the following
probability matrix from an analyst who had worked on the company previously, but some of the
values are missing. Fill in the missing values. What is the conditional probability that the bonds
are downgraded given that the equity has underperformed?

Answer:

We can start by summing across the first row to get W:

W + 5% = 15%
W = 10%

In a similar fashion, we can find X by summing across the second row:

45% + X = 65%
X = 20%

To calculate Y, we can sum down the first column, using our previously calculated value for W:

W + 45% + Y = 10% + 45% + Y = 60%
Y = 5%

Using this result, we can sum across the third row to get Z:

Y + 15% = 5% + 15% = Z
Z = 20%



The completed probability matrix is:

The last part of the question asks us to find the conditional probability, which we can express
as:

P[Downgrade | Underperform]

We can solve this by taking values from the completed probability matrix. The equity
underperforms in 40% of scenarios. The equity underperforms and the bonds are downgraded
in 15% of scenarios. Dividing, we get our final answer, 37.5%.

P[Downgrade | Underperform] = P[Downgrade ∩ Underperform] / P[Underperform]

P[Downgrade | Underperform] = 15% / 40% = 37.5%

Question 4:

Your firm is testing a new quantitative strategy. The analyst who developed the strategy claims
that there is a 55% probability that the strategy will generate positive returns on any given day.
After 20 days the strategy has generated a profit only 10 times. What is the probability that the
analyst is right and the actual probability of positive returns for the strategy is 55%? Assume
that there are only two possible states of the world: either the analyst is correct, or the
strategy is equally likely to gain or lose money on any given day. Your prior assumption was that
these two states of the world were equally likely.

Answer:

The prior probabilities are:

P[p = 0.55] = 50%

P[p = 0.50] = 50%

The probability of the strategy generating 10 positive returns over 20 days if the analyst is
correct is:

P[10+ | p = 0.55] = C(20, 10) ∙ 0.55^10 ∙ 0.45^10


The unconditional probability of 10 positive returns is:

P[10+] = P[10+ | p = 0.55] ∙ P[p = 0.55] + P[10+ | p = 0.50] ∙ P[p = 0.50]

P[10+] = C(20, 10) ∙ 0.55^10 ∙ 0.45^10 ∙ 0.50 + C(20, 10) ∙ 0.50^10 ∙ 0.50^10 ∙ 0.50

P[10+] = 0.50 ∙ C(20, 10) ∙ (0.55^10 ∙ 0.45^10 + 0.50^10 ∙ 0.50^10)

To get our final answer, the probability that p = 0.55 given the 10 positive returns, we use Bayes'
theorem:

P[p = 0.55 | 10+] = P[10+ | p = 0.55] ∙ P[p = 0.55] / P[10+]

P[p = 0.55 | 10+] = (C(20, 10) ∙ 0.55^10 ∙ 0.45^10 ∙ 0.50) / (0.50 ∙ C(20, 10) ∙ (0.55^10 ∙ 0.45^10 + 0.50^10 ∙ 0.50^10))

P[p = 0.55 | 10+] = 0.55^10 ∙ 0.45^10 / (0.55^10 ∙ 0.45^10 + 0.50^10 ∙ 0.50^10)

P[p = 0.55 | 10+] = 1 / (1 + (100/99)^10) = 47.49%

The final answer is 47.49%. The strategy generated a profit in only 10 out of 20 days, so our
belief in the analyst’s claim has decreased. That said, with only 20 data points, it is hard to tell
the difference between a strategy that generates profits 55% of the time and a strategy that
generates profits 50% of the time. Our belief decreased, but not by much.
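A short Python sketch can confirm this result; note that the use of math.comb and the rounding are our own illustration, not part of Miller's text:

from math import comb

# Question 4 check: P[p = 0.55 | 10 wins in 20 days] with 50/50 priors
prior = 0.50
lik_55 = comb(20, 10) * 0.55**10 * 0.45**10  # P[10+ | p = 0.55]
lik_50 = comb(20, 10) * 0.50**10 * 0.50**10  # P[10+ | p = 0.50]

posterior = lik_55 * prior / (lik_55 * prior + lik_50 * prior)
print(round(posterior, 4))  # 0.4749, i.e., 47.49%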


Question 5:

Your firm has created two equity baskets. One is procyclical, and the other is countercyclical.
The procyclical basket has a 75% probability of being up in years when the economy is up, and
a 25% probability of being up when the economy is down or flat. The probability of the economy
being down or flat in any given year is only 20%. Given that the procyclical index is up, what is
the probability that the economy is also up?

Answer:

The final answer is 92.31%. Use + to signify the procyclical index being up, G to signify that the
economy is up (growing), and Ḡ to signify that the economy is down or flat (not growing). We
are given the following information:

P[+ | G] = 75%

P[+ | Ḡ] = 25%

P[Ḡ] = 20%

We are asked to find P[G | +]. Using Bayes' theorem, we have:

P[G | +] = P[+ | G] ∙ P[G] / P[+]

We were not given P[G], but we know the economy must be either growing or not growing;
therefore:

P[G] = 1 − P[Ḡ] = 80%

We can also calculate the unconditional probability that the index is up, P[+]:

P[+] = P[+ | G] ∙ P[G] + P[+ | Ḡ] ∙ P[Ḡ]

P[+] = 75% ∙ 80% + 25% ∙ 20%

P[+] = 60% + 5% = 65%

Putting it all together, we arrive at our final answer:

P[G | +] = P[+ | G] ∙ P[G] / P[+] = (75% ∙ 80%) / 65%

P[G | +] = 60% / 65% = 92.31%


Question 6:

You are an analyst at Astra Fund of Funds, but instead of believing that there are two or three
types of portfolio managers, your latest model classifies managers into four categories.
Managers can be underperformers, in-line performers, stars, or superstars. In any given year,
these managers have a 40%, 50%, 60%, and 80% chance of beating the market, respectively.
In general, you believe that managers are equally likely to be any one of the four types of
managers. After observing a manager beat the market in three out of five years, what do you
believe the probability is that the manager belongs in each of the four categories?

Answer:

The prior beliefs for beating the market in any given year are:

P[p = 0.40] = 1/4

P[p = 0.50] = 1/4

P[p = 0.60] = 1/4

P[p = 0.80] = 1/4

The probability of beating the market three out of five years is:

P[3 | p = p_i] = C(5, 3) ∙ p_i^3 ∙ (1 − p_i)^2

Given a constant, c, the posterior probability can be defined as:

P[p = p_i | 3] = c ∙ P[3 | p = p_i] ∙ P[p = p_i]

P[p = p_i | 3] = c ∙ C(5, 3) ∙ p_i^3 ∙ (1 − p_i)^2 ∙ (1/4)

We know that all of the posterior probabilities must add to one:

Σ P[p = p_i | 3] = 1

c ∙ C(5, 3) ∙ (1/4) ∙ Σ p_i^3 ∙ (1 − p_i)^2 = 1

c = 4 / (C(5, 3) ∙ Σ p_i^3 ∙ (1 − p_i)^2)

The posterior probabilities are then:

P[p = p_i | 3] = (4 / (C(5, 3) ∙ Σ p_j^3 ∙ (1 − p_j)^2)) ∙ C(5, 3) ∙ p_i^3 ∙ (1 − p_i)^2 ∙ (1/4)

P[p = p_i | 3] = p_i^3 ∙ (1 − p_i)^2 / Σ p_j^3 ∙ (1 − p_j)^2

To get the final answer, we simply substitute in the four possible values for p_i. For example, the
posterior probability that the manager is an underperformer is:

P[p = 0.40 | 3] = 0.40^3 ∙ (1 − 0.40)^2 / Σ p_j^3 ∙ (1 − p_j)^2

P[p = 0.40 | 3] = 0.40^3 ∙ 0.60^2 / (0.40^3 ∙ 0.60^2 + 0.50^3 ∙ 0.50^2 + 0.60^3 ∙ 0.40^2 + 0.80^3 ∙ 0.20^2)

P[p = 0.40 | 3] = 4^3 ∙ 6^2 / (4^3 ∙ 6^2 + 5^3 ∙ 5^2 + 6^3 ∙ 4^2 + 8^3 ∙ 2^2)

P[p = 0.40 | 3] = 2,304 / 10,933 = 21.1%

The other three probabilities can be found in a similar fashion. The final answer is that the
posterior probabilities of the manager being an underperformer, an in-line performer, a star, or a
superstar are 21.1%, 28.6%, 31.6%, and 18.7%, respectively. Interestingly, even though the
manager beat the market 60% of the time, the manager is almost as likely to be an
underperformer or an in-line performer (49.7% probability) as a star or a superstar (50.3%
probability).
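The same mechanics extend naturally to any number of categories. Below is a Python sketch of the four-category posterior (our own illustration; as shown above, the binomial coefficient and the uniform prior cancel in the ratio):

from math import comb

# Question 6 check: posterior over four manager types after 3 wins in 5 years
ps = [0.40, 0.50, 0.60, 0.80]   # P[beat market] for each manager type
prior = 0.25                    # uniform prior over the four types

# Binomial likelihood of beating the market 3 out of 5 years, for each p
liks = [comb(5, 3) * p**3 * (1 - p)**2 for p in ps]

total = sum(lik * prior for lik in liks)
posteriors = [lik * prior / total for lik in liks]
print([round(q, 3) for q in posteriors])  # [0.211, 0.286, 0.316, 0.187]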


Question 7:

You have a model that classifies Federal Reserve statements as either bullish or bearish. When
the Fed makes a bullish announcement, you expect the market to be up 75% of the time. The
market is just as likely to be up as it is to be down or flat, but the Fed makes bullish
announcements 60% of the time. What is the probability that the Fed made a bearish
announcement, given that the market was up?

Answer:

10%. You are given the following:

P[+ | Bull] = 75%

P[+] = 50%

P[Bull] = 60%

You are asked to find P[Bear | +]. A direct application of Bayes' theorem will not work. Instead
we need to use the fact that the Federal Reserve's statement must be either bearish or bullish,
no matter what the market does; therefore:

P[Bear | +] = 1 − P[Bull | +] = 1 − P[+ | Bull] ∙ P[Bull] / P[+]

P[Bear | +] = 1 − (75% ∙ 60%) / 50% = 1 − 45% / 50%

P[Bear | +] = 1 − 9/10 = 10%


Question 8:

You are monitoring a new strategy. Initially, you believed that the strategy was just as likely to
be up as it was to be down or flat on any given day, and that the probability of being up was
fairly close to 50%. More specifically, your initial assumption was that the probability of being up,
p, could be described by a beta distribution, β(4,4). Over the past 100 days, the strategy has
been up 60 times. What is your new estimate for the distribution of the parameter p? What is the
probability that the strategy will be up the next day?

Answer:
Because the prior distribution is a beta distribution and the likelihood can be described by a
binomial distribution, we know the posterior distribution must also be a beta distribution. Further,
we know that the parameters of the posterior distribution can be found by adding the number of
successes to the first parameter, and the number of failures to the second. In this problem the
initial distribution was β(4,4) and there were 60 successes (up days), and 100 – 60 =40 failures.
Therefore, the final distribution is β(64,44). The mean of a beta distribution, β(a,b), is simply
a/(a + b). The mean of our posterior distribution is then:

μ = a/(a + b) = 64/(64 + 44) = 64/108 = 59.26%

We therefore believe there is a 59.26% probability that the strategy will be up tomorrow.
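Because the beta-binomial update is purely mechanical (add successes to a, failures to b), it is easy to sketch in Python; this illustration is ours, not Miller's:

# Question 8 check: beta(4,4) prior updated with 60 up days and 40 down days
a_prior, b_prior = 4, 4
successes, failures = 60, 100 - 60

a_post, b_post = a_prior + successes, b_prior + failures  # beta(64, 44)
posterior_mean = a_post / (a_post + b_post)               # a / (a + b)
print(round(posterior_mean, 4))  # 0.5926, i.e., 59.26%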

Question 9:

For the Bayesian network in Exhibit 6.9, each node can be in one of three states: up, down, or
no change. How many possible states are there for the entire network? What is the minimum
number of probabilities needed to completely define the network?

Answer:
There are 27 possible states for the network: 3^3 = 27. The minimum number of probabilities
needed to define the network is 22. As an example, we could define P[A = up] and P[A = unchanged]
for node A, which would allow us to calculate P[A = down] = 1 − P[A = up] − P[A = unchanged].
Similarly, we could define two probabilities for node B. For node C, there are nine
possible input combinations (each of three possible states for A can be combined with three
possible states from B). For each combination, we can define two conditional probabilities and
infer the third. For example, we could define P[C = up | A = up, B = up] and P[C = unchanged | A = up, B = up],
which would allow us to calculate P[C = down | A = up, B = up] = 1 − P[C = up | A = up, B = up] − P[C = unchanged | A = up, B = up].
This gives us a total of 22 probabilities that we need to define: 2 + 2 + 9 × 2 = 22.
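A quick sketch of the counting argument (our own illustration):

# Question 9 check: 3 nodes (A, B, C), each with 3 states; C has parents A and B
states = 3
network_states = states ** 3              # 27 joint states

free_a = states - 1                       # 2 free probabilities for node A
free_b = states - 1                       # 2 free probabilities for node B
free_c = 3 * 3 * (states - 1)             # 9 parent combinations x 2 each = 18
print(network_states, free_a + free_b + free_c)  # 27 22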


Question 10:

Calculate the correlation matrix for Network 1, the network on the left, in Exhibit 6.6. Start by
calculating the covariance matrix for the network.

Answer:

The correlation matrix for the first network is:

Question 11:

Calculate the correlation matrix for Network 2, the network on the right, in Exhibit 6.6. Start by
calculating the covariance matrix for the network.

Answer:

The correlation matrix for the second network is:


Miller, Chapter 7: Hypothesis Testing and Confidence Intervals
Calculate and interpret the sample mean and sample variance.

Define and construct a confidence interval.

Define and construct an appropriate null and alternative hypothesis, and calculate the
test statistic.

Differentiate between a one-tailed and a two-tailed test and explain the circumstances in
which to use each test.

Interpret the results of hypothesis tests with a specific level of confidence.

Demonstrate the process of backtesting VaR by calculating the number of exceedances

Calculate and interpret the sample mean and sample variance.


The sample mean is given by:

μ̂ = (1/n) Σ x_i

The sample mean can be viewed as the sum of n i.i.d. random variables, x_i/n, each with a mean
of µ/n and a standard deviation of σ/n.

The sample variance is given by:

σ̂² = Σ (x_i − μ̂)² / (n − 1)

The mean of the sample mean is the true mean µ of the distribution: E[μ̂] = µ

The variance of the sample mean, if σ² is the true variance of the data generating process, is:

Var(μ̂) = σ²/n

The variance of our sample mean doesn't just decrease with the sample size; it decreases in a
predictable way, in inverse proportion to the sample size. It follows that the standard deviation of the
sample mean decreases with the square root of n.

The standard deviation of the sample mean (the standard error) is given by:

σ_μ̂ = σ/√n


Define and construct a confidence interval.


If we first standardize our estimate of the sample mean using the sample standard deviation, the
new random variable follows a Student’s t distribution with (n − 1) degrees of freedom, where
the numerator is simply the difference between the sample mean and the population mean,
while the denominator is the sample standard deviation divided by the square root of the sample
size.

t = (μ̂ − µ) / (σ̂/√n)

By looking up the appropriate values for the t distribution, we can establish the probability that
our t-statistic is contained within a certain range:

P[x_L ≤ (μ̂ − µ)/(σ̂/√n) ≤ x_U] = γ

where x_L and x_U are constants, which, respectively, define the lower and upper bounds of the
range within the t distribution, and γ is the probability or the confidence level that our t-statistic
will be found within that range.

Rather than working directly with the confidence level, we often work with (1 – γ), which is the
significance level and is often denoted by α. The smaller the confidence level is, the higher the
significance level.

The population mean is normally unknown, and we rearrange the equation so that:

P[μ̂ − x_U ∙ σ̂/√n ≤ µ ≤ μ̂ + x_U ∙ σ̂/√n] = γ
This confidence interval is the probability that the population mean will be contained
within the defined range.

Typically, the confidence interval uses the product of [standard error × critical “lookup” t]:

μ̂ − t ∙ σ̂/√n ≤ µ ≤ μ̂ + t ∙ σ̂/√n
This confidence interval is a random interval. Why? Because it will vary randomly with each
sample, whereas we assume the population mean is static.

The confidence level is selected by the user; e.g., 95% (0.95) or 99% (0.99) and significance
= 1 – confidence level.

We don’t say the probability is 95% that the “true” population mean lies within this
interval. That implies the true mean is variable. Instead, we say the probability is 95% that
the random interval contains the true mean. See how the population mean is trusted to
be static and the interval varies?
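This coverage interpretation is easy to see by simulation. The following Python sketch (our own, assuming a normal population with a known true mean of zero, and using scipy for the critical t) repeatedly draws samples and counts how often the random interval contains the fixed population mean:

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
mu, n, trials = 0.0, 20, 10_000
t_crit = stats.t.ppf(0.975, df=n - 1)  # two-tailed 95% critical t

hits = 0
for _ in range(trials):
    x = rng.normal(mu, 1.0, n)
    se = x.std(ddof=1) / np.sqrt(n)    # sample standard error of the mean
    if x.mean() - t_crit * se <= mu <= x.mean() + t_crit * se:
        hits += 1
print(hits / trials)  # close to 0.95: ~95% of the random intervals cover mu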


Define and construct an appropriate null and alternative hypothesis, and calculate an appropriate test statistic.
A two-tailed null and alternate hypothesis takes the form:

H0: µ = µ0
H1: µ ≠ µ0

In many cases, in practice, the test is a significance test such that it is often assumed that both
(i) the null is two-tailed and further (ii) that the null hypothesis is that the estimate is equal to
zero. Symbolically, then, the following is a very common test:

H0: µ = 0
H1: µ ≠ 0

As Miller says, “in many scientific fields where positive and negative deviations are equally
important, two-tailed confidence levels are the more prevalent. In risk management, more often
than not, we are more concerned with the probability of bad outcomes, and this concern
naturally leads to one-tailed tests.”

A one-tailed test specifies a single direction: the null is rejected only if the estimate is
significantly above, or only if it is significantly below, the hypothesized value. For example, the
following null hypothesis is not rejected if the estimate is greater than the value c; we are here
concerned with deviations in one direction only:

H0: µ ≥ c
H1: µ < c

Example - EOC Problem #1: Given the following data sample, how confident can we be that
the mean is greater than 40?

121
Licensed to Rajvi Sampat at rajvi.sampat@gmail.com. Downloaded August 4, 2019.
The information provided in this document is intended solely for you. Please do not freely distribute.

To construct a confidence interval with the dataset above:


 Determine degrees of freedom (d.f.). d.f. = sample size – 1. In this case, it is 9 d.f.
 Select confidence. In this case, confidence coefficient = 0.95 = 95%
 We are constructing an interval, so we need the critical t value for 5% significance with
two-tails.
 The critical t value is equal to 2.262. That’s the value with 9 d.f. and either 2.5% one-
tailed significance or 5% two-tailed significance (see how they are the same provided the
distribution is symmetrical?)
 The standard error is equal to the sample standard deviation divided by the square root
of the sample size (not d.f.!). In this case, it is 9.260.
 The lower limit of the confidence interval is 24.1 which is calculated as sample mean
minus the critical t multiplied by the standard error.
 The upper limit of the confidence interval is 65.9, calculated as the sample mean plus
the critical t multiplied by the standard error.
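The same interval can be reproduced programmatically from the reported summary statistics for this sample (mean 45.0, standard deviation 29.3, n = 10); the scipy call below is our illustration of the critical-t "lookup":

from scipy import stats

mean, sd, n = 45.0, 29.3, 10
se = sd / n ** 0.5                     # standard error, ~9.27
t_crit = stats.t.ppf(0.975, df=n - 1)  # 2.262 for 9 d.f., 5% two-tailed
lower, upper = mean - t_crit * se, mean + t_crit * se
print(round(lower, 1), round(upper, 1))  # ~24.0 to ~66.0 (24.1 and 65.9 above, with rounding)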


To do hypothesis testing with this dataset:


 The null hypothesis is constructed such that the desired result is false. In this case we
would make the null hypothesis µ ≤ 40, so that if it gets rejected then the expected mean
would be greater than 40.
 The critical t-value, or one-tailed rejection point at the 0.05 significance level for 9 d.f., is
1.833.
 The calculated t-value is:

t = (μ̂ − 40) / (σ̂/√n) = (45.0 − 40) / (29.3/√10) = 0.54

 Since the calculated t-value of 0.54 is less than the critical t-value (1.833, one-tailed), we
cannot reject the null hypothesis that the mean is less than or equal to 40. That is, we fail
to reject the null hypothesis at the 95% confidence level.

Example - Miller EOC Problem #2: You are given the following sample of annual returns for a
portfolio manager. If you believe that the distribution of returns has been stable over time and
will continue to be stable over time, how confident should you be that the portfolio manager will
continue to produce positive returns?

 The desired condition is that the portfolio manager will continue to produce positive
returns. The null hypothesis is normally constructed such that the desired result is false.
In this case we would make the null hypothesis µ ≤ 0, so that if it gets rejected then the
expected mean returns are greater than 0 (or positive).
 We use a t-test with n − 1 = 9 degrees of freedom. The critical t-value, or one-tailed
rejection point at the 0.05 significance level for 9 d.f., is 1.833.
 The calculated t-value is:

t = (μ̂ − 0) / (σ̂/√n) = 6.9% / (23.5%/√10) = 0.93


 Since the calculated t-value is less than the critical t-value, we fail to reject the null
hypothesis (that the true mean return is less than or equal to zero) at the 0.05 significance
level. In other words, we cannot say with 95% confidence that the portfolio manager will
continue to produce positive returns. Explained in another way, we can reject the null
hypothesis that the returns are not positive with only (1 − p)%, in this case 81.1%, confidence.

Three questions that apply the t-statistic from the Bionic Turtle question database

209.1. Nine (9) companies among a random sample of 60 companies defaulted. The companies
were each in the same highly speculative credit rating category: statistically, they represent a
random sample from the population of CCC-rated companies. The rating agency contends that
the historical (population) default rate for this category is 10.0%, in contrast to the 15.0% default
rate observed in the sample. Is there statistical evidence, with any high confidence, that the true
default rate is different than 10.0%; i.e., if the null hypothesis is that the true default rate is
10.0%, can we reject the null?
a) No, the t-statistic is 0.39
b) No, the t-statistic is 1.08
c) Yes, the t-statistic is 1.74
d) Yes, the t-statistic is 23.53


209.2. Over the last two years, a fund produced an average monthly return of +3.0% but with
monthly volatility of 10.0%. That is, assume the random sample size (n) is 24, with mean of
3.0% and sigma of 10.0%. Are the returns statistically significant; in other words, can we decide
the true mean return is greater than zero with 95% confidence?
a) No, the t-statistic is 0.85
b) No, the t-statistic is 1.47
c) Yes, the t-statistic is 2.55
d) Yes, the t-statistic is 3.83

209.3. Assume the frequency of internal fraud (an operational risk event type) occurrences per
year is characterized by a Poisson distribution. Among a sample of 43 companies, the mean
frequency is 11.0 with a sample standard deviation of 4.0. What is the 90% confidence interval
of the population's mean frequency?
a) 10.0 to 12.0
b) 8.8 to 13.2
c) 7.5 to 14.5
d) Need more information (Poisson parameter)

Answers:

209.1. B. No, the t-statistic is only 1.08. For a large sample, the distribution is approximately
normal, such that at 5.0% two-tailed significance, we reject if the abs(t-statistic)
exceeds 1.96.

The standard error = SQRT(15%*85%/60) = 0.046098. Please note: if you used
SQRT(10%*90%/60) for the standard error, that is not wrong, but it also would not change the
conclusion, as the t-statistic would be 1.29. The t-statistic = (15% − 10%)/0.046098 = 1.08.
The two-sided p-value is 27.8%, but as the t-statistic is well below 2.0, we cannot confidently
reject.

We don't really need the lookup table or a calculator: the t-statistic tells us that the observed
sample mean is only 1.08 standard deviations (standard errors) away from the hypothesized
population mean.

A two-tailed 90% confidence interval implies 1.64 standard errors, so this (72.2% confidence) is
much less confident than even 90%.

209.2. B. No, the t-statistic is 1.47

The standard error = 10%/SQRT(24) = 0.020412


The t statistic = (3.0% - 0%)/0.020412 = 1.47.
The one-tailed critical t, at 95% with 23 df, is 1.71; two-tailed is 2.07.
(even if we assume normal one-sided, the 95% critical Z is 1.645, of course.)


209.3. A. 10.0 to 12.0

The central limit theorem (CLT) says, if the sample is random (i.i.d.), the sampling distribution of
the sample mean tends toward the normal REGARDLESS of the underlying distribution!

The standard error = SQRT(4^2/43) = 4/SQRT(43) = 0.609994.


The 90% confidence interval = 11.0 +/− 1.645*0.609994 = 11.0 +/− 1.0 = 10.0 to 12.0.
Did you realize that a 90% two-sided confidence INTERVAL implies the same deviate (1.645)
as a 95% one-sided deviate?

Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-209-t-statistic-and-confidence-interval.5318

Student’s t Lookup Table

Excel function: = TINV(two-tailed probability [larger #], d.f.)

In the below table, the green shaded area represents values less than three (< 3.0). Think of
it as the “sweet spot.” For confidences less than 99% and d.f. > 13, the critical t is always less
than 3.0. So, for example, a computed t of 7 or 13 will generally be significant. Keep this in
mind because in many cases, you do not need to refer to the lookup table if the computed t is
large; you can simply reject the null.

1-tail: 0.25 0.1 0.05 0.025 0.01 0.005 0.001


d.f. 2-tail: 0.50 0.2 0.1 0.05 0.02 0.01 0.002
1 1.000 3.078 6.314 12.706 31.821 63.657 318.309
2 0.816 1.886 2.920 4.303 6.965 9.925 22.327
3 0.765 1.638 2.353 3.182 4.541 5.841 10.215
4 0.741 1.533 2.132 2.776 3.747 4.604 7.173
5 0.727 1.476 2.015 2.571 3.365 4.032 5.893
6 0.718 1.440 1.943 2.447 3.143 3.707 5.208
7 0.711 1.415 1.895 2.365 2.998 3.499 4.785
8 0.706 1.397 1.860 2.306 2.896 3.355 4.501
9 0.703 1.383 1.833 2.262 2.821 3.250 4.297
10 0.700 1.372 1.812 2.228 2.764 3.169 4.144
11 0.697 1.363 1.796 2.201 2.718 3.106 4.025
12 0.695 1.356 1.782 2.179 2.681 3.055 3.930
13 0.694 1.350 1.771 2.160 2.650 3.012 3.852
14 0.692 1.345 1.761 2.145 2.624 2.977 3.787
15 0.691 1.341 1.753 2.131 2.602 2.947 3.733
16 0.690 1.337 1.746 2.120 2.583 2.921 3.686
17 0.689 1.333 1.740 2.110 2.567 2.898 3.646
18 0.688 1.330 1.734 2.101 2.552 2.878 3.610
19 0.688 1.328 1.729 2.093 2.539 2.861 3.579
20 0.687 1.325 1.725 2.086 2.528 2.845 3.552
21 0.686 1.323 1.721 2.080 2.518 2.831 3.527
22 0.686 1.321 1.717 2.074 2.508 2.819 3.505
23 0.685 1.319 1.714 2.069 2.500 2.807 3.485
24 0.685 1.318 1.711 2.064 2.492 2.797 3.467
25 0.684 1.316 1.708 2.060 2.485 2.787 3.450
26 0.684 1.315 1.706 2.056 2.479 2.779 3.435
27 0.684 1.314 1.703 2.052 2.473 2.771 3.421
28 0.683 1.313 1.701 2.048 2.467 2.763 3.408
29 0.683 1.311 1.699 2.045 2.462 2.756 3.396
30 0.683 1.310 1.697 2.042 2.457 2.750 3.385
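If you prefer a function call to the lookup table, the scipy equivalents below reproduce a couple of entries (our own illustration; the TINV function noted above is the Excel analog):

from scipy import stats

# Two-tailed 5% significance with 9 d.f. (equivalently, one-tailed 2.5%):
print(round(stats.t.ppf(1 - 0.025, df=9), 3))   # 2.262
# One-tailed 5% significance with 23 d.f.:
print(round(stats.t.ppf(1 - 0.05, df=23), 3))   # 1.714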


The general framework

The subsequent AIMs break down the following general hypothesis testing framework:

(1) Define & interpret the null hypothesis and the alternative
(2) Distinguish between one-sided and two-sided hypotheses
(3) Describe the confidence interval approach to hypothesis testing
(4) Describe the test of significance approach to hypothesis testing
(5) Define, calculate and interpret type I and type II errors
(6) Define and interpret the p value

Differentiate between a one-tailed and a two-tailed test and explain the circumstances in which to use each test.
In many scientific fields where positive and negative deviations are equally important, two-tailed
confidence levels are more prevalent. Your default assumption should be a two-sided
hypothesis. If unsure, assume two-sided. A two-tailed null and alternate hypothesis takes the
form:

H0: µ = µ0
H1: µ ≠ µ0

In this case, the alternative hypothesis implies that extreme positive or negative values would
cause us to reject the null hypothesis. Thus, if we are concerned with both sides of the
distribution (both tails), we should choose a two-tailed test.


In risk management, more often than not we are more concerned with the probability of extreme
negative outcomes, and this concern naturally leads to one-tailed tests.

H0: µ ≥ c
H1: µ < c

In this case, we will reject only if the estimate of µ is significantly less than c. Thus, if we are
only concerned with deviations in one direction, we should use a one-tailed test.

As long as the null hypothesis is clearly stated, the choice of a one-tailed or two-tailed
confidence level should be obvious.

The null hypothesis always includes the equal sign (=), regardless of a one-tailed or two-
tailed test! The null cannot include only less than (<) or greater than (>).

The 95% confidence level is a very popular choice for confidence levels.
 For a two-tailed test, a 95% confidence level is equivalent to approximately 1.96
standard deviations. That is, for a normal distribution, 95% of the mass is within +/−1.96
standard deviations.
 For a one-tailed test, though, 95% of the mass is within either +1.64 or –1.64 standard
deviations.

Interpret the results of hypothesis tests with a specific level of confidence.
In the significance approach, instead of defining the confidence interval, we compute the
standardized distance in standard deviations from the observed mean to the null hypothesis.
This is the test statistic (or computed/calculated t value). We then compare it to the critical (or
lookup) value. If the test statistic is greater than the critical (lookup) value, then we reject the
null. If the test statistic is less than the critical value, then we do not reject the null.

If we reject a hypothesis which is actually true, we have committed a Type I error. If, on the
other hand, we accept a hypothesis that should have been rejected, we have committed a Type
II error.
 Type I error = significance level = α = P[reject H0 | H0 is true]
 Type II error = β = P["accept" H0 | H0 is false]
 We can reject the null with (1 − p)% confidence

Type I: To reject a true hypothesis


Type II: To accept a false hypothesis

Type I and Type II errors: Example


Suppose we want to hire a portfolio manager who has produced an average return of +8%
versus an index that returned +7%. We conduct a statistical test to determine whether the
“excess +1%” is due to luck or “alpha” skill. We set a 95% confidence level for our test. In
technical parlance, our null hypothesis is that the manager adds no skill (i.e., the expected
return is 7%).


 Under these circumstances, a Type I error is the following: we decide that excess is
significant and the manager adds value, but actually the out-performance was random
(he did not add skill). In technical terms, we mistakenly rejected the null.
 Under these circumstances, a Type II error is the following: we decide the excess is
random and, to our thinking, the out-performance was random. But actually it was not
random and he did add value. In technical terms, we falsely accepted the null.

Example – Sample Problem: At the start of the year, you think the annualized volatility of XYZ
Corporation’s equity was 45%. At the end of the year, you have collected a year of daily returns,
256 business days’ worth. You calculate the standard deviation, annualize it, and come up with
a value of 48%. Can you reject the null hypothesis, H0: σ = 45%, at the 95% confidence level?

 In this case, the appropriate test statistic is a chi-squared distribution with 255 degrees
of freedom:
χ² = (n − 1) ∙ σ̂²/σ0² = (256 − 1) ∙ (0.48²/0.45²) = 290.13
 Notice that annualizing the standard deviation has no impact on the test statistic. The
same factor would appear in the numerator and the denominator, leaving the ratio
unchanged.
 For a chi-squared distribution with 255 degrees of freedom, 290.13 corresponds to a
right-tail probability of 6.44%. The null can therefore be rejected only at the 93.56%
confidence level.
 Also, since the observed 48% falls within the 95% confidence interval (44.2% to 52.6%),
we fail to reject the null hypothesis at the 95% confidence level.
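A hedged Python sketch of this chi-squared variance test (our own illustration; scipy's sf gives the right-tail probability):

from scipy import stats

n, s, sigma0 = 256, 0.48, 0.45
chi2_stat = (n - 1) * s**2 / sigma0**2        # ~290.13
p_right = stats.chi2.sf(chi2_stat, df=n - 1)  # right-tail probability
print(round(chi2_stat, 2), round(p_right, 4))  # 290.13 ~0.0644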


Chebyshev’s inequality

So far, we were working with sample statistics where the shape of the distribution was known.
However, even if we do not know the entire distribution of a random variable, we can form a
confidence interval, as long as we know the variance of the variable. For a random variable, X,
with a standard deviation of σ, the probability that X is more than n standard deviations from µ is
less than or equal to 1/n². This is a result of what is known as Chebyshev's inequality:

P[|X − µ| ≥ nσ] ≤ 1/n²

For a given level of variance, Chebyshev’s inequality places an upper limit on the probability of
a variable being more than a certain distance from its mean. For a given distribution, the actual
probability may be considerably less.

Take, for example, a standard normal variable.


 Chebyshev’s inequality tells us that the probability of being greater than two standard
deviations from the mean is less than or equal to 25%. The exact probability for a
standard normal variable may be closer to 5%, which is indeed less than 25%.
Chebyshev’s inequality makes clear how assuming normality can be very
anticonservative.
 If a variable is normally distributed, the probability of a three-standard deviation event is
very small, 0.27%. If we assume normality, we will assume that three standard deviation
events are very rare. For other distributions, though, Chebyshev’s inequality tells us that
the probability could be as high as 1/9, or approximately 11%. Eleven percent is hardly a
rare occurrence. Thus, assuming normality when a random variable is in fact not normal
can lead to a severe underestimation of risk.
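The gap between the Chebyshev bound and the exact normal tail is easy to quantify; the sketch below (our own illustration) compares the two for the n = 2 and n = 3 cases discussed above:

from scipy import stats

for k in (2, 3):
    chebyshev_bound = 1 / k**2           # upper bound, for any distribution
    normal_tail = 2 * stats.norm.sf(k)   # exact two-sided tail for N(0,1)
    print(k, round(chebyshev_bound, 4), round(normal_tail, 4))
# k=2: bound 0.25 vs ~0.0455; k=3: bound ~0.1111 vs ~0.0027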


Demonstrate the process of backtesting VaR by calculating the number of exceedances
Backtesting entails checking the predicted outcome of a model against actual data. Good
risk managers should regularly backtest their models. Any model parameter can be backtested.

If an actual loss equals or exceeds the predicted VaR threshold, that event is known as an
exceedance. In case of backtesting of one-day 95% VaR, there is a 5% chance of an
exceedance event each day, and a 95% chance of no exceedance. Because exceedance
events are independent, over the course of n days the distribution of exceedances follows a
binomial distribution as given below.

P[K = k] = C(n, k) ∙ p^k ∙ (1 − p)^(n−k)

where, n is the number of periods that we are using in our backtest, k is the number of
exceedances, and (1 − p) is our confidence level.

The probability of a VaR exceedance should be conditionally independent of all available


information at the time the forecast is made. In other words, if we are calculating the 95% VaR
for a portfolio, then the probability of an exceedance should always be 5%. The probability
shouldn’t be different under any circumstances. More importantly, the probability should not vary
because there was an exceedance the previous day, or because risk levels are elevated.

Example - Sample problem: Consider a daily 95% VaR statistic for a large fixed income
portfolio. Over the past 100 days, there have been four exceedances. How many exceedances
should you have expected? What was the probability of exactly four exceedances during this
time? The probability of four or less? Four or more?

Solution on next page.


 For a 95% VaR measure, over 100 days we would expect to see five exceedances:
(1 − 95%) × 100 = 5. The probability of exactly four exceedances is 17.81%:

P[K = 4] = C(100, 4) ∙ 0.05^4 ∙ (1 − 0.05)^96 = 0.1781

 The probability of four or fewer exceedances is 43.60%. Here we simply do the same
calculation again but for zero, one, two, three, and four exceedances. It's important not
to forget zero, as the case of no exceedance is also possible:

P[K ≤ 4] = Σ C(100, k) ∙ 0.05^k ∙ (1 − 0.05)^(100−k), summing k = 0, 1, ..., 4

P[K ≤ 4] = 0.0059 + 0.0312 + 0.0812 + 0.1396 + 0.1781 = 0.4360

 For the next result, we realize the sum of all probabilities must be 100%; therefore, if
the probability of K ≤ 4 is 43.60%, then the probability of K > 4 must be 56.40%, i.e.:

P[K > 4] = 100% − 43.60% = 56.40%

 Now, our aim is to compute the probability for K ≥ 4. To get this, we simply add the
probability that K = 4, from the first part of our question. Hence, the probability of four or
more exceedances is 74.21%:

P[K ≥ 4] = P[K > 4] + P[K = 4] = 0.5640 + 0.1781 = 0.7421
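These binomial calculations are a one-liner each with scipy (our own illustration of the arithmetic above):

from scipy import stats

n, p = 100, 0.05                                # 100 days, 5% exceedance prob.
p_exactly_4 = stats.binom.pmf(4, n, p)          # ~0.1781
p_4_or_fewer = stats.binom.cdf(4, n, p)         # ~0.4360
p_4_or_more = stats.binom.sf(3, n, p)           # P[K >= 4] = 1 - P[K <= 3], ~0.7422
print(round(p_exactly_4, 4), round(p_4_or_fewer, 4), round(p_4_or_more, 4))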


Chapter Summary
The sample mean is given by:

μ̂ = (1/n) Σ x_i

The sample variance is given by:

σ̂² = Σ (x_i − μ̂)² / (n − 1)

The variance of the sample mean, if σ² is the true variance, is given by:

Var(μ̂) = σ²/n

The standard deviation of the sample mean is given by:

σ_μ̂ = σ/√n

A confidence interval gives the probability that a population parameter is contained within a
defined range. A two-tailed null hypothesis takes the form:

H0: µ = 0
H1: µ ≠ 0

A one-tailed test specifies a single direction; the null is rejected only if the estimate deviates
significantly in that direction:

H0: µ ≥ c
H1: µ < c

The general hypothesis testing framework:


(1) Define & interpret the null hypothesis and the alternative
(2) Distinguish between one-sided and two-sided hypotheses
(3) Describe the confidence interval approach to hypothesis testing
(4) Describe the test of significance approach to hypothesis testing
(5) Define, calculate and interpret type I and type II errors
(6) Define and interpret the p value
If we are only concerned with deviations in one direction, we should use a one-tailed test.
In the significance approach, we compute the standardized distance in standard deviations
from the observed mean to the null hypothesis: this is the test statistic. We compare it to the
critical value. If the test statistic is greater than the critical value, then we reject the null.
If we reject a hypothesis which is actually true, we have committed a Type I error.

133
Licensed to Rajvi Sampat at rajvi.sampat@gmail.com. Downloaded August 4, 2019.
The information provided in this document is intended solely for you. Please do not freely distribute.

If, on the other hand, we accept a hypothesis that should have been rejected, we have
committed a Type II error.

The standard error is equal to the sample standard deviation divided by the square root of the
sample size:

SE = σ̂/√n

Confidence intervals:

 The lower limit of the confidence interval is given by: the sample mean minus the
critical t multiplied by the standard error
 The upper limit of the confidence interval is given by: the sample mean plus the
critical t multiplied by the standard error

CI: μ̂ − t ∙ σ̂/√n ≤ µ ≤ μ̂ + t ∙ σ̂/√n
Backtesting is a process used for checking the predicted outcome of a model against
actual data and for assessing the probability of the losses exceeding the VaR at a given
confidence level:

P[K = k] = C(n, k) ∙ p^k ∙ (1 − p)^(n−k)


Questions & Answers:


313.1. Defaults in a large bond portfolio follow a Poisson process where the expected number of
defaults each month is four (λ = 4 per month). The number of defaults that occur during a single
month is denoted by d(i). Therefore, over a one-year period, a sample of twelve observations is
produced: d(1), d(2), ... , d(12). The average of these twelve observations is the monthly sample
mean. This sample mean naturally has an expected value of four. Which is nearest to the
standard error of this monthly sample mean; i.e., the standard deviation of the sampling
distribution of the mean?
a) 0.11
b) 0.33
c) 0.58
d) 4.00

313.2. A random sample of 36 observations drawn from a normal population returns a sample
mean of 18.0 with sample variance of 16.0. Our hypothesis is: the population mean is 15.0 with
population variance of 10.0. Which are nearest, respectively, to the test statistics of the sample
mean and sample variance (given the hypothesized values, naturally)?
a) t-stat of 3.0 and chi-square stat of 44.3
b) t-stat of 4.5 and chi-square stat of 56.0
c) t-stat of 6.8 and chi-square stat of 57.6
d) t-stat of 9.1 and chi-square stat of 86.4

314.1. You are given the following sample of annual returns for a portfolio manager: -6.0%, -
3.0%, -2.0%, 0.0%, 1.0%, 2.0%, 4.0%, 5.0%, 7.0%, 10.0%. The sample mean of these ten (n =
10) returns is +1.80%. The sample standard deviation is 4.850%. The sample mean is positive,
but how confident are we that the population mean is positive? (note: this is a simplified version
of Miller's problem 5.2, since it provides the sample mean and standard deviation, but it
nevertheless does require calculations/lookup)
a) t-stat of 1.17 implies one-sided confidence of about 86.5%
b) t-stat of 1.29 implies two-sided confidence of about 88.3%
c) t-stat of 2.43 implies one-sided confidence of about 90.7%
d) t-stat of 3.08 implies two-sided confidence of about 97.4%


314.2. A sample of 25 money market funds shows an average return of 3.0% with standard
deviation also of 3.0%. Your colleague Peter conducted a significance test of the following
alternative hypothesis: the true (population) average return of such funds is GREATER THAN
the risk-free rate (Rf). He concludes that he can reject the null hypothesis with a confidence of
83.64%; i.e., there is a 16.36% chance (p value) that the true return is less than or equal to the
risk-free rate. What is the risk-free rate, Rf? (note: this requires lookup-calculation)
a) 1.00%
b) 1.90%
c) 2.00%
d) 2.40%

315.1. Roger collects a set of 61 daily returns over a calendar quarter for the stock of XYZ
corporation. He computes the sample's daily standard deviation, which is annualized in order to
generate a sample volatility of 27.0%. His null hypothesis is that the true (population) volatility is
30.0%. Can he reject the null with 95% confidence?
a) No, the test statistic is 1.59
b) No, the test statistic is 48.60
c) Yes, the test statistic is 24.03
d) Yes, the test statistic is 72.57


Answers:

313.1. C. 0.58

A Poisson distribution has both mean and variance equal to its only parameter, lambda. In this
case, the variance per month is therefore 4.
 The variance of the sample mean = 4/n. In this case, with 12 observations (months), the
variance of the sample mean = 4/12.
 The standard error (standard deviation) = SQRT(4/12) = 0.5774
Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-313-millers-hypothesis-testing.7108

313.2. B. t-stat of 4.5 and chi-square stat of 56.0


If we do not know the population variance, the test of the sample mean relies on the t-statistic
where the standard error (SE) = SQRT(16/36) = 4/6,
and the t-statistic = ABS(18-15)/(4/6) = 4.5;

With a t-stat of 4.5, we can reject the null hypothesis that the population mean is 15.0
(the two-sided p-value is 0.007% such that we can reject with any confidence of 99.993% or
less)

As the population is normal, the test of the sample variance relies on the chi-square value = (n-
1)*(sample variance/hypothesized variance). In this case, the chi-square statistic = (36-1)*16/10
= 56.00, which follows a chi-square distribution with 35 degrees of freedom. (We could reject
null with 95% confidence but we fail to reject null with 99% confidence).

Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-313-millers-hypothesis-testing.7108

314.1. A. t-stat of 1.17 implies one-sided confidence of about 86.5%

SE = 4.850%/SQRT(10) = 1.53370% and the t-stat = (1.80% - 0)/1.53370% = 1.1736.


 The alternative hypothesis is the one we want to prove, in this case, that the population
mean is positive.
 Therefore, the null is that the population mean is less than or equal to zero; i.e., one-
sided.
 The one-side p-value = 13.5% such that we can only reject the null hypothesis with
86.5% confidence.
Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-314-millers-one-and-two-tailed-hypotheses.7118


314.2. D. 2.40%
The one-tailed t-stat that is associated with 16.36% with 24 degrees of freedom is 1.00; e.g.,
 T.INV(16.36%, 24) = -1.00 and T.DIST(-1.00, 24 df, true = CDF) = 16.36%.
 Standard error (SE) of sample mean = 3.0%/SQRT(25) = 0.60%.
 Since t-stat = 1.0, (3.0% - Rf)/0.60% = 1.0, such that Rf = 3.0% - 0.60% = 2.40%.
Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-314-millers-one-and-two-tailed-hypotheses.7118

315.1. B. No, the chi-square test statistic is 48.60


The chi-square test statistic = 60*27%^2/30%^2 = 48.60.
This is within the two-sided chi-square lookup values, at 95% confidence and with 60 degrees of
freedom, of ~40.5 (at 2.5%) and ~83.3 (at 97.5%; or right-sided at 2.5%),
such that we fail to reject; i.e., the population variance might be 30%^2.

Discuss here in forum: https://www.bionicturtle.com/forum/threads/p1-t2-315-millers-hypothesis-tests-continued.7128


End of Chapter Questions & Answers


Question 1:

Given the following data sample, how confident can we be that the mean is greater than 40?

Answer:

Mean = 45.0; standard deviation = 29.3; standard deviation of the mean = 9.3. For the hypothesis that
the mean is greater than 40, the appropriate t-statistic has a value of 0.54. For a one-sided t-test
with 9 degrees of freedom, the associated probability is 70%. There is a 30% chance that the
true mean is found below 40, and a 70% chance that it is greater than 40.

Question 2:

You are given the following sample of annual returns for a portfolio manager. If you believe that
the distribution of returns has been stable over time and will continue to be stable over time,
how confident should you be that the portfolio manager will continue to produce positive
returns?

Answer:

The mean is 6.9% and the standard deviation of the returns is 23.5%, giving a standard
deviation of the mean of 7.4%. The t-statistic is 0.93. With 9 degrees of freedom, a one-sided t-
test produces a probability of 81%. In other words, even though the sample mean is positive,
there is a 19% chance that the true mean is negative.

Question 3:

You are presented with an investment strategy with a mean return of 20% and a standard
deviation of 10%. What is the probability of a negative return if the returns are normally
distributed? What if the distribution is symmetrical, but otherwise unknown?

Answer:

A negative return would be greater than two standard deviations below the mean. For a normal
distribution, the probability (one-tailed) is approximately 2.28%. If we do not know the
distribution, then, by Chebyshev’s inequality, the probability of a negative return could be as
high as 12.5% = 1/2 × 1/(2²). There could be a 25% probability of a +/−2 standard deviation
event, but we’re interested only in the negative tail, so we multiply by ½. We can perform this
last step only because we were told the distribution is symmetrical.


Question 4:

Suppose you invest in a product whose returns follow a uniform distribution between −40% and
+60%. What is the expected return? What is the 95% VaR? The expected shortfall?

Answer:

The expected return is +10%. The 95% VaR is 35% (i.e., 5% of the returns are expected to be
worse than –35%). The expected shortfall is 37.5% (again the negative is implied).

Question 5:

You are the risk manager for a portfolio with a mean daily return of 0.40% and a daily standard
deviation of 2.3%. Assume the returns are normally distributed (not a good assumption to make,
in general). What is the 95% VaR?

Answer:

For a normal distribution, 5% of the weight is less than –1.64 standard deviations from the
mean. The 95% VaR can be found as: 0.40% – 1.64 ∙2.30% =–3.38%. Because of our quoting
convention for VaR, the final answer is VaR =3.38%.

Question 6:

You are told that the log annual returns of a commodities index are normally distributed with a
standard deviation of 40%. You have 33 years of data, from which you calculate the sample
variance. What is the standard deviation of this estimate of the sample variance?

Answer:

We can use Equation 7.4 to calculate the expected variance of the sample variances. Because
we are told the underlying distribution is normal, the excess kurtosis can be assumed to equal
zero and n =33; therefore:

E[(σ̂² − σ²)²] = σ⁴ ∙ (2/(n − 1) + κ_excess/n) = 0.40⁴ ∙ (2/32) = (0.16/4)² = 0.04²

The standard deviation of the sample variances is then 4.0%.


Question 7:

In the previous question, you were told that the actual standard deviation was 40%. If, instead of
40%, the measured standard deviation turns out to be 50%, how confident can you be in the
initial assumption? State a null hypothesis and calculate the corresponding probability.

Answer:

An appropriate null hypothesis would be: H0: σ =40%. The appropriate test statistic is:

χ² = (n − 1) ∙ σ̂²/σ0² = (33 − 1) ∙ (0.50²/0.40²) = 50

Using a spreadsheet or other program, we calculate the corresponding probability for a chi-
squared distribution with 32 degrees of freedom. Only 2.23% of the distribution is greater than
50. At a 95% confidence level, we would reject the null hypothesis.

Question 8:

A hedge fund targets a mean annual return of 15% with a 10% standard deviation. Last year,
the fund returned –5%. What is the probability of a result this bad or worse happening, given the
target mean and standard deviation? Assume the distribution is symmetrical.

Answer:

The answer is 12.5%. This is a −2 standard deviation event. According to Chebyshev's
inequality, the probability of being more than two standard deviations from the mean is less than
or equal to 25%:

P[|X − µ| ≥ nσ] ≤ 1/n²

P[|X − 15%| ≥ 2 ∙ 10%] ≤ 1/2² = 25%

Because the distribution of returns is symmetrical, half of these extreme returns are greater than
+2 standard deviations, and half are less than –2 standard deviations. This leads to the final
result, 12.5%.


Question 9:

A fund of funds has investments in 36 hedge funds. At the end of the year, the mean return of
the constituent hedge funds was 18%. The standard deviation of the funds’ returns was 12%.
The benchmark return for the fund of funds was 14%. Is the difference between the average
return and the benchmark return statistically significant at the 95% confidence level?

Answer:

The standard deviation of the mean is 2%:

σ_μ̂ = 12% / √36 = 2%

This makes the difference between the average fund return and the benchmark, 18% – 14%
=4%, a +2 standard deviation event. For a t distribution with 35 degrees of freedom, the
probability of being more than +2 standard deviations is just 2.67%. We can reject the null
hypothesis, H0: µ =14%, at the 95% confidence level. The difference between the average
return and the benchmark return is statistically significant.

Question 10:

The probability density function for daily profits at Box Asset Management can be described by
the following function (see Exhibit 7.5):

f(π) = 1/200 for −100 ≤ π ≤ 100

f(π) = 0 for π < −100 or π > 100

What is the one-day 95% VaR of Box Asset Management?


Answer:

To find the 95% VaR, we need to find v, such that:

∫ from −100 to v of f(π) dπ = 0.05

Solving, we have:

∫ from −100 to v of (1/200) dπ = [π/200] evaluated from −100 to v = (v + 100)/200 = 0.05

v = −90

The VaR is a loss of 90. Alternatively, we could have used geometric arguments to arrive at the
same conclusion. In this problem, the PDF describes a rectangle whose base is 200 units and
whose height is 1/200. As required, the total area under the PDF, base multiplied by height, is
equal to one. The leftmost fraction of the rectangle, from –100 to –90, is also a rectangle, with a
base of 10 units and the same height, giving an area of 1/20, or 5% of the total area. The edge
of this area is our VaR, as previously found by integration.

Question 11:

Continuing with our example of Box Asset Management, find the expected shortfall, using the
same PDF and the calculated VaR from the previous question.

Answer:

In the previous question we found that the VaR, v, was equal to −90. To find the expected
shortfall, we need to solve the following equation:

ES = (1/0.05) ∫ from −100 to −90 of π ∙ f(π) dπ

Solving, we find:

ES = (1/0.05) ∫ from −100 to −90 of (π/200) dπ = (1/10) ∙ [π²/2] evaluated from −100 to −90
   = (1/20) ∙ ((−90)² − (−100)²) = −95

The final answer, a loss of 95 for the expected shortfall, makes sense. The PDF in this problem
is a uniform distribution, with a minimum at –100. Because it is a uniform distribution, all losses
between the (negative) VaR, –90, and the minimum, –100, are equally likely; therefore, the
average loss, given a VaR exceedance, is halfway between –90 and –100.
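Both the VaR and the expected shortfall can be sanity-checked by simulation; this Monte Carlo sketch (our own, with an arbitrary seed and sample size) draws from the uniform profit distribution:

import numpy as np

rng = np.random.default_rng(0)
profits = rng.uniform(-100, 100, 1_000_000)    # Box AM: uniform on [-100, 100]

var_95 = -np.percentile(profits, 5)            # ~90 (quoted as a positive loss)
es_95 = -profits[profits <= -var_95].mean()    # ~95: mean loss beyond the VaR
print(round(var_95, 1), round(es_95, 1))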


Question 12:

The probability density function for daily profits at Pyramid Asset Management can be described
by the following functions (see Exhibit 7.6):

f(π) = 3/80 + π/400 for −15 ≤ π ≤ 5

f(π) = 5/80 − π/400 for 5 < π ≤ 25

The density function is zero for all other values of π. What is the one-day 95% VaR for Pyramid
Asset Management?

Answer:

To find the 95% VaR, we need to find v, such that:

= 0.05

By inspection, half the distribution is below 5, so we need only bother with the first half of the
function:

3 1 3 1 3 1
+ = + = ( + 15) + ( − 225) = 0.05
80 400 80 80 80 80

+ 30 + 185 = 0

We can use the solution to the quadratic equation:

v = (−30 ± √(900 − 4 ∙ 185)) / 2 = −15 ± 2√10

Because the distribution is not defined for π < −15, we can ignore the negative root, giving us the
final answer:

v = −15 + 2√10 = −8.68

The one-day 95% VaR for Pyramid Asset Management is approximately 8.68.
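A numerical check of the quadratic (our own illustration using numpy):

import numpy as np

# Solve v^2 + 30v + 185 = 0 and keep the root inside the support [-15, 25]
roots = np.roots([1, 30, 185])   # -15 +/- 2*sqrt(10)
v = roots.max()                  # ~-8.68; the other root is below -15
print(round(v, 2))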
