Lecture 09
Contents

1 Midterm Review
1.1 Distribution of a sum of independent random variables
1.2 The gamma function
1.3 Graphical Explanation of Chebyshev's Inequality
1.4 Constructing a negative binomial variable using indicators
1 Midterm Review

1.1 Distribution of a sum of independent random variables
It is a general question when the distribution of a sum of independent random variables works out nicely. You have seen the most important examples for applications in statistics (binomial, Poisson, normal, geometric, negative binomial, exponential, gamma) in Worksheets 1 and 2. Another example is compound Poisson distributions in Worksheet 4. This leads to the general theory of infinitely divisible distributions, see https://en.wikipedia.org/wiki/Infinite_divisibility_(probability). The distribution of X is called infinitely divisible if for every n it is possible to construct a sequence X_{n,1}, ..., X_{n,n} of n IID random variables such that X \overset{d}{=} \sum_{i=1}^n X_{n,i}. All of the above examples except binomial are infinitely divisible, as you can easily check. Another very interesting infinitely divisible law is the Cauchy distribution of Y with density

\[ P(Y \in dy) = \frac{dy}{\pi (1 + y^2)}, \qquad y \in \mathbb{R}. \tag{1} \]
This distribution of Y has the amazing property that if Y_1, Y_2, ..., Y_n are IID copies of Y, then

\[ \frac{Y_1 + Y_2 + \cdots + Y_n}{n} \overset{d}{=} Y. \tag{2} \]

In other words, averaging IID copies of Y does not reduce at all the spread in the distribution of Y. Compare with the more familiar

\[ \frac{Z_1 + Z_2 + \cdots + Z_n}{n} \overset{d}{=} \frac{Z_1}{n^{1/2}} \tag{3} \]
for IID normal(0, σ²) Z_i, any σ > 0. It seems at first that (2) goes against the law of large numbers, according to which averaging independent random variables should reduce their variability. But the law of large numbers applies only to IID random variables Y_i with E|Y| < ∞, and for the Cauchy distribution we have E|Y| = ∞. The exact convolution rule (2) can be derived with pain from the convolution formula for densities, and without pain using the characteristic function of Y, which is E exp(itY) = e^{-|t|} for real t, and the uniqueness theorem for characteristic functions.
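A quick simulation makes the contrast between (2) and (3) concrete. This is only a sketch (NumPy is assumed; the sample size, number of replications, and seed are arbitrary choices), comparing the spread of sample means via the interquartile range, since the Cauchy has no variance.

```python
import numpy as np

# Sketch: averaging IID Cauchy variables does not shrink the spread,
# while averaging IID normals shrinks it by a factor of about 1/sqrt(n).
# The interquartile range (IQR) is used because the Cauchy has no variance.
rng = np.random.default_rng(0)
n, reps = 100, 10_000

def iqr(x):
    q75, q25 = np.percentile(x, [75, 25])
    return q75 - q25

cauchy_means = rng.standard_cauchy((reps, n)).mean(axis=1)
normal_means = rng.standard_normal((reps, n)).mean(axis=1)

print("IQR of one standard Cauchy :", iqr(rng.standard_cauchy(reps)))
print("IQR of Cauchy sample means :", iqr(cauchy_means))    # about the same
print("IQR of one standard normal :", iqr(rng.standard_normal(reps)))
print("IQR of normal sample means :", iqr(normal_means))    # roughly 1/10 as large
```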
1.2 The gamma function

The gamma function is defined for r > 0 by

\[ \Gamma(r) = \int_0^\infty x^{r-1} e^{-x}\,dx. \tag{4} \]

As such, it is just the necessary normalization constant in the probability density of a random variable X on (0, ∞) with density f_X(x) ∝ x^{r−1} e^{−x} 1(x > 0). So the distribution of X is called gamma(r, 1) iff f_X(x) = Γ(r)^{−1} x^{r−1} e^{−x} 1(x > 0), and gamma(r, λ) for λ > 0 iff λX ∼ gamma(r, 1).
Recursion property: The gamma function satisfies the generalized factorial recursion

\[ \Gamma(r + 1) = r\,\Gamma(r). \tag{5} \]

Particular values:

\[ \Gamma(1) = \int_0^\infty e^{-x}\,dx = 1 \tag{6} \]
\[ \Gamma(2) = 1 \cdot \Gamma(1) = 1 \tag{7} \]
\[ \Gamma(3) = 2\,\Gamma(2) = 2 \cdot 1 = 2 \tag{8} \]
\[ \Gamma(n) = (n-1)! \]
The gamma function for half-integer values is given by

\[ \Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi} \tag{9} \]
\[ \Gamma\left(\tfrac{3}{2}\right) = \tfrac{1}{2}\,\Gamma\left(\tfrac{1}{2}\right) = \tfrac{\sqrt{\pi}}{2} \tag{10} \]
\[ \Gamma\left(\tfrac{7}{2}\right) = \tfrac{5}{2} \cdot \tfrac{3}{2} \cdot \tfrac{1}{2}\,\sqrt{\pi} \tag{11} \]
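The recursion and the particular values above are easy to confirm numerically; here is a minimal check using only the standard library's math.gamma (the test values are arbitrary choices):

```python
import math

# Check Gamma(n) = (n-1)! and the half-integer values quoted above.
for n in range(1, 6):
    assert math.isclose(math.gamma(n), math.factorial(n - 1))

assert math.isclose(math.gamma(0.5), math.sqrt(math.pi))                           # Gamma(1/2) = sqrt(pi)
assert math.isclose(math.gamma(1.5), 0.5 * math.sqrt(math.pi))                     # Gamma(3/2)
assert math.isclose(math.gamma(3.5), (5/2) * (3/2) * (1/2) * math.sqrt(math.pi))   # Gamma(7/2)
print("all gamma identities check out")
```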
\[ Y \sim \text{gamma}(r, \lambda): \quad f_Y(y) = \frac{\lambda^r y^{r-1} e^{-\lambda y}}{\Gamma(r)} \quad \text{for } y > 0 \]
\[ Y \sim \text{gamma}(r, 1): \quad f_Y(y) = \frac{y^{r-1} e^{-y}}{\Gamma(r)} \quad \text{for } y > 0 \tag{12} \]
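As a sanity check on (12), the density should integrate to one and agree with SciPy's gamma distribution. A sketch assuming SciPy is available (note SciPy is parameterized by a scale, so scale = 1/λ; the values of r and λ are arbitrary):

```python
import numpy as np
from scipy import integrate, special, stats

r, lam = 2.5, 3.0   # arbitrary shape and rate, just for the check

def f(y):
    # gamma(r, lambda) density as in (12)
    return lam**r * y**(r - 1) * np.exp(-lam * y) / special.gamma(r)

total, _ = integrate.quad(f, 0, np.inf)
print("integral of the density:", total)    # should be ~ 1.0
print("matches scipy at y = 1?",
      np.isclose(f(1.0), stats.gamma(a=r, scale=1/lam).pdf(1.0)))
```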
Asymptotics (Stirling's formula):

\[ n! \sim \sqrt{2\pi n}\,\left(\frac{n}{e}\right)^n \quad \text{as } n \to \infty \tag{13} \]
\[ \Gamma(r + 1) \sim \sqrt{2\pi r}\,\left(\frac{r}{e}\right)^r \quad \text{as } r \to \infty \tag{14} \]

To get asymptotics for Γ(r) instead of Γ(r + 1), simply use Γ(r + 1) = rΓ(r) to see

\[ \Gamma(r) \sim \sqrt{\frac{2\pi}{r}}\,\left(\frac{r}{e}\right)^r \quad \text{as } r \to \infty. \tag{15} \]
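A short numerical comparison (a sketch; the values of r are arbitrary) shows how accurate (15) already is for moderate r:

```python
import math

# Compare Gamma(r) with the approximation sqrt(2*pi/r) * (r/e)**r from (15).
for r in (2.0, 5.0, 10.0, 50.0):
    exact = math.gamma(r)
    approx = math.sqrt(2 * math.pi / r) * (r / math.e) ** r
    print(f"r = {r:5.1f}   Gamma(r) = {exact:.6g}   approx = {approx:.6g}   ratio = {approx / exact:.4f}")
```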
1.3 Graphical Explanation of Chebyshev's Inequality

For any real a and b > 0, the indicator of the event |X − a| ≥ b is bounded above by the parabola (X − a)²/b²:

\[ 1(|X - a| \ge b) \le \frac{(X - a)^2}{b^2}. \tag{16} \]

Taking expectations gives

\[ P(|X - a| \ge b) \le \frac{E(X - a)^2}{b^2}. \tag{17} \]

With a = E(X) and b = kσ_X this becomes Chebyshev's inequality:

\[ P(|X - E(X)| \ge k \sigma_X) \le \frac{1}{k^2}, \quad \text{for any real } k > 0. \tag{18} \]

Only the case where k > 1 is of any interest, because for k ≤ 1 the right-hand side exceeds the trivial bound of 1.
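A small simulation confirms that the bound (18) holds but is usually far from tight. This is only a sketch: the exponential distribution, sample size, and values of k are arbitrary choices, and NumPy is assumed.

```python
import numpy as np

# Compare the empirical tail P(|X - E(X)| >= k*sigma) with the Chebyshev bound 1/k^2.
rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=200_000)
mu, sigma = x.mean(), x.std()

for k in (1.5, 2.0, 3.0):
    empirical = np.mean(np.abs(x - mu) >= k * sigma)
    print(f"k = {k}:  empirical tail = {empirical:.4f}   Chebyshev bound = {1 / k**2:.4f}")
```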
1.4 Constructing a negative binomial variable using indicators

\[ X = \sum_{n=1}^{\infty} 1(X \ge n) \tag{19} \]

Taking expectations gives

\[ E(X) = \sum_{n=1}^{\infty} P(X \ge n). \tag{20} \]

The equation above is called the tail-sum formula for E(X). It applies to any X taking values in {0, 1, 2, 3, ...}.
Example: Assume a random variable X = G_p ∼ Geom(p) on {0, 1, 2, ...}. Then

\[ E(G_p) = \sum_{n=1}^{\infty} P(G_p \ge n) = \sum_{n=1}^{\infty} (1 - p)^n = \frac{1 - p}{1 - (1 - p)} = \frac{1 - p}{p}. \tag{21} \]
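A numerical sketch of (21) (NumPy assumed; the value of p and the truncation point of the infinite sum are arbitrary choices):

```python
import numpy as np

# Check E(G_p) = sum_{n>=1} P(G_p >= n) = (1 - p)/p for a geometric on {0, 1, 2, ...}.
p = 0.3
n = np.arange(1, 200)               # truncate the sum; the tail is negligible here
tail_sum = np.sum((1 - p) ** n)     # P(G_p >= n) = (1 - p)^n
print("tail sum       :", tail_sum)
print("(1 - p) / p    :", (1 - p) / p)

rng = np.random.default_rng(2)
sample = rng.geometric(p, size=100_000) - 1   # numpy counts trials on {1, 2, ...}, so shift to {0, 1, 2, ...}
print("simulated mean :", sample.mean())
```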
\[ E(F^{-1}(U)) = \int_0^1 F^{-1}(u)\,du \]

The lengths F^{-1}(u) for selected u ∈ [0, 1] are represented by horizontal lines in Figure 4. The shaded area can be computed in two different ways, as illustrated in Figure 6. In the example of Figure 6, assuming the distribution is all concentrated on the interval of X values shown in the diagram (so there must be atoms at the endpoints), it appears that E(X^-) is slightly greater than E(X^+). So E(X) must be slightly negative.
\[ X \overset{d}{=} Y \;\Longrightarrow\; \phi(X) \overset{d}{=} \phi(Y) \tag{23} \]
2.1
\[ P(Y_1 < Y_2 < Y_3) = P(Y_{\sigma(1)} < Y_{\sigma(2)} < Y_{\sigma(3)}) \tag{25} \]

where σ goes through the remaining 4 possible order relations for 3 variables. All of these probabilities are equal. If we assume that there is a
joint density, or that the Yi are IID with a continuous CDF, then e.g.
P (Y1 = Y2 ) = 0 and the same for any other event requiring ties between
2 or more values of the Yi . It follows in that case that the 3! probabilities in
(25) are not only all equal, they add up to 1. So each of these probabilities
is 1/3!. This argument is very general. It shows that for any sequence of
exchangeable random variables (Y1 , . . . , Yn ) such that P (Y1 = Y2 ) = 0, in
particular for any such sequence with a joint density, or if the Yi are IID
with a continuous CDF, then each of the n! events requiring the Yi to be in a
particular order has probability 1/n!. This fact is the key to all the problems
involving records of such Yi in Worksheet 3.
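The 1/n! claim is easy to check by simulation. The sketch below (NumPy assumed; exponential Y_i chosen arbitrarily as an example of an IID continuous sequence) tabulates the observed frequency of each of the 3! = 6 orderings:

```python
import numpy as np
from collections import Counter
from math import factorial

# Each of the n! orderings of IID continuous Y_1, ..., Y_n should occur with probability 1/n!.
rng = np.random.default_rng(3)
n, reps = 3, 60_000
y = rng.exponential(size=(reps, n))

orders = Counter(tuple(map(int, np.argsort(row))) for row in y)
for perm, count in sorted(orders.items()):
    print(perm, round(count / reps, 4))    # each close to 1/3! = 0.1667
print("1/n! =", 1 / factorial(n))
```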
Next, define order statistics for Y_1, Y_2, ..., Y_n (assuming no ties) by
Y_{n,1} < Y_{n,2} < ... < Y_{n,n} with Y_{n,i} = Y_{σ(i)} for some random permutation σ of 1, 2, ..., n. Fig-8 below shows one such scenario. Note that
Figure 7: Exchangeable variables X and Y . For any two points (x, y) and
(y, x) which are reflections of each other across the diagonal, the joint density
(or joint probability function) at the two points must be the same.
Figure 9:
\[ \frac{1}{3!}\Bigl(3!\, f_Y(y_1)\,dy_1 \cdots f_Y(y_3)\,dy_3\Bigr) \tag{28} \]
which is the required factorization: the 1/3! is the probability of the particular permutation, and the rest is the joint density of the order statistics. The
interpretation of the two factors is clear by either summing or integrating
over cases, as needed. (Usual story: factorization of a probability function or
density implies independence: the constants can be found by summation or
integration of the factored expression). The generalization of this argument
to n! permutations of n variables is rather obvious.
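The factorization says the random permutation is uniform over the 3! possibilities and independent of the order statistics. A simulation sketch of that independence (NumPy assumed; exponential Y_i and the conditioning event max Y_i > 2, which depends only on the order statistics, are arbitrary choices):

```python
import numpy as np
from collections import Counter

# Condition on an event defined by the order statistics (here: the maximum exceeds 2)
# and check that the ordering permutation is still uniform over the 3! possibilities.
rng = np.random.default_rng(4)
y = rng.exponential(size=(100_000, 3))

cond = y.max(axis=1) > 2.0
perms = Counter(tuple(map(int, np.argsort(row))) for row in y[cond])
total = cond.sum()
for perm, count in sorted(perms.items()):
    print(perm, round(count / total, 4))   # each still close to 1/6
```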
Application: integrating the previous expression (the joint density of the order statistics) over the other variables gives the density of the minimum:

\[ \frac{P(Y_{n,1} \in dy_1)}{dy_1} = n\, f_Y(y_1)\,(1 - F_Y(y_1))^{n-1}. \tag{29} \]
Alternate Method: It is not really necessary to deal with the joint density
of all n order statistics to find the distribution of just one or two of them.
Look for instance at
\[ Y_{n,1} := \min_{1 \le i \le n} Y_i \tag{30} \]

\[ P(Y_{n,1} > y) = P(Y_1 > y, \ldots, Y_n > y) = (1 - F_Y(y))^n \quad (\text{by independence}) \]
This implies

\[ \frac{P(Y_{n,1} \in dy)}{dy} = \frac{d}{dy} P(Y_{n,1} \le y) = -\frac{d}{dy} P(Y_{n,1} > y) \tag{31} \]
\[ = -\frac{d}{dy} (1 - F_Y(y))^n = n\, f_Y(y)\,(1 - F_Y(y))^{n-1} \tag{32} \]

as before, but now using the chain rule of calculus and f_Y(y) = \frac{d}{dy} F_Y(y).
Similarly, the density of the maximum can be written down just as easily:

\[ \frac{P(Y_{n,n} \in dy)}{dy} = n\, f_Y(y)\, F_Y(y)^{n-1} \tag{33} \]
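Both tail formulas behind (32) and (33) are easy to check by simulation. A sketch for IID uniform(0, 1) variables (NumPy assumed; n, the test points, and the sample size are arbitrary choices):

```python
import numpy as np

# Check P(Y_{n,1} > t) = (1 - F(t))^n and P(Y_{n,n} <= t) = F(t)^n for IID uniform(0, 1),
# where F(t) = t on (0, 1).
rng = np.random.default_rng(5)
n, reps = 5, 200_000
y = rng.uniform(size=(reps, n))
mins, maxs = y.min(axis=1), y.max(axis=1)

for t in (0.2, 0.5, 0.8):
    print(f"t = {t}:  P(min > t) ~ {np.mean(mins > t):.4f}  vs (1 - t)^n = {(1 - t)**n:.4f}"
          f"   P(max <= t) ~ {np.mean(maxs <= t):.4f}  vs t^n = {t**n:.4f}")
```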
2.2
\[ \frac{P(Y_{n,k} \in dy)}{dy} = \frac{n!}{(k-1)!\,1!\,(n-k)!}\; F_Y(y)^{k-1}\, f_Y(y)\, (1 - F_Y(y))^{n-k} \tag{34} \]

See Figure 10.

Figure 10: The counts in three intervals needed to make the kth order statistic fall in dy = dy_1: k − 1 of the Y_i fall below y, one falls in dy, and n − k fall above y.
Important Case: The distribution of the Y's is U[0, 1]. So we need only consider 0 < u < 1:

\[ f_U(u) = 1(0 < u < 1) \tag{35} \]
\[ F_U(u) = u \quad \text{for } 0 < u < 1 \tag{36} \]
\[ 1 - F_U(u) = 1 - u \quad \text{for } 0 < u < 1 \tag{37} \]
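For the uniform case, (34) can be checked directly by simulation. A sketch (NumPy assumed; n, k, the sample size, and the evaluation points are arbitrary choices) compares a local histogram estimate of the density of the kth order statistic with the formula:

```python
import numpy as np
from math import factorial

# Compare the simulated density of the kth order statistic of n IID uniform(0, 1)
# variables with (34): n!/((k-1)! (n-k)!) * u^(k-1) * (1-u)^(n-k).
rng = np.random.default_rng(6)
n, k, reps = 7, 3, 200_000
kth = np.sort(rng.uniform(size=(reps, n)), axis=1)[:, k - 1]

def density(u):
    return factorial(n) / (factorial(k - 1) * factorial(n - k)) * u**(k - 1) * (1 - u)**(n - k)

h = 0.02
for t in (0.2, 0.4, 0.6):
    empirical = np.mean(np.abs(kth - t) < h / 2) / h   # local histogram estimate at u = t
    print(f"u = {t}:  empirical ~ {empirical:.3f}   formula = {density(t):.3f}")
```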