Chapter 2 Multivariate Distributions: 2.1 Distributions of Two Random Variables
Boxiang Wang, The University of Iowa Chapter 2 STAT 4100 Fall 2018
Bivariate random vector
Definition
A random variable is a function from a sample space C to R.
Definition
An n-dimensional random vector is a function from C to $\mathbb{R}^n$.
• A 2-dimensional random vector is also called a bivariate random variable.
Remark: $X = (X_1, X_2)^\top$ assigns to each element c of the sample space C exactly one ordered pair of numbers $X_1(c) = x_1$ and $X_2(c) = x_2$.
Example
1. Height and weight of a respondent.
2. Fuel consumption and hours on an engine.
Discrete Random Variables
Joint probability mass function
Definition
A joint probability mass function
$p_{X_1,X_2}(x_1,x_2) = P(X_1 = x_1, X_2 = x_2)$ (or $p(x_1,x_2)$)
with space $(x_1,x_2) \in S$ has the properties that
(a) $0 \le p(x_1,x_2) \le 1$,
(b) $\sum_{(x_1,x_2)\in S} p(x_1,x_2) = 1$,
(c) $P[(X_1,X_2)\in A] = \sum_{(x_1,x_2)\in A} p(x_1,x_2)$.
Example
A restaurant serves three fixed-price dinners costing $7, $9, and $10. For a randomly selected couple dining at this restaurant, let
X1 = the cost of the man's dinner and
X2 = the cost of the woman's dinner.
The joint pmf of X1 and X2 is given in the following table (rows: x2; columns: x1):

         x1 = 7   x1 = 9   x1 = 10
x2 = 7    0.05     0.05     0.10
x2 = 9    0.05     0.10     0.35
x2 = 10   0.00     0.20     0.10
Marginal probability mass function
Definition
Suppose that X1 and X2 have the joint pmf $p(x_1,x_2)$. Then the pmf of $X_i$, denoted $p_i(\cdot)$, $i = 1, 2$, is called the marginal pmf.
Note $p_1(x_1) = \sum_{x_2} p(x_1,x_2)$ and $p_2(x_2) = \sum_{x_1} p(x_1,x_2)$.
Example
Let X1 = smaller die face and X2 = larger die face when rolling a pair of dice. The following table partitions the sample space into 21 events (rows: x2; columns: x1):

        x1=1   x1=2   x1=3   x1=4   x1=5   x1=6
x2=1    1/36    0      0      0      0      0
x2=2    2/36   1/36    0      0      0      0
x2=3    2/36   2/36   1/36    0      0      0
x2=4    2/36   2/36   2/36   1/36    0      0
x2=5    2/36   2/36   2/36   2/36   1/36    0
x2=6    2/36   2/36   2/36   2/36   2/36   1/36
Expectation – discrete random variables
Definition
Let $Y = u(X_1, X_2)$. Then Y is a random variable and
$E[u(X_1,X_2)] = \sum_{x_1}\sum_{x_2} u(x_1,x_2)\,p(x_1,x_2).$
Example
Find $E(\max\{X_1, X_2\})$ for the restaurant problem. (Answer: 9.65.)
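The restaurant pmf and the two formulas above can be checked directly by enumeration. A minimal sketch (the dictionary layout and names are illustrative, not from the slides):

```python
# Joint pmf of the restaurant example, keyed by (x1, x2).
pmf = {
    (7, 7): 0.05, (9, 7): 0.05, (10, 7): 0.10,
    (7, 9): 0.05, (9, 9): 0.10, (10, 9): 0.35,
    (7, 10): 0.00, (9, 10): 0.20, (10, 10): 0.10,
}

# Marginal pmfs: sum the joint pmf over the other coordinate.
p1, p2 = {}, {}
for (x1, x2), p in pmf.items():
    p1[x1] = p1.get(x1, 0.0) + p
    p2[x2] = p2.get(x2, 0.0) + p

# E[u(X1, X2)] = sum over the support of u(x1, x2) * p(x1, x2).
e_max = sum(max(x1, x2) * p for (x1, x2), p in pmf.items())
print(round(e_max, 2))  # 9.65
```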
Continuous Random Variables
Joint density function
A joint density function $f_{X_1,X_2}(x_1,x_2)$ (or $f(x_1,x_2)$) with space $(x_1,x_2)\in S$ has the properties that
(a) $f(x_1,x_2) > 0$,
(b) $\iint_{(x_1,x_2)\in S} f(x_1,x_2)\,dx_1\,dx_2 = 1$,
(c) $P[(X_1,X_2)\in A] = \iint_{(x_1,x_2)\in A} f(x_1,x_2)\,dx_1\,dx_2$.
Example
Let X1 and X2 be continuous random variables with joint density function
$f(x_1,x_2) = \begin{cases} 4x_1x_2 & 0 < x_1, x_2 < 1 \\ 0 & \text{otherwise.} \end{cases}$

Suppose that X1 and X2 have the joint pdf $f(x_1,x_2)$. Then the pdf of $X_i$, denoted $f_i(\cdot)$, $i = 1, 2$, is called the marginal pdf.
Note: $f_1(x_1) = \int_{x_2} f(x_1,x_2)\,dx_2$ and $f_2(x_2) = \int_{x_1} f(x_1,x_2)\,dx_1$.
Example
Find the marginal pdfs for the previous problem.
Solution:
$f_1(x) = f_2(x) = 2x$ for $0 < x < 1$.
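The marginal $f_1(x) = 2x$ can be sanity-checked numerically. A quick sketch (not part of the slides), approximating $\int_0^1 4xy\,dy$ with a midpoint Riemann sum:

```python
# Midpoint-rule approximation of the marginal f1(x) = integral of 4*x*y
# over 0 < y < 1; the closed form is f1(x) = 2x.
def marginal_f1(x, n=1000):
    h = 1.0 / n
    return sum(4 * x * ((i + 0.5) * h) * h for i in range(n))

print(marginal_f1(0.5))  # ≈ 1.0, i.e. 2 * 0.5
```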
Example
Let X1 and X2 be continuous random variables with joint pdf
$f(x_1,x_2) = \begin{cases} c\,x_1x_2 & 0 < x_1 < x_2 < 1 \\ 0 & \text{otherwise.}\end{cases}$
1. Find c.
2. Find $P(X_1 + X_2 < 1)$.
3. Find the marginal probability density functions of X1 and X2.
Solution:
We have c = 8 because
$\int_0^1\!\int_{x_1}^1 x_1x_2\,dx_2\,dx_1 = 1/8 = 0.125.$
Also,
$P(X_1+X_2<1) = \int_0^{1/2}\!\int_{x_1}^{1-x_1} 8x_1x_2\,dx_2\,dx_1 = 1/6 \approx 0.167.$
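Both numbers can be cross-checked with a brute-force grid over the triangle $0 < x_1 < x_2 < 1$. An illustrative sketch, not from the slides:

```python
# Midpoint grid over the unit square; keep only cells with x1 < x2.
n = 400
h = 1.0 / n
total = 0.0   # integral of x1*x2 over the triangle, should approach 1/8
event = 0.0   # P(X1 + X2 < 1) under f = 8*x1*x2, should approach 1/6
for i in range(n):
    x1 = (i + 0.5) * h
    for j in range(n):
        x2 = (j + 0.5) * h
        if x1 < x2:
            total += x1 * x2 * h * h
            if x1 + x2 < 1:
                event += 8 * x1 * x2 * h * h
print(total, event)  # ≈ 0.125, ≈ 0.167
```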
Example
Let X1 and X2 be continuous random variables with joint pdf
$f(x_1,x_2) = \begin{cases} c\,x_1x_2 & 0 < x_1 < x_2 < 1 \\ 0 & \text{otherwise.}\end{cases}$
What is $P\{[X_1 < X_2] \cap [X_2 > 4(X_1 - 1/2)^2]\}$?
Solution:
We see that $x = 1/4$ solves $x = 4(x - \tfrac12)^2$ on $0 < x < 1$. The range of X2 is (1/4, 1). Given $X_2 = x_2$, we next find the range of X1. From $x_2 = 4(x_1 - 1/2)^2$ we get
$x_1 = \frac{1}{2} \pm \frac{\sqrt{x_2}}{2}.$
The lower bound of X1 is $\frac{1}{2} - \frac{\sqrt{x_2}}{2}$ because the intersection of $x_1 = x_2$ with $x_2 = 4(x_1-1/2)^2$ occurs at $x_1 < 1/2$ for $x_1 \in (0,1)$. We also have $X_1 < X_2$, so, with c = 8, the probability is
$\int_{1/4}^{1}\!\int_{\frac{1}{2}-\frac{\sqrt{x_2}}{2}}^{x_2} 8x_1x_2\,dx_1\,dx_2 \approx 0.974.$
Expectation – continuous random variables
Example
Let X1 and X2 have the joint pdf $f(x_1,x_2) = \frac{36}{5}x_1x_2(1-x_1x_2)$ for $0 < x_1, x_2 < 1$ and zero otherwise. Find $E(X_1X_2)$.
Solution:
$E(X_1X_2) = \int_0^1\!\int_0^1 \frac{36}{5}\,x_1^2x_2^2(1 - x_1x_2)\,dx_1\,dx_2 = 7/20 = 0.35.$
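A numeric cross-check of 0.35 by a midpoint double sum over the unit square (an illustrative sketch, not from the slides):

```python
# E(X1*X2) = double integral of (36/5) x1^2 x2^2 (1 - x1 x2); exact value 7/20.
n = 200
h = 1.0 / n
e = 0.0
for i in range(n):
    x1 = (i + 0.5) * h
    for j in range(n):
        x2 = (j + 0.5) * h
        e += (36 / 5) * x1**2 * x2**2 * (1 - x1 * x2) * h * h
print(e)  # ≈ 0.35
```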
Theorem
Let (X1 , X2 ) be a random vector. Let Y1 = g1 (X1 , X2 ) and
Y2 = g2 (X1 , X2 ) be random variables whose expectations exist.
Then for all real numbers k1 and k2,
$E(k_1Y_1 + k_2Y_2) = k_1E(Y_1) + k_2E(Y_2).$
Example 2.1.5 & 2.1.6
Discrete & Continuous R.V.
Joint cumulative distribution function
Definition
The joint cumulative distribution function of (X1, X2) is
$F_{X_1,X_2}(x_1,x_2) = P(X_1 \le x_1,\ X_2 \le x_2).$
Joint cumulative distribution function (cont’d)
Definition
The joint cumulative distribution function of (X1, X2) is
$F_{X_1,X_2}(x_1,x_2) = P(X_1 \le x_1,\ X_2 \le x_2).$
Properties:
1. $F(x_1, x_2)$ is nondecreasing in x1 and x2.
2. $F(-\infty, x_2) = F(x_1, -\infty) = 0.$
3. $F(\infty, \infty) = 1.$
4. For a rectangle $(a_1, b_1] \times (a_2, b_2]$, we have
$P(a_1 < X_1 \le b_1,\ a_2 < X_2 \le b_2) = F(b_1,b_2) - F(a_1,b_2) - F(b_1,a_2) + F(a_1,a_2) \ge 0.$
Example 2.1.1
X1\X2     0      1      2      3
  0      1/8    1/8     0      0
  1       0     2/8    2/8     0
  2       0      0     1/8    1/8
Example
1. Find the joint cdf for
$f_{X_1,X_2}(x_1,x_2)=\begin{cases} e^{-x_1-x_2} & 0<x_1,x_2<\infty\\ 0 & \text{otherwise.}\end{cases}$
Solution:
$F_{X_1,X_2}(x_1,x_2) = \int_0^{x_1}\!\int_0^{x_2} e^{-t_1-t_2}\,dt_2\,dt_1 = (1-e^{-x_1})(1-e^{-x_2}).$
2. Find the joint cdf for
$f_{X_1,X_2}(x_1,x_2)=\begin{cases} 2e^{-x_1-x_2} & 0<x_1<x_2<\infty\\ 0 & \text{otherwise.}\end{cases}$
Solution:
$F_{X_1,X_2}(x_1,x_2) = \int_0^{\min(x_1,x_2)}\!\int_{t_1}^{x_2} 2e^{-t_1-t_2}\,dt_2\,dt_1.$
Moment generating function (mgf)
Definition
Let $X = (X_1, X_2)^\top$ be a random vector. If
$M(t_1,t_2) = E\!\left[e^{t_1X_1+t_2X_2}\right]$
exists for $|t_1| < h_1$ and $|t_2| < h_2$, where h1 and h2 are positive, then we call $M(t_1, t_2)$ the moment generating function (mgf) of $X = (X_1, X_2)^\top$.
We may write
$M(t_1,t_2) = E\!\left[e^{t_1X_1+t_2X_2}\right] = E\!\left[e^{t^\top X}\right].$
Marginal mgf
Recall that $M_{X_1,X_2}(t_1,t_2) = E\!\left[e^{t_1X_1+t_2X_2}\right]$. The marginal mgfs are obtained by setting the other argument to zero:
$M_{X_1}(t_1) = M_{X_1,X_2}(t_1, 0), \qquad M_{X_2}(t_2) = M_{X_1,X_2}(0, t_2).$
Example 2.1.7 (cont’d)
Let the continuous-type random variables X and Y have the joint pdf
$f(x,y)=\begin{cases} e^{-y} & 0<x<y<\infty\\ 0 & \text{elsewhere.}\end{cases}$
Determine the marginal mgfs.
Solution:
$M_{X,Y}(t_1,t_2) = \int_0^\infty\!\int_x^\infty \exp(t_1x + t_2y - y)\,dy\,dx = \frac{1}{(1-t_1-t_2)(1-t_2)},$
provided that $t_1 + t_2 < 1$ and $t_2 < 1$. Hence
$M_X(t_1) = M_{X,Y}(t_1,0) = \frac{1}{1-t_1},\quad t_1 < 1,$
$M_Y(t_2) = M_{X,Y}(0,t_2) = \frac{1}{(1-t_2)^2},\quad t_2 < 1.$
Example 2.1.7 (cont'd)
Note that the marginal pdfs agree with these mgfs:
$f_1(x) = \int_x^\infty e^{-y}\,dy = e^{-x},\quad 0<x<\infty,$
$f_2(y) = \int_0^y e^{-y}\,dx = ye^{-y},\quad 0<y<\infty.$
Fact: It can be shown that
$E(XY) = \left.\frac{\partial^2 M_{X,Y}(t_1,t_2)}{\partial t_1\,\partial t_2}\right|_{t_1=0,\,t_2=0}.$
Method 2:
$M_{X,Y}(t_1,t_2) = \frac{1}{(1-t_1-t_2)(1-t_2)},$
$\frac{\partial^2 M_{X,Y}(t_1,t_2)}{\partial t_1\,\partial t_2} = -\frac{t_1+3t_2-3}{(t_2-1)^2(-t_1-t_2+1)^3},$
where we again see $\left.\dfrac{\partial^2 M_{X,Y}(t_1,t_2)}{\partial t_1\,\partial t_2}\right|_{t_1=0,\,t_2=0} = 3.$
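The value E(XY) = 3 can also be checked by simulation. A sketch under the observation (an assumption made here, not stated on the slides) that for $f(x,y)=e^{-y}$ on $0<x<y$, Y has the Gamma(2, 1) density $ye^{-y}$ and X given Y = y is Uniform(0, y):

```python
import random

# Monte Carlo estimate of E(XY); the mgf calculation above gives exactly 3.
random.seed(0)
n = 200_000
total = 0.0
for _ in range(n):
    y = random.gammavariate(2, 1)   # marginal of Y
    x = random.uniform(0, y)        # conditional of X given Y = y
    total += x * y
print(total / n)  # ≈ 3 (within Monte Carlo error)
```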
Chapter 2 Multivariate Distributions
2.2 Transformation: Bivariate Random Variables
Transformation of discrete random vectors
Y1 = u1 (X1 , X2 ), X1 = w1 (Y1 , Y2 ),
Y2 = u2 (X1 , X2 ), X2 = w2 (Y1 , Y2 ).
Example 2.2.1
$p_X(x) = \frac{\mu_1^{x}}{x!}e^{-\mu_1},\quad x = 0, 1, 2, \ldots$
and
$p_Y(y) = \frac{\mu_2^{y}}{y!}e^{-\mu_2},\quad y = 0, 1, 2, \ldots$
Transformation of continuous random variables
$f_{Y_1,Y_2}(y_1,y_2) = f_{X_1,X_2}(w_1(y_1,y_2),\, w_2(y_1,y_2)) \cdot |J(y_1,y_2)|.$
Example
A device containing two key components fails when, and only when, both components fail. The lifetimes X1 and X2 of these components have joint pdf $f(x_1,x_2) = e^{-x_1-x_2}$ for $x_1, x_2 > 0$ and zero otherwise. The cost, Y1, of operating the device until failure is $Y_1 = 2X_1 + X_2$.
1. Find the joint pdf of (Y1, Y2), where $Y_2 = X_2$.
2. Find the marginal pdf of Y1. (Ans: $e^{-y_1/2} - e^{-y_1}$, for $y_1 > 0$.)
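The stated marginal of Y1 can be sanity-checked by simulation, since the given joint pdf makes X1 and X2 independent Exp(1). A sketch (the truncation point and sample size are arbitrary choices, not from the slides); the marginal $e^{-y/2}-e^{-y}$ integrates to the cdf $1 - 2e^{-y/2} + e^{-y}$:

```python
import math
import random

# Compare the simulated P(Y1 <= 1) with the analytic cdf at y = 1.
random.seed(1)
n = 200_000
hits = sum(1 for _ in range(n)
           if 2 * random.expovariate(1) + random.expovariate(1) <= 1)
analytic = 1 - 2 * math.exp(-0.5) + math.exp(-1)  # cdf of Y1 at 1
print(hits / n, analytic)  # the two should agree to within MC error
```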
Example 2.2.5
$f_{X_1,X_2}(x_1,x_2)=\begin{cases}10x_1x_2^2 & 0<x_1<x_2<1\\ 0 & \text{elsewhere.}\end{cases}$
Solution sketch
$f_{Y_1,Y_2}(y_1,y_2) = 10\,y_1y_2\cdot y_2^2\cdot|y_2|,$ for (y1, y2) in the transformed support defined above, and 0 elsewhere.
Example 2.2.4
Solution sketch
$f_{Y_1,Y_2}(y_1,y_2) = \tfrac{1}{4}e^{-y_1-y_2}\times|2| = \tfrac{1}{2}e^{-y_1-y_2},$ for (y1, y2) in the transformed support defined above, and 0 elsewhere.
Solution sketch
$E(e^{tY}) = \int_0^\infty\!\int_0^\infty e^{t(x_1-x_2)/2}\,\frac{1}{4}e^{-(x_1+x_2)/2}\,dx_1\,dx_2$
$= \int_0^\infty \frac{1}{2}e^{-x_1(1-t)/2}\,dx_1 \int_0^\infty \frac{1}{2}e^{-x_2(1+t)/2}\,dx_2$
$= \frac{1}{1-t}\cdot\frac{1}{1+t} = \frac{1}{1-t^2},$
provided that $1 - t > 0$ and $1 + t > 0$. This is equivalent to
$\int_{-\infty}^\infty e^{tx}\,\frac{e^{-|x|}}{2}\,dx = \frac{1}{1-t^2},\quad -1<t<1,$
the mgf of the double exponential (Laplace) distribution.
Conditional probability for discrete r.v.
Motivating example
Let X1 = smaller die face and X2 = larger die face when rolling a pair of dice. The following table partitions the sample space into 21 events (rows: x2; columns: x1):

        x1=1   x1=2   x1=3   x1=4   x1=5   x1=6
x2=1    1/36    0      0      0      0      0
x2=2    2/36   1/36    0      0      0      0
x2=3    2/36   2/36   1/36    0      0      0
x2=4    2/36   2/36   2/36   1/36    0      0
x2=5    2/36   2/36   2/36   2/36   1/36    0
x2=6    2/36   2/36   2/36   2/36   2/36   1/36

Recall that $P(A_2\mid A_1) = \dfrac{P(A_1\cap A_2)}{P(A_1)}$. In the same spirit, for $p_{X_1}(x_1) > 0$ define
$p_{X_2|X_1}(x_2\mid x_1) = \frac{p_{X_1,X_2}(x_1,x_2)}{p_{X_1}(x_1)}.$
• We call $p_{X_2|X_1}(x_2|x_1)$ the conditional pmf of X2, given that X1 = x1.
Verify that $p_{X_2|X_1}(x_2|x_1)$ satisfies the conditions of being a pmf:
1. $p_{X_2|X_1}(x_2|x_1) \ge 0.$
2. $\displaystyle\sum_{x_2} p_{X_2|X_1}(x_2|x_1) = \sum_{x_2} \frac{p_{X_1,X_2}(x_1,x_2)}{p_{X_1}(x_1)} = \frac{1}{p_{X_1}(x_1)}\sum_{x_2} p_{X_1,X_2}(x_1,x_2) = \frac{p_{X_1}(x_1)}{p_{X_1}(x_1)} = 1.$
Conditional expectation of discrete random variables:
$E(X_1|X_2=x_2) = \sum_{x_1} x_1\, p_{X_1|X_2}(x_1|x_2).$
Example
Returning to the previous example, it is straightforward to work out the conditional pmf as well as associated quantities such as expectations. For instance,
$p_{X_1|X_2}(x_1\mid X_2=3) = \begin{cases} 2/5 & x_1 = 1, 2\\ 1/5 & x_1 = 3\\ 0 & x_1 = 4, 5, 6.\end{cases}$
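This conditional pmf can be derived mechanically by enumerating the 36 equally likely rolls. A sketch (not from the slides) using exact rational arithmetic:

```python
from fractions import Fraction as F

# Build the joint pmf of (smaller face, larger face).
joint = {}
for d1 in range(1, 7):
    for d2 in range(1, 7):
        key = (min(d1, d2), max(d1, d2))
        joint[key] = joint.get(key, F(0)) + F(1, 36)

# Condition on X2 = 3: divide the matching joint masses by p2(3) = 5/36.
p2_3 = sum(p for (x1, x2), p in joint.items() if x2 == 3)
cond = {x1: p / p2_3 for (x1, x2), p in joint.items() if x2 == 3}
print(cond)  # conditional pmf of X1 given X2 = 3
```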
Conditional probability for continuous r.v.
• Let X1 and X2 denote continuous random variables with joint pdf $f_{X_1,X_2}(x_1,x_2)$ and marginal pdfs $f_{X_1}(x_1)$ and $f_{X_2}(x_2)$. Then for every x1 such that $f_{X_1}(x_1) > 0$, we define
$f_{X_2|X_1}(x_2\mid x_1) = \frac{f_{X_1,X_2}(x_1,x_2)}{f_{X_1}(x_1)}.$
Example
Find the conditionals $f_{X_2|X_1}$ and $f_{X_1|X_2}$ for (X1, X2) with joint pdf
$f_{X_1,X_2}(x_1,x_2)=\begin{cases}2e^{-x_1-x_2} & 0<x_1<x_2<\infty\\ 0 & \text{otherwise.}\end{cases}$
Example (2.3.1)
Let X1 and X2 have the joint pdf
$f(x_1,x_2)=\begin{cases}2 & 0<x_1<x_2<1\\ 0 & \text{elsewhere.}\end{cases}$
Find $P\!\left(0 < X_1 < \tfrac{1}{2} \,\middle|\, X_2 = \tfrac{3}{4}\right)$ and $\operatorname{Var}(X_1|x_2)$.
Example (2.3.2)
Let X1 and X2 have the joint pdf
$f(x_1,x_2)=\begin{cases}6x_2 & 0<x_2<x_1<1\\ 0 & \text{elsewhere.}\end{cases}$
1. Compute $E(X_2)$.
2. Compute the function $h(x_1) = E(X_2|x_1)$. Then compute $E[h(X_1)]$ and $\operatorname{Var}[h(X_1)]$.
Theorem 2.3.1
(a) $E[E(X_2|X_1)] = E(X_2)$; (b) $\operatorname{Var}[E(X_2|X_1)] \le \operatorname{Var}(X_2)$.
Interpretation:
• Both X2 and $E(X_2|X_1)$ are unbiased estimators of $E(X_2) = \mu_2$.
• Part (b) shows that $E(X_2|X_1)$ is more reliable (it has no larger variance).
• We will say more about this when studying sufficient statistics in Chapter 7 (the Rao-Blackwell theorem).
$E[E(X_2|X_1)] = E(X_2).$
Proof.
The proof is for the continuous case; the discrete case is proved by using summations instead of integrals. We see
$E(X_2) = \int_{-\infty}^\infty\!\int_{-\infty}^\infty x_2 f(x_1,x_2)\,dx_2\,dx_1$
$= \int_{-\infty}^\infty\left[\int_{-\infty}^\infty x_2\,\frac{f(x_1,x_2)}{f_1(x_1)}\,dx_2\right] f_1(x_1)\,dx_1$
$= \int_{-\infty}^\infty E(X_2|x_1)\,f_1(x_1)\,dx_1 = E[E(X_2|X_1)].$
Var(X2 ) = Var [ E(X2 |X1 ) ] + E [Var(X2 |X1 )] .
Proof.
The proof is for both the discrete and continuous cases:
Example
Assume that (X1, X2) has joint pdf, on the support $S = \{0 < x_1 < 1,\ 0 < x_2 < 2,\ x_1 + x_2 < 2\}$,
$f_{X_1,X_2}(x_1,x_2) = \begin{cases} \dfrac{2x_1}{2-x_1} & \text{in } S,\\[4pt] 0 & \text{otherwise.}\end{cases}$
Find $E(X_2|X_1)$ and verify that $E[E(X_2|X_1)] = E(X_2)$.
Solution:
The conditional pdf of X2 given $X_1 = x_1$, $0 < x_1 < 1$, is
$f_{X_2|X_1}(x_2|x_1) = \begin{cases} 1/(2-x_1) & 0 < x_2 < 2-x_1\\ 0 & \text{otherwise,}\end{cases}$
and the marginal pdf of X1 is $f_{X_1}(x_1) = 2x_1$ for $0 < x_1 < 1$ and zero otherwise. Hence
$E(X_2|X_1 = x_1) = \int_0^{2-x_1} x_2\,\frac{1}{2-x_1}\,dx_2 = \frac{2-x_1}{2},$
$E[E(X_2|X_1)] = \int_0^1 \frac{2-x_1}{2}\,2x_1\,dx_1 = 2/3.$
We can verify this by
$E(X_2) = \int_0^1\!\int_0^{2-x_1} x_2\,\frac{2x_1}{2-x_1}\,dx_2\,dx_1 = 2/3.$
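Both routes to 2/3 can be checked numerically over the triangular support. An illustrative sketch (grid size is an arbitrary choice):

```python
# Midpoint grids: the inner x2-grid adapts to the upper bound 2 - x1.
n = 500
h1 = 1.0 / n
e_x2 = 0.0    # E(X2) from the joint pdf
e_cond = 0.0  # E[E(X2 | X1)] = integral of ((2 - x1)/2) * f1(x1)
for i in range(n):
    x1 = (i + 0.5) * h1
    h2 = (2 - x1) / n
    for j in range(n):
        x2 = (j + 0.5) * h2
        e_x2 += x2 * (2 * x1 / (2 - x1)) * h1 * h2
    e_cond += ((2 - x1) / 2) * (2 * x1) * h1
print(e_x2, e_cond)  # both ≈ 2/3
```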
Chapter 2 Multivariate Distributions
2.4 The Correlation Coefficient
Recall the definition of the variance of X. Analogously:
Definition
Let X and Y be two random variables with expectations $\mu_1 = EX$ and $\mu_2 = EY$, respectively. The covariance of X and Y, if it exists, is defined to be
$\operatorname{Cov}(X,Y) = E[(X-\mu_1)(Y-\mu_2)].$
Computation shortcut:
$\operatorname{Cov}(X,Y) = E(XY) - \mu_1\mu_2.$
Example 2.4.1
Definition
The correlation coefficient of X and Y is defined to be
$\rho = \frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}}.$
Example
What is the correlation coefficient in the previous example?
(The plot is from Wikipedia: https://en.wikipedia.org/wiki/Correlation_and_dependence)
Linear conditional mean
Theorem 2.4.1
Suppose $E(Y|x)$ is linear in x. Then
$E(Y|x) = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x-\mu_1)$
and $E[\operatorname{Var}(Y|X)] = \sigma_2^2(1-\rho^2)$.
Example 2.4.2
Suppose
$E(Y|x) = 4x + 3 \qquad\text{and}\qquad E(X|y) = \frac{1}{16}y - 3.$
What are the values of $\mu_1$, $\mu_2$, $\rho$, and $\sigma_2/\sigma_1$?
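The standard solution route via Theorem 2.4.1 (slopes $\rho\sigma_2/\sigma_1 = 4$ and $\rho\sigma_1/\sigma_2 = 1/16$, lines through $(\mu_1,\mu_2)$) can be carried out in a few lines. A sketch; the variable names are illustrative:

```python
import math

# Product of the two regression slopes is rho^2; both slopes positive -> rho > 0.
slope_yx, slope_xy = 4, 1 / 16
rho = math.sqrt(slope_yx * slope_xy)
ratio = slope_yx / rho                  # sigma2 / sigma1
# Solve mu2 = 4*mu1 + 3 and mu1 = mu2/16 - 3 by substitution:
mu1 = (3 - 48) / 12                     # from 16*mu1 = 4*mu1 + 3 - 48
mu2 = 4 * mu1 + 3
print(mu1, mu2, rho, ratio)  # -3.75 -12.0 0.5 8.0
```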
Recall that the mgf of the random vector (X, Y) is defined to be $M(t_1, t_2) = E\!\left[e^{t_1X+t_2Y}\right]$. It can be shown that
$\frac{\partial^{k+m}}{\partial t_1^k\,\partial t_2^m} M(t_1,t_2) = E\!\left[X^kY^m e^{t_1X+t_2Y}\right],$
so that
$\left.\frac{\partial^{k+m}}{\partial t_1^k\,\partial t_2^m} M(t_1,t_2)\right|_{t_1=t_2=0} = E\!\left[X^kY^m\right].$
In particular:
• $\mu_1 = E(X) = \frac{\partial M(0,0)}{\partial t_1}$
• $\mu_2 = E(Y) = \frac{\partial M(0,0)}{\partial t_2}$
• $\operatorname{Var}(X) = E(X^2) - \mu_1^2 = \frac{\partial^2 M(0,0)}{\partial t_1^2} - \mu_1^2$
• $\operatorname{Var}(Y) = E(Y^2) - \mu_2^2 = \frac{\partial^2 M(0,0)}{\partial t_2^2} - \mu_2^2$
• $\operatorname{Cov}(X,Y) = E(XY) - E(X)E(Y) = \frac{\partial^2 M(0,0)}{\partial t_1\partial t_2} - \mu_1\mu_2$
• $\rho = \frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)}\sqrt{\operatorname{Var}(Y)}}$
Example 2.4.4
$f(x,y)=\begin{cases}e^{-y} & 0<x<y<\infty\\ 0 & \text{elsewhere.}\end{cases}$
$M(t_1,t_2) = \frac{1}{(1-t_1-t_2)(1-t_2)},\quad t_1+t_2<1,\ t_2<1.$
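For this pdf the moment recipe above gives $\mu_1 = 1$, $\mu_2 = 2$, $\operatorname{Var}(X)=1$, $\operatorname{Var}(Y)=2$, $\operatorname{Cov}=1$, $\rho = 1/\sqrt{2}$. A numeric sketch recovering them directly from the density (the truncation at y = 30 and the grid size are arbitrary choices):

```python
import math

# Raw moments of f(x, y) = e^{-y} on 0 < x < y, by a midpoint grid.
n = 1200
h = 30.0 / n
s = [0.0] * 6  # mass, E[x], E[y], E[x^2], E[y^2], E[xy] (un-normalized)
for i in range(n):
    x = (i + 0.5) * h
    for j in range(i + 1, n):          # enforce x < y on the grid
        y = (j + 0.5) * h
        w = math.exp(-y) * h * h
        s[0] += w; s[1] += x * w; s[2] += y * w
        s[3] += x * x * w; s[4] += y * y * w; s[5] += x * y * w
mux, muy = s[1] / s[0], s[2] / s[0]    # normalize by the captured mass
var_x = s[3] / s[0] - mux ** 2
var_y = s[4] / s[0] - muy ** 2
cov = s[5] / s[0] - mux * muy
rho = cov / math.sqrt(var_x * var_y)
print(mux, muy, cov, rho)  # ≈ 1, 2, 1, 0.707
```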
Chapter 2 Multivariate Distributions
2.5 Independent Random Variables
Motivation
If knowing X1 = x1 tells us nothing about X2, then the conditional pdf equals the marginal:
$f_{X_2|X_1}(x_2|x_1) = f_{X_2}(x_2)$ for all $x_1 \in S_{X_1},\ x_2 \in S_{X_2}$. (1)
Since $f_{X_1,X_2}(x_1,x_2) = f_{X_2|X_1}(x_2|x_1)\,f_{X_1}(x_1)$, it follows that
$f_{X_1,X_2}(x_1,x_2) = f_{X_1}(x_1)\,f_{X_2}(x_2)$ for all $x_1 \in S_{X_1},\ x_2 \in S_{X_2}$. (2)
Clearly (1) and (2) are equivalent. Exactly the same logic applies for discrete random variables.
Definition of independence
X1 and X2 are independent if and only if
$f_{X_1,X_2}(x_1,x_2) = f_{X_1}(x_1)\,f_{X_2}(x_2)$ for all $(x_1, x_2)$;
otherwise X1 and X2 are said to be dependent. (For the discrete case, replace pdfs with pmfs.)
Immediate indicators of dependency
Example 2.5.1
Theorem 2.5.1
X1 and X2 are independent if and only if the joint pdf factors as
$f(x_1,x_2) = g(x_1)\,h(x_2),$
where $g(x_1) > 0$ depends on x1 alone and $h(x_2) > 0$ depends on x2 alone.
Sketch of proof
Thus, f (x1 , x2 ) = g(x1 )h(x2 ) = c1 g(x1 )c2 h(x2 ) = f1 (x1 )f2 (x2 ).
Independence in terms of CDF
Theorem 2.5.2 Let (X1 , X2 ) have the joint cdf F (x1 , x2 ) and let
X1 and X2 have the marginal cdf F1 (x1 ) and F2 (x2 ), respectively.
Then X1 and X2 are independent if and only if
$F(x_1,x_2) = F_1(x_1)\,F_2(x_2)$ for all $(x_1,x_2)\in\mathbb{R}^2$.
Example 2.5.3
Let the joint pdf of X1 and X2 be
$f(x_1,x_2)=\begin{cases}x_1+x_2 & 0<x_1<1,\ 0<x_2<1\\ 0 & \text{elsewhere.}\end{cases}$
Are they independent?
Solution:
No, because
$P\!\left(0<X_1<\tfrac12,\ 0<X_2<\tfrac12\right) \ne P\!\left(0<X_1<\tfrac12\right)P\!\left(0<X_2<\tfrac12\right):$
$P\!\left(0<X_1<\tfrac12,\ 0<X_2<\tfrac12\right) = \int_0^{1/2}\!\int_0^{1/2}(x_1+x_2)\,dx_1\,dx_2 = 1/8,$
$P\!\left(0<X_1<\tfrac12\right) = \int_0^{1/2}\left(x_1+\tfrac12\right)dx_1 = 3/8,$
$P\!\left(0<X_2<\tfrac12\right) = \int_0^{1/2}\left(x_2+\tfrac12\right)dx_2 = 3/8.$
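The failure to factor is easy to confirm numerically: the joint probability is 1/8 while the product of the marginal probabilities is $(3/8)^2 = 9/64$. An illustrative sketch:

```python
# Midpoint grid over the unit square; accumulate P(0 < X1 < 1/2, 0 < X2 < 1/2).
n = 400
h = 1.0 / n
joint = 0.0
for i in range(n):
    x1 = (i + 0.5) * h
    for j in range(n):
        x2 = (j + 0.5) * h
        if x1 < 0.5 and x2 < 0.5:
            joint += (x1 + x2) * h * h
print(joint, (3 / 8) ** 2)  # ≈ 0.125 vs 0.140625: not equal
```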
Theorem 2.5.4
Suppose X1 and X2 are independent and $E[u(X_1)]$ and $E[v(X_2)]$ exist. Then
$E[u(X_1)v(X_2)] = E[u(X_1)]\,E[v(X_2)].$
Proof.
$E[u(X_1)v(X_2)] = \int_{-\infty}^\infty\!\int_{-\infty}^\infty u(x_1)v(x_2)f(x_1,x_2)\,dx_1\,dx_2$
$= \int_{-\infty}^\infty\!\int_{-\infty}^\infty u(x_1)v(x_2)f_1(x_1)f_2(x_2)\,dx_1\,dx_2$
$= \left[\int_{-\infty}^\infty u(x_1)f_1(x_1)\,dx_1\right]\left[\int_{-\infty}^\infty v(x_2)f_2(x_2)\,dx_2\right]$
$= E[u(X_1)]\,E[v(X_2)].$
Two special cases
Independence always implies zero covariance (correlation).
Zero covariance (correlation) does NOT always imply
independence:
Example
Assume that
Theorem 2.5.5
Suppose that (X1 , X2 ) have the joint mgf M (t1 , t2 ) and marginal
mgf’s M1 (t1 ) and M2 (t2 ), respectively. Then, X1 and X2 are
independent if and only if
$M(t_1,t_2) = M_1(t_1)\,M_2(t_2).$
Example 2.5.5
$f(x,y)=\begin{cases}e^{-y} & 0<x<y<\infty\\ 0 & \text{elsewhere.}\end{cases}$
$M(t_1,t_2) = \frac{1}{(1-t_1-t_2)(1-t_2)},\quad t_1+t_2<1,\ t_2<1.$
Because
$M(t_1,t_2) \ne M(t_1,0)\,M(0,t_2) = \frac{1}{(1-t_1)(1-t_2)^2},$
X and Y are dependent.
Examples
Pmf and cdf for the discrete case
pX (x) = P [X1 = x1 , . . . , Xn = xn ].
FX (x) = P [X1 ≤ x1 , . . . , Xn ≤ xn ].
Pdf and cdf for the continuous case
$F_X(x) = P[X_1 \le x_1, \ldots, X_n \le x_n].$
$\frac{\partial^n}{\partial x_1\cdots\partial x_n} F_X(x) = f_X(x).$
Example
Let
$f(x_1,x_2,x_3)=\begin{cases}8x_1x_2x_3 & 0<x_1,x_2,x_3<1\\ 0 & \text{otherwise.}\end{cases}$
Expectation
As before, E is a linear operator. That is,
$E\left[\sum_{j=1}^m k_jY_j\right] = \sum_{j=1}^m k_jE[Y_j].$
Example
Find $E(5X_1X_2^2 + 3X_2X_3^4)$.
Solution:
$E(X_1X_2^2) = \int_0^1\!\int_0^1\!\int_0^1 (x_1x_2^2)\,8x_1x_2x_3\,dx_3\,dx_2\,dx_1 = \frac{1}{3},$
$E(X_2X_3^4) = \int_0^1\!\int_0^1\!\int_0^1 (x_2x_3^4)\,8x_1x_2x_3\,dx_3\,dx_2\,dx_1 = \frac{2}{9},$
$E(5X_1X_2^2 + 3X_2X_3^4) = 5\cdot\frac{1}{3} + 3\cdot\frac{2}{9} = \frac{5}{3} + \frac{2}{3} = \frac{7}{3}.$
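These expectations can be cross-checked using the factorization of the pdf: $8x_1x_2x_3 = (2x_1)(2x_2)(2x_3)$, so the coordinates are mutually independent with $E(X^k) = \int_0^1 x^k\,2x\,dx = 2/(k+2)$. An illustrative sketch:

```python
# Each coordinate has density 2x on (0, 1), so moments factor.
def moment(k):
    return 2 / (k + 2)

e1 = moment(1) * moment(2)   # E(X1 * X2^2) = (2/3)*(1/2) = 1/3
e2 = moment(1) * moment(4)   # E(X2 * X3^4) = (2/3)*(1/3) = 2/9
total = 5 * e1 + 3 * e2
print(e1, e2, total)  # 1/3, 2/9, 7/3
```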
In an obvious way, we may extend the concepts of marginal pmf and marginal pdf to the multidimensional case. For the discrete case, the marginal pmf of (X1, X2) is defined to be
$p_{12}(x_1,x_2) = \sum_{x_3}\cdots\sum_{x_n} p_X(x_1,x_2,\ldots,x_n).$
We then extend the concepts of conditional pmf and conditional pdf. For the discrete case, suppose $p_1(x_1) > 0$. We define the conditional pmf of $(X_2, \ldots, X_n)$ given $X_1 = x_1$ to be
$p_{2,\ldots,n|1}(x_2,\ldots,x_n|x_1) = \frac{p(x_1,x_2,\ldots,x_n)}{p_1(x_1)}.$
Similarly, for the continuous case with $f_1(x_1) > 0$,
$f_{2,\ldots,n|1}(x_2,\ldots,x_n|x_1) = \frac{f(x_1,x_2,\ldots,x_n)}{f_1(x_1)}.$
For the discrete case, suppose $p_1(x_1) > 0$. Then we define the conditional expectation of $u(X_2, \ldots, X_n)$ given $X_1 = x_1$ to be
$E[u(X_2,\ldots,X_n)|x_1] = \sum_{x_2}\cdots\sum_{x_n} u(x_2,\ldots,x_n)\,p_{2,\ldots,n|1}(x_2,\ldots,x_n|x_1).$
For the continuous case, suppose $f_1(x_1) > 0$. Then we define the conditional expectation of $u(X_2, \ldots, X_n)$ given $X_1 = x_1$ to be
$E[u(X_2,\ldots,X_n)|x_1] = \int_{-\infty}^\infty\cdots\int_{-\infty}^\infty u(x_2,\ldots,x_n)\,f_{2,\ldots,n|1}(x_2,\ldots,x_n|x_1)\,dx_2\cdots dx_n.$
Mutual Independence
We say that the n random variables X1, . . . , Xn are mutually independent if, for the discrete case,
$p(x_1,\ldots,x_n) = p_1(x_1)\cdots p_n(x_n)$ for all $(x_1,\ldots,x_n)$,
and, for the continuous case, $f(x_1,\ldots,x_n) = f_1(x_1)\cdots f_n(x_n)$.
If the n random variables X1, . . . , Xn are mutually independent, then
$P(a_1 < X_1 < b_1, \ldots, a_n < X_n < b_n) = P(a_1 < X_1 < b_1)\cdots P(a_n < X_n < b_n).$
We may rewrite the above equation as
$P\left[\bigcap_{j=1}^n (a_j < X_j < b_j)\right] = \prod_{j=1}^n P(a_j < X_j < b_j).$
If the n random variables X1, X2, . . . , Xn are mutually independent, then
$E[u_1(X_1)u_2(X_2)\cdots u_n(X_n)] = E[u_1(X_1)]\,E[u_2(X_2)]\cdots E[u_n(X_n)],$
i.e.,
$E\left[\prod_{j=1}^n u_j(X_j)\right] = \prod_{j=1}^n E[u_j(X_j)].$
As a special case of the above, if the n random variables X1, X2, . . . , Xn are mutually independent, then for the mgf,
$M(t_1,t_2,\cdots,t_n) = \prod_{j=1}^n M_j(t_j),$
which can be seen from
$M(t_1,t_2,\cdots,t_n) = E[\exp(t_1X_1+t_2X_2+\cdots+t_nX_n)] = E\left[\prod_{j=1}^n \exp(t_jX_j)\right] = \prod_{j=1}^n E[\exp(t_jX_j)] = \prod_{j=1}^n M_j(t_j).$
Mutual independence vs. pairwise independence
Compare "mutual independence" and "pairwise independence".
Example (from S. Bernstein)
Consider a random vector (X1, X2, X3) that has joint pmf
$p(x_1,x_2,x_3)=\begin{cases}\tfrac14 & (x_1,x_2,x_3)\in\{(1,0,0),(0,1,0),(0,0,1),(1,1,1)\}\\ 0 & \text{otherwise.}\end{cases}$
Solution:
For each pair $i \ne j$,
$p_{ij}(x_i,x_j)=\begin{cases}\tfrac14 & (x_i,x_j)\in\{(0,0),(1,0),(0,1),(1,1)\}\\ 0 & \text{otherwise,}\end{cases}$
and each marginal is
$p_i(x_i)=\begin{cases}\tfrac12 & x_i\in\{0,1\}\\ 0 & \text{otherwise.}\end{cases}$
Pairwise independence: $p_{ij}(x_i,x_j) = p_i(x_i)p_j(x_j)$.
Not mutual independence: $p(x_1,x_2,x_3) \ne p_1(x_1)p_2(x_2)p_3(x_3)$.
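Bernstein's example can be verified mechanically by enumerating all marginals. A sketch (not from the slides) using exact rational arithmetic:

```python
from fractions import Fraction as F
from itertools import product

# Joint pmf: mass 1/4 on four of the eight corners of the cube.
support = {(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)}
p = {t: (F(1, 4) if t in support else F(0)) for t in product((0, 1), repeat=3)}

def marginal(axes):
    """Sum the joint pmf over the coordinates not listed in `axes`."""
    out = {}
    for t, prob in p.items():
        key = tuple(t[a] for a in axes)
        out[key] = out.get(key, F(0)) + prob
    return out

pairwise = all(
    marginal((i, j))[(a, b)] == marginal((i,))[(a,)] * marginal((j,))[(b,)]
    for i in range(3) for j in range(3) if i < j
    for a in (0, 1) for b in (0, 1)
)
mutual = all(
    p[t] == marginal((0,))[(t[0],)] * marginal((1,))[(t[1],)] * marginal((2,))[(t[2],)]
    for t in p
)
print(pairwise, mutual)  # True False
```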
Multivariate Variance-Covariance Matrix
1. Let $X = (X_1, \cdots, X_n)^\top$ be a random vector.
2. We define the expectation of X as $EX = (EX_1, \cdots, EX_n)^\top$.
3. Let $W = [W_{ij}]$ be an $m \times n$ matrix, where the $W_{ij}$ are random variables. That is,
$W = \begin{pmatrix} W_{11} & W_{12} & \cdots & W_{1n}\\ W_{21} & W_{22} & \cdots & W_{2n}\\ \vdots & \vdots & & \vdots\\ W_{m1} & W_{m2} & \cdots & W_{mn}\end{pmatrix} = [W_{ij}]_{m\times n}.$
Theorem 2.6.2
Let $X = (X_1, \ldots, X_n)^\top$ be an n-dimensional random vector with mean vector $\mu$. Then the variance-covariance matrix of X is defined to be
$\operatorname{Cov}(X) = E\!\left[(X-\mu)(X-\mu)^\top\right]$
$= E\begin{pmatrix} (X_1-\mu_1)(X_1-\mu_1) & (X_1-\mu_1)(X_2-\mu_2) & \cdots & (X_1-\mu_1)(X_n-\mu_n)\\ (X_2-\mu_2)(X_1-\mu_1) & (X_2-\mu_2)(X_2-\mu_2) & \cdots & (X_2-\mu_2)(X_n-\mu_n)\\ \vdots & \vdots & & \vdots\\ (X_n-\mu_n)(X_1-\mu_1) & (X_n-\mu_n)(X_2-\mu_2) & \cdots & (X_n-\mu_n)(X_n-\mu_n)\end{pmatrix}$
$= \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n}\\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n}\\ \vdots & \vdots & & \vdots\\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn}\end{pmatrix}.$
Example of a covariance matrix
Let $Z = (X, Y)^\top$ with
$f(x,y)=\begin{cases}e^{-y} & 0<x<y<\infty\\ 0 & \text{elsewhere.}\end{cases}$
Then
$E(Z) = \begin{pmatrix}1\\2\end{pmatrix} \qquad\text{and}\qquad \operatorname{Cov}(Z) = \begin{pmatrix}1 & 1\\ 1 & 2\end{pmatrix}.$
Theorem 2.6.3 – Two properties of the covariance matrix
Let $X = (X_1, \ldots, X_n)^\top$ be an n-dimensional random vector with mean vector $\mu$. Then
$\operatorname{Cov}(X) = E\!\left[XX^\top\right] - \mu\mu^\top. \quad (3)$
If further A is an $m \times n$ constant matrix, then we have
$\operatorname{Cov}(AX) = A\operatorname{Cov}(X)A^\top.$
Proof of (3): $\operatorname{Cov}(X) = E\!\left[(X-\mu)(X-\mu)^\top\right]$ has (i, j) entry
$E[(X_i-\mu_i)(X_j-\mu_j)] = E(X_iX_j) - \mu_i\mu_j,$
so collecting entries,
$\operatorname{Cov}(X) = E\!\left[XX^\top\right] - \mu\mu^\top.$
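The second property, $\operatorname{Cov}(AX) = A\operatorname{Cov}(X)A^\top$, is purely algebraic, so it holds exactly even for the sample covariance of any data set. A quick numpy sketch (the data and matrix here are arbitrary):

```python
import numpy as np

# Sample covariance of transformed data equals A @ Cov @ A.T exactly.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))            # 500 observations of a 3-vector
A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])         # arbitrary 2x3 constant matrix

cov_X = np.cov(X, rowvar=False)          # 3x3 sample covariance
cov_AX = np.cov(X @ A.T, rowvar=False)   # 2x2 sample covariance of AX
print(np.allclose(cov_AX, A @ cov_X @ A.T))  # True
```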
• All variance-covariance matrices are positive semi-definite; that is, $a^\top\operatorname{Cov}(X)a \ge 0$ for any $a\in\mathbb{R}^n$.
• This is because $a^\top\operatorname{Cov}(X)a = \operatorname{Var}(a^\top X) \ge 0$.
Chapter 2 Multivariate Distributions
2.7 Transformation for Several Random Variables
One to one transformation
• Let $X = (X_1, X_2, \ldots, X_n)$ be a random vector with pdf $f_X(x_1, x_2, \ldots, x_n)$ and support S. Let
$y_1 = g_1(x_1, x_2, \ldots, x_n)$
$y_2 = g_2(x_1, x_2, \ldots, x_n)$
$\vdots$
$y_n = g_n(x_1, x_2, \ldots, x_n)$
be a one-to-one transformation of S onto T, with inverse $x_i = h_i(y_1, \ldots, y_n)$, $i = 1, \ldots, n$.
• Let the Jacobian be
$J = \frac{\partial(x_1,x_2,\ldots,x_n)}{\partial(y_1,y_2,\ldots,y_n)} = \begin{vmatrix} \frac{\partial x_1}{\partial y_1} & \frac{\partial x_1}{\partial y_2} & \cdots & \frac{\partial x_1}{\partial y_n}\\ \frac{\partial x_2}{\partial y_1} & \frac{\partial x_2}{\partial y_2} & \cdots & \frac{\partial x_2}{\partial y_n}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial x_n}{\partial y_1} & \frac{\partial x_n}{\partial y_2} & \cdots & \frac{\partial x_n}{\partial y_n}\end{vmatrix}.$
Then
$f_Y(y_1,y_2,\ldots,y_n) = |J|\, f_X[h_1(y_1,y_2,\ldots,y_n),\, h_2(y_1,y_2,\ldots,y_n),\, \ldots,\, h_n(y_1,y_2,\ldots,y_n)],$
for $(y_1, y_2, \ldots, y_n) \in T$.
Example 2.7.1
and let
Y1 = X1 /X2
Y2 = X2 /X3
Y3 = X3 .
Multiple to one transformation
• Let $X = (X_1, X_2, \ldots, X_n)$ be a random vector with pdf $f_X(x_1, x_2, \ldots, x_n)$ and support S. Let
$y_1 = g_1(x_1, x_2, \ldots, x_n),\ \ldots,\ y_n = g_n(x_1, x_2, \ldots, x_n)$
be a transformation of S onto T that is k-to-one: each point of T has k inverse images, with branches $x = (h_{1i}, \ldots, h_{ni})$ and Jacobians $J_i$, $i = 1, \ldots, k$. Then
$f_Y(y_1,y_2,\ldots,y_n) = \sum_{i=1}^k |J_i|\, f_X[h_{1i}(y_1,y_2,\ldots,y_n),\, h_{2i}(y_1,y_2,\ldots,y_n),\, \ldots,\, h_{ni}(y_1,y_2,\ldots,y_n)],$
for $(y_1, y_2, \ldots, y_n) \in T$.
Example 2.7.3
Let X1 and X2 have the joint pdf defined over the unit circle given by
$f(x_1,x_2)=\begin{cases}\dfrac{1}{\pi} & 0<x_1^2+x_2^2<1\\ 0 & \text{elsewhere.}\end{cases}$
Let
$Y_1 = X_1^2 + X_2^2.$
Chapter 2 Multivariate Distributions
2.8 Linear Combinations of Random Variables
Motivation
Expectation of linear combinations
Theorem 2.8.1. Let $T = \sum_{i=1}^n a_iX_i$. Provided that $E[|X_i|] < \infty$ for all $i = 1, \ldots, n$, then
$E(T) = \sum_{i=1}^n a_iE(X_i).$
Variance and covariance of linear combinations
Theorem 2.8.2. Let $T = \sum_{i=1}^n a_iX_i$ and $W = \sum_{j=1}^m b_jY_j$. Provided the second moments exist,
$\operatorname{Cov}(T,W) = \sum_{i=1}^n\sum_{j=1}^m a_ib_j\operatorname{Cov}(X_i,Y_j).$
Proof:
$\operatorname{Cov}(T,W) = E\left[\sum_{i=1}^n\sum_{j=1}^m (a_iX_i - a_iE(X_i))(b_jY_j - b_jE(Y_j))\right]$
$= \sum_{i=1}^n\sum_{j=1}^m E[(a_iX_i - a_iE(X_i))(b_jY_j - b_jE(Y_j))] = \sum_{i=1}^n\sum_{j=1}^m a_ib_j\operatorname{Cov}(X_i,Y_j).$
Corollary 2.8.1. Let $T = \sum_{i=1}^n a_iX_i$. Provided $E[X_i^2] < \infty$ for $i = 1, \ldots, n$, then
$\operatorname{Var}(T) = \operatorname{Cov}(T,T) = \sum_{i=1}^n a_i^2\operatorname{Var}(X_i) + 2\sum_{i<j} a_ia_j\operatorname{Cov}(X_i,X_j).$
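The corollary can be checked exactly on any small joint pmf by comparing a direct enumeration of Var(T) with the formula. A sketch reusing the restaurant pmf from Section 2.1 with the illustrative choice $T = 2X_1 + X_2$:

```python
# Restaurant joint pmf, keyed by (x1, x2).
pmf = {
    (7, 7): 0.05, (9, 7): 0.05, (10, 7): 0.10,
    (7, 9): 0.05, (9, 9): 0.10, (10, 9): 0.35,
    (7, 10): 0.00, (9, 10): 0.20, (10, 10): 0.10,
}

def E(g):
    """Expectation of g(X1, X2) by enumeration."""
    return sum(g(x1, x2) * p for (x1, x2), p in pmf.items())

var1 = E(lambda a, b: a * a) - E(lambda a, b: a) ** 2
var2 = E(lambda a, b: b * b) - E(lambda a, b: b) ** 2
cov = E(lambda a, b: a * b) - E(lambda a, b: a) * E(lambda a, b: b)

# Direct Var(2*X1 + X2) vs Corollary 2.8.1 with a1 = 2, a2 = 1.
var_T_direct = E(lambda a, b: (2 * a + b) ** 2) - E(lambda a, b: 2 * a + b) ** 2
var_T_formula = 4 * var1 + var2 + 2 * 2 * 1 * cov
print(var_T_direct, var_T_formula)  # identical up to float rounding
```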
Example 2.8.2 – Sample variance
Let X1, . . . , Xn be iid with variance $\sigma^2$ and let $\bar{X} = n^{-1}\sum_{i=1}^n X_i$. Applying the corollary to the sample variance $S^2 = (n-1)^{-1}\sum_{i=1}^n (X_i-\bar{X})^2$ shows
$E(S^2) = \sigma^2.$