Chapter 2 Multivariate Distributions: 2.1 Distributions of Two Random Variables
Boxiang Wang, The University of Iowa Chapter 2 STAT 4100 Fall 2018
Bivariate random vector
Definition
A random variable is a function from a sample space C to R.
Definition
An n-dimensional random vector is a function from C to $\mathbb{R}^n$.
• A 2-dimensional random vector is also called a bivariate random variable.
Remark: $X = (X_1, X_2)^\top$ assigns to each element c of the sample space C exactly one ordered pair of numbers $X_1(c) = x_1$ and $X_2(c) = x_2$.
Example
1. Height and weight of a respondent.
2. Fuel consumption and hours on an engine.
Discrete Random Variables
Joint probability mass function
Definition
A joint probability mass function
$p_{X_1,X_2}(x_1,x_2) = P(X_1 = x_1, X_2 = x_2)$ (or $p(x_1,x_2)$)
with space $(x_1,x_2) \in S$ has the properties that
(a) $0 \le p(x_1,x_2) \le 1$,
(b) $\sum_{(x_1,x_2)\in S} p(x_1,x_2) = 1$,
(c) $P[(X_1,X_2)\in A] = \sum_{(x_1,x_2)\in A} p(x_1,x_2)$.
Example
A restaurant serves three fixed-price dinners costing $7, $9, and $10. For a randomly selected couple dining at this restaurant, let
X1 = the cost of the man's dinner and
X2 = the cost of the woman's dinner.
The joint pmf of X1 and X2 is given in the following table (rows: x2; columns: x1):

         x1 = 7   x1 = 9   x1 = 10
x2 = 7    0.05     0.05     0.10
x2 = 9    0.05     0.10     0.35
x2 = 10   0.00     0.20     0.10
Marginal probability mass function
Definition
Suppose that X1 and X2 have the joint pmf $p(x_1,x_2)$. Then the pmf of $X_i$, denoted $p_i(\cdot)$, $i = 1, 2$, is called the marginal pmf.
Note $p_1(x_1) = \sum_{x_2} p(x_1,x_2)$ and $p_2(x_2) = \sum_{x_1} p(x_1,x_2)$.
Example
Let X1 = smaller die face and X2 = larger die face when rolling a pair of dice. The following table partitions the sample space into 21 events (rows: x2; columns: x1):

        x1=1   x1=2   x1=3   x1=4   x1=5   x1=6
x2=1    1/36    0      0      0      0      0
x2=2    2/36   1/36    0      0      0      0
x2=3    2/36   2/36   1/36    0      0      0
x2=4    2/36   2/36   2/36   1/36    0      0
x2=5    2/36   2/36   2/36   2/36   1/36    0
x2=6    2/36   2/36   2/36   2/36   2/36   1/36
Expectation – discrete random variables
Definition
Let $Y = u(X_1, X_2)$. Then Y is a random variable and
$E[u(X_1,X_2)] = \sum_{x_1}\sum_{x_2} u(x_1,x_2)\,p(x_1,x_2).$
Example
Find $E(\max\{X_1, X_2\})$ for the restaurant problem. (Answer: 9.65.)
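The restaurant pmf and the two formulas above can be checked directly by enumeration. A minimal sketch (the dictionary layout and names are illustrative, not from the slides):

```python
# Joint pmf of the restaurant example, keyed by (x1, x2).
pmf = {
    (7, 7): 0.05, (9, 7): 0.05, (10, 7): 0.10,
    (7, 9): 0.05, (9, 9): 0.10, (10, 9): 0.35,
    (7, 10): 0.00, (9, 10): 0.20, (10, 10): 0.10,
}

# Marginal pmfs: sum the joint pmf over the other coordinate.
p1, p2 = {}, {}
for (x1, x2), p in pmf.items():
    p1[x1] = p1.get(x1, 0.0) + p
    p2[x2] = p2.get(x2, 0.0) + p

# E[u(X1, X2)] = sum over the support of u(x1, x2) * p(x1, x2).
e_max = sum(max(x1, x2) * p for (x1, x2), p in pmf.items())
print(round(e_max, 2))  # 9.65
```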
Continuous Random Variables
Joint density function
A joint density function $f_{X_1,X_2}(x_1,x_2)$ (or $f(x_1,x_2)$) with space $(x_1,x_2)\in S$ has the properties that
(a) $f(x_1,x_2) > 0$,
(b) $\iint_{(x_1,x_2)\in S} f(x_1,x_2)\,dx_1\,dx_2 = 1$,
(c) $P[(X_1,X_2)\in A] = \iint_{(x_1,x_2)\in A} f(x_1,x_2)\,dx_1\,dx_2$.
Example
Let X1 and X2 be continuous random variables with joint density function
$f(x_1,x_2) = \begin{cases} 4x_1x_2 & 0 < x_1, x_2 < 1 \\ 0 & \text{otherwise.} \end{cases}$

Suppose that X1 and X2 have the joint pdf $f(x_1,x_2)$. Then the pdf of $X_i$, denoted $f_i(\cdot)$, $i = 1, 2$, is called the marginal pdf.
Note: $f_1(x_1) = \int_{x_2} f(x_1,x_2)\,dx_2$ and $f_2(x_2) = \int_{x_1} f(x_1,x_2)\,dx_1$.
Example
Find the marginal pdfs for the previous problem.
Solution:
$f_1(x) = f_2(x) = 2x$ for $0 < x < 1$.
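The marginal $f_1(x) = 2x$ can be sanity-checked numerically. A quick sketch (not part of the slides), approximating $\int_0^1 4xy\,dy$ with a midpoint Riemann sum:

```python
# Midpoint-rule approximation of the marginal f1(x) = integral of 4*x*y
# over 0 < y < 1; the closed form is f1(x) = 2x.
def marginal_f1(x, n=1000):
    h = 1.0 / n
    return sum(4 * x * ((i + 0.5) * h) * h for i in range(n))

print(marginal_f1(0.5))  # ≈ 1.0, i.e. 2 * 0.5
```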
Example
Let X1 and X2 be continuous random variables with joint pdf
$f(x_1,x_2) = \begin{cases} c\,x_1x_2 & 0 < x_1 < x_2 < 1 \\ 0 & \text{otherwise.}\end{cases}$
1. Find c.
2. Find $P(X_1 + X_2 < 1)$.
3. Find the marginal probability density functions of X1 and X2.
Solution:
We have c = 8 because
$\int_0^1\!\int_{x_1}^1 x_1x_2\,dx_2\,dx_1 = 1/8 = 0.125.$
Also,
$P(X_1+X_2<1) = \int_0^{1/2}\!\int_{x_1}^{1-x_1} 8x_1x_2\,dx_2\,dx_1 = 1/6 \approx 0.167.$
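Both numbers can be cross-checked with a brute-force grid over the triangle $0 < x_1 < x_2 < 1$. An illustrative sketch, not from the slides:

```python
# Midpoint grid over the unit square; keep only cells with x1 < x2.
n = 400
h = 1.0 / n
total = 0.0   # integral of x1*x2 over the triangle, should approach 1/8
event = 0.0   # P(X1 + X2 < 1) under f = 8*x1*x2, should approach 1/6
for i in range(n):
    x1 = (i + 0.5) * h
    for j in range(n):
        x2 = (j + 0.5) * h
        if x1 < x2:
            total += x1 * x2 * h * h
            if x1 + x2 < 1:
                event += 8 * x1 * x2 * h * h
print(total, event)  # ≈ 0.125, ≈ 0.167
```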
Example
Let X1 and X2 be continuous random variables with joint pdf
$f(x_1,x_2) = \begin{cases} c\,x_1x_2 & 0 < x_1 < x_2 < 1 \\ 0 & \text{otherwise.}\end{cases}$
What is $P\{[X_1 < X_2] \cap [X_2 > 4(X_1 - 1/2)^2]\}$?
Solution:
We see that $x = 1/4$ solves $x = 4(x - \tfrac12)^2$ on $0 < x < 1$. The range of X2 is (1/4, 1). Given $X_2 = x_2$, we next find the range of X1. From $x_2 = 4(x_1 - 1/2)^2$ we get
$x_1 = \frac{1}{2} \pm \frac{\sqrt{x_2}}{2}.$
The lower bound of X1 is $\frac{1}{2} - \frac{\sqrt{x_2}}{2}$ because the intersection of $x_1 = x_2$ with $x_2 = 4(x_1-1/2)^2$ occurs at $x_1 < 1/2$ for $x_1 \in (0,1)$. We also have $X_1 < X_2$, so, with c = 8, the probability is
$\int_{1/4}^{1}\!\int_{\frac{1}{2}-\frac{\sqrt{x_2}}{2}}^{x_2} 8x_1x_2\,dx_1\,dx_2 \approx 0.974.$
Expectation – continuous random variables
Example
Let X1 and X2 have the joint pdf $f(x_1,x_2) = \frac{36}{5}x_1x_2(1-x_1x_2)$ for $0 < x_1, x_2 < 1$ and zero otherwise. Find $E(X_1X_2)$.
Solution:
$E(X_1X_2) = \int_0^1\!\int_0^1 \frac{36}{5}\,x_1^2x_2^2(1 - x_1x_2)\,dx_1\,dx_2 = 7/20 = 0.35.$
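A numeric cross-check of 0.35 by a midpoint double sum over the unit square (an illustrative sketch, not from the slides):

```python
# E(X1*X2) = double integral of (36/5) x1^2 x2^2 (1 - x1 x2); exact value 7/20.
n = 200
h = 1.0 / n
e = 0.0
for i in range(n):
    x1 = (i + 0.5) * h
    for j in range(n):
        x2 = (j + 0.5) * h
        e += (36 / 5) * x1**2 * x2**2 * (1 - x1 * x2) * h * h
print(e)  # ≈ 0.35
```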
Theorem
Let (X1 , X2 ) be a random vector. Let Y1 = g1 (X1 , X2 ) and
Y2 = g2 (X1 , X2 ) be random variables whose expectations exist.
Then for all real numbers k1 and k2,
$E(k_1Y_1 + k_2Y_2) = k_1E(Y_1) + k_2E(Y_2).$
Example 2.1.5 & 2.1.6
Discrete & Continuous R.V.
Joint cumulative distribution function
Definition
The joint cumulative distribution function of (X1, X2) is
$F_{X_1,X_2}(x_1,x_2) = P(X_1 \le x_1,\ X_2 \le x_2).$
Joint cumulative distribution function (cont’d)
Definition
The joint cumulative distribution function of (X1, X2) is
$F_{X_1,X_2}(x_1,x_2) = P(X_1 \le x_1,\ X_2 \le x_2).$
Properties:
1. $F(x_1, x_2)$ is nondecreasing in x1 and x2.
2. $F(-\infty, x_2) = F(x_1, -\infty) = 0.$
3. $F(\infty, \infty) = 1.$
4. For a rectangle $(a_1, b_1] \times (a_2, b_2]$, we have
$P(a_1 < X_1 \le b_1,\ a_2 < X_2 \le b_2) = F(b_1,b_2) - F(a_1,b_2) - F(b_1,a_2) + F(a_1,a_2) \ge 0.$
Example 2.1.1
X1\X2     0      1      2      3
  0      1/8    1/8     0      0
  1       0     2/8    2/8     0
  2       0      0     1/8    1/8
Example
1. Find the joint cdf for
$f_{X_1,X_2}(x_1,x_2)=\begin{cases} e^{-x_1-x_2} & 0<x_1,x_2<\infty\\ 0 & \text{otherwise.}\end{cases}$
Solution:
$F_{X_1,X_2}(x_1,x_2) = \int_0^{x_1}\!\int_0^{x_2} e^{-t_1-t_2}\,dt_2\,dt_1 = (1-e^{-x_1})(1-e^{-x_2}).$
2. Find the joint cdf for
$f_{X_1,X_2}(x_1,x_2)=\begin{cases} 2e^{-x_1-x_2} & 0<x_1<x_2<\infty\\ 0 & \text{otherwise.}\end{cases}$
Solution:
$F_{X_1,X_2}(x_1,x_2) = \int_0^{\min(x_1,x_2)}\!\int_{t_1}^{x_2} 2e^{-t_1-t_2}\,dt_2\,dt_1.$
Moment generating function (mgf)
Definition
Let $X = (X_1, X_2)^\top$ be a random vector. If
$M(t_1,t_2) = E\!\left[e^{t_1X_1+t_2X_2}\right]$
exists for $|t_1| < h_1$ and $|t_2| < h_2$, where h1 and h2 are positive, then we call $M(t_1, t_2)$ the moment generating function (mgf) of $X = (X_1, X_2)^\top$.
We may write
$M(t_1,t_2) = E\!\left[e^{t_1X_1+t_2X_2}\right] = E\!\left[e^{t^\top X}\right].$
Marginal mgf
Recall that $M_{X_1,X_2}(t_1,t_2) = E\!\left[e^{t_1X_1+t_2X_2}\right]$. The marginal mgfs are obtained by setting the other argument to zero:
$M_{X_1}(t_1) = M_{X_1,X_2}(t_1, 0), \qquad M_{X_2}(t_2) = M_{X_1,X_2}(0, t_2).$
Example 2.1.7 (cont’d)
Let the continuous-type random variables X and Y have the joint pdf
$f(x,y)=\begin{cases} e^{-y} & 0<x<y<\infty\\ 0 & \text{elsewhere.}\end{cases}$
Determine the marginal mgfs.
Solution:
$M_{X,Y}(t_1,t_2) = \int_0^\infty\!\int_x^\infty \exp(t_1x + t_2y - y)\,dy\,dx = \frac{1}{(1-t_1-t_2)(1-t_2)},$
provided that $t_1 + t_2 < 1$ and $t_2 < 1$. Hence
$M_X(t_1) = M_{X,Y}(t_1,0) = \frac{1}{1-t_1},\quad t_1 < 1,$
$M_Y(t_2) = M_{X,Y}(0,t_2) = \frac{1}{(1-t_2)^2},\quad t_2 < 1.$
Example 2.1.7 (cont'd)
Note that the marginal pdfs agree with these mgfs:
$f_1(x) = \int_x^\infty e^{-y}\,dy = e^{-x},\quad 0<x<\infty,$
$f_2(y) = \int_0^y e^{-y}\,dx = ye^{-y},\quad 0<y<\infty.$
Fact: It can be shown that
$E(XY) = \left.\frac{\partial^2 M_{X,Y}(t_1,t_2)}{\partial t_1\,\partial t_2}\right|_{t_1=0,\,t_2=0}.$
Method 2:
$M_{X,Y}(t_1,t_2) = \frac{1}{(1-t_1-t_2)(1-t_2)},$
$\frac{\partial^2 M_{X,Y}(t_1,t_2)}{\partial t_1\,\partial t_2} = -\frac{t_1+3t_2-3}{(t_2-1)^2(-t_1-t_2+1)^3},$
where we again see $\left.\dfrac{\partial^2 M_{X,Y}(t_1,t_2)}{\partial t_1\,\partial t_2}\right|_{t_1=0,\,t_2=0} = 3.$
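The value E(XY) = 3 can also be checked by simulation. A sketch under the observation (an assumption made here, not stated on the slides) that for $f(x,y)=e^{-y}$ on $0<x<y$, Y has the Gamma(2, 1) density $ye^{-y}$ and X given Y = y is Uniform(0, y):

```python
import random

# Monte Carlo estimate of E(XY); the mgf calculation above gives exactly 3.
random.seed(0)
n = 200_000
total = 0.0
for _ in range(n):
    y = random.gammavariate(2, 1)   # marginal of Y
    x = random.uniform(0, y)        # conditional of X given Y = y
    total += x * y
print(total / n)  # ≈ 3 (within Monte Carlo error)
```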
Chapter 2 Multivariate Distributions
2.2 Transformation: Bivariate Random Variables
Transformation of discrete random vectors
Y1 = u1 (X1 , X2 ), X1 = w1 (Y1 , Y2 ),
Y2 = u2 (X1 , X2 ), X2 = w2 (Y1 , Y2 ).
Example 2.2.1
$p_X(x) = \frac{\mu_1^{x}}{x!}e^{-\mu_1},\quad x = 0, 1, 2, \ldots$
and
$p_Y(y) = \frac{\mu_2^{y}}{y!}e^{-\mu_2},\quad y = 0, 1, 2, \ldots$
Transformation of continuous random variables
$f_{Y_1,Y_2}(y_1,y_2) = f_{X_1,X_2}(w_1(y_1,y_2),\, w_2(y_1,y_2)) \cdot |J(y_1,y_2)|.$
Example
A device containing two key components fails when, and only when, both components fail. The lifetimes X1 and X2 of these components have joint pdf $f(x_1,x_2) = e^{-x_1-x_2}$ for $x_1, x_2 > 0$ and zero otherwise. The cost, Y1, of operating the device until failure is $Y_1 = 2X_1 + X_2$.
1. Find the joint pdf of (Y1, Y2), where $Y_2 = X_2$.
2. Find the marginal pdf of Y1. (Ans: $e^{-y_1/2} - e^{-y_1}$, for $y_1 > 0$.)
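The stated marginal of Y1 can be sanity-checked by simulation, since the given joint pdf makes X1 and X2 independent Exp(1). A sketch (the truncation point and sample size are arbitrary choices, not from the slides); the marginal $e^{-y/2}-e^{-y}$ integrates to the cdf $1 - 2e^{-y/2} + e^{-y}$:

```python
import math
import random

# Compare the simulated P(Y1 <= 1) with the analytic cdf at y = 1.
random.seed(1)
n = 200_000
hits = sum(1 for _ in range(n)
           if 2 * random.expovariate(1) + random.expovariate(1) <= 1)
analytic = 1 - 2 * math.exp(-0.5) + math.exp(-1)  # cdf of Y1 at 1
print(hits / n, analytic)  # the two should agree to within MC error
```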
Example 2.2.5
$f_{X_1,X_2}(x_1,x_2)=\begin{cases}10x_1x_2^2 & 0<x_1<x_2<1\\ 0 & \text{elsewhere.}\end{cases}$
Solution sketch
$f_{Y_1,Y_2}(y_1,y_2) = 10\,y_1y_2\cdot y_2^2\cdot|y_2|,$ for (y1, y2) in the transformed support defined above, and 0 elsewhere.
Example 2.2.4
Solution sketch
$f_{Y_1,Y_2}(y_1,y_2) = \tfrac{1}{4}e^{-y_1-y_2}\times|2| = \tfrac{1}{2}e^{-y_1-y_2},$ for (y1, y2) in the transformed support defined above, and 0 elsewhere.
Solution sketch
$E(e^{tY}) = \int_0^\infty\!\int_0^\infty e^{t(x_1-x_2)/2}\,\frac{1}{4}e^{-(x_1+x_2)/2}\,dx_1\,dx_2$
$= \int_0^\infty \frac{1}{2}e^{-x_1(1-t)/2}\,dx_1 \int_0^\infty \frac{1}{2}e^{-x_2(1+t)/2}\,dx_2$
$= \frac{1}{1-t}\cdot\frac{1}{1+t} = \frac{1}{1-t^2},$
provided that $1 - t > 0$ and $1 + t > 0$. This is equivalent to
$\int_{-\infty}^\infty e^{tx}\,\frac{e^{-|x|}}{2}\,dx = \frac{1}{1-t^2},\quad -1<t<1,$
the mgf of the double exponential (Laplace) distribution.
Conditional probability for discrete r.v.
Motivating example
Let X1 = smaller die face and X2 = larger die face when rolling a pair of dice. The following table partitions the sample space into 21 events (rows: x2; columns: x1):

        x1=1   x1=2   x1=3   x1=4   x1=5   x1=6
x2=1    1/36    0      0      0      0      0
x2=2    2/36   1/36    0      0      0      0
x2=3    2/36   2/36   1/36    0      0      0
x2=4    2/36   2/36   2/36   1/36    0      0
x2=5    2/36   2/36   2/36   2/36   1/36    0
x2=6    2/36   2/36   2/36   2/36   2/36   1/36

Recall that $P(A_2\mid A_1) = \dfrac{P(A_1\cap A_2)}{P(A_1)}$. In the same spirit, for $p_{X_1}(x_1) > 0$ define
$p_{X_2|X_1}(x_2\mid x_1) = \frac{p_{X_1,X_2}(x_1,x_2)}{p_{X_1}(x_1)}.$
• We call $p_{X_2|X_1}(x_2|x_1)$ the conditional pmf of X2, given that X1 = x1.
Verify that $p_{X_2|X_1}(x_2|x_1)$ satisfies the conditions of being a pmf:
1. $p_{X_2|X_1}(x_2|x_1) \ge 0.$
2. $\displaystyle\sum_{x_2} p_{X_2|X_1}(x_2|x_1) = \sum_{x_2} \frac{p_{X_1,X_2}(x_1,x_2)}{p_{X_1}(x_1)} = \frac{1}{p_{X_1}(x_1)}\sum_{x_2} p_{X_1,X_2}(x_1,x_2) = \frac{p_{X_1}(x_1)}{p_{X_1}(x_1)} = 1.$
Conditional expectation of discrete random variables:
$E(X_1|X_2=x_2) = \sum_{x_1} x_1\, p_{X_1|X_2}(x_1|x_2).$
Example
Returning to the previous example, it is straightforward to work out the conditional pmf as well as associated quantities such as expectations. For instance,
$p_{X_1|X_2}(x_1\mid X_2=3) = \begin{cases} 2/5 & x_1 = 1, 2\\ 1/5 & x_1 = 3\\ 0 & x_1 = 4, 5, 6.\end{cases}$
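This conditional pmf can be derived mechanically by enumerating the 36 equally likely rolls. A sketch (not from the slides) using exact rational arithmetic:

```python
from fractions import Fraction as F

# Build the joint pmf of (smaller face, larger face).
joint = {}
for d1 in range(1, 7):
    for d2 in range(1, 7):
        key = (min(d1, d2), max(d1, d2))
        joint[key] = joint.get(key, F(0)) + F(1, 36)

# Condition on X2 = 3: divide the matching joint masses by p2(3) = 5/36.
p2_3 = sum(p for (x1, x2), p in joint.items() if x2 == 3)
cond = {x1: p / p2_3 for (x1, x2), p in joint.items() if x2 == 3}
print(cond)  # conditional pmf of X1 given X2 = 3
```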
Conditional probability for continuous r.v.
• Let X1 and X2 denote continuous random variables with joint pdf $f_{X_1,X_2}(x_1,x_2)$ and marginal pdfs $f_{X_1}(x_1)$ and $f_{X_2}(x_2)$. Then for every x1 such that $f_{X_1}(x_1) > 0$, we define
$f_{X_2|X_1}(x_2\mid x_1) = \frac{f_{X_1,X_2}(x_1,x_2)}{f_{X_1}(x_1)}.$
Example
Find the conditionals $f_{X_2|X_1}$ and $f_{X_1|X_2}$ for (X1, X2) with joint pdf
$f_{X_1,X_2}(x_1,x_2)=\begin{cases}2e^{-x_1-x_2} & 0<x_1<x_2<\infty\\ 0 & \text{otherwise.}\end{cases}$
Example (2.3.1)
Let X1 and X2 have the joint pdf
$f(x_1,x_2)=\begin{cases}2 & 0<x_1<x_2<1\\ 0 & \text{elsewhere.}\end{cases}$
Find $P\!\left(0 < X_1 < \tfrac{1}{2} \,\middle|\, X_2 = \tfrac{3}{4}\right)$ and $\operatorname{Var}(X_1|x_2)$.
Example (2.3.2)
Let X1 and X2 have the joint pdf
$f(x_1,x_2)=\begin{cases}6x_2 & 0<x_2<x_1<1\\ 0 & \text{elsewhere.}\end{cases}$
1. Compute $E(X_2)$.
2. Compute the function $h(x_1) = E(X_2|x_1)$. Then compute $E[h(X_1)]$ and $\operatorname{Var}[h(X_1)]$.
Theorem 2.3.1
(a) $E[E(X_2|X_1)] = E(X_2)$; (b) $\operatorname{Var}[E(X_2|X_1)] \le \operatorname{Var}(X_2)$.
Interpretation:
• Both X2 and $E(X_2|X_1)$ are unbiased estimators of $E(X_2) = \mu_2$.
• Part (b) shows that $E(X_2|X_1)$ is more reliable (it has no larger variance).
• We will say more about this when studying sufficient statistics in Chapter 7 (the Rao-Blackwell theorem).
$E[E(X_2|X_1)] = E(X_2).$
Proof.
The proof is for the continuous case; the discrete case is proved by using summations instead of integrals. We see
$E(X_2) = \int_{-\infty}^\infty\!\int_{-\infty}^\infty x_2 f(x_1,x_2)\,dx_2\,dx_1$
$= \int_{-\infty}^\infty\left[\int_{-\infty}^\infty x_2\,\frac{f(x_1,x_2)}{f_1(x_1)}\,dx_2\right] f_1(x_1)\,dx_1$
$= \int_{-\infty}^\infty E(X_2|x_1)\,f_1(x_1)\,dx_1 = E[E(X_2|X_1)].$
Var(X2 ) = Var [ E(X2 |X1 ) ] + E [Var(X2 |X1 )] .
Proof.
The proof is for both the discrete and continuous cases:
Example
Assume that (X1, X2) has joint pdf, on the support $S = \{0 < x_1 < 1,\ 0 < x_2 < 2,\ x_1 + x_2 < 2\}$,
$f_{X_1,X_2}(x_1,x_2) = \begin{cases} \dfrac{2x_1}{2-x_1} & \text{in } S,\\[4pt] 0 & \text{otherwise.}\end{cases}$
Find $E(X_2|X_1)$ and verify that $E[E(X_2|X_1)] = E(X_2)$.
Solution:
The conditional pdf of X2 given $X_1 = x_1$, $0 < x_1 < 1$, is
$f_{X_2|X_1}(x_2|x_1) = \begin{cases} 1/(2-x_1) & 0 < x_2 < 2-x_1\\ 0 & \text{otherwise,}\end{cases}$
and the marginal pdf of X1 is $f_{X_1}(x_1) = 2x_1$ for $0 < x_1 < 1$ and zero otherwise. Hence
$E(X_2|X_1 = x_1) = \int_0^{2-x_1} x_2\,\frac{1}{2-x_1}\,dx_2 = \frac{2-x_1}{2},$
$E[E(X_2|X_1)] = \int_0^1 \frac{2-x_1}{2}\,2x_1\,dx_1 = 2/3.$
We can verify this by
$E(X_2) = \int_0^1\!\int_0^{2-x_1} x_2\,\frac{2x_1}{2-x_1}\,dx_2\,dx_1 = 2/3.$
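Both routes to 2/3 can be checked numerically over the triangular support. An illustrative sketch (grid size is an arbitrary choice):

```python
# Midpoint grids: the inner x2-grid adapts to the upper bound 2 - x1.
n = 500
h1 = 1.0 / n
e_x2 = 0.0    # E(X2) from the joint pdf
e_cond = 0.0  # E[E(X2 | X1)] = integral of ((2 - x1)/2) * f1(x1)
for i in range(n):
    x1 = (i + 0.5) * h1
    h2 = (2 - x1) / n
    for j in range(n):
        x2 = (j + 0.5) * h2
        e_x2 += x2 * (2 * x1 / (2 - x1)) * h1 * h2
    e_cond += ((2 - x1) / 2) * (2 * x1) * h1
print(e_x2, e_cond)  # both ≈ 2/3
```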
Chapter 2 Multivariate Distributions
2.4 The Correlation Coefficient
Recall the definition of the variance of X. Analogously:
Definition
Let X and Y be two random variables with expectations $\mu_1 = EX$ and $\mu_2 = EY$, respectively. The covariance of X and Y, if it exists, is defined to be
$\operatorname{Cov}(X,Y) = E[(X-\mu_1)(Y-\mu_2)].$
Computation shortcut:
$\operatorname{Cov}(X,Y) = E(XY) - \mu_1\mu_2.$
Example 2.4.1
Definition
The correlation coefficient of X and Y is defined to be
$\rho = \frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}}.$
Example
What is the correlation coefficient in the previous example?
(The plot is from Wikipedia: https://en.wikipedia.org/wiki/Correlation_and_dependence)
Linear conditional mean
Theorem 2.4.1
Suppose $E(Y|x)$ is linear in x. Then
$E(Y|x) = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x-\mu_1)$
and $E[\operatorname{Var}(Y|X)] = \sigma_2^2(1-\rho^2)$.
Example 2.4.2
Suppose
$E(Y|x) = 4x + 3 \qquad\text{and}\qquad E(X|y) = \frac{1}{16}y - 3.$
What are the values of $\mu_1$, $\mu_2$, $\rho$, and $\sigma_2/\sigma_1$?
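The standard solution route via Theorem 2.4.1 (slopes $\rho\sigma_2/\sigma_1 = 4$ and $\rho\sigma_1/\sigma_2 = 1/16$, lines through $(\mu_1,\mu_2)$) can be carried out in a few lines. A sketch; the variable names are illustrative:

```python
import math

# Product of the two regression slopes is rho^2; both slopes positive -> rho > 0.
slope_yx, slope_xy = 4, 1 / 16
rho = math.sqrt(slope_yx * slope_xy)
ratio = slope_yx / rho                  # sigma2 / sigma1
# Solve mu2 = 4*mu1 + 3 and mu1 = mu2/16 - 3 by substitution:
mu1 = (3 - 48) / 12                     # from 16*mu1 = 4*mu1 + 3 - 48
mu2 = 4 * mu1 + 3
print(mu1, mu2, rho, ratio)  # -3.75 -12.0 0.5 8.0
```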
Recall that the mgf of the random vector (X, Y) is defined to be $M(t_1, t_2) = E\!\left[e^{t_1X+t_2Y}\right]$. It can be shown that
$\frac{\partial^{k+m}}{\partial t_1^k\,\partial t_2^m} M(t_1,t_2) = E\!\left[X^kY^m e^{t_1X+t_2Y}\right],$
so that
$\left.\frac{\partial^{k+m}}{\partial t_1^k\,\partial t_2^m} M(t_1,t_2)\right|_{t_1=t_2=0} = E\!\left[X^kY^m\right].$
In particular:
• $\mu_1 = E(X) = \frac{\partial M(0,0)}{\partial t_1}$
• $\mu_2 = E(Y) = \frac{\partial M(0,0)}{\partial t_2}$
• $\operatorname{Var}(X) = E(X^2) - \mu_1^2 = \frac{\partial^2 M(0,0)}{\partial t_1^2} - \mu_1^2$
• $\operatorname{Var}(Y) = E(Y^2) - \mu_2^2 = \frac{\partial^2 M(0,0)}{\partial t_2^2} - \mu_2^2$
• $\operatorname{Cov}(X,Y) = E(XY) - E(X)E(Y) = \frac{\partial^2 M(0,0)}{\partial t_1\partial t_2} - \mu_1\mu_2$
• $\rho = \frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)}\sqrt{\operatorname{Var}(Y)}}$
Example 2.4.4
$f(x,y)=\begin{cases}e^{-y} & 0<x<y<\infty\\ 0 & \text{elsewhere.}\end{cases}$
$M(t_1,t_2) = \frac{1}{(1-t_1-t_2)(1-t_2)},\quad t_1+t_2<1,\ t_2<1.$
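For this pdf the moment recipe above gives $\mu_1 = 1$, $\mu_2 = 2$, $\operatorname{Var}(X)=1$, $\operatorname{Var}(Y)=2$, $\operatorname{Cov}=1$, $\rho = 1/\sqrt{2}$. A numeric sketch recovering them directly from the density (the truncation at y = 30 and the grid size are arbitrary choices):

```python
import math

# Raw moments of f(x, y) = e^{-y} on 0 < x < y, by a midpoint grid.
n = 1200
h = 30.0 / n
s = [0.0] * 6  # mass, E[x], E[y], E[x^2], E[y^2], E[xy] (un-normalized)
for i in range(n):
    x = (i + 0.5) * h
    for j in range(i + 1, n):          # enforce x < y on the grid
        y = (j + 0.5) * h
        w = math.exp(-y) * h * h
        s[0] += w; s[1] += x * w; s[2] += y * w
        s[3] += x * x * w; s[4] += y * y * w; s[5] += x * y * w
mux, muy = s[1] / s[0], s[2] / s[0]    # normalize by the captured mass
var_x = s[3] / s[0] - mux ** 2
var_y = s[4] / s[0] - muy ** 2
cov = s[5] / s[0] - mux * muy
rho = cov / math.sqrt(var_x * var_y)
print(mux, muy, cov, rho)  # ≈ 1, 2, 1, 0.707
```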
Chapter 2 Multivariate Distributions
2.5 Independent Random Variables
Motivation
If knowing X1 = x1 tells us nothing about X2, then the conditional pdf equals the marginal:
$f_{X_2|X_1}(x_2|x_1) = f_{X_2}(x_2)$ for all $x_1 \in S_{X_1},\ x_2 \in S_{X_2}$. (1)
Since $f_{X_1,X_2}(x_1,x_2) = f_{X_2|X_1}(x_2|x_1)\,f_{X_1}(x_1)$, it follows that
$f_{X_1,X_2}(x_1,x_2) = f_{X_1}(x_1)\,f_{X_2}(x_2)$ for all $x_1 \in S_{X_1},\ x_2 \in S_{X_2}$. (2)
Clearly (1) and (2) are equivalent. Exactly the same logic applies for discrete random variables.
Definition of independence
X1 and X2 are independent if and only if
$f_{X_1,X_2}(x_1,x_2) = f_{X_1}(x_1)\,f_{X_2}(x_2)$ for all $(x_1, x_2)$;
otherwise X1 and X2 are said to be dependent. (For the discrete case, replace pdfs with pmfs.)
Immediate indicators of dependency
Example 2.5.1
Theorem 2.5.1
X1 and X2 are independent if and only if the joint pdf factors as
$f(x_1,x_2) = g(x_1)\,h(x_2),$
where $g(x_1) > 0$ depends on x1 alone and $h(x_2) > 0$ depends on x2 alone.
Sketch of proof
Thus, f (x1 , x2 ) = g(x1 )h(x2 ) = c1 g(x1 )c2 h(x2 ) = f1 (x1 )f2 (x2 ).
Independence in terms of CDF
Theorem 2.5.2 Let (X1 , X2 ) have the joint cdf F (x1 , x2 ) and let
X1 and X2 have the marginal cdf F1 (x1 ) and F2 (x2 ), respectively.
Then X1 and X2 are independent if and only if
$F(x_1,x_2) = F_1(x_1)\,F_2(x_2)$ for all $(x_1,x_2)\in\mathbb{R}^2$.
Example 2.5.3
Let the joint pdf of X1 and X2 be
$f(x_1,x_2)=\begin{cases}x_1+x_2 & 0<x_1<1,\ 0<x_2<1\\ 0 & \text{elsewhere.}\end{cases}$
Are they independent?
Solution:
No, because
$P\!\left(0<X_1<\tfrac12,\ 0<X_2<\tfrac12\right) \ne P\!\left(0<X_1<\tfrac12\right)P\!\left(0<X_2<\tfrac12\right):$
$P\!\left(0<X_1<\tfrac12,\ 0<X_2<\tfrac12\right) = \int_0^{1/2}\!\int_0^{1/2}(x_1+x_2)\,dx_1\,dx_2 = 1/8,$
$P\!\left(0<X_1<\tfrac12\right) = \int_0^{1/2}\left(x_1+\tfrac12\right)dx_1 = 3/8,$
$P\!\left(0<X_2<\tfrac12\right) = \int_0^{1/2}\left(x_2+\tfrac12\right)dx_2 = 3/8.$
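The failure to factor is easy to confirm numerically: the joint probability is 1/8 while the product of the marginal probabilities is $(3/8)^2 = 9/64$. An illustrative sketch:

```python
# Midpoint grid over the unit square; accumulate P(0 < X1 < 1/2, 0 < X2 < 1/2).
n = 400
h = 1.0 / n
joint = 0.0
for i in range(n):
    x1 = (i + 0.5) * h
    for j in range(n):
        x2 = (j + 0.5) * h
        if x1 < 0.5 and x2 < 0.5:
            joint += (x1 + x2) * h * h
print(joint, (3 / 8) ** 2)  # ≈ 0.125 vs 0.140625: not equal
```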
Theorem 2.5.4
Suppose X1 and X2 are independent and $E[u(X_1)]$ and $E[v(X_2)]$ exist. Then
$E[u(X_1)v(X_2)] = E[u(X_1)]\,E[v(X_2)].$
Proof.
$E[u(X_1)v(X_2)] = \int_{-\infty}^\infty\!\int_{-\infty}^\infty u(x_1)v(x_2)f(x_1,x_2)\,dx_1\,dx_2$
$= \int_{-\infty}^\infty\!\int_{-\infty}^\infty u(x_1)v(x_2)f_1(x_1)f_2(x_2)\,dx_1\,dx_2$
$= \left[\int_{-\infty}^\infty u(x_1)f_1(x_1)\,dx_1\right]\left[\int_{-\infty}^\infty v(x_2)f_2(x_2)\,dx_2\right]$
$= E[u(X_1)]\,E[v(X_2)].$
Two special cases
Independence always implies zero covariance (correlation).
Zero covariance (correlation) does NOT always imply
independence:
Example
Assume that
Theorem 2.5.5
Suppose that (X1 , X2 ) have the joint mgf M (t1 , t2 ) and marginal
mgf’s M1 (t1 ) and M2 (t2 ), respectively. Then, X1 and X2 are
independent if and only if
$M(t_1,t_2) = M_1(t_1)\,M_2(t_2).$
Example 2.5.5
$f(x,y)=\begin{cases}e^{-y} & 0<x<y<\infty\\ 0 & \text{elsewhere.}\end{cases}$
$M(t_1,t_2) = \frac{1}{(1-t_1-t_2)(1-t_2)},\quad t_1+t_2<1,\ t_2<1.$
Because
$M(t_1,t_2) \ne M(t_1,0)\,M(0,t_2) = \frac{1}{(1-t_1)(1-t_2)^2},$
X and Y are dependent.
Examples
Pmf and cdf for the discrete case
pX (x) = P [X1 = x1 , . . . , Xn = xn ].
FX (x) = P [X1 ≤ x1 , . . . , Xn ≤ xn ].
Pdf and cdf for the continuous case
$F_X(x) = P[X_1 \le x_1, \ldots, X_n \le x_n].$
$\frac{\partial^n}{\partial x_1\cdots\partial x_n} F_X(x) = f_X(x).$
Example
Let
$f(x_1,x_2,x_3)=\begin{cases}8x_1x_2x_3 & 0<x_1,x_2,x_3<1\\ 0 & \text{otherwise.}\end{cases}$
Expectation
As before, E is a linear operator. That is,
$E\left[\sum_{j=1}^m k_jY_j\right] = \sum_{j=1}^m k_jE[Y_j].$
Example
Find $E(5X_1X_2^2 + 3X_2X_3^4)$.
Solution:
$E(X_1X_2^2) = \int_0^1\!\int_0^1\!\int_0^1 (x_1x_2^2)\,8x_1x_2x_3\,dx_3\,dx_2\,dx_1 = \frac{1}{3},$
$E(X_2X_3^4) = \int_0^1\!\int_0^1\!\int_0^1 (x_2x_3^4)\,8x_1x_2x_3\,dx_3\,dx_2\,dx_1 = \frac{2}{9},$
$E(5X_1X_2^2 + 3X_2X_3^4) = 5\cdot\frac{1}{3} + 3\cdot\frac{2}{9} = \frac{5}{3} + \frac{2}{3} = \frac{7}{3}.$
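These expectations can be cross-checked using the factorization of the pdf: $8x_1x_2x_3 = (2x_1)(2x_2)(2x_3)$, so the coordinates are mutually independent with $E(X^k) = \int_0^1 x^k\,2x\,dx = 2/(k+2)$. An illustrative sketch:

```python
# Each coordinate has density 2x on (0, 1), so moments factor.
def moment(k):
    return 2 / (k + 2)

e1 = moment(1) * moment(2)   # E(X1 * X2^2) = (2/3)*(1/2) = 1/3
e2 = moment(1) * moment(4)   # E(X2 * X3^4) = (2/3)*(1/3) = 2/9
total = 5 * e1 + 3 * e2
print(e1, e2, total)  # 1/3, 2/9, 7/3
```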
In an obvious way, we may extend the concepts of marginal pmf and marginal pdf to the multidimensional case. For the discrete case, the marginal pmf of (X1, X2) is defined to be
$p_{12}(x_1,x_2) = \sum_{x_3}\cdots\sum_{x_n} p_X(x_1,x_2,\ldots,x_n).$
We then extend the concepts of conditional pmf and conditional pdf. For the discrete case, suppose $p_1(x_1) > 0$. We define the conditional pmf of $(X_2, \ldots, X_n)$ given $X_1 = x_1$ to be
$p_{2,\ldots,n|1}(x_2,\ldots,x_n|x_1) = \frac{p(x_1,x_2,\ldots,x_n)}{p_1(x_1)}.$
Similarly, for the continuous case with $f_1(x_1) > 0$,
$f_{2,\ldots,n|1}(x_2,\ldots,x_n|x_1) = \frac{f(x_1,x_2,\ldots,x_n)}{f_1(x_1)}.$
For the discrete case, suppose $p_1(x_1) > 0$. Then we define the conditional expectation of $u(X_2, \ldots, X_n)$ given $X_1 = x_1$ to be
$E[u(X_2,\ldots,X_n)|x_1] = \sum_{x_2}\cdots\sum_{x_n} u(x_2,\ldots,x_n)\,p_{2,\ldots,n|1}(x_2,\ldots,x_n|x_1).$
For the continuous case, suppose $f_1(x_1) > 0$. Then we define the conditional expectation of $u(X_2, \ldots, X_n)$ given $X_1 = x_1$ to be
$E[u(X_2,\ldots,X_n)|x_1] = \int_{-\infty}^\infty\cdots\int_{-\infty}^\infty u(x_2,\ldots,x_n)\,f_{2,\ldots,n|1}(x_2,\ldots,x_n|x_1)\,dx_2\cdots dx_n.$
Mutual Independence
We say that the n random variables X1, . . . , Xn are mutually independent if, for the discrete case,
$p(x_1,\ldots,x_n) = p_1(x_1)\cdots p_n(x_n)$ for all $(x_1,\ldots,x_n)$,
and, for the continuous case, $f(x_1,\ldots,x_n) = f_1(x_1)\cdots f_n(x_n)$.
If the n random variables X1, . . . , Xn are mutually independent, then
$P(a_1 < X_1 < b_1, \ldots, a_n < X_n < b_n) = P(a_1 < X_1 < b_1)\cdots P(a_n < X_n < b_n).$
We may rewrite the above equation as
$P\left[\bigcap_{j=1}^n (a_j < X_j < b_j)\right] = \prod_{j=1}^n P(a_j < X_j < b_j).$
If the n random variables X1, X2, . . . , Xn are mutually independent, then
$E[u_1(X_1)u_2(X_2)\cdots u_n(X_n)] = E[u_1(X_1)]\,E[u_2(X_2)]\cdots E[u_n(X_n)],$
i.e.,
$E\left[\prod_{j=1}^n u_j(X_j)\right] = \prod_{j=1}^n E[u_j(X_j)].$
As a special case of the above, if the n random variables X1, X2, . . . , Xn are mutually independent, then for the mgf,
$M(t_1,t_2,\cdots,t_n) = \prod_{j=1}^n M_j(t_j),$
which can be seen from
$M(t_1,t_2,\cdots,t_n) = E[\exp(t_1X_1+t_2X_2+\cdots+t_nX_n)] = E\left[\prod_{j=1}^n \exp(t_jX_j)\right] = \prod_{j=1}^n E[\exp(t_jX_j)] = \prod_{j=1}^n M_j(t_j).$
Mutual independence vs. pairwise independence
Compare "mutual independence" and "pairwise independence".
Example (from S. Bernstein)
Consider a random vector (X1, X2, X3) that has joint pmf
$p(x_1,x_2,x_3)=\begin{cases}\tfrac14 & (x_1,x_2,x_3)\in\{(1,0,0),(0,1,0),(0,0,1),(1,1,1)\}\\ 0 & \text{otherwise.}\end{cases}$
Solution:
For each pair $i \ne j$,
$p_{ij}(x_i,x_j)=\begin{cases}\tfrac14 & (x_i,x_j)\in\{(0,0),(1,0),(0,1),(1,1)\}\\ 0 & \text{otherwise,}\end{cases}$
and each marginal is
$p_i(x_i)=\begin{cases}\tfrac12 & x_i\in\{0,1\}\\ 0 & \text{otherwise.}\end{cases}$
Pairwise independence: $p_{ij}(x_i,x_j) = p_i(x_i)p_j(x_j)$.
Not mutual independence: $p(x_1,x_2,x_3) \ne p_1(x_1)p_2(x_2)p_3(x_3)$.
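Bernstein's example can be verified mechanically by enumerating all marginals. A sketch (not from the slides) using exact rational arithmetic:

```python
from fractions import Fraction as F
from itertools import product

# Joint pmf: mass 1/4 on four of the eight corners of the cube.
support = {(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)}
p = {t: (F(1, 4) if t in support else F(0)) for t in product((0, 1), repeat=3)}

def marginal(axes):
    """Sum the joint pmf over the coordinates not listed in `axes`."""
    out = {}
    for t, prob in p.items():
        key = tuple(t[a] for a in axes)
        out[key] = out.get(key, F(0)) + prob
    return out

pairwise = all(
    marginal((i, j))[(a, b)] == marginal((i,))[(a,)] * marginal((j,))[(b,)]
    for i in range(3) for j in range(3) if i < j
    for a in (0, 1) for b in (0, 1)
)
mutual = all(
    p[t] == marginal((0,))[(t[0],)] * marginal((1,))[(t[1],)] * marginal((2,))[(t[2],)]
    for t in p
)
print(pairwise, mutual)  # True False
```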
Multivariate Variance-Covariance Matrix
1. Let $X = (X_1, \cdots, X_n)^\top$ be a random vector.
2. We define the expectation of X as $EX = (EX_1, \cdots, EX_n)^\top$.
3. Let $W = [W_{ij}]$ be an $m \times n$ matrix, where the $W_{ij}$ are random variables. That is,
$W = \begin{pmatrix} W_{11} & W_{12} & \cdots & W_{1n}\\ W_{21} & W_{22} & \cdots & W_{2n}\\ \vdots & \vdots & & \vdots\\ W_{m1} & W_{m2} & \cdots & W_{mn}\end{pmatrix} = [W_{ij}]_{m\times n}.$
Theorem 2.6.2
Let $X = (X_1, \ldots, X_n)^\top$ be an n-dimensional random vector with mean vector $\mu$. Then the variance-covariance matrix of X is defined to be
$\operatorname{Cov}(X) = E\!\left[(X-\mu)(X-\mu)^\top\right]$
$= E\begin{pmatrix} (X_1-\mu_1)(X_1-\mu_1) & (X_1-\mu_1)(X_2-\mu_2) & \cdots & (X_1-\mu_1)(X_n-\mu_n)\\ (X_2-\mu_2)(X_1-\mu_1) & (X_2-\mu_2)(X_2-\mu_2) & \cdots & (X_2-\mu_2)(X_n-\mu_n)\\ \vdots & \vdots & & \vdots\\ (X_n-\mu_n)(X_1-\mu_1) & (X_n-\mu_n)(X_2-\mu_2) & \cdots & (X_n-\mu_n)(X_n-\mu_n)\end{pmatrix}$
$= \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n}\\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n}\\ \vdots & \vdots & & \vdots\\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn}\end{pmatrix}.$
Example of a covariance matrix
Let $Z = (X, Y)^\top$ with
$f(x,y)=\begin{cases}e^{-y} & 0<x<y<\infty\\ 0 & \text{elsewhere.}\end{cases}$
Then
$E(Z) = \begin{pmatrix}1\\2\end{pmatrix} \qquad\text{and}\qquad \operatorname{Cov}(Z) = \begin{pmatrix}1 & 1\\ 1 & 2\end{pmatrix}.$
Theorem 2.6.3 – Two properties of the covariance matrix
Let $X = (X_1, \ldots, X_n)^\top$ be an n-dimensional random vector with mean vector $\mu$. Then
$\operatorname{Cov}(X) = E\!\left[XX^\top\right] - \mu\mu^\top. \quad (3)$
If further A is an $m \times n$ constant matrix, then we have
$\operatorname{Cov}(AX) = A\operatorname{Cov}(X)A^\top.$
Proof of (3): $\operatorname{Cov}(X) = E\!\left[(X-\mu)(X-\mu)^\top\right]$ has (i, j) entry
$E[(X_i-\mu_i)(X_j-\mu_j)] = E(X_iX_j) - \mu_i\mu_j,$
so collecting entries,
$\operatorname{Cov}(X) = E\!\left[XX^\top\right] - \mu\mu^\top.$
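The second property, $\operatorname{Cov}(AX) = A\operatorname{Cov}(X)A^\top$, is purely algebraic, so it holds exactly even for the sample covariance of any data set. A quick numpy sketch (the data and matrix here are arbitrary):

```python
import numpy as np

# Sample covariance of transformed data equals A @ Cov @ A.T exactly.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))            # 500 observations of a 3-vector
A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])         # arbitrary 2x3 constant matrix

cov_X = np.cov(X, rowvar=False)          # 3x3 sample covariance
cov_AX = np.cov(X @ A.T, rowvar=False)   # 2x2 sample covariance of AX
print(np.allclose(cov_AX, A @ cov_X @ A.T))  # True
```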
• All variance-covariance matrices are positive semi-definite; that is, $a^\top\operatorname{Cov}(X)a \ge 0$ for any $a\in\mathbb{R}^n$.
• This is because $a^\top\operatorname{Cov}(X)a = \operatorname{Var}(a^\top X) \ge 0$.
Chapter 2 Multivariate Distributions
2.7 Transformation for Several Random Variables
One to one transformation
• Let $X = (X_1, X_2, \ldots, X_n)$ be a random vector with pdf $f_X(x_1, x_2, \ldots, x_n)$ and support S. Let
$y_1 = g_1(x_1, x_2, \ldots, x_n)$
$y_2 = g_2(x_1, x_2, \ldots, x_n)$
$\vdots$
$y_n = g_n(x_1, x_2, \ldots, x_n)$
be a one-to-one transformation of S onto T, with inverse $x_i = h_i(y_1, \ldots, y_n)$, $i = 1, \ldots, n$.
• Let the Jacobian be
$J = \frac{\partial(x_1,x_2,\ldots,x_n)}{\partial(y_1,y_2,\ldots,y_n)} = \begin{vmatrix} \frac{\partial x_1}{\partial y_1} & \frac{\partial x_1}{\partial y_2} & \cdots & \frac{\partial x_1}{\partial y_n}\\ \frac{\partial x_2}{\partial y_1} & \frac{\partial x_2}{\partial y_2} & \cdots & \frac{\partial x_2}{\partial y_n}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial x_n}{\partial y_1} & \frac{\partial x_n}{\partial y_2} & \cdots & \frac{\partial x_n}{\partial y_n}\end{vmatrix}.$
Then
$f_Y(y_1,y_2,\ldots,y_n) = |J|\, f_X[h_1(y_1,y_2,\ldots,y_n),\, h_2(y_1,y_2,\ldots,y_n),\, \ldots,\, h_n(y_1,y_2,\ldots,y_n)],$
for $(y_1, y_2, \ldots, y_n) \in T$.
Example 2.7.1
and let
Y1 = X1 /X2
Y2 = X2 /X3
Y3 = X3 .
Multiple to one transformation
• Let $X = (X_1, X_2, \ldots, X_n)$ be a random vector with pdf $f_X(x_1, x_2, \ldots, x_n)$ and support S. Let
$y_1 = g_1(x_1, x_2, \ldots, x_n),\ \ldots,\ y_n = g_n(x_1, x_2, \ldots, x_n)$
be a transformation of S onto T that is k-to-one: each point of T has k inverse images, with branches $x = (h_{1i}, \ldots, h_{ni})$ and Jacobians $J_i$, $i = 1, \ldots, k$. Then
$f_Y(y_1,y_2,\ldots,y_n) = \sum_{i=1}^k |J_i|\, f_X[h_{1i}(y_1,y_2,\ldots,y_n),\, h_{2i}(y_1,y_2,\ldots,y_n),\, \ldots,\, h_{ni}(y_1,y_2,\ldots,y_n)],$
for $(y_1, y_2, \ldots, y_n) \in T$.
Example 2.7.3
Let X1 and X2 have the joint pdf defined over the unit circle given by
$f(x_1,x_2)=\begin{cases}\dfrac{1}{\pi} & 0<x_1^2+x_2^2<1\\ 0 & \text{elsewhere.}\end{cases}$
Let
$Y_1 = X_1^2 + X_2^2.$
Chapter 2 Multivariate Distributions
2.8 Linear Combinations of Random Variables
Motivation
Expectation of linear combinations
Theorem 2.8.1. Let $T = \sum_{i=1}^n a_iX_i$. Provided that $E[|X_i|] < \infty$ for all $i = 1, \ldots, n$, then
$E(T) = \sum_{i=1}^n a_iE(X_i).$
Variance and covariance of linear combinations
Theorem 2.8.2. Let $T = \sum_{i=1}^n a_iX_i$ and $W = \sum_{j=1}^m b_jY_j$. Provided the second moments exist,
$\operatorname{Cov}(T,W) = \sum_{i=1}^n\sum_{j=1}^m a_ib_j\operatorname{Cov}(X_i,Y_j).$
Proof:
$\operatorname{Cov}(T,W) = E\left[\sum_{i=1}^n\sum_{j=1}^m (a_iX_i - a_iE(X_i))(b_jY_j - b_jE(Y_j))\right]$
$= \sum_{i=1}^n\sum_{j=1}^m E[(a_iX_i - a_iE(X_i))(b_jY_j - b_jE(Y_j))] = \sum_{i=1}^n\sum_{j=1}^m a_ib_j\operatorname{Cov}(X_i,Y_j).$
Corollary 2.8.1. Let $T = \sum_{i=1}^n a_iX_i$. Provided $E[X_i^2] < \infty$ for $i = 1, \ldots, n$, then
$\operatorname{Var}(T) = \operatorname{Cov}(T,T) = \sum_{i=1}^n a_i^2\operatorname{Var}(X_i) + 2\sum_{i<j} a_ia_j\operatorname{Cov}(X_i,X_j).$
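The corollary can be checked exactly on any small joint pmf by comparing a direct enumeration of Var(T) with the formula. A sketch reusing the restaurant pmf from Section 2.1 with the illustrative choice $T = 2X_1 + X_2$:

```python
# Restaurant joint pmf, keyed by (x1, x2).
pmf = {
    (7, 7): 0.05, (9, 7): 0.05, (10, 7): 0.10,
    (7, 9): 0.05, (9, 9): 0.10, (10, 9): 0.35,
    (7, 10): 0.00, (9, 10): 0.20, (10, 10): 0.10,
}

def E(g):
    """Expectation of g(X1, X2) by enumeration."""
    return sum(g(x1, x2) * p for (x1, x2), p in pmf.items())

var1 = E(lambda a, b: a * a) - E(lambda a, b: a) ** 2
var2 = E(lambda a, b: b * b) - E(lambda a, b: b) ** 2
cov = E(lambda a, b: a * b) - E(lambda a, b: a) * E(lambda a, b: b)

# Direct Var(2*X1 + X2) vs Corollary 2.8.1 with a1 = 2, a2 = 1.
var_T_direct = E(lambda a, b: (2 * a + b) ** 2) - E(lambda a, b: 2 * a + b) ** 2
var_T_formula = 4 * var1 + var2 + 2 * 2 * 1 * cov
print(var_T_direct, var_T_formula)  # identical up to float rounding
```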
Example 2.8.2 – Sample variance
Let X1, . . . , Xn be iid with variance $\sigma^2$ and let $\bar{X} = n^{-1}\sum_{i=1}^n X_i$. Applying the corollary to the sample variance $S^2 = (n-1)^{-1}\sum_{i=1}^n (X_i-\bar{X})^2$ shows
$E(S^2) = \sigma^2.$