
Econ 102A

Introduction to Statistical Methods for Social Scientists

Stanford University

Course Materials for Week 4

Professor Scott M. McKeon

Autumn Quarter, 2020 - 21

© Scott M. McKeon
All Rights Reserved
Week 4

Goals:

1. Gaining practice in calculating expected values, variances and standard deviations of random variables.

2. Understanding important properties of expected value and variance.

3. Getting acquainted with joint, marginal and conditional distributions.

4. Learning to appreciate the class topics discussed thus far through a variety of applied
examples.
Handout #25
Econ 102A Statistical Methods for Social Scientists Page 1 of 1

Week 4 Definitions

1. The expected value (also known as the expectation or mean) of a random variable, X, (denoted μX or E(X)) is the long-run overall average value of the random variable.

2. The variance of a random variable, X, (denoted σX²) gives a measure of how much the value of X varies around the expected value.

3. The standard deviation of a random variable, X, (denoted σX) is the (positive) square root of the variance.

4. Consider two random variables, X and Y, that can each take on (possibly) different values.
We define the joint probability, P(X = x0, Y = y0), as:

P(X = x0, Y = y0) = P(X = x0 and Y = y0).

The collection of all joint probabilities is a joint distribution.

5. Given a joint distribution on X and Y, the marginal distribution of X or Y is simply the probability distribution on each individual variable. That is, given a joint distribution on X and Y, using the data to construct the distribution of X alone (irrespective of Y) is thereby constructing the marginal distribution of X.

6. Consider a joint distribution on X and Y. Suppose we later learn the actual value of Y. We
can therefore update our distribution on X given knowledge of Y. This updated distribution
is called a conditional distribution.

7. Two random variables, X and Y, are independent if knowing the value of one of the
random variables does not affect the probability distribution on the other random variable.
That is, two random variables are independent if the conditional distribution of one of the
variables equals its marginal distribution (for all values of the random variables).
Alternatively, two random variables are independent if their joint distribution equals the
product of their marginal distributions (for all values of the random variables).
Handout #26
Econ 102A Statistical Methods for Social Scientists Page 1 of 1

What is Risk?

From the pages of Management Science:

When we speak of decisions under risk, we are referring to a class of decision problems for
which there is more than one state of nature and for which we make the assumption that the
decision maker can arrive at a probability estimate for the occurrence for each of the
various states of nature.

- Eppen, Gould & Schmidt, Introductory Management Science,
  4th Edition, Prentice Hall, 1993

From the pages of Microeconomics:

Some people distinguish between uncertainty and risk along the lines suggested by the
economist Frank Knight. Uncertainty can refer to situations in which many outcomes are
possible but their likelihoods are unknown. Risk then refers to situations in which we can
list all possible outcomes, and we know the likelihood that each outcome will occur. We
will always refer to risky situations but will simplify the discussion by using uncertainty
and risk interchangeably.
- Pindyck & Rubinfeld, Microeconomics, 2nd Edition,
Macmillan Publishing Company, 1992

From the pages of Finance:

Risk can be defined as the possibility that the actual return will deviate from that which was
expected.

- Van Horne, Financial Management and Policy, 8th Edition,
  Prentice Hall, 1993

… risk takes into account both the probability of an outcome and its magnitude. Instead of
measuring the probability of a range of outcomes, one estimates the extent to which the
actual outcome is likely to diverge from the expected value.

- Sharpe, Investments, 3rd Edition, Prentice Hall, 1985

From the pages of Scott:

Risk is exposure to a negative outcome.


Handout #27
Econ 102A Statistical Methods for Social Scientists Page 1 of 2

An Alternate Equation for Variance

The initial in-class formula given to calculate variance is:

Var(X) = E[(X − μX)²]

Alternatively, variance can also be calculated as:

Var(X) = E(X²) − μX²

As an example, recall that we calculated the variance of each of the ‘Investment Wheel’
options during lecture. Specifically, the variance of Option #1 was calculated as:

Var(X) = (1/3) (−40 − 15)² + (1/6) (20 − 15)² + (1/2) (50 − 15)² = 1625

If we want to use the alternate formula, the calculation would be as follows:

Var(X) = (1/3) (−40)² + (1/6) (20)² + (1/2) (50)² − (15)² = 1625

Here the first three terms added together form the ‘E(X²)’ part, while the last term alone is the ‘μX²’ part.

In words, this alternate formula proceeds as follows:

(1) take all the different values of the variable and square them

(2) weight each term by its associated probability

(3) add up all the individual terms and then subtract the square of the expected value.

As you can see, both formulas (the initial one from lecture and the new one presented here)
produce the same result. It is up to you to decide which formula you personally prefer in doing
the calculation.
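
The equivalence is also easy to verify numerically. Below is a minimal Python sketch (the values and probabilities are those of Option #1 above) computing the variance both ways:

    # Option #1 of the 'Investment Wheel': possible returns and their probabilities.
    values = [-40, 20, 50]
    probs = [1/3, 1/6, 1/2]

    mu = sum(p * x for p, x in zip(probs, values))                      # E(X) = 15

    var_def = sum(p * (x - mu) ** 2 for p, x in zip(probs, values))     # E[(X - mu)^2]
    var_alt = sum(p * x ** 2 for p, x in zip(probs, values)) - mu ** 2  # E(X^2) - mu^2

    print(mu, var_def, var_alt)   # 15.0 and (twice) approximately 1625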

For anyone interested in why these two formulas are equivalent, the proof is as follows:

By definition, we have: Var(X) = E[(X − μX)²]

⇒ Var(X) = E[(X − μX) (X − μX)]

⇒ Var(X) = E[X² − 2XμX + μX²]

By property (ii) of the expected value (from the lecture notes), we then have:

⇒ Var(X) = E(X²) − E(2XμX) + E(μX²)

By property (i) of the expected value (from the lecture notes), we then have:

⇒ Var(X) = E(X²) − 2μX E(X) + E(μX²)

Recalling that μX is an actual number (as opposed to a random variable), we realize that E(μX²) = μX². That is, the expected value of a number is simply the number itself. So, we have:

⇒ Var(X) = E(X²) − 2μX E(X) + μX²

Then, since E(X) = μX, we have:

⇒ Var(X) = E(X²) − 2μX² + μX²

Combining the last two terms gives:

⇒ Var(X) = E(X²) − μX²
Handout #28
Econ 102A Statistical Methods for Social Scientists Page 1 of 1

The Variance of a Continuous Random Variable

Recall Question 7 from the Week 3 Worksheet (Handout #16) which deals with a continuous
random variable. Specifically, that exercise assumes the lifetime of a car battery (in years) is
given by the probability density function:

 1 1
 8 x  2 if 0  x  4

f ( x)  
 0 otherwise

In lecture, we derive that E(x) = 4/3 and Var(x) = 8/9. The method used in class to obtain the variance is by the equation Variance = E(x²) − [E(x)]². However, recall that variance can also be computed via the formula Variance = E[(x − μx)²]. This handout reinforces the computation done in lecture by calculating the variance by this alternate formula.

If we want to calculate variance by using E[(x − μx)²] for this exercise, we have:

E[(x − μx)²] = ∫₀⁴ (x − μx)² f(x) dx

= ∫₀⁴ (x − μx)² (−(1/8)x + 1/2) dx = ∫₀⁴ (x − 4/3)² (−(1/8)x + 1/2) dx

= ∫₀⁴ (x² − (8/3)x + 16/9) (−(1/8)x + 1/2) dx

= ∫₀⁴ (−(1/8)x³ + (5/6)x² − (14/9)x + 8/9) dx

= [−(1/32)x⁴ + (5/18)x³ − (7/9)x² + (8/9)x] evaluated from x = 0 to x = 4

= (−72 + 160 − 112 + 32) / 9 = 8/9

and we see that calculating variance by this alternate formula produces the same result as in
lecture.
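
For a numerical cross-check of both formulas, here is a short Python sketch using scipy's quadrature (the density is the one given above):

    from scipy.integrate import quad

    f = lambda x: -x / 8 + 1 / 2      # density on 0 <= x <= 4

    mu = quad(lambda x: x * f(x), 0, 4)[0]                  # E(x) = 4/3
    var1 = quad(lambda x: (x - mu) ** 2 * f(x), 0, 4)[0]    # E[(x - mu)^2]
    var2 = quad(lambda x: x ** 2 * f(x), 0, 4)[0] - mu**2   # E(x^2) - mu^2

    print(mu, var1, var2)   # 1.333..., 0.888..., 0.888...  (i.e., 4/3 and 8/9)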
Handout #29
Econ 102A Statistical Methods for Social Scientists Page 1 of 3

Important Properties of Expected Value and Variance

• Properties of Expected Value

(i) Assertion: E(X + a) = E(X) + a

Interpretation: Suppose you have a random variable X with a known distribution and
you compute E(X). Now, consider adding a constant value ‘a’ to each of
the random variable values. Then, the expected value of this new
random variable will simply be this value ‘a’ added to the original
expected value of X.

Proof: E(X + a) = Σ P(X = x0)(x0 + a) = Σ P(X = x0) x0 + Σ P(X = x0) a

= Σ P(X = x0) x0 + a Σ P(X = x0) = E(X) + a (1) = E(X) + a

(each sum runs over all possible values x0)

(ii) Assertion: E(cX) = c E(X)

Interpretation: Suppose you have a random variable X with a known distribution and
you compute E(X). Now, consider multiplying the potential variable
values by a constant c. That is, you might double all the random variable
values (i.e., c = 2) or you might triple them (i.e., c = 3). Then, the
expected value of this new random variable, will always equate to the
original expected value of X multiplied by the constant c.

Proof: E(cX) = Σ P(X = x0) c x0 = c Σ P(X = x0) x0 = c E(X)

(iii) Assertion: E(X1 + X2 + X3 + … + Xn) = E(X1) + E(X2) + E(X3) + … + E(Xn)

Interpretation: Suppose you have a group of random variables that are being summed.
If you wish to know the expected value of the aggregately summed
group, you can simply sum the expected values of each variable
individually. That is, the expected value of the sum of variables is
always equal to the sum of the individual expected values. This
assertion is true whether the variables being summed are independent
or dependent.

Proof: For simplicity we will consider two variables only. Let X1 take on
values x11, x12, …, x1n and let X2 take on values x21, x22, …, x2m.
Then, E(X1 + X2) = Σi Σj P(X1 = x1i, X2 = x2j)(x1i + x2j)

= Σi Σj P(X1 = x1i, X2 = x2j) x1i + Σi Σj P(X1 = x1i, X2 = x2j) x2j

= Σi P(X1 = x1i) x1i + Σj P(X2 = x2j) x2j

= E(X1) + E(X2)
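
Each of these assertions can also be seen empirically. A minimal simulation sketch (the distributions of X and Y below are arbitrary, chosen only for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.choice([-40, 20, 50], p=[1/3, 1/6, 1/2], size=1_000_000)
    y = rng.choice([2, 4, 7, 8], p=[.2, .3, .4, .1], size=1_000_000)

    print(np.mean(x + 10), np.mean(x) + 10)         # property (i):   E(X + a) = E(X) + a
    print(np.mean(3 * x), 3 * np.mean(x))           # property (ii):  E(cX) = c E(X)
    print(np.mean(x + y), np.mean(x) + np.mean(y))  # property (iii): E(X + Y) = E(X) + E(Y)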

• Properties of Variance

(i) Assertion: Var(X + b) = Var(X)

Interpretation: This property states that if you add or subtract a constant amount from
all random variable values, the overall ‘spread-outness’ or ‘variation’ of
the distribution is unaffected.
Specifically, suppose X represents the midterm examination score of students in a particular class. Suppose the students do rather poorly on the exam and, in retrospect, the instructor decides that one of the questions on the exam was unfair and adds 10 points to everyone's exam score (i.e., b = 10 in the above assertion). Then, although students might feel happier to have received an extra 10 points on the exam, the variance of the exam scores remains unchanged.

Proof: Var(X + b) = E[((X + b) − (μX + b))²]

= E[(X + b − μX − b)²]

= E[(X − μX)²]

= Var(X)

(ii) Assertion: Var(cX) = c² Var(X)

Interpretation: Suppose X represents the initial amount of money you allocate to a particular investment. Then, you decide to change your initial investment by a certain factor. That is, you might, say, double your investment (i.e., c = 2), triple your investment (i.e., c = 3), or halve your investment (i.e., c = 1/2). Then your variance (i.e., risk as interpreted in finance applications) changes by c². That is, by tripling your investment you take on 3² = 9 times the variance. Likewise, by halving your investment you take on only (1/2)² = 1/4 of the variance, and so on.

Proof: Var(cX) = E[(cX − cμX)²] = E[c²(X − μX)²]

By property (ii) of the expected value above, we then have:

= c² E[(X − μX)²] = c² Var(X)


Handout #29
Page 3 of 3

(iii) Assertion: Var(X + Y) = Var(X) + Var(Y)

(Note: this is only true if the random variables are independent)

Interpretation: Suppose X and Y represent returns from two independent investments. Then, the variance of the combined portfolio of investments is equal to the sum of the variances of each investment individually.

Proof: Please see the future handout (Handout #42) discussing the variance of
dependent random variables.
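
As with expected value, the variance properties can be checked by simulation. A sketch (same illustrative distributions as before; x and y are generated independently, as property (iii) requires):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.choice([-40, 20, 50], p=[1/3, 1/6, 1/2], size=1_000_000)
    y = rng.choice([2, 4, 7, 8], p=[.2, .3, .4, .1], size=1_000_000)

    print(np.var(x + 10), np.var(x))              # property (i):   Var(X + b) = Var(X)
    print(np.var(3 * x), 9 * np.var(x))           # property (ii):  Var(cX) = c^2 Var(X)
    print(np.var(x + y), np.var(x) + np.var(y))   # property (iii): independent X and Y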
Handout #30
Econ 102A Statistical Methods for Social Scientists Page 1 of 2

Week 4 Worksheet

Consider a company that is modeling its stock performance in the upcoming year as well as the
number of new job opportunities they expect (if they decide to expand operations in the
subsequent year). Suppose, further, that we have the following:

• The year that is being modeled is a Presidential election year. The company feels that the
outcome of the election will influence the amount of government funding they will receive
which, in turn, will not only affect their stock performance but also their desire to expand
operations.

• The company definitely will not expand operations this upcoming year. However, the
board of directors is scheduled to meet in fifteen months to decide if they will expand
operations in the subsequent year. If the company’s stock performs well this year, they will
be more inclined to expand operations for that subsequent year.

The company models next year’s stock performance as either ‘good’, ‘fair’ or ‘poor’. A ‘good’ stock performance is a gain of 20%, while ‘fair’ and ‘poor’ stock performances are gains of 5% and −10% respectively.

As mentioned, the company is thinking about expanding operations in the future. An ‘aggressive’ expansion would create 10,000 new jobs, a ‘modest’ expansion would create 3,000 new jobs and no expansion would create 0 new jobs.

The company has defined the following random variables:

X = A Bernoulli random variable which equals 1 if the Presidential election works out
favorably and 0 if it works out unfavorably

Y = The stock performance in the upcoming year

Z = The number of new job opportunities in response to expansion

A probability tree reflecting the company’s beliefs is presented on the following page. Upon
consulting the model, answer the following:

(a) Construct tables giving the joint distributions for each coupling of the variables.

(b) Determine the marginal distributions of X, Y and Z.

(c) Suppose, after all three issues have been resolved, you find out the company did not decide
to expand operations. Construct a distribution reflecting your new belief as to how the
stock performed.

(d) Determine whether or not Y and Z are independent random variables.


Handout #30
Page 2 of 2

Election Result?     Stock Performance?    Expansion?          Probability    X      Y       Z

Favorable (.55)      Good (.65)            Aggressive (.60)    .2145          1     .20    10,000
                                           Modest (.30)        .1073          1     .20     3,000
                                           None (.10)          .0357          1     .20         0
                     Fair (.20)            Aggressive (.48)    .0528          1     .05    10,000
                                           Modest (.25)        .0275          1     .05     3,000
                                           None (.27)          .0297          1     .05         0
                     Poor (.15)            Aggressive (.38)    .0313          1    −.10    10,000
                                           Modest (.27)        .0223          1    −.10     3,000
                                           None (.35)          .0289          1    −.10         0
Unfavorable (.45)    Good (.28)            Aggressive (.52)    .0655          0     .20    10,000
                                           Modest (.34)        .0428          0     .20     3,000
                                           None (.14)          .0177          0     .20         0
                     Fair (.46)            Aggressive (.40)    .0828          0     .05    10,000
                                           Modest (.30)        .0621          0     .05     3,000
                                           None (.30)          .0621          0     .05         0
                     Poor (.26)            Aggressive (.15)    .0175          0    −.10    10,000
                                           Modest (.27)        .0316          0    −.10     3,000
                                           None (.58)          .0679          0    −.10         0

(Each row is one path through the probability tree; the numbers in parentheses are the branch probabilities, and the Probability column is their product.)
Handout #31
Econ 102A Statistical Methods for Social Scientists Page 1 of 1

Week 4 Worksheet - Parts (a) and (b)

From the probability tree model we directly have:

Joint distribution of Y and X (marginals in the last row and column):

              X = 0      X = 1
Y = −.10      .1170      .0825     .1995
Y = .05       .2070      .1100     .3170
Y = .20       .1260      .3575     .4835
              .4500      .5500    1.0000

Joint distribution of Z and X:

              X = 0      X = 1
Z = 0         .1477      .0943     .2420
Z = 3,000     .1365      .1571     .2936
Z = 10,000    .1658      .2986     .4644
              .4500      .5500    1.0000

Joint distribution of Z and Y:

              Y = −.10   Y = .05   Y = .20
Z = 0         .0968      .0918     .0534     .2420
Z = 3,000     .0539      .0896     .1501     .2936
Z = 10,000    .0488      .1356     .2800     .4644
              .1995      .3170     .4835    1.0000
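
A sketch of how these tables can be tabulated mechanically: sum the path probabilities from the tree over every path on which the pair of values occurs. In Python (paths transcribed from Handout #30):

    from collections import defaultdict

    # (path probability, x, y, z) for each of the 18 paths in Handout #30.
    paths = [
        (.2145, 1, .20, 10_000), (.1073, 1, .20, 3_000), (.0357, 1, .20, 0),
        (.0528, 1, .05, 10_000), (.0275, 1, .05, 3_000), (.0297, 1, .05, 0),
        (.0313, 1, -.10, 10_000), (.0223, 1, -.10, 3_000), (.0289, 1, -.10, 0),
        (.0655, 0, .20, 10_000), (.0428, 0, .20, 3_000), (.0177, 0, .20, 0),
        (.0828, 0, .05, 10_000), (.0621, 0, .05, 3_000), (.0621, 0, .05, 0),
        (.0175, 0, -.10, 10_000), (.0316, 0, -.10, 3_000), (.0679, 0, -.10, 0),
    ]

    joint_yz = defaultdict(float)
    for p, x, y, z in paths:
        joint_yz[(y, z)] += p              # 'and' probabilities: add intersecting paths

    print(round(joint_yz[(-.10, 0)], 4))   # .0968, matching the third table above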
Handout #32
Econ 102A Statistical Methods for Social Scientists Page 1 of 4

Joint, Marginal and Conditional Distributions

Earlier, we discussed the ‘distribution’ of a random variable. Basically, it is a visual which puts the variable’s possible values on the x-axis and the associated probabilities on the y-axis. But, to speak of the ‘distribution’ is a bit vague because there are actually different kinds of distributions. We now distinguish the following different types of distributions:

A joint distribution is a distribution about two or more variables. It is typically displayed in tabular form (as opposed to a bar graph).

A marginal distribution is a distribution about one variable only. So, in Week 3 when I ask,
“What is the distribution of X?” I really should ask, “What is the marginal distribution of X?”
(because we are looking for the distribution of one variable only).

A conditional distribution is a distribution of one variable when knowing the value of another
variable. Consider two variables, X and Y. Suppose I tell you that I already know Y = 1 and I
ask you to construct a distribution for X. You would now be constructing a conditional
distribution, because you are constructing a bar graph for X under the condition that you
already know Y = 1.

As a general rule of thumb the word ‘distribution’ means that you are distributing 100% of
probability among the possible variable values. So, in each of the distributions cited above, the
probabilities should always add to 100%. If you ever construct a distribution where the
probabilities do not add up to one (whatever type of distribution it is) you have made an error.

• Joint Distributions

As a particular example of these concepts please see Handout #30 and Handout #31. In
Handout #30 a tree model has been set up which defines three different variables. Handout #31
shows three different joint distribution tables, one for X and Y, another for X and Z and finally
one for Y and Z.

These joint distribution tables are derived from directly adding up intersection probabilities in
the tree. First, though, let us make sure we are clear on the words and symbols. The
probabilities in a joint distribution table are ‘and’ statements. Looking at the upper-left cell in
the last table on Handout #31 (the example done in class) we are saying, “the probability that Y
= – .10 and Z = 0 is .0968.” That is, the chance that Y = – .10 and Z = 0 jointly is .0968 (hence,
joint distribution).

Notationally, we use a comma for the word ‘and.’ That is, the notation for the phrase, “the
probability that Y = – .10 and Z = 0 is .0968” would be P(Y = – .10, Z = 0) = .0968.

How did I arrive at the probability .0968? I just went into the tree model and found all those
paths where Y = – .10 and also Z = 0. This is the 9th path of the tree along with the last path of
the tree. Adding intersections gives us .0289 + .0679 = .0968. Likewise, all other entries in the
table are found by adding appropriate intersection probabilities.

• Marginal Distributions

Once a joint distribution table is composed, it is usually augmented with the marginal
distributions of each variable. Look again at the last table in Handout #31, which gives the
joint distribution for Y and Z. Recall that the marginal distribution is the distribution for one
variable only. So, to ask, “What is the probability that Y = – .10?” is to ask for one of the
probabilities in the marginal distribution of Y.

But, if you have a joint distribution for two variables it is an easy matter to deduce the marginal
distribution of just one of them. How does one determine the probability that Y = – .10?
Merely look at all the instances of this in the joint distribution table and add them up. In the
specific example on the bottom of Handout #31, finding the probability that Y = – .10 is simply
a matter of adding the first column in the joint distribution table. This gives .0968 + .0539 +
.0488 = .1995. Likewise, to find the marginal probabilities P(Y = .05) and P(Y = .20), merely
add the second and third columns of the joint distribution table respectively, giving .317 and
.4835.

The complete marginal distribution of Y is then written as a row under the joint distribution
table. Written as a list, the marginal distribution of Y is P(Y = – .10) = .1995, P(Y = .05) =
.317 and P(Y = .20) = .4835.

In similar fashion we can deduce the marginal distribution of Z by adding the rows of the joint
distribution table (as opposed to the columns). The marginal distribution for Z shows up as the
last column in the table.

So, overall, the joint distribution constitutes the main body of the table, with the individual
marginal distributions showing up as the last row and last column of the table.

• Conditional Distributions

Please take a look at part (c) of the Week 4 Worksheet. This question asks us for the
distribution on stock performance given that the company chose not to expand operations.
In the absence of any information, what distribution would we write down to reflect our belief?
Well, since the stock performance is labeled as variable Y in the model, we would initially
write down the marginal distribution of Y.

But, now we are told a further piece of information. Specifically, we are told that the company
chose not to expand operations. So, really, we need to write down a distribution for Y under
the condition that the company chose not to expand operations. That is, we need to construct a
conditional distribution.

First off, what does it mean in ‘variable-speak’ for the company not to expand operations?
Since Z is the variable that measures new job opportunities resulting from expansion, we are
being told that Z = 0. So, ultimately, we need to determine the distribution of Y given we
know that Z = 0.

Fortunately, the methods used in Week 2 of the class work equally well here. These
probabilities can be found by constructing a simple probability tree.

Take a look at the third table in Handout #31 (the one which relates Y and Z, the two variables
involved in this exercise). The only observation to make here is to ask, “What type of
probabilities show up in a joint distribution table?” Because they are ‘and’ statements, the
probabilities are intersection probabilities. Likewise, the marginal probabilities would show up
as first-stage probabilities in a tree.

Recalling that the objective here is to determine the distribution of Y given we know
Z = 0, the third table in Handout #31 can be used to construct the following (partial) probability
tree:
Z = 0 (marginal probability .242), followed by a second stage for Y:

    Y = −.10    conditional probability .40      (joint .0968)
    Y = .05     conditional probability .3793    (joint .0918)
    Y = .20     conditional probability .2207    (joint .0534)

Notice that the joint probabilities are the intersections of the tree, and the marginal probability
is on the first stage of the tree. The conditional probabilities can then be found by division, as
has been our custom. Taken together, the conditional probabilities make up the conditional
distribution of Y given that Z = 0.

The notation here is the same as Week 2. Namely, we use a vertical line to express the word
‘given.’ Thus the notation P(Y = .05 | Z = 0) is read, “the probability that Y = .05 given I know
that Z = 0.”

Overall, then, the conditional distribution of Y given that Z = 0 (in list form as opposed to a bar
graph) is:

P(Y = – .10 | Z = 0) = .40

P(Y = .05 | Z = 0) = .3793

P(Y = .20 | Z = 0) = .2207

So, unlike the joint and marginal distributions, the conditional distribution needs to be derived by constructing a simple probability tree model (as opposed to being read directly off the joint distribution table).
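
The division step generalizes directly: every conditional probability is a joint probability divided by the appropriate marginal. A short sketch, continuing with the Y-Z table from Handout #31:

    # Joint distribution of Y and Z, from the last table in Handout #31.
    joint = {(-.10, 0): .0968, (.05, 0): .0918, (.20, 0): .0534,
             (-.10, 3_000): .0539, (.05, 3_000): .0896, (.20, 3_000): .1501,
             (-.10, 10_000): .0488, (.05, 10_000): .1356, (.20, 10_000): .28}

    p_z0 = sum(p for (y, z), p in joint.items() if z == 0)   # marginal P(Z = 0) = .242

    # Conditional distribution of Y given Z = 0: joint divided by marginal.
    cond = {y: joint[(y, 0)] / p_z0 for y in (-.10, .05, .20)}
    print(cond)   # {-0.1: .40, 0.05: .3793..., 0.2: .2207...}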
Handout #33
Econ 102A Statistical Methods for Social Scientists Page 1 of 3

Random Variable Independence

Previously, we emphasized that the concept of ‘conditioning’ carries over from regular
probabilistic events to random variables. That is, it not only makes sense to consider
conditional probabilities like P(Drug User | Positive Drug Test) but also conditional
probabilities on random variables, such as P(Y = – .10 | Z = 0). The process used to find
conditional probabilities for random variable values is exactly the same as it was for regular
probabilistic events: just draw a tree and compute the probabilities on the second stage. In our
lecture example, this process gave us the conditional probabilities for Y given that
Z = 0. Taken as a group, all these conditional probabilities make up the conditional distribution
of Y given that we know Z = 0. In notational terms, the conditional distribution is given as:
P(Y = – .10 | Z = 0) = .40
P(Y = .05 | Z = 0) = .3793
P(Y = .20 | Z = 0) = .2207

But, not only does the concept of ‘conditioning’ carry over from our past work, but so too does
the concept of ‘independence.’ That is, just as probabilistic events can be ‘independent’ or
‘dependent,’ so too are random variables either ‘independent’ or ‘dependent.’ And, the litmus
test for determining independence for random variables is exactly the same as the litmus test
for probabilistic events.

Recall that the litmus test for probabilistic events is P(A) P(B) = P(AB). If this equation holds
true for a probabilistic situation, the events are independent, otherwise they are dependent.
Likewise, the litmus test for random variables is similar: if you have two variables X and Y the
litmus test is:

P(X = x0) P(Y = y0) = P(X = x0, Y = y0)

The notation may be disconcerting for some, but this is the exact same litmus test as before.

As an example, consider again the exercise from the Week 4 Worksheet. Are variables Y and
Z dependent or independent?

Well, let us apply the litmus test. Start by choosing any particular value of Y and also any
particular value of Z. I will arbitrarily choose Y = .05 and Z = 0. Now, I need to consider three
probabilities (which I am getting from the joint distribution table of Handout #31).

They are:   P(Y = .05) = .317
            P(Z = 0) = .242
            P(Y = .05, Z = 0) = .0918

Does our litmus test work out? Let’s check:

The test is: P(Y = y0) P(Z = z0) = P(Y = y0, Z = z0)

which translates as:

P(Y = .05) P(Z = 0) = P(Y = .05, Z = 0)

The left hand side equates to (.317)(.242) = .076714, while the right hand side is directly given by .0918. Since the left hand side differs from the right hand side, Y and Z are dependent variables.

So, in retrospect, the litmus test for variables is the same as the litmus test for regular
probabilistic events. If the litmus test works with equality, then independent; if the litmus test
gives an inequality, then dependent.

There is one caveat here. Suppose the litmus test using Y = .05 and Z = 0 did work out so that
left hand side = right hand side. Can we conclude that the variables are independent?
Actually, not yet! We would then need to test the other values of the variables. That is, we
would need to do the litmus test for Y = .05 and Z = 3000 and then again for Y = – .10 and
Z = 0 and again for Y = – .10 and Z = 3000 and again and again and again until we have
exhausted ALL combinations of Y’s and Z’s (there are nine combinations in all). The variables
are only independent if the litmus test works out for ALL nine possible combinations.

So, overall, once we find any combination where the litmus test fails, we know the variables
are dependent. End of story. This was the case we had above for Y = .05 and Z = 0.

As stated above, if the litmus test works out, we have to find another combination of the
variables and test again. It is only if the litmus test works for all possible combinations of the
variables that we can conclude the variables are independent. (Said another way, if the litmus
test worked out for the first five combinations of variables that we test, but failed on the sixth
combination, the variables would then be dependent.)

As a concrete example of this process, please see Week 4 Practice Exercises, Question 9.
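
A sketch of the litmus test run over all nine combinations at once (same joint table as in the previous handout; a single failing combination is enough to conclude dependence):

    import math

    joint = {(-.10, 0): .0968, (.05, 0): .0918, (.20, 0): .0534,
             (-.10, 3_000): .0539, (.05, 3_000): .0896, (.20, 3_000): .1501,
             (-.10, 10_000): .0488, (.05, 10_000): .1356, (.20, 10_000): .28}

    p_y = {y: sum(p for (yy, z), p in joint.items() if yy == y) for y in (-.10, .05, .20)}
    p_z = {z: sum(p for (y, zz), p in joint.items() if zz == z) for z in (0, 3_000, 10_000)}

    # The tolerance allows for the table entries being rounded to four decimals.
    independent = all(math.isclose(joint[(y, z)], p_y[y] * p_z[z], abs_tol=1e-4)
                      for y in p_y for z in p_z)
    print(independent)   # False: Y and Z are dependent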

Postscript:

At this point you might be wondering why we even care whether variables are dependent or independent. After all, what use does all this have? The answer is … A LOT. What may at this point simply look like mathematical gibberish is actually an extremely fundamental concept in statistics. As we will later see in the course, a prerequisite to performing inferential statistics on variables is that those variables be independent. If the variables are not independent, the calculations we eventually perform on our data set will be nonsense!

What does this mean in a practical sense? Suppose we conduct a survey on how often an
average person attends a movie theater in a given year. For this survey, is it appropriate for us
to consult both a random person and their spouse as two separate subjects in our survey? No!
Because the response given by the random person and the response given by their spouse
would likely influence each other (since, one hopes, a married couple would often attend the
movie together). So, one person’s response is influenced by their spouse’s response and
vice-versa. That is, the responses are not independent (i.e., the responses are correlated). And,
as mentioned, if the survey responses are not independent the ensuing statistical calculations
we perform in this course will be meaningless.

So, in the end, independence of your survey responses (i.e., the random variables) is an
absolute necessity in many statistical applications. Although more will be said on this later,
here we just plant the seed that this concept of independence is actually quite important.
Handout #34
Econ 102A Statistical Methods for Social Scientists Page 1 of 4

Applications of Class Concepts

• Binomial Options Pricing

Consider a stock which sells for $62 per share. A certain call option on this stock has an
expiration date 5 months from now and a strike price of $60. From month to month the stock
price either goes up or goes down. If it goes up, the stock’s new value is 1.0594*(old value)
whereas if it goes down the stock’s new value is 1/1.0594*(old value) = .9439*(old value).
That is, from month to month the stock price changes by roughly ±6%.

We believe the chance the stock goes up in any particular month is .55, with independence
from month to month. Determine the fair price of the option (ignoring interest rates).

• Optimal Portfolio Weights

Consider two independent investments with different expected values and variances.
Specifically, Investment A has an expected return of 5% and a variance of .08, while
Investment B has an expected return of 15% and a variance of .30.

You are going to invest a fraction (or weight) w of your money in Investment A and
(1-w) of your money in Investment B.

(a) Consider a graph which displays expected value on the vertical axis and standard
deviation on the horizontal axis. Plot the two points corresponding to Investment A and
Investment B on this graph. Determine how the graph changes as w ranges from 0 to 1.

(b) Determine the value of w such that you incur the lowest risk possible (i.e., the lowest
variance or standard deviation possible).

(c) Solve the problem generally. That is, if Investment A has variance σ 2A and
Investment B has variance σ 2B , determine which weights among the investments minimize
overall risk.

Solutions

• Binomial Options Pricing

The Setup:

The stock price evolves on a recombining binomial lattice for 5 months (up factor 1.0594, down factor .9439, starting price $62). With the strike price of $60, the possible outcomes at expiration are:

Stock Price in 5 Months    Probability    Value of Option
$82.75                     .0503          $22.75
$73.72                     .2059          $13.72
$65.68                     .3369          $5.68
$58.52                     .2757          $0.00
$52.14                     .1128          $0.00
$46.45                     .0185          $0.00

(The three zero-payoff outcomes together have probability .4069.)

Fair Price = Expected Value

= .0503 ($22.75) + .2059 ($13.72) + .3369 ($5.68) + .4069 ($0.00)

= $5.88
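
The same calculation in a minimal Python sketch (the up factor, up-probability and strike are from the problem statement; the option pays max(S − 60, 0) at expiration, and interest rates are ignored as instructed):

    from math import comb

    S0, K, u, p, n = 62.0, 60.0, 1.0594, 0.55, 5
    d = 1 / u

    fair_price = sum(
        comb(n, k) * p**k * (1 - p)**(n - k)     # P(k up-months out of 5)
        * max(S0 * u**k * d**(n - k) - K, 0.0)   # option payoff at that final price
        for k in range(n + 1)
    )
    print(round(fair_price, 2))                  # approximately 5.88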

• Optimal Portfolio Weights

(a) The expected returns and standard deviations for various weights between the two
investments are given by (please note the final two columns):

   w      Portfolio Variance    Portfolio Std Dev    Portfolio Exp Return
 0.00          0.300                 0.548                 0.150
 0.05          0.271                 0.521                 0.145
 0.10          0.244                 0.494                 0.140
 0.15          0.219                 0.467                 0.135
 0.20          0.195                 0.442                 0.130
 0.25          0.174                 0.417                 0.125
 0.30          0.154                 0.393                 0.120
 0.35          0.137                 0.370                 0.115
 0.40          0.121                 0.348                 0.110
 0.45          0.107                 0.327                 0.105
 0.50          0.095                 0.308                 0.100
 0.55          0.085                 0.291                 0.095
 0.60          0.077                 0.277                 0.090
 0.65          0.071                 0.266                 0.085
 0.70          0.066                 0.257                 0.080
 0.75          0.064                 0.252                 0.075
 0.80          0.063                 0.251                 0.070
 0.85          0.065                 0.254                 0.065
 0.90          0.068                 0.260                 0.060
 0.95          0.073                 0.270                 0.055
 1.00          0.080                 0.283                 0.050

Graphing the portfolio expected return against the portfolio standard deviation gives:

[Graph: portfolio expected return (vertical axis) versus portfolio standard deviation (horizontal axis). The curve runs from 100% Investment B (w = 0) at the upper right down to 100% Investment A (w = 1), bending left through a minimum-risk point near w = 0.80.]

(b) Upon observation of the table in part (a), we will obtain the least risky portfolio when allocating roughly 80% of our money to Investment A. (For an exact solution see part (c).)

(c) When allocating weight w to Investment A and (1−w) to Investment B, the overall portfolio variance is:

w²σA² + (1−w)²σB²

To find the w which minimizes variance, we differentiate with respect to w and set the result equal to zero. This gives:

2wσA² − 2(1−w)σB² = 0

⇒ 2wσA² − 2σB² + 2wσB² = 0

⇒ wσA² − σB² + wσB² = 0

⇒ w = σB² / (σA² + σB²)

(Notice that the second derivative with respect to w is 2σA² + 2σB² > 0, which implies that we are indeed minimizing variance as opposed to maximizing variance.)

Applying this formula to our specific situation, the weight, w, on Investment A which minimizes variance is therefore:

w = σB² / (σA² + σB²) = .30 / (.08 + .30) ≈ 79%
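
A short sketch confirming the closed-form weight against a brute-force grid search:

    import numpy as np

    var_a, var_b = 0.08, 0.30

    w_formula = var_b / (var_a + var_b)            # closed-form minimizer, about .7895

    w = np.linspace(0, 1, 100_001)
    port_var = w**2 * var_a + (1 - w)**2 * var_b   # portfolio variance at each weight
    w_grid = w[np.argmin(port_var)]

    print(w_formula, w_grid)                       # both approximately 0.789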
Handout #35
Econ 102A Statistical Methods for Social Scientists Page 1 of 1

Optimal Investing

Consider five investment opportunities. You have a pool of money that you intend to invest in
various proportions among these five alternatives. We assume that each investment is
independent of any other investment. The historical (annual) expected value and standard deviation for each investment are as follows:

Investment Expected Value Standard Deviation

Investment 1 10.70% 11.80%

Investment 2 7.00% 9.17%

Investment 3 16.20% 17.49%

Investment 4 12.90% 14.16%

Investment 5 3.81% 6.28%

Consider the following:

(a) Determine the proportion of your money that should be allocated to each investment
in order to generate the highest overall expected return.

(b) Determine the proportion of your money that should be allocated to each investment
in order to generate the lowest overall standard deviation.

(c) Determine the highest expected return possible if we are willing to accept an overall
standard deviation of 7.50% or less.

(d) Determine the least risky way to have an expected return of at least 10.50%.
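
No solutions appear on this handout, but parts (b) through (d) are small constrained optimizations. As one possible setup (a sketch only; the variable names are mine, and independence means the portfolio variance is simply the sum of the wᵢ²σᵢ² terms), part (b) could be attacked numerically with scipy:

    import numpy as np
    from scipy.optimize import minimize

    mu = np.array([.1070, .0700, .1620, .1290, .0381])   # expected returns (for (c), (d))
    sd = np.array([.1180, .0917, .1749, .1416, .0628])   # standard deviations

    def port_var(w):
        return np.sum(w**2 * sd**2)   # independence: no covariance terms

    cons = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]   # weights sum to 1
    res = minimize(port_var, x0=np.full(5, .2), bounds=[(0, 1)] * 5, constraints=cons)
    print(res.x, np.sqrt(res.fun))    # minimum-risk weights and the resulting std dev

Parts (c) and (d) add one inequality constraint each (on np.sqrt(port_var(w)) or on mu @ w) and flip which quantity is being optimized.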
Handout #36
Econ 102A Statistical Methods for Social Scientists Page 1 of 4

Week 4 Practice Exercises

1. The probability distribution for the damage claims paid by the Newton Automobile
Insurance Company on collision insurance is as follows:

Payment Amount (X)    Probability
$0                    .90
$400                  .04
$1000                 .03
$2000                 .01
$4000                 .01
$6000                 .01

(a) Determine the collision premium that the company should charge its customers in
order to break even.

(b) Suppose the insurance company charges an annual rate of $260 for the collision
coverage. Determine the expected value of the collision policy for the policyholder.
Why would a policyholder purchase a collision policy with this expected value?

2. The J. R. Ryland Computer Company is considering a plant expansion that will enable the
company to begin production of a new computer product. The company’s president must
determine whether to make the company’s expansion a medium-scale or large-scale project.
An uncertainty is the demand for the new product, which for planning purposes may be low
demand, medium demand or high demand. The probability estimates for demand are
shown in the table below. Letting X and Y indicate the annual profit in $1000s, the firm’s
planners have developed the following profit forecasts for the medium-scale and large-scale
expansions:

                    Medium-Scale              Large-Scale
                    Expansion Profits         Expansion Profits
                    X          P(X)           Y          P(Y)
Demand   Low        50         .20            0          .20
         Medium     150        .50            100        .40
         High       200        .30            300        .40

(a) Compute the expected value of the profit associated with the two expansion
alternatives. Which decision is preferred for the objective of maximizing the
expected profit?

(b) Compute the variance of the profit associated with the two expansion alternatives.
Which decision is preferred for the objective of minimizing the variance (or risk)?

3. The following gambling game, known as chuck-a-luck, is quite popular at many carnivals
and gambling casinos: A player bets on one of the numbers 1 through 6. Three dice are
then rolled, and if the number bet by the player appears i times, i = 1, 2, 3, then the player
not only gets their initial bet back but also wins i more dollars for every dollar bet. For
example, if the number appears 2 times, the player receives their initial bet back, plus two
more dollars. On the other hand, if the number bet by the player does not appear on any of
the dice, then the player loses the money he or she bet. Assuming that the player bets
$1.00, let X represent the player’s net gain after one play of the game.

(a) Determine the probability distribution of X.

(b) Determine the expected value, variance and standard deviation of X.

4. (Based on Exercise 5.41 from text) In lecture, we saw that probabilities for continuous
random variables are deciphered by computing proportionate areas. This exercise is meant
to provide practice on this concept.

Choose a point at random in the square with sides 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1. Notice that the
total area of this square is one (i.e., 100%). Therefore, the probability that the point falls in
any region within the square is equal to the proportionate area of that region. Let X be the
x-coordinate and Y be the y-coordinate of the point chosen.

(a) Determine the intersection probability P(X ≥ .70 and Y ≤ .40).

(b) Determine the conditional probability P(Y ≤ .50 | Y ≥ X).

(Hint: Think visually. Draw a diagram and shade in the associated areas. Determine
probabilities as proportionate areas in your diagram.)

5. When a certain car breaks down, the time that it takes to fix it (in hours) is a continuous
random variable with the probability density function:

f(x) = Ce^(−3x)   if 0 ≤ x < ∞
f(x) = 0          otherwise

(a) Determine the value of C in order that f(x) be a valid density function.

(b) Determine the probability that, when this car breaks down, it takes at most 30 minutes
to fix it.

(c) Suppose the car has broken down and it is currently in the auto body shop being
attended to by a mechanic. You have already spent 30 minutes in the waiting room
while the mechanic has been fixing the car. Determine the probability that it will take
more than 1 hour and 15 minutes total to fix the car.

6. Consider a random variable, X, such that the distribution of X is: P(X = 3) = .25,
P(X = 6) = .50 and P(X = 9) = .25. Further, consider a random variable, Y, such that the
distribution of Y is: P(Y = 2) = .20, P(Y = 4) = .30, P(Y = 7) = .40 and P(Y = 8) = .10.

(a) Compute E(X + Y).

(b) Compute both the variance and standard deviation of X + Y, assuming that X and Y
are independent random variables.

7. The time elapsed, in minutes, between the placement of an order of pizza and its delivery is
a random variable with the probability density function:

f(x) = 1/15   if 25 ≤ x ≤ 40
f(x) = 0      otherwise

(a) Determine the expected value and standard deviation of the time it takes for the
pizza shop to deliver a pizza.

(b) Suppose that it takes 12 minutes for the pizza shop to bake a pizza. By using the
properties of expected value and variance, determine the expected value and standard
deviation of the time it takes for the delivery person to deliver the pizza after baking.

8. The life span of a particular mechanical part (in months) is a random variable described by
the probability density function:

[Figure: probability density function f(x), constant at .40 on 0 ≤ x ≤ 2, then decreasing linearly from (2, .40) to (3, 0).]

Determine the expected value, variance and standard deviation of the life span of one such
mechanical part.

9. Suppose that 15% of all first-year Stanford graduate students choose not to own a car during their stay in Palo Alto. Further, suppose that 50% of first-year students own one car while 35% own two cars (for their spouses' needs). Among the students who own cars, we have the following probabilistic structure:

• Among students who own one car, 54% bought a foreign model of car.

• Among students who own two cars, 40% initially bought a foreign model of car. Of
those who initially bought a foreign model of car, 70% purchased another foreign model
as their second car. Of those who initially bought a domestic model of car, 58%
purchased another domestic model as their second car.

Consider choosing a random first-year Stanford graduate student. Define the random
variables:

X = the number of foreign cars a random student owns

Y = the number of domestic cars a random student owns

(a) Construct a table giving the joint and marginal distributions of X and Y.

(b) Suppose a random first-year Stanford graduate student is chosen and you know for a
fact that this student has exactly 1 foreign car. Given this, determine the probability
distribution of Y for this student.

(c) Determine whether or not X and Y are independent random variables.


Handout #37
Econ 102A Statistical Methods for Social Scientists Page 1 of 8

Week 4 Practice Exercises – Solutions

1. Newton Automobile Insurance Exercise

(a) Expected collision payment = .90 (0) + .04 (400) + .03 (1000) + .01 (2000) + .01 (4000) + .01 (6000) = 166.

⇒ a premium payment of $166.00 allows the company to break even.

(b) A policyholder expects to make $166.00 – $260.00 = – $94.00. That is, the
policyholder expects to lose $94.00 annually. Policyholders purchase insurance (even
though they expect to lose money) in order to have protection in the event that they are
involved in a collision which incurs a large expense. That is, we purchase insurance
policies to hedge against the risk that bad outcomes happen; we are willing to take an
expected moderate loss in order to avoid being exposed to the possibility of a huge loss
(albeit with low probability) due to risk aversion.

2. J. R. Ryland Computer Company Exercise

(a) E(X) = μX = .20 (50) + .50 (150) + .30 (200) = 145

E(Y) = μY = .20 (0) + .40 (100) + .40 (300) = 160

⇒ if J. R. Ryland wishes to maximize expected profit, the Large-Scale Expansion is preferred.

(b) Var(X) = σX² = .20 (2500) + .50 (22500) + .30 (40000) − (145)² = 2725

Var(Y) = σY² = .20 (0) + .40 (10000) + .40 (90000) − (160)² = 14400

⇒ if J. R. Ryland wishes to minimize variance (or, equivalently, minimize standard deviation), the Medium-Scale Expansion is preferred.

3. Chuck-a-Luck Exercise

(a) If the chosen number appears 0 times, the net gain is - $1.00
If the chosen number appears 1 time, the net gain is $1.00
If the chosen number appears 2 times, the net gain is $2.00
If the chosen number appears 3 times, the net gain is $3.00

Whether or not the chosen number appears on any one of the dice is a Bernoulli random variable with P(yes) = 1/6.

⇒ the total number of times the chosen number appears is binomially distributed

 3
P(X = -1) = P(number appears 0 times) =   (1/6)0 (5/6)3 = .5787
0

 3
P(X = 1) = P(number appears 1 time) =   (1/6)1 (5/6)2 = .3472
1

 3
P(X = 2) = P(number appears 2 times) =   (1/6)2 (5/6)1 = .0694
 2

 3
P(X = 3) = P(number appears 3 times) =   (1/6)3 (5/6)0 = .0046
 3

In graphical form, the probability distribution of X is:

[Bar graph of the distribution of X: bars of height .5787 at X = −1, .3472 at X = 1, .0694 at X = 2 and .0046 at X = 3.]

(b) Based on the above distribution, we have:

E(X) = μX = .5787 (−1) + .3472 (1) + .0694 (2) + .0046 (3) = −.0789

Var(X) = σX² = .5787 (1) + .3472 (1) + .0694 (4) + .0046 (9) − (−.0789)² = 1.2387

⇒ σX = √1.2387 = 1.1130
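
The same distribution can be pulled straight from a binomial pmf. A sketch (the tiny discrepancies from the −.0789 and 1.2387 above come from the handout rounding the pmf to four decimal places):

    from scipy.stats import binom

    # Number of matches among three dice is Binomial(n = 3, p = 1/6).
    pmf = [binom.pmf(i, 3, 1/6) for i in range(4)]
    net = [-1, 1, 2, 3]               # net gain when the number appears i times

    ev = sum(p * x for p, x in zip(pmf, net))
    var = sum(p * x**2 for p, x in zip(pmf, net)) - ev**2
    print(ev, var, var**0.5)          # about -.0787, 1.2392, 1.1132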

4. Probability as Area Exercise

(a) [Diagram: the unit square with the rectangle .70 ≤ X ≤ 1, 0 ≤ Y ≤ .40 shaded.]

The shaded area is rectangular, so the proportionate area can be computed using
geometry as (length) * (width). This gives P(X ≥ .70 and Y ≤ .40) = (.30) (.40) = .12.

(b) [Diagram: the unit square with the triangle above the line Y = X lightly shaded and its sub-triangle satisfying Y ≤ .50 shaded dark.]

Since it is a given fact that Y ≥ X, we know we are somewhere within the light-shaded
triangle in the diagram (which overlaps the dark-shaded triangle). Therefore,
P(Y ≤ .50 | Y ≥ X) = (area of dark-shaded triangle) / (area of light-shaded triangle).
Recalling from geometry that the area of a triangle is (1/2) * (base) * (height) we can
compute:

P(Y ≤ .50 | Y ≥ X) = [(1/2)(1/2)(1/2)] / [(1/2)(1)(1)] = .125 / .50 = .25

5. Car Maintenance Exercise

(a) In order to be a valid probability density function, the area under the curve must equal one. Therefore, we must have:

∫₀^∞ Ce^(−3x) dx = 1

⇒ [−(C/3)e^(−3x)] evaluated from x = 0 to x → ∞ = 1

⇒ 0 − (−C/3) = 1  ⇒  C = 3
3

(b) Recalling that x is measured in hours (and 30 minutes = ½ hour), the probability of fixing the car in half an hour or less is:

P(0 ≤ x ≤ 1/2) = ∫₀^(1/2) 3e^(−3x) dx

= [−e^(−3x)] evaluated from x = 0 to x = 1/2

= −e^(−3/2) − (−1) = 1 − e^(−3/2) = .7769

(c) We want P(x > 1.25 | x > 0.50). This shows up as the conditional probability on a probability tree branch whose first stage is the event ‘time car fixed is > 30 minutes’ and whose second stage is the event ‘time car fixed is > 1 hour and 15 minutes.’

The probability on the first stage of this tree branch is simply 1 − .7769 = .2231 from part (b).

Then, P(x > 0.50 and x > 1.25) = P(x > 1.25) and can be computed as:

P(x > 1.25) = 1 − P(x ≤ 1.25).

Now, P(x ≤ 1.25) can be computed from the probability density function as:

∫₀^(1.25) 3e^(−3x) dx = [−e^(−3x)] evaluated from x = 0 to x = 1.25 = −e^(−3.75) − (−1) = 1 − e^(−3.75) = .9765.

Therefore, P(x > 0.50 and x > 1.25) = P(x > 1.25) = 1 − P(x ≤ 1.25) = 1 − .9765 = .0235.

From the above calculations, we can now fill in the tree branch: the first-stage probability is .2231 and the intersection probability is .0235.

Then, P(x > 1.25 | x > 0.50) can be found through division (since it is the conditional probability on the above tree branch) as .0235 / .2231 = .1053.
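
Because the density is exponential, the survival probabilities have the closed form P(x > t) = e^(−3t), which gives a one-line check of the answer (the .1053 above reflects the rounding of .0235 and .2231):

    from math import exp

    p_first = exp(-3 * 0.50)      # P(x > 1/2)  = e^(-1.5)  = .2231
    p_both = exp(-3 * 1.25)       # P(x > 1.25) = e^(-3.75) = .0235
    print(p_both / p_first)       # e^(-2.25), approximately .1054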

6. Random Variable Summation Exercise

(a) In general, E(X + Y) = E(X) + E(Y)

E(X) = μX = .25 (3) + .50 (6) + .25 (9) = 6.00
E(Y) = μY = .20 (2) + .30 (4) + .40 (7) + .10 (8) = 5.20
⇒ E(X + Y) = E(X) + E(Y) = 6.00 + 5.20 = 11.20

(b) By independence, Var(X + Y) = Var(X) + Var(Y)

Var(X) = σX² = .25 (9) + .50 (36) + .25 (81) − (6.00)² = 4.50
Var(Y) = σY² = .20 (4) + .30 (16) + .40 (49) + .10 (64) − (5.20)² = 4.56
⇒ Var(X + Y) = Var(X) + Var(Y) = 4.50 + 4.56 = 9.06
⇒ σX+Y = √9.06 = 3.01

7. Pizza Delivery Exercise

(a) Expected Value = E(x) = μx = ∫ x f(x) dx = ∫ (x/15) dx over 25 ≤ x ≤ 40

= [x²/30] evaluated from x = 25 to x = 40 = 1600/30 − 625/30 = 975/30 = 32.50 minutes.

To find the variance, we first compute E(x²) = ∫ x² f(x) dx = ∫ (x²/15) dx over 25 ≤ x ≤ 40

= [x³/45] evaluated from x = 25 to x = 40 = 64000/45 − 15625/45 = 48375/45 = 1075.

Then, variance = E(x²) − [E(x)]² = 1075 − (32.50)² = 18.75.

⇒ standard deviation = √18.75 = 4.33 minutes.

(b) The initial range on total time taken from order to delivery-at-your-door is 25 minutes to 40 minutes. We are now told that baking the pizza consumes a guaranteed 12 minutes of that total time. That is, we have:

Total time from order to delivery = (Baking time) + (Time to delivery after baking)

So, we have:

E(Total time) = E[(Baking time) + (Time to delivery after baking)]

and, by the properties of expected value, this becomes:

E(Total time) = E(Baking time) + E(Time to delivery after baking)

But, E(Baking time) = 12 minutes and, from part (a), we have E(Total time from order
to delivery) = 32.50. So:

32.50 = 12 + E(Time to delivery after baking)

⇒ E(Time to delivery after baking) = 32.50 − 12 = 20.50 minutes.

In terms of variance,

Var(Total Time) = Var(Baking time + Time to delivery after baking)

Var(Total Time) = Var(12 + Time to delivery after baking)

But, recalling the property of variance which states that Var(X + c) = Var(X) (which is
the setup we have here) it follows that Var(Total Time) = Var(Time to delivery after
baking) = 18.75.

So, the standard deviation of the time to delivery after baking is 4.33, exactly as found
in part (a). Notice this makes intuitive sense since there is absolutely no variance in the
baking time; it is always 12 minutes exactly. So, the variance of the total time is all
captured by the variance of the time-to-delivery after baking.

8. Mechanical Part Exercise

We are given a visual of the probability density function, but not its functional form. We
can easily see that f(x) = .40 for the range from 0 ≤ x ≤ 2. We also note that the probability
density function is linear from x = 2 to x = 3. Since two points on this line are (2, .40) and
(3, 0) we can decipher the equation for the line. Ultimately, this equation is given by
f(x) = (-2/5)x + (6/5) for 2 ≤ x ≤ 3. Therefore, the probability density function is given by:

f(x) = .40                for 0 ≤ x ≤ 2
f(x) = −(2/5)x + (6/5)    for 2 ≤ x ≤ 3
f(x) = 0                  otherwise

The expected value is therefore given by:

Expected Value = E(x) = μx = ∫ x f(x) dx = ∫₀² .40x dx + ∫₂³ (−(2/5)x² + (6/5)x) dx

= [.20x²] evaluated from x = 0 to x = 2  +  [−(2/15)x³ + (6/10)x²] evaluated from x = 2 to x = 3

= .80 − 3.60 + 5.40 + 1.067 − 2.40 = 1.267.

To find the variance, we first compute E(x²) = ∫ x² f(x) dx

= ∫₀² .40x² dx + ∫₂³ (−(2/5)x³ + (6/5)x²) dx

= [(.40/3)x³] evaluated from x = 0 to x = 2  +  [−(2/20)x⁴ + (6/15)x³] evaluated from x = 2 to x = 3

= 1.067 − 8.10 + 10.80 + 1.60 − 3.20 = 2.167.

Then, Var(x) = E(x²) − μx² = 2.167 − (1.267)² = .5617.

⇒ σx = √0.5617 = .7495.
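
A quick numerical check of these piecewise integrals (density as written above; the .5617 reflects rounding μx to 1.267 before squaring):

    from scipy.integrate import quad

    f = lambda x: .40 if x <= 2 else (-2 / 5) * x + 6 / 5   # density on 0 <= x <= 3

    mu = quad(lambda x: x * f(x), 0, 3, points=[2])[0]
    ex2 = quad(lambda x: x**2 * f(x), 0, 3, points=[2])[0]
    print(mu, ex2 - mu**2)   # approximately 1.2667 and .5622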

9. Stanford First-Year Graduate Student Survey

The probability tree is given by (rendered as a table; branch probabilities appear in parentheses, and the Probability column is their product):

Number of Cars Owned?   First Car?        Second Car?       Probability   X   Y

0 (.15)                 -                 -                 .1500         0   0
1 (.50)                 Foreign (.54)     -                 .2700         1   0
                        Domestic (.46)    -                 .2300         0   1
2 (.35)                 Foreign (.40)     Foreign (.70)     .0980         2   0
                                          Domestic (.30)    .0420         1   1
                        Domestic (.60)    Foreign (.42)     .0882         1   1
                                          Domestic (.58)    .1218         0   2

(a)
              X = 0     X = 1     X = 2     marginal
Y = 0         .1500     .2700     .0980     .5180
Y = 1         .2300     .1302     0         .3602
Y = 2         .1218     0         0         .1218
marginal      .5018     .4002     .0980     1.000

(b) We need to find the conditional distribution P(Y = y0 | X = 1). We can find this distribution either through a probability tree model or through the equation for conditional probability. Using equations, we have P(Y = y0 | X = 1) = P(Y = y0, X = 1) / P(X = 1).

Therefore, the conditional distribution is given by:

P(Y = 0 | X = 1) = P(Y = 0, X = 1) / P(X = 1) = .27 / .4002 = .6747
P(Y = 1 | X = 1) = P(Y = 1, X = 1) / P(X = 1) = .1302 / .4002 = .3253
P(Y = 2 | X = 1) = P(Y = 2, X = 1) / P(X = 1) = 0 / .4002 = 0

(c) To check independence, we consider whether

P(X = x0, Y = y0) = P(X = x0) P(Y = y0)

for all values of x0 and y0.

Considering the case X = 0 and Y = 0, from the table we have:

P(X = 0) = .5018 and P(Y = 0) = .5180.

Checking the independence formula for this case gives:

P(X = 0, Y = 0) = .1500 ≠ .2599324 = (.5018) (.5180) = P(X = 0) P(Y = 0).

So, X and Y are dependent.

Note that this conclusion is also evident because the marginal distribution of Y (from part (a)) is different from the conditional distribution of Y given that X = 1 (from part (b)). That is, knowing X = 1 influences our thoughts about the distribution of Y, which is the very definition of dependence.
