Practice Problems For Midterm 1
a. is the number of times that the outcome occurs in the long run.
b. equals M × N, where M is the number of occurrences and N is the population size.
c. is the proportion of times that the outcome occurs in the long run.
d. equals the sample mean divided by the sample standard deviation.
Answer: c
2)
3)
4)
Answer: a
The expected value of a discrete random variable
a.
b.
c.
d.
Answer: d
9)
Answer: c
10)
The conditional distribution of Y given X = x, $\Pr(Y = y \mid X = x)$, is
a. $\dfrac{\Pr(Y = y)}{\Pr(X = x)}$.
b. $\sum_{i=1}^{l} \Pr(X = x_i, Y = y)$.
c. $\dfrac{\Pr(X = x, Y = y)}{\Pr(Y = y)}$.
d. $\dfrac{\Pr(X = x, Y = y)}{\Pr(X = x)}$.
Answer: d
11)
The conditional expectation of Y given X, $E(Y \mid X = x)$, is calculated as follows:
a. $\sum_{i=1}^{k} y_i \Pr(X = x_i \mid Y = y)$.
b. $E[E(Y \mid X)]$.
c. $\sum_{i=1}^{k} y_i \Pr(Y = y_i \mid X = x)$.
d. $\sum_{i=1}^{l} E(Y \mid X = x_i) \Pr(X = x_i)$.
Answer: c
9)
Two random variables X and Y are independently distributed if all of the following
conditions hold, with the exception of
a. $\Pr(Y = y \mid X = x) = \Pr(Y = y)$.
b. knowing the value of one of the variables provides no information about the other.
c. if the conditional distribution of Y given X equals the marginal distribution of Y.
d. $E(Y) = E[E(Y \mid X)]$.
Answer: d
9)
10)
Two variables are uncorrelated in all of the cases below, with the exception of
a. being independent.
b. having a zero covariance.
c. $|\sigma_{XY}| \le \sqrt{\sigma_X^2 \sigma_Y^2}$.
d. $E(Y \mid X) = 0$.
Answer: c
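11)
$\operatorname{var}(aX + bY) =$
a. $a^2 \sigma_X^2 + b^2 \sigma_Y^2$.
b. $a^2 \sigma_X^2 + 2ab\,\sigma_{XY} + b^2 \sigma_Y^2$.
c. $\sigma_{XY} + \mu_X \mu_Y$.
d. $a \sigma_X^2 + b \sigma_Y^2$.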
Answer: b
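The variance formula in answer (b) can be checked numerically. The sketch below compares var(aX + bY) computed from its definition with the formula; the joint distribution and the constants a and b are hypothetical values chosen only for illustration.

```python
# Hypothetical joint distribution of (X, Y); the probabilities are an
# illustrative assumption, not data from the problem set.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}


def expect(g):
    """Expected value of g(x, y) under the joint distribution."""
    return sum(p * g(x, y) for (x, y), p in joint.items())


def var_direct(a, b):
    """var(aX + bY) computed directly from the definition of a variance."""
    mean = expect(lambda x, y: a * x + b * y)
    return expect(lambda x, y: (a * x + b * y - mean) ** 2)


def var_formula(a, b):
    """var(aX + bY) computed as a^2 var(X) + 2ab cov(X, Y) + b^2 var(Y)."""
    mu_x = expect(lambda x, y: x)
    mu_y = expect(lambda x, y: y)
    var_x = expect(lambda x, y: (x - mu_x) ** 2)
    var_y = expect(lambda x, y: (y - mu_y) ** 2)
    cov_xy = expect(lambda x, y: (x - mu_x) * (y - mu_y))
    return a**2 * var_x + 2 * a * b * cov_xy + b**2 * var_y


a, b = 2.0, -3.0  # arbitrary constants for the check
print(var_direct(a, b), var_formula(a, b))
```

The two printed numbers should agree up to floating-point error for any choice of a and b.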
12)
Answer: a
13)
a. $\Phi(d_2) - \Phi(d_1)$.
b. $\Phi(1.96) - \Phi(-1.96)$.
c. $\Phi(d_2) - (1 - \Phi(d_1))$.
d. $1 - (\Phi(d_2) - \Phi(d_1))$.
Answer: a
The Student t distribution is
a. the distribution of the sum of m squared independent standard normal random
variables.
b. the distribution of a random variable with a chi-squared distribution with m
degrees of freedom, divided by m.
c. always well approximated by the standard normal distribution.
d. the distribution of the ratio of a standard normal random variable, divided by the
square root of an independently distributed chi-squared random variable with m
degrees of freedom divided by m.
Answer: d
17)
Answer: b
18)
Answer: b
19)
To infer the political tendencies of the students at your college/university, you sample 150
of them. Only one of the following is a simple random sample: You
a. make sure that the proportion of minorities are the same in your sample as in the
entire student body.
b. call every fiftieth person in the student directory at 9 a.m. If the person does not
answer the phone, you pick the next name listed, and so on.
c. go to the main dining hall on campus and interview students randomly there.
d. have your statistical package generate 150 random numbers in the range from 1 to
the total number of students in your academic institution, and then choose the
corresponding names in the student telephone directory.
Answer: d
20)
The variance of $\bar{Y}$, $\sigma_{\bar{Y}}^2$, is given by the following formula:
a. $\sigma_Y^2$.
b. $\dfrac{\sigma_Y}{\sqrt{n}}$.
c. $\dfrac{\sigma_Y^2}{n}$.
d. $\dfrac{\sigma_Y^2}{\sqrt{n}}$.
Answer: c
21)
Answer: b
22)
Answer: d
23) The central limit theorem states that
a. the sampling distribution of $\dfrac{\bar{Y} - \mu_Y}{\sigma_{\bar{Y}}}$ becomes arbitrarily well approximated by the standard
normal distribution.
b. $\bar{Y} \xrightarrow{p} \mu_Y$.
c. the probability that $\bar{Y}$ is in the range $\mu_Y \pm c$ becomes arbitrarily close to one as n
increases for any constant $c > 0$.
d. the t distribution converges to the F distribution for approximately n > 30.
Answer: a
24)
X2
.
Y2
Answer: b
Chapter 3
1)
An estimator is
a. an estimate.
b. a formula that gives an efficient guess of the true population value.
c. a random variable.
d. a nonrandom number.
Answer: c
2)
An estimate is
a.
b.
c.
d.
Answer: b
3)
An estimator $\hat{\mu}_Y$ of the population value $\mu_Y$ is consistent if
a. $\hat{\mu}_Y \xrightarrow{p} \mu_Y$.
b. its mean square error is the smallest possible.
c. Y is normally distributed.
d. $\bar{Y} \xrightarrow{p} 0$.
Answer: a
4)
Answer: d
5) The standard error of $\bar{Y}$, $SE(\bar{Y}) = \hat{\sigma}_{\bar{Y}}$, is given by the following formula:
a. $\dfrac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$.
b. $\dfrac{s_Y^2}{n}$.
c. $s_Y$.
d. $\dfrac{s_Y}{\sqrt{n}}$.
Answer: d
7)
When you are testing a hypothesis against a two-sided alternative, then the alternative is
written as
a. $E(Y) > \mu_{Y,0}$.
b. $E(Y) = \mu_{Y,0}$.
c. $\bar{Y} \ne \mu_{Y,0}$.
d. $E(Y) \ne \mu_{Y,0}$.
Answer: d
8)
A scatterplot
a. shows how Y and X are related when their relationship is scattered all over the
place.
b. relates the covariance of X and Y to the correlation coefficient.
c. is a plot of n observations on X i and Yi , where each observation is represented by
the point ( X i , Yi ).
d. shows n observations of Y over time.
Answer: c
9)
The following types of statistical inference are used throughout econometrics, with the
exception of
a. confidence intervals.
b. hypothesis testing.
c. calibration.
d. estimation.
Answer: c
10)
Answer: b
11)
To derive the least squares estimator $\hat{\mu}_Y$, you find the estimator m which minimizes
a. $\sum_{i=1}^{n} (Y_i - m)^2$.
b. $\left| \sum_{i=1}^{n} (Y_i - m) \right|$.
c. $\sum_{i=1}^{n} m Y_i^2$.
d. $\sum_{i=1}^{n} (Y_i - m)$.
Answer: a
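The least squares idea behind this question can be illustrated with a small numerical check: over a grid of candidate values m, the sum of squared deviations is smallest at the sample mean. The data and the grid are hypothetical and serve only as a sketch.

```python
def sum_of_squares(ys, m):
    """Sum of squared deviations of the observations from a candidate value m."""
    return sum((y - m) ** 2 for y in ys)


# Hypothetical observations, assumed only for illustration.
ys = [2.0, 3.5, 5.0, 7.5]
y_bar = sum(ys) / len(ys)

# Candidate estimators on a grid of width 0.01 centered on the sample mean.
candidates = [y_bar + d / 100 for d in range(-200, 201)]
best = min(candidates, key=lambda m: sum_of_squares(ys, m))

print(y_bar, best)
```

A grid search is used here instead of calculus only to keep the example short; setting the derivative of the sum of squares with respect to m to zero gives the same result analytically.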
12)
14)
Answer: d
15)
n 1 i 1
1 n
2
(Yi Y ) 2 .
c. sY
n 1 i 1
2
a. sY
1 n 1
(Yi Y ) 2 .
d. s
n 1 i 1
2
Y
Answer: b
16)
Degrees of freedom
a. in the context of the sample variance formula means that estimating the mean uses
up some of the information in the data.
b. is something that certain undergraduate majors at your university/college other
than economics seem to have an amount of.
c. are (n-2) when replacing the population mean by the sample mean.
d. ensure that $s_Y^2 = \sigma_Y^2$.
Answer: a
17)
a.
.e
.f
.g
Y Y ,0
Y2 .
n
Y Y ,0
t
.
SE (Y )
(Y Y ,0 ) 2
t
.
SE (Y )
1.96.
t
Answer: b
18)
19)
The sample covariance can be calculated in any of the following ways, with the exception
of:
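a. $\dfrac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$.
b. $\dfrac{1}{n-1} \sum_{i=1}^{n} X_i Y_i - \dfrac{n}{n-1} \bar{X}\bar{Y}$.
c. $\dfrac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$.
d. $r_{XY} s_X s_Y$, where $r_{XY}$ is the correlation coefficient.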
Answer: c
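The equivalence of the valid expressions, and the difference from the version that divides by n, can be checked on a small sample. The data below are hypothetical, and the variable names mirror the answer options only for readability.

```python
import math

# Hypothetical sample; the numbers are illustrative assumptions only.
xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.0, 3.0, 3.5, 8.0]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
ss_x = sum((x - x_bar) ** 2 for x in xs)
ss_y = sum((y - y_bar) ** 2 for y in ys)

# Deviations-from-means form with divisor n - 1.
cov_a = cross / (n - 1)
# Raw cross products minus the correction term.
cov_b = sum(x * y for x, y in zip(xs, ys)) / (n - 1) - n / (n - 1) * x_bar * y_bar
# Divisor n instead of n - 1: not the sample covariance.
cov_c = cross / n
# Correlation coefficient times the two sample standard deviations.
r_xy = cross / math.sqrt(ss_x * ss_y)
s_x = math.sqrt(ss_x / (n - 1))
s_y = math.sqrt(ss_y / (n - 1))
cov_d = r_xy * s_x * s_y

print(cov_a, cov_b, cov_c, cov_d)
```

The first, second, and fourth values should coincide up to rounding, while the third is smaller by the factor (n - 1)/n.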
20)
When the sample size n is large, the 90% confidence interval for $\mu_Y$ is
a. $\bar{Y} \pm 1.96\, SE(\bar{Y})$.
b. $\bar{Y} \pm 1.64\, SE(\bar{Y})$.
c. $\bar{Y} \pm 1.64\, \sigma_Y$.
d. $\bar{Y} \pm 1.96$.
Answer: b
21)
The standard error for the difference in means of two random variables M and W, when
the two population variances are different, is
sM2 sW2
a.
.
nM nW
sM sW
b.
.
nM nW
c.
1 sM2 sW2
(
).
2 nM nW
d.
sM2 sW2
.
nM nW
Answer: d
22) The following statement about the sample correlation coefficient is true.
a. $-1 \le r_{XY} \le 1$.
b. $r_{XY}^2 \xrightarrow{p} \operatorname{corr}(X_i, Y_i)$.
c. $|r_{XY}| < 1$.
d. $r_{XY} = \dfrac{s_{XY}^2}{s_X^2 s_Y^2}$.
Answer: a
23)
Answer: b
Chapter 4
1)
When the estimated slope coefficient in the simple regression model, $\hat{\beta}_1$, is zero, then
a. $R^2 = \bar{Y}$.
b. $0 < R^2 < 1$.
c. $R^2 = 0$.
d. $R^2 > (SSR/TSS)$.
Answer: c
2)
Answer: b
3)
With heteroskedastic errors, the weighted least squares estimator is BLUE. You should
Answer: b
4)
Answer: a
5)
Binary variables
a.
b.
c.
d.
Answer: d
6)
When estimating a demand function for a good where quantity demanded is a linear
function of the price, you should
a. not include an intercept because the price of the good is never zero.
b. use a one-sided alternative hypothesis to check the influence of price on quantity.
c. use a two-sided alternative hypothesis to check the influence of price on quantity.
d. reject the idea that price determines demand unless the coefficient is at least 1.96.
Answer: b
7)
Answer: d
8)
9)
10)
Answer: b
11)
Answer: c
12)
The slope estimator, $\hat{\beta}_1$, has a smaller standard error, other things equal, if
a.
b.
c.
d.
Answer: a
13)
Answer: b
14)
Answer: d
15)
16)
If the absolute value of your calculated t-statistic exceeds the critical value from the
standard normal distribution, you can
a. reject the null hypothesis.
b. safely assume that your regression results are significant.
17)
Under the least squares assumptions (zero conditional mean for the error term, Xi and Yi
being i.i.d., and Xi and ui having finite fourth moments), the OLS estimator for the slope
and intercept
a.
b.
c.
d.
Answer: d
18)
To obtain the slope estimator using the least squares principle, you divide the
a.
b.
c.
d.
Answer: c
19)
Answer: a
20)
a. dividing the error by the explanatory variable results in a zero (on average).
b. the sample regression function residuals are unrelated to the explanatory variable.
c. the sample mean of the Xs is much larger than the sample mean of the errors.
d. the conditional distribution of the error given the explanatory variable has a zero
mean.
Answer: d
21)
Answer: a
22)
Multiplying the dependent variable by 100 and the explanatory variable by 100,000
leaves the
a.
b.
c.
d.
Answer: c
Analytical Questions
Chapter 2
1)
Think of the situation of rolling two dice and let M denote the sum of the number of dots
on the two dice. (So M is a number between 2 and 12.)
(a)
In a table, list all of the possible outcomes for the random variable M together with its
probability distribution and cumulative probability distribution. Sketch both distributions.
Answer:
Outcome (sum of dots)                 2      3      4      5      6      7      8      9      10     11     12
Probability distribution              0.028  0.056  0.083  0.111  0.139  0.167  0.139  0.111  0.083  0.056  0.028
Cumulative probability distribution   0.028  0.083  0.167  0.278  0.417  0.583  0.722  0.833  0.917  0.972  1.000

[Figure: Probability and Cumulative Probability Distribution of Number of Dots; horizontal axis: Number of Dots; series: Probability, Cumulative Probability]
(b)
(c)
Looking at the sketch of the probability distribution, you notice that it resembles a normal
distribution. Should you be able to use the standard normal distribution to calculate
probabilities of events? Why or why not?
Answer: You cannot use the normal distribution (without continuity correction) to
calculate probabilities of events, since M is discrete, whereas under a continuous
distribution such as the normal the probability of any single outcome equals zero.
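The discrete distribution of M from part (a) can be reproduced by enumerating the 36 equally likely outcomes of two fair dice. The following is a minimal sketch that assumes fair dice and uses exact fractions to avoid rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely (die 1, die 2) pairs give each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Probability distribution of the sum M.
pmf = {m: Fraction(c, 36) for m, c in sorted(counts.items())}

# Cumulative probability distribution of M.
cdf = {}
running = Fraction(0)
for m, p in pmf.items():
    running += p
    cdf[m] = running

for m in pmf:
    print(m, round(float(pmf[m]), 3), round(float(cdf[m]), 3))
```

Event probabilities follow by adding the relevant entries of `pmf`, for example `pmf[2] + pmf[10]` for the event that M equals 2 or 10.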
(d)
(i) Pr(M = 7)
(ii) Pr(M = 2 or M = 10)
(iii) Pr(M = 4 or M ≠ 4)
(iv) Pr(M = 6 and M = 9)
(v) Pr(M < 8)
(vi) Pr(M = 6 or M > 10)
Answer: (i) 0.167 or 6/36 = 1/6; (ii) 0.111 or 4/36 = 1/9; (iii) 1; (iv) 0; (v) 0.583;
(vi) 0.222 or 8/36 = 2/9.
2)
Probabilities and relative frequencies are related in that the probability of an outcome is
the proportion of the time that the outcome occurs in the long run. Hence concepts of
joint, marginal, and conditional probability distributions stem from related concepts of
frequency distributions.
You are interested in investigating the relationship between the age of heads of
households and weekly earnings of households. The accompanying data gives the number
of occurrences grouped by age and income. You collect data from 1,744 individuals and
think of these individuals as a population that you want to describe, rather than a sample
from which you want to infer behavior of a larger population. After sorting the data, you
generate the accompanying table:
                                   Age of head of household
Household Income           X1            X2            X3            X4            X5
                           16-under 20   20-under 25   25-under 45   45-under 65   65 and >
Y1 $0-under $200           80            76            130           86            24
Y2 $200-under $400         13            90            346           140           8
Y3 $400-under $600         0             19            251           101           6
Y4 $600-under $800         1             11            110           55            1
Y5 $800 and >                                          108           84
(a)
Calculate the joint relative frequencies and the marginal relative frequencies. Interpret
one of each of these. Sketch the cumulative income distribution.
Answer: The joint relative frequencies and marginal relative frequencies are given in the
accompanying table. 5.2 percent of the individuals are between the age of 20
and 24, and make between $200 and under $400. 21.6 percent of the individuals
earn between $400 and under $600.
Joint Relative and Marginal Frequencies of Age and Income, 1,744 Households
                                   Age of head of household
Household Income           X1            X2            X3            X4            X5         Total
                           16-under 20   20-under 25   25-under 45   45-under 65   65 and >
Y1 $0-under $200           0.046         0.044         0.075         0.049         0.014      0.227
Y2 $200-under $400         0.007         0.052         0.198         0.080         0.005      0.342
Y3 $400-under $600         0.000         0.011         0.144         0.058         0.003      0.216
Y4 $600-under $800         0.001         0.006         0.063         0.032         0.001      0.102
Y5 $800 and >              0.001         0.001         0.062         0.048         0.001      0.112

[Figure: Cumulative Income Distribution; vertical axis: Percent; horizontal axis: Income Class]
(b)
Calculate the conditional relative income frequencies for the two age categories 16-under
20, and 45-under 65. Calculate the mean household income for both age categories.
Answer: The mean household income for the 16-under 20 age category is roughly $144.
It is approximately $489 for the 45-under 65 age category.
Conditional Relative Frequencies of Income and Age 16-under 20, and 45-under 65,
1,744 Households
Household Income
Y1 $0-under $200
Y2 $200-under $400
Y3 $400-under $600
Y4 $600-under $800
Y5 $800 and >
0.185
0.300
0.000
0.001
0.217
0.118
0.001
0.180
(c)
If household income and age of head of household were independently distributed, what
would you expect these two conditional relative income distributions to look like? Are
they similar here?
Answer: They would have to be identical, which they clearly are not.
(d)
Your textbook has given you a primary definition of independence that does not involve
conditional relative frequency distributions. What is that definition? Do you think that
age and income are independent here, using this definition?
Answer: $\Pr(Y = y, X = x) = \Pr(Y = y)\Pr(X = x)$. We can check this by multiplying two
marginal probabilities to see if this results in the joint probability. For example,
$\Pr(Y = Y_3) = 0.216$ and $\Pr(X = X_3) = 0.542$, resulting in a product of 0.117,
which does not equal the joint probability of 0.144. Given that we are looking
at the data as a population, not a sample, we do not have to test how close
0.117 is to 0.144.
3)
Math and verbal SAT scores are each distributed normally with N (500,10000) .
(a)
What fraction of students scores above 750? Above 600? Between 420 and 530? Below
480? Above 530?
Answer: Pr(Y>750) = 0.0062; Pr(Y>600) = 0.1587; Pr(420<Y<530) = 0.4061;
Pr(Y<480) = 0.4207; Pr(Y>530) = 0.3821.
(b)
If the math and verbal scores were independently distributed, which is not the case, then
what would be the distribution of the overall SAT score? Find its mean and variance.
Answer: The distribution would be N (1000, 20000) , using equations (2.29) and (2.31) in
the textbook. Note that the standard deviation is now roughly 141 rather than
200.
(c)
Next, assume that the correlation coefficient between the math and verbal scores is 0.75.
Find the mean and variance of the resulting distribution.
Answer: Given the correlation coefficient, the distribution is now N (1000,35000) ,
which has a standard deviation of approximately 187.
(d)
Finally, assume that you had chosen 25 students at random who had taken the SAT exam.
Derive the distribution for their average math SAT score. What is the probability that this
average is above 530? Why is this so much smaller than your answer in (a)?
Answer: The distribution for the average math SAT score is $N(500, 400)$. $\Pr(\bar{Y} > 530) = 0.0668$.
This probability is smaller because the sample mean has a smaller
standard deviation (20 rather than 100).
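The normal probabilities in this question can be computed without a table by writing the normal c.d.f. in terms of the error function. The sketch below assumes scores distributed N(500, 10000), that is, a standard deviation of 100; the helper name `normal_cdf` is introduced here only for illustration.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal random variable."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


mean, sd = 500.0, 100.0  # N(500, 10000) has standard deviation 100

p_above_750 = 1.0 - normal_cdf(750, mean, sd)
p_above_600 = 1.0 - normal_cdf(600, mean, sd)
p_between_420_530 = normal_cdf(530, mean, sd) - normal_cdf(420, mean, sd)
p_below_480 = normal_cdf(480, mean, sd)
p_above_530 = 1.0 - normal_cdf(530, mean, sd)

# The average of n = 25 independent scores has standard deviation sd / sqrt(n).
sd_mean = sd / math.sqrt(25)
p_mean_above_530 = 1.0 - normal_cdf(530, mean, sd_mean)

# Variance of the sum of two scores with correlation rho.
rho = 0.75
var_sum = sd**2 + sd**2 + 2 * rho * sd * sd

print(p_above_750, p_above_600, p_between_420_530, p_below_480, p_above_530)
print(p_mean_above_530, var_sum)
```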
You have read about the so-called catch-up theory by economic historians, whereby nations that
are further behind in per capita income grow faster subsequently. If this is true systematically,
then eventually laggards will reach the leader. To put the theory to the test, you collect data
on relative (to the United States) per capita income for two years, 1960 and 1990, for 24
OECD countries. You think of these countries as a population you want to describe, rather
than a sample from which you want to infer behavior of a larger population. The relevant
data for this question is as follows:
Y          X_1       X_2       Y × X_1    Y^2        X_1^2     X_2^2
0.023      0.770     1.030     0.018      0.00053    0.593     1.0609
0.014      1.000     1.000     0.014      0.00020    1.000     1.0000
.          .         .         .          .          .         .
0.041      0.200     0.450     0.008      0.00168    0.040     0.2025
0.033      0.130     0.230     0.004      0.00109    0.017     0.0529
0.625      13.220    17.800    0.294      0.01877    8.529     13.9164
where $X_1$ and $X_2$ are per capita income relative to the United States in 1960 and 1990
respectively, and Y is the average annual growth rate in X over the 1960-1990 period.
Numbers in the last row represent sums of the columns above.
(a)
Calculate the variance and standard deviation of $X_1$ and $X_2$. For a catch-up effect to be
present, what relationship must the two standard deviations show? Is this the case here?
Answer: The variances of $X_1$ and $X_2$ are 0.0520 and 0.0298 respectively, with standard
deviations of 0.2279 and 0.1726. For the catch-up effect to be present, the
standard deviation would have to shrink over time. This is the case here.
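The variances above follow from the column sums in the table, treating the 24 countries as a population, so the divisor is n rather than n - 1. A minimal sketch, with a helper function name introduced here only for illustration:

```python
import math

n = 24  # number of OECD countries, treated as a population

# Column sums taken from the last row of the table.
sum_x1, sum_x1_sq = 13.220, 8.529
sum_x2, sum_x2_sq = 17.800, 13.9164


def population_variance(total, total_sq, n):
    """Population variance from the sum and the sum of squares: E(X^2) - [E(X)]^2."""
    mean = total / n
    return total_sq / n - mean**2


var_x1 = population_variance(sum_x1, sum_x1_sq, n)
var_x2 = population_variance(sum_x2, sum_x2_sq, n)
sd_x1 = math.sqrt(var_x1)
sd_x2 = math.sqrt(var_x2)

print(round(var_x1, 4), round(sd_x1, 4))
print(round(var_x2, 4), round(sd_x2, 4))
```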
(b)
Calculate the correlation between Y and $X_1$. What sign must the correlation coefficient
have for there to be evidence of a catch-up effect? Explain.
Answer: The correlation coefficient is -0.88. It has to be negative for there to be
evidence of a catch-up effect. If countries that were relatively ahead in the initial
period in terms of per capita income grow by relatively less over time, then
eventually the laggards will catch up.
4)
Following Alfred Nobel's will, there are five Nobel Prizes awarded each year. These are
for outstanding achievements in Chemistry, Physics, Physiology or Medicine, Literature,
and Peace. In 1968, the Bank of Sweden added a prize in Economic Sciences in memory
of Alfred Nobel. You think of the data as describing a population, rather than a sample
from which you want to infer behavior of a larger population. The accompanying table
lists the joint probability distribution between recipients in economics and the other five
prizes, and the citizenship of the recipients, based on the 1969-2001 period.
Joint Distribution of Nobel Prize Winners in Economics and Non-Economics Disciplines, and Citizenship, 1969-2001
                                              U.S. Citizen   Non-U.S. Citizen   Total
                                              (Y = 0)        (Y = 1)
Economics Nobel Prize (X = 0)                 0.118          0.049              0.167
Physics, Chemistry, Medicine, Literature,
and Peace Nobel Prize (X = 1)                 0.345          0.488              0.833
Total                                         0.463          0.537              1.00
(a)
(b)
(c)
A randomly selected Nobel Prize winner reports that he is a non-U.S. citizen. What is the
probability that this genius has won the Economics Nobel Prize? A Nobel Prize in the
other five disciplines?
Answer: There is a 9.1 percent chance that he has won the Economics Nobel Prize, and a
90.9 percent chance that he has won a Nobel Prize in one of the other five
disciplines.
(d)
Show what the joint distribution would look like if the two categories were independent.
Answer:
Joint Distribution of Nobel Prize Winners in Economics and Non-Economics Disciplines, and Citizenship, 1969-2001, under assumption of independence
                                              U.S. Citizen   Non-U.S. Citizen   Total
                                              (Y = 0)        (Y = 1)
Economics Nobel Prize (X = 0)                 0.077          0.090              0.167
Physics, Chemistry, Medicine, Literature,
and Peace Nobel Prize (X = 1)                 0.386          0.447              0.833
Total                                         0.463          0.537              1.00
7)
A few years ago the news magazine The Economist listed some of the stranger
explanations used in the past to predict presidential election outcomes. These included
whether or not the hemlines of women's skirts went up or down, stock market
performances, baseball World Series wins by an American League team, etc. Thinking
about this problem more seriously, you decide to analyze whether or not the presidential
candidate for a certain party did better if his party controlled the House. Accordingly you
collect data for the last 34 presidential elections. You think of this data as comprising a
population which you want to describe, rather than a sample from which you want to
infer behavior of a larger population. You generate the accompanying table:
Joint Distribution of Presidential Party Affiliation and Party Control of House of Representatives, 1860-1996
                                   Democratic Control   Republican Control   Total
                                   of House (Y = 0)     of House (Y = 1)
Democratic President (X = 0)       0.412                0.030                0.441
Republican President (X = 1)       0.176                0.382                0.559
Total                              0.588                0.412                1.00
(a)
Interpret one of the joint probabilities and one of the marginal probabilities.
Answer: 38.2 percent of the presidents were Republicans and were in the White
House while Republicans controlled the House of Representatives. 44.1 percent of
all presidents were Democrats.
(b)
(c)
If you picked one of the Republican presidents at random, what is the probability that
during his term the Democrats had control of the House?
Answer: $E(X) = 0.559$. 55.9 percent of the presidents were Republicans.
$E(X \mid Y = 0) = 0.299$. 29.9 percent of those presidents who were in office while
Democrats had control of the House of Representatives were Republicans. The
second conditions on those periods during which Democrats had control of the
House of Representatives, and ignores the other periods.
(d) What would the joint distribution look like under independence? Check your results by
calculating the two conditional distributions and compare these to the marginal
distribution.
Answer:
Joint Distribution of Presidential Party Affiliation and Party Control of House of Representatives, 1860-1996, under the Assumption of Independence
                                   Democratic Control   Republican Control   Total
                                   of House (Y = 0)     of House (Y = 1)
Democratic President (X = 0)       0.259                0.182                0.441
Republican President (X = 1)       0.329                0.230                0.559
Total                              0.588                0.412                1.00
$\Pr(X = 0 \mid Y = 0) = \dfrac{0.259}{0.588} = 0.440$ (there is a small rounding error).
$\Pr(Y = 1 \mid X = 1) = \dfrac{0.230}{0.559} = 0.411$ (there is a small rounding error).
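The joint distribution under independence is the product of the marginal probabilities, cell by cell. The sketch below rebuilds it from the marginals of the presidential table and compares it with the observed joint distribution; the dictionary layout is simply one convenient representation.

```python
# Marginal probabilities from the table of presidents and House control.
pr_x = {0: 0.441, 1: 0.559}  # X = 0: Democratic president, X = 1: Republican president
pr_y = {0: 0.588, 1: 0.412}  # Y = 0: Democratic House, Y = 1: Republican House

# Observed joint probabilities, keyed by (x, y).
observed = {(0, 0): 0.412, (0, 1): 0.030, (1, 0): 0.176, (1, 1): 0.382}

# Under independence each joint probability equals Pr(X = x) * Pr(Y = y).
independent = {(x, y): round(pr_x[x] * pr_y[y], 3) for x in pr_x for y in pr_y}

# Largest absolute gap between the observed and the independence table.
largest_gap = max(abs(observed[key] - independent[key]) for key in observed)

print(independent)
print(largest_gap)
```

A large gap between the two tables indicates that party affiliation of the president and party control of the House are not independently distributed in this population.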
8)
$\Delta p = \pi - f(u - \bar{u})$,
where $\Delta p$ is the actual inflation rate, $\pi$ is the expected inflation rate, and $u$ is the
unemployment rate, with the bar indicating equilibrium (the NAIRU, the Non-Accelerating
Inflation Rate of Unemployment). Under the assumption of static expectations ($\pi = \Delta p_{-1}$),
i.e. that you expect this period's inflation rate to hold for the next period ("the sun
shines today, it will shine tomorrow"), the prediction is that inflation will accelerate
if the unemployment rate is below its equilibrium level. The accompanying table below
displays information on accelerating annual inflation and unemployment rate differences
from the equilibrium rate (cyclical unemployment), where the latter is approximated by a
five-year moving average. You think of this data as a population which you want to
describe, rather than a sample from which you want to infer behavior of a larger
population. The data is collected from United States quarterly data for the period 1964:1
to 1995:4.
Joint Distribution of Accelerating Inflation and Cyclical Unemployment, 1964:1-1995:4
                                          $(u - \bar{u}) > 0$   $(u - \bar{u}) \le 0$   Total
                                          (Y = 0)               (Y = 1)
$\Delta p - \Delta p_{-1} > 0$ (X = 0)    0.156                 0.383                   0.539
$\Delta p - \Delta p_{-1} \le 0$ (X = 1)  0.297                 0.164                   0.461
Total                                     0.453                 0.547                   1.00
(a)
(b)
(c)
(d)
You randomly select one of the 59 quarters when there was positive cyclical
unemployment ($(u - \bar{u}) > 0$). What is the probability there was decelerating inflation
during that quarter?
Answer: There is a 65.6 percent probability that inflation decelerates when there is
positive cyclical unemployment.
9) The accompanying table shows the joint distribution between the change of the
unemployment rate in an election year and the share of the candidate of the incumbent party
since 1928. You think of this data as a population which you want to describe, rather than a
sample from which you want to infer behavior of a larger population.
Joint Distribution of Unemployment Rate Change and Incumbent Party's Vote Share in Total Vote Cast for the Two Major-Party Candidates, 1928-2000
                                        $\Delta u > 0$   $\Delta u \le 0$   Total
                                        (X = 0)          (X = 1)
(Incumbent - 50%) > 0 (Y = 0)           0.053            0.579              0.632
(Incumbent - 50%) ≤ 0 (Y = 1)           0.211            0.157              0.368
Total                                   0.264            0.736              1.00
(a)
(b)
(c)
What is the probability that the unemployment rate decreases in an election year?
Answer: $\Pr(X = 1) = 0.736$.
(d)
(e)
                                        $\Delta u > 0$   $\Delta u \le 0$   Total
                                        (X = 0)          (X = 1)
(Incumbent - 50%) > 0 (Y = 0)           0.167            0.465              0.632
(Incumbent - 50%) ≤ 0 (Y = 1)           0.097            0.271              0.368
Total                                   0.264            0.736              1.00
10)
The accompanying table lists the joint distribution of unemployment in the United
States in 2001 by demographic characteristics (race and gender).
Joint Distribution of Unemployment by Demographic Characteristics,
United States, 2001
Age 16-19
( X 0)
Age 20 and above
( X 1)
Total
White
(Y 0 )
0.13
Total
0.60
0.22
0.82
0.73
0.27
1.00
0.18
Age 16-19
( X 0)
Age 20 and above
( X 1)
Total
White
(Y 0 )
0.18
0.82
0.81
1.00
1.00
(c)
Given your answer in the previous question, how do you reconcile this fact with the
60% probability of finding an unemployed adult white person, compared with only 22% for the
category "black and other"?
Answer: The original table showed the joint probability distribution, while the table in
(b) presented the conditional probability distribution.
2)
(a)
(b)
(c)
(d)
(e)
3)
Calculate the following probabilities using the standard normal distribution. Sketch the
probability distribution in each table case, shading in the area of the calculated probability.
(a)
(b)
(c)
(d)
(e)
(f)
(g)
(h)
(i)
(j)
Answer: (a) 0.5000; (b) 0.8413; (c) 0.0250; (d) 0.0228; (e) 0.0500; (f) 0.9500; (g)
0.0500; (h) 0.0100; (i) 1.2816; (j) 1.96.
4) Using the fact that the standardized variable Z is a linear transformation of the normally
distributed random variable Y, derive the expected value and variance of Z.
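Answer: $Z = \dfrac{Y - \mu_Y}{\sigma_Y} = -\dfrac{\mu_Y}{\sigma_Y} + \dfrac{1}{\sigma_Y} Y = a + bY$, with $a = -\dfrac{\mu_Y}{\sigma_Y}$ and $b = \dfrac{1}{\sigma_Y}$. Given (2.29)
and (2.30) in the text, $E(Z) = -\dfrac{\mu_Y}{\sigma_Y} + \dfrac{1}{\sigma_Y}\mu_Y = 0$, and $\sigma_Z^2 = \dfrac{1}{\sigma_Y^2}\sigma_Y^2 = 1$.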
5) Show in a scatterplot what the relationship between two variables X and Y would look
like if there was
(a)
(b)
(c) no correlation.
(d) What would the correlation coefficient be if all observations for the two variables were on a
curve described by $Y = X^2$?
Answer: The correlation coefficient would be zero in this case, since the relationship is nonlinear.
6)
()a
()b
(c)
(d)
7)
In considering the purchase of a certain stock, you attach the following probabilities to
possible changes in the stock price over the next year.
Stock Price Change During Next Twelve Months (%)    Probability
+15                                                 0.2
+5                                                  0.3
0                                                   0.4
-5                                                  0.05
-15                                                 0.05
What is the expected value, the variance, and the standard deviation? Which is the most
likely outcome? Sketch the cumulative distribution function.
[Figure: Stock Price % Change; horizontal axis: Percentage Change; vertical axis: Probability]
8)
You consider visiting Montreal during the break between terms in January. You go to the
relevant Web site of the official tourist office to figure out the type of clothes you should
take on the trip. The site lists that the average high during January is -7°C, with a
standard deviation of 4°C. Unfortunately you are more familiar with Fahrenheit than
with Celsius, but find that the two are related by the following linear function:
$C = \dfrac{5}{9}(F - 32)$.
Find the mean and standard deviation for the January temperature in Montreal in
Fahrenheit.
Answer: Using equations (2.29) and (2.30) from the textbook, the result is 19.4 and 7.2.
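The result follows from the rules for the mean and standard deviation of a linear transformation, here F = 32 + (9/5)C. The sketch below assumes a Celsius mean of -7 and a Celsius standard deviation of 4, which are the values consistent with the stated answer.

```python
# Assumed Celsius values (consistent with the stated answer of 19.4 and 7.2).
mean_c = -7.0
sd_c = 4.0

# Linear transformation F = intercept + slope * C.
intercept = 32.0
slope = 9.0 / 5.0

# The mean shifts and scales; the standard deviation only scales, by |slope|.
mean_f = intercept + slope * mean_c
sd_f = abs(slope) * sd_c

print(round(mean_f, 1), round(sd_f, 1))
```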
9)
Two random variables are independently distributed if their joint distribution is the
product of their marginal distributions. It is intuitively easier to understand that two
random variables are independently distributed if all conditional distributions of Y given
X are equal. Derive one of the two conditions from the other.
Answer: If all conditional distributions of Y given X are equal, then
$\Pr(Y = y \mid X = 1) = \Pr(Y = y \mid X = 2) = \dots = \Pr(Y = y \mid X = l)$.
But if all conditional distributions are equal, then they must also equal the
marginal distribution, i.e.
$\Pr(Y = y \mid X = x) = \Pr(Y = y)$.
Given the definition of the conditional distribution of Y given X = x, you then
get
$\Pr(Y = y \mid X = x) = \dfrac{\Pr(Y = y, X = x)}{\Pr(X = x)} = \Pr(Y = y)$,
which gives the condition $\Pr(Y = y, X = x) = \Pr(Y = y)\Pr(X = x)$.
There are frequently situations where you have information on the conditional
distribution of Y given X, but are interested in the conditional distribution of X given Y.
Recalling $\Pr(Y = y \mid X = x) = \dfrac{\Pr(X = x, Y = y)}{\Pr(X = x)}$, derive a relationship between
$\Pr(X = x \mid Y = y)$ and $\Pr(Y = y \mid X = x)$. This is called Bayes' theorem.
Answer: Given $\Pr(Y = y \mid X = x) = \dfrac{\Pr(X = x, Y = y)}{\Pr(X = x)}$, it follows that
$\Pr(Y = y \mid X = x)\Pr(X = x) = \Pr(X = x, Y = y)$;
similarly $\Pr(X = x \mid Y = y) = \dfrac{\Pr(X = x, Y = y)}{\Pr(Y = y)}$ and
$\Pr(X = x \mid Y = y)\Pr(Y = y) = \Pr(X = x, Y = y)$. Equating the two and solving
for $\Pr(X = x \mid Y = y)$ then results in
$\Pr(X = x \mid Y = y) = \dfrac{\Pr(Y = y \mid X = x)\Pr(X = x)}{\Pr(Y = y)}$.
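As a numerical illustration of Bayes' theorem, the sketch below reuses the Nobel Prize table from the earlier question (X = 0 for an Economics prize, Y = 1 for a non-U.S. citizen) and recovers Pr(X = 0 | Y = 1) from Pr(Y = 1 | X = 0). The variable names are introduced here only for illustration.

```python
# Probabilities taken from the Nobel Prize joint distribution table.
pr_x0 = 0.167            # Pr(X = 0): Economics prize
pr_y1 = 0.537            # Pr(Y = 1): non-U.S. citizen
pr_x0_and_y1 = 0.049     # Pr(X = 0, Y = 1)

# Conditional probability of a non-U.S. citizen given an Economics prize.
pr_y1_given_x0 = pr_x0_and_y1 / pr_x0

# Bayes' theorem: Pr(X = 0 | Y = 1) = Pr(Y = 1 | X = 0) Pr(X = 0) / Pr(Y = 1).
pr_x0_given_y1 = pr_y1_given_x0 * pr_x0 / pr_y1

print(round(pr_x0_given_y1, 3))
```

The result matches the 9.1 percent reported for a randomly selected non-U.S. winner in the Nobel Prize question.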
11)
You are at a college of roughly 1,000 students and obtain data from the entire freshman
class (250 students) on height and weight during orientation. You consider this to be a
population that you want to describe, rather than a sample from which you want to infer
general relationships in a larger population. Weight (Y) is measured in pounds and height
(X) is measured in inches. You calculate the following sums:
$\sum_{i=1}^{n} y_i^2 = 94{,}228.8$, $\sum_{i=1}^{n} x_i^2 = 1{,}248.9$, $\sum_{i=1}^{n} x_i y_i = 7{,}625.9$
Use the definition for the conditional distribution of Y given X x and the marginal
distribution of X to derive the formula for Pr( X x, Y y ) . This is called the
multiplication rule. Use it to derive the probability for drawing two aces randomly from a
deck of cards (no joker), where you do not replace the card after the first draw. Next,
generalizing the multiplication rule and assuming independence, find the probability of
having four girls in a family with four children.
Answer: $\dfrac{4}{52} \times \dfrac{3}{51} = 0.0045$; $0.0625$ or $\left(\dfrac{1}{2}\right)^4 = \dfrac{1}{16}$.
13)
The systolic blood pressure of females in their 20s is normally distributed with a mean of
120 with a standard deviation of 9. What is the probability of finding a female with a
blood pressure of less than 100? More than 135? Between 105 and 123? You visit the
women's soccer team on campus, and find that the average blood pressure of the 25
members is 114. Is it likely that this group of women came from the same population?
Answer: Pr(Y<100) = 0.0131; Pr(Y>135) = 0.0478; Pr(105<Y<123) = 0.5828;
$\Pr(\bar{Y} \le 114) = \Pr(Z \le -3.33) = 0.0004$. (The smallest z-value listed in the table
in the textbook is -2.99, which generates a probability value of 0.0014.) It is
unlikely that this group of women came from the same population.
14)
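Show that the correlation coefficient between Y and X is unaffected if you use a linear
transformation in both variables. That is, show that $\operatorname{corr}(X, Y) = \operatorname{corr}(X^*, Y^*)$, where
$X^* = a + bX$ and $Y^* = c + dY$, and where a, b, c, and d are arbitrary non-zero constants.
Answer: $\operatorname{corr}(X^*, Y^*) = \dfrac{\operatorname{cov}(X^*, Y^*)}{\sqrt{\operatorname{var}(X^*)}\sqrt{\operatorname{var}(Y^*)}} = \dfrac{bd\,\operatorname{cov}(X, Y)}{\sqrt{b^2 \operatorname{var}(X)}\sqrt{d^2 \operatorname{var}(Y)}} = \operatorname{corr}(X, Y)$,
where the last equality requires b and d to have the same sign (if they differ in sign, the correlation changes sign).
15)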
The textbook formula for the variance of the discrete random variable Y is given as
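$\sigma_Y^2 = \sum_{i=1}^{k} (y_i - \mu_Y)^2 p_i$.
Show that this is equivalent to
$\sigma_Y^2 = \sum_{i=1}^{k} y_i^2 p_i - \mu_Y^2$.
Answer: $\sigma_Y^2 = \sum_{i=1}^{k} (y_i - \mu_Y)^2 p_i = \sum_{i=1}^{k} (y_i^2 + \mu_Y^2 - 2\mu_Y y_i) p_i = \sum_{i=1}^{k} (y_i^2 p_i + \mu_Y^2 p_i - 2\mu_Y y_i p_i)$.
Moving the summation through the terms gives
$\sigma_Y^2 = \sum_{i=1}^{k} y_i^2 p_i + \mu_Y^2 \sum_{i=1}^{k} p_i - 2\mu_Y \sum_{i=1}^{k} y_i p_i$. But
$\sum_{i=1}^{k} p_i = 1$ and $\mu_Y = \sum_{i=1}^{k} y_i p_i$, giving
$\sigma_Y^2 = \sum_{i=1}^{k} y_i^2 p_i + \mu_Y^2 - 2\mu_Y^2 = \sum_{i=1}^{k} y_i^2 p_i - \mu_Y^2$.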
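The equivalence of the two variance formulas can be confirmed exactly on a simple example. The sketch below uses a single fair die as an assumed illustration (it is not part of the question) and exact fractions so that no rounding is involved.

```python
from fractions import Fraction

# Assumed example: one fair die, outcomes 1..6 with probability 1/6 each.
pmf = {y: Fraction(1, 6) for y in range(1, 7)}

# Mean: sum of y_i * p_i.
mu = sum(y * p for y, p in pmf.items())

# Definition: sum of (y_i - mu)^2 * p_i.
var_definition = sum((y - mu) ** 2 * p for y, p in pmf.items())

# Shortcut: sum of y_i^2 * p_i, minus mu^2.
var_shortcut = sum(y**2 * p for y, p in pmf.items()) - mu**2

print(mu, var_definition, var_shortcut)
```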
16)
The Economic Report of the President gives the following age distribution of the United
States population for the year 2000:
United States Population By Age Group, 2000
Outcome (age category)   Under 5   5-15   16-19   20-24   25-44   45-64   65 and over
Percentage               0.06      0.16   0.06    0.07    0.30    0.22    0.13
Imagine that every person was assigned a unique number between 1 and 275,372,000 (the
total population in 2000). If you generated a random number, what would be the probability
that you had drawn someone older than 65 or under 16? Treating the percentages as
probabilities, write down the cumulative probability distribution. What is the probability of
drawing someone who is 24 years or younger?
Answer: $\Pr(Y < 16 \text{ or } Y > 65) = 0.35$;
Outcome (age category)                Under 5   5-15   16-19   20-24   25-44   45-64   65 and over
Cumulative probability distribution   0.06      0.22   0.28    0.35    0.65    0.87    1.00
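The cumulative probability distribution in the age table is a running sum of the category shares. A minimal sketch, treating the published shares as probabilities as the question suggests:

```python
from itertools import accumulate

categories = ["Under 5", "5-15", "16-19", "20-24", "25-44", "45-64", "65 and over"]
shares = [0.06, 0.16, 0.06, 0.07, 0.30, 0.22, 0.13]

# Running sum of the shares, rounded to two decimals to remove floating-point noise.
cdf = [round(total, 2) for total in accumulate(shares)]

# Probability of drawing someone under 16 or in the oldest category.
pr_young_or_old = round(shares[0] + shares[1] + shares[-1], 2)

# Probability of drawing someone who is 24 years or younger.
pr_24_or_younger = cdf[categories.index("20-24")]

for name, value in zip(categories, cdf):
    print(name, value)
print(pr_young_or_old, pr_24_or_younger)
```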
The accompanying table gives the outcomes and probability distribution of the number of
times a student checks her e-mail daily:
Probability of Checking E-Mail
Outcome
(number of email checks)
Probability
distribution
0.05
0.15
0.30
0.25
0.15
0.08
0.02
Sketch the probability distribution. Next, calculate the c.d.f. for the above table. What is
the probability of her checking her e-mail between 1 and 3 times a day? Of checking it
more than 3 times a day?
Answer:
Outcome
(number of email checks)
Cumulative
probability
distribution
0.05
0.20
0.50
0.75
0.90
0.98
1.00
18)
The accompanying table lists the outcomes and the cumulative probability distribution
for a student renting videos during the week while on campus.
Video Rentals per Week during Semester
Outcome (number of weekly
video rentals)
Probability distribution
0.05
0.55
0.25
0.05
0.07
0.02
0.01
0.05
0.60
0.85
0.90
0.97
0.99
1.00
The textbook mentioned that the mean of Y, $E(Y)$, is called the first moment of Y, and
that the expected value of the square of Y, $E(Y^2)$, is called the second moment of Y, and
so on. These are also referred to as moments about the origin. A related concept is
moments about the mean, which are defined as $E[(Y - \mu_Y)^r]$. What do you call the
second moment about the mean? What do you think the third moment, referred to as
skewness, measures? Do you believe that it would be positive or negative for an
earnings distribution? What measure of the third moment around the mean do you get for
a normal distribution?
Answer: The second moment about the mean is the variance. Skewness measures the
departure from symmetry. For the typical earnings distribution, it will be
positive. For the normal distribution, it will be zero.
19)
[Figure: Number of Weekly Video Rentals; horizontal axis: Number of Rentals]
20)
Explain why the two probabilities are identical for the standard normal distribution:
\(\Pr(-1.96 \leq X \leq 1.96)\) and \(\Pr(-1.96 < X < 1.96)\).
Answer: For a continuous distribution, the probability of a point is zero.
Chapter 3
Think of at least nine examples, three of each, that display a positive, negative, or no correlation
between two economic variables. In each of the positive and negative examples, indicate
whether or not you expect the correlation to be strong or weak.
Answer: Answers will vary by student. Students frequently bring up the following
correlations. Positive correlations: earnings and education (hopefully strong),
consumption and personal disposable income (strong), per capita income and
investment-output ratio or saving rate (strong); negative correlations: Okun's
Law (strong), income velocity and interest rates (strong), the Phillips curve
(strong); no correlation: productivity growth and initial level of per capita
income for all countries of the world (beta-convergence regressions),
consumption and the (real) interest rate, employment and real wages.
3)
Adult males are taller, on average, than adult females. Visiting two recent American
Youth Soccer Organization (AYSO) under-12-year-old (U12) soccer matches on a
Saturday, you do not observe an obvious difference in the height of boys and girls of that
age. You suggest to your little sister that she collect data on height and gender of children
in 4th to 6th grade as part of her science project. The accompanying table shows her
findings.
Height of Young Boys and Girls, Grades 4-6, in inches
Group    Mean height (\(\bar{Y}\))   Standard deviation (s)   n
Boys     57.8                      3.9                      55
Girls    58.4                      4.2                      57
(e)
Let your null hypothesis be that there is no difference in the height of females and males
at this age level. Specify the alternative hypothesis.
(g)
Find the difference in height and the standard error of the difference.
Answer: \(\bar{Y}_{Boys} - \bar{Y}_{Girls} = -0.6\), \(SE(\bar{Y}_{Boys} - \bar{Y}_{Girls}) = \sqrt{\frac{3.9^2}{55} + \frac{4.2^2}{57}} = 0.77\).
(h)
Calculate the t-statistic for comparing the two means. Is the difference statistically
significant at the 1% level? Which critical value did you use? Why would this number be
smaller if you had assumed a one-sided alternative hypothesis? What is the intuition
behind this?
Answer: t = -0.78, so | t | < 2.58, which is the critical value at the 1% level. Hence you
cannot reject the null hypothesis. The critical value for the one-sided hypothesis
would have been 2.33. Assuming a one-sided hypothesis implies that you have
some information about the problem at hand, and, as a result, can be more easily
convinced than if you had no prior expectation.
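The difference, its standard error, and the t-statistic in this problem can be reproduced with a few lines of arithmetic. The sketch below assumes unequal population variances, as in the answer; the function name `two_sample_t` is illustrative only.

```python
import math

def two_sample_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (difference, standard error, t-statistic) for two sample means,
    without assuming equal population variances."""
    diff = mean_a - mean_b
    se = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return diff, se, diff / se

# Boys versus girls, using the numbers from the table in this problem.
diff, se, t = two_sample_t(57.8, 3.9, 55, 58.4, 4.2, 57)
print(f"difference = {diff:.2f}, SE = {se:.2f}, t = {t:.2f}")

# Two-sided test at the 1% level: reject only if |t| exceeds 2.58.
print("reject at 1% (two-sided):", abs(t) > 2.58)
```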
4)
Math SAT scores (Y) are normally distributed with a mean of 500 and a standard
deviation of 100. An evening school advertises that it can improve students' scores by
roughly a third of a standard deviation, or 30 points, if they attend a course which runs
over several weeks. (A similar claim is made for attending a verbal SAT course.) The
statistician for a consumer protection agency suspects that the courses are not effective.
She views the situation as follows: \(H_0: \mu_Y = 500\) vs. \(H_1: \mu_Y = 530\).
(e)
Sketch the two distributions under the null hypothesis and the alternative hypothesis.
Answer:
(f)
The consumer protection agency wants to evaluate this claim by sending 50 students to
attend classes. One of the students becomes sick during the course and drops out. What is
the distribution of the average score of the remaining 49 students under the null, and
under the alternative hypothesis?
Answer: \(\bar{Y}\) of the 49 participants is normally distributed, with a mean of 500 and a
standard deviation of 14.286 under the null hypothesis. Under the alternative
hypothesis, it is normally distributed with a mean of 530 and a standard
deviation of 14.286.
(g)
Assume that after graduating from the course, the 49 participants take the SAT test and
score an average of 520. Is this convincing evidence that the school has fallen short of its
claim? What is the p-value for such a score under the null hypothesis?
Answer: It is possible that the consumer protection agency had chosen a group of 49
students whose average score would have been 490 without attending the
course. The crucial question is how likely it is that 49 students, chosen
randomly from a population with a mean of 500 and a standard deviation of
100, will score an average of 520. The p-value for this score is 0.081, meaning
that if the agency rejected the null hypothesis based on this evidence, it would
make a mistake, on average, roughly 1 out of 12 times. Hence the average score
of 520 would allow rejection of the null hypothesis that the school has had no
effect on the SAT score of students at the 10% level.
(h)
What would be the critical value under the null hypothesis if the size of your test were
5%?
Answer: The critical value would be 523.
(i)
Given this critical value, what is the power of the test? What options does the statistician
have for increasing the power in this situation?
Answer: \(\Pr(\bar{Y} < 523 \mid H_1 \text{ is true}) = 0.312\). Hence the power of the test is 0.688. She could
increase the power by increasing the size of the test, at the cost of a higher probability
of rejecting a true null hypothesis. Alternatively, she could try
to convince the agency to hire more test subjects, i.e., she could increase the
sample size.
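The p-value, critical value, and power in parts (g) through (i) all come from two normal distributions that share a standard deviation of 100/7. The sketch below uses Python's standard-library `statistics.NormalDist`; it takes the critical value rounded to 523, as the answer does, so the power matches the figure quoted above.

```python
import math
from statistics import NormalDist

mu_null, mu_alt, sigma, n = 500, 530, 100, 49
se = sigma / math.sqrt(n)                 # 100 / 7, about 14.286

null_dist = NormalDist(mu_null, se)       # distribution of the sample mean under H0
alt_dist = NormalDist(mu_alt, se)         # distribution of the sample mean under H1

# One-sided p-value of an observed average of 520 under the null.
p_value = 1 - null_dist.cdf(520)

# Critical value for a one-sided test of size 5%.
critical_exact = null_dist.inv_cdf(0.95)

# Power when the critical value is rounded to 523, as in the answer.
critical = 523
power = 1 - alt_dist.cdf(critical)

print(f"SE = {se:.3f}, p-value = {p_value:.3f}")
print(f"critical value = {critical_exact:.1f}, power at 523 = {power:.3f}")
```

Replacing 0.95 by 0.90 in `inv_cdf` lowers the critical value and raises the power, which illustrates the trade-off between size and power.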
5)
Your packaging company fills various types of flour into bags. Recently there have been
complaints from one chain of stores: a customer returned one opened 5 pound bag which
weighed significantly less than the label indicated. You view the weight of the bag as a
random variable which is normally distributed with a mean of 5 pounds, and, after
studying the machine specifications, a standard deviation of 0.05 pounds.
a.
You take a sample of 20 bags and weigh them. Sketch below what the average pattern of
individual weights might look like. Let the horizontal axis indicate the sampled bag
number (1, 2, ..., 20). On the vertical axis, mark the expected value of the weight under
the null hypothesis, and two (\(\approx 1.96\)) standard deviations above and below the expected
value. Draw a line through the graph for \(E(Y) + 2\sigma_Y\), \(E(Y)\), and \(E(Y) - 2\sigma_Y\). How many
of the bags in a sample of 20 will you expect to weigh either less than 4.9 pounds or more
than 5.1 pounds?
Answer: On average, there should be one bag in every sample of 20 which weighs less
than 4.9 pounds or more than 5.1 pounds.
b.
You sample 25 bags of flour and calculate the average weight. What is the distribution of
the average weight of these 25 bags? Repeating the same exercise 20 times, sketch what
the distribution of the average weights would look like in a graph similar to the one you
drew in (a), where you have adjusted the standard error of \(\bar{Y}\) accordingly.
Answer: The average weight of 25 bags will be normally distributed, with a mean of 5
pounds and a standard deviation of 0.01 pounds.
c.
For each of the twenty observations in (b), a 95% confidence interval is constructed.
Draw these confidence intervals, using the same graph as in (b). How many of these 20
confidence intervals would you expect to contain 5 pounds under the null hypothesis?
Answer: You would expect 19 of the 20 confidence intervals to contain 5 pounds.
5)
Assume that two presidential candidates, call them Bush and Gore, receive 50% of the
votes in the population. You can model this situation as a Bernoulli trial, where Y is a
random variable with success probability \(\Pr(Y = 1) = p\), and where Y = 1 if a person
votes for Bush and Y = 0 otherwise. Furthermore, let \(\hat{p}\) be the fraction of successes (1s)
in a sample, which is distributed \(N\left(p, \frac{p(1-p)}{n}\right)\) in reasonably large samples, say for \(n \geq 40\).
(a) Given your knowledge about the population, find the probability that in a random sample
of 40, Bush would receive a share of 40% or less.
Answer: \(\Pr(\hat{p} < 0.40) = \Pr\left(Z < \frac{0.40 - 0.50}{\sqrt{0.25/40}}\right) = \Pr(Z < -1.26) = 0.104\). In roughly every
10th sample of this size, Bush would receive a vote of less than 40%, although in
truth, his share is 50%.
(a)
Answer: \(\Pr(\hat{p} < 0.40) = \Pr\left(Z < \frac{0.40 - 0.50}{\sqrt{0.25/100}}\right) = \Pr(Z < -2.00) = 0.023\). With this sample
size, you would expect this to happen only every 50th sample.
(b)
Given your answers in (a) and (b), would you be comfortable to predict what the voting
intentions for the entire population are if you did not know p but had polled 10,000
individuals at random and calculated \(\hat{p}\)? Explain.
Answer: The answers in (a) and (b) suggest that for even moderate increases in the
sample size, the estimator does not vary too much from the population mean.
Polling 10,000 individuals, the probability of finding a \(\hat{p}\) of 0.48 or less, for example,
would be 0.00003. Unless the election was extremely close, which the 2000
election was, polls are quite accurate even for sample sizes of 2,500.
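The three probabilities in this problem follow from the same normal approximation with different sample sizes. A minimal sketch, assuming the helper name `prob_share_at_most` (illustrative only); because it does not round z to two decimals, the n = 40 result comes out near 0.103 rather than the tabulated 0.104.

```python
import math
from statistics import NormalDist

def prob_share_at_most(threshold, p, n):
    """Normal approximation to Pr(p_hat <= threshold) when the true share is p."""
    se = math.sqrt(p * (1 - p) / n)
    return NormalDist().cdf((threshold - p) / se)

p40 = prob_share_at_most(0.40, 0.5, 40)         # sample of 40
p100 = prob_share_at_most(0.40, 0.5, 100)       # sample of 100
p10000 = prob_share_at_most(0.48, 0.5, 10_000)  # sample of 10,000, threshold 0.48

print(f"n = 40:     {p40:.3f}")
print(f"n = 100:    {p100:.3f}")
print(f"n = 10,000: {p10000:.5f}")
```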
(c)
This result seems to hold whether you poll 10,000 people at random in the Netherlands or
the United States, where the former has a population of less than 20 million people, while
the United States is 15 times as populous. Why does the population size not come into
play?
Answer: The distribution of sample means shrinks very quickly depending on the sample
size, not the population size. Although at first this does not seem intuitive, the
standard error of an estimator is a value which indicates by how much the
estimator varies around the population value. For large sample sizes, the sample
mean typically is very close to the population mean.
6) You have collected weekly earnings and age data from a sub-sample of 1,744 individuals
using the Current Population Survey in a given year.
(a)
Given the overall mean of $434.49 and a standard deviation of $294.67, construct a 99%
confidence interval for average earnings in the entire population. State the meaning of
this interval in words, rather than just in numbers. If you constructed a 90% confidence
interval instead, would it be smaller or larger? What is the intuition?
Answer: The confidence interval for mean weekly earnings is \(434.49 \pm 2.57 \times \frac{294.67}{\sqrt{1744}}\)
= 434.49 ± 18.13 = (416.36, 452.62). Based on the sample at hand, the best
guess for the population mean is $434.49. However, because of random
sampling error, this guess is likely to be wrong. Instead, the best guess is for the
average earnings to lie between $416.36 and $452.62. Committing to such an
interval repeatedly implies that the resulting statement is incorrect 1 out of 100
times. For a 90% confidence interval, the only change in the calculation of the
confidence interval is to replace 2.57 by 1.64. Hence the confidence interval is
smaller. A smaller interval implies, given the same average earnings and the
standard deviation, that the statement will be false more often. The larger the
confidence interval, the more likely it is to contain the population value.
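The 99% and 90% intervals discussed above differ only in the critical value. A short sketch of the calculation; the function name `mean_confidence_interval` is illustrative, and the critical values 2.57 and 1.64 are the ones used in the answer.

```python
import math

def mean_confidence_interval(mean, sd, n, z):
    """Large-sample confidence interval for a population mean."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

ci99 = mean_confidence_interval(434.49, 294.67, 1744, 2.57)
ci90 = mean_confidence_interval(434.49, 294.67, 1744, 1.64)

print(f"99% interval: ({ci99[0]:.2f}, {ci99[1]:.2f})")
print(f"90% interval: ({ci90[0]:.2f}, {ci90[1]:.2f})")
```

The 90% interval is narrower because only the multiplier changes; the standard error is the same in both cases.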
1.
When dividing your sample into people 45 years and older, and younger than 45, the
information shown in the table is found.
Age Category   Average Earnings (\(\bar{Y}\))   Standard Deviation (\(s_Y\))   N
Age ≥ 45       $488.87                        $328.64                      507
Age < 45       $412.20                        $276.63                      1237
Test whether or not the difference in average earnings is statistically significant. Given
your knowledge of age-earning profiles, does this result make sense?
Answer: \(t = \frac{488.87 - 412.20}{\sqrt{\frac{328.64^2}{507} + \frac{276.63^2}{1237}}} = 4.62\),
which is statistically significant at conventional levels whether you use a two-sided or one-sided alternative. Hence the null hypothesis of equal average
earnings in the two groups is rejected. Age-earning profiles typically take on an
inverted U-shape. Maximum earnings occur in the 40s, depending on some
other factors such as years of education, which are not considered here. Hence it
is not clear if the alternative hypothesis should be one-sided or two-sided. In
such a situation, it is best to assume a two-sided alternative hypothesis.
7)
A manufacturer claims that a certain brand of VCR player has an average life expectancy
of 5 years and 6 months with a standard deviation of 1 year and 6 months. Assume that
the life expectancy is normally distributed.
(a)
Selecting one VCR player from this brand at random, calculate the probability of its life
expectancy exceeding 7 years.
Answer: \(\Pr(Y > 7) = \Pr(Z > 1) = 0.1587\).
(b)
The Critical Consumer magazine decides to test fifty VCRs of this brand. The average
life in this sample is 6 years and the sample standard deviation is 2 years. Calculate a
99% confidence interval for the average life.
Answer: \(6 \pm 2.57 \times \frac{2}{\sqrt{50}}\) = 6 ± 0.73 = (5.27, 6.73).
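(c)
How many more VCRs would the magazine have to test in order to halve the width of the
confidence interval?
Answer: \(\frac{1}{2}\left(2.57 \times \frac{2}{\sqrt{50}}\right) = 2.57 \times \frac{1}{2} \times \frac{2}{\sqrt{50}} = 2.57 \times \frac{2}{\sqrt{4 \times 50}}\), or n = 200, i.e., 150 more than the 50 already tested.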
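Because the half-width of the interval is proportional to 1/sqrt(n), halving it requires four times the sample. The check below is a sketch using the numbers from this problem; `half_width` is an illustrative helper name.

```python
import math

def half_width(z, sd, n):
    """Half-width of a large-sample confidence interval for a mean."""
    return z * sd / math.sqrt(n)

z, sd, n_old = 2.57, 2.0, 50
n_new = 4 * n_old             # quadrupling n halves the half-width
additional = n_new - n_old    # VCRs to test beyond the first 50

print(f"half-width at n = {n_old}: {half_width(z, sd, n_old):.3f}")
print(f"half-width at n = {n_new}: {half_width(z, sd, n_new):.3f}")
print("additional VCRs needed:", additional)
```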
8) U.S. News and World Report ranks colleges and universities annually. You randomly
sample 100 of the national universities and liberal arts colleges from the year 2000 issue. The
average cost, which includes tuition, fees, and room and board, is $23,571.49 with a standard
deviation of $7,015.52.
(a) Based on this sample, construct a 95% confidence interval of the average cost of
attending a university/college in the United States.
Answer: \(23{,}571.49 \pm 1.96 \times \frac{7{,}015.52}{\sqrt{100}}\) = 23,571.49 ± 1,375.04 = (22,196.45, 24,946.53).
(b) Cost varies by quite a bit. One of the reasons may be that some universities/colleges have
a better reputation than others. U.S. News and World Report tries to measure this factor by
asking university presidents and chief academic officers about the reputation of institutions.
The ranking is from 1 (marginal) to 5 (distinguished). You decide to split the sample
according to whether the academic institution has a reputation of greater than 3.5 or not. For
comparison, in 2000, Caltech had a reputation ranking of 4.7, Smith College had 4.5, and
Auburn University had 3.1. This gives you the statistics shown in the accompanying table.
Reputation Category   Average Cost (\(\bar{Y}\))   Standard Deviation of Cost (\(s_Y\))   N
Ranking > 3.5         $29,311.31                 $5,649.21                            29
Ranking ≤ 3.5         $21,227.06                 $6,133.38                            71
Test the hypothesis that the average cost for all universities/colleges is the same
independent of the reputation. What alternative hypothesis did you use?
Answer: \(t = \frac{29{,}311.31 - 21{,}227.06}{\sqrt{\frac{5{,}649.21^2}{29} + \frac{6{,}133.38^2}{71}}} = 6.33\),
which is statistically significant whether or not you use a one-sided or two-sided
hypothesis test. Your prior expectation is that academic institutions with a
higher reputation will charge more for attending, and hence a one-sided
alternative would have been appropriate here.
(c) What other factors should you consider before making a decision based on the data in
(b)?
Answer: There may be other variables which potentially have an effect on the cost of
attending the academic institution. Some of these factors might be whether or
not the college/university is private or public, its size, whether or not it has a
religious affiliation, etc. It is only after controlling for these factors that the
pure relationship between reputation and cost can be identified.
12)
The development office and the registrar have provided you with anonymous matches of
starting salaries and GPAs for 108 graduating economics majors. Your sample contains a
variety of jobs, from church pastor to stockbroker.
(a) The average starting salary for the 108 students was $38,644.86 with a standard deviation
of $7,541.40. Construct a 95% confidence interval for the starting salary of all economics
majors at your university/college.
Answer: \(38{,}644.86 \pm 1.96 \times \frac{7{,}541.40}{\sqrt{108}}\) = 38,644.86 ± 1,422.32 = (37,222.54, 40,067.18).
(b) A similar sample for psychology majors indicates a significantly lower starting salary.
Given that these students had the same number of years of education, does this indicate
discrimination in the job market against psychology majors?
Answer: It suggests that the market values certain qualifications more highly than others.
Comparing means and identifying that one is significantly lower than others
does not indicate discrimination.
(c) You wonder if it pays (no pun intended) to get good grades by calculating the average
salary for economics majors who graduated with a cumulative GPA of B+ or better, and those
who had a B or worse. The data is as shown in the accompanying table:
Cumulative GPA   Average Earnings (\(\bar{Y}\))   Standard Deviation (\(s_Y\))   n
B+ or better     $39,915.25                     $8,330.21                    59
B or worse       $37,083.33                     $6,174.86                    49
Conduct a t-test for the hypothesis that the two starting salaries are the same in the
population. Given that this data was collected in 1999, do you think that your results will
hold for other years, such as 2002?
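Answer: \(t = \frac{39{,}915.25 - 37{,}083.33}{\sqrt{\frac{8{,}330.21^2}{59} + \frac{6{,}174.86^2}{49}}} = 2.03\).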
The critical value for a one-sided test is 1.64, for a two-sided test 1.96, both at
the 5% level. Hence you can reject the null hypothesis that the two starting
salaries are equal. Presumably you would have chosen as an alternative that
better students receive better starting salaries, so that this becomes your new
working hypothesis. 1999 was a boom year. If better students receive better
starting offers during a boom year, when the labor market for graduates is tight,
then it is very likely that they receive a better offer during a recession year,
assuming that they receive an offer at all.
13)
During the last few days before a presidential election, there is a frenzy of voting
intention surveys. On a given day, quite often there are conflicting results from three
major polls.
(a) Think of each of these polls as reporting the fraction of successes (1s) of a Bernoulli
random variable Y, where the probability of success is \(\Pr(Y = 1) = p\). Let \(\hat{p}\) be the fraction of
successes in the sample and assume that this estimator is normally distributed with a mean of
p and a variance of \(\frac{p(1-p)}{n}\). Why are the results for all polls different, even though they are
taken on the same day?
Answer: Since all polls are only samples, there is random sampling error. As a result, \(\hat{p}\)
will differ from sample to sample, and most likely also from p.
(b) Given the estimator of the variance of \(\hat{p}\), \(\frac{\hat{p}(1-\hat{p})}{n}\), construct a 95% confidence interval
for p. For which value of \(\hat{p}\) is the standard deviation the largest? What value does it take in
the case of a maximum \(\hat{p}\)?
Answer: \(\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\). A bit of thought or calculus will show that the standard
deviation will be largest for \(\hat{p} = 0.5\), in which case it becomes \(\frac{0.5}{\sqrt{n}}\).
(c)
When the results from the polls are reported, you are told, typically in the small print, that
the margin of error is plus or minus two percentage points. Using the approximation of
\(1.96 \approx 2\), and assuming, conservatively, the maximum standard deviation derived in
(b), what sample size is required to add and subtract (margin of error) two percentage
points from the point estimate?
Answer: n = 2,500.
(d)
What sample size would you need to halve the margin of error?
Answer: n = 10,000.
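With the conservative standard deviation of 0.5/sqrt(n) and a multiplier of 2, the margin of error is 1/sqrt(n), so the required sample size is 1/margin². The sketch below assumes the helper name `required_sample_size` (illustrative only) and rounds to the nearest whole respondent.

```python
import math

def required_sample_size(margin, z=2.0, p=0.5):
    """Sample size so that z * sqrt(p(1-p)/n) equals the desired margin of error."""
    return round((z * math.sqrt(p * (1 - p)) / margin) ** 2)

for margin in (0.10, 0.05, 0.02, 0.01):
    print(f"margin of error {margin:.2f}: n = {required_sample_size(margin):,}")
```

Moving from a margin of 0.02 to 0.01 raises n from 2,500 to 10,000: halving the margin of error quadruples the sample size.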
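b.
Answer:
\[
\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})
= \frac{1}{n-1}\sum_{i=1}^{n}(X_iY_i - \bar{X}Y_i - \bar{Y}X_i + \bar{Y}\bar{X})
\]
\[
= \frac{1}{n-1}\left(\sum_{i=1}^{n}X_iY_i - \bar{X}\sum_{i=1}^{n}Y_i - \bar{Y}\sum_{i=1}^{n}X_i + n\bar{Y}\bar{X}\right)
= \frac{1}{n-1}\left(\sum_{i=1}^{n}X_iY_i - n\bar{X}\bar{Y} - n\bar{Y}\bar{X} + n\bar{Y}\bar{X}\right)
\]
\[
= \frac{1}{n-1}\sum_{i=1}^{n}X_iY_i - \frac{n}{n-1}\bar{X}\bar{Y}.
\]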
For each of the accompanying scatterplots for several pairs of variables, indicate whether
you expect a positive or negative correlation coefficient between the two variables, and
the likely magnitude of it (you can use a small range).
[Figures: scatterplots (a) through (d)]
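\[
r = \frac{\frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})(X_i - \bar{X})}{\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2}\,\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2}}
= \frac{n\sum_{i=1}^{n}Y_iX_i - \left(\sum_{i=1}^{n}Y_i\right)\left(\sum_{i=1}^{n}X_i\right)}{\sqrt{n\sum_{i=1}^{n}Y_i^2 - \left(\sum_{i=1}^{n}Y_i\right)^2}\,\sqrt{n\sum_{i=1}^{n}X_i^2 - \left(\sum_{i=1}^{n}X_i\right)^2}}
\]
\[
\frac{\frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})(X_i - \bar{X})}{\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2}\,\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2}}
= \frac{\frac{1}{n-1}\sum_{i=1}^{n}(Y_iX_i - \bar{Y}X_i - \bar{X}Y_i + \bar{Y}\bar{X})}{\frac{1}{n-1}\sqrt{\sum_{i=1}^{n}(Y_i^2 - 2\bar{Y}Y_i + \bar{Y}^2)}\,\sqrt{\sum_{i=1}^{n}(X_i^2 - 2\bar{X}X_i + \bar{X}^2)}}
\]
\[
= \frac{\sum_{i=1}^{n}Y_iX_i - n\bar{Y}\bar{X}}{\sqrt{\sum_{i=1}^{n}Y_i^2 - n\bar{Y}^2}\,\sqrt{\sum_{i=1}^{n}X_i^2 - n\bar{X}^2}}
= \frac{n\sum_{i=1}^{n}Y_iX_i - n\bar{Y}\,n\bar{X}}{\sqrt{n\sum_{i=1}^{n}Y_i^2 - (n\bar{Y})^2}\,\sqrt{n\sum_{i=1}^{n}X_i^2 - (n\bar{X})^2}}
\]
\[
= \frac{n\sum_{i=1}^{n}Y_iX_i - \left(\sum_{i=1}^{n}Y_i\right)\left(\sum_{i=1}^{n}X_i\right)}{\sqrt{n\sum_{i=1}^{n}Y_i^2 - \left(\sum_{i=1}^{n}Y_i\right)^2}\,\sqrt{n\sum_{i=1}^{n}X_i^2 - \left(\sum_{i=1}^{n}X_i\right)^2}}.
\]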
4) IQs of individuals are normally distributed with a mean of 100 and a standard deviation
of 16. If you sampled students at your college and assumed, as the null hypothesis, that they
had the same IQ as the population, then in a random sample of size
(a)
(b)
(c)
5)
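\[
\operatorname{var}(\tilde{Y}) = \frac{1}{n^2}\left[\frac{1}{16}E(Y_1 - \mu_Y)^2 + \frac{49}{16}E(Y_2 - \mu_Y)^2 + \dots + \frac{1}{16}E(Y_{n-1} - \mu_Y)^2 + \frac{49}{16}E(Y_n - \mu_Y)^2\right]
\]
\[
= \frac{1}{n^2}\left[\frac{1}{16}\sigma_Y^2 + \frac{49}{16}\sigma_Y^2 + \dots + \frac{1}{16}\sigma_Y^2 + \frac{49}{16}\sigma_Y^2\right]
= \frac{\sigma_Y^2}{n^2}\left[\frac{n}{2}\left(\frac{1}{16} + \frac{49}{16}\right)\right]
= 1.5625\,\frac{\sigma_Y^2}{n}.
\]
Since \(\operatorname{var}(\tilde{Y}) \to 0\) as \(n \to \infty\), \(\tilde{Y}\) is consistent. \(\tilde{Y}\) has a larger variance than \(\bar{Y}\)
and is therefore not as efficient.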
5)
Imagine that you had sampled 1,000,000 females and 1,000,000 males to test whether or
not females have a higher IQ than males. IQs are normally distributed with a mean of 100
and a standard deviation of 16. You are excited to find that females have an average IQ of
101 in your sample, while males have an IQ of 99. Does this difference seem important?
Do you really need to carry out a t-test for differences in means to determine whether or
not this difference is statistically significant? What does this result tell you about testing
hypotheses when sample sizes are very large?
Answer: The difference seems very small, both in terms of absolute values and, more
importantly, in terms of standard deviations. With a sample size as large as
n=1,000,000, the standard error becomes extremely small. This implies that the
distribution of means, or differences in means, has almost turned into a spike. In
essence, you are (very close to) observing the population. It is therefore
unnecessary to test whether or not the difference is statistically significant. After
all, if in the population, the male IQ were 99.99 and the female IQ were 100.01,
they would be different. In general, when sample sizes become very large, it is
very easy to reject null hypotheses about population means, which involve
sample means as an estimator, even if hypothesized differences are very small.
This is the result of the distribution of sample means collapsing fairly rapidly as
sample sizes increase.
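To see how small the standard error becomes with samples this large, the t-statistic can still be worked out. The sketch below assumes a standard deviation of 16 in both groups, as stated in the problem; the variable names are illustrative.

```python
import math

sd, n = 16, 1_000_000
mean_female, mean_male = 101, 99

# Standard error of the difference between two independent sample means.
se_diff = math.sqrt(sd**2 / n + sd**2 / n)
t = (mean_female - mean_male) / se_diff

print(f"SE of the difference = {se_diff:.4f}")
print(f"t-statistic = {t:.2f}")
```

A two-point gap is only one eighth of a standard deviation, yet the t-statistic is far beyond any conventional critical value, which is the distinction between statistical significance and practical importance that the answer draws.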
7)
Let Y be a Bernoulli random variable with success probability \(\Pr(Y = 1) = p\), and let
\(Y_1, \dots, Y_n\) be i.i.d. draws from this distribution. Let \(\hat{p}\) be the fraction of successes (1s) in
this sample. In large samples, the distribution of \(\hat{p}\) will be approximately normal, i.e. \(\hat{p}\)
is approximately distributed \(N\left(p, \frac{p(1-p)}{n}\right)\). Now let X be the number of successes and n
the sample size. In a sample of 10 voters (n=10), if there are six who vote for candidate
A, then X = 6. Relate X, the number of successes, to \(\hat{p}\), the success proportion, or fraction
of successes. Next, using your knowledge of linear transformations, derive the
distribution of X.
Answer: \(X = n\hat{p}\). Hence if \(\hat{p}\) is distributed \(N\left(p, \frac{p(1-p)}{n}\right)\), then, given that X is a
linear transformation of \(\hat{p}\), X is distributed \(N(np, np(1-p))\).
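The two moments of X follow from the standard rules for a linear transformation, E(aW) = aE(W) and var(aW) = a² var(W). Written out as a worked step, with a = n:

```latex
\begin{aligned}
E(X) &= E(n\hat{p}) = n\,E(\hat{p}) = np, \\
\operatorname{var}(X) &= \operatorname{var}(n\hat{p}) = n^2 \operatorname{var}(\hat{p})
  = n^2 \cdot \frac{p(1-p)}{n} = np(1-p).
\end{aligned}
```

For the example with n = 10 and, say, p = 0.5, this gives a mean of 5 and a variance of 2.5; the value p = 0.5 is chosen here only for illustration.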
8) When you perform hypothesis tests, you are faced with four possible outcomes described
in the accompanying table.
                           Truth (Population)
Decision based on sample   \(H_0\) is true   \(H_1\) is true
Reject \(H_0\)               I               (correct)
Do not reject \(H_0\)        (correct)       II
The cells marked "(correct)" indicate a correct decision, and I and II indicate that an error has been made. In
probability terms, state the mistakes that have been made in situation I and II, and relate
these to the Size of the test and the Power of the test (or transformations of these).
Answer: I: Pr(reject \(H_0\) | \(H_0\) is correct) = Size of the test.
II: Pr(reject \(H_1\) | \(H_1\) is correct) = (1-Power of the test).
9)
Assume that under the null hypothesis, Y has an expected value of 500 and a standard
deviation of 20. Under the alternative hypothesis, the expected value is 550. Sketch the
probability density function for the null and the alternative hypothesis in the same figure.
Pick a critical value such that the p-value is approximately 5%. Mark the areas, which
show the size and the power of the test. What happens to the power of the test if the
alternative hypothesis moves closer to the null hypothesis, i.e., \(\mu_Y\) = 540, 530, 520, etc.?
Answer: For a given size of the test, the power of the test is lower.
11)
\(\sum_{i=1}^{n} Y_i = 1.594\); \(\sum_{i=1}^{n} X_i = 449.6\); \(\sum_{i=1}^{n} Y_iX_i = 6.4697\); \(\sum_{i=1}^{n} Y_i^2 = 0.03982\); \(\sum_{i=1}^{n} X_i^2 = 3{,}022.76\)
Answer: r = 0.716.
12)
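\[
MSE(\hat{\mu}) = E(\hat{\mu} - E(\hat{\mu}) + E(\hat{\mu}) - \mu)^2 = E[(\hat{\mu} - E(\hat{\mu})) + (E(\hat{\mu}) - \mu)]^2
\]
\[
= E[(\hat{\mu} - E(\hat{\mu}))^2 + (E(\hat{\mu}) - \mu)^2 + 2(\hat{\mu} - E(\hat{\mu}))(E(\hat{\mu}) - \mu)]
\]
Next, moving through the expectation operator results in
\[
E[\hat{\mu} - E(\hat{\mu})]^2 + E[E(\hat{\mu}) - \mu]^2 + 2E[(\hat{\mu} - E(\hat{\mu}))(E(\hat{\mu}) - \mu)].
\]
The first term is the variance, and the second term is the squared bias,
since \(E[E(\hat{\mu}) - \mu]^2 = [E(\hat{\mu}) - \mu]^2\). This proves \(MSE(\hat{\mu}) = bias^2 + \operatorname{var}(\hat{\mu})\)
if the last term equals zero. But
\[
E[(\hat{\mu} - E(\hat{\mu}))(E(\hat{\mu}) - \mu)] = E[\hat{\mu}E(\hat{\mu}) - \mu\hat{\mu} - (E(\hat{\mu}))^2 + \mu E(\hat{\mu})]
\]
\[
= E(\hat{\mu})E(\hat{\mu}) - \mu E(\hat{\mu}) - (E(\hat{\mu}))^2 + \mu E(\hat{\mu}) = 0.
\]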
13)
Your textbook states that when you test for differences in means and you assume
that the two population variances are equal, then an estimator of the population
variance is the following pooled estimator:
\[
s^2_{pooled} = \frac{1}{n_m + n_w - 2}\left[\sum_{i=1}^{n_m}(Y_i - \bar{Y}_m)^2 + \sum_{i=1}^{n_w}(Y_i - \bar{Y}_w)^2\right]
\]
Explain why this pooled estimator can be looked at as the weighted average of the
two variances.
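Answer:
\[
s^2_{pooled} = \frac{1}{n_m + n_w - 2}\left[\sum_{i=1}^{n_m}(Y_i - \bar{Y}_m)^2 + \sum_{i=1}^{n_w}(Y_i - \bar{Y}_w)^2\right]
= \frac{1}{n_m + n_w - 2}\left[(n_m - 1)s_m^2 + (n_w - 1)s_w^2\right]
\]
\[
= \frac{n_m - 1}{n_m + n_w - 2}\,s_m^2 + \frac{n_w - 1}{n_m + n_w - 2}\,s_w^2.
\]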
[ Y Y2 ] = Y .
4
2
\(\frac{p(1-p)}{n}\)). Using the estimator of the variance of \(\hat{p}\),
\(\frac{\hat{p}(1-\hat{p})}{n}\), construct a 95% confidence interval for p. Show that the margin for
sampling error simplifies to \(1/\sqrt{n}\) if you used 2 instead of 1.96 assuming,
conservatively, that the standard error is at its maximum. Construct a table indicating
the sample size needed to generate a margin of sampling error of 1%, 2%, 5% and
10%. What do you notice about the increase in sample size needed to halve the
margin of error? (The margin of sampling error is \(1.96 \times SE(\hat{p})\).)
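Answer: \(\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\). \(\hat{p}(1-\hat{p})\) is
at a maximum for \(\hat{p} = 0.5\), in which case the confidence interval reduces
to \(\hat{p} \pm 1.96\sqrt{\frac{0.25}{n}} \approx \hat{p} \pm \frac{1}{\sqrt{n}}\), and the margin of sampling error is \(\frac{1}{\sqrt{n}}\).
\(1/\sqrt{n}\)   n
0.01            10,000
0.02            2,500
0.05            400
0.10            100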
To halve the margin of error, the sample size has to increase fourfold.
16)
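Answer: \(\Pr\left(-1.96 < \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}} < 1.96\right) = 0.95\). Multiplying through by the
standard deviation results in
\(\Pr\left(-1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < \hat{p} - p < 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right) = 0.95\). Subtraction of
\(\hat{p}\) then yields, after multiplying both sides by (-1),
\(\Pr\left(\hat{p} - 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right) = 0.95\). The 95%
confidence interval for p then is \(\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\).
17)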
18) The accompanying table lists the height (STUDHGHT) in inches and weight
(WEIGHT) in pounds of five college students. Calculate the correlation coefficient.
STUDHGHT   WEIGHT
74         165
73         165
72         145
68         155
66         140
Answer: r = 0.72.
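The correlation coefficient for these five students can be checked directly from its definition, the sample covariance divided by the product of the sample standard deviations. A minimal sketch in plain Python; the common factor 1/(n-1) cancels, so it is omitted.

```python
import math

height = [74, 73, 72, 68, 66]        # STUDHGHT, inches
weight = [165, 165, 145, 155, 140]   # WEIGHT, pounds

n = len(height)
mean_h = sum(height) / n
mean_w = sum(weight) / n

# Sums of cross-products and squared deviations from the means.
s_hw = sum((h - mean_h) * (w - mean_w) for h, w in zip(height, weight))
s_hh = sum((h - mean_h) ** 2 for h in height)
s_ww = sum((w - mean_w) ** 2 for w in weight)

r = s_hw / math.sqrt(s_hh * s_ww)
print(f"r = {r:.2f}")
```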
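Answer: \(\frac{\partial}{\partial p}\sqrt{\frac{p(1-p)}{n}} = \frac{1}{2}\left(\frac{p(1-p)}{n}\right)^{-1/2}\frac{1 - p - p}{n} = 0\). Hence \(1 - 2p = 0\) or \(p = \frac{1}{2}\).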
20) Consider two estimators: one which is biased and has a smaller variance, the other
which is unbiased and has a larger variance. Sketch the sampling distributions and
the location of the population parameter for this situation. Discuss conditions under
which you may prefer to use the first estimator over the second one.
Answer: The bias indicates how far away, on average, the estimator is from the
population value. Although this average is zero for an unbiased
estimator, there may be quite some variation around the population
mean. In a single draw, there is therefore a high probability of being
some distance away from the population mean. On the other hand, if the
variance is very small and the estimator is biased by a small amount,
then the probability of being closer to the population value may be
higher. (The biased estimator may have a smaller mean square error than
the unbiased estimator.)
Chapter 4
1)
You have obtained measurements of height in inches of 29 female and 81 male
students (Studenth) at your university. A regression of the height on a constant and a
binary variable (BFemme), which takes a value of one for females and is zero otherwise,
yields the following result:
(b)
Test the hypothesis that females, on average, are shorter than males, at the 1%
level.
Answer: The t-statistic for the difference in means is -8.49. For a one-sided test,
the critical value is 2.33. Hence the difference is statistically
significant.
(c)
Answer: It is safer to assume that the variances for males and females are
different. In the underlying sample the standard deviation for females
was smaller.
2)
You have obtained a sub-sample of 1744 individuals from the Current Population
Survey (CPS) and are interested in the relationship between weekly earnings and
age. The regression, using heteroskedasticity-robust standard errors, yielded the
following result:
(a)
(b)
Is the relationship between Age and Earn statistically significant? Is the effect of
age on earnings large?
Answer: The t-statistic on the slope is 9.12, which is above the critical value from
the standard normal distribution for any reasonable level of significance.
Assuming that people worked 52 weeks a year, the effect of being one
year older translates into an additional $270.40 a year. This does not
seem particularly large in 2002 dollars, but may have been earlier.
(c)
Why should age matter in the determination of earnings? Do the results suggest
that there is a guarantee for earnings to rise for everyone as they become older?
Do you think that the relationship between age and earnings is linear?
Answer: In general, age-earnings profiles take on an inverted U-shape. Hence it
is not linear and the linear approximation may not be good at all. Age
may be a proxy for experience, which in itself can approximate on
the job training. Hence the positive effect between age and earnings.
The results do not suggest that there is a guarantee for earnings to rise
for everyone as they become older since the regression R2 does not equal
1. Instead the result holds on average.
(d)
The variance of the error term and the variance of the dependent variable are
related. Given the distribution of earnings, do you think it is plausible that the
distribution of errors is normal?
(Requires Appendix Material) The average age in this sample is 37.5 years. What
is annual income in the sample?
Answer: Since \(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}\), it follows that \(\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1\bar{X}\). Substituting the estimates for the
slope and the intercept then results in average weekly earnings of
$434.16 or annual average earnings of $22,576.32.
3)
The baseball team nearest to your home town is, once again, not doing well.
Given that your knowledge of what it takes to win in baseball is vastly superior to
that of management, you want to find out what it takes to win in Major League
Baseball (MLB). You therefore collect the winning percentage of all 30 baseball
teams in MLB for 1999 and regress the winning percentage on what you consider
the primary determinant for wins, which is quality pitching (team earned run
average). You find the following information on team performance:
Summary of the Distribution of Winning Percentage and Team Earned Run
Average for MLB in 1999
                                                  Percentile
                     Average   Standard deviation   10%    25%    40%    50% (median)   60%    75%    90%
Team ERA             4.71      0.53                 3.84   4.35   4.72   4.78           4.91   5.06   5.25
Winning Percentage   0.50      0.08                 0.40   0.43   0.46   0.48           0.49   0.59   0.60
(a)
What is your expected sign for the regression slope? Will it make sense to
interpret the intercept? If not, should you omit it from your regression and force
the regression line through the origin?
Answer: You expect a negative relationship, since a higher team ERA implies a
lower quality of the input. No team comes close to a zero team ERA,
and therefore it does not make sense to interpret the intercept. Forcing
the regression through the origin is a false implication from this insight.
Instead the intercept fixes the level of the regression.
(b)
The authors of your textbook have informed you that unless you have more than
100 observations, it may not be plausible to assume that the distribution of your
OLS estimators is normal. What are the implications here for testing the
significance of your theory?
Answer: Since there are only 30 observations, the distribution of the t-statistic is
unknown. You should therefore not conduct statistical inference.
(c)
OLS estimation of the relationship between the winning percentage and the team
ERA yields the following:
(d)
(e)
What would be the effect on the slope, the intercept, and the regression \(R^2\) if you
measured Winpct in percentage points, i.e. as (Wins/Games) x 100?
Answer: The regression \(R^2\) will not be affected by a change in scale;
otherwise a descriptive measure of the quality of the regression would
depend on whim. The slope of the regression will compensate in such a
way that the interpretation of the result is unaffected, i.e. it will become
-10 in the above example. The intercept will also change to reflect the
fact that if X were 0, then the dependent variable would now be
measured in percentage points, i.e., it will become 94.0 in the above example.
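The effect of rescaling the dependent variable can be checked numerically. The following is a minimal sketch under stated assumptions: the ERA and winning-fraction values are made up for illustration (they are not the 1999 MLB sample), and `ols` is a hypothetical helper written for this example.

```python
def ols(x, y):
    """Simple OLS of y on x: returns (intercept, slope, r_squared)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope, slope ** 2 * sxx / syy

# Hypothetical team ERA and winning fractions (assumption, not the real data).
era = [3.8, 4.3, 4.7, 4.8, 4.9, 5.1, 5.3]
winpct = [0.60, 0.55, 0.49, 0.50, 0.47, 0.45, 0.41]

# Regress the fraction, then the same variable measured in percentage points.
b0, b1, r2 = ols(era, winpct)
b0_pct, b1_pct, r2_pct = ols(era, [100 * w for w in winpct])

print(f"fraction scale:   intercept={b0:.3f}, slope={b1:.3f}, R2={r2:.3f}")
print(f"percentage scale: intercept={b0_pct:.1f}, slope={b1_pct:.1f}, R2={r2_pct:.3f}")
```

In this sketch the slope and intercept are both multiplied by 100 while the \(R^2\) is unchanged, which is the pattern the answer describes.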
(f)
Are you impressed with the size of the regression R2? Given that there is 51% of
unexplained variation in the winning percentage, what might some of these
factors be?
Answer: It is impressive that a single variable can explain roughly half of the
variation in winning percentage. Answers to the second question will
vary by student, but will typically include the quality of hitting,
fielding, and management. Salaries could be included, but should be
reflected in the inputs.
4)
You have learned in one of your economics courses that one of the determinants
of per capita income (the Wealth of Nations) is the population growth rate.
Furthermore you also found out that the Penn World Tables contain income and
population data for 104 countries of the world. To test this theory, you regress the
GDP per worker (relative to the United States) in 1990 (RelPersInc) on the
difference between the average population growth rate of that country (n) and the
U.S. average population growth rate (\(n_{us}\)) for the years 1980 to 1990. This results
in the following regression output:
(a)
(b)
What would happen to the slope, intercept, and regression \(R^2\) if you ran another
regression where the above explanatory variable was replaced by n only, i.e., the
average population growth rate of the country? (The population growth rate of the
United States from 1980 to 1990 was 0.009.) Should this have any effect on the
t-statistic of the slope?
Answer: The interpretation of the partial derivative is unaffected, in that the slope
still indicates the effect of a one percentage point increase in the
population growth rate. The regression \(R^2\) and t-statistic will remain the
same since only a constant was removed from the explanatory variable.
The intercept will change as a result of the change in \( \bar X \).
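A short numerical check of this argument follows. It is a sketch under assumptions: the growth rates and relative incomes are invented (not the Penn World Tables data), `ols_t` is a hypothetical helper, and the t-statistic uses the homoskedasticity-only standard error purely to keep the code short.

```python
import math

def ols_t(x, y):
    """Return (intercept, slope, r_squared, t-statistic of the slope).

    The standard error is the homoskedasticity-only one (a simplification).
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    se_slope = math.sqrt(ssr / (n - 2) / sxx)
    return intercept, slope, 1 - ssr / tss, slope / se_slope

N_US = 0.009  # U.S. population growth rate 1980-1990, as given in the question

# Hypothetical population growth rates and relative incomes (assumption).
n_growth = [0.005, 0.008, 0.012, 0.018, 0.020, 0.025, 0.030]
rel_inc = [0.85, 0.90, 0.60, 0.40, 0.25, 0.15, 0.10]

# Regressor measured as n - n_us, then as n only.
a_diff, b_diff, r2_diff, t_diff = ols_t([g - N_US for g in n_growth], rel_inc)
a_n, b_n, r2_n, t_n = ols_t(n_growth, rel_inc)

print(f"n - n_us: intercept={a_diff:.4f}, slope={b_diff:.3f}, R2={r2_diff:.4f}, t={t_diff:.3f}")
print(f"n only:   intercept={a_n:.4f}, slope={b_n:.3f}, R2={r2_n:.4f}, t={t_n:.3f}")
```

Only the intercept differs between the two runs: it shifts by the slope times 0.009.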
(c)
Is there any reason to believe that the variance of the error terms is
homoskedastic?
Answer: There are vast differences in the size of these countries, both in terms of
the population and GDP. Furthermore, the countries are at different
31 of the 104 countries have a dependent variable of less than 0.10. Does it
therefore make sense to interpret the intercept?
Answer: To interpret the intercept, you must observe values of X close to zero,
not Y.
5)
The neoclassical growth model predicts that for identical savings rates and
population growth rates, countries should converge to the same per capita income level.
This is referred to as the convergence hypothesis. One way to test for the presence
of convergence is to compare the growth rates over time to the initial starting
level.
(a)
If you regressed the average growth rate over a time period (1960-1990) on the
initial level of per capita income, what would the sign of the slope have to be to
indicate this type of convergence? Explain. Would this result confirm or reject the
prediction of the neoclassical growth model?
Answer: You would require a negative sign. Countries that are far ahead of others
at the beginning of the period would have to grow relatively slower for
the others to catch up. This represents unconditional convergence,
whereas the neoclassical growth model predicts conditional
convergence, i.e., there will only be convergence if countries have
identical savings, population growth rates, and production technology.
(b)
Using the OLS estimator with homoskedasticity-only standard errors, the results
changed as follows:
g6090 = 0.019 0.0006 RelProd60, R2 = 0.00007, SER = 0.016
(0.002) (0.0068)
Why didn't the estimated coefficients change? Given that the standard error of the
slope is now smaller, can you reject the null hypothesis of no beta convergence?
Are the results in (c) more reliable than the results in (b)? Explain.
Answer: Using homoskedasticity-only standard errors has no effect on the OLS
estimator. The t- statistic remains small and is certainly below the
critical value. The results are less reliable since there is no reason to
believe that the error variance is homoskedastic.
(d)
You decide to restrict yourself to the 24 OECD countries in the sample. This
changes your regression output as follows:
g6090 = 0.048 - 0.0404 RelProd60, R2 = 0.82, SER = 0.0046
(0.004) (0.0063)
How does this result affect your conclusions from above? When you test for
convergence, should you worry about the relatively small sample size?
Answer: Judging by the size of the slope coefficient, there is strong evidence of
unconditional convergence for the OECD countries. The regression R2
is quite high, given that there is only a single explanatory variable in
the regression. However, since we do not know the sampling
distribution of the estimator in this case, we cannot conduct inference.
6) In 2001, the Arizona Diamondbacks defeated the New York Yankees in the
Baseball World Series in 7 games. Some players, such as Bautista and Finley for the
Diamondbacks, had a substantially higher batting average during the World Series
than during the regular season. Others, such as Brosius and Jeter for the Yankees,
performed substantially worse. You set out to investigate whether or not the regular season
batting average is a good indicator for the World Series batting average. The results
for 11 players who had the most at bats for the two teams are:
(b)
What can you say about the explanatory power of your equation? What do you
conclude from this?
Answer: Both regressions have little explanatory power as seen from the
regression R2. Hence performance during the season is a poor forecast of
World Series performance.
(c)
Calculate the t-statistics for the various regression coefficients. Are any of these
significant at the 5% level? When using statistical inference in this case, should
you be concerned about the number of observations?
Answer: The respective t-statistics are 0.575, 1.062, 0.361, and 0.101. None of
these are statistically significant at the 5% level. However, given that
there are only 11 observations, you should not conduct inference, since
the sampling distribution is unknown.
Mathematical Questions
1)
Prove that the regression R2 is identical to the square of the correlation coefficient
between two variables Y and X. Regression functions are written in a form that
suggests causation running from X to Y. Given your proof, does a high regression
\(R^2\) present supportive evidence of a causal relationship? Can you think of some
regression examples where the direction of causality is not clear? Where it is
beyond doubt?
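Answer: The regression \( R^2 = \frac{ESS}{TSS} \), where ESS is given by \( \sum_{i=1}^{n} (\hat Y_i - \bar Y)^2 \). But
\( \hat Y_i = \hat\beta_0 + \hat\beta_1 X_i \) and \( \bar Y = \hat\beta_0 + \hat\beta_1 \bar X \). Hence \( (\hat Y_i - \bar Y)^2 = \hat\beta_1^2 (X_i - \bar X)^2 \),
and therefore, with small letters denoting deviations from means,
\[ R^2 = \frac{\hat\beta_1^2 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2} . \]
The square of the correlation coefficient is
\[ r^2 = \frac{\left( \sum_{i=1}^{n} y_i x_i \right)^2}{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i^2}
       = \frac{\left( \sum_{i=1}^{n} y_i x_i \right)^2 \sum_{i=1}^{n} x_i^2}{\left( \sum_{i=1}^{n} x_i^2 \right)^2 \sum_{i=1}^{n} y_i^2}
       = \frac{\hat\beta_1^2 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2} . \]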
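The identity between the regression \(R^2\) and the squared correlation can be checked numerically. This is a minimal sketch with made-up data; `r_squared` and `correlation` are hypothetical helper names. It also regresses X on Y, which gives the same \(R^2\) and so illustrates why a high \(R^2\) by itself says nothing about the direction of causality.

```python
import math

def r_squared(x, y):
    """R^2 = ESS / TSS from regressing y on x, computed from fitted values."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    ess = sum((f - y_bar) ** 2 for f in fitted)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return ess / tss

def correlation(x, y):
    """Sample correlation coefficient between x and y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (assumption).
x = [1.0, 2.0, 4.0, 5.0, 7.0, 8.0]
y = [2.3, 2.9, 5.1, 4.8, 7.9, 8.4]

print("R2 of Y on X:", r_squared(x, y))
print("R2 of X on Y:", r_squared(y, x))
print("r squared:   ", correlation(x, y) ** 2)
```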
For the following estimated slope coefficients and their standard errors, find the
t-statistics for the null hypothesis \( H_0: \beta_1 = 0 \). Indicate whether or not you are able
to reject the null hypothesis at the 10%, 5%, and 1% level of a one-sided and
two-sided hypothesis.
(a) \( \hat\beta_1 = 4.2, \; SE(\hat\beta_1) = 2.4 \)
(b) \( \hat\beta_1 = 0.5, \; SE(\hat\beta_1) = 0.37 \)
(c) \( \hat\beta_1 = 0.003, \; SE(\hat\beta_1) = 0.002 \)
(d) \( \hat\beta_1 = 360, \; SE(\hat\beta_1) = 300 \)
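One way to work through these cases is sketched below. It assumes large-sample standard normal critical values and a one-sided alternative in the direction of the estimate; `t_stat` and `rejections` are hypothetical helper names, and `NormalDist` comes from the Python standard library.

```python
from statistics import NormalDist

LEVELS = (0.10, 0.05, 0.01)

def t_stat(estimate, std_error, null_value=0.0):
    """t-statistic for H0: beta_1 = null_value."""
    return (estimate - null_value) / std_error

def rejections(t):
    """Significance levels at which H0 is rejected, using normal p-values."""
    p_one_sided = 1.0 - NormalDist().cdf(abs(t))
    p_two_sided = 2.0 * p_one_sided
    return {
        "one-sided": [a for a in LEVELS if p_one_sided < a],
        "two-sided": [a for a in LEVELS if p_two_sided < a],
    }

# (estimate, standard error) pairs taken from the question.
pairs = [(4.2, 2.4), (0.5, 0.37), (0.003, 0.002), (360.0, 300.0)]
for est, se in pairs:
    t = t_stat(est, se)
    print(f"estimate={est}, SE={se}, t={t:.3f}, rejected at: {rejections(t)}")
```

Under these assumptions, only the first pair (t = 1.75) leads to a rejection in a two-sided test, and only at the 10% level.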
5)
You have analyzed the relationship between the weight and height of individuals.
Although you are quite confident about the accuracy of your measurements, you
feel that some of the observations are extreme, say, two standard deviations above
and below the mean. You therefore decide to disregard these individuals. What
consequence will this have on the standard deviation of the OLS estimator of the
slope?
Answer: Other things being equal, the standard error of the slope coefficient will
decrease the larger the variation in X. Hence you prefer more variation
rather than less. This is easier to see in the case of homoskedasticity-
In order to calculate the regression R2 you need the TSS and either the SSR or the
ESS. The TSS is fairly straightforward to calculate, being just the variation of Y.
However, if you had to calculate the SSR or ESS by hand (or in a spreadsheet),
you would need all fitted values from the regression function and their deviations
from the sample mean, or the residuals. Can you think of a quicker way to
calculate the ESS simply using terms you have already used to calculate the slope
coefficient?
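Answer: The ESS is given by \( \sum_{i=1}^{n} (\hat Y_i - \bar Y)^2 \). But \( \hat Y_i = \hat\beta_0 + \hat\beta_1 X_i \) and
\( \bar Y = \hat\beta_0 + \hat\beta_1 \bar X \). Hence \( (\hat Y_i - \bar Y)^2 = \hat\beta_1^2 (X_i - \bar X)^2 \), and therefore
\[ ESS = \hat\beta_1^2 \sum_{i=1}^{n} (X_i - \bar X)^2 . \]
The right-hand side contains the estimated
slope squared and the denominator of the slope, i.e., all values that have
already been calculated.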
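The shortcut can be verified against the long way of computing the ESS from fitted values. The data below are made up for illustration, and the variable names are chosen for this sketch only.

```python
# Hypothetical data (assumption).
x = [1.0, 2.0, 4.0, 5.0, 7.0, 8.0]
y = [2.3, 2.9, 5.1, 4.8, 7.9, 8.4]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Quantities already needed for the slope.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Long way: fitted values and their deviations from the sample mean of Y.
ess_long = sum((intercept + slope * xi - y_bar) ** 2 for xi in x)

# Shortcut: slope squared times the denominator of the slope.
ess_short = slope ** 2 * sxx

tss = sum((yi - y_bar) ** 2 for yi in y)
print("ESS (fitted values):", ess_long)
print("ESS (shortcut):     ", ess_short)
print("R2:", ess_short / tss)
```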
7)
(Requires Appendix Material) In deriving the OLS estimator, you minimize the
sum of squared residuals with respect to the two parameters \( \hat\beta_0 \) and \( \hat\beta_1 \). The
resulting two equations imply two restrictions that OLS places on the data,
namely that \( \sum_{i=1}^{n} \hat u_i = 0 \) and \( \sum_{i=1}^{n} \hat u_i X_i = 0 \). Show that you get the formulae for
the regression slope and the intercept if you impose these two conditions on the
sample regression function.
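Answer: The sample regression function is \( Y_i = \hat\beta_0 + \hat\beta_1 X_i + \hat u_i \). Summing both
sides results in
\[ \sum_{i=1}^{n} Y_i = n \hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} \hat u_i . \]
Imposing the first restriction, namely that the sum of the residuals is zero, dividing both
sides of the equation by n, and solving for \( \hat\beta_0 \) gives the OLS formula
for the intercept, \( \hat\beta_0 = \bar Y - \hat\beta_1 \bar X \).
For the second restriction, multiply both sides of the sample regression
function by \( X_i \) and then sum both sides to get
\[ \sum_{i=1}^{n} Y_i X_i = \hat\beta_0 \sum_{i=1}^{n} X_i + \hat\beta_1 \sum_{i=1}^{n} X_i^2 + \sum_{i=1}^{n} \hat u_i X_i . \]
Imposing the second restriction and substituting for the intercept gives
\[ \sum_{i=1}^{n} Y_i X_i = (\bar Y - \hat\beta_1 \bar X) n \bar X + \hat\beta_1 \sum_{i=1}^{n} X_i^2 \]
or
\[ \sum_{i=1}^{n} Y_i X_i - n \bar Y \bar X = \hat\beta_1 \left( \sum_{i=1}^{n} X_i^2 - n \bar X^2 \right), \]
which can be solved for the OLS slope.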
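The two restrictions form a system of two linear equations in the intercept and the slope. The sketch below, using made-up data and a hypothetical helper `solve_restrictions`, solves that system directly by Cramer's rule and compares the result with the usual OLS formulae.

```python
def solve_restrictions(x, y):
    """Solve sum(u) = 0 and sum(u * x) = 0 for (b0, b1) by Cramer's rule.

    The system is:  n*b0      + sum(x)*b1   = sum(y)
                    sum(x)*b0 + sum(x^2)*b1 = sum(x*y)
    """
    n = len(x)
    sx, sy = sum(x), sum(y)
    sx2 = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sx2 - sx * sx
    b0 = (sy * sx2 - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1

# Hypothetical data (assumption).
x = [1.0, 2.0, 4.0, 5.0, 7.0, 8.0]
y = [2.3, 2.9, 5.1, 4.8, 7.9, 8.4]

b0, b1 = solve_restrictions(x, y)

# Usual OLS formulae in deviations-from-means form.
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1_ols = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
b0_ols = y_bar - b1_ols * x_bar

# Residuals implied by the solution of the two restrictions.
residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
sum_u = sum(residuals)
sum_ux = sum(u * xi for u, xi in zip(residuals, x))

print("from restrictions:", b0, b1)
print("OLS formulae:     ", b0_ols, b1_ols)
print("sum of residuals:", sum_u, " sum of residuals times X:", sum_ux)
```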
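(Requires Appendix Material) Show that the two alternative formulae for the
slope given in your textbook are identical.
\[ \frac{\frac{1}{n} \sum_{i=1}^{n} X_i Y_i - \bar X \bar Y}{\frac{1}{n} \sum_{i=1}^{n} X_i^2 - \bar X^2}
   = \frac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{n} (X_i - \bar X)^2} \]
In addition, the help function for a commonly used spreadsheet program gives the
following definition for the regression slope it estimates:
\[ \frac{n \sum_{i=1}^{n} X_i Y_i - \left( \sum_{i=1}^{n} X_i \right) \left( \sum_{i=1}^{n} Y_i \right)}{n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2} \]
Prove that this formula is also the same as those given above.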
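Answer: Let's start with the first equality. The numerator of the right-hand side
expression can be written as follows:
\[ \sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)
   = \sum_{i=1}^{n} X_i Y_i - \bar Y \sum_{i=1}^{n} X_i - \bar X \sum_{i=1}^{n} Y_i + n \bar X \bar Y
   = \sum_{i=1}^{n} X_i Y_i - n \bar X \bar Y . \]
(Recall that \( \sum_{i=1}^{n} X_i = n \bar X \).)
Multiplying out the terms in the denominator and moving the summation
sign inside similarly gives
\[ \sum_{i=1}^{n} (X_i - \bar X)^2 = \sum_{i=1}^{n} X_i^2 - n \bar X^2 . \]
Finally,
\[ \frac{n \sum X_i Y_i - (\sum X_i)(\sum Y_i)}{n \sum X_i^2 - (\sum X_i)^2}
   = \frac{n \sum X_i Y_i - n \bar X \, n \bar Y}{n \sum X_i^2 - (n \bar X)^2}
   = \frac{\sum X_i Y_i - n \bar X \bar Y}{\sum X_i^2 - n \bar X^2} . \]
Dividing both numerator and denominator by n then gives you the
desired result.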
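The equivalence of the three slope formulae can also be confirmed numerically. The following sketch uses made-up data; the three function names are hypothetical labels for the formulae in the question.

```python
def slope_means(x, y):
    """((1/n) sum XY - Xbar*Ybar) / ((1/n) sum X^2 - Xbar^2)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    num = sum(xi * yi for xi, yi in zip(x, y)) / n - x_bar * y_bar
    den = sum(xi * xi for xi in x) / n - x_bar ** 2
    return num / den

def slope_deviations(x, y):
    """sum (X - Xbar)(Y - Ybar) / sum (X - Xbar)^2."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    return num / den

def slope_spreadsheet(x, y):
    """(n sum XY - sum X * sum Y) / (n sum X^2 - (sum X)^2)."""
    n = len(x)
    num = n * sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y)
    den = n * sum(xi * xi for xi in x) - sum(x) ** 2
    return num / den

# Hypothetical data (assumption).
x = [1.0, 2.0, 4.0, 5.0, 7.0, 8.0]
y = [2.3, 2.9, 5.1, 4.8, 7.9, 8.4]

print(slope_means(x, y), slope_deviations(x, y), slope_spreadsheet(x, y))
```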
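9)
... mistakes \( \sum_{i=1}^{n} (Y_i - b_0)^2 \) ...
\[ \frac{\partial}{\partial b_0} \sum_{i=1}^{n} (Y_i - b_0)^2
   = \sum_{i=1}^{n} \frac{\partial}{\partial b_0} (Y_i - b_0)^2
   = \sum_{i=1}^{n} 2 (Y_i - b_0)(-1) \]
\[ (-2) \left( \sum_{i=1}^{n} Y_i - n \hat\beta_0 \right) = 0 \quad \Rightarrow \quad \hat\beta_0 = \bar Y . \]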
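10)
... mistakes \( \sum_{i=1}^{n} (Y_i - b_1 X_i)^2 \) ...
\[ \frac{\partial}{\partial b_1} \sum_{i=1}^{n} (Y_i - b_1 X_i)^2
   = \sum_{i=1}^{n} \frac{\partial}{\partial b_1} (Y_i - b_1 X_i)^2
   = \sum_{i=1}^{n} 2 (Y_i - b_1 X_i)(-X_i) \]
\[ (-2) \left( \sum_{i=1}^{n} Y_i X_i - \hat\beta_1 \sum_{i=1}^{n} X_i^2 \right) = 0
   \quad \Rightarrow \quad \hat\beta_1 = \frac{\sum_{i=1}^{n} Y_i X_i}{\sum_{i=1}^{n} X_i^2} \]
11)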
Show first that the regression R2 is the square of the sample correlation
coefficient. Next, show that the slope of a simple regression of Y on X is only
identical to the inverse of the regression slope of X on Y if the regression R2
equals one.
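Answer: The regression \( R^2 = \frac{ESS}{TSS} \), where ESS is given by \( \sum_{i=1}^{n} (\hat Y_i - \bar Y)^2 \). But
\( \hat Y_i = \hat\beta_0 + \hat\beta_1 X_i \) and \( \bar Y = \hat\beta_0 + \hat\beta_1 \bar X \). Hence \( (\hat Y_i - \bar Y)^2 = \hat\beta_1^2 (X_i - \bar X)^2 \),
and therefore, with small letters denoting deviations from means,
\[ R^2 = \frac{\hat\beta_1^2 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2} . \]
The square of the correlation coefficient is
\[ r^2 = \frac{\left( \sum_{i=1}^{n} y_i x_i \right)^2}{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i^2}
       = \frac{\left( \sum_{i=1}^{n} y_i x_i \right)^2 \sum_{i=1}^{n} x_i^2}{\left( \sum_{i=1}^{n} x_i^2 \right)^2 \sum_{i=1}^{n} y_i^2}
       = \frac{\hat\beta_1^2 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2} , \]
so the two are the same.
Now, if \( R^2 = r^2 = 1 \), then
\[ 1 = r^2 = \frac{\hat\beta_1^2 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2}
   \quad \text{and hence} \quad
   \hat\beta_1 \hat\beta_1 = \frac{\sum_{i=1}^{n} y_i^2}{\sum_{i=1}^{n} x_i^2} . \]
But \( \hat\beta_1 = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2} \) and therefore
\[ \hat\beta_1 = \frac{\sum_{i=1}^{n} y_i^2}{\sum_{i=1}^{n} x_i y_i} , \]
which is the inverse of the regression slope of X on Y.
12)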
dimensional diagram with SSR on the vertical axis and the regression slope on the
horizontal axis how you could find the least squares estimator for the slope by
varying its values through trial and error.
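Answer: Taking averages results in \( \bar Y = \hat\beta_0 + \hat\beta_1 \bar X \), and subtracting this equation
from the above one, we get \( y_i = \hat\beta_1 x_i + \hat u_i \).
\( SSR = \sum_{i=1}^{n} \hat u_i^2 = \sum_{i=1}^{n} (y_i - \hat\beta_1 x_i)^2 \) is a quadratic which takes on different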
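The trial-and-error search over slope values can be sketched as a simple grid search. The data, the grid range, and the step size of 0.01 are all assumptions made for this illustration; plotting `ssr(b1)` against `b1` would give the U-shaped diagram the question asks for.

```python
# Hypothetical data (assumption).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Deviations from means, as in the answer above.
x_dev = [xi - x_bar for xi in x]
y_dev = [yi - y_bar for yi in y]

def ssr(b1):
    """Sum of squared residuals for a candidate slope b1."""
    return sum((yd - b1 * xd) ** 2 for xd, yd in zip(x_dev, y_dev))

# Try slopes from -1.00 to 3.00 in steps of 0.01 and keep the best one.
candidates = [k / 100 for k in range(-100, 301)]
best = min(candidates, key=ssr)

# Closed-form OLS slope for comparison.
ols_slope = sum(xd * yd for xd, yd in zip(x_dev, y_dev)) / sum(xd ** 2 for xd in x_dev)

print("best slope on grid:", best, " SSR:", ssr(best))
print("OLS slope:         ", ols_slope)
```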
15)
Given the amount of money and effort that you have spent on your education, you
wonder if it was (is) all worth it. You therefore collect data from the Current
Population Survey (CPS) and estimate a linear relationship between earnings and
the years of education of individuals. What would be the effect on your regression
slope and intercept if you measured earnings in thousands of dollars rather than in
dollars? Would the regression R2 be affected? Should statistical inference be
dependent on the scale of variables? Discuss.
Answer: It should be clear that interpretation of estimated relationships and
statistical inference should not depend on the units of measurement.
Otherwise whim could dictate conclusions. Hence the regression \(R^2\) and
statistical inference cannot be affected. It is easy but tedious to show
this mathematically. Next, the intercept indicates the value of Y when X
\( \frac{\sum_{i=1}^{n} x_i^* y_i^*}{\sum_{i=1}^{n} x_i^{*2}} \), where small
Note that means of standardized variables are zero, and hence we get
\( \hat\beta_1 = \frac{\sum_{i=1}^{n} X_i^* Y_i^*}{\sum_{i=1}^{n} X_i^{*2}} \)
17)
The OLS slope estimator is not defined if there is no variation in the data for the
explanatory variable. You are interested in estimating a regression relating
earnings to years of schooling. Imagine that you had collected data on earnings
for different individuals, but that all these individuals had completed a college
education (16 years of education). Sketch what the data would look like and
explain intuitively why the OLS coefficient does not exist in this situation.
Answer: There is no variation in X in this case, and it is therefore unreasonable to
ask by how much Y would change if X changed by one unit. Regression
analysis cannot figure out the answer to this question, because a change
in X never happens in the sample.
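[Figure: scatterplot with Earnings on the vertical axis and Years of Education on the horizontal axis; all observations are stacked vertically at 16 years of education.]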
Indicate in a scatterplot what the data for your dependent variable and your
explanatory variable would look like in a regression with an R2 equal to zero.
How would this change if the regression R2 was equal to one?
Answer:
19)
Imagine that you had discovered a relationship that would generate a scatterplot
very similar to the relationship \( Y_i = X_i^2 \), and that you would try to fit a linear
regression through your data points. What do you expect the slope coefficient to
be? What do you think the value of your regression R2 is in this situation? What
are the implications from your answers in terms of fitting a linear regression
through a non-linear relationship?
Answer: You would expect the fitted line to be horizontal (slope = 0) and the regression
\(R^2\) to be zero in this situation. The implication is that although there
may be a relationship between two variables, you may not detect it if
you use the wrong functional form.
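A small numerical illustration follows. It assumes that the X values are spread symmetrically around zero, which is what makes the linear fit flat; the data are constructed for this sketch.

```python
# X values symmetric around zero (assumption), with Y exactly equal to X squared.
x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [xi ** 2 for xi in x]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Linear regression of Y on X.
slope = sxy / sxx
intercept = y_bar - slope * x_bar
r2 = slope ** 2 * sxx / syy

print("slope:", slope, " intercept:", intercept, " R2:", r2)
```

The fitted line is horizontal at the sample mean of Y even though Y is an exact function of X.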
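20)
... \( \sum_{i=1}^{n} \hat u_i = 0 \) and \( \sum_{i=1}^{n} \hat u_i X_i = 0 \) ... \( \sum_{i=1}^{n} \hat u_i \hat Y_i = 0 \).
Answer:
\[ \sum_{i=1}^{n} \hat u_i \hat Y_i = \sum_{i=1}^{n} \hat u_i (\hat\beta_0 + \hat\beta_1 X_i)
   = \hat\beta_0 \sum_{i=1}^{n} \hat u_i + \hat\beta_1 \sum_{i=1}^{n} \hat u_i X_i = 0 \]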
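The result can be checked on any data set. The sketch below uses made-up data and computes the three sums from the OLS residuals and fitted values; up to floating-point rounding, all three are zero.

```python
# Hypothetical data (assumption).
x = [1.0, 2.0, 4.0, 5.0, 7.0, 8.0]
y = [2.3, 2.9, 5.1, 4.8, 7.9, 8.4]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

# OLS fitted values and residuals.
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

sum_u = sum(residuals)
sum_ux = sum(u * xi for u, xi in zip(residuals, x))
sum_uyhat = sum(u * fi for u, fi in zip(residuals, fitted))

print("sum of residuals:              ", sum_u)
print("sum of residuals times X:      ", sum_ux)
print("sum of residuals times fitted: ", sum_uyhat)
```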