3.1 Multiple Choice: Introduction To Econometrics, 3e (Stock) Chapter 3 Review of Statistics

Introduction to Econometrics, 3e (Stock)
Chapter 3 Review of Statistics
3.1 Multiple Choice
1) An estimator is
A) an estimate.
B) a formula that gives an efficient guess of the true population value.
C) a random variable.
D) a nonrandom number.
Answer: C
2) An estimate is
A) efficient if it has the smallest variance possible.
B) a nonrandom number.
C) unbiased if its expected value equals the population value.
D) another word for estimator.
Answer: B
3) An estimator \( \hat{\mu}_Y \) of the population value \( \mu_Y \) is unbiased if
A) \( \hat{\mu}_Y = \mu_Y \).
B) \( \bar{Y} \) has the smallest variance of all estimators.
C) \( \bar{Y} \xrightarrow{p} \mu_Y \).
D) \( E(\hat{\mu}_Y) = \mu_Y \).
Answer: D
4) An estimator \( \hat{\mu}_Y \) of the population value \( \mu_Y \) is consistent if
A) \( \hat{\mu}_Y \xrightarrow{p} \mu_Y \).
B) its mean square error is the smallest possible.
C) Y is normally distributed.
D) \( \bar{Y} \xrightarrow{p} 0 \).
Answer: A
5) An estimator \( \hat{\mu}_Y \) of the population value \( \mu_Y \) is more efficient when compared to another estimator \( \tilde{\mu}_Y \) if
A) \( E(\hat{\mu}_Y) > E(\tilde{\mu}_Y) \).
B) it has a smaller variance.
C) its c.d.f. is flatter than that of the other estimator.
D) both estimators are unbiased, and \( \mathrm{var}(\hat{\mu}_Y) < \mathrm{var}(\tilde{\mu}_Y) \).
Answer: D
Copyright © 2011 Pearson Education, Inc.
6) With i.i.d. sampling each of the following is true except
A) \( E(\bar{Y}) = \mu_Y \).
B) \( \mathrm{var}(\bar{Y}) = \sigma_Y^2 / n \).
C) \( E(\bar{Y}) < E(Y) \).
D) \( \bar{Y} \) is a random variable.
Answer: C
7) The standard error of \( \bar{Y} \) is given by the following formula:
A) \( \frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \).
B) \( \frac{S_Y^2}{n} \).
C) \( S_Y \).
D) \( \frac{S_Y}{\sqrt{n}} \).
Answer: D
8) The critical value of a two-sided t-test computed from a large sample

A) is 1.64 if the significance level of the test is 5%.
B) cannot be calculated unless you know the degrees of freedom.
C) is 1.96 if the significance level of the test is 5%.
D) is the same as the p-value.
Answer: C
9) A type I error is
A) always the same as (1-type II) error.
B) the error you make when rejecting the null hypothesis when it is true.
C) the error you make when rejecting the alternative hypothesis when it is true.
D) always 5%.
Answer: B
10) A type II error

A) is typically smaller than the type I error.
B) is the error you make when choosing type II or type I.
C) is the error you make when not rejecting the null hypothesis when it is false.
D) cannot be calculated when the alternative hypothesis contains an "=".
Answer: C
11) The size of the test

A) is the probability of committing a type I error.
B) is the same as the sample size.
C) is always equal to (1-the power of test).
D) can be greater than 1 in extreme examples.
Answer: A
12) The power of the test is
A) dependent on whether you calculate a t or a t2 statistic.
B) one minus the probability of committing a type I error.
C) a subjective view taken by the econometrician dependent on the situation.
D) one minus the probability of committing a type II error.
Answer: D
13) When you are testing a hypothesis against a two-sided alternative, then the alternative is written as
A) \( E(Y) > \mu_{Y,0} \).
B) \( E(Y) = \mu_{Y,0} \).
C) \( \bar{Y} \neq \mu_{Y,0} \).
D) \( E(Y) \neq \mu_{Y,0} \).
Answer: D
14) A scatterplot
A) shows how Y and X are related when their relationship is scattered all over the place.
B) relates the covariance of X and Y to the correlation coefficient.
C) is a plot of n observations on Xi and Yi, where each observation is represented by the point (Xi, Yi).
D) shows n observations of Y over time.
Answer: C
15) The following types of statistical inference are used throughout econometrics, with the exception of
A) confidence intervals.
B) hypothesis testing.
C) calibration.
D) estimation.
Answer: C
16) Among all unbiased estimators that are weighted averages of Y1,..., Yn, \( \bar{Y} \) is
A) the only consistent estimator of µY.
B) the most efficient estimator of µY.
C) a number which, by definition, cannot have a variance.
D) the most unbiased estimator of µY.
Answer: B
17) To derive the least squares estimator \( \hat{\mu}_Y \), you find the estimator m which minimizes
A) \( \sum_{i=1}^{n} (Y_i - m)^2 \).
B) \( \left| \sum_{i=1}^{n} (Y_i - m) \right| \).
C) \( \sum_{i=1}^{n} m Y_i^2 \).
D) \( \sum_{i=1}^{n} (Y_i - m) \).
Answer: A
18) If the null hypothesis states H0 : E(Y) = µY,0, then a two-sided alternative hypothesis is
A) H1 : E(Y) ≠ µY,0.
B) H1 : E(Y) ≈ µY,0.
C) H1 : \( \bar{Y} \) < µY,0.
D) H1 : E(Y) > µY,0.
Answer: A
19) The p-value is defined as follows:

A) p = 0.05.
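B) \( \Pr_{H_0}\left[ |\bar{Y} - \mu_{Y,0}| > |\bar{Y}^{act} - \mu_{Y,0}| \right] \).
C) Pr(z > 1.96).
D) \( \Pr_{H_0}\left[ |\bar{Y} - \mu_{Y,0}| < |\bar{Y}^{act} - \mu_{Y,0}| \right] \).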
Answer: B
20) A large p-value implies

A) rejection of the null hypothesis.
B) a large t-statistic.
C) a large \( \bar{Y}^{act} \).
D) that the observed value \( \bar{Y}^{act} \) is consistent with the null hypothesis.
Answer: D
21) The formula for the sample variance is
A) \( S_Y = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \).
B) \( S_Y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \).
C) \( S_Y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \mu_Y)^2 \).
D) \( S_Y^2 = \frac{1}{n-1} \sum_{i=1}^{n-1} (Y_i - \bar{Y})^2 \).
Answer: B
22) Degrees of freedom

A) in the context of the sample variance formula means that estimating the mean uses up some of the
information in the data.
B) is something that certain undergraduate majors at your university/college other than economics seem
to have an ∞ amount of.
C) are (n-2) when replacing the population mean by the sample mean.
D) ensure that \( S_Y^2 = \sigma_Y^2 \).
Answer: A
23) The t-statistic is defined as follows:
A) \( t = \frac{\bar{Y} - \mu_{Y,0}}{\sigma_Y^2 / n} \).
B) \( t = \frac{\bar{Y} - \mu_{Y,0}}{SE(\bar{Y})} \).
C) \( t = \frac{(\bar{Y} - \mu_{Y,0})^2}{SE(\bar{Y})} \).
D) 1.96.
Answer: B
24) The power of the test

A) is the probability that the test actually incorrectly rejects the null hypothesis when the null is true.
B) depends on whether you use \( \bar{Y} \) or \( \bar{Y}^2 \) for the t-statistic.
C) is one minus the size of the test.
D) is the probability that the test correctly rejects the null when the alternative is true.
Answer: D
25) The sample covariance can be calculated in any of the following ways, with the exception of:
A)
B) \( \frac{1}{n-1} \sum_{i=1}^{n} X_i Y_i - \frac{n}{n-1} \bar{X} \bar{Y} \).
C)
D) \( r_{XY} S_X S_Y \), where \( r_{XY} \) is the correlation coefficient.

Answer: C
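26) When the sample size n is large, the 90% confidence interval for \( \mu_Y \) is
A) \( \bar{Y} \pm 1.96\,SE(\bar{Y}) \).
B) \( \bar{Y} \pm 1.64\,SE(\bar{Y}) \).
C) \( \bar{Y} \pm 1.64\,\sigma_Y \).
D) \( \bar{Y} \pm 1.96 \).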
Answer: B
27) The standard error for the difference in means of two random variables M and W, when the two
population variances are different, is
A) \( \frac{S_M^2 + S_W^2}{n_M + n_W} \).
B) \( \frac{S_M}{n_M} + \frac{S_W}{n_W} \).
C) \( \frac{1}{2} \left( \frac{S_M^2}{n_M} + \frac{S_W^2}{n_W} \right) \).
D) \( \sqrt{\frac{S_M^2}{n_M} + \frac{S_W^2}{n_W}} \).
Answer: D
28) The t-statistic has the following distribution:

A) standard normal distribution for n < 15
B) Student t distribution with n–1 degrees of freedom regardless of the distribution of the Y.
C) Student t distribution with n–1 degrees of freedom if the Y is normally distributed.
D) a standard normal distribution if the sample standard deviation goes to zero.
Answer: C
29) The following statement about the sample correlation coefficient is true.
A) –1 ≤ rX,Y ≤ 1.
B) corr(Xi, Yi).
C) | rX,Y | < 1.
D) \( r_{X,Y} = \frac{S_{XY}^2}{S_X^2 S_Y^2} \).
Answer: A
30) The correlation coefficient

A) lies between zero and one.
B) is a measure of linear association.
C) is close to one if X causes Y.
D) takes on a high value if you have a strong nonlinear relationship.
Answer: B
31) When testing for differences of means, the t-statistic \( t = \frac{\bar{Y}_m - \bar{Y}_w}{SE(\bar{Y}_m - \bar{Y}_w)} \),
where \( SE(\bar{Y}_m - \bar{Y}_w) = \sqrt{\frac{s_m^2}{n_m} + \frac{s_w^2}{n_w}} \),
has
A) a student t distribution if the population distribution of Y is not normal

B) a student t distribution if the population distribution of Y is normal
C) a normal distribution even in small samples
D) cannot be computed unless \( n_m = n_w \)
Answer: B
32) When testing for differences of means, you can base statistical inference on the
A) Student t distribution in general
B) normal distribution regardless of sample size
C) Student t distribution if the underlying population distribution of Y is normal, the two groups have
the same variances, and you use the pooled standard error formula
D) Chi-squared distribution with (\( n_m + n_w - 2 \)) degrees of freedom
Answer: C
33) Assume that you have 125 observations on the height (H) and weight (W) of your peers in college.
Let \( s_{HW} = 68 \), \( s_H = 3.5 \), \( s_W = 29 \). The sample correlation coefficient is
A) 1.22
B) 0.50
C) 0.67
D) Cannot be computed since males and females have not been separated out.
Answer: C
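The answer to question 33 can be checked with a minimal Python sketch. It assumes the three numbers in the question are the sample covariance of height and weight (68) and the two sample standard deviations (3.5 and 29); the variable names are illustrative only.

```python
# Sample correlation = sample covariance / (product of sample standard deviations).
# Assumed reading of the question: s_HW = 68, s_H = 3.5, s_W = 29.
s_hw = 68.0   # sample covariance of height and weight
s_h = 3.5     # sample standard deviation of height
s_w = 29.0    # sample standard deviation of weight

r_hw = s_hw / (s_h * s_w)
print(round(r_hw, 2))  # 0.67
```

Under this reading the result matches answer C; the other options do not follow from these three numbers.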
34) You have collected data on the average weekly amount of studying time (T) and grades (G) from the
peers at your college. Changing the measurement from minutes into hours has the following effect on the
correlation coefficient:
A) decreases \( r_{TG} \) by dividing the original correlation coefficient by 60
B) results in a higher \( r_{TG} \)
C) cannot be computed since some students study less than an hour per week
D) does not change \( r_{TG} \)
Answer: D
35) A low correlation coefficient implies that

A) the line always has a flat slope
B) in the scatterplot, the points fall quite far away from the line
C) the two variables are unrelated
D) you should use a tighter scale of the vertical and horizontal axis to bring the observations closer to the
line
Answer: B
3.2 Essays and Longer Questions
1) Think of at least nine examples, three of each, that display a positive, negative, or no correlation
between two economic variables. In each of the positive and negative examples, indicate whether or not
you expect the correlation to be strong or weak.
Answer: Answers will vary by student. Students frequently bring up the following correlations. Positive
correlations: earnings and education (hopefully strong), consumption and personal disposable income
(strong), per capita income and investment-output ratio or saving rate (strong); negative correlation:
Okun's Law (strong), income velocity and interest rates (strong), the Phillips curve (strong); no
correlation: productivity growth and initial level of per capita income for all countries of the world (beta-
convergence regressions), consumption and the (real) interest rate, employment and real wages.
2) Adult males are taller, on average, than adult females. Visiting two recent American Youth Soccer
Organization (AYSO) under 12 year old (U12) soccer matches on a Saturday, you do not observe an
obvious difference in the height of boys and girls of that age. You suggest to your little sister that she
collect data on height and gender of children in 4th to 6th grade as part of her science project. The
accompanying table shows her findings.
Height of Young Boys and Girls, Grades 4-6, in inches

         Average height   Standard deviation   n
Boys     57.8             3.9                  55
Girls    58.4             4.2                  57
(a) Let your null hypothesis be that there is no difference in the height of females and males at this age
level. Specify the alternative hypothesis.
(b) Find the difference in height and the standard error of the difference.
(c) Generate a 95% confidence interval for the difference in height.
(d) Calculate the t-statistic for comparing the two means. Is the difference statistically significant at the
1% level? Which critical value did you use? Why would this number be smaller if you had assumed a
one-sided alternative hypothesis? What is the intuition behind this?
Answer:
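(a) \( H_0: \mu_{Boys} - \mu_{Girls} = 0 \) vs. \( H_1: \mu_{Boys} - \mu_{Girls} \neq 0 \)
(b) \( \bar{Y}_{Boys} - \bar{Y}_{Girls} = -0.6 \), \( SE(\bar{Y}_{Boys} - \bar{Y}_{Girls}) = \sqrt{\frac{3.9^2}{55} + \frac{4.2^2}{57}} = 0.77 \).
(c) -0.6 ± 1.96 × 0.77 = (-2.11, 0.91).
(d) t = -0.78, so |t| < 2.58, which is the critical value at the 1% level. Hence you cannot reject the null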
hypothesis. The critical value for the one-sided hypothesis would have been 2.33. Assuming a one-sided
hypothesis implies that you have some information about the problem at hand, and, as a result, can be
more easily convinced than if you had no prior expectation.
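The arithmetic in parts (b) to (d) can be reproduced with a short Python sketch. The helper name diff_in_means is hypothetical, and the calculation assumes two independent samples with unequal population variances and the large-sample normal critical value 1.96.

```python
from math import sqrt

def diff_in_means(mean_a, sd_a, n_a, mean_b, sd_b, n_b, z=1.96):
    """Difference in means, its standard error, a confidence interval, and the t-statistic."""
    diff = mean_a - mean_b
    se = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)   # unpooled standard error
    ci = (diff - z * se, diff + z * se)        # large-sample confidence interval
    return diff, se, ci, diff / se

# Boys: mean 57.8, s = 3.9, n = 55; girls: mean 58.4, s = 4.2, n = 57
diff, se, ci, t = diff_in_means(57.8, 3.9, 55, 58.4, 4.2, 57)
print(round(diff, 2), round(se, 2), round(t, 2))
print(tuple(round(x, 2) for x in ci))
```

Because the sketch carries the unrounded standard error through the calculation, the interval endpoints differ in the second decimal from the (-2.11, 0.91) obtained with the rounded value 0.77.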
3) Math SAT scores (Y) are normally distributed with a mean of 500 and a standard deviation of 100. An
evening school advertises that it can improve students' scores by roughly a third of a standard deviation,
or 30 points, if they attend a course which runs over several weeks. (A similar claim is made for attending
a verbal SAT course.) The statistician for a consumer protection agency suspects that the courses are not
effective. She views the situation as follows: H0 : E(Y) = 500 vs. H1 : E(Y) = 530.
(a) Sketch the two distributions under the null hypothesis and the alternative hypothesis.
(b) The consumer protection agency wants to evaluate this claim by sending 50 students to attend classes.
One of the students becomes sick during the course and drops out. What is the distribution of the average
score of the remaining 49 students under the null, and under the alternative hypothesis?
(c) Assume that after graduating from the course, the 49 participants take the SAT test and score an
average of 520. Is this convincing evidence that the school has fallen short of its claim? What is the p-
value for such a score under the null hypothesis?
(d) What would be the critical value under the null hypothesis if the size of your test were 5%?
(e) Given this critical value, what is the power of the test? What options does the statistician have for
increasing the power in this situation?
Answer:
(a)
(b) \( \bar{Y} \) of the 49 participants is normally distributed, with a mean of 500 and a standard deviation of 14.286
under the null hypothesis. Under the alternative hypothesis, it is normally distributed with a mean of 530
and a standard deviation of 14.286.
(c) It is possible that the consumer protection agency had chosen a group of 49 students whose average
score would have been 490 without attending the course. The crucial question is how likely it is that 49
students, chosen randomly from a population with a mean of 500 and a standard deviation of 100, will
score an average of 520. The p-value for this score is 0.081, meaning that if the agency rejected the null
hypothesis based on this evidence, it would make a mistake, on average, roughly 1 out of 12 times. Hence
the average score of 520 would allow rejection of the null hypothesis that the school has had no effect on
the SAT score of students at the 10% level.
(d) The critical value would be 523.
(e) Pr(\( \bar{Y} \) < 523 | H1 is true) = 0.312. Hence the power of the test is 0.688. She could increase the power by
increasing the size of the test, at the cost of a higher probability of a type I error. Alternatively, she could
try to convince the agency to hire more test subjects, i.e., she could increase the sample size.
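The p-value, critical value, and power in parts (c) to (e) can be checked with a minimal Python sketch using statistics.NormalDist from the standard library. It assumes a one-sided test that rejects for large sample averages; the variable names are illustrative.

```python
from statistics import NormalDist

mu0, mu1, sigma, n = 500, 530, 100, 49
se = sigma / n ** 0.5                 # 100 / 7, about 14.286

null = NormalDist(mu0, se)            # distribution of the sample average under H0
alt = NormalDist(mu1, se)             # distribution of the sample average under H1

p_value = 1 - null.cdf(520)           # one-sided p-value for an average of 520
crit = null.inv_cdf(0.95)             # critical value for a test of size 5%
power = 1 - alt.cdf(crit)             # Pr(reject H0 | H1 is true)

# A larger test size (10%) lowers the critical value and raises the power.
crit_10 = null.inv_cdf(0.90)
power_10 = 1 - alt.cdf(crit_10)

print(round(p_value, 3), round(crit, 1), round(power, 3), round(power_10, 3))
```

The sketch keeps the unrounded critical value (about 523.5), so the power comes out slightly below the 0.688 obtained when the critical value is rounded down to 523.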
4) Your packaging company fills various types of flour into bags. Recently there have been complaints
from one chain of stores: a customer returned one opened 5 pound bag which weighed significantly less
than the label indicated. You view the weight of the bag as a random variable which is normally
distributed with a mean of 5 pounds, and, after studying the machine specifications, a standard deviation
of 0.05 pounds.
(a) You take a sample of 20 bags and weigh them. Sketch below what the average pattern of individual
weights might look like. Let the horizontal axis indicate the sampled bag number (1, 2, …, 20). On the
vertical axis, mark the expected value of the weight under the null hypothesis, and two (≈ 1.96) standard
deviations above and below the expected value. Draw a line through the graph for E(Y) + 2\( \sigma_Y \), E(Y), and
E(Y) – 2\( \sigma_Y \). How many of the bags in a sample of 20 will you expect to weigh either less than 4.9 pounds
or more than 5.1 pounds?
(b) You sample 25 bags of flour and calculate the average weight. What is the distribution of the average
weight of these 25 bags? Repeating the same exercise 20 times, sketch what the distribution of the average
weights would look like in a graph similar to the one you drew in (a), where you have adjusted the
standard error of \( \bar{Y} \) accordingly.
(c) For each of the twenty averages in (b), a 95% confidence interval is constructed. Draw these
confidence intervals, using the same graph as in (b). How many of these 20 confidence intervals would
you expect to contain 5 pounds under the null hypothesis?
Answer:
(a) On average, there should be one bag in every sample of 20 which weighs less than 4.9 pounds or more
than 5.1 pounds.
(b) The average weight of 25 bags will be normally distributed, with a mean of 5 pounds and a standard
deviation of 0.01 pounds. (Same graph as in (a), but with the following lower and upper bounds.)
(c) You would expect 19 of the 20 confidence intervals to contain 5 pounds.
5) Assume that two presidential candidates, call them Bush and Gore, receive 50% of the votes in the
population. You can model this situation as a Bernoulli trial, where Y is a random variable with success
probability Pr(Y = 1) = p, and where Y = 1 if a person votes for Bush and Y = 0 otherwise. Furthermore, let
p̂ be the fraction of successes (1s) in a sample, which is distributed \( N\left(p, \frac{p(1-p)}{n}\right) \) in reasonably large
samples, say for n ≥ 40.
(a) Given your knowledge about the population, find the probability that in a random sample of 40, Bush
would receive a share of 40% or less.
(b) How would this situation change with a random sample of 100?
(c) Given your answers in (a) and (b), would you be comfortable to predict what the voting intentions for
the entire population are if you did not know p but had polled 10,000 individuals at random and
calculated p̂ ? Explain.
(d) This result seems to hold whether you poll 10,000 people at random in the Netherlands or the United
States, where the former has a population of less than 20 million people, while the United States is 15
times as populous. Why does the population size not come into play?
Answer:
(a) \( \Pr(\hat{p} < 0.40) = \Pr\left(Z < \frac{0.40 - 0.50}{\sqrt{0.25/40}}\right) = \Pr(Z < -1.26) \approx 0.104 \). In roughly every 10th sample of this size,
Bush would receive a vote of less than 40%, although in truth, his share is 50%.
(b) \( \Pr(\hat{p} < 0.40) = \Pr\left(Z < \frac{0.40 - 0.50}{\sqrt{0.25/100}}\right) = \Pr(Z < -2.00) \approx 0.023 \). With this sample size, you would expect this
to happen only in roughly every 50th sample.
(c) The answers in (a) and (b) suggest that for even moderate increases in the sample size, the estimator
does not vary too much from the population mean. Polling 10,000 individuals, the probability of finding a
p̂ of 0.48, for example, would be 0.00003. Unless the election was extremely close, which the 2000 election
was, polls are quite accurate even for sample sizes of 2,500.
(d) The distribution of sample means shrinks very quickly depending on the sample size, not the
population size. Although at first this does not seem intuitive, the standard error of an estimator is a
value which indicates by how much the estimator varies around the population value. For large sample
sizes, the sample mean typically is very close to the population mean.
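The probabilities in (a) to (c) follow from the normal approximation to the sample share, and a minimal Python sketch can reproduce them. The function name prob_share_below is hypothetical; the sketch assumes p = 0.5 and ignores any continuity correction.

```python
from math import sqrt
from statistics import NormalDist

def prob_share_below(share, p, n):
    """Pr(p_hat < share) under the normal approximation N(p, p(1-p)/n)."""
    se = sqrt(p * (1 - p) / n)
    return NormalDist().cdf((share - p) / se)

for n in (40, 100):
    print(n, round(prob_share_below(0.40, 0.5, n), 3))

# With 10,000 respondents, even a share of 0.48 or less is extremely unlikely.
print(10_000, prob_share_below(0.48, 0.5, 10_000))
```

The first value differs from 0.104 in the third decimal because the sketch does not round the z-value to -1.26 before looking up the probability. Note that the population size never enters the calculation, only n does.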
6) You have collected weekly earnings and age data from a sub-sample of 1,744 individuals using the
Current Population Survey in a given year.
(a) Given the overall mean of $434.49 and a standard deviation of $294.67, construct a 99% confidence
interval for average earnings in the entire population. State the meaning of this interval in words, rather
than just in numbers. If you constructed a 90% confidence interval instead, would it be smaller or larger?
What is the intuition?
(b) When dividing your sample into people 45 years and older, and younger than 45, the information
shown in the table is found.
Age Category   Average Earnings (\( \bar{Y} \))   Standard Deviation (\( S_Y \))   N
Age ≥ 45       $488.87                $328.64                   507
Age < 45       $412.20                $276.63                   1237
Test whether or not the difference in average earnings is statistically significant. Given your knowledge of
age-earning profiles, does this result make sense?
Answer:
(a) The confidence interval for mean weekly earnings is \( 434.49 \pm 2.58 \times \frac{294.67}{\sqrt{1744}} = 434.49 \pm 18.20 \)
= (416.29, 452.69). Based on the sample at hand, the best guess for the population mean is $434.49.
However, because of random sampling error, this guess is likely to be wrong. Instead, the interval
estimate for the average earnings lies between $416.29 and $452.69. Committing to such an interval
repeatedly implies that the resulting statement is incorrect 1 out of 100 times. For a 90% confidence
interval, the only change in the calculation of the confidence interval is to replace 2.58 by 1.64. Hence the
confidence interval is smaller. A smaller interval implies, given the same average earnings and the
standard deviation, that the statement will be false more often. The larger the confidence interval, the
more likely it is to contain the population value.
(b) Assuming unequal population variances, \( t = \frac{488.87 - 412.20}{\sqrt{\frac{328.64^2}{507} + \frac{276.63^2}{1237}}} = 4.62 \), which is statistically
significant at conventional levels whether you use a two-sided or one-sided alternative. Hence the null
hypothesis of equal average earnings in the two groups is rejected. Age-earning profiles typically take on
an inverted U-shape. Maximum earnings occur in the 40s, depending on some other factors such as years
of education, which are not considered here. Hence it is not clear if the alternative hypothesis should be
one-sided or two-sided. In such a situation, it is best to assume a two-sided alternative hypothesis.
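The comparison of the 99% and 90% intervals in part (a) can be illustrated with a minimal Python sketch. The helper name ci is hypothetical, and the critical values come from the standard normal distribution rather than the rounded 2.58 and 1.64.

```python
from math import sqrt
from statistics import NormalDist

mean, sd, n = 434.49, 294.67, 1744
se = sd / sqrt(n)                      # standard error of the sample mean

def ci(level):
    """Two-sided large-sample confidence interval for the population mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * se, mean + z * se

for level in (0.90, 0.95, 0.99):
    lo, hi = ci(level)
    print(level, round(lo, 2), round(hi, 2), "width:", round(hi - lo, 2))
```

The widths increase with the confidence level, which is the trade-off described in the answer: a narrower interval is a more precise statement that is wrong more often.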
7) A manufacturer claims that a certain brand of VCR player has an average life expectancy of 5 years and
6 months with a standard deviation of 1 year and 6 months. Assume that the life expectancy is normally
distributed.
(a) Selecting one VCR player from this brand at random, calculate the probability of its life expectancy
exceeding 7 years.
(b) The Critical Consumer magazine decides to test fifty VCRs of this brand. The average life in this sample
is 6 years and the sample standard deviation is 2 years. Calculate a 99% confidence interval for the
average life.
(c) How many more VCRs would the magazine have to test in order to halve the width of the confidence
interval?
Answer:
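(a) Pr(Y > 7) = Pr(Z > 1) = 0.1587.
(b) \( 6 \pm 2.58 \times \frac{2}{\sqrt{50}} = 6 \pm 0.73 = (5.27, 6.73) \).
(c) \( \frac{1}{2} \times \left( 2.58 \times \frac{2}{\sqrt{50}} \right) = 2.58 \times \frac{1}{2} \times \frac{2}{\sqrt{50}} = 2.58 \times \frac{2}{\sqrt{4 \times 50}} \), or n = 200.
The magazine would therefore have to test 150 more VCRs.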
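Part (c) relies on the half-width of the interval shrinking with the square root of n, so halving it requires four times the sample. A minimal Python sketch, with the hypothetical helper half_width and the sample standard deviation held fixed at 2, shows this:

```python
from math import sqrt

def half_width(sd, n, z=2.58):
    """Half-width of a large-sample confidence interval for the mean."""
    return z * sd / sqrt(n)

w50 = half_width(2, 50)     # original sample of 50 VCRs
w200 = half_width(2, 200)   # four times as many VCRs

print(round(w50, 2), round(w200, 2), round(w50 / w200, 2))
```

The sketch assumes the sample standard deviation stays at 2 years in the larger sample, which is the same simplification the answer makes.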
8) U.S. News and World Report ranks colleges and universities annually. You randomly sample 100 of the
national universities and liberal arts colleges from the year 2000 issue. The average cost, which includes
tuition, fees, and room and board, is $23,571.49 with a standard deviation of $7,015.52.
(a) Based on this sample, construct a 95% confidence interval of the average cost of attending a
university/college in the United States.
(b) Cost varies by quite a bit. One of the reasons may be that some universities/colleges have a better
reputation than others. U.S. News and World Report tries to measure this factor by asking university
presidents and chief academic officers about the reputation of institutions. The ranking is from 1
("marginal") to 5 ("distinguished"). You decide to split the sample according to whether the academic
institution has a reputation of greater than 3.5 or not. For comparison, in 2000, Caltech had a reputation
ranking of 4.7, Smith College had 4.5, and Auburn University had 3.1. This gives you the statistics shown
in the accompanying table.
Reputation Category   Average Cost (\( \bar{Y} \))   Standard Deviation of Cost (\( S_Y \))   N
Ranking > 3.5         $29,311.31         $5,649.21                           29
Ranking ≤ 3.5         $21,227.06         $6,133.38                           71
Test the hypothesis that the average cost for all universities/colleges is the same independent of the
reputation. What alternative hypothesis did you use?
(c) What other factors should you consider before making a decision based on the data in (b)?
Answer: (a) \( 23,571.49 \pm 1.96 \times \frac{7,015.52}{\sqrt{100}} = 23,571.49 \pm 1,375.04 = (22,196.45, 24,946.53) \).
(b) Assuming unequal population variances, \( t = \frac{29,311.31 - 21,227.06}{\sqrt{\frac{5,649.21^2}{29} + \frac{6,133.38^2}{71}}} = 6.33 \), which is statistically
significant whether or not you use a one-sided or two-sided hypothesis test. Your prior expectation is that
academic institutions with a higher reputation will charge more for attending, and hence a one-sided
alternative would have been appropriate here.
(c) There may be other variables which potentially have an effect on the cost of attending the academic
institution. Some of these factors might be whether or not the college/university is private or public, its
size, whether or not it has a religious affiliation, etc. It is only after controlling for these factors that the
"pure" relationship between reputation and cost can be identified.
9) The development office and the registrar have provided you with anonymous matches of starting
salaries and GPAs for 108 graduating economics majors. Your sample contains a variety of jobs, from
church pastor to stockbroker.
(a) The average starting salary for the 108 students was $38,644.86 with a standard deviation of $7,541.40.
Construct a 95% confidence interval for the starting salary of all economics majors at your
university/college.
(b) A similar sample for psychology majors indicates a significantly lower starting salary. Given that
these students had the same number of years of education, does this indicate discrimination in the job
market against psychology majors?
(c) You wonder if it pays (no pun intended) to get good grades by calculating the average salary for
economics majors who graduated with a cumulative GPA of B+ or better, and those who had a B or
worse. The data is as shown in the accompanying table.
Cumulative GPA   Average Earnings (\( \bar{Y} \))   Standard Deviation (\( S_Y \))   n
B+ or better     $39,915.25             $8,330.21                  59
B or worse       $37,083.33             $6,174.86                  49
Conduct a t-test for the hypothesis that the two starting salaries are the same in the population. Given
that this data was collected in 1999, do you think that your results will hold for other years, such as 2002?
Answer: (a) \( 38,644.86 \pm 1.96 \times \frac{7,541.40}{\sqrt{108}} = 38,644.86 \pm 1,422.32 = (37,222.54, 40,067.18) \).
(b) It suggests that the market values certain qualifications more highly than others. Comparing means
and identifying that one is significantly lower than others does not indicate discrimination.
(c) Assuming unequal population variances, \( t = \frac{39,915.25 - 37,083.33}{\sqrt{\frac{8,330.21^2}{59} + \frac{6,174.86^2}{49}}} = 2.03 \). The critical value for a
one-sided test is 1.64, for a two-sided test 1.96, both at the 5% level. Hence you can reject the null
hypothesis that the two starting salaries are equal. Presumably you would have chosen as an alternative
that better students receive better starting salaries, so that this becomes your new working hypothesis.
1999 was a boom year. If better students receive better starting offers during a boom year, when the labor
market for graduates is tight, then it is very likely that they receive a better offer during a recession year,
assuming that they receive an offer at all.
10) During the last few days before a presidential election, there is a frenzy of voting intention surveys.
On a given day, quite often there are conflicting results from three major polls.
(a) Think of each of these polls as reporting the fraction of successes (1s) of a Bernoulli random variable Y,
where the probability of success is Pr(Y = 1) = p. Let p̂ be the fraction of successes in the sample and
assume that this estimator is normally distributed with a mean of p and a variance of \( \frac{p(1-p)}{n} \). Why are
the results for all polls different, even though they are taken on the same day?
(b) Given the estimator of the variance of p̂, \( \frac{\hat{p}(1-\hat{p})}{n} \), construct a 95% confidence interval for p. For
which value of p̂ is the standard deviation the largest? What value does the standard deviation take in
that case?
(c) When the results from the polls are reported, you are told, typically in the small print, that the "margin
of error" is plus or minus two percentage points. Using the approximation of 1.96 ≈ 2, and assuming,
"conservatively," the maximum standard deviation derived in (b), what sample size is required to add
and subtract ("margin of error") two percentage points from the point estimate?
(d) What sample size would you need to halve the margin of error?
Answer: (a) Since all polls are only samples, there is random sampling error. As a result, p̂ will differ
from sample to sample, and most likely also from p.
(b) \( \hat{p} \pm 1.96 \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \). A bit of thought or calculus will show that the standard deviation will be
largest for p̂ = 0.5, in which case it becomes \( \frac{0.5}{\sqrt{n}} \).
(c) n = 2,500.
(d) n = 10,000.
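The sample sizes in (c) and (d) come from solving 2 × 0.5/√n = margin for n. A minimal Python sketch, with the hypothetical helper sample_size and the approximation 1.96 ≈ 2 taken from the question:

```python
def sample_size(margin, z=2.0, p=0.5):
    """n that solves z * sqrt(p * (1 - p) / n) = margin."""
    return z**2 * p * (1 - p) / margin**2

print(round(sample_size(0.02)))   # margin of error of two percentage points
print(round(sample_size(0.01)))   # margin of error halved to one percentage point
```

Halving the margin of error quadruples the required sample size, because n enters the standard deviation through its square root.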
11) At the Stock and Watson (http://www.pearsonhighered.com/stock_watson) website go to Student
Resources and select the option "Datasets for Replicating Empirical Results." Then select the "CPS Data
Used in Chapter 8" (ch8_cps.xls) and open it in Excel. This is a rather large data set to work with, so just
copy the first 500 observations into a new Worksheet (these are rows 1 to 501).
In the newly created Worksheet, mark A1 to A501, then select the Data tab and click on "sort." A dialog
box will open. First select "Add level" from one of the options on the left. Then select "sort by" and choose
"Northeast" and "Largest to Smallest." Repeat the same for the "South" as a second option. Finally press
"ok."
This should give you 209 observations for average hourly earnings for the Northeast region, followed by
205 observations for the South.
a. For each of the 209 average hourly earnings observations for the Northeast region and separately for
the South region, calculate the mean and sample standard deviation.
b. Use the appropriate test to determine whether or not average hourly earnings in the Northeast region
are the same as in the South region.
c. Find the 99%, 95%, and 90% confidence intervals for the difference between the two population means.
Is your conclusion consistent with the test in part (b)?
d. In all three cases of using the confidence interval in (c), the power of the test is quite low (5%). What
can you do to increase the power of the test without reducing the size of the test?
Answer:
a. \( \bar{Y}_{NE} \) = $21.12; \( \bar{Y}_{South} \) = $18.80; \( s_{NE} \) = $11.86; \( s_{South} \) = $11.18
b. \( t = \frac{21.12 - 18.80}{\sqrt{\frac{11.86^2}{209} + \frac{11.18^2}{205}}} = 2.05 \). You cannot reject the null hypothesis of equal average earnings in the two
regions at the 1% level, but you are able to reject it at the 10% and 5% significance level.
c. For the 10% significance level, the confidence interval is ($0.46, $4.18). For the 5% significance level, the
interval becomes larger and is ($0.10, $4.54). In either one of the cases you can reject the null hypothesis,
since $0 is not contained in the confidence interval. It is only for the 1% significance level that the null
hypothesis cannot be rejected. In that case, the confidence interval is (-$0.60, $5.24).
d. You would have to increase the sample size, since that would shrink the standard error (assuming that
the sample mean and variance will not change).
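The link between parts b and c, that a confidence interval excludes zero exactly when the two-sided test rejects at the matching significance level, can be checked with a minimal Python sketch. It assumes the summary statistics used in the t-statistic of part b (means of 21.12 and 18.80, standard deviations of 11.86 and 11.18, 209 and 205 observations) and large-sample normal critical values.

```python
from math import sqrt
from statistics import NormalDist

diff = 21.12 - 18.80                                   # Northeast minus South
se = sqrt(11.86**2 / 209 + 11.18**2 / 205)             # unpooled standard error
t = diff / se

results = {}
for alpha in (0.10, 0.05, 0.01):
    z = NormalDist().inv_cdf(1 - alpha / 2)            # two-sided critical value
    lo, hi = diff - z * se, diff + z * se
    results[alpha] = (lo, hi, abs(t) > z)              # interval and test decision
    print(alpha, round(lo, 2), round(hi, 2), "reject H0:", abs(t) > z)
```

The intervals agree with those in part c to the second decimal, and only the interval at the 1% level contains zero.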
3.3 Mathematical and Graphical Problems
1) Your textbook defined the covariance between X and Y as follows:

\( \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) \)

Prove that this is identical to the following alternative specification:
\( \frac{1}{n-1} \sum_{i=1}^{n} X_i Y_i - \frac{n}{n-1} \bar{X} \bar{Y} \)
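One possible line of proof, given here as a sketch rather than as the textbook's own solution, expands the product and uses \( \sum_{i=1}^{n} X_i = n\bar{X} \) and \( \sum_{i=1}^{n} Y_i = n\bar{Y} \):

```latex
\begin{align*}
\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})
  &= \frac{1}{n-1} \sum_{i=1}^{n} \left( X_i Y_i - X_i \bar{Y} - \bar{X} Y_i + \bar{X}\bar{Y} \right) \\
  &= \frac{1}{n-1} \left( \sum_{i=1}^{n} X_i Y_i - \bar{Y} \sum_{i=1}^{n} X_i - \bar{X} \sum_{i=1}^{n} Y_i + n \bar{X}\bar{Y} \right) \\
  &= \frac{1}{n-1} \left( \sum_{i=1}^{n} X_i Y_i - n \bar{X}\bar{Y} - n \bar{X}\bar{Y} + n \bar{X}\bar{Y} \right) \\
  &= \frac{1}{n-1} \sum_{i=1}^{n} X_i Y_i - \frac{n}{n-1} \bar{X}\bar{Y}.
\end{align*}
```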
2) For each of the accompanying scatterplots for several pairs of variables, indicate whether you expect a
positive or negative correlation coefficient between the two variables, and the likely magnitude of it (you
can use a small range).
(a)
(b)
(c)
(d)
Answer:
(a) Positive correlation. The actual correlation coefficient is 0.46.
(b) No relationship. The actual correlation coefficient is 0.00007.
(c) Negative relationship. The actual correlation coefficient is –0.70.
(d) Nonlinear (inverted U) relationship. The actual correlation coefficient is 0.23.
3) Your textbook defines the correlation coefficient as follows:
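\( r = \frac{\frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})(X_i - \bar{X})}{\sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2} \, \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2}} \)
Another textbook gives an alternative formula:
\( r = \frac{n \sum_{i=1}^{n} Y_i X_i - \left( \sum_{i=1}^{n} Y_i \right) \left( \sum_{i=1}^{n} X_i \right)}{\sqrt{n \sum_{i=1}^{n} Y_i^2 - \left( \sum_{i=1}^{n} Y_i \right)^2} \, \sqrt{n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2}} \)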
Prove that the two are the same.
4) IQs of individuals are normally distributed with a mean of 100 and a standard deviation of 16. If you
sampled students at your college and assumed, as the null hypothesis, that they had the same IQ as the
population, then in a random sample of size
(a) n = 25, find \( \Pr(\bar{Y} < 105) \).
(b) n = 100, find \( \Pr(\bar{Y} > 97) \).
(c) n = 144, find \( \Pr(101 < \bar{Y} < 103) \).
Answer:
(a) 0.94
(b) 0.97
(c) 0.21
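These three probabilities can be verified with a minimal Python sketch. The helper name mean_dist is hypothetical; it returns the sampling distribution of the sample mean, which is normal with mean 100 and standard deviation 16/√n under the null hypothesis.

```python
from math import sqrt
from statistics import NormalDist

def mean_dist(n, mu=100, sigma=16):
    """Sampling distribution of the sample mean of n i.i.d. N(mu, sigma^2) draws."""
    return NormalDist(mu, sigma / sqrt(n))

a = mean_dist(25).cdf(105)                               # Pr(mean < 105), n = 25
b = 1 - mean_dist(100).cdf(97)                           # Pr(mean > 97), n = 100
c = mean_dist(144).cdf(103) - mean_dist(144).cdf(101)    # Pr(101 < mean < 103), n = 144

print(round(a, 2), round(b, 2), round(c, 2))  # 0.94 0.97 0.21
```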
5) Consider the following alternative estimator for the population mean:
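$$\tilde{Y} = \frac{1}{n}\left(\frac{1}{4}Y_1 + \frac{7}{4}Y_2 + \frac{1}{4}Y_3 + \frac{7}{4}Y_4 + \cdots + \frac{1}{4}Y_{n-1} + \frac{7}{4}Y_n\right)$$

Prove that $\tilde{Y}$ is unbiased and consistent, but not efficient when compared to $\bar{Y}$.
Answer: Taking n to be even, $E(\tilde{Y}) = \frac{1}{n}\left(\frac{n}{2}\cdot\frac{1}{4} + \frac{n}{2}\cdot\frac{7}{4}\right)\mu_Y = \mu_Y$, so $\tilde{Y}$ is unbiased. Because the observations are i.i.d., $\operatorname{var}(\tilde{Y}) = \frac{1}{n^2}\left(\frac{n}{2}\cdot\frac{1}{16} + \frac{n}{2}\cdot\frac{49}{16}\right)\sigma_Y^2 = 1.5625\,\frac{\sigma_Y^2}{n}$.
Since $\operatorname{var}(\tilde{Y}) \to 0$ as n → ∞, $\tilde{Y}$ is consistent. $\tilde{Y}$ has a larger variance than $\bar{Y}$, whose variance is $\sigma_Y^2/n$, and is therefore not as
efficient.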
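The unbiasedness and variance claims can be verified with exact arithmetic. This is a minimal sketch assuming i.i.d. draws and an even n, with alternating weights 1/4 and 7/4 on the observations; the function names are hypothetical.

```python
from fractions import Fraction


def weights(n):
    """Per-observation weights of the alternating estimator: (1/4)/n on odd i, (7/4)/n on even i."""
    assert n % 2 == 0, "the weighting pattern assumes an even sample size"
    return [Fraction(1, 4) / n if i % 2 == 1 else Fraction(7, 4) / n
            for i in range(1, n + 1)]


def variance_factor(w):
    """For i.i.d. draws, var(sum of w_i * Y_i) = (sum of w_i^2) * sigma^2; return sum of w_i^2."""
    return sum(x * x for x in w)


n = 10
w = weights(n)
weight_sum = sum(w)                           # equals 1, so the estimator is unbiased
ratio = variance_factor(w) / Fraction(1, n)   # variance relative to sigma^2 / n of the sample mean

print(weight_sum, ratio)
```

The ratio does not depend on n: the alternative estimator's variance still shrinks at rate 1/n (consistency) but stays a constant multiple above that of the sample mean (inefficiency).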
6) Imagine that you had sampled 1,000,000 females and 1,000,000 males to test whether or not females
have a higher IQ than males. IQs are normally distributed with a mean of 100 and a standard deviation of
16. You are excited to find that females have an average IQ of 101 in your sample, while males have an IQ
of 99. Does this difference seem important? Do you really need to carry out a t-test for differences in
means to determine whether or not this difference is statistically significant? What does this result tell
you about testing hypotheses when sample sizes are very large?
Answer: The difference seems very small, both in terms of absolute values and, more importantly, in
terms of standard deviations. With a sample size as large as n=1,000,000, the standard error becomes
extremely small. This implies that the distribution of means, or differences in means, has almost turned
into a spike. In essence, you are (very close to) observing the population. It is therefore unnecessary to
test whether or not the difference is statistically significant. After all, if in the population, the male IQ
were 99.99 and the female IQ were 100.01, they would be different. In general, when sample sizes become
very large, it is very easy to reject null hypotheses about population means, which involve sample means
as an estimator, even if hypothesized differences are very small. This is the result of the distribution of
sample means collapsing fairly rapidly as sample sizes increase.
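To see how extreme the test statistic becomes, the sketch below computes the t-statistic for the difference in means. It assumes, for illustration, that the sample standard deviation in each group equals the population value of 16.

```python
from math import sqrt

n_females = n_males = 1_000_000
sigma = 16.0                      # assumed standard deviation in each group
diff = 101.0 - 99.0               # observed difference in sample means

# Standard error of the difference between two independent sample means
se_diff = sqrt(sigma**2 / n_females + sigma**2 / n_males)
t_stat = diff / se_diff

# Difference expressed in standard deviations (a measure of practical importance)
effect_in_sd = diff / sigma

print(se_diff, t_stat, effect_in_sd)
```

The t-statistic is far beyond any conventional critical value even though the difference is only one eighth of a standard deviation, which illustrates the gap between statistical and practical significance in very large samples.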
7) Let Y be a Bernoulli random variable with success probability Pr(Y = 1) = p, and let Y1,..., Yn be i.i.d.
draws from this distribution. Let p̂ be the fraction of successes (1s) in this sample. In large samples, the
distribution of p̂ will be approximately normal, i.e., p̂ is approximately distributed $N\left(p, \frac{p(1-p)}{n}\right)$. Now
let X be the number of successes and n the sample size. In a sample of 10 voters (n=10), if there are six who
vote for candidate A, then X = 6. Relate X, the number of success, to p̂ , the success proportion, or fraction
of successes. Next, using your knowledge of linear transformations, derive the distribution of X.
Answer:
p 1 p
X = n × p̂ . Hence if p̂ is distributed N(p, ), then, given that X is a linear transformation of p̂ , X
n
is distributed N(np, np(1- p)).
8) When you perform hypothesis tests, you are faced with four possible outcomes described in the
accompanying table.
"☺" indicates a correct decision, and I and II indicate that an error has been made. In probability terms,
state the mistakes that have been made in situation I and II, and relate these to the Size of the test and the
Power of the test (or transformations of these).
Answer:
I: Pr(reject H0 | H0 is correct) = Size of the test.
II: Pr(reject H1 | H1 is correct) = (1 − Power of the test).
9) Assume that under the null hypothesis, Y has an expected value of 500 and a standard deviation of 20.
Under the alternative hypothesis, the expected value is 550. Sketch the probability density function for
the null and the alternative hypothesis in the same figure. Pick a critical value such that the p-value is
approximately 5%. Mark the areas that show the size and the power of the test. What happens to the
power of the test if the alternative hypothesis moves closer to the null hypothesis, i.e., if the expected
value under the alternative is 540, 530, 520, etc.?
Answer: For a given size of the test, the power of the test is lower.
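The effect can be illustrated numerically. The sketch below assumes that Y is normally distributed with a standard deviation of 20 under both hypotheses and that the test is one-sided with the rejection region in the upper tail; neither assumption is stated explicitly in the problem.

```python
from statistics import NormalDist

SD = 20.0
null = NormalDist(500.0, SD)

# Critical value leaving 5% of the null distribution in the upper tail (size = 5%)
critical = null.inv_cdf(0.95)


def power(mu_alt):
    """Probability of rejecting the null when the true mean is mu_alt."""
    return 1 - NormalDist(mu_alt, SD).cdf(critical)


powers = {mu_alt: power(mu_alt) for mu_alt in (550, 540, 530, 520)}
for mu_alt, value in powers.items():
    print(mu_alt, round(value, 3))
```

As the alternative approaches 500, the power falls toward the size of the test, 5%.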
10) The net weight of a bag of flour is guaranteed to be 5 pounds with a standard deviation of 0.05
pounds. You are concerned that the actual weight is less. To test for this, you sample 25 bags. Carefully
state the null and alternative hypothesis in this situation. Determine a critical value such that the size of
the test does not exceed 5%. Finding the average weight of the 25 bags to be 4.7 pounds, can you reject the
null hypothesis? What is the power of the test here? Why is it so low?
Answer: Let Y be the net weight of the bag of flour. Then H0 : E(Y) = 5 and H1 : E(Y) < 5. Under the null
hypothesis, the sample mean $\bar{Y}$ is distributed normally, with a mean of 5 pounds and a standard deviation of
$0.05/\sqrt{25} = 0.01$ pounds. The critical value is approximately 4.98 pounds. Since 4.7 pounds falls in the rejection region, the null
hypothesis is rejected. The power of the test is low here, since there is no simple alternative. In the
extreme case, where the alternative hypothesis would place the net weight marginally below five pounds,
the power of the test would approximately equal its size, or 5% in this case.
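The critical value can be reproduced as follows, assuming a one-sided lower-tail test of size 5% based on the normal distribution of the sample mean.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 5.0, 0.05, 25
se = sigma / sqrt(n)                           # standard deviation of the sample mean

# Lower-tail critical value: reject H0 if the sample mean falls below it
critical = NormalDist(mu0, se).inv_cdf(0.05)
reject = 4.7 < critical

print(round(se, 4), round(critical, 2), reject)
```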
11) Some policy advisors have argued that education should be subsidized in developing countries to
reduce fertility rates. To investigate whether or not education and fertility are correlated, you collect data
on population growth rates (Y) and education (X) for 86 countries. Given the sums below, compute the
sample correlation:
$$\sum_{i=1}^{n} Y_i = 1.594;\quad \sum_{i=1}^{n} X_i = 449.6;\quad \sum_{i=1}^{n} Y_iX_i = 6.4697;\quad \sum_{i=1}^{n} Y_i^2 = 0.03982;\quad \sum_{i=1}^{n} X_i^2 = 3{,}022.76$$
Answer: r = –0.716.
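The computation can be sketched with the sums-based formula for the correlation coefficient, $r = \frac{n\sum Y_iX_i - \sum Y_i \sum X_i}{\sqrt{n\sum Y_i^2 - (\sum Y_i)^2}\sqrt{n\sum X_i^2 - (\sum X_i)^2}}$. Because the sums are printed in rounded form, the result may differ slightly from the stated answer in the second or third decimal.

```python
from math import sqrt

n = 86
sum_y, sum_x = 1.594, 449.6
sum_yx = 6.4697
sum_y2, sum_x2 = 0.03982, 3022.76

# Numerator and denominator of the sums-based correlation formula
numerator = n * sum_yx - sum_y * sum_x
denominator = sqrt(n * sum_y2 - sum_y**2) * sqrt(n * sum_x2 - sum_x**2)
r = numerator / denominator

print(r)
```

The negative sign indicates that countries with more education tend to have lower population growth rates in this sample; it says nothing by itself about causation.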
12) (Advanced) Unbiasedness and small variance are desirable properties of estimators. However, you
can imagine situations where a trade-off exists between the two: one estimator may have a small bias
but a much smaller variance than another, unbiased estimator. The concept of "mean square error"
combines the two concepts. Let $\hat{\mu}$ be an estimator of μ. Then the mean square error (MSE) is
defined as follows: $\mathrm{MSE}(\hat{\mu}) = E(\hat{\mu} - \mu)^2$. Prove that $\mathrm{MSE}(\hat{\mu}) = \text{bias}^2 + \operatorname{var}(\hat{\mu})$. (Hint: subtract and add
$E(\hat{\mu})$ in $E(\hat{\mu} - \mu)^2$.)
Answer:
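$$\mathrm{MSE}(\hat{\mu}) = E\left[\hat{\mu} - E(\hat{\mu}) + E(\hat{\mu}) - \mu\right]^2 = E\left[(\hat{\mu} - E(\hat{\mu})) + (E(\hat{\mu}) - \mu)\right]^2$$
$$= E\left[(\hat{\mu} - E(\hat{\mu}))^2 + (E(\hat{\mu}) - \mu)^2 + 2(\hat{\mu} - E(\hat{\mu}))(E(\hat{\mu}) - \mu)\right]$$
Next, moving through the expectation operator results in
$$E\left[\hat{\mu} - E(\hat{\mu})\right]^2 + E\left[E(\hat{\mu}) - \mu\right]^2 + 2E\left[(\hat{\mu} - E(\hat{\mu}))(E(\hat{\mu}) - \mu)\right].$$
The first term is the variance, and the second term is the squared bias, since
$E\left[E(\hat{\mu}) - \mu\right]^2 = \left[E(\hat{\mu}) - \mu\right]^2$. This proves $\mathrm{MSE}(\hat{\mu}) = \text{bias}^2 + \operatorname{var}(\hat{\mu})$ if the last term equals zero. But
$$E\left[(\hat{\mu} - E(\hat{\mu}))(E(\hat{\mu}) - \mu)\right] = E\left[E(\hat{\mu})\hat{\mu} - \mu\hat{\mu} - (E(\hat{\mu}))^2 + \mu E(\hat{\mu})\right]$$
$$= E(\hat{\mu})E(\hat{\mu}) - \mu E(\hat{\mu}) - (E(\hat{\mu}))^2 + \mu E(\hat{\mu}) = 0.$$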
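The decomposition can be checked on a small example with exact arithmetic. The discrete sampling distribution below is hypothetical and chosen only so that the estimator is biased; it does not come from the text.

```python
from fractions import Fraction as F

mu = F(2)                                   # true population value (hypothetical)
values = [F(1), F(2), F(4)]                 # possible values of the estimator
probs = [F(1, 4), F(1, 2), F(1, 4)]         # their probabilities (sum to 1)


def expect(g):
    """Expected value of g(estimator) under the discrete distribution above."""
    return sum(p * g(v) for v, p in zip(values, probs))


mean = expect(lambda v: v)
bias = mean - mu
variance = expect(lambda v: (v - mean) ** 2)
mse = expect(lambda v: (v - mu) ** 2)

print(bias, variance, mse, bias**2 + variance)
```

Here the bias is 1/4 and the variance is 19/16, and their combination reproduces the MSE of 5/4 exactly.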
13) Your textbook states that when you test for differences in means and you assume that the two
population variances are equal, then an estimator of the population variance is the following "pooled"
estimator:
$$s^2_{pooled} = \frac{1}{n_m + n_w - 2}\left[\sum_{i=1}^{n_m}\left(Y_i - \bar{Y}_m\right)^2 + \sum_{i=1}^{n_w}\left(Y_i - \bar{Y}_w\right)^2\right]$$
Explain why this pooled estimator can be looked at as the weighted average of the two variances.
Answer:
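One way to see this is sketched below. The group sample variances $s_m^2$ and $s_w^2$ are notation assumed here, defined with the usual degrees of freedom correction.

```latex
\begin{align*}
s_m^2 &= \frac{1}{n_m-1}\sum_{i=1}^{n_m}\left(Y_i-\bar{Y}_m\right)^2,
\qquad
s_w^2 = \frac{1}{n_w-1}\sum_{i=1}^{n_w}\left(Y_i-\bar{Y}_w\right)^2, \\
s^2_{pooled} &= \frac{(n_m-1)\,s_m^2 + (n_w-1)\,s_w^2}{n_m+n_w-2}
 = \frac{n_m-1}{n_m+n_w-2}\,s_m^2 + \frac{n_w-1}{n_m+n_w-2}\,s_w^2 .
\end{align*}
% The two weights are nonnegative and sum to one, so the pooled estimator is a
% weighted average of the two sample variances, with weights given by each
% group's share of the total degrees of freedom.
```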
14) Your textbook suggests using the first observation from a sample of n as an estimator of the
population mean. It is shown that this estimator is unbiased but has a variance of $\sigma_Y^2$, which makes it less
efficient than the sample mean. Explain why this estimator is not consistent. You develop another
estimator, which is the simple average of the first and last observation in your sample. Show that this
estimator is also unbiased and show that it is more efficient than the estimator which only uses the first
observation. Is this estimator consistent?
15) Let p be the success probability of a Bernoulli random variable Y, i.e., p = Pr(Y = 1). It can be shown
that p̂, the fraction of successes in a sample, is asymptotically distributed $N\left(p, \frac{p(1-p)}{n}\right)$. Using the
estimator of the variance of p̂, $\frac{\hat{p}(1-\hat{p})}{n}$, construct a 95% confidence interval for p. Show that the margin
for sampling error simplifies to $1/\sqrt{n}$ if you used 2 instead of 1.96 assuming, conservatively, that the
standard error is at its maximum. Construct a table indicating the sample size needed to generate a
margin of sampling error of 1%, 2%, 5% and 10%. What do you notice about the increase in sample size
needed to halve the margin of error? (The margin of sampling error is 1.96×SE(p̂).)
Answer: The 95% confidence interval for p is $\hat{p} \pm 1.96 \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$. The term $\frac{\hat{p}(1-\hat{p})}{n}$ is at a maximum for p̂ =
0.5, in which case the confidence interval reduces to $\hat{p} \pm 1.96 \times \sqrt{\frac{0.25}{n}} \approx \hat{p} \pm \frac{1}{\sqrt{n}}$, and the margin of
sampling error is $\frac{1}{\sqrt{n}}$.

$1/\sqrt{n}$    n
0.01            10,000
0.02            2,500
0.05            400
0.10            100
To halve the margin of error, the sample size has to increase fourfold.
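The table follows from solving $1/\sqrt{n} = m$ for n, which gives $n = 1/m^2$. A short sketch using exact fractions to avoid floating-point rounding; the function name `required_n` is hypothetical.

```python
from fractions import Fraction
from math import ceil


def required_n(margin):
    """Smallest n with 1/sqrt(n) <= margin; margin is given as a decimal string such as "0.05"."""
    m = Fraction(margin)
    return ceil(1 / m**2)


table = {margin: required_n(margin) for margin in ("0.01", "0.02", "0.05", "0.10")}
for margin, n in table.items():
    print(margin, n)
```

Because n enters through a square root, cutting the margin in half (for example from 10% to 5%) multiplies the required sample size by four.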
16) Let Y be a Bernoulli random variable with success probability Pr(Y = 1) = p, and let Y1 ,..., Yn be i.i.d.
draws from this distribution. Let p̂ be the fraction of successes (1s) in this sample. Given the following
statement
Pr(-1.96 < z < 1.96) = 0.95
and assuming that p̂ is approximately distributed $N\left(p, \frac{p(1-p)}{n}\right)$, derive the 95% confidence interval
for p by solving the above inequalities.
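A sketch of the derivation, taking z to be the standardized version of p̂ and, as a final assumption, replacing the unknown p inside the standard error by its estimate p̂:

```latex
\begin{align*}
0.95 &= \Pr\left(-1.96 < \frac{\hat{p}-p}{\sqrt{p(1-p)/n}} < 1.96\right) \\
     &= \Pr\left(-1.96\sqrt{\tfrac{p(1-p)}{n}} < \hat{p}-p < 1.96\sqrt{\tfrac{p(1-p)}{n}}\right) \\
     &= \Pr\left(\hat{p}-1.96\sqrt{\tfrac{p(1-p)}{n}} < p < \hat{p}+1.96\sqrt{\tfrac{p(1-p)}{n}}\right).
\end{align*}
% The last step subtracts \hat{p} throughout and multiplies by -1, which reverses
% the inequalities. Estimating the standard error with \hat{p} gives the interval
% \hat{p} \pm 1.96 \sqrt{\hat{p}(1-\hat{p})/n}.
```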
17) Your textbook mentions that dividing the sample variance by n –1 instead of n is called a degrees of
freedom correction. The meaning of the term stems from the fact that one degree of freedom is used up
when the mean is estimated. Hence degrees of freedom can be viewed as the number of independent
observations remaining after estimating the sample mean.
Consider an example where initially you have 20 independent observations on the height of students.
After calculating the average height, your instructor claims that you can figure out the height of the 20th
student if she provides you with the height of the other 19 students and the sample mean. Hence you
have lost one degree of freedom, or there are only 19 independent bits of information. Explain how you
can find the height of the 20th student.
Answer: Since $\bar{Y} = \frac{1}{20}\sum_{i=1}^{20} Y_i$, we have $20 \times \bar{Y} = \sum_{i=1}^{20} Y_i = Y_{20} + \sum_{i=1}^{19} Y_i$. Hence knowledge of the sample mean and the
height of the other 19 students is sufficient for finding the height of the 20th student.
18) The accompanying table lists the height (STUDHGHT) in inches and weight (WEIGHT) in pounds of
five college students. Calculate the correlation coefficient.
STUDHGHT WEIGHT

74 165
73 165
72 145
68 155
66 140
Answer: r = 0.72.
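The calculation can be reproduced directly from the definition of the sample correlation (sample covariance divided by the product of the sample standard deviations). The helper `sample_corr` is a name introduced for this sketch.

```python
from math import sqrt

height = [74, 73, 72, 68, 66]        # STUDHGHT, inches
weight = [165, 165, 145, 155, 140]   # WEIGHT, pounds


def sample_corr(x, y):
    """Sample correlation: covariance over the product of standard deviations (n - 1 divisors)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    s_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    s_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return s_xy / (s_x * s_y)


r = sample_corr(height, weight)
print(round(r, 2))
```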
19) (Requires calculus.) Let Y be a Bernoulli random variable with success probability Pr(Y = 1) = p. It can
be shown that the variance of the success proportion p̂ is $\frac{p(1-p)}{n}$. Use calculus to show that this
variance is maximized for p = 0.5.
Answer: $\frac{\partial}{\partial p}\left[\frac{p(1-p)}{n}\right] = \frac{1}{n}(1-p) - \frac{1}{n}p = 0$. Hence 1 − 2p = 0 or p = $\frac{1}{2}$.
20) Consider two estimators: one which is biased and has a smaller variance, the other which is unbiased
and has a larger variance. Sketch the sampling distributions and the location of the population parameter
for this situation. Discuss conditions under which you may prefer to use the first estimator over the
second one.
Answer: The bias indicates "how far away," on average, the estimator is from the population value.
Although this average is zero for an unbiased estimator, there may be quite some variation around the
population mean. In a single draw, there is therefore a high probability of being some distance away from
the population mean. On the other hand, if the variance is very small and the estimator is biased by a
small amount, then the probability of being closer to the population value may be higher. (The biased
estimator may have a smaller mean square error than the unbiased estimator.)
Resources and select the option "Datasets for Replicating Empirical Results." Then select the chapter 8
CPS data set (ch8_cps.xls) into a spreadsheet program such as Excel. For the exercise, use the first 500
observations only. Using data for average hourly earnings only (ahe) and years of education (yrseduc),
produce a scatterplot with earnings on the vertical axis and education level on the horizontal axis. What
kind of relationship does the scatterplot suggest? Confirm your impression by adding a linear trendline.
Find the correlation coefficient between the two and interpret it.
Answer:
Without the trendline added, there does not seem to be much of a linear relationship between average
hourly earnings and years of education. Perhaps a linear relationship is not plausible since it would
imply that the returns to education would become smaller as further years of education are added.
However, and regardless of the linearity issues, there is a positive relationship in the data between the
two variables, which becomes visible when the trend line is added. The correlation coefficient is positive
and has a value of 46.9%, which is reasonably high (the correlation between height and weight for college
students is approximately 50% by comparison).
22) IQ scores are normally distributed with an average of 100 and a standard deviation of 16. Some
research suggests that left-handed individuals have a higher IQ score than right-handed individuals. To
test this hypothesis, a researcher randomly selects 132 individuals and finds that their average IQ is 103.2
with a sample standard deviation of 14.6. Using the results from the sample, can you reject the null
hypothesis that left-handed people have an IQ of 100 vs. the alternative that they have a higher IQ? What
critical value should you choose if the size of the test is 5%?
Answer: The null hypothesis is H0: μ = 100 versus the alternative H1: μ > 100.
The test statistic is $t = \frac{103.2 - 100}{14.6/\sqrt{132}} = 2.52$. Since the critical value for the one-sided alternative is 1.645 at the
5% significance level, the researcher should reject the null hypothesis that left-handed individuals have
an IQ of 100.
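The test statistic and decision can be reproduced as follows, using the large-sample normal approximation for the one-sided critical value.

```python
from math import sqrt
from statistics import NormalDist

sample_mean, mu0 = 103.2, 100.0
sample_sd, n = 14.6, 132

standard_error = sample_sd / sqrt(n)
t_stat = (sample_mean - mu0) / standard_error

# One-sided (upper-tail) critical value at the 5% significance level
critical = NormalDist().inv_cdf(0.95)
reject = t_stat > critical

print(round(t_stat, 2), round(critical, 3), reject)
```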
Resources and select the option "Datasets for Replicating Empirical Results." Then select the "Test Score
data set used in Chapters 4-9" (caschool.xls) and open the Excel data set. Next produce a scatterplot of the
average reading score (horizontal axis) and the average mathematics score (vertical axis). What does the
scatterplot suggest? Calculate the correlation coefficient between the two series and give an
interpretation.
Answer:
The scatterplot suggests that, on average, schools which perform highly on the reading score will also
perform highly on the mathematics score. The sample correlation between the two series is 92.3%,
suggesting a high positive correlation between the two variables.
24) In 2007, a study of close to 250,000 18-19 year-old Norwegian males found that first-borns have an IQ
that is 2.3 points higher than those who are second-born. To see if you can find similar evidence at your
university, you collect data from 250 students, of which 140 are first-borns. After subjecting each of these
individuals to an IQ test, you find that the first-borns score 108.3 with a standard deviation of 13.2, while
the second borns achieve 107.1 with a standard deviation of 11.6. You hypothesize that first-borns and
second-borns in a university population have identical IQs against the one-sided alternative hypothesis
that first borns have higher IQs. Using a size of the test of 5%, what is your conclusion?
Answer: Given that your null hypothesis states that the two population means are equal, your test statistic is
$t = \frac{108.3 - 107.1}{\sqrt{\frac{13.2^2}{140} + \frac{11.6^2}{110}}} = 0.76$. Since the critical value for the one-sided alternative test is 1.64, you cannot reject
the null hypothesis.
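A short sketch of the calculation, using the standard error for the difference of two independent sample means (unequal variances) and the large-sample normal critical value.

```python
from math import sqrt
from statistics import NormalDist

# First-borns and second-borns: sample mean, sample standard deviation, sample size
mean_1, sd_1, n_1 = 108.3, 13.2, 140
mean_2, sd_2, n_2 = 107.1, 11.6, 250 - 140

se_diff = sqrt(sd_1**2 / n_1 + sd_2**2 / n_2)
t_stat = (mean_1 - mean_2) / se_diff

# One-sided (upper-tail) critical value at the 5% significance level
critical = NormalDist().inv_cdf(0.95)
reject = t_stat > critical

print(round(t_stat, 2), reject)
```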