ps4 Fall2015
Columbia University
W3412
Fall 2015
Problem Set 4
Introduction to Econometrics
Profs. Seyhan Erden and Miikka Rokkanen (for all sections)
Part I.
True, False, or Uncertain? Explain your answer.
(a) One can still use a linear regression framework even if the relation between a regressor and
the dependent variable is not linear.
(b) Including an interaction term between two independent variables, $X_1$ and $X_2$, allows for the
measurement of the effect of a unit increase in both $X_1$ and $X_2$, above and beyond the sum of the
individual effects of a unit increase in the two variables alone.
(c) To decide whether $Y_i = \beta_0 + \beta_1 X_i + u_i$ or $\ln(Y_i) = \beta_0 + \beta_1 X_i + u_i$ fits the data better,
you should examine the regression $R^2$.
Part II.
1. Consider the following multiple regression model:
$Y_i = \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + u_i$
(a) Fill in the table with the necessary numbers; some will appear in the STATA output, and some you
will need to calculate yourself.
(b) Common sense predicts that your high school GPA (hsGPA) and the number of classes you
skipped (skipped) are determinants of your college GPA (colGPA). Use regression (2) to test
the hypothesis (at the 5% significance level) that the coefficients on these two variables
are both zero, against the alternative that at least one coefficient is nonzero.
(c) Find the F-statistic for regression (3) and explain what it is testing.
(d) Find the F-statistic for regression (4) and explain what it is testing.
(e) Are bgfriend (whether you have a boyfriend/girlfriend) and campus (whether you live on campus)
jointly significant determinants of college GPA? Use regressions (2) and (4) to test this
hypothesis; that is, use the homoskedasticity-only F-statistic formula (eq. 7.14 in the book)
instead of testing directly with STATA.
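As a sketch of the computation in (e), assuming eq. 7.14 is the R-squared form of the homoskedasticity-only F-statistic, F = [(R²_unrestricted - R²_restricted)/q] / [(1 - R²_unrestricted)/(n - k_unrestricted - 1)], the formula can be written as a small Python function. The function name and the numbers in the example are hypothetical and are not taken from Table 2.

```python
def homoskedasticity_only_f(r2_unrestricted: float, r2_restricted: float,
                            q: int, n: int, k_unrestricted: int) -> float:
    """Homoskedasticity-only F-statistic in its R-squared form (assumed eq. 7.14).

    r2_unrestricted: R^2 of the regression that includes the tested regressors
    r2_restricted:   R^2 of the regression that drops them (imposes the null)
    q:               number of restrictions (coefficients set to zero)
    n:               number of observations
    k_unrestricted:  number of regressors in the unrestricted regression,
                     not counting the intercept
    """
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1.0 - r2_unrestricted) / (n - k_unrestricted - 1)
    return numerator / denominator


# Hypothetical inputs (NOT from Table 2): q = 2 restrictions, n = 141,
# and 5 regressors in the unrestricted regression.
f_stat = homoskedasticity_only_f(0.30, 0.25, q=2, n=141, k_unrestricted=5)
print(round(f_stat, 3))
```

The result is compared with a critical value of the F(q, ∞) distribution; at the 5% level this is 3.00 for q = 2.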
Table 1
Definitions of Variables in GPA4.dta (data are from the Wooldridge textbook)

Variable    Definition
colGPA      Cumulative college grade point average of a sample of 141 students at
            Michigan State University in 1994.
hsGPA       High school GPA of students.
skipped     Average number of classes skipped per week.
PC          = 1 if the student owns a personal computer; = 0 otherwise.
bgfriend    = 1 if the student answered yes to the question about having a
            boyfriend/girlfriend; = 0 otherwise.
campus      = 1 if the student lives on campus; = 0 otherwise.
Table 2
College GPA Results
Dependent variable: colGPA

Regressor                      (1)      (2)      (3)      (4)
hsGPA
skipped
PC                             __
bgfriend                       __       __
campus                         __       __       __
Intercept

F-statistics testing the hypothesis that the population coefficients on the indicated regressors are
all zero:
hsGPA, skipped
hsGPA, skipped, PC             __
bgfriend, campus               __       __       __

R²
Adjusted R² (R̄²)
Regression RMSE
n
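Two of the Table 2 rows can be computed by hand from other quantities: adjusted R² = 1 - [(n - 1)/(n - k - 1)](1 - R²), and the regression RMSE (STATA's "Root MSE") = sqrt(SSR/(n - k - 1)), where k counts regressors excluding the intercept. A minimal sketch follows; the function names and inputs are hypothetical, not values from the GPA regressions.

```python
import math


def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a regression with k regressors (excluding the intercept)."""
    return 1.0 - (n - 1) / (n - k - 1) * (1.0 - r2)


def regression_rmse(ssr: float, n: int, k: int) -> float:
    """Standard error of the regression: sqrt(SSR / (n - k - 1))."""
    return math.sqrt(ssr / (n - k - 1))


# Hypothetical inputs (NOT from Table 2): R^2 = 0.30, SSR = 27, n = 141, k = 5.
print(round(adjusted_r2(0.30, n=141, k=5), 4))
print(round(regression_rmse(27.0, n=141, k=5), 4))
```

Adjusted R² is always below R² when k ≥ 1, because it penalizes each added regressor.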
Table 3
Teaching Ratings
Dependent variable: Course_eval

Regressor
(standard errors below)               (1)      (2)      (3)      (4)
beauty
female
minority                              __
nnenglish                             __
intro                                 __       __
onecredit                             __       __
age                                   __       __       __
intercept

F-statistics testing the hypothesis that the population coefficients on the indicated regressors are
all zero:
beauty, female, minority              __
intro, onecredit                      __       __
intro, age                            __       __       __
beauty, female, minority, nnenglish   __
minority, age                         __       __       __

R²
Adjusted R² (R̄²)
Regression RMSE
n
4. The Lawsch85 data set was collected by Kelly Barnett, an MSU economics student, for use in a
term project. The data come from two sources: The Official Guide to U.S. Law Schools,
1986, Law School Admission Services, and The Gourman Report: A Ranking of Graduate
and Professional Programs in American and International Universities, 1995, Washington,
D.C.
(a) Regress salary on north, south, east, and west to analyze the effect of region on the salaries of
law school graduates. What is wrong with this regression? Why can you not run it?
(b) How would you correct the problem in part (a)?
(c) Interpret the coefficient on east under your correction strategy in part (b).
5. Does the separation of corporate control from corporate ownership lead to inflated executive
salaries and worse firm performance? George Stigler and Claire Friedland have addressed
these questions empirically using a sample of firms.¹ A subset of their data is in the file
execcomp.dta. The variables in the file are described in Table 4.
Table 4
Definitions of Variables in execcomp.dta

Variable    Definition
ecomp       Average total compensation, in thousands of dollars, of a firm's top three
            executives.
assets      Firm's assets, in millions of dollars.
profits     Firm's annual profits, in millions of dollars.
mcontrol    Dummy variable indicating management control of the firm:
            = 1 for management-controlled firms;
            = 0 for ownership-controlled firms.
(a) Regress executives' compensation on the firm's assets and profits, the control dummy, and
an intercept term. What proportion of the variation in top executives' compensation in this
sample is accounted for by these variables?
(b) If the firm's profits rise by one million dollars, by how much do you estimate the top
executives' average compensation will change, holding assets and the form of control
fixed?
(c) What is the estimated difference between the expected average compensation of top
executives in management-controlled firms and that in ownership-controlled firms, holding
assets and profits fixed?
(d) Regress firm profits on firm assets and the management-control dummy. How much of the
variation in firms' profits in this sample can be accounted for by the variation in firms'
assets and the form of control?
(e) Are the empirical results in (a) and (d) consistent with the claim that management control
hurts firm performance and leads to higher pay for executives?
¹ George J. Stigler and Claire Friedland, "The Literature of Economics: The Case of Berle and Means,"
Journal of Law and Economics 26, no. 2 (June 1983): 237-268.
6. Consider the following STATA output on college distances. The dataset contains data from a
random sample of high school seniors interviewed in 1980 and re-interviewed in 1986. In
this exercise you will use these data to investigate the relationship between the number of
completed years of education for young adults and the distance from each student's high
school to the nearest four-year college. The variable ed is years of education, and dist is
the distance to the nearest college, measured in tens of miles (for example, dist = 3 means
that the senior's high school is 30 miles from the nearest college).
. reg ed dist, robust

Linear regression                                      Number of obs =    3796
                                                       F(  1,  3794) =   29.83
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0074
                                                       Root MSE      =  1.8074

------------------------------------------------------------------------------
             |               Robust
          ed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        dist |  -.0733727   .0134334    -5.46   0.000    -.0997101   -.0470353
       _cons |   13.95586   .0378112   369.09   0.000     13.88172    14.02999
------------------------------------------------------------------------------
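The quantities asked for in (a), (b), and (d) follow from the coefficient table. A minimal sketch of the three calculations, assuming a large sample so the standard normal critical value (2.576 for a two-sided 99% interval) applies; the function names and inputs are hypothetical and deliberately differ from the estimates above. Remember that dist is in tens of miles, so distances must be converted to those units before use.

```python
def predict(intercept: float, slope: float, x: float) -> float:
    """Fitted value b0 + b1 * x; x must be in the regressor's own units."""
    return intercept + slope * x


def diff_ci(slope: float, se_slope: float, delta_x: float, z: float = 2.576):
    """Confidence interval for the predicted difference b1 * delta_x.

    The intercept cancels when two predictions are differenced, so
    SE(b1 * delta_x) = |delta_x| * SE(b1). The default z = 2.576 is the
    large-sample two-sided 99% critical value from the standard normal.
    """
    center = slope * delta_x
    half_width = z * abs(delta_x) * se_slope
    return center - half_width, center + half_width


def rescale_regressor(slope: float, se_slope: float, c: float):
    """Slope and SE after replacing x with c * x: both are divided by c.

    Fitted values and residuals do not change, so the intercept, t-statistic,
    p-value, R-squared, and Root MSE stay the same.
    """
    return slope / c, se_slope / c


# Hypothetical inputs, NOT the coefficients in the output above.
print(predict(10.0, -0.5, 2.0))
print(diff_ci(-0.5, 0.1, 2.0))
print(rescale_regressor(-0.5, 0.1, 2.0))
```

Rescaling the regressor also divides the confidence-interval endpoints by c, while everything computed from the fitted values is unaffected.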
(a) A student's high school is 18 miles from the nearest college. Estimate the number of
years of schooling this student completed.
(b) Compute the 99% confidence interval for the difference in predicted years of
education between a high school senior whose high school is 93 miles from the nearest college
and a student who attends a high school that shares a campus with a college. Explain in
one sentence what your result means.
(c) Does distance to the nearest college explain a lot of the variation in educational
attainment? Explain.
(d) Suppose distance were measured in kilometers, where 10 miles = 16 kilometers.
Replicate the entire STATA output.
(e) Interpret the coefficient on tuition in the regression below, where the dependent variable, led,
is the natural logarithm of years of education. Give one good explanation for your answer.
(Note that tuition is measured in thousands of dollars.)
Linear regression                                      Number of obs =    3796
                                                       F(  3,  3792) =  151.91
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1001
                                                       Root MSE      =  .12236

------------------------------------------------------------------------------
             |               Robust
         led |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     tuition |   .0158511   .0069175     2.29   0.022     .0022887    .0294135
     momcoll |   .0474716   .0063938     7.42   0.000      .034936    .0600071
     dadcoll |   .0749874   .0055234    13.58   0.000     .0641583    .0858164
       _cons |   2.582142   .0065834   392.22   0.000     2.569234    2.595049
------------------------------------------------------------------------------
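In a log-level regression such as the one for led, a coefficient b1 is usually read as a percentage change in Y for a one-unit change in X: approximately 100·b1 percent, or exactly 100·(exp(b1) - 1) percent. A minimal sketch follows; the function name and the coefficient value are hypothetical, not the tuition estimate above. Because tuition is in thousands of dollars, delta_x = 1 corresponds to a $1,000 change.

```python
import math


def log_level_effect(beta: float, delta_x: float = 1.0):
    """Percentage change in Y implied by ln(Y) = b0 + b1*X + ... when X rises by delta_x.

    Returns (approximate, exact): approximate = 100 * b1 * delta_x and
    exact = 100 * (exp(b1 * delta_x) - 1). They agree closely when b1 * delta_x is small.
    """
    change = beta * delta_x
    return 100.0 * change, 100.0 * (math.exp(change) - 1.0)


approx, exact = log_level_effect(0.10)  # hypothetical coefficient
print(round(approx, 2), round(exact, 2))
```

For coefficients as small as those in the output above, the approximate and exact percentages are nearly identical.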
The following questions will not be graded; they are for practice and will be discussed in
recitation:
7. SW Empirical Exercise 6.3
8. SW Exercise 7.1
9. SW Exercise 7.4
10. SW Empirical Exercise 7.1