Review Final Ex

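Compare the t-statistic or F-statistic (in absolute value) with the critical value c: the result is significant at the strictest level whose critical value it exceeds.
(Two-sided critical values for the t-statistic with large df: 1% = 2.576; 5% = 1.96; 10% = 1.645)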
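The comparison rule above can be sketched in a few lines of Python. The function name `significance_level` is an illustrative choice, and the critical values are the three quoted in the notes (two-sided, large-sample), so this is only a rough helper, not a replacement for exact p-values.

```python
# Two-sided large-sample critical values quoted in the notes.
CRITICAL_VALUES = [("1%", 2.576), ("5%", 1.96), ("10%", 1.645)]

def significance_level(t_stat):
    """Return the strictest level whose critical value |t_stat| exceeds, or None."""
    for level, critical in CRITICAL_VALUES:
        if abs(t_stat) > critical:
            return level
    return None

print(significance_level(-6.16))  # exceeds 2.576
print(significance_level(1.9))    # between 1.645 and 1.96
print(significance_level(1.3))    # below every critical value
```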

Chapter 6
Models with quadratics: these models capture decreasing or increasing marginal effects.
Adjusted R-squared:
• The adjusted R-squared imposes a penalty for adding new regressors.
• The adjusted R-squared increases if, and only if, the t-statistic of a newly added regressor is greater than one in absolute value.
• Relationship between R-squared and adjusted R-squared: $\bar{R}^2 = 1 - (1 - R^2)\,\frac{n-1}{n-k-1}$
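A minimal sketch of the relationship between R-squared and adjusted R-squared, assuming the standard textbook formula adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), with n observations and k regressors. The function name is illustrative; the example numbers are the ones reported for the WAGE1 regression in Question 1.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k regressors (plus an intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# WAGE1 regression in Question 1: R2 = 0.300, n = 526, k = 3 regressors.
print(round(adjusted_r_squared(0.300, 526, 3), 3))  # 0.296
```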
Question 1
Use the data in WAGE1 for this exercise.
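i. Use OLS to estimate the equation
$\log(wage) = \beta_0 + \beta_1\,educ + \beta_2\,exper + \beta_3\,exper^2 + u$
and report the results using the usual format.
Using Stata:
$\widehat{\log(wage)} = 0.128 + 0.0904\,educ + 0.041\,exper - 0.000714\,exper^2$
(0.106) (0.0075) (0.0052) (0.000116)
n = 526; $R^2$ = 0.300; adjusted $R^2$ = 0.296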
ii. Is exper² statistically significant at the 1% level?
t-stat = −0.000714/0.000116 = −6.16 => |t| > 2.576 => significant at 1%
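iii. Using the approximation
$\%\Delta \widehat{wage} \approx 100\,(\hat\beta_2 + 2\hat\beta_3\,exper)\,\Delta exper$
(where $\hat\beta_2$ and $\hat\beta_3$ are the estimated coefficients on exper and exper²), find the approximate return to the fifth year of experience. What is the approximate return to the twentieth year of experience?
Return to the fifth year of experience => we start at exper = 4 and increase by 1 (exper = 4; Δexper = 1)
$100\,[0.041 - 2(0.000714)(4)](1) \approx 3.53\%$

Return to the twentieth year of experience => we start at exper = 19 and increase by 1 (exper = 19; Δexper = 1)
$100\,[0.041 - 2(0.000714)(19)](1) \approx 1.39\%$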
iv. At what value of exper does additional experience actually lower predicted
log(wage)? How many people have more experience in this sample?
Turnaround point: exper* = 0.041/(2 × 0.000714) ≈ 28.7 years

• In the sample, there are 121 people with at least 29 years of experience.
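The marginal-effect and turnaround calculations for a quadratic can be checked with a short sketch. The function names are illustrative; the coefficients are the WAGE1 estimates on exper and exper² reported above.

```python
def marginal_effect(b1, b2, x):
    """Approximate change in y from a one-unit increase in x, starting at x,
    for a model y = ... + b1*x + b2*x**2."""
    return b1 + 2 * b2 * x

def turnaround_point(b1, b2):
    """Value of x at which the marginal effect b1 + 2*b2*x equals zero."""
    return -b1 / (2 * b2)

b_exper, b_exper2 = 0.041, -0.000714  # estimates from the WAGE1 regression

print(round(100 * marginal_effect(b_exper, b_exper2, 4), 2))   # fifth year, in %
print(round(100 * marginal_effect(b_exper, b_exper2, 19), 2))  # twentieth year, in %
print(round(turnaround_point(b_exper, b_exper2), 1))           # years of experience
```

The same `turnaround_point` call applies to any of the quadratic questions in these notes once the two coefficients are plugged in.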
Question 2
Using the data in RDCHEM, the following equation was obtained by OLS:
i. At what point does the marginal effect of sales on rdintens become negative?
Turnaround point: sales* = 0.0003/(2 × 0.000000007) ≈ 21,428.57 (sales are in millions of dollars, so about $21.4 billion)
ii. Would you keep the quadratic term in the model? Explain. (Keep it if it is significant.)
p-value < 0.1 => significant at 10% => can keep

iii. Define salesbil as sales measured in billions of dollars: salesbil = sales/1,000.
Rewrite the estimated equation with salesbil and salesbil² as the independent
variables. Be sure to report standard errors and the R-squared. [Hint: Note that
salesbil² = sales²/(1,000)².]
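Since sales = 1,000 × salesbil, the coefficient on salesbil is 0.0003 × 1,000 = 0.3 and the coefficient on salesbil² is 0.000000007 × 1,000² = 0.007. The standard errors rescale the same way (0.00014 × 1,000 = 0.14 and 0.0000000037 × 1,000² = 0.0037):
$\widehat{rdintens} = 2.613 + 0.3\,salesbil - 0.007\,salesbil^2$
(0.429) (0.14) (0.0037)
The R-squared is unchanged by the rescaling.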
iv. For the purpose of reporting the results, which equation do you prefer?
Prefer the equation from part (iii): same interpretation, with fewer zeros.
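A small sketch of the unit change from sales (millions) to salesbil (billions). The helper `rescale_quadratic` is a hypothetical name; the inputs are the coefficients and standard errors quoted in these notes for the sales equation. It shows that the t-statistics are unaffected by the rescaling.

```python
def rescale_quadratic(b1, b2, se1, se2, factor):
    """Coefficients and standard errors after replacing x by x / factor
    (so x**2 is replaced by (x / factor)**2)."""
    return b1 * factor, b2 * factor**2, se1 * factor, se2 * factor**2

# sales is in millions; salesbil = sales / 1000 is in billions.
b1, b2, se1, se2 = rescale_quadratic(0.0003, -0.000000007, 0.00014, 0.0000000037, 1000)
print(b1, b2, se1, se2)

# The t-statistic on the quadratic term is the same before and after rescaling.
print(b2 / se2, -0.000000007 / 0.0000000037)
```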
Question 3
The following model allows the return to education to depend upon the total amount of
both parents’ education, called pareduc:
$\log(wage) = \beta_0 + \beta_1\,educ + \beta_2\,educ \cdot pareduc + \beta_3\,exper + \beta_4\,tenure + u$
v. Show that, in decimal form, the return to another year of education in this model
is
$\Delta \log(wage) / \Delta educ = \beta_1 + \beta_2\,pareduc$
What sign do you expect for $\beta_2$? Why?
Holding other factors fixed (HOFF), $\beta_2$ is expected to be positive: the more highly educated the parents, the more a child gains from another year of education.
vi. Using the data in WAGE2, the estimated equation is
(Only 722 observations contain full information on parents’ education.)

Interpret the coefficient on the interaction term. It might help to choose two
specific values for pareduc—for example, pareduc = 32 if both parents have a
college education, or pareduc = 24 if both parents have a high school education—
and to compare the estimated return to educ.
The coefficient on the interaction term is significant at 1%, indicating that an additional year of education yields a larger increase in wage for children with more highly educated parents.
Compare two groups of parents (pareduc = 32 vs. 24):
• The difference in the estimated return to education = 0.00078 × (32 − 24) ≈ 0.0062, i.e. about 0.62 percentage points.
vii. When pareduc is added as a separate variable to the equation, we get:
Does the estimated return to education now depend positively on parent education? Test the null hypothesis that the return to education does not depend on parent education.
t-stat = 1.9 => significant at 10% but not at 5% (1.645 < 1.9 < 1.96)
t-stat on the interaction term = 1.3 => not significant even at 10% => we cannot reject the null that the return to education does not depend on parent education
Chapter 7
Interaction term with dummy variable
• Allowing for different slopes
Interaction term
• How to explain the coefficient on interaction term:
- if coefficient is insignificant, there is no difference in effect of education on wage
between men and women.
- if coefficient is negative and significant, the effect of education on wage for
women is weaker than for men.
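As a worked illustration of the two cases above, here is a sketch of a wage equation with a gender dummy interacted with education. The symbols $\delta_0$ and $\delta_1$ are labels chosen here for the dummy and interaction coefficients; they do not appear in the notes.

```latex
\begin{gather*}
\log(wage) = \beta_0 + \delta_0\,female + \beta_1\,educ + \delta_1\,female \cdot educ + u \\
\text{men } (female = 0):\quad \Delta\log(wage)/\Delta educ = \beta_1 \\
\text{women } (female = 1):\quad \Delta\log(wage)/\Delta educ = \beta_1 + \delta_1
\end{gather*}
```

Testing $H_0: \delta_1 = 0$ with the usual t-statistic is the test of equal slopes for men and women.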
Question 1
Consider the equation
$\widehat{colgpa} = 1.241 - 0.0569\,hsize + 0.00468\,hsize^2 - 0.0132\,hsperc + 0.00165\,sat + 0.155\,female + 0.169\,athlete$
Standard errors, in the same order: (0.079) (0.0164) (0.00225) (0.0006) (0.00007) (0.018) (0.042)
where colgpa is cumulative college grade point average; hsize is size of high
school graduating class, in hundreds; hsperc is academic percentile in graduating
class; sat is combined SAT score; female is a binary gender variable; and athlete
is a binary variable, which is one for student-athletes.
v. What is the estimated GPA differential between athletes and nonathletes? Is it
statistically significant?
t-stat = 0.169/0.042 ≈ 4 => statistically significant at 1%.
HOFF, athletes have a higher GPA than nonathletes (by 0.169 points).
vi. In the model, allow the effect of being an athlete to differ by gender and test the
null hypothesis that there is no ceteris paribus difference between women
athletes and women nonathletes (base group).
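$\widehat{colgpa} = 1.396 - 0.0568\,hsize + 0.00467\,hsize^2 - 0.0132\,hsperc + 0.00165\,sat + 0.175\,female\_ath + 0.013\,male\_ath - 0.155\,male\_nonath$
Standard errors, in the same order: (0.076) (0.0164) (0.00225) (0.0006) (0.00007) (0.084) (0.049) (0.018)
t-stat on female_ath = 0.175/0.084 ≈ 2 => significant at 5% => there is a difference between women athletes and women nonathletes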
Question 2
The following equations were estimated using the data in BWGHT:
and
The variables are defined as in Example 4.9, but we have added a dummy
variable for whether the child is male and a dummy variable indicating whether
the child is classified as white.
i. In the first equation, interpret the coefficient on the variable cigs. In particular,
what is the effect on birth weight from smoking 10 more cigarettes per day?
t-stat = 4.8 (in absolute value) => significant at 1%
HOFF, smoking 10 more cigarettes per day is expected to decrease birth weight by about 4.4%.
ii. How much more is a white child predicted to weigh than a nonwhite child,
holding the other factors in the first equation fixed? Is the difference statistically
significant?
t-stat = 4.23 => significant at 1%
HOFF, a white child is predicted to weigh about 5.5% more than a nonwhite child.
iii. Comment on the estimated effect and statistical significance of motheduc.

t-stat = 1 (in absolute value) => not significant
HOFF, if the mother has 1 more year of education, the child's birth weight is estimated to be about 0.3% lower.
iv. From the given information, why are you unable to compute the F statistic for
joint significance of motheduc and fatheduc? What would you have to do to
compute the F statistic?
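The two equations are estimated on different samples, so their R-squareds cannot be compared => take the sample used for the second equation and re-estimate equation 1 with the same observations, then compute the F statistic.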
Question 3
Using the data in GPA2, the following equation was estimated:
The variable sat is the combined SAT score; hsize is size of the student’s high
school graduating class, in hundreds; female is a gender dummy variable; and
black is a race dummy variable equal to one for blacks, and zero otherwise.
i. Is there strong evidence that hsize2 should be included in the model? From this
equation, what is the optimal high school size?
t-stat = 4 (in absolute value) => significant => hsize² should be included
Turnaround point: hsize* = 19.3/(2 × 2.19) ≈ 4.4 (hsize is in hundreds) => about 440 students
ii. Holding hsize fixed, what is the estimated difference in SAT score between
nonblack females and nonblack males? How statistically significant is this
estimated difference?
HOFF, the SAT score of nonblack females is about 45 points lower than that of nonblack males. t-stat = −10.51 => very statistically significant
iii. What is the estimated difference in SAT score between nonblack males and
black males? Test the null hypothesis that there is no difference between their
scores, against the alternative that there is a difference.
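HOFF, the SAT score of black males is about 169 points lower than that of nonblack males (the t-statistic on black is negative).
t-stat = −13.4 => reject the null of no difference at 1%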
iv. What is the estimated difference in SAT score between black females and
nonblack females? What would you need to do to test whether the difference is
statistically significant?
Chapter 8
- Generalized least squares estimation (GLS)
• Estimate OLS as usual -> obtain residuals -> square the residuals ($\hat{u}^2$)
• Run a regression using the squared residuals as the dependent variable on all independent variables.
• Obtain the fitted values ($\hat{h}$)
• Transform the model using weight = 1/$\hat{h}$
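The steps above can be sketched in plain Python for a single regressor. Two assumptions to flag: the helper names `fit_line` and `fgls` and the toy data are invented for illustration, and the variance regression uses log(û²) with ĥ = exp(fitted value), a common variant that keeps every ĥ positive, whereas the notes regress û² directly.

```python
import math

def fit_line(x, y, w=None):
    """Least-squares intercept and slope of y on x; optional weights w give WLS."""
    w = w or [1.0] * len(x)
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def fgls(x, y):
    b0, b1 = fit_line(x, y)                                  # step 1: OLS
    u2 = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]  # squared residuals
    g0, g1 = fit_line(x, [math.log(v) for v in u2])          # step 2: variance regression (log variant, an assumption)
    h = [math.exp(g0 + g1 * xi) for xi in x]                 # step 3: fitted variances, always positive
    weights = [1.0 / hi for hi in h]                         # step 4: weight = 1/h
    return fit_line(x, y, weights), h

# Toy data (hypothetical) whose spread grows with x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.1, 4.8, 7.4, 8.5, 11.8, 12.1, 16.0, 15.6]
(b0_wls, b1_wls), h = fgls(x, y)
print(b0_wls, b1_wls)
```

Observations with a small estimated variance get a large weight, so the weighted fit leans on the low-x points in this toy sample.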
- WLS Estimation for Linear Probability Model

• Estimate OLS as usual -> obtain fitted values ($\hat{y}$)
• Check whether all of the fitted values lie strictly inside the unit interval (0, 1)
• Construct the estimated variance $\hat{h} = \hat{y}(1 - \hat{y})$
• Estimate the equation using weight = 1/$\hat{h}$
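A minimal sketch of the weight construction for the linear probability model. The function name `lpm_wls_weights` and the sample fitted values are illustrative. The check mirrors the second step: if any fitted value falls outside (0, 1), ĥ is zero or negative and the weights are unusable.

```python
def lpm_wls_weights(fitted):
    """WLS weights 1/h with h = yhat * (1 - yhat) for a linear probability model."""
    if not all(0 < p < 1 for p in fitted):
        raise ValueError("some fitted values lie outside (0, 1); WLS weights are undefined")
    return [1.0 / (p * (1 - p)) for p in fitted]

# Hypothetical fitted probabilities from a first-stage OLS.
print(lpm_wls_weights([0.5, 0.2, 0.9]))
```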
1. Using the data in GPA3, the following equation was estimated for the fall and second
semester students:
Here, trmgpa is term GPA, crsgpa is a weighted average of overall GPA in
courses taken, cumgpa is GPA prior to the current semester, tothrs is total credit
hours prior to the semester, sat is SAT score, hsperc is graduating percentile in
high school class, female is a gender dummy, and season is a dummy variable
equal to unity if the student’s sport is in season during the fall. The usual and
heteroskedasticity-robust standard errors are reported in parentheses and
brackets, respectively.
i. Do the variables crsgpa, cumgpa, and tothrs have the expected estimated effects?
Which of these variables are statistically significant at the 5% level? Does it
matter which standard errors are used?
ii. Why does the hypothesis H0: β crsgpa =1 make sense? Test this hypothesis against
the two-sided alternative at the 5% level, using both standard errors. Describe
your conclusions.
iii. Test whether there is an in-season effect on term GPA, using both standard
errors. Does the significance level at which the null can be rejected depend on the
standard error used?
2. Use the data in HPRICE1 to obtain the heteroskedasticity-robust standard
errors for equation (8.17).
i. Discuss any important differences with the usual standard errors. (Do this the same way as Question 1 above.)
ii. Repeat part (i) for equation (8.18).
iii. What does this example suggest about heteroskedasticity and the
transformation used for the dependent variable?
Chapter 9
RESET Test for omitted variables
• when we lack important variables (low R-squared)
• process:
- predict yhat
- gen yhat2 = yhat^2
- gen yhat3 = yhat^3
- add yhat2 and yhat3 to the original model
- perform an F test for the joint significance of yhat2 and yhat3
- if we reject H0, yhat2 and yhat3 are jointly significant -> the functional form is misspecified (omitted variables)
- Hausman test for endogeneity
• Regress model (1), in which:
- dependent variable: the independent variable that is currently being tested for endogeneity
- independent variables: all other independent variables
- add at least 2 instrumental variables
• Predict the residuals of model (1)
• Add the residuals to the original model
• If the residuals are significant, the variable is endogenous.
- GMM model
• used to solve the problem of endogeneity
• after running the model, we need to check 2 tests:
- Hansen test: validity of the list of instrumental variables (p > 10%)
- Arellano-Bond test for AR(2): no serial correlation at second order (p > 10%)
Question 1
F-stat = [($R^2_{ur}$ − $R^2_r$)/q] / [(1 − $R^2_{ur}$)/(n − k − 1)] = [(0.375 − 0.353)/2] / [(1 − 0.375)/169] ≈ 2.97 > 2.3
• Reject H0 at 10% => the two newly added variables are jointly significant at 10%
• Model misspecification
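The R-squared form of the F statistic used above can be written as a small helper. The name `f_stat_r2` is illustrative; `df_ur` stands for n − k − 1 in the unrestricted model.

```python
def f_stat_r2(r2_ur, r2_r, q, df_ur):
    """F statistic for q exclusion restrictions, R-squared form.
    df_ur is n - k - 1 for the unrestricted model."""
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)

# Numbers from this question: R2_ur = 0.375, R2_r = 0.353, q = 2, df = 169.
print(round(f_stat_r2(0.375, 0.353, 2, 169), 2))  # 2.97
```

The formula only applies when the restricted and unrestricted models share the same dependent variable and the same observations.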
Question 2
i, lnchprg: the percentage of students eligible for the federally funded school lunch program.
Eligibility for the funded lunch program is closely linked to being in poverty, so the percentage
of students eligible for this program is a proxy for the percentage of students living in poverty.
ii,
iii,
iv,
v,
Question 3
i) Use CEOSAL1, define a dummy variable, rosneg, which is equal to one if ros < 0
and equal to zero otherwise. Then, estimate the model:
(ii) Apply RESET test to the estimated model. Is there evidence of functional form
misspecification in the equation?
Chapter 10
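Seasonality: the variable changes systematically from month to month.
Static model: x and y are dated at the same period t.
Distributed-lag model: y at period t depends on x at period t and on lagged values of x (t−1, t−2, ...).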
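For reference, a sketch of the two model types in equation form, using generic textbook symbols ($z_t$ for the regressor, $\delta_j$ for the lag coefficients) that are not taken from the notes:

```latex
\begin{gather*}
\text{Static model:}\quad y_t = \beta_0 + \beta_1 z_t + u_t \\
\text{Distributed lag of order one:}\quad y_t = \alpha_0 + \delta_0 z_t + \delta_1 z_{t-1} + u_t \\
\text{Long-run propensity:}\quad LRP = \delta_0 + \delta_1
\end{gather*}
```

The LRP sums the coefficients on the current and lagged values of the same variable, which is the calculation used for inflation in the wrap-up question on INTDEF.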
Question 1
Use the data in BARIUM for this exercise. Equation (10.22) is run and the result is
shown below:
(i) Add a linear time trend to equation (10.22). Are any variables, other than the
trend, statistically significant?
(ii) In the equation estimated in part (i), test for joint significance of all variables
except the time trend. What do you conclude?
(iii) Add monthly dummy variables to this equation and test for seasonality. Does
including the monthly dummies change any other estimates or their standard
errors in important ways?
Question 2
Use the data set CONSUMP for this exercise.
(i) Estimate a simple regression model relating the growth in real per capita
consumption (of nondurables and services) to the growth in real per capita
disposable income. Use the change in the logarithms in both cases. Report the
results in the usual form. Interpret the equation and discuss statistical
significance.
(ii) Add a lag of the growth in real per capita disposable income to the equation
from part (i). What do you conclude about adjustment lags in consumption
growth?
(iii) Add the real interest rate to the equation in part (i). Does it affect
consumption growth?
WRAP UP:
Question 1:
The following equation was estimated:
This equation allows roe to have a diminishing effect on log(salary). Is this generality
necessary?
Explain why or why not.
t-stat = 0.3 => not significant => not necessary
Question 2:
An equation explaining chief executive officer salary is

The data used are in CEOSAL1, where finance, consprod, and utility are binary
variables indicating the financial, consumer products, and utilities industries. The
omitted industry is transportation.
(i) Compute the approximate percentage difference in estimated salary between
the utility and transportation industries, holding sales and roe fixed. Is the
difference statistically significant at the 1% level?
HOFF, salary in the utility industry is about 28.3% lower than salary in the transportation industry.
|t-stat| = 2.85 > 2.576 => significant at 1%
(ii) What is the approximate percentage difference in estimated salary between
the consumer products and finance industries? Write an equation that would
allow you to test whether the difference is statistically significant.
Compare each industry with the base group (transportation) first.
The proportionate difference is 0.181 − 0.158 = 0.023, i.e. about 2.3%
Question 3:
Use the data in INTDEF, Model 1 is run to investigate the impact of inflation and
budget deficit on short-term interest rate and its result is as follows:
$\widehat{i3}_t = 1.73 + 0.606\,inf_t + 0.513\,def_t$
(0.43) (0.082) (0.118)
n = 56, $R^2$ = 0.602, adjusted $R^2$ = 0.587.
The variable i3 is the three-month T-bill rate, inf is the annual inflation rate based on the
consumer price index (CPI), and def is the federal budget deficit as a percentage of GDP.
(i) Check the statistical significance and interpret these coefficients.

t-stat ($inf_t$) = 0.606/0.082 ≈ 7.39 => significant at 1%
HOFF, if the inflation rate increases by 1 percentage point, the 3-month T-bill rate is expected to increase by
0.606 percentage points
t-stat ($def_t$) = 0.513/0.118 ≈ 4.35 => significant at 1%
HOFF, if def increases by 1 percentage point, the 3-month T-bill rate is expected to increase by 0.513 percentage points
(ii) The first lag of inf and def are added to Model 1, and the result is reported.
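$\widehat{i3}_t = 1.61 + 0.343\,inf_t + 0.382\,inf_{t-1} - 0.190\,def_t + 0.569\,def_{t-1}$
(0.40) (0.125) (0.134) (0.221) (0.197)
n = 55, $R^2$ = 0.685, adjusted $R^2$ = 0.660.
Are these two lags individually significant? Are they jointly significant?
t-stat ($inf_{t-1}$) = 0.382/0.134 ≈ 2.85; t-stat ($def_{t-1}$) = 0.569/0.197 ≈ 2.89 => each lag is individually significant at 1%
F-stat = [(0.685 − 0.602)/2] / [(1 − 0.685)/50] ≈ 6.587 => reject H0 => the lags are jointly significant (this treats the two R-squareds as comparable, although the samples differ by one observation)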
(iii) Compare the estimated LRP (long-run propensity) for the effect of inflation
with that in the first equation. Are they vastly different?
LRP = 0.343 + 0.382 = 0.725
They are not vastly different (0.725 − 0.606 = 0.119). The inflation coefficients are still significant at 1%.
Question 4:
The following model is run and tested.
What conclusion do you make based on this result?
