Metrics Aug2020
August 2020
The answers should be presented in terms of equations, statistical details, and with necessary
proofs and statistical deduction. Verbal and brief descriptive discussions will not suffice.
PART A
(Answer any TWO from Part A)
Econometrics Comprehensive Exam – August 2020
Question 3
Table 1
Source    |         SS       df          MS     | Number of obs = blank
          |                                     | F( blank)     = blank
Model     | 2.1523e+12        4  5.3807e+11     | Prob > F      = 0.000
Residual  | 1.1246e+14   83,081  1.3536e+09     | R-squared     = blank
Table 2
Source    |         SS       df          MS     | Number of obs = ??????
          |                                     | F( 5, 12396)  = ??????
Model     | 2.5761e+12        5  5.1522e+11     | Prob > F      = 0.000
Residual  | 1.1203e+14   83,080  1.3485e+09     | R-squared     = ??????
a. Fill in the missing values ?????? in Table 2. (There are 8 numbers to calculate.)
b. Define and interpret R-squared.
c. Explain the relationship between F and R-squared.
d. Define and interpret the 95% CI for “Family Size.”
e. Interpret the coefficient on “Married.”
f. Calculate the marginal effect of being one year older than the mean age
(40.54746) of the sample.
COMPARING TABLES 1 & 2 (part g)
g. Why does adding “Married” create a relatively large change in the coefficients
on Family Size and Family Size^2 but cause a relatively small change in the
coefficients on Age and Age^2?
MORE (part h)
h. If you were going to add a variable to this regression, what variable would you
add? Explain the variable choice (why?), what the expected sign would be, and
how you interpret the coefficient on your newly added variable.
*Did you calculate the missing numbers (??????) in Table 2 (part a)?
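The header statistics asked for in part (a) follow from the ANOVA block alone. A minimal sketch, assuming an OLS regression with an intercept and using only the Model and Residual rows printed in Table 2. The variable names are illustrative, and the residual degrees of freedom (83,080) are taken from the table body rather than from the printed F( 5, 12396) header, which does not match them:

```python
# ANOVA rows of Table 2 (values as printed in the table)
ss_model, df_model = 2.5761e12, 5
ss_resid, df_resid = 1.1203e14, 83080

# Number of observations: model df + residual df + 1 (assumes an intercept)
n_obs = df_model + df_resid + 1

# Total sum of squares and R-squared
ss_total = ss_model + ss_resid
r_squared = ss_model / ss_total

# F-statistic as the ratio of mean squares
ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid
f_stat = ms_model / ms_resid

# The same F written in terms of R-squared (the link asked about in part c)
f_from_r2 = (r_squared / df_model) / ((1 - r_squared) / df_resid)

# Adjusted R-squared and root MSE
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_resid
root_mse = ms_resid ** 0.5

print(n_obs, round(r_squared, 4), round(f_stat, 1), round(root_mse, 1))
```

The two F expressions agree because dividing numerator and denominator of MS_model / MS_resid by the total sum of squares gives the R-squared form.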
Question 4:
4. Suppose $\tilde{y}$ is an unobserved latent variable that measures an individual’s economic productivity
(which can be proxied by hourly earnings), such that:
$\tilde{y} = x\beta + \varepsilon$, where $\varepsilon \sim N(0, \sigma^2 I)$
However, you do not observe earnings in your data. You only observe $y_i$, which indicates
whether an individual is working or not: $y_i = 1$ if an individual participates in the labor force and
$y_i = 0$ otherwise. An individual participates in the labor force if he/she is able to earn wages
above some reservation wage, $w$.
Define $\phi(\theta)$ as the pdf for a standard normal and $\Phi(\theta)$ as the cdf for the standard normal.
Note: $\frac{\partial \Phi(z)}{\partial \theta} = \phi(z) \frac{\partial z}{\partial \theta}$
a. Define $y_i$ in terms of $\tilde{y}_i$ and $w$.
d. Derive the contribution of each individual in your sample to the overall likelihood
function (i.e., derive $L_i(\theta)$) and the individual log-likelihood function.
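The participation setup above can be sketched numerically. This is a minimal illustration, not a model answer: it assumes $y_i = 1$ when $\tilde{y}_i > w$, so that $P(y_i = 1) = \Phi((x_i\beta - w)/\sigma)$, and the function names and example values are hypothetical.

```python
import math

def std_normal_cdf(z):
    """Standard normal cdf, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_loglik_i(y_i, x_i, beta, sigma=1.0, w=0.0):
    """Log-likelihood contribution of one individual.

    Assumes y_i = 1 when the latent wage x_i'beta + eps exceeds w,
    so P(y_i = 1) = Phi((x_i'beta - w) / sigma).
    """
    xb = sum(x * b for x, b in zip(x_i, beta))
    p_work = std_normal_cdf((xb - w) / sigma)
    return y_i * math.log(p_work) + (1 - y_i) * math.log(1.0 - p_work)

# Hypothetical individual: a constant plus one regressor, with x_i'beta = 0
x_i, beta = [1.0, 2.0], [0.5, -0.25]
print(probit_loglik_i(1, x_i, beta), probit_loglik_i(0, x_i, beta))
```

Only the ratios $\beta/\sigma$ and $w/\sigma$ enter the probability, which is why the sketch defaults to `sigma=1.0`.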
Now suppose you observe the earnings of each individual only if he/she participates in the labor
market, such that $y_i = \tilde{y}_i$ if the individual participates in the labor market. However, if the
individual does not participate in the labor market, then all you know is that she/he is not
working but not what her/his earnings would have been if she/he did participate. So, for all
individuals not participating in the labor market, $y_i = 0$.
Note: $\frac{\partial \phi(z_i)}{\partial \theta} = -z_i \phi(z_i) \frac{\partial z_i}{\partial \theta}$
f. Derive the probability that you observe each individual $i$. Assume $w = 0$ from this point
forward.
g. Derive the contribution of each individual in your sample to the overall likelihood
function (i.e., derive $L_i(\theta)$) and the individual log-likelihood function.
i. Explain what is implied by the simplified form of the Score function (i.e., what is the
implied orthogonality condition).
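For the censored case with $w = 0$, the two kinds of observations contribute differently to the likelihood. A minimal sketch, assuming the standard censored-regression (Tobit) form in which participants contribute the normal density of their observed earnings and non-participants contribute $\Phi(-x_i\beta/\sigma)$. The function names and example values are hypothetical:

```python
import math

def std_normal_pdf(z):
    """Standard normal pdf, phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    """Standard normal cdf, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_loglik_i(y_i, x_i, beta, sigma):
    """Log-likelihood contribution of one individual, censoring at w = 0."""
    xb = sum(x * b for x, b in zip(x_i, beta))
    if y_i > 0:
        # Participant: density of the observed wage, (1/sigma) * phi(z_i)
        z = (y_i - xb) / sigma
        return math.log(std_normal_pdf(z)) - math.log(sigma)
    # Non-participant: P(latent wage <= 0) = Phi(-x_i'beta / sigma)
    return math.log(std_normal_cdf(-xb / sigma))

# Hypothetical values: one participant and one non-participant
print(tobit_loglik_i(2.0, [1.0], [2.0], 1.0), tobit_loglik_i(0.0, [1.0], [0.0], 1.0))
```

Unlike the binary case, $\sigma$ is identified here because the level of observed earnings carries information about the scale of the error.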
Question 5
5. Suppose the government is concerned about the increasing rate of adolescent vaping in the U.S. and
is thinking about implementing a tax on vape juice to combat it. However, first the government
would like to know how responsive teen vaping is to changes in price and is asking you to
estimate the vape juice demand function among the teenaged population.
where $q_j^d$ is teen demand for vape juice in county $j$, $p_j$ is the price, and $\varepsilon_j$ is the error term,
which also includes demand shifters.
a. Define and explain the five assumptions required to interpret an Ordinary Least Squares
(OLS) estimate of a slope parameter as “BLUE.”
b. When the exogeneity assumption fails, we say that the estimate is endogenous. What
are the four sources of endogeneity bias we discussed in class? Explain each of them.
c. If you were to estimate equation (1) using OLS, which of the OLS assumptions is likely to fail?
Explain why. What does this mean for your estimate of the slope of your demand
equation?
where $q_j^s$ represents vape juice supply in county $j$, $p_j$ is the price, and $u_j$ is the error term,
which also includes supply shifters.
Note: supply and demand shifters are independent; thus, $\operatorname{cov}(\varepsilon_j, u_j) = 0$.
d. Assuming market equilibrium (i.e., $q_j^d = q_j^s = q_j$), using equations (1) and (2), derive an
expression for $p_j$ as a function of the slope parameters and error terms. How does this
expression relate to the failed OLS assumption you named in (c)? What is the underlying
source of this failure?
e. Using the expression for $p_j$ derived in (d), derive an expression for $\operatorname{cov}(p_j, \varepsilon_j)$ as a
function of the slope parameters and the variance of $u_j$. What is the sign of $\operatorname{cov}(p_j, u_j)$?
What does this mean about the correlation between price and the error term? Hint:
substitute the expression you derived in (d) for $p_j$. Another hint:
$\operatorname{cov}(aX + bY, X) = \operatorname{cov}(aX, X) + \operatorname{cov}(bY, X) = a\operatorname{cov}(X, X) + b\operatorname{cov}(Y, X) = a\operatorname{var}(X) + b\operatorname{cov}(Y, X)$.
f. Now show that $\operatorname{cov}(p_j, q_j)$ can be written as follows:
$\operatorname{cov}(p_j, q_j) = \gamma_1 \operatorname{var}(p_j) + \operatorname{cov}(p_j, \varepsilon_j)$.
To do so, use the expression for demand denoted by equation (1) and
substitute it for $q_j$ in $\operatorname{cov}(p_j, q_j)$. Note that you do not need anything you derived in (e) to
answer this question.
g. The probability limit of the OLS estimate for $\gamma_1$ is as follows:
$\operatorname{plim}(\gamma_1^{ols}) = \frac{\operatorname{cov}(p_j, q_j)}{\operatorname{var}(p_j)}$.
Using the expression for $\operatorname{cov}(p_j, q_j)$ introduced in (f), calculate the asymptotic bias of
$\operatorname{plim}(\gamma_1^{ols})$ (i.e., calculate $\operatorname{plim}(\gamma_1^{ols}) - \gamma_1$). Given what we know from part (e), what is
the sign of this bias? What does this mean about the OLS estimate of $\gamma_1$ (i.e., does OLS
over- or underestimate $\gamma_1$)?
h. The nicotine found in vape juice is sourced from tobacco. China produces approximately
40% of the world’s tobacco. Suppose you decide to instrument for $p_j$ in equation (1),
using growing season temperature in the tobacco producing regions of China as your
instrument. Do you think this is a valid instrument for vape juice price? Explain why or
why not and be sure to include a discussion of each of the requirements for an
instrument to be valid.
i. Assume that temperature is a valid instrument for $p_j$ in equation (1). Denote
temperature as Z. Using the variable names in equation (1) and Z, describe the two-
stage least squares process. Clearly define your first and second stages. Derive an
expression for $\gamma_1^{IV}$ based on these two stages.
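The simultaneity argument in parts (d) through (i) can be checked with a small simulation. A minimal sketch under stated assumptions: demand is taken to be linear with slope $\gamma_1$ as in part (f), while the intercept name `gamma0`, the supply parameters `delta0`, `delta1`, `pi_z`, and all numeric values are hypothetical, since the exam's equations (1) and (2) are not reproduced here. The instrument `z` shifts supply only.

```python
import random

random.seed(0)

# Hypothetical structural parameters (not taken from the exam)
gamma0, gamma1 = 10.0, -1.0   # demand: q = gamma0 + gamma1 * p + eps
delta0, delta1 = 2.0, 1.0     # supply: q = delta0 + delta1 * p + pi_z * z + u
pi_z = 1.0                    # effect of the instrument z on supply

def cov(a, b):
    """Sample covariance (population normalisation)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

n = 50_000
z = [random.gauss(0, 1) for _ in range(n)]
eps = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]

# Equilibrium price: equate demand and supply, then solve for p
p = [(delta0 - gamma0 + pi_z * zj + uj - ej) / (gamma1 - delta1)
     for zj, uj, ej in zip(z, u, eps)]
q = [gamma0 + gamma1 * pj + ej for pj, ej in zip(p, eps)]

ols = cov(p, q) / cov(p, p)   # converges to gamma1 + cov(p, eps) / var(p)
iv = cov(z, q) / cov(z, p)    # 2SLS with one instrument reduces to this ratio

print(round(ols, 3), round(iv, 3))
```

With these values the price responds positively to demand shocks, so the OLS slope is pulled above the true $\gamma_1 = -1$, while the ratio of covariances with `z` recovers it up to sampling noise.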
Question 6
6. You are examining the correlation between witnessing domestic violence among parents and an
individual’s experience of domestic violence as an adult, otherwise known as the
intergenerational correlation of intimate partner violence (IPV). The Philippines is an interesting place
to study IPV because reported female-perpetrated IPV there is as common as
male-perpetrated IPV. You are therefore interested in comparing the intergenerational
correlation of IPV across men and women. Using data on 477 Filipino men and women who are
all either married or cohabitating, you estimate the following model:
$IPVindex = \beta_0 + \beta_1 PV + \beta_2 male + \beta_3 (PV \times male) + \varepsilon$   (3)
where
$IPVindex$ = an index of the level of violence experienced by individual $i$ perpetrated by
her/his partner in the last year. The index increases with the level of violence.
$PV$ = 1 if individual $i$ remembers violence between her/his parents as a child and 0
otherwise.
$male$ = 1 if individual $i$ is male and 0 if female.
You run an OLS regression on equation (3). The results of this regression are reported in column
1 of Table 1 below.
                                          Coefficients
                                      Model 1        Model 2
Witnessed Parental Violence           0.2662831      0.154842
                                     (0.0740957)    (0.057451)
Male                                  0.1531792      0.0224423
                                     (0.0802545)    (0.0584512)
Witnessed Parental Violence X Male   -0.2754482
                                     (0.1164904)
Constant                             -0.1318646     -0.0814228
                                     (0.04985)      (0.0452727)
N                                     477            477
R-squared                             0.0271         0.0156
Standard errors are in parentheses
Why is equation (3) referred to as the unrestricted model and equation (4) the
restricted model?
f. Test the hypothesis that the restricted model is the correct model by constructing an F-
statistic, which tests that the two models are actually statistically equivalent. Clearly walk
through the steps of this test and state the conclusion of the test.
g. The F-statistic you calculated in part (f) has a p-value of 0.0185. What does
this p-value tell you about your null hypothesis?
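The restricted-versus-unrestricted comparison in part (f) can be sketched from the reported R-squared values. This assumes Model 2 in the table is the restricted model (equation (3) without the interaction term) and that both models share the same dependent variable. The R-squared values are rounded to four decimals, so the result is approximate:

```python
# Values reported in the regression table
r2_u, r2_r = 0.0271, 0.0156   # unrestricted (Model 1), restricted (Model 2)
n, k_u, q = 477, 3, 1         # observations, slopes in Model 1, restrictions

# F = [(R2_u - R2_r) / q] / [(1 - R2_u) / (n - k_u - 1)]
f_stat = ((r2_u - r2_r) / q) / ((1 - r2_u) / (n - k_u - 1))

# With a single restriction, F equals the squared t-statistic on the interaction
t_interaction = -0.2754482 / 0.1164904
print(round(f_stat, 2), round(t_interaction ** 2, 2))
```

Both numbers come out at roughly 5.59, which is consistent with the p-value of 0.0185 quoted in part (g) for an F distribution with 1 and 473 degrees of freedom.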
Question 7
Q7. Nepal Study Center is planning to conduct a study to help a clinic in a rural village in
Nepal’s Gulmi District implement a micro health insurance program. It plans to use a
dichotomous choice experiment design to carry out the study. The plan is to sample 420
households randomly from the three communities that lie around the clinic (its catchment
area). Each community has nine wards. The sampling will use a proportional sampling
design representing all the wards from each of the communities. The households
are presented with options to enroll in one of three micro health insurance plans: Basic (clinic
visits), General (clinic visits + pharmacy), Comprehensive (clinic visits + pharmacy + minor
surgery). The three alternatives are presented below:
c=(Comprehensive, General, Basic)
We would expect a person’s utility related to each of the three alternatives to be a function of
both personal characteristics (such as income, age, etc.) and characteristics of the health care
plan (such as its price/premium).
We collected data that look like the table below: person’s age (divided by 10), the person’s
household income (in Rs10,00 / month), and the price of a plan (in Rs100 / 6 months). The first
three cases from the data are shown below. It is in the long form.
     HHid  MH_Alt         ch Choice  hhinc  age  Premium
1    1     Comprehensive  1          3.66   2.1  2
2    1     General        0          3.66   2.1  1
3    1     Basic          0          3.66   2.1  0.5
4    2     Comprehensive  0          3.75   4.2  2
5    2     General        1          3.75   4.2  1
6    2     Basic          0          3.75   4.2  0.5
7    3     Comprehensive  0          2.32   2.4  2
8    3     General        0          2.32   2.4  1
9    3     Basic          1          2.32   2.4  0.5
Additionally, we will also collect information on the following variables: Receive Remittance
(yes/no), No Of Children, No of Clinic Visits Per Six Month, and Distance to Clinic (minutes of
walking distance). These variables are not shown in the table to save space.
Taking the first case (id==1), we see that the case-specific variables hhinc, age, Remittance,
NoChildren, and Distance are constant across alternatives, whereas the alternative-specific
variable price varies over alternatives.
The variable MHalt (micro health insurance alternatives) labels the alternatives, and the binary
variable choice indicates the chosen alternative (it is coded 1 for the chosen plan, and 0
otherwise).
Q1.1. For simplicity, consider only three variables for model set up (age, income, and price).
A) Set up a Random Utility Model (RUM). Show all the steps.
B) Present the corresponding data table in the wide form as a setup for long-hand MLE
coding.
C) Present the log likelihood function. Show all the steps.
(You may assume that the income and age have the same impact on the choice functions.)
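The wide form and the resulting log likelihood can be sketched for the three households shown in the data table. A minimal illustration under stated assumptions: utility is linear, Basic is the base alternative, and income and age each enter the two non-base alternatives with one common coefficient (one possible reading of the simplifying assumption above). The names `b_price`, `b_inc`, and `b_age` are hypothetical.

```python
import math

ALTS = ["Comprehensive", "General", "Basic"]
PREMIUM = {"Comprehensive": 2.0, "General": 1.0, "Basic": 0.5}

# Wide form: one row per household (the three cases shown in the data table)
wide = [
    {"hhid": 1, "hhinc": 3.66, "age": 2.1, "chosen": "Comprehensive"},
    {"hhid": 2, "hhinc": 3.75, "age": 4.2, "chosen": "General"},
    {"hhid": 3, "hhinc": 2.32, "age": 2.4, "chosen": "Basic"},
]

def utility(row, alt, params):
    """Systematic utility V_ij; Basic is the base alternative."""
    v = params["b_price"] * PREMIUM[alt]
    if alt != "Basic":
        v += params["b_inc"] * row["hhinc"] + params["b_age"] * row["age"]
    return v

def loglik(params):
    """Sum over households of V_chosen - log(sum_j exp(V_j))."""
    total = 0.0
    for row in wide:
        v = {alt: utility(row, alt, params) for alt in ALTS}
        log_denominator = math.log(sum(math.exp(x) for x in v.values()))
        total += v[row["chosen"]] - log_denominator
    return total

print(loglik({"b_price": 0.0, "b_inc": 0.0, "b_age": 0.0}))
```

At all-zero parameters every plan has probability 1/3, so the log likelihood is 3 ln(1/3), a convenient check before handing `loglik` to a numerical optimizer.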
Q1.2. Note: clogit automatically suppresses alternative-specific constants, whereas asclogit
allows the constants. In the DC modeling community, there is no consensus regarding the
preference for an ASC (alternative-specific constant) approach versus the non-ASC option. In this
case, which option may make more sense and why?
Q1.3. Usually, we use some sort of clustering adjustment for the standard errors, vce(cluster
id). In this case, which clustering id would you use (individual id, community id, or ward id)
and why?
Question 8
Q.8 Using a model of the number of doctor visits (demand for health care access), spell out a few
modeling options. Let’s postulate the following relationship:
Y* = a+b*age+c*income+d*distance+e*Female + u
a. Set up a Poisson modelling framework, and spell out the log likelihood function.
Show all the steps.
b. In this case, do we need an exposure variable? Why or why not?
c. What are the expected signs on the independent variables?
d. There will obviously be many people with a 0 entry (no visit recorded over the
last six months), leading to a problem of “excess zeros.” This causes a problem known
as “overdispersion.” You have a few options to deal with this situation:
set up the log likelihood function (with all the steps spelled out properly in detail)
for ONE of the three data generation processes: ZIP, NB-II, or Tobit.
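The Poisson contribution from part (a) and one of the options in part (d) can be sketched as follows. A minimal illustration, assuming the conditional mean is $\lambda_i = \exp(x_i\beta)$ and, for the zero-inflated Poisson (ZIP), a constant structural-zero probability `pi_zero`. In practice that probability is often modeled with its own logit, and the function names are hypothetical.

```python
import math

def poisson_loglik_i(y_i, x_i, beta):
    """ln f(y_i | x_i) = -lambda_i + y_i * x_i'beta - ln(y_i!)."""
    xb = sum(x * b for x, b in zip(x_i, beta))
    lam = math.exp(xb)
    return -lam + y_i * xb - math.lgamma(y_i + 1)

def zip_loglik_i(y_i, x_i, beta, pi_zero):
    """Zero-inflated Poisson with structural-zero probability pi_zero."""
    pois_ll = poisson_loglik_i(y_i, x_i, beta)
    if y_i == 0:
        # A zero is either structural or a Poisson draw of zero
        return math.log(pi_zero + (1.0 - pi_zero) * math.exp(pois_ll))
    # Positive counts can only come from the Poisson part
    return math.log(1.0 - pi_zero) + pois_ll

# Hypothetical individual with x_i'beta = 0, so lambda_i = 1
print(poisson_loglik_i(0, [1.0], [0.0]), zip_loglik_i(0, [1.0], [0.0], 0.3))
```

Setting `pi_zero` to 0 collapses the ZIP contribution to the Poisson one, which shows how the extra parameter absorbs zeros beyond what the Poisson mean implies.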
Question 9
Q9. Consider the following 2-equation model for a cross-country inflation transmission: