ProblemSet Notebook17-18
ECONOMETRICS: Problem Sets
PREFACE
This document contains exercises for you to practice with the content of the course in a
practical way. Some of them will be included in the so-called Problem Sets, which will be
solved in class. Students are required to work by themselves on these Problem Sets. The exercises
to be solved within each Problem Set will be announced in advance of the due date. We will
solve five Problem Sets during the course.
Important: All rights reserved. No part of this document may be reproduced, in any form
or by any means, without written permission from the author.
Any errors in this document are the responsibility of the author. Corrections and comments
regarding any material in this text are welcomed and appreciated.
CONTENTS
Problem Set 4: Qualitative Analysis (dummy variables)
PS1
Descriptive and Correlation Analysis
COURSE CONTENT
2 Suppose you are asked to conduct a study to determine whether small class sizes
lead to increased student performance.
3 A substitute teacher wants to know how students in the class did on their last test.
The teacher asks the 10 students sitting in the front row to state their latest test score. He
concludes from their report that the class did extremely well.
4 Suppose that X is the number of free throws made by a basketball player out of two
attempts and assume that the individual probabilities for each outcome of X are the
following: pr(x=0)=0.2; pr(x=1)=0.44 and pr(x=2)=0.36.
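The question parts of this exercise are not shown here. As a minimal sketch, assuming they ask for the expected value and variance of X, the stated probabilities can be checked and summarised as follows (variable names are illustrative):

```python
# Probability mass function of X (free throws made out of two attempts),
# taken from the exercise statement.
pmf = {0: 0.20, 1: 0.44, 2: 0.36}

# A valid distribution must have probabilities that sum to one.
total = sum(pmf.values())

# E[X] = sum over x of x * P(X = x)
mean = sum(x * p for x, p in pmf.items())

# Var(X) = E[X^2] - (E[X])^2
second_moment = sum(x**2 * p for x, p in pmf.items())
variance = second_moment - mean**2

print(f"sum of probabilities = {total:.2f}")
print(f"E[X] = {mean:.4f}, Var(X) = {variance:.4f}")
```

The same two formulas apply to any discrete random variable with a finite set of outcomes.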
a- Find the mean and variance for a new random variable 𝑢 = 𝑥1 − 𝑏𝑥2
b- Find the mean and variance for a new random variable 𝑣 = 𝑎𝑥2 + 𝑏𝑥1
7 The table below shows data about annual salaries (thousand Euros) and tenure (years)
for 8 individuals working in a company:
Salary 40 22 19 30 62 32 45 51
Tenure 15 3 1 8 39 13 17 24
a- What is your expectation about the type of relationship that exists between the two
variables?
b- Compute the linear correlation coefficient between salaries and tenure and interpret
your result.
c- Which variable is more dispersed? Why?
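Parts (b) and (c) can be checked numerically. The sketch below computes the Pearson correlation from its definition and, as one possible way to compare dispersion across variables with different units, the coefficient of variation. The choice of the coefficient of variation is an assumption; the exercise does not name a specific dispersion measure.

```python
from math import sqrt

salary = [40, 22, 19, 30, 62, 32, 45, 51]   # thousand Euros
tenure = [15, 3, 1, 8, 39, 13, 17, 24]      # years


def mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """Linear correlation coefficient: S_xy / sqrt(S_xx * S_yy)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (unit-free)."""
    m = mean(values)
    sd = sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return sd / m


r = pearson_r(salary, tenure)
cv_salary = coefficient_of_variation(salary)
cv_tenure = coefficient_of_variation(tenure)

print(f"r = {r:.3f}")
print(f"CV salary = {cv_salary:.3f}, CV tenure = {cv_tenure:.3f}")
```

Because salary and tenure are measured in different units, comparing their raw standard deviations would not be meaningful, which is why a unit-free measure is used here.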
8 In the table below, E denotes the employment growth rate and P the productivity
growth rate in the manufacturing industry in six countries for the period 1980-1990.
Country E P
a- Determine whether the data are time series, cross-sectional or panel data.
b- Draw a scatter plot with the data of the table. Interpret your graph.
c- Calculate the correlation coefficient (E, P) and interpret your result.
d- Calculate a new correlation coefficient eliminating the Japanese observation and
interpret your result.
$\hat{\mu}_1 = \frac{x_1 + x_2 + x_3}{3}; \qquad \hat{\mu}_2 = \frac{x_1 + x_4}{6}$
b- Discuss the sufficiency of both estimators.
c- Suggest a sufficient estimator of 𝜇 with an MSE lower than those of the estimators above.
11 In the table below, P denotes average property prices and S average property sizes in
six cities in 2012.
Country P S
a- Determine whether the data are time series, cross-sectional or panel data.
b- Draw a scatter plot with the data of the table. Interpret your graph.
c- Find the correlation coefficient (P, S) and interpret your result.
d- Find a new correlation coefficient eliminating the Tokyo observation and interpret
your result.
$\hat{\mu}_2 = \frac{1}{8}x_1 + \frac{1}{8}x_2 + \frac{1}{4}x_3 + \frac{1}{2}x_4$
Show that this second estimator is also an unbiased estimator for the population
mean. Find its variance.
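A sketch of the required argument, assuming (the opening of the exercise is not shown here) that the four observations are independent draws with common mean $\mu$ and variance $\sigma^2$:

```latex
\begin{aligned}
E[\hat{\mu}_2]
  &= \tfrac{1}{8}E[x_1] + \tfrac{1}{8}E[x_2] + \tfrac{1}{4}E[x_3] + \tfrac{1}{2}E[x_4]
   = \left(\tfrac{1}{8} + \tfrac{1}{8} + \tfrac{1}{4} + \tfrac{1}{2}\right)\mu = \mu, \\
Var(\hat{\mu}_2)
  &= \left(\tfrac{1}{64} + \tfrac{1}{64} + \tfrac{1}{16} + \tfrac{1}{4}\right)\sigma^2
   = \tfrac{11}{32}\,\sigma^2 .
\end{aligned}
```

The expectation step only needs the weights to sum to one; the variance step additionally uses independence, so that the covariance terms vanish.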
13 We define a random variable 𝑥 as the resulting sum when rolling two dice.
14 Evaluate in which of the cases below the presented results are
compatible, and explain why:
and 𝜌 = 0.775
15 In the table below, U denotes the unemployment rate and I the inflation rate in six
American countries in 2011.
Country U I
a- Determine whether the data are time series, cross-sectional or panel data.
b- Draw a scatter plot with the data of the table. Interpret your graph.
c- Calculate the correlation coefficient (U, I) and interpret your result.
d- Calculate a new correlation coefficient eliminating the Venezuelan observation and
interpret your result.
$\hat{\mu}_1 = \frac{x_1 + x_2}{2}; \qquad \hat{\mu}_2 = \frac{x_1 + x_4}{4}$
Note: Observations are independent.
17 A professor teaches a large class and has scheduled an exam for 7:00 pm in a different
classroom. She estimates the probabilities in the table for the number of students who will call her
at home in the hour before the exam asking where the exam will be held.
18 A specific company has observed in the last 5 months that their sales depend on the
amount invested in advertising. Observe the table below:
$ 100,000 R$ 1,000,000
$ 200,000 R$ 1,000,000
$ 300,000 R$ 2,000,000
$ 400,000 R$ 2,000,000
$ 500,000 R$ 4,000,000
a- Construct a scatter plot of the data. Does a clear linear relationship exist between the
two variables?
b- Conduct a descriptive and correlation analysis of the above data and interpret both
analyses.
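A minimal sketch of the descriptive and correlation analysis in part (b). The table has no column headers, so the code assumes that the first column is the amount invested in advertising and the second is sales, and it ignores the currency symbols:

```python
from statistics import mean, pstdev

# Assumed column meaning: advertising spend, then sales (same currency units).
advertising = [100_000, 200_000, 300_000, 400_000, 500_000]
sales = [1_000_000, 1_000_000, 2_000_000, 2_000_000, 4_000_000]

# Descriptive statistics (population standard deviations).
mean_adv, mean_sales = mean(advertising), mean(sales)
sd_adv, sd_sales = pstdev(advertising), pstdev(sales)

# Population covariance and the linear correlation coefficient.
cov = sum(
    (a - mean_adv) * (s - mean_sales) for a, s in zip(advertising, sales)
) / len(sales)
r = cov / (sd_adv * sd_sales)

print(f"mean advertising = {mean_adv}, mean sales = {mean_sales}")
print(f"sd advertising = {sd_adv:.1f}, sd sales = {sd_sales:.1f}")
print(f"correlation = {r:.4f}")
```

A high correlation with only five observations should be read together with the scatter plot from part (a), since a single point can drive the result.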
19 In this exercise a researcher uses data on NBA players’ salaries and their
determinants. She is interested in knowing the effect of performance on NBA players’
salaries. The following information is available for 56 NBA players.
The summary statistics for all the variables in Table 1, as well as the correlation matrix, are
presented in Tables 2 and 3 respectively.
20 Let Y1, Y2, Y3, and Y4 be independent, identically distributed random variables from
a population with mean μ and variance σ². Let Y be:
$Y = \frac{1}{4}Y_1 + \frac{1}{6}Y_2 + \frac{1}{2}Y_3 + \frac{1}{4}Y_4$
denoting the average of these four random variables. Show that Y is a biased estimator of μ.
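Since each $Y_i$ has mean μ, the expectation of Y equals μ times the sum of the weights, so bias can be checked by adding up the weights exactly. The sketch below does this with exact fractions; the variance factor is included as an extra and relies on the stated independence.

```python
from fractions import Fraction

# Weights on Y1..Y4 as given in the exercise.
weights = [Fraction(1, 4), Fraction(1, 6), Fraction(1, 2), Fraction(1, 4)]

# E[Y] = (sum of weights) * mu, because each Y_i has mean mu.
weight_sum = sum(weights)

# Var(Y) = (sum of squared weights) * sigma^2, by independence.
variance_factor = sum(w**2 for w in weights)

# Bias = E[Y] - mu = (weight_sum - 1) * mu
bias_factor = weight_sum - 1

print(f"E[Y]   = {weight_sum} * mu")
print(f"bias   = {bias_factor} * mu")
print(f"Var(Y) = {variance_factor} * sigma^2")
```

The estimator is unbiased only when the weights sum to exactly one, which is not the case here unless μ is zero.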
PS2
Linear Regression Analysis
COURSE CONTENT
2 The per capita consumption of electric energy, in thousands of kWh (C), and the per
capita income (X), in thousands of Euros, for the countries belonging to the European Union
in 2001 are related by the following linear model:
Compute the per capita income elasticity for a per capita income of 6,000 Euros.
3 Review exercise 11 in Problem Set 1. Find (using the OLS equations) the simple
regression line that explains the behavior of P through the information contained in S.
First use the six city observations, and then estimate the same regression line excluding
the Tokyo observation. Interpret and explain your results. What is the difference between the
linear correlation analysis discussed in exercise 11 of Problem Set 1 and the linear
regression analysis performed in this exercise?
4 Analytically show that $\sum_{i=1}^{n} \hat{u}_i = 0$ is a descriptive property, which is satisfied when
estimating a SLRM using OLS.
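One way to organise the proof is through the first-order condition for the intercept. The sketch assumes the SLRM is written as $y_i = \beta_0 + \beta_1 x_i + u_i$ and that the residuals are $\hat{u}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$; the symbols $S$, $b_0$ and $b_1$ are introduced here for illustration only.

```latex
\begin{aligned}
S(b_0, b_1) &= \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2, \\
\left.\frac{\partial S}{\partial b_0}\right|_{(\hat{\beta}_0,\,\hat{\beta}_1)}
  &= -2 \sum_{i=1}^{n} \left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right) = 0
  \;\Longrightarrow\; \sum_{i=1}^{n} \hat{u}_i = 0 .
\end{aligned}
```

The result depends on the model including an intercept; without $\beta_0$ there is no such first-order condition and the residuals need not sum to zero.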
5 We have a dataset containing data about births to women in the United States. Two
variables of interest are the dependent variable, infant birth weight in ounces (bw), and an
explanatory variable, average number of cigarettes the mother smoked per day during
pregnancy (cigs). The following simple regression was estimated using data on 1,388 births:
$\widehat{bw}_i = 119.77 - 0.514\,cigs_i$
Such that $Y_t^A$ measures total production in thousand Euros in year $t$ and $K_t^A$ measures the
use of capital in thousand Euros in year $t$.
c- In 2010 (𝑡 = 2010), the use of capital had a value of 320,000 Euros in company A
and 280,000 Euros in company B. Both companies are planning to expand their
businesses to the Brazilian market in 2015. Therefore, their capital levels will increase
by 20% with respect to 2010. Find the total production prediction in 2015 (𝑡 = 2015) for
each company using the estimated cost functions. Explain which company will obtain
a more accurate prediction in your opinion.
d- Do you think the relationship between production and the use of capital has constant
returns (that is, is the linearity assumption satisfied)? If not, specify a more realistic
regression model.
7 Are rent rates influenced by the city population? Using 2005 data for 70 cities, the
following equation relates rent rates (rent) to total city population (pop):
a- Find the regression coefficients for a regression model that investigates the behaviour
of e through the behaviour of g.
b- Interpret your regression coefficients.
c- Find and interpret the value of the determination coefficient.
d- Calculate the predicted e when g=3.15.
9 Review exercise 8 in Problem Set 1. Find (using the OLS equations) the simple
regression line that explains the behavior of E through the information contained in P.
First use the six country observations, and then estimate the same regression line excluding
the Japanese observation. Interpret and explain your results. What is the difference between the
linear correlation analysis discussed in exercise 8 of Problem Set 1 and the linear
regression analysis performed in this exercise?
10 Analytically show that, if you estimate a SLRM using the OLS method of estimation, then
$Cov(\hat{y}_i, \hat{u}_i) = 0$.
11 The CAPM (Capital Asset Pricing Model) is an equilibrium model explaining the
expected returns for assets. The regression for the excess return (over the risk-free asset)
has the following econometric specification:
$(R_t - r_t^f) = \beta_0 + \beta_1 (R_t^M - r_t^f) + u_t$
Where, for the $t$-th month, $R_t$ represents the return of the asset, $r_t^f$ is the monthly return
of the risk-free asset (for example, Treasury bills with a maturity of 30 days), $R_t^M$ is the
return of the assets available in the market, and $u_t$ is the random disturbance term that captures
the random fluctuations that are independent of the market portfolio.
a- Interpret 𝛽1.
b- What can be said about an asset with 𝛽1 = 1? And one with 𝛽1 > 1? And with
𝛽1 < 1?
c- Explain the G-M condition that is being described above.
a- Estimate the Simple Linear Regression Model associated to the data. Interpret
your estimation results.
b- If the company invests 355,000 in advertising, what is the forecasted amount of
sales?
X Y
62 8.1
70 9.0
76 9.2
82 10.5
88 10.8
74 9
75 8.1
a- Estimate the relationship between X and Y using OLS; that is, obtain the intercept and
slope estimates in the regression equation.
b- Compute the fitted values and residuals for each observation, and verify that the residuals
(approximately) sum to zero.
c- What is the predicted value of Y when X=58?
d- How much of the variation in Y for these 7 observations is explained by X?
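The four parts can be worked through with the closed-form OLS formulas for a simple regression. The following is a minimal sketch using only the seven observations listed above; all variable names are illustrative.

```python
x = [62, 70, 76, 82, 88, 74, 75]
y = [8.1, 9.0, 9.2, 10.5, 10.8, 9.0, 8.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# (a) OLS slope and intercept: slope = S_xy / S_xx, intercept = y_bar - slope * x_bar
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# (b) Fitted values and residuals; the residuals should sum to (numerically) zero.
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# (c) Prediction at X = 58 (below the smallest observed X, so an extrapolation).
prediction_58 = intercept + slope * 58

# (d) R-squared = 1 - SSR / SST
sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum(e**2 for e in residuals)
r_squared = 1 - ssr / sst

print(f"intercept = {intercept:.4f}, slope = {slope:.4f}")
print(f"sum of residuals = {sum(residuals):.2e}")
print(f"prediction at X=58 = {prediction_58:.3f}")
print(f"R-squared = {r_squared:.3f}")
```

The residual sum is zero only up to floating-point rounding, which is why part (b) says "approximately".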
14 The following data give X, the price charged per piece of plywood, and Y, the quantity
sold (in thousands).
$6 80
$7 60
$8 70
$9 40
$10 0
Such that $Y_t^A$ measures total production in thousand Euros in year $t$ and $L_t^A$ measures the
use of labour in number of workers in year $t$.
c- In 2010 (𝑡 = 2010), the use of labour had a value of 3,500 workers in company A
and 2,800 workers in company B. Both companies are planning to expand their
businesses to the Chinese market in 2015. Therefore, their labour levels will increase
by 20% with respect to 2010. Find the total production prediction in 2015 (𝑡 = 2015) for
each company using the estimated production functions. Explain which company will
obtain a more accurate prediction in your opinion.
d- Do you think the relationship between production and the use of labour has constant
returns (that is, is the linearity assumption satisfied)? If not, specify a more realistic
regression model.
16 Analytically show that the OLS estimator for the intercept in a Simple Linear Regression
Model is an unbiased estimator.
17 We denote 𝐼𝑖 as total investment in a country (million dollars) and 𝐼𝑅𝑖 represents the
interest rate. We consider the following linear regression model that yields the relationship
between I and IR:
𝐼𝑖 = 𝛽0 + 𝛽1 𝐼𝑅𝑖 + 𝑢𝑖
Such that $TC_t^A$ measures total production costs in thousand Euros in year $t$ and $P_t^A$ measures
the level of production in thousand Euros in year $t$.
Interpret the coefficients of the estimated cost function for company B in comparison
to the coefficients for company A.
𝑀𝑡 = 𝛽0 + 𝛽1 𝑚𝑒𝑡 + 𝑢𝑡
In order to estimate the above regression model, data from the last five elections are collected,
yielding the following estimated regression:
$\hat{M}_t = 2.684 + 0.025\,me_t, \qquad T = 5, \quad R^2 = 0.392$
20 Review exercise 15 in Problem Set 1. Find (using the OLS equations) the
simple regression line that explains the behavior of I through the information contained in
U. First use the six country observations, and then estimate the same regression line
excluding the Venezuelan observation. Interpret and explain your results. What is the
difference between the linear correlation analysis discussed in exercise 15 of Problem Set
1 and the linear regression analysis performed in this exercise?
$\widehat{\log(production)}_t = 4.822 + 0.257\,\log(capital)_t, \qquad R^2 = 0.311, \quad T = 20$
Where both production and the use of capital are measured in thousand Euros.
a- Interpret the coefficient on log(capital). Is the sign of this estimate what
you expect it to be?
b- Interpret the determination coefficient.
c- What other factors may affect production levels?
d- Do you think the relationship between production and the use of capital has constant
returns? If not, specify a more realistic regression model.
22 Using data from 1988 for houses sold in Andover, MA, from Kiel and McClain
(1995), the following equation relates housing price (price) to the distance from a recently
built garbage incinerator (dist):
a- Interpret the coefficient on log(dist). Is the sign of this estimate what
you expect it to be?
b- Interpret the determination coefficient. Why do you think its value is so low?
c- What other factors about a house affect its price? Might these be correlated
with distance from the incinerator?
23 Let sales be annual firm sales, measured in million dollars and salary annual salary
measured in thousand dollars. We estimate the following regression model:
𝐺𝑖 = 𝛽0 + 𝛽1 𝑇𝐻𝑖 + 𝑢𝑖
Such that 𝐺𝑖 represents the student’s grade (points) obtained in the Econometrics course and
𝑇𝐻𝑖 measures the total number of hours invested in studying Econometrics during the
course. Using a sample of 50 students at IE University, the following estimated regression is
obtained:
25 We denote 𝐼𝑖 as sales income for the shops located in a mall (thousand Euros) and
𝑁𝑆𝑖 represents the number of shop assistants working in each shop. We consider the
following linear regression model that yields the relationship between I and NS:
𝐼𝑖 = 𝛽0 + 𝛽1 𝑁𝑆𝑖 + 𝑢𝑖
𝐶𝑖 = 𝛽0 + 𝛽1 𝑖𝑛𝑐𝑖 + 𝑢𝑖
the (estimated) marginal propensity to consume (MPC) out of income is simply the slope of
the above regression model. Using observations for 100 families on annual income and
consumption (both measured in dollars), the following estimated equation is obtained:
27 The data used for this exercise contains information on births for women in the
United States. Two variables of interest are the dependent variable, infant birth weight in
ounces (bwght) and an explanatory variable, average number of cigarettes the mother smoked
per day during pregnancy (cigs). The following simple regression was estimated using data on
n = 1388 births:
$\widehat{bwght}_i = 119.77 - 0.514\,cigs_i$
a- What is the predicted birth weight when cigs = 0? What about when cigs = 20 (one
pack per day)? Comment on the difference.
b- Does this simple regression necessarily capture a causal relationship between the
child’s birth weight and the mother’s smoking habits? Explain.
c- To predict a birth weight of 125 ounces, what would cigs have to be? Comment.
d- The proportion of women in the sample who do not smoke while pregnant is about
0.85. Does this help reconcile your finding from part (c)?
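The arithmetic behind parts (a) and (c) can be sketched directly from the estimated equation. The function names below are illustrative; the coefficients are the ones reported in the exercise.

```python
INTERCEPT = 119.77   # predicted birth weight (ounces) when cigs = 0
SLOPE = -0.514       # change in predicted ounces per additional daily cigarette


def predict_bwght(cigs):
    """Predicted birth weight in ounces for a given number of cigarettes per day."""
    return INTERCEPT + SLOPE * cigs


def cigs_for_target(bwght):
    """Invert the fitted line: the cigs value that yields a given prediction."""
    return (bwght - INTERCEPT) / SLOPE


pred_0 = predict_bwght(0)
pred_20 = predict_bwght(20)
cigs_125 = cigs_for_target(125)

print(f"cigs = 0  -> {pred_0:.2f} ounces")
print(f"cigs = 20 -> {pred_20:.2f} ounces (difference {pred_20 - pred_0:.2f})")
print(f"cigs needed for 125 ounces: {cigs_125:.2f}")
```

The inverted value for 125 ounces comes out negative, which is not a feasible number of cigarettes; this is the point that parts (c) and (d) ask you to comment on.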
28 The econometrics team of the Ministry of Labor wants to investigate the relationship
between unemployment duration and job search effort. For that purpose, they collect
information from the Spanish Employment Office (INEM) for 680 unemployed individuals on the
following variables:
a- Specify the econometric model. Which relationship do you expect to hold in the
population between unemployment duration and effort? Why? Explain, relating your
arguments to the elements of the postulated model.
b- Could you think of variables included in the error component and correlated with
job search effort? Give two examples and discuss the consequences in terms of SLR
assumptions.
c- The estimation result is presented next:
$\widehat{unem}_i = 24.5 - 1.86\,effort_i, \qquad n = 680, \quad R^2 = 0.28$
29 Explain, in your own words and using the graph below, the Ordinary Least Squares
(OLS) estimation method.
30 Suppose the following model describes the relationship between annual salary (salary)
and the number of previous years of labour market experience (exper)
a- What is salary when exper = 0? When exper = 5? Interpret the intercept. [Hint: you will
need to exponentiate].
b- Draw the shape (approximately) of the Population Regression Function for the salary
conditional on exper. Comment on the advantages of the semi-logarithmic function for
this particular example.
c- Approximate the percentage increase in salary when exper increases by five years. [Hint:
you can use the formula: %∆𝑦 ≈ (100 ∙ 𝛽1 )∆𝑥].
d- Use the results of part (a) to compute the exact percentage difference in salary when
exper = 5 and exper = 0. Comment on how this compares with the approximation in
part (c).
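The estimated equation for this exercise is not shown here, so the sketch below uses a purely hypothetical slope (`beta1_hypothetical = 0.03`) to illustrate how the approximation in part (c) compares with the exact calculation in part (d) for a model of the form log(salary) = 𝛽0 + 𝛽1 exper + u. Replace the value with the coefficient from the actual model.

```python
from math import exp


def approx_pct_change(beta1, delta_x):
    """Approximation from the hint: %dy ~ (100 * beta1) * dx."""
    return 100 * beta1 * delta_x


def exact_pct_change(beta1, delta_x):
    """Exact change implied by a log-level model: 100 * (exp(beta1 * dx) - 1)."""
    return 100 * (exp(beta1 * delta_x) - 1)


# Hypothetical slope, chosen only for illustration (not taken from the exercise).
beta1_hypothetical = 0.03

approx = approx_pct_change(beta1_hypothetical, 5)
exact = exact_pct_change(beta1_hypothetical, 5)

print(f"approximate change: {approx:.2f}%")
print(f"exact change:       {exact:.2f}%")
```

For a positive slope the exact figure is always somewhat larger than the approximation, and the gap widens as 𝛽1·∆x grows.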
31 We have annual data, from 1963 until 1972, about the amount of money in a country
(𝑀𝑡 ) and the national income (𝑌𝑡 ), in million Euros, that can be summarised in the following:
𝑇 𝑇 𝑇
𝑇 𝑇
a- Could you specify a linear regression model representing the theory that states that the
national income is determined by the amount of money in a country?
b- Think about possible factors contained in 𝑢𝑖 of your econometric specification
representing the above theory.
c- Find the OLS estimated values for the parameters of your econometric model and
interpret your results.
11 80
8 60
15 55
10 62
The average of TV Ad Minutes is 11; the average of money earned from Ad time is 64.25.
The standard deviation of TV Ad minutes is 2.94 and the standard deviation of money
earned from Ad time is 10.9. The covariance between variables is -7.33.
c- How much profit does the company make if they advertise for 20 minutes in a
month?
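As a hedged sketch for part (c), the regression line can be obtained from the four observations listed above, reading the first column as TV ad minutes and the second as money earned (this matches the reported averages of 11 and 64.25). The slope is the sample covariance divided by the sample variance of the ad minutes.

```python
from statistics import mean, variance

ad_minutes = [11, 8, 15, 10]
money_earned = [80, 60, 55, 62]

n = len(ad_minutes)
x_bar, y_bar = mean(ad_minutes), mean(money_earned)

# Sample covariance (divisor n - 1), which reproduces the -7.33 quoted in the text.
cov_xy = sum(
    (x - x_bar) * (y - y_bar) for x, y in zip(ad_minutes, money_earned)
) / (n - 1)
var_x = variance(ad_minutes)  # sample variance, divisor n - 1

slope = cov_xy / var_x
intercept = y_bar - slope * x_bar

# Prediction for 20 minutes of advertising.
prediction_20 = intercept + slope * 20

print(f"covariance = {cov_xy:.2f}, slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"predicted money earned at 20 minutes = {prediction_20:.2f}")
```

Twenty minutes lies outside the observed range of 8 to 15 minutes, so the prediction is an extrapolation from only four data points and should be treated with caution.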
33 One company in the aeronautics industry wants to calculate the number of working
hours that are required to finish the design of a new airplane. They think that the relevant
explanatory variables are the top speed of the airplane, its weight and the number of pieces
that are shared with other airplane models that the company builds. In order to do this, a
sample of 35 airplanes is taken and the following model is estimated:
Such that:
𝑥3𝑖 = percent number of pieces that are shared with other airplane models.
Such that for each outlet in the sample: 𝑦𝑖 measures total annual sales (thousand dollars), 𝑥1𝑖
is the number of competitor outlets in the locality where the outlet is located, 𝑥2𝑖 measures
the local population (millions) and 𝑥3𝑖 indicates annual marketing expenditures in each
sample outlet (thousand dollars).
a- What signs would you expect for 𝛽1, 𝛽2 and 𝛽3? Why?
b- Interpret the estimated intercept coefficient if the above model is estimated such that:
$\hat{y}_i = 14 - 1x_{1i} + 0.3x_{2i} + 0.2x_{3i}, \qquad R^2 = 0.5809$
c- Find the estimated change in annual sales for an outlet having 5 additional competitor
outlets within its local market, holding the population and marketing
expenditures constant.
d- Interpret the value $\hat{\beta}_3 = 0.2$.
e- Discuss the explanatory power of the above estimated regression model.
f- The sixth sample outlet has 7 local competitors, is placed in a locality with 2,750,300
inhabitants and its marketing expenditures are 150,000 dollars. Find the estimated
annual sales for this outlet.
g- The true annual sales for the sixth sample outlet are 890,000 dollars. Find the
estimated residual for this outlet.
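Parts (c), (f) and (g) are mostly a matter of keeping units straight: sales and marketing expenditures are in thousand dollars and population is in millions. A minimal sketch, with an illustrative function name:

```python
def predicted_sales(competitors, population_millions, marketing_thousands):
    """Fitted annual sales (thousand dollars) from the estimated equation."""
    return 14 - 1 * competitors + 0.3 * population_millions + 0.2 * marketing_thousands


# (c) Five additional competitors, other regressors held fixed.
change_5_competitors = -1 * 5

# (f) Sixth outlet: 7 competitors, 2,750,300 inhabitants, 150,000 dollars of marketing.
y_hat_6 = predicted_sales(7, 2_750_300 / 1_000_000, 150_000 / 1_000)

# (g) Residual = observed - fitted, with observed sales of 890,000 dollars.
residual_6 = 890_000 / 1_000 - y_hat_6

print(f"change for 5 extra competitors: {change_5_competitors} thousand dollars")
print(f"fitted sales, outlet 6: {y_hat_6:.5f} thousand dollars")
print(f"residual, outlet 6: {residual_6:.5f} thousand dollars")
```

The very large positive residual suggests either that this outlet is far from the fitted line or that the units of some figure differ from those assumed here.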
35 We estimate a model that relates the salary for business managers with the sales of
the firm and the market value of the firm such that:
$n = 220, \quad R^2 = 0.3481$
$n = 220, \quad R^2 = 0.3541$
Why is the $profits_i$ variable not included in the model in logs? Which model has the
better goodness-of-fit? Do these firm-specific variables explain the behaviour of
the wage variable? Why?
36 Consider the regression model in which the dependent variable (television viewing
hours per week) is to be explained in terms of three explanatory variables:
Such that:
Explain, in your opinion, whether the following statements are true or false:
a- One unit change in 𝑥1𝑖 always produces the same effect on the value of the
independent variable.
b- One unit change in 𝑥1𝑖 does not produce the same effect on y, but depends on the
value of 𝑥1𝑖 .
38 A company in the financial sector wants to rent an office space in Madrid. The
following regression model estimates the rent prices for office space in Madrid:
Such that sqfeet is the office space in square feet, dis is the distance between the office's
location and the city centre, measured in kilometres, and price is the monthly rental
price in thousand Euros.
$\widehat{price}_i = -19.315 + 1.1284\,sqfeet_i - 0.8819\,dis_i, \qquad n = 120, \quad R^2 = 0.6319$
c- Find the estimated increment in the rental price for an office space with 100
additional square feet, holding the distance to the city centre constant.
d- Find the change in the rental price for an office located 5 additional kilometres away
from the city centre, holding the size of the office constant.
e- Discuss the explanatory power of the regression model.
f- The fifth sample office has a size of 120 square feet and is located at a distance of
5.4 kilometres from the city centre. Find the estimated rental price using the above
OLS regression line.
g- The true rental price for the fifth sample office is 89,000 Euros per month. Find the
estimated residual for this office. Could this suggest that the company is overpaying
or underpaying for its office space?
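A short sketch of the calculations in parts (c), (d), (f) and (g), using the estimated coefficients and keeping price in thousand Euros. The function name is illustrative.

```python
def predicted_price(sqfeet, dis):
    """Fitted monthly rental price (thousand Euros) from the estimated equation."""
    return -19.315 + 1.1284 * sqfeet - 0.8819 * dis


# (c) 100 additional square feet, distance held fixed.
effect_100_sqfeet = 1.1284 * 100

# (d) 5 additional kilometres from the centre, size held fixed.
effect_5_km = -0.8819 * 5

# (f) Fifth office: 120 square feet, 5.4 km from the centre.
price_hat_5 = predicted_price(120, 5.4)

# (g) Residual = observed - fitted, with an observed price of 89 thousand Euros.
residual_5 = 89 - price_hat_5

print(f"effect of +100 sq ft: {effect_100_sqfeet:.2f} thousand Euros")
print(f"effect of +5 km:      {effect_5_km:.4f} thousand Euros")
print(f"fitted price, office 5: {price_hat_5:.5f} thousand Euros")
print(f"residual, office 5:     {residual_5:.5f} thousand Euros")
```

A negative residual means the observed rent is below what the model predicts for an office of that size and location.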
Such that oilp is the price of oil in Dollars per barrel, dis is the distance between the location
of the manufacturing company and the location of its main supplier, measured in kilometers
and tc denotes transportation costs in thousand Dollars.
$n = 400, \quad R^2 = 0.6319$
40 The initial wage for recently graduated lawyers is determined by the following estimated
regression model:
$n = 200, \quad R^2 = 0.278$
Such that $wage_i$ measures initial monthly wage in thousand Euros, $book_i$ indicates the
number of law books in the university library where the graduate studied and $cost_i$
measures the annual cost (thousand Euros) of the university where the graduate obtained her law
degree.
$n = 200, \quad R^2 = 0.294$
Why is the $rank_i$ variable not included in the model in logs? Which model has the better
goodness-of-fit? Do these university-specific variables explain the behaviour of the wage
variable? Why?
41 A consultancy firm is analyzing property prices in the city of Madrid using a sample
of 88 properties using the following regression model:
𝑝𝑖 = 𝛽0 + 𝛽1 𝑠𝑞𝑟𝑓𝑡𝑖 + 𝛽2 𝑏𝑑𝑟𝑚𝑠𝑖 + 𝑢𝑖
Such that p is property price in thousand dollars, sqrft is the size of the property in square
feet, and bdrms is the number of bedrooms.
$n = 88, \quad R^2 = 0.6319$
$n = 400, \quad R^2 = 0.388$
Such that 𝑠𝑎𝑙𝑎𝑟𝑦𝑖 measures monthly wage in thousand Euros, 𝑠𝑎𝑙𝑒𝑠𝑖 indicates monthly firm
sales in thousand Euros and 𝑐𝑒𝑜𝑡𝑒𝑛𝑖 measures CEO tenure with the firm in years.
We re-estimate the above model including a new explanatory factor, CEO education in years,
and we obtain the following estimation results:
$n = 400, \quad R^2 = 0.498$
44 Data on U.S. working men were used to estimate the following equation:
$n = 722, \quad R^2 = 0.214$
where educ is years of schooling, sibs is number of siblings, meduc is mother’s years of
schooling, and feduc is father’s years of schooling.
a- Does sibs have the expected effect? Explain. Holding meduc and feduc fixed, by how
much does sibs have to increase to reduce predicted years of education by one year?
(A non-integer answer is acceptable here)
b- Discuss the interpretation of the coefficient on meduc.
c- Suppose that Man A has no siblings, and his mother and father each have 12 years
of education. Man B has no siblings, and his mother and father each have 16 years
of education. What is the predicted difference in educ between A and B?
d- Would you say sibs, meduc and feduc explain much of the variation in educ? What other
factors might affect men’s years of schooling? Are these likely to be correlated with
sibs? Explain.
45 For a child 𝑖 living in a particular school district, let 𝑣𝑜𝑢𝑐ℎ𝑒𝑟𝑖 be a dummy variable
equal to one if a child is selected to participate in a school voucher program, and let 𝑠𝑐𝑜𝑟𝑒𝑖
be that child’s score on a subsequent standardized exam. Suppose that the participation
a- If you run a simple regression of 𝑠𝑐𝑜𝑟𝑒𝑖 on 𝑣𝑜𝑢𝑐ℎ𝑒𝑟𝑖 , using a random sample of size
𝑛, which sign do you expect to find on the coefficient associated with the dummy
variable? Explain the intuition.
b- Does the OLS estimator provide an unbiased estimator of the effect of the voucher
program?
c- Suppose you can collect additional background information, such as family income,
family structure and parents’ education levels. Do you need to control for these
factors to obtain an unbiased estimator of the effects of the voucher program?
Explain.
𝑦 = 𝛽0 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + 𝛽3 𝑥3 + 𝑢
State and explain at least three conditions that this model must satisfy so as not to violate
the Gauss-Markov theorem.
47 Consider the multiple regression model containing three independent variables, under
Assumptions MLR.1 through MLR.4:
𝑦 = 𝛽0 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + 𝛽3 𝑥3 + 𝑢
You are interested in estimating the sum of the parameters on 𝑥1 and 𝑥2 ; call this 𝜃1 = 𝛽1 + 𝛽2
48 Review exercise 19 in Problem Set 1. In order to exploit the data, the researcher
decides to estimate two different multiple linear regression models. They are presented next:
a- Interpret the estimated regression coefficients from Model 1. Are the signs of the
coefficients the expected ones? Compute the R-squared and interpret its meaning.
b- Is there a penalty in terms of lower wages associated with age? Explain. Is this result
consistent with the correlation analysis carried out in exercise 19 of Problem Set 1?
c- Compare the estimated slope parameters associated to the explanatory variable of
interest (POINTS) between Models 1 and 2. Does the estimated slope change from
Model 1 to 2? Could you give an explanation for this happening?
d- Which model is the best in terms of goodness-of-fit?
48 We have information about the following variables: (1) 𝑝𝑖 (rental office prices in
thousand Euros per month), (2) 𝑠𝑖 (size of the office space in square meters), (3) 𝑑𝑖 (distance
from the city centre in kilometres) and (4) 𝑛𝑖 (number of floors of the building in which the
office space is placed). We use regression analysis to obtain some insights about the
behaviour of rental office prices using a sample of 150 offices located within the city of
Barcelona in 2012. The estimation results are shown in the table below:
Note: Models 1, 2 and 3 use as dependent variable prices and Model 4 uses as dependent
variable log(prices).
a- What happens to the coefficient of size when comparing Model 1 and Model 2?
Why?
b- Which model do you prefer when comparing Model 1 and Model 2?
c- Do you think Model 3 is a better specification than Model 2? Why?
d- Interpret the coefficients of Model 4.
e- Is Model 4 the best model? Why?
PS3
Hypothesis Testing
COURSE CONTENT
Three econometricians went out hunting, and came across a large deer. The first econometrician fired, but
missed, by a meter to the left. The second econometrician fired, but also missed, by a meter to the right. The
third econometrician didn't fire, but shouted in triumph, "We got it! We got it!"
$\widehat{CD}_i = \underset{(1.189)}{26.652} - \underset{(0.06)}{0.537}\,P_i, \qquad R^2 = 0.41$
a- Interpret the estimated regression model and the value of the determination coefficient
b- Test the null hypothesis that the polarisation coefficient is zero at a 1% significance
level.
c- Knowing that a country has a 25% support for extremist parties, find the predicted
cabinet duration.
d- In your opinion, explain one application of the above model from the perspective of
a non-extremist political party.
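Parts (b) and (c) can be sketched as follows. Two assumptions are made and should be checked against the full exercise statement: the figures in parentheses are taken to be standard errors, and P is taken to be measured in percentage points. The sample size is not shown here, so the large-sample normal critical value is used as an approximation.

```python
# Estimated coefficients; the parenthesised values are ASSUMED to be standard errors.
intercept, slope = 26.652, -0.537
se_slope = 0.06

# (b) t statistic for H0: slope = 0 against a two-sided alternative.
t_stat = (slope - 0) / se_slope

# Two-sided 1% critical value from the standard normal distribution
# (a large-sample approximation, since the sample size is not given here).
CRITICAL_1PCT_NORMAL = 2.576
reject_h0 = abs(t_stat) > CRITICAL_1PCT_NORMAL

# (c) Predicted cabinet duration when support for extremist parties is 25
# (ASSUMED to be expressed in percentage points).
predicted_duration = intercept + slope * 25

print(f"t statistic = {t_stat:.2f}, reject H0 at 1%: {reject_h0}")
print(f"predicted cabinet duration = {predicted_duration:.3f}")
```

With a known sample size, the normal critical value should be replaced by the Student t critical value with n − 2 degrees of freedom.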
$\widehat{\log y}_t = \underset{(33.2)}{32.555} + \underset{(0.00211)}{0.0534}\,t$
3 Population time evolution in the United States follows an exponential growth model
which was estimated for the time period 1970-1999 (both years included in the sample) such
that:
$\widehat{\log y}_t = \underset{(743.2)}{201.9727} + \underset{(0.00211)}{0.0284}\,t$
4 Consider a SLRM relating the annual number of crimes on college campuses (crime) to
student enrollment (enroll) with the following estimation results:
$\widehat{\log(crime)}_i = \underset{(1.03)}{-6.63} + \underset{(0.11)}{1.27}\,\log(enroll)_i, \qquad n = 97, \quad R^2 = 0.585$
$\widehat{\log y}_t = \underset{(0.11)}{1.20} + \underset{(0.02)}{0.55}\,\log x_t, \qquad SST = 330, \quad SSR = 51$
𝐼𝑛𝑓𝑙𝑎𝑡𝑖𝑜𝑛𝑖 = 𝛽0 + 𝛽1 𝐼𝑛𝑡𝑒𝑟𝑒𝑠𝑡𝑅𝑎𝑡𝑒𝑖 + 𝑢𝑖
Where both variables are measured in percentage points, a sample of 100 countries is used
in order to estimate the above model and the following information is given:
a- Find the OLS estimation of the effect of interest rates on inflation and the estimated
standard error.
b- Interpret your estimation results.
c- Calculate a one-tailed t-test in order to validate the significance of the estimated slope
coefficient at 1% significance level.
d- What could you say about the explanatory power of the above model? Test the whole
model fit at 5% significance level.
a- Interpret the estimated regression model and the value of the determination coefficient
b- Test the null hypothesis that work satisfaction does not produce any significant effect
on labour absenteeism at a 1% significance level.
c- The level of work satisfaction of a different worker is 6. Find the predicted labour
absenteeism days per year for this worker.
d- In your opinion, explain one application of the above model from the perspective of
the Human Resources department of the company.
8 The following theoretical model is the so-called characteristic line for investment analysis:
𝑟𝑖𝑡 = 𝛽0 + 𝛽1 𝑟𝑚𝑡 + 𝑢𝑡
Such that the dependent variable measures return rate for an asset and the explanatory
variable denotes return rate for the market portfolio. In this type of model, we can interpret
the slope coefficient as a risk indicator. The above model was estimated using 240 monthly
return rates for the period 1956-1976 (both years included) related to IBM assets and USA
market portfolio:
$\hat{r}_{it} = \underset{(0.3001)}{0.7264} + \underset{(0.0728)}{1.059}\,r_{mt}, \qquad R^2 = 0.551$
9 The French Ministry of Education is analyzing the evolution of university tuition fees
in the last 20 years. Using a sample of 55 public universities, the following estimated model
is obtained:
$\widehat{\log(y_i)} = \underset{(14.2)}{38.03} + \underset{(0.05)}{0.07}\,t, \qquad R^2 = 0.41$
10 The OLS estimation for a model that relates annual household expenditures in
thousand Euros (𝐺𝑖 ) with annual household disposable income in thousand Euros (𝐼𝑖 ) and
number of individuals within the household (𝑁𝑖 ) is given by the following regression line
(𝑛 = 38 households):
11 An econometric study for the period 1960-2004 relates production costs in USA (y)
and time (x) such that t=1 (1960), t=2 (1962), and ... t=23 (2004). The following exponential
model is obtained:
$\widehat{\log(y_t)} = \underset{(4.15)}{95.3} + \underset{(0.008)}{0.0253}\,t$
12 Are rent rates influenced by the student population in a college town? Let rent be the
average monthly rent paid on rental units in a college town. Let pop denote the total city
population, avginc the average city income, and pctstu the student population as a percent of
the total population. We get the following estimation results:
$R^2 = 0.458 \qquad T = 64$
Test whether we should keep the quadratic term in the model at 1% significance level.
a- Specify, in terms of the model parameters, the null hypothesis that, once sales and
roe are accounted for, ros does not influence the CEO wage. As the alternative hypothesis,
consider that, other things equal, a higher equity value tends to increase the CEO
wage.
$R^2 = 0.283 \qquad n = 209$
b- In what predicted percentage would the wage increase if ros increased by 50 points?
c- Test, at the 5% significance level, the null that ros has no effect on salary, against the
alternative, that it has a positive effect.
d- Would you include ros in the final model to explain the CEO wage as a function of
the firm performance? Justify.
15 Using a dataset for 46 states of the United States in 1992, the following estimated
regression line was obtained:
Such that:
16 The goal of this exercise is to test the rationality of assessments of housing prices.
We use a model that relates the assessment of the house with its price for a sample of 88
houses.
a- In the SLRM:
𝑝𝑟𝑖𝑐𝑒𝑖 = 𝛽0 + 𝛽1 𝑎𝑠𝑠𝑒𝑠𝑠𝑖 + 𝑢𝑖
$\widehat{price}_i = \underset{(16.27)}{-14.47} + \underset{(0.049)}{0.976}\, assess_i \qquad R^2 = 0.82 \qquad SSR = 165{,}644$
First, test whether assess is a significant variable at 5% significance level. Then, test 𝐻0 : 𝛽1 =
1 . What do you conclude?
c- Now test whether the addition of new variables in the model below is a significant
improvement with respect to the first model:
Knowing that the determination coefficient of this model using the 88 houses is 0.829.
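The tests in this exercise reduce to two formulas: a t ratio for a single coefficient and an F statistic built from the two determination coefficients. The sketch below applies both to the reported figures. The number of added variables `q` is an assumption (set to 3 purely for illustration), because the extended model is not reproduced here; the helper names are also introduced for this example only.

```python
def t_ratio(estimate, std_error, hypothesized=0.0):
    """t statistic for H0: coefficient == hypothesized."""
    return (estimate - hypothesized) / std_error

def f_from_r2(r2_unrestricted, r2_restricted, q, df_unrestricted):
    """F statistic for q exclusion restrictions, R-squared form."""
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1 - r2_unrestricted) / df_unrestricted
    return numerator / denominator

n = 88
# Part a: significance of assess, then H0: beta1 = 1
t_signif = t_ratio(0.976, 0.049)
t_unit = t_ratio(0.976, 0.049, 1.0)

# Part c: ASSUMPTION, the extended model adds q = 3 regressors to assess
q = 3
df_ur = n - (1 + q) - 1   # observations minus slopes minus intercept
f_added = f_from_r2(0.829, 0.82, q, df_ur)
print(f"t(beta1=0) = {t_signif:.2f}, t(beta1=1) = {t_unit:.2f}, F = {f_added:.3f}")
```

If the extended model contains a different number of new regressors, only `q` needs to change; the degrees of freedom follow from it.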
17 For a sample of 506 communities in the Boston area, we estimate a model relating
median housing prices (price) in the community with two housing characteristics: dist is a
weighted distance of the community from five employment centres, in miles and rooms is the
average number of rooms per house in the community:
We try to improve the above specification by introducing two new independent factors
related to community characteristics: nox is the amount of nitrous oxide in the air, in parts
per million and stratio is the average student-teacher ratio of schools in the community:
$\widehat{\log(rent)}_i = 11.8 + 0.25\, rooms_i - 0.13 \log(dist_i) - 0.95 \log(nox_i) - 0.052\, stratio_i$
18 We estimate a model aiming to study the annual salary, measured in thousand dollars
(1980-2007 time period) as a function of labor experience and education level, both of them
measured in years. The estimation results are the following:
19 We have the following equation representing the behavior of salaries in the British
economy for the time period 1950-1969:
Where:
Where 𝑌 denotes annual income levels (thousand pounds), 𝑁 measures the number of trains
belonging to the company in each year, C is annual electricity consumption (thousand
pounds), L denotes annual labor costs (thousand pounds) and 𝑁𝐶 is total number of clients
in each year.
a- Analytically show that imposing the linear restriction 𝛽1 = −𝛽3, the model can be
rewritten as:
$\log y_t = \beta_0 + \beta_2 \log x_{2t} + \beta_3 \log z_t + u_t$
Knowing that: $z_t = x_{3t} / x_{1t}$
And
a- 𝛽1 = 0
b- 𝛽4 = 𝛽5
c- 𝛽3 = 𝛽4 = 𝛽5 = 0
𝑚𝑡 = 𝑧𝑡 + 𝑠𝑡
𝑦𝑡 = 𝛼̅ + 𝛽̅ 𝑥𝑡 + 𝜑̅𝑚𝑡 + 𝑢𝑡
Test the null hypothesis that the coefficients for 𝑧𝑡 and 𝑠𝑡 are the same at 5% significance
level. What about at 1% significance level?
$\hat{y}_i = 7.059 + 1.0847\, x_{1i} - 0.004\, x_{1i}^2 - 0.245\, x_{2i} \qquad R^2 = 0.567 \qquad SSR = 47$
Such that 𝑦𝑖 denotes sales (thousand Euros), the first explanatory variable measures
marketing expenditures (thousand Euros) and the second explanatory variable denotes
production costs (thousand Euros).
Where: Salary (major league baseball players salary), years (years in the league), gamesyr
(average games played by year), bavg (career batting average), hrunsyr (home runs per year)
and rbisyr (runs batted in per year).
Having a sample of 352 players, we estimate both models and obtain a SSR for the first
model of 183.186 and 198.311 for the second one. Knowing that the R-squared of the first
model is 0.6278 and for the second one 0.5971:
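With both SSR values and both determination coefficients available, the joint significance of the variables dropped from the first model can be tested with an F statistic in either form. The sketch below assumes the first model contains the five listed regressors and the second drops three of them (bavg, hrunsyr and rbisyr); that split is an assumption, since the two equations are not reproduced here, and `f_from_ssr` is a name introduced for illustration.

```python
def f_from_ssr(ssr_restricted, ssr_unrestricted, q, df_unrestricted):
    """F statistic for q exclusion restrictions, SSR form."""
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / df_unrestricted
    return numerator / denominator

n = 352
k_unrestricted = 5   # years, gamesyr, bavg, hrunsyr, rbisyr
q = 3                # ASSUMPTION: the second model drops bavg, hrunsyr, rbisyr
df_ur = n - k_unrestricted - 1

f_ssr = f_from_ssr(198.311, 183.186, q, df_ur)

# Same test written with the determination coefficients
f_r2 = ((0.6278 - 0.5971) / q) / ((1 - 0.6278) / df_ur)
print(f"df = ({q}, {df_ur}), F (SSR form) = {f_ssr:.3f}, F (R2 form) = {f_r2:.3f}")
```

The two forms agree up to the rounding of the reported R-squared values.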
27 A laboratory collected data about the cost of materials used for testing necessary
products over a one-year period. They want to know whether the costs of materials A, B and C have
a significant effect on the overall cost of testing. Examine the following tables and answer
the questions below:
REGRESSION STATISTICS
R Squared 0.861831639
Observations 7
F-statistic 6.237546965
REGRESSION RESULTS
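The reported F-statistic can be cross-checked against the reported R squared, since the overall F test is a function of R squared, the sample size and the number of slope coefficients. The sketch below assumes three slopes (one each for materials A, B and C), which is an assumption drawn from the wording of the exercise.

```python
r2 = 0.861831639   # reported R squared
n = 7              # reported number of observations
k = 3              # ASSUMPTION: one slope each for materials A, B and C

# Overall significance F statistic written in terms of R squared
f_implied = (r2 / k) / ((1 - r2) / (n - k - 1))
print(f"F implied by R squared = {f_implied:.6f} with ({k}, {n - k - 1}) degrees of freedom")
```

Under that assumption the implied value matches the F-statistic printed in the regression statistics table.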
28 We have information about mortality rates (MORT=total mortality rate per 100,000
population) in a specific year for 51 States of the United States combined with information
about potential determinants: INCC (per capita income by State in Dollars), POV
(proportion of families living below the poverty line), EDU (proportion of population
completing 4 years of high school), TOBC (per capita consumption of cigarettes by State)
and AGED (proportion of population over the age of 65). Estimation results are presented
in the following table:
OLS Estimation Results
Model 1 Model 2 Model 3
Variable coefficients coefficients coefficients
Constant 194.747 531.608 -9.231
(53.915) (94.409) (176.795)
Aged 5,546.56 5,024.38 5,311.4
(445.727) (358.218) (334.415)
Incc 0.014 0.015
(0.0038) (0.0037)
Edu -682.591 -285.715
(114.812) (152.926)
Pov 854.178
(302.345)
Tobc 0.989
(0.342)
n 51 51 51
Adjusted R squared 0.759 0.856 0.884
SSR 228,770.3 128,260.1 99,303.73
PS4
Categorical Analysis
(Dummy Variables)
COURSE CONTENT
1 We have information about the average annual salary (dollars) for teachers in public
secondary schools in 45 states in the USA. Using this information the following model is
estimated:
Such that 𝑥𝑖 is expenditures in public secondary schools per pupil (dollars), 𝐷1𝑖 is a dummy
variable being 1 if the state is a North-eastern or North central state and 𝐷2𝑖 is a dummy
variable being 1 if the state is a Southern state.
Interpret this estimated regression model and calculate the appropriate tests to validate the
model at 1% significance level.
2 Suppose you have survey data on wages, education, professional experience and
gender. Additionally, you have answers to the following question: how many times have you
smoked marihuana in the last month?
3 We are analyzing quarterly ice-cream consumption during ten years and estimate the
following regression model:
where 𝑡𝑟𝑎𝑖𝑛 is a binary variable equal to unity if a worker participated in the program.
Think of the error term 𝑢 as containing unobserved worker ability. If less able workers
have a greater chance of being selected for the program, and you use an OLS analysis,
what can you say about the likely bias in the OLS estimator of 𝛿0 ?
5 Using the data of eight firms, a regression model was estimated to analyze the
relationship between investment in thousand Euros (𝑦𝑖 ) and production growth rate in %
(𝑥𝑖 ):
Additionally, two different regressions are estimated. The first one only takes into account
European firms within the original sample:
And the second one only takes into account American firms within the original sample:
Find whether making the distinction between European and American firms helps to
understand better the behavior of investment and interpret your results.
6 We have the following estimated regression model that explains the behavior of
profits:
Such that profit is monthly profits in thousand dollars, pc is monthly production costs in
thousand dollars, sector is a sector dummy variable with a value of 1 if the sampled company
belongs to the tertiary sector, home is a nationality dummy variable equal to 1 if the sampled
company is a national company, south is a dummy variable with a value of 1 if the sampled
company is located in the south of the country and urban is a dummy variable with a value
of 1 if the sampled company is located in an urban area.
a- Find the predicted average profit for a foreign manufacturing company that is located
in a rural area at the north of the country independently of pc.
b- Taking two companies of our sample with the same production costs, find the
estimated average difference in their monthly profit if we know that one of them is
a national manufacturing company located in a southern city of the country and the
other one is a foreign services company located in a northern city of the country.
Where the dependent variable is recurrent expenditures and the explanatory variable is
number of students in each secondary school.
However, it is believed that the type of school affects completely the behavior of recurrent
expenditures and two different regression models are estimated distinguishing between
regular secondary schools (40 observations) and occupational secondary schools (34
observations) such that:
Is there a significant difference in the behaviour of recurrent expenditures between the two
types of schools? Interpret your result at 1% significance level.
8 Male babies tend to weigh more than female babies do. If we define a dummy variable
𝑀 = 1 for male babies and 𝑀 = 0 for female babies, the regression that explains the baby's
weight in grams (𝑌) as a function of the number of cigarettes per day smoked by the mother
(𝑥) and the dummy variable 𝑀 is the following (sample size 𝑛 = 964):
Interpret this estimated regression model and calculate the appropriate tests to validate the
model.
9 Using the data of the previous exercise, a new regression model is estimated such
that (strategy 1):
Strategy 2 consists of performing two different regressions. The first one only takes into
account babies that are first-born (their mothers do not have previous births):
And the second one only takes into account babies that are not first-born (their mothers
have previous births):
Find the most appropriate strategy to better understand the behaviour of the dependent
variable (structural break?) and interpret your results.
Test the null hypothesis that the regression coefficients are the same in the two sampled time
sub-periods knowing that T=64 and interpret your result.
11 We have the following estimated regression model that explains the behavior of
salaries:
Such that wage is the weekly salary in dollars, edu is years of education, exp is years of
professional experience, male is a gender dummy variable with a value of 1 if the sampled
individual is a male, black is a race dummy variable with a value of 1 if the sampled individual
is black, south is a dummy variable with a value of 1 if the sampled individual lives in the
south of the country and urban is a dummy variable with a value of 1 if the sampled individual
lives in an urban area.
a- Which would be the predicted average salary for a black female that lives in a rural area
at the north of the country independently from edu and exp?
b- Taking two males from our sample with the same years of education and the same
years of professional experience, which would be the estimated average difference in their
weekly salary if we know that one of them is black and lives in a southern city of the
country and the other one is white and lives in a northern city of the country?
$y_t = \beta_0 + \beta_1 Time_t + \sum_{i=1}^{3} \gamma_i D_{it} + u_t$
Such that: $\hat{\beta}_0 = 0.2$; $\hat{\beta}_1 = 0.01$; $\hat{\gamma}_1 = 0.5$; $\hat{\gamma}_2 = 0.8$; $\hat{\gamma}_3 = 0.2$.
Time subscript at the end of the sampled period is 𝑇𝑖𝑚𝑒𝑡 = 𝑇 = 200 and the last
observation is the second quarter in 2010. Which are the predicted values of the dependent
variable for the third quarter in 2010 and for the fourth quarter in 2010?
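The forecasts follow from plugging the next two time indices and the matching seasonal dummy into the estimated equation. The sketch below assumes that dummy i equals 1 in quarter i and that the fourth quarter is the omitted reference category; this coding is an assumption, and `predict` is a helper name introduced for illustration.

```python
beta0, beta1 = 0.2, 0.01
# ASSUMPTION: gamma[i] applies in quarter i; quarter 4 is the reference category
gamma = {1: 0.5, 2: 0.8, 3: 0.2}

def predict(time_index, quarter):
    """Fitted value for a given time index and quarter (1 to 4)."""
    return beta0 + beta1 * time_index + gamma.get(quarter, 0.0)

last_time = 200   # time index of the last observation (second quarter of 2010)
y_q3 = predict(last_time + 1, 3)   # third quarter of 2010
y_q4 = predict(last_time + 2, 4)   # fourth quarter of 2010
print(f"2010 Q3: {y_q3:.2f}, 2010 Q4: {y_q4:.2f}")
```

With a different reference quarter the same function applies once the dictionary keys are relabelled.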
13 The variable s denotes the time invested in sleeping at night (minutes per week), w is
the time invested in working (minutes per week), e (level of education) and a (age of the
individual) are measured in years and m is a dummy variable with a value of 1 if the individual
is a male. Sample size is 706 individuals.
Where:
X3= number of due process reviews by state courts that resulted in overturn of legislations
in previous 40 years.
X5= dummy variable taking a value 1 if justices of the state supreme court can be removed
from office by the governor, judicial review board or majority vote of the supreme court and
0 otherwise.
X6= dummy variable taking value of 1 if Supreme Court justices are elected on partisan
ballots and 0 otherwise.
$R^2 = 0.2646$
a- Interpret the coefficient of determination and use it to test the null hypothesis that,
taken as a group, the five independent variables do not linearly influence the
dependent variable.
b- Interpret the coefficients associated to x3 and x4.
16 We have the following estimated regression model that explains the behavior of
salaries:
Such that wage is the weekly salary in dollars, edu is years of education, exp is years of
professional experience, male is a gender dummy variable with a value of 1 if the sampled
individual is a male, black is a race dummy variable with a value of 1 if the sampled individual
is black, south is a dummy variable with a value of 1 if the sampled individual lives in the
south of the country, urban is a dummy variable with a value of 1 if the sampled individual
lives in an urban area and married is a dummy variable with a value of 1 if the sampled
individual is a married individual.
a- Which would be the predicted average salary difference for a black single female that lives
in an urban area at the north of the country with respect to the reference category and
independently of edu and exp?
b- Taking two males from our sample with the same years of education and the same
years of professional experience, which would be the estimated average difference in their
weekly salary if we know that one of them is black, single and lives in a southern city
of the country and the other one is white, married and lives in a northern city of the
country?
17 Using the data of the mid-term exam results, the Econometrics teacher estimates the
following regression model (strategy 1):
Strategy 2 consists of performing two different regressions. The first one only takes into
account students that are foreign individuals:
And the second one only takes into account Spanish students:
a- Find the most appropriate strategy to better understand the behavior of the mid-term
exam grades and interpret your results.
b- Specify a model in which you could test directly if there is a difference in the
performance of the mid-term exam depending on whether the student is Spanish or
foreign, independently of other factors.
$n = 4{,}137 \qquad R^2 = 0.0858$
The variable sat is the combined SAT score, hsize is the size of the student’s high school
graduating class, in hundreds, female is a gender dummy variable and black is a race dummy
variable equal to 1 for blacks and 0 otherwise.
Note: summary statistics for SAT score: mean = 1,030; min = 540; max = 1,504
a- Holding hsize fixed, what is the estimated difference in SAT score between non-
black females and non-black males? How statistically significant is this estimated
difference?
b- Holding hsize fixed, what is the estimated difference in SAT score between non-
black males and black males? Test the null hypothesis that there is no difference
between their scores, against the alternative that there is a difference.
c- Holding hsize fixed, what is the estimated difference in SAT score between black
females and non-black females? What would you need to do to test whether the
difference is statistically significant?
$\widehat{\log(w_i)} = \underset{(0.02)}{1.6} - \underset{(0.02)}{0.32}\, fem_i + \underset{(0.02)}{0.16} \log(size_i) + \underset{(0.002)}{0.05}\, edu_i \qquad R^2 = 0.31 \qquad SSR = 359$
Such that 𝑤𝑖 measures salaries in thousand dollars for each of our 2,000 sampled individuals,
𝑒𝑑𝑢𝑖 measures education in years, 𝑓𝑒𝑚𝑖 is a gender dummy variable with a value of 1 if the
individual i is a female, and size is a variable measuring the number of workers working in
the company.
$\widehat{\log(w_i)} = 1.6 - 0.26\, fem_i + 0.18 \log(size_i) + 0.05\, edu_i - 0.16\, fem_i \cdot \log(size_i)$
c- Do small companies discriminate against women more or less than larger firms? Is
the discrimination statistically significant?
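In the model with the interaction term, the gender gap in log wages is no longer a single number: it equals the coefficient on fem plus the interaction coefficient times log(size). The sketch below evaluates that gap at two firm sizes chosen purely for illustration (10 and 1,000 workers, assuming the natural logarithm) and converts the gap from the first equation into an exact percentage; the function names are hypothetical.

```python
import math

def gender_gap_log_points(size):
    """Female minus male difference in predicted log wage at a given firm size."""
    return -0.26 - 0.16 * math.log(size)

def to_percent(log_difference):
    """Exact percentage difference implied by a difference in logs."""
    return 100 * (math.exp(log_difference) - 1)

# First equation: constant gap, independent of firm size
gap_no_interaction = to_percent(-0.32)

# Second equation: gap evaluated at two illustrative firm sizes
gap_small = gender_gap_log_points(10)
gap_large = gender_gap_log_points(1000)
print(f"no interaction: {gap_no_interaction:.2f}%")
print(f"size 10: {gap_small:.3f} log points, size 1000: {gap_large:.3f} log points")
```

Judging whether the interaction is statistically significant would require its standard error, which is not reported with the second equation.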
20 The following model is regressed using data in quarterly form from 1990 to 2005 (64
observations) for Malaysian stock prices against output knowing that there was an economic
crisis in 1997.
𝑌𝑡 = 𝛽0 + 𝛽1 𝑋𝑡 + 𝑈𝑡
The first regression using all the data produced a SSR of 0.56. Then, two regressions were
run. The first one on a subsample of the data from 1990-1997, giving a SSR of 0.23. The
second one was on the sample from 1998 to 2005, producing a SSR of 0.17. Test whether the
crisis in 1997 produced a significant shock in the behavior of Malaysian stock prices.
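A structural break of this kind is usually tested with a Chow statistic, which compares the pooled SSR with the sum of the two sub-sample SSRs. The sketch below is one way to compute it from the reported figures, taking k = 2 parameters per regression (intercept and slope); `chow_f` is a name introduced for illustration.

```python
def chow_f(ssr_pooled, ssr_first, ssr_second, n, k):
    """Chow F statistic; k is the number of parameters in each regression."""
    ssr_split = ssr_first + ssr_second
    numerator = (ssr_pooled - ssr_split) / k
    denominator = ssr_split / (n - 2 * k)
    return numerator / denominator

n, k = 64, 2   # quarterly observations; intercept and slope
f_chow = chow_f(0.56, 0.23, 0.17, n, k)
df = (k, n - 2 * k)
print(f"Chow F = {f_chow:.2f} with {df} degrees of freedom")
```

The statistic is compared with the tabulated F critical value for those degrees of freedom at the chosen significance level.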
Such that 𝑦𝑖 measures annual expenditure on beer in dollars for each of our 34 sampled
individuals, 𝑥𝑖 measures individual annual income in thousand dollars and 𝐷𝑖 is a dummy
variable with a value of 1 if the individual i is a female and 0 if the individual is a male.
a- What will be the difference in consumption between a male and a female with the same
annual income?
b- Test at 1% level the following: there are no differences in beer consumption across
gender.
c- Test at 5% level the following: there are no differences in the marginal propensity to
consume beer with respect to income across gender.
                  X1      X2      X3      X4      X5      X6      X7
Estimated coeff.  1.417   2.162   0.868   1.0845  0.4694  0.0038  0.0484
Std. error        0.4568  0.3287  0.4393  0.3766  0.0628  0.0094  0.0776
Where:
X1= dummy variable taking the value 1 if the 3-week course was taken and 0 if the 14-week
course was taken.
X3= dummy variable taking the value 0 or 1, depending on which of two teachers had taught
the course.
X4= dummy variable taking the value 1 if the student is a male and 0 if female.
X5= score on a standardised test of understanding mathematics before taking the course.
Knowing that the value of the determination coefficient is 0.344, answer the following
questions:
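A useful first step for any question about individual significance is to turn the table of coefficients and standard errors into t ratios. The sketch below does only that; the sample size is not stated here, so the critical value is left to be looked up once the degrees of freedom are known.

```python
# Estimated coefficients and standard errors copied from the table above
coef = {"X1": 1.417, "X2": 2.162, "X3": 0.868, "X4": 1.0845,
        "X5": 0.4694, "X6": 0.0038, "X7": 0.0484}
std_err = {"X1": 0.4568, "X2": 0.3287, "X3": 0.4393, "X4": 0.3766,
           "X5": 0.0628, "X6": 0.0094, "X7": 0.0776}

# t ratio for H0: coefficient equals zero
t_ratios = {name: coef[name] / std_err[name] for name in coef}

for name, t in t_ratios.items():
    print(f"{name}: t = {t:.3f}")
```

Variables whose t ratio is small in absolute value (here X6 and X7 stand out) are the candidates for being individually insignificant.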
23 We have obtained the following estimated model in a study carried out for 100
multinationals firms:
Where E is the number of employees (,00 employees), T has a value of 1 if the company
applies the last technological improvements and 0 otherwise, C has a value of 1 if there are
competitors located within 50 km distance and 0 otherwise and F has a value of 1 if there is
a complementary company located within 50 km distance and 0 otherwise. Explain whether
the following statements are true or false:
24 We have a housing price model with the following variables: price (house prices), sqrft
(house size), bdrms (number of bedrooms) and colonial (dummy variable equal to one if the
house is of the colonial style). The estimation results are the following (sample size is 88
houses):
25 The following stock price model was regressed using monthly data from 1980m1 to
1989m12:
𝑠𝑡 = 𝛽0 + 𝛽1 𝑦𝑡 + 𝑢𝑡
It is believed there is a structural break at 1987m11, following a stock market crash. The
regression using all the data produced a SSR of 0.97. Then two further regressions were run
from 1980m1 to 1987m11, which produced a SSR of 0.58 and another regression from
1987m12 to 1989m12 produced a SSR of 0.32.
a- Do you think the stock market crash at 1987m11 was statistically significant?
b- Why are structural breaks a problem for financial econometrics? Give examples of
some recent structural breaks.
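Before applying a Chow test to monthly data it helps to count the observations in each sub-period, since the degrees of freedom depend on them. The sketch below counts the months from the stated dates and then computes the Chow statistic from the three reported SSR values, taking k = 2 parameters per regression; `months_between` is a helper introduced for illustration.

```python
def months_between(start, end):
    """Inclusive number of months between two (year, month) pairs."""
    return (end[0] - start[0]) * 12 + (end[1] - start[1]) + 1

n_total = months_between((1980, 1), (1989, 12))
n_pre = months_between((1980, 1), (1987, 11))
n_post = months_between((1987, 12), (1989, 12))

k = 2   # intercept and slope in each regression
ssr_pooled, ssr_pre, ssr_post = 0.97, 0.58, 0.32
ssr_split = ssr_pre + ssr_post

# Chow F statistic with (k, n_total - 2k) degrees of freedom
f_chow = ((ssr_pooled - ssr_split) / k) / (ssr_split / (n_total - 2 * k))
print(f"n = {n_total} ({n_pre} + {n_post}), Chow F = {f_chow:.3f}")
```

Both sub-samples contain more observations than parameters, so the standard form of the test applies.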
Such that 𝑦𝑖 measures profits in thousand dollars for each of our 55 sampled companies, 𝑥𝑖
measures production costs in thousand dollars and 𝐷𝑖 is a dummy variable with a value of 1
if the company i is a manufacturing firm and 0 if the company is a services firm.
27 Let's consider the following regression model using a sample of annual data from
1970 until 2001 (both included) for the Castilla-León economy:
Such that 𝑦𝑡 are annual regional exports, 𝑥𝑡 is the annual exchange rate (pts/$) and 𝐷𝑡 is a
dummy variable equal to 1 if 𝑡 ≤ 1985 and equal to 0 if 𝑡 > 1985 (Spain being a
European Union member).
28 The following model was estimated to examine the short run interest rate:
Such that 𝑥𝑡 is the interest rate for the Treasury bills with a maturity of 90 days and 𝐷𝑖𝑡 are
seasonal dummy variables where 𝑖 corresponds to the first, second and third year quarter
respectively.
29 The following wage equations have been estimated using data on workers from
Vietnam:
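$\widehat{\log(salary)} = \underset{(0.35)}{1.25} + \underset{(0.03)}{0.15}\, gender + \underset{(0.004)}{0.02}\, exp$
$\widehat{\log(salary)} = \underset{(0.48)}{1.55} + \underset{(0.05)}{0.10}\, gender + \underset{(0.005)}{0.015}\, exp - \underset{(0.002)}{0.0005}\, gender \cdot exp$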
Where salary is measured in US dollars and gender is a dummy variable taking the value of 1
if the worker is a male and 0 if the worker is a female, exp measures the years of work
experience.
a- Why are the coefficients associated with gender and experience lower in the second
model than in the first?
b- What is the estimated average difference between the salary of a man with 5 years of work
experience and that of a woman with 10 years of work experience according to the
first model?
c- What is the estimated average difference between the salary of a man with 5 years of work
experience and that of a woman with 10 years of work experience according to the
second model?
d- Test that the salary difference between men and women does not depend on
experience.
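The comparisons in parts b and c amount to evaluating each fitted equation at two profiles and subtracting, and part d rests on the t ratio of the interaction term. The sketch below does this with the reported estimates, assuming the figures in parentheses are standard errors; the differences are in log points, and the function names are introduced for illustration.

```python
def log_salary_model1(gender, years_exp):
    """Fitted log salary from the first equation (gender = 1 for a male)."""
    return 1.25 + 0.15 * gender + 0.02 * years_exp

def log_salary_model2(gender, years_exp):
    """Fitted log salary from the second equation, with the interaction term."""
    return (1.55 + 0.10 * gender + 0.015 * years_exp
            - 0.0005 * gender * years_exp)

# Man with 5 years of experience minus woman with 10 years of experience
diff_model1 = log_salary_model1(1, 5) - log_salary_model1(0, 10)
diff_model2 = log_salary_model2(1, 5) - log_salary_model2(0, 10)

# t ratio of the interaction coefficient (H0: the gap does not vary with experience)
t_interaction = -0.0005 / 0.002
print(f"model 1: {diff_model1:.4f}, model 2: {diff_model2:.4f}, t = {t_interaction:.2f}")
```

Multiplying a small log difference by 100 gives an approximate percentage difference in salary.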
30 To see whether people living in urban areas spend more on fish than people living
in rural areas, we get the following estimation results:
Where the dependent variable is expenditure on fish (in logs), income is disposable income
(in logs), gender is a gender dummy equal to 1 if male and 0 if female and urban is another
dummy which takes the value 1 if the person lives in an urban area. Please answer the
following three questions:
a- Interpret the above estimation results (only the value of the OLS coefficient for each
of the explanatory variables).
b- Is the variable gender individually significant to explain the behavior of fish
expenditures (at 1% significance level)? Explain.
c- Is the model globally significant at 1% significance level? Explain.
Variable Description
regq Quantity demanded regular apples, lbs
ecoq Quantity demanded Eco labeled apples, lbs
regp Price of regular apples, pounds
ecop Price of Eco labeled apples, pounds
educ Years of schooling
age Age in years
hhsize Household size
faminc Family income, thousands
male =1 if the individual is a male
Three different models have been estimated using ecoq as dependent variable. The results
are presented next. (Standard errors in parentheses)
a- Interpret the coefficients on the price variables from Model 1 and comment on their
signs and magnitudes. Are regular apples and eco-labeled apples substitute goods?
b- Report the individual t-tests from Model 1. At the individual level, are the price
variables statistically significant?
c- Is there a gender difference in the quantity demanded of eco-labeled apples? If so, is
the difference statistically significant? Justify your answer.
d- Compare the goodness of fit between Model 1 and Model 2.
e- Explain in your own words how you would extend Model 3 to allow a different
effect of education on apple consumption by gender.
f- Model 3 adds the variables faminc, hhsize, educ and age to the regression from part
(b). Test whether these four variables are jointly significant.
32 Gathering data for Michigan manufacturing firms in 2010, we obtain the following
estimation results using a log transformation:
Such that the dependent variable is hours of training per employee, the variable sales
represents annual sales, employees is the number of employees and grant is a dummy
equal to one if the firm received a job training grant for 2010 and zero otherwise. Please
answer the following three questions:
a- Interpret the above estimation results (only the value of the OLS coefficient for each
of the explanatory variables).
b- Is the variable grant individually significant to explain the dependent variable (at 1%
significance level)? Explain.
c- Is the model globally significant at 1% significance level? Explain.
Variable Description
cigs Average number of cigarettes smoked per day
cigprice State cigarette price, cents per pack
educ Years of schooling
age Age in years
income Annual income, dollars
white =1 if the individual is white
restaurn =1 if state restaurant smoking restrictions
Three different models have been estimated using cigs as dependent variable. The results are
presented next. (Notes: Standard errors in parentheses; l_ stands for natural logarithm)
a- Interpret the coefficients on the variables from Model 1 and comment on their signs
and magnitudes. Is the income effect statistically significant?
b- Interpret the coefficient on the variable restaurn.
c- Is there a race difference in the quantity demanded for cigarettes? If so, is the
difference statistically significant? Justify your answer.
d- Explain with your own words how you would extend Model 3 to allow a different
effect of education on smoking habits by race.
e- Model 3 adds the variables age and educ to the regression from part (b). Test
whether these two variables are jointly significant.
34 An insurance company finds that the probability of having a home insurance or not
can be described by the following linear relationship:
Knowing that inc denotes annual income (in thousand Euros) of the individual and age the
age of the individual (in years):
a- Find the probability of having a home insurance for an individual with 400,000 Euros
income and being 30 years old.
b- Find the increment in the probability of having home insurance if the individual's
income increases by 20,000 Euros.
a- Find the probability of having home ownership for a female individual with 80,000
Euros income, being 46 years old and having a job.
b- What is the difference in the probability of having home ownership between a female
individual and a male individual with the same characteristics?
36 In 1985, neither Florida nor Georgia had laws banning open alcohol containers in
vehicle compartments. By 1990, Florida had passed such a law, but Georgia had not.
a- Suppose you can collect random samples of the driving age population in both states,
for 1985 and 1990. Let arrest be a binary variable equal to unity if a person was
arrested for drunk driving during the year. Without controlling for any factors, specify
a linear probability model that allows you to test whether the open container law
reduced the probability of being arrested for drunk driving. Which coefficient in your
model measures the effect of the law?
b- Why might you want to control for other factors in the model? What other factors
might you want to include? Explain your answer.
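One common specification for this kind of question is a linear probability model with a state dummy, a year dummy and their interaction, in which the interaction coefficient is a difference-in-differences of arrest rates. The sketch below illustrates the arithmetic with four arrest rates that are entirely hypothetical (they are made up for this example and do not come from any data set); the parameter names are also illustrative.

```python
# HYPOTHETICAL arrest rates by state and year, for illustration only
arrest_rate = {
    ("Georgia", 1985): 0.020,
    ("Georgia", 1990): 0.018,
    ("Florida", 1985): 0.025,
    ("Florida", 1990): 0.019,
}

# Saturated model: arrest = beta0 + delta0*y90 + beta1*florida + delta1*florida*y90 + u
beta0 = arrest_rate[("Georgia", 1985)]
delta0 = arrest_rate[("Georgia", 1990)] - beta0      # common time change
beta1 = arrest_rate[("Florida", 1985)] - beta0       # pre-existing state difference
delta1 = ((arrest_rate[("Florida", 1990)] - arrest_rate[("Florida", 1985)])
          - delta0)                                  # difference-in-differences

def fitted(florida, y90):
    """Fitted arrest probability for a state/year cell (dummies are 0 or 1)."""
    return beta0 + delta0 * y90 + beta1 * florida + delta1 * florida * y90

print(f"effect attributed to the law (delta1) = {delta1:.4f}")
```

Because the model is saturated, OLS on individual data would reproduce the four cell means, so the interaction coefficient coincides with the difference-in-differences of the sample arrest rates.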
37 Suppose that you want to explain the behavior of a binary variable (approve) which is
equal to one if a mortgage loan to an individual was approved. The key explanatory variable
is white, a dummy variable equal to one if the applicant was White. The other applicants in
the dataset are black and hispanic. To test for discrimination in the mortgage loan market, a
linear probability model can be used:
a- If there is discrimination against minorities, and the appropriate factors have been
controlled for, what is the sign of 𝛽1? Explain your answer.
b- Regressing approve against white we obtain the following estimation results:
$\widehat{approve}_i = \underset{(0.0182)}{0.707} + \underset{(0.0198)}{0.201}\, white_i \qquad n = 1{,}989 \qquad R^2 = 0.048$
c- Interpret the new beta one coefficient. What happens to the coefficient on white
variable? Is there still evidence of discrimination against non-whites? Explain your
answer.
d- Justify whether the following statement is true or false: “all the fitted values of the
coefficients for the rest of variables in the second model are strictly between zero
and one”
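In the simple regression of approve on white from part b, the fitted values are just two predicted approval probabilities, one per group. The sketch below computes them together with the t ratio of the white coefficient, assuming the figures in parentheses are standard errors; the variable names are illustrative.

```python
intercept, slope = 0.707, 0.201   # reported estimates from the regression of approve on white
se_slope = 0.0198                 # reported standard error of the white coefficient

# Predicted approval probabilities in a linear probability model with one dummy
p_nonwhite = intercept            # white = 0
p_white = intercept + slope       # white = 1

# t ratio for H0: no difference in approval probability between groups
t_white = slope / se_slope
print(f"P(approve | non-white) = {p_nonwhite:.3f}, P(approve | white) = {p_white:.3f}")
print(f"t = {t_white:.2f}")
```

With a single dummy regressor the fitted values necessarily lie between zero and one; that is no longer guaranteed once continuous controls are added.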
PS5
Estimation
Problems
COURSE CONTENT
a- The first empirical studies aimed at measuring the impact of class size on education
performance were based on data comparing the grades in comprehensive tests
achieved by students from different schools and different class sizes. If we aimed at
measuring the relationship between class size and academic performance with such
data, could we infer that size has a causal effect on performance? Justify.
b- The presence of more policemen to fight crime is a matter of controversy. Suppose
that we have data for all the capital cities in France about crime incidence per 10,000
inhabitants and number of police units per 10,000 inhabitants. With such data, could
we obtain the causal effect of police surveillance on crime incidence? Explain.
c- Suppose that there is a positive and strong correlation between the number of
children's books within a home and the academic performance of the children at that
home. Could you say that the number of children's books at home has a positive
causal effect on the academic performance of children at such a home? Justify.
2 Suppose you are interested in estimating the effect of hours spent in a SAT
preparation course (hours) on total SAT score (sat). The population is all college-bound high
school seniors for a particular year.
a- Suppose you are given a grant to run a controlled experiment. Explain how you would
structure the experiment in order to estimate the causal effect of hours on sat.
b- Consider the more realistic case where students choose how much time to spend in a
preparation course, and you can only randomly sample sat and hours from the
population. Write the population model as:
𝑠𝑎𝑡𝑖 = 𝛽0 + 𝛽1 ℎ𝑜𝑢𝑟𝑠𝑖 + 𝑢𝑖
List at least two factors contained in the random disturbance term. Are these likely
to have positive or negative correlation with hours? Explain.
3 The following equation describes the number of hours of television watched per
week by a child as a function of his age, his education, his mother's education, his father's
education and the number of siblings:
We suspect the dependent variable contains a certain error of measurement. Explain the
consequences in your estimation results.
X: Family income.
P: Price index.
Two different regressions are estimated with the following estimation results (standard errors
are in brackets and sample size is 500):
Find and discuss the specification error the first model suffers from. Explain it using the estimation
results of the above table.
a- Find the assumption that does not hold in this model and explain why.
b- How would you rewrite the model in order to solve the problem?
6 We have estimated a SLRM explaining office rental prices in the city of Madrid (Y)
with the information contained in distance to the city center (X). The following two graphs:
Figure 1 (Y versus X) and Figure 2 (residuals versus fitted values of Y) are related to the above
model.
[Figure 1: Y versus X] [Figure 2: residuals versus fitted values of Y]
a- Discuss, according to the two graphs, whether the model may suffer from a non-linearity problem.
b- Provide an economic reason explaining the possible non-linearity in the above
relationship.
c- How should Figure 2 be if the relationship between office rental prices and distance
was a linear relationship?
7 The following table shows two different samples, each with two explanatory variables,
used to study the behavior of Y (dependent variable):
                   Sample 1    Sample 2
Observation   Y    X1    X2    Z1    Z2
1             1     2     4     2     4
2             4     6    12     6    12
3             2     4    11     4     8
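A quick numerical check can help when comparing the two samples. The sketch below uses only the three observations shown in the table above and computes the sample correlation between the two regressors in each sample. The helper name `pearson_r` is illustrative and not part of the course material.

```python
import math

def pearson_r(a, b):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)

# Rows 1-3 of the table (assumption: any further rows are not available here).
x1, x2 = [2, 6, 4], [4, 12, 11]   # Sample 1 regressors
z1, z2 = [2, 6, 4], [4, 12, 8]    # Sample 2 regressors

r_sample1 = pearson_r(x1, x2)
r_sample2 = pearson_r(z1, z2)
print(f"corr(X1, X2) = {r_sample1:.3f}")
print(f"corr(Z1, Z2) = {r_sample2:.3f}")
```

In these rows Z2 equals twice Z1 exactly, so the correlation is 1 and OLS cannot separate the two effects (perfect multicollinearity), whereas X1 and X2 are highly but not perfectly correlated.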
8 Consider a regression of country-level GDP per capita on the percentage of urban
population for several countries (1995). The determination coefficient is 0.457, and
plotting the data gives the following graph (Figure 1):
a- Can you detect a non-linear relationship between the two variables? Why?
b- Which solutions could be implemented to address the non-linearity problem?
Suppose now that we estimate the same model using a semilog transformation, obtaining
the following estimation results:
$\widehat{\log(GDPpc)}_i = 4.631 + 0.052\,urban_i, \qquad R^2 = 0.549$
c- Compare the determination coefficients and the graphs between the two models. Do
you think the semilog transformation might be a good solution for the nonlinearity
problem? Explain your answer.
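When reading the semilog results, it may help to translate the slope into a percentage effect. The sketch below assumes that log denotes the natural logarithm and that urban is measured in percentage points; the variable names are introduced here for illustration only.

```python
import math

beta_urban = 0.052  # slope from the estimated semilog equation

# Approximate effect: 100 * beta percent change in GDPpc per one-point rise in urban.
approx_pct = 100 * beta_urban
# Exact effect implied by the log-level form: 100 * (exp(beta) - 1).
exact_pct = 100 * (math.exp(beta_urban) - 1)

print(f"approximate: {approx_pct:.2f}%  exact: {exact_pct:.2f}%")
```

Note that the two determination coefficients are computed for different dependent variables (GDPpc and its logarithm), so they measure the fit to different quantities and are not directly comparable.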
9 We have data for a sample of high schools in Vietnam where the variable math
denotes the percentage of students who passed a math test. We want to estimate the effect
that spending per student has on the outcomes of this test and propose the following model:
Where poverty describes the percentage of students living below the poverty line, spend denotes
spending per student and enroll is the number of students enrolled in the high school.
a- We do not have data for the poverty variable, but the variable lnchprg describes the
percentage of students eligible for a programme subsidising school lunches. Why is
this variable a sensible proxy variable for poverty?
b- The table below shows the OLS estimates with and without the inclusion of lnchprg
as an explanatory variable:
Explain why the effects of spend and enroll are greater in the first model than in
the second one. How do the standard errors compare between the two models?
c- What conclusions can you derive when comparing both models?
VARIABLE NAME    DESCRIPTION
a- In order to avoid specification errors, which variables would you keep in your analysis
according to practical significance? Justify your choices.
b- Explain the process you would follow in order to specify your final model and to
choose the final variables in your model.
c- Explain the difference between practical and statistical significance.
11 We have the following information on the annual growth rates (%) of stock prices (Y)
and consumer prices (X) in different countries:
12 Imagine that you are interested in analyzing the determinants of infant mortality rates
worldwide. Using the Development Reports from the World Bank in 2013, you get the
following information for 248 countries:
IMR Infant mortality rate: the number of infant deaths per 1,000 live births.
GDP GDP per capita (constant 2005 US$)
Source: World Bank Development Reports, 2013.
a- Look at the graph above. Why might Angola and Guinea be considered outliers in
this regression model? Comment on the implications of including these two
countries in the analysis.
b- Angola presents one of the highest infant mortality rates in this sample (103 per 1,000
live births). Compute the residual for this country given that our model predicts for
Angola an infant mortality rate of 28.6 per 1,000 live births.
c- Knowing that the standard deviation of the estimation residuals (using all the
observations) is 26.22, is Angola a significant outlier?
d- What about Guinea? Note that the residual associated with the Guinea
observation is 52.
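The arithmetic behind parts b to d can be sketched as follows. The figures come from the exercise; the cut-off of 2 standard deviations is a common rule of thumb assumed here, not a value stated in the text.

```python
actual_imr_angola = 103.0     # observed infant mortality rate (per 1,000 live births)
predicted_imr_angola = 28.6   # fitted value given in the exercise
residual_sd = 26.22           # standard deviation of the residuals
residual_guinea = 52.0        # residual given in the exercise

# Residual = actual value minus fitted value.
residual_angola = actual_imr_angola - predicted_imr_angola
# Standardized residual = residual divided by the residual standard deviation.
std_resid_angola = residual_angola / residual_sd
std_resid_guinea = residual_guinea / residual_sd

# ASSUMPTION: |standardized residual| > 2 is used as the outlier threshold.
THRESHOLD = 2.0
for name, z in [("Angola", std_resid_angola), ("Guinea", std_resid_guinea)]:
    print(f"{name}: standardized residual = {z:.2f}, flagged = {abs(z) > THRESHOLD}")
```

Under this rule of thumb Angola lies well beyond the threshold, while Guinea falls just short of it; a different cut-off would change the classification of borderline cases.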
13 We have representative data on 30-year-olds in the US. Levine, Gustafson and
Velenchik (1997) estimated a wage equation using the following variables:
Y = log(wage)
ED = years of education
(se=0.031)
(se=0.021) (se=0.0004)
Compare the two fitted models and explain what happens when we omit one relevant variable
(in this case, years of education).
Such that the dependent variable is the ratio of trade taxes (import and export taxes) to total
government revenues, the first explanatory variable is the ratio of exports plus imports to
GDP and the second explanatory variable is GDP per capita. We estimate this model using
OLS and obtain the residuals of the above regression. Then we do the following auxiliary
regression:
Knowing that the determination coefficient of the auxiliary regression is 0.1148, could you
compute the White statistic? What is your conclusion about heteroscedasticity in your regression
model?
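The White statistic is the sample size times the determination coefficient of the auxiliary regression. The sketch below shows the computation; the sample size `n_obs`, the 5% significance level and the number of auxiliary regressors are assumptions made for illustration, since they are not stated in this part of the text.

```python
def white_statistic(n_obs, r2_aux):
    """White LM statistic: n times the R^2 of the auxiliary regression."""
    return n_obs * r2_aux

r2_aux = 0.1148   # determination coefficient given in the exercise
n_obs = 50        # HYPOTHETICAL sample size; replace with the actual one
df = 5            # ASSUMPTION: auxiliary regressors X1, X2, X1^2, X2^2, X1*X2
CHI2_CRIT_5PCT_DF5 = 11.0705  # tabulated chi-squared critical value, 5% level, 5 df

lm = white_statistic(n_obs, r2_aux)
reject_homoscedasticity = lm > CHI2_CRIT_5PCT_DF5
print(f"LM = {lm:.4f}, reject H0 of homoscedasticity: {reject_homoscedasticity}")
```

The conclusion depends on the true sample size: the null of homoscedasticity is rejected only if n times 0.1148 exceeds the chosen critical value.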
Country M G Country M G
A researcher estimates a regression using the above data and obtains:
$\hat{M} = \underset{(128.1)}{74.2} + \underset{(0.05)}{0.27}\,G, \qquad R^2 = 0.6$
a- Draw a scatter plot using M and G in each of the axes and explain why the researcher
should expect that there is a problem of heteroscedasticity.
b- Explain the consequences of heteroscedasticity on the properties of the estimated
coefficients.
Because the previous model has a heteroscedasticity problem, the researcher
estimates the following two regressions:
$\widehat{(M/G)} = 0.32 - 39.4\,Z, \qquad R^2 = 0.23$
$\widehat{\log M} = -1.66 + 1.05\,\log G, \qquad R^2 = 0.84$
c- Knowing that the determination coefficient of the auxiliary regression is 0.25 for the
first model and 0.61 for the second, which transformation solves the
heteroscedasticity problem? Work at the 1% significance level.
16 Explain the estimation problem that can be found in the following graph (predicted
values of the dependent variable versus estimation residuals):
log(𝑌𝑖 ) = 𝛽0 + 𝛽1 𝑋1 + 𝛽2 𝑋2 + 𝑢𝑖
where Y = ratio of trade taxes (import and export taxes) to total government revenue, X1 =
ratio of the sum of exports plus imports to GNP, and X2 = GNP per capita.
a- Write the theoretical specification of the auxiliary regression given the above model.
b- Test for heteroscedasticity at 5% significance level.
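For part a, one common form of the White auxiliary regression for a model with two regressors includes the levels, the squares and the cross-product of the regressors. A sketch follows, where the coefficients $\alpha_j$ and the error $v_i$ are notation introduced here and $\hat{u}_i$ denotes the OLS residuals of the model above:

```latex
\hat{u}_i^2 = \alpha_0 + \alpha_1 X_{1i} + \alpha_2 X_{2i}
            + \alpha_3 X_{1i}^2 + \alpha_4 X_{2i}^2
            + \alpha_5 X_{1i} X_{2i} + v_i
```

Under the null hypothesis of homoscedasticity, $\alpha_1 = \dots = \alpha_5 = 0$, and $n R^2$ from this regression is asymptotically chi-squared with 5 degrees of freedom.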
a- Look at the following three graphs and explain why we should expect
heteroscedasticity in the above regression model.
$\hat{e}_i^2 = 695,1942 + 1,349.7\,sales_i - 19,656.9\,profits_i - 0.0027\,sales_i^2 - 0.1163\,profits_i^2 + 0.0501\,sales_i \cdot profits_i, \qquad R_s^2 = 0.889$
19 We want to estimate a demand function for daily cigarette consumption. Since most
people do not smoke, the dependent variable, cigs, is zero for most observations. The
equation to be estimated uses the following explanatory variables: income (annual income in
dollars), cigprc (state cigarette price in cents per pack), educ (years of schooling), age (in years)
and restaurn (dummy equal to one if the state has restaurant smoking restrictions). Using a
sample of 807 individuals, we obtain the following estimation results:
$\widehat{cigs}_i = 0.375 + 0.00005\,income_i + 0.00053\,cigprc_i - 0.494\,educ_i + 0.784\,age_i - \underset{(0.0017)}{0.0091}\,age_i^2 - \underset{(1.112)}{2.845}\,restaurn_i$
$n = 807, \qquad R^2 = 0.052$
We plot the actual (red) and fitted (blue) values of cigs by observation number,
obtaining the following graph:
d- Looking at the actual values, do you think this model is linear? Why?
e- Some of the fitted values are negative. Do you think this is realistic?
f- Do the errors underlying the above equation contain heteroscedasticity? Test for
heteroscedasticity at the 1% significance level, knowing that the determination coefficient
of the auxiliary regression is equal to 0.0649. Show both the F test and the
chi-squared test. Are your two results consistent?
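The two test statistics in part f can be sketched as below. The number of auxiliary regressors `k` is an assumption: it supposes the squared residuals are regressed on the six regressors of the estimated equation, which this part of the text does not state explicitly.

```python
import math

n_obs = 807       # sample size from the exercise
r2_aux = 0.0649   # determination coefficient of the auxiliary regression
k = 6             # ASSUMPTION: six regressors in the auxiliary regression

# F form: (R^2 / k) / ((1 - R^2) / (n - k - 1)), F(k, n - k - 1) under H0.
f_stat = (r2_aux / k) / ((1 - r2_aux) / (n_obs - k - 1))
# LM (chi-squared) form: n * R^2, asymptotically chi-squared with k df under H0.
lm_stat = n_obs * r2_aux

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable (closed form, even df only)."""
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

p_value_lm = chi2_sf_even_df(lm_stat, k)
print(f"F = {f_stat:.2f}, LM = {lm_stat:.2f}, LM p-value = {p_value_lm:.2e}")
```

Under this assumption both statistics lie far above the usual tabulated 1% critical values (16.81 for a chi-squared with 6 degrees of freedom, and roughly 2.8 for an F with 6 and 800 degrees of freedom), so the two forms of the test lead to the same conclusion.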
a- Explain the main differences between the trend and the irregular component in a
time series.
b- Explain the main differences between a cyclical and a seasonal component in a time
series.
c- Explain an econometric tool to take into account seasonal components in a time
series.
d- Explain one possible econometric tool to detect irregular components in a time series.
e- Explain how the trend component can be identified in a SLRM.
22 The general fertility rate (gfr) is the number of children born to every 1,000 women
of childbearing age. For years 1913 through 1984, the equation:
explains gfr in terms of the average real dollar value of the personal tax exemption (pe) and
two dummy variables. The variable ww2 takes on the value unity during the years 1941
through 1945, when the United States was involved in World War II. The variable pill is
unity from 1963 on, when the birth control pill was made available for contraception. Using
a dataset, the following estimation results were obtained:
𝑦𝑡 = 𝛽0 + 𝛽1 𝑥𝑡 + 𝑢𝑡
He is asked to estimate 𝛽1. He does not know that the true value of 𝛽1 is 5, and he performs
the following experiments:
1) He first uses OLS to estimate 𝛽1, obtaining the following results: 𝛽̂1=4.64;
se(𝛽̂1)=1.30.
2) Next, he is told that the random term follows a first order autoregressive model such
that:
𝑒̂𝑡 = 0.7𝑒𝑡−1 + 𝜀𝑡
He then runs the regression using 𝑦𝑡∗ as the dependent variable and 𝑥𝑡∗ as the
explanatory variable, obtaining the following results: 𝛽̂1=5.14; se(𝛽̂1)=0.75.
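The starred variables are not defined in this part of the text. A common reading, assumed here, is that they are quasi-differences built with the AR(1) coefficient, that is, the current value minus 0.7 times the previous value. The sketch below applies that transformation to made-up, noise-free data to show that the slope is preserved; all function names and data are illustrative.

```python
def quasi_difference(series, rho):
    """Return series[t] - rho * series[t-1] for t = 1, ..., T-1 (first observation dropped)."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]

def ols_fit(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope

RHO = 0.7  # AR(1) coefficient given in the exercise

# Illustrative, noise-free data (made up for this sketch): y = 2 + 5x.
x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [2.0 + 5.0 * v for v in x]

x_star = quasi_difference(x, RHO)
y_star = quasi_difference(y, RHO)
intercept_star, slope_star = ols_fit(x_star, y_star)

# The slope is unchanged; the intercept becomes beta0 * (1 - rho).
print(f"slope = {slope_star:.3f}, intercept = {intercept_star:.3f}")
```

With real data the transformed errors are serially uncorrelated if the AR(1) structure is correct, which is why the standard errors from the transformed regression are the reliable ones.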
Nine different students (labelled B, C, D…J) perform the same two experiments but with
different random terms. The results are shown in the following table:
a- Compare and explain why students should not be satisfied with the results obtained in
the first experiment.
b- Explain why students should be satisfied with the results in the second experiment
when they were told that:
𝑒̂𝑡 = 0.7𝑒𝑡−1 + 𝜀𝑡
c- Test for autocorrelation at the 1% significance level, knowing that the standard error
associated with the 0.7 coefficient of the AR(1) process is equal to 0.11. Explain the
type of autocorrelation from which the model suffers.
𝑒̂𝑡 = −0.88𝑒𝑡−1 + 𝜀𝑡
(0.22)
Test for autocorrelation at the 5% significance level, and explain your answer and the type
of autocorrelation from which the model suffers.
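A significance test on the AR(1) coefficient of the residual regression can be sketched as follows. It assumes that the figure in brackets is a standard error and uses the standard normal distribution as a large-sample approximation to the t distribution; the function name is illustrative.

```python
from statistics import NormalDist

def ar1_t_test(rho_hat, se, alpha):
    """Two-sided test of H0: rho = 0 using the normal approximation (an assumption)."""
    t_stat = rho_hat / se
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    if abs(t_stat) <= critical:
        kind = "none detected"
    else:
        kind = "positive" if rho_hat > 0 else "negative"
    return t_stat, critical, kind

# Coefficient 0.7 with standard error 0.11, tested at the 1% level.
t1, c1, kind1 = ar1_t_test(0.7, 0.11, 0.01)
# Coefficient -0.88 with standard error 0.22, tested at the 5% level.
t2, c2, kind2 = ar1_t_test(-0.88, 0.22, 0.05)

print(f"rho = 0.70: t = {t1:.2f}, critical = {c1:.3f}, autocorrelation: {kind1}")
print(f"rho = -0.88: t = {t2:.2f}, critical = {c2:.3f}, autocorrelation: {kind2}")
```

The sign of a significant coefficient gives the type of autocorrelation: a positive coefficient means errors tend to keep their sign from one period to the next, and a negative one means they tend to alternate.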
Where M1 is the narrow money supply, GDP is real GDP, RS is the interest rate and PR is
the rate of inflation.
The model was estimated using quarterly data for the United States over the period 1952:1-
1992:4 (T=163 observations) and using two different specifications: Model 1 assumes that
the errors follow an AR(1) process and Model 2 assumes that they follow an AR(2)
process. The following table shows the estimation results:
OLS Results
Note that the dependent variable in Model 1 is log(M1) and the dependent variable in Models
2 and 3 is the estimation residuals of Model 1.
Knowing that the error term follows a second order autoregressive structure:
28 Explain the following two graphs: (a) upper two graphs and (b) lower two graphs in
terms of autocorrelation.
29 We regress quarterly new car sales on price, income, unemployment and
population over 64 quarters and obtain the following estimation results:
𝑇 = 64 𝑅 2 = 0.441
However, you are told that this model is naïve because it does not allow for fourth-order
serial correlation. Please answer the following two questions:
a- Successive error terms derived from the application of regression analysis to time
series data are correlated.
b- There is a high degree of correlation between two or more of the independent
variables included in a multiple regression model.
c- The dependent variable is highly correlated with the independent variable(s) in a
regression analysis.
d- The application of a multiple regression model yields estimates that are nonlinear in
form.
B- A situation in which measures of two or more variables are statistically related but
are not in fact causally linked because the statistical relationship is caused by a third
omitted variable is called:
a- Partial correlation
b- Linear correlation
c- Spurious correlation
d- Marginal correlation
C- Stepwise regression is the most widely used search procedure for developing the
……….. regression model without examining all possible models.
a- worst
b- best
c- medium
d- least
a- The dependent variable is highly correlated with the explanatory variables included
in the regression model.
b- There is a high degree of correlation between the explanatory variables included in a
multiple regression model.
c- The application of a multiple regression model yields estimates that are nonlinear in
form.
d- None of the above.
G- If your dataset has heteroscedasticity, but you completely ignore the problem and
use OLS, you will
b- Get parameter standard errors that could be either too large or too small.
c- Get t-statistics that make you too optimistic about your parameters being statistically
different from zero.
d- Get t-statistics that make you too pessimistic about your parameters being statistically
different from zero.