
Econometrics 1 (6012B0374Y)

dr. Artūras Juodis

University of Amsterdam

Week 1. Lecture 1

February, 2024

1 / 65
Overview

1 Course structure
   Schedule
   Topics
2 What is Econometrics?
   Definitions and distinctions
3 Linear regression model
   The setting
   The OLS estimator
4 Empirical illustration
5 Using OLS for analysis
   OLS useful properties
   Model fit
   The $R^2$
   The role of a constant term (self-study)
6 Summary

2 / 65
1. Course structure

3 / 65
1.1. Schedule

4 / 65
Weekly schedule

Every week has 8 contact hours:

- Monday. Lecture 1.
- Tuesday/Wednesday/Thursday. Tutorial and Computer Lab.
- Friday. Lecture 2.

5 / 65
Final grade

Your final grade is a weighted average of:


- Final exam (3h, closed book): 75%.
- Assignment 1 (in groups of 4): 13%.
- Assignment 2 (in groups of 4): 12%.

Comments:
- You need to get at least 5.5 on the final exam to pass the course.
- The weighted average must be at least 5.5 to pass the course.
- Assignment grades expire at the end of the academic year.
- Same assignment groups for both assignments. Self-enrol before the first tutorial.

6 / 65
The team

- Peter Foldvari (Tutorials + Computer Labs).
- Arturas Juodis (Lectures + Tutorials). Coordinator of the course.
- Rutger Poldermans (Tutorials + Computer Labs).
- Erik van der Sluis (Computer Labs).

For information regarding the course (and especially the lectures), please contact me (Arturas Juodis). For tutorials and PC labs, contact your corresponding tutor.

7 / 65
Software

As far as software is concerned, the course is divided into two blocks:

1 Block 1. Weeks 1-3. STATA is used for mostly empirical tasks. In Week 1 you get an introduction to STATA.
2 Block 2. Weeks 4-6. R is used for mostly simulation- and programming-based tasks. If you are not experienced with R, you have 3 weeks (before Week 4) to catch up.

A STATA license for students is available on Canvas. R is free of charge.

8 / 65
Our expectations and suggestions

- Respectful communication with the teaching staff.
- Treat fellow students with courtesy, especially your group mates for the assignments.
- Attend lectures, and check the material before each lecture. If you have questions, please ask.
- Lectures, Tutorials, and PC-labs are complementary to the book and do not replace it.

Experience from the past shows that being fully prepared for the lectures, tutorials and computer labs considerably improves the passing rate as well as learning satisfaction in this course.

9 / 65
1.2. Topics

10 / 65
Material

Compulsory course material for the exam:

1 (The5) Heij, C., de Boer, P., Franses, P.H., Kloek, T. and van Dijk, H.K. (2004): Econometric Methods with Applications in Business and Economics. Oxford University Press.
2 Lecture material (slides).
3 Material from Assignments 1 and 2.

Additional (more up-to-date) reading with many empirical illustrations:

1 (BK) G. Bekes and G. Kezdi (2021): Data Analysis for Business, Economics, and Policy. Cambridge University Press.
2 (SW) J.H. Stock and M.W. Watson (2019, 4th Edition): Introduction to Econometrics. Pearson Education Limited.

11 / 65
Exam material. Advice.

- Just by reading my slides, you will not learn all the material necessary for the exam.
- During the lectures we will not cover all topics and details discussed in the book.
- During the lectures we will (sometimes) not cover all topics and details discussed on the slides.
- Reading the book is essential to prepare for the exam.
- Every week you will get a list of sections from The5 that are essential for understanding the material. This list also overlaps with the exam material.

12 / 65
Week-by-week topics

- Week 1. Introduction to Econometrics with one regressor.
- Week 2. Multiple regression analysis. Matrix notation.
- Week 3. Regression analysis and OLS estimation.
- Week 4. (Large sample) statistical properties of the OLS estimator. Multiple hypotheses testing.
- Week 5. Heteroscedasticity and regression functional form analysis.
- Week 6. Model selection and evaluation. Review.

13 / 65
Course objectives

At the end of this course, students will be able to:

- Specify the linear regression model with standard assumptions;
- Provide a statistical and economic interpretation of the results of an econometric analysis;
- Give the model specification in case of nonlinear, interaction or categorical effects and apply tests for such effects;
- Explain the economic interpretation of different model formulations;
- Formulate hypotheses on real-life data and set up a simple investigation to verify these hypotheses;
- Use matrix notation for model formulation and LS estimation;
- Analyse the quality of the LS estimators and the related t- and F-tests. Adjust the LS method in case of invalid classical model assumptions (e.g. heteroscedasticity).

14 / 65
Prior Knowledge

In order to achieve those objectives, we expect you to have knowledge of:

- Probability Theory and Statistics 1
- Probability Theory and Statistics 2
- Probability Theory and Statistics 3*
- Mathematics 2: Linear Algebra
- Mathematics 3: Advanced Linear Algebra and Real Analysis
- Introduction Econometrics and Actuarial Science*

Without a solid understanding of Probability Theory, Statistics, and Advanced Linear Algebra, this course will be difficult to master. Without a solid intuitive understanding of economic problems, this course will also be difficult to master.

15 / 65
2. What is Econometrics?

16 / 65
2.1. Definitions and distinctions

17 / 65
One of the thoughts on the definition

Ragnar Frisch (1895-1973) in the editorial for the first issue of Econometrica
in 1933: ... econometrics is by no means the same as economic statistics.
Nor is it identical with what we call general economic theory, although a
considerable portion of this theory has a definitely quantitative character.
Nor should econometrics be taken as synonymous with the application of
mathematics to economics.... It is the unification of all three that is
powerful. And it is this unification that constitutes econometrics.

18 / 65
Types of data
Up until very recent times, econometricians were mostly working with
structured, low-dimensional datasets (especially in academia). This is
changing gradually (as you have learned from the Statistical/Machine
Learning course).

The types of data we mostly encounter in econometrics:


- Cross-section (this course). Typical example: many individuals with characteristics observed at some point in time (or at some age). Think of individual wages/incomes in 2023.
- Time series (course in the 3rd year). Typical example: GDP, inflation, or the interest rate for some country (e.g. the Netherlands) observed annually or quarterly over 50-60 years.
- Panel data (course in the 3rd year). Typical example: income of individuals observed over multiple years, or profit of firms observed over multiple years.
The material being covered in this course is also useful for more general
types of data. Linear regression models serve as a foundation for more
complex models you will encounter.
19 / 65
Nobel Memorial Prize in Economic Sciences for
Econometricians

- 1969. Ragnar Frisch and Jan Tinbergen.
- 1989. Trygve Haavelmo.
- 2000. James Heckman and Daniel McFadden.
- 2003. Robert Engle and Clive Granger.
- 2011. Chris Sims.
- 2013. Lars Peter Hansen.
- 2021. Joshua Angrist and Guido Imbens.

Some of them got the prize for what one would call structural analysis, while others got it for what one would call reduced-form analysis. Some of them are more micro-econometricians, while others are macro-econometricians.

These distinctions were not a thing back in 1933.

20 / 65
Figure: The two guys that make many Dutch econometricians proud. (a) Jan Tinbergen (1903-1994). (b) Guido Imbens (1963-). Economist of Dutch origin.

21 / 65
3. Linear regression model

22 / 65
3.1. The setting

23 / 65
The problem

We want to establish/investigate the relationship between a certain target variable $y_i$ and a set of $K$ explanatory variables $(x_{1,i}, \ldots, x_{K,i})$. Ideally we want to investigate whether these relationships:

- Are Causal, i.e. we can claim that different values of e.g. $x_{1,i}$ cause $y_i$ to be larger/smaller.
- Are useful for Counterfactual analysis, i.e. what would have been the value of $y_i$ if we were to observe someone with certain values of $(x_{1,i}, \ldots, x_{K,i})$. In most cases, these are hypothetical scenarios where no such combinations exist in the data.
- Are Explainable, i.e. if we make a prediction based on the proposed model, do we understand why adding one new variable changes the predictions?

The final aspect is generally easily satisfied for the low-dimensional models studied in this course. The other two aspects, on the other hand, can be more difficult to justify.

24 / 65
The model

In this course we will consider models that are linear in parameters (or linear regression models) of the form:

$$y_i = \alpha + \sum_{k=1}^{K} \beta_k x_{k,i} + \varepsilon_i, \quad i = 1, \ldots, n. \qquad (1)$$

Here $n$ is the total size of our sample. Essentially we try to explain $y_i$ (that is observed) using the observed quantities $(x_{1,i}, \ldots, x_{K,i})$, and we leave ourselves some room for error, so that we cannot perfectly predict/explain $y_i$. The error term $\varepsilon_i$ captures the variation in $y_i$ that cannot be linearly explained using the observed explanatory variables.

25 / 65
Interpretation of the coefficients

In the linear model of the form:

$$y_i = \alpha + \sum_{k=1}^{K} \beta_k x_{k,i} + \varepsilon_i, \quad i = 1, \ldots, n, \qquad (2)$$

the coefficient $\beta_k$, under ideal circumstances (more on that later in this course), can be interpreted as the partial/marginal effect:

$$\frac{\partial\, \mathrm{E}[y_i \mid (x_{1,i}, \ldots, x_{K,i})]}{\partial x_{k,i}} = \beta_k, \quad k = 1, \ldots, K. \qquad (3)$$

Hence, $\beta_k$ measures the effect of a marginal change in $x_{k,i}$ on the conditional expectation of $y_i$ (given all regressors).

26 / 65
Notation/Language/Jargon

Depending on the textbook considered (and the age at which that textbook was written), different terminology is used for $y_i$ and $(x_{1,i}, \ldots, x_{K,i})$:

- $y_i$ is usually referred to as the LHS (left-hand side) variable, the dependent variable, the regressand, or the explained variable. More generally (from a statistical learning / machine learning point of view) it is the target variable.
- $(x_{1,i}, \ldots, x_{K,i})$ are usually referred to as the RHS (right-hand side) variables, the independent variables, the covariates, the explanatory variables, or the regressors. In machine learning terminology these are the features.

In most cases, I will use the LHS/RHS terminology or the dependent variable/regressor terminology. I will avoid the notion of independent variables as it is too confusing.

27 / 65
3.2. The OLS estimator

28 / 65
Simple model

Consider the simple generic linear model with a single regressor $x_i$:

$$y_i = \alpha + \beta x_i + \varepsilon_i, \quad i = 1, \ldots, n. \qquad (4)$$

We assume that the econometrician has access to a random sample $\{(y_i, x_i)\}_{i=1}^{n}$, i.e. all units are independently and identically distributed. We are interested in finding $(\alpha, \beta)$ such that our model has some desired features.

How do we obtain good values of $(\alpha, \beta)$ if we have data $\{(y_i, x_i)\}_{i=1}^{n}$?

29 / 65
Ordinary Least-squares (OLS) objective function

The two-centuries-old approach suggests that the parameters $(\alpha, \beta)$ can be obtained by minimizing the least-squares objective function (or, equivalently, the sum of squared errors):

$$(\hat{\alpha}, \hat{\beta}) = \arg\min_{\alpha, \beta} \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2, \qquad (5)$$

where minimization is done over the interior of some parameter space without any constraints. Let us alternatively express the above problem as:

$$(\hat{\alpha}, \hat{\beta}) = \arg\min_{\alpha, \beta} LS_n(\alpha, \beta). \qquad (6)$$
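
To make the objective function concrete, here is a minimal R sketch (R is the software used in Block 2 of this course; the simulated data and variable names are illustrative, not taken from the slides) that minimizes $LS_n(\alpha, \beta)$ numerically and compares the result with R's built-in lm():

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)            # simulated data with alpha = 1, beta = 2

# The least-squares objective LS_n(alpha, beta) from equation (5)
LSn <- function(par) sum((y - par[1] - par[2] * x)^2)

# Unconstrained numerical minimization, starting from (0, 0)
opt <- optim(c(0, 0), LSn)
opt$par                               # numerical (alpha-hat, beta-hat)

coef(lm(y ~ x))                       # closed-form OLS for comparison
```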

30 / 65
Derivatives
We look at the first partial derivatives of that objective function:

$$\frac{\partial LS_n(\alpha, \beta)}{\partial \alpha} = -2 \sum_{i=1}^{n} (y_i - \alpha - \beta x_i),$$
$$\frac{\partial LS_n(\alpha, \beta)}{\partial \beta} = -2 \sum_{i=1}^{n} x_i (y_i - \alpha - \beta x_i).$$

Hence, the minimizer $(\hat{\alpha}, \hat{\beta})$ of the objective function should be the zero of the above set of equations (two equations with two unknowns), i.e.:

$$\sum_{i=1}^{n} (y_i - \hat{\alpha} - \hat{\beta} x_i) = 0,$$
$$\sum_{i=1}^{n} x_i (y_i - \hat{\alpha} - \hat{\beta} x_i) = 0.$$

31 / 65
Derivatives

The first equality implies that $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$. We can plug this expression into the second equality:

$$\sum_{i=1}^{n} x_i \left( (y_i - \bar{y}) - \hat{\beta}(x_i - \bar{x}) \right) = 0, \qquad (7)$$

such that:

$$\hat{\beta} = \frac{\sum_i x_i (y_i - \bar{y})}{\sum_i x_i (x_i - \bar{x})}, \qquad (8)$$

or equivalently:

$$\hat{\beta} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}. \qquad (9)$$

This basic equivalence will be proved during the tutorial.
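
A quick way to check equations (8) and (9) is to compute both expressions on simulated data; a minimal R sketch (illustrative variable names) is:

```r
set.seed(2)
n <- 100
x <- runif(n)
y <- 0.5 + 1.5 * x + rnorm(n)

beta_8 <- sum(x * (y - mean(y))) / sum(x * (x - mean(x)))            # equation (8)
beta_9 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # equation (9)
alpha_hat <- mean(y) - beta_9 * mean(x)

c(beta_8, beta_9)         # identical up to rounding
coef(lm(y ~ x))           # matches (alpha-hat, beta-hat)
```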

32 / 65
Special case. Binary regressor
Consider the special case where $x_i = D_i$ is a binary variable. Assume that $n = n_0 + n_1$, where $n_1 = \sum_i D_i$ (i.e. the number of observations with $D_i = 1$). In that case:

$$\begin{aligned}
\hat{\beta} &= \frac{\sum_i D_i (y_i - \bar{y})}{\sum_i D_i (D_i - \bar{D})} \\
&= \frac{n_1 \bar{y}_1 - n_1 \bar{y}}{n_1 - n_1 \bar{D}} \\
&= \frac{\bar{y}_1 - (n_0/n)\bar{y}_0 - (n_1/n)\bar{y}_1}{1 - (n_1/n)} \\
&= \frac{(n - n_1)\bar{y}_1 - n_0 \bar{y}_0}{n - n_1} \\
&= \bar{y}_1 - \bar{y}_0.
\end{aligned}$$

It is easy to show that $\hat{\alpha} = \bar{y}_0$. Hence, the OLS coefficient $\hat{\beta}$ just measures the difference between the two sample means.
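
This difference-in-means result is easy to verify numerically; a minimal R sketch (simulated data, illustrative names) is:

```r
set.seed(3)
n <- 500
D <- rbinom(n, 1, 0.4)                          # binary regressor
y <- 10 + 3 * D + rnorm(n)

coef(lm(y ~ D))                                 # OLS with a constant
c(alpha = mean(y[D == 0]),                      # alpha-hat = mean of the D = 0 group
  beta  = mean(y[D == 1]) - mean(y[D == 0]))    # beta-hat = difference in group means
```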

33 / 65
4. Empirical illustration

34 / 65
The Dataset. Hotels Vienna.

We consider a dataset from the BK book. The hotels-vienna data includes information on prices and features of hotels in Vienna for a single weekday night in November 2017. The dataset has $n = 428$ observations.

We want to investigate the relationship between the price (per night) and the distance of the hotel from the city center of Vienna using the linear model:

$$y_i = \alpha + \beta x_i + \varepsilon_i, \quad i = 1, \ldots, n. \qquad (10)$$

We consider two types of variables for $x_i$: i) the distance itself, and ii) the dummy/binary variable $D_i = 1(x_i < 2\,\text{km})$. It is customary in econometrics to use the $D_i$ notation for binary variables.
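
For reference, the two regressions on this slide could be run along the following lines in R (Block 1 of the course uses STATA; this R sketch is only illustrative, and the file name and variable names for the BK hotels-vienna data are assumptions):

```r
# Assumed file name; the hotels-vienna data accompany the BK textbook
hotels <- read.csv("hotels-vienna.csv")

hotels$near <- as.numeric(hotels$distance < 2)   # D_i = 1(distance_i < 2 km)

summary(lm(price ~ near, data = hotels))         # binary regressor
summary(lm(price ~ distance, data = hotels))     # continuous distance in km
```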

35 / 65
Scatter plot
Figure: Scatter plot of price (per night) against distance_km.

36 / 65
Empirical results. Hotel prices in Vienna. Binary variables.

Figure: Regression of Price on the dummy variable $D_i = 1(\text{distance}_i < 2\,\text{km})$.

We can see that on average hotels within the 2km radius from the city
center are 34.67 EUR per night more expensive than the hotels outside of
that radius.

37 / 65
Double check the algebra

Figure: Sample averages for the $D_i = 0$ and $D_i = 1$ categories.

Algebra does not lie!

38 / 65
Empirical results. Hotel prices in Vienna. Continuous
distance.

Figure: Regression of Price on the distance variable.

Conclusions are similar to those based on the previous regression on a binary variable. Hotels that are further away from the center are cheaper! And the further you go, the cheaper they are (on average). Do you understand why?
39 / 65
5. Using OLS for analysis

40 / 65
5.1. OLS useful properties

41 / 65
Some properties of OLS

Consider a modified regression on the transformed variables $y_i^* = a_y + b_y y_i$ and $x_i^* = a_x + b_x x_i$. Then:

$$\hat{\beta}^* = \frac{\sum_i (x_i^* - \bar{x}^*)(y_i^* - \bar{y}^*)}{\sum_i (x_i^* - \bar{x}^*)^2} = \frac{b_y}{b_x}\hat{\beta}, \qquad (11)$$

$$\hat{\alpha}^* = \bar{y}^* - \hat{\beta}^* \bar{x}^* = a_y - \frac{b_y a_x}{b_x}\hat{\beta} + b_y \hat{\alpha}. \qquad (12)$$

Hence, if we add constants to all $x_i$ and/or $y_i$, then $\hat{\beta}$ does not change (location invariance). If we multiply all $x_i$ and/or $y_i$, then $\hat{\beta}$ scales up/down accordingly.

The estimate $\hat{\alpha}$ absorbs all location changes in $x_i$ and $y_i$ directly.
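
These invariance properties are easy to check numerically; a minimal R sketch (simulated data, illustrative transformation constants) is:

```r
set.seed(4)
x <- rnorm(100); y <- 2 + 0.5 * x + rnorm(100)

coef(lm(y ~ x))                  # original (alpha-hat, beta-hat)
coef(lm(y ~ I(x + 10)))          # location shift of x: beta-hat unchanged, alpha-hat absorbs it
coef(lm(I(100 * y) ~ I(2 * x)))  # rescaling: beta-hat becomes (100/2) * beta-hat, as in (11)
```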

42 / 65
Example.

You are interested in measuring the effect of the distance from Schiphol airport on the amount of noise and pollution. Imagine you have data from multiple measurement stations, where $x_i$ is the distance in kilometres, while $y_i$ is the daily noise measurement in dB.

If you now translate $x_i$ from kilometres to miles, then the two $\beta$ estimates should differ by a factor of $\approx 1.61$.

Estimates of $\alpha$ should not change. Why?

43 / 65
Empirical results. Hotel prices in Vienna. Continuous
distance in Miles.

Figure: Regression of Price on the distance (measured in miles) variable.

The estimate of the constant ($\hat{\alpha}$) is the same. But the coefficient for distance ($\hat{\beta}$) is $\approx 1.61$ times bigger. Make sure you understand why this is the case.
44 / 65
5.2. Model fit

45 / 65
Residual and fitted value

Consider the following generic decomposition of the data into the fitted value and the residual:

$$y_i = \hat{y}_i + \underbrace{(y_i - \hat{y}_i)}_{\hat{e}_i}. \qquad (13)$$

Here $\hat{y}_i$ is the fitted value, and $\hat{e}_i$ is the residual (i.e. the remainder of $y_i$ that is not explained by $\hat{y}_i$). In our case:

$$\hat{y}_i = \hat{\alpha} + x_i \hat{\beta}. \qquad (14)$$

Hence, the fitted value $\hat{y}_i = \hat{y}_i(\hat{\alpha}, \hat{\beta}, x_i)$ is a function of the OLS estimates $(\hat{\alpha}, \hat{\beta})$ and the corresponding regressor value $x_i$ for the same unit $i$.

46 / 65
Residual and fitted value. Visually.

Figure: Figure from Wooldridge Introduction to Econometrics.

47 / 65
Decomposition. Intuitive explanation.

Hence the variable $y_i$ is decomposed into:

- the fitted value $\hat{y}_i$, which lies on the line $\hat{\alpha} + x_i \hat{\beta}$;
- the residual $\hat{e}_i$, which is what remains unexplained once we do that projection.

As it turns out, OLS does the two steps such that in finite samples:

$$\sum_i \hat{y}_i \hat{e}_i = 0, \qquad (15)$$

i.e. the two sets of quantities have zero sample correlation. Alternatively, we can say that these two sets of quantities, i.e. $\{\hat{y}_i\}_{i=1}^{n}$ and $\{\hat{e}_i\}_{i=1}^{n}$ (as vectors), are orthogonal to each other.

48 / 65
Decomposition. Intuitive explanation.

Next week, we provide the necessary geometrical justification for the above
statements.

Make sure to refresh your knowledge of matrix algebra, vectors, and the
concepts of (orthogonal) projections.

49 / 65
Why $\sum_i \hat{e}_i \hat{y}_i = 0$?

Next, recall that $\hat{e}_i = y_i - \hat{\alpha} - \hat{\beta} x_i$. So we can expand:

$$\sum_i \hat{e}_i \hat{y}_i = \sum_i (y_i - \hat{\alpha} - \hat{\beta} x_i)(\hat{\alpha} + \hat{\beta} x_i) = \hat{\alpha} \sum_i (y_i - \hat{\alpha} - \hat{\beta} x_i) + \hat{\beta} \sum_i (y_i - \hat{\alpha} - \hat{\beta} x_i) x_i.$$

50 / 65
We know that by the definition of the OLS estimator (so by construction):

$$\sum_{i=1}^{n} (y_i - \hat{\alpha} - \hat{\beta} x_i) = 0,$$
$$\sum_{i=1}^{n} x_i (y_i - \hat{\alpha} - \hat{\beta} x_i) = 0.$$

Hence, also:

$$\sum_i \hat{e}_i \hat{y}_i = 0. \qquad (16)$$
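
Numerically, these orthogonality conditions hold up to floating-point error; a minimal R sketch (simulated data) is:

```r
set.seed(5)
x <- rnorm(50); y <- 1 + x + rnorm(50)

fit  <- lm(y ~ x)
ehat <- resid(fit)
yhat <- fitted(fit)

sum(ehat)          # ~ 0: first-order condition for alpha-hat
sum(x * ehat)      # ~ 0: first-order condition for beta-hat
sum(yhat * ehat)   # ~ 0: equation (16)
```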

51 / 65
5.3. The R 2

52 / 65
The SST = SSE + SSR decomposition

We want to understand how well the linear model and the corresponding OLS estimator are able to explain the variation of $y_i$ in the data, or alternatively, how well we can fit the data for $y_i$ given the data for $x_i$.

Again consider the decomposition:

$$y_i = \hat{y}_i + \underbrace{(y_i - \hat{y}_i)}_{\hat{e}_i}. \qquad (17)$$

Observe that by the definition of the OLS estimator:

$$\sum_i \hat{e}_i = \sum_{i=1}^{n} (y_i - \hat{\alpha} - \hat{\beta} x_i) = 0. \qquad (18)$$

Hence from here it follows that:

$$\sum_i y_i = \sum_i \hat{y}_i, \qquad (19)$$

such that $\bar{y} = n^{-1} \sum_i y_i = n^{-1} \sum_i \hat{y}_i$.
53 / 65
Consider the sample Total Sum of Squares (SST) of $y_i$:

$$SST = \sum_i (y_i - \bar{y})^2. \qquad (20)$$

It can be expanded as:

$$SST = \sum_i (\hat{y}_i + \hat{e}_i - \bar{y})^2 = \sum_i (\hat{y}_i - \bar{y})^2 + \sum_i \hat{e}_i^2 + 2 \sum_i \hat{e}_i (\hat{y}_i - \bar{y}). \qquad (21)$$

Note that:

$$\sum_i \hat{e}_i (\hat{y}_i - \bar{y}) = \sum_i \hat{e}_i \hat{y}_i - \frac{1}{n} \sum_i \hat{e}_i \sum_i \hat{y}_i = 0 - 0, \qquad (22)$$

as was already established before.

54 / 65
From here:

$$SST = \sum_i (\hat{y}_i - \bar{y})^2 + \sum_i \hat{e}_i^2 \qquad (23)$$
$$\phantom{SST} = SSE + SSR. \qquad (24)$$

So the total sum of squares of $y_i$ can be decomposed as the sum of the explained sum of squares (SSE) and the residual sum of squares (SSR).

- The OLS estimator, by construction, minimizes SSR.
- Hence, it maximizes SSE for a given choice of regressor $x_i$!

55 / 65
Towards $R^2$

The SST = SSE + SSR decomposition is useful to describe the amount of variance of $y_i$ that we are able to explain by $x_i$, but it is not a scale-free measure. In particular, if we multiply $y_i$ by 100, then all three measures increase by a factor of $100^2$. This is inconvenient.

Instead we consider the scaled version:

$$\frac{SST}{SST} = \frac{SSE}{SST} + \frac{SSR}{SST}. \qquad (25)$$

For the ratio $SSE/SST$ we usually use the $R^2$ (coefficient of determination) notation, i.e.:

$$R^2 \equiv \frac{SSE}{SST} \in [0, 1]. \qquad (26)$$

Hence, the unexplained (or residual) part is always given by $1 - R^2$, and the OLS estimator maximizes the $R^2$ by construction.

56 / 65
Why do we call it an $R^2$?

Notice that

$$\begin{aligned}
SSE &= \sum_i (\hat{y}_i - \bar{y})^2 \\
&= \sum_i \left( \hat{\alpha} + \hat{\beta} x_i - (\hat{\alpha} + \hat{\beta} \bar{x}) \right)^2 \\
&= \hat{\beta}^2 \sum_i (x_i - \bar{x})^2 \\
&= \frac{\left( \sum_i (x_i - \bar{x})(y_i - \bar{y}) \right)^2}{\sum_i (x_i - \bar{x})^2}.
\end{aligned}$$

57 / 65
From the above it follows that:

$$\begin{aligned}
R^2 &= \frac{SSE}{SST} \\
&= \frac{\left( \sum_i (x_i - \bar{x})(y_i - \bar{y}) \right)^2}{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2} \\
&= \left( \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2} \sqrt{\sum_i (y_i - \bar{y})^2}} \right)^2.
\end{aligned}$$

The term inside $(\cdot)^2$ is the sample correlation between $\{x_i\}_{i=1}^{n}$ and $\{y_i\}_{i=1}^{n}$. Hence, the name $R^2$.
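
In R, this identity can be verified directly (simulated data, illustrative names):

```r
set.seed(6)
x <- rnorm(100); y <- 3 - 2 * x + rnorm(100)

fit <- lm(y ~ x)
SST <- sum((y - mean(y))^2)
SSR <- sum(resid(fit)^2)
SSE <- SST - SSR

SSE / SST                  # R^2 from the decomposition
cor(x, y)^2                # squared sample correlation
summary(fit)$r.squared     # R^2 reported by lm()
```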

58 / 65
The Vienna hotels example.

Table: The $R^2$s for the two model specifications in Lecture 1.

Model        $R^2$
Binary       0.21
Continuous   0.08

The model with a single binary variable (which was derived from the continuous distance variable) explains a larger share of the price variation. Next week, we will see that the two models can be combined effectively.

59 / 65
5.4. The role of a constant term (self-study)

60 / 65
Why include a constant $\alpha$ in your model?

So far we assumed that the model always includes a constant term (or an intercept) $\alpha$, and we construct the estimators $(\hat{\alpha}, \hat{\beta})$. But why?

Without a constant term:

$$\hat{\beta} = \frac{\sum_i x_i y_i}{\sum_i x_i^2}. \qquad (27)$$

This estimator is not without problems.
61 / 65
Why include a constant $\alpha$ in your model?

OLS without a constant term has many drawbacks and only one advantage.

Drawbacks:
- $\hat{\beta}$ can no longer be related to the correlation coefficient between $\{x_i\}$ and $\{y_i\}$.
- It is not invariant to the transformations $x \to x + \text{constant}$ and $y \to y + \text{constant}$. So, for example, it matters whether we measure temperature in degrees Celsius or in Kelvin.
- If the true model (more about this concept later) contains a non-zero $\alpha$, then our estimator $\hat{\beta}$ is biased and inconsistent (more about these aspects in the next two weeks).
- The interpretation of $R^2$ as the squared correlation coefficient is also lost.
- You lose the reference group if your regressor is a binary variable $D_i \in \{0, 1\}$.

The only benefit is the lower variance of the estimator (efficiency) if the true intercept in your model is exactly 0. So is it worth it? Usually not.
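
As a quick illustration of the bias, here is a minimal R sketch with simulated data where the true intercept is non-zero:

```r
set.seed(7)
x <- runif(200, 1, 3)
y <- 5 + 1 * x + rnorm(200)    # true alpha = 5, true beta = 1

coef(lm(y ~ x))                # with a constant: estimates close to (5, 1)
coef(lm(y ~ 0 + x))            # without a constant: beta-hat is badly biased upward
```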

62 / 65
6. Summary

63 / 65
Summary today

In this lecture:
- We introduced the course.
- We introduced the meaning of econometrics in the narrow sense.
- We introduced the Ordinary Least Squares estimator for a single variable.
- We discussed how to measure the regression fit using $R^2$.

64 / 65
The remainder of this week

- Tutorial. You will refresh your knowledge of OLS algebra with one regressor, and investigate how to measure the fit of an OLS regression.
- PC-lab. Introduction to STATA. Basic manipulations with the data.
- Lecture (Friday). The meaning of a linear regression (population) model and different motivations. Finite sample properties of the estimators.

65 / 65
