Ordinary Least Squares With A Single Independent Variable
Previously we discussed linear regression in a fairly general sense; now we will focus on
a particular method of determining linear regression coefficients called Ordinary Least Squares
(OLS). OLS is by far the most widely used tool in regression analysis. It is a benchmark of sorts:
even when researchers go on to use other, more sophisticated methods, the OLS estimates
are commonly presented first for comparison.
Recall that the theoretical model of interest is the linear model
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i.$$
From this equation, we seek to use the information contained in a dataset of observations (X_i, Y_i)
to estimate the values of β0 and β1, which we call β̂0 and β̂1. The estimated equation is
$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$$
and we defined the residuals as the difference between the observed values Y_i and the fitted
values Ŷ_i:
$$e_i \equiv Y_i - \hat{Y}_i.$$
To estimate the regression coefficients in a consistent way, we must first formally define our
loss function, the criterion by which we determine whether the fit is good or not. OLS is founded
on minimizing the sum of squared residuals (SSR), defined as
$$\mathrm{SSR} = \sum_{i=1}^{n} e_i^2 = e_1^2 + e_2^2 + \cdots + e_n^2,$$
where n is the sample size. (Note that Studenmund calls this quantity the Residual Sum of
Squares or RSS, but SSR is probably the more common of the two terms.) Therefore, the OLS
estimates of β0 and β1 are defined to be the values of β̂0 and β̂1 which, when plugged into the
estimated regression equation, minimize the SSR. Note that these estimates will be different
for a different sample, both for another sample of size n and for samples with more or fewer
observations.
2. The criterion of minimizing the squared residuals is intuitive. There are several other
possibilities we could consider as well, but each has practical or theoretical drawbacks,
which has resulted in OLS being the most widely used method. For example, we might try
simply minimizing the sum of the residuals ($\sum_i e_i$), but that would treat negative and
positive residuals differently: negative residuals reduce the sum, so large errors of opposite
sign can cancel out. In most cases, we probably want to weight them the same.
We might also try minimizing the sum of the absolute values of the residuals ($\sum_i |e_i|$). This
method, called Least Absolute Deviations (LAD), is very robust to outliers, but it is more
difficult to implement: there is no closed-form expression for the regression coefficients, like we have
with OLS, so they must be computed numerically.
3. Third reason?
The Ordinary Least Squares estimator of β0 and β1 in the linear model is defined to be the values
of β̂0 and β̂1 which minimize the sum of squared residuals
$$\sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2 = \sum_{i=1}^{n} \left( Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i \right)^2.$$
The general equations for the coefficients which minimize this quantity are well-known and can
be written:
$$\hat{\beta}_1 = \frac{n \sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} X_i \cdot \sum_{i=1}^{n} Y_i}{n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},$$
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}.$$
The first form for β̂1 is easier to calculate by hand, while the second form is perhaps more intuitive,
being the ratio of the sample covariance between X and Y to the sample variance of X.
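These formulas come from the first-order conditions for minimizing the SSR. As a brief sketch, differentiate the SSR with respect to each coefficient and set the derivatives to zero (the resulting system is known as the normal equations):
$$\frac{\partial\, \mathrm{SSR}}{\partial \hat{\beta}_0} = -2 \sum_{i=1}^{n} \left( Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i \right) = 0, \qquad \frac{\partial\, \mathrm{SSR}}{\partial \hat{\beta}_1} = -2 \sum_{i=1}^{n} X_i \left( Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i \right) = 0.$$
The first condition gives β̂0 = Ȳ − β̂1 X̄ directly, and substituting that into the second condition and solving for β̂1 yields the ratio forms above.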
Using the first set of equations for the coefficients, note that we only need to calculate the
following sample properties to find numerical values for β̂0 and β̂1 :
1. $\sum_{i=1}^{n} X_i$,
2. $\sum_{i=1}^{n} Y_i$,
3. $\sum_{i=1}^{n} X_i Y_i$,
4. $\sum_{i=1}^{n} X_i^2$.
Once we know these values, and the sample size (n), we can calculate the regression coefficients
and draw the regression line.
Example 1. Recall our simple example dataset on stock price and trade volume, given in the
table below.

   i     X_i   Y_i   X_i Y_i   X_i^2
   1      1     2       2        1
   2      4     2       8       16
   3      1     3       3        1
 Total    6     7      13       18
The quantities we need to calculate are:
$$\sum_{i=1}^{n} X_i = 1 + 4 + 1 = 6, \qquad \sum_{i=1}^{n} Y_i = 2 + 2 + 3 = 7,$$
$$\sum_{i=1}^{n} X_i Y_i = 2 + 8 + 3 = 13, \qquad \sum_{i=1}^{n} X_i^2 = 1 + 16 + 1 = 18.$$
Substituting these values into the equations for the regression coefficients above, we have:
$$\hat{\beta}_1 = \frac{3 \cdot 13 - 6 \cdot 7}{3 \cdot 18 - 6^2} = \frac{39 - 42}{54 - 36} = -\frac{3}{18} = -\frac{1}{6}$$
and
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = \frac{7}{3} - \left( -\frac{1}{6} \right) \cdot 2 = \frac{14 + 2}{6} = \frac{8}{3}.$$
Therefore, the regression line is a line with intercept equal to 8/3 and slope −1/6:
$$\hat{Y}_i = \frac{8}{3} - \frac{1}{6} X_i.$$
Does the regression line pass through the middle of the two points (1, 2) and (1, 3)? To
check, we can evaluate the fitted values of Y for each point in the dataset. For X = 1,
$$\hat{Y} = \frac{8}{3} - \frac{1}{6} \cdot 1 = \frac{16 - 1}{6} = \frac{15}{6} = 2.5.$$
So, yes, this line passes exactly through the middle of the points (1, 2) and (1, 3). It also passes
directly through the point (4, 2), since
$$\hat{Y} = \frac{8}{3} - \frac{1}{6} \cdot 4 = \frac{16 - 4}{6} = \frac{12}{6} = 2.$$
The fitted values and residuals for each observation are summarized in the table below, so the
SSR is $\sum_{i=1}^{n} e_i^2 = 0.25 + 0 + 0.25 = 0.5$.

   i    Ŷ_i = 8/3 − (1/6) X_i    e_i     e_i^2
   1            2.5             −0.5     0.25
   2            2.0                0        0
   3            2.5              0.5     0.25
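Using the ols_simple sketch from earlier, we can verify these calculations numerically:

```python
beta0, beta1 = ols_simple(x=[1, 4, 1], y=[2, 2, 3])
print(beta0, beta1)  # prints 2.666... and -0.1666..., i.e. 8/3 and -1/6
```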
Example 2. Using the dataset below, calculate β̂0 and β̂1 , plot the data points, and draw the
regression line.
   i    X_i   Y_i
   1     20     2
   2     10     1
   3     30     4
Example 3. Using the dataset below, calculate β̂1 and interpret your findings.
   i     X_i   Y_i   X_i Y_i   X_i^2
   1      2     3       6        4
   2      2     4       8        4
   3      2     5      10        4
 Total    6    12      24       12
$$\hat{\beta}_1 = \frac{3 \cdot 24 - 6 \cdot 12}{3 \cdot 12 - 6^2} = \frac{72 - 72}{36 - 36} = \frac{0}{0}.$$
This is undefined! So, what is the slope of the regression line? To see what is going wrong in this
example, plot the data points. The sample variance of X_i is exactly zero. In order to estimate the
regression coefficients, which represent Y_i as a function of X_i, we need to have some variation in
X_i. If we only observe a single value of X_i, we can never estimate the slope of the regression line,
which represents the effect of changes in X_i on Y_i.
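We can see the same breakdown numerically: applying the ols_simple sketch from earlier to this dataset raises a division-by-zero error, because the denominator of the slope formula is exactly the (scaled) sample variance of X.

```python
try:
    ols_simple(x=[2, 2, 2], y=[3, 4, 5])
except ZeroDivisionError:
    # The slope denominator n*sum(X^2) - (sum(X))^2 = 3*12 - 36 = 0,
    # so the slope estimate is not defined when X never varies.
    print("slope undefined: X has zero sample variance")
```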
Recall that in the estimated regression equation
$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i,$$
the coefficient on X_i represents the amount by which we predict Y_i will increase when X_i
increases by one unit.
[Figure 1: Anscombe's Quartet. Four scatter plots, (x1, y1) through (x4, y4), each drawn on identical axes.]
For example, suppose that Y_i is individual i's annual demand for housing in dollars and that
X_i is individual i's annual income, also measured in dollars. Then β̂1 is the number of additional
dollars individual i is predicted to spend on housing when income increases by one dollar. The
intercept, β̂0, is an individual's predicted expenditure on housing when income is zero.
Note that when β̂1 is zero, X_i does not influence Y_i. Many empirically relevant questions
focus on whether or not a particular coefficient is approximately zero. For example, what
is the effect of the number of young children in a household on the mother's decision about how
many hours to work outside the home? We could regress hours worked, Y_i, on the number of
children under age six, X_i, and look at the coefficient on X_i to determine whether there is any
effect.
Now that we have formally defined Ordinary Least Squares, a particular type of linear regression
founded on minimizing the sum of squared residuals, it is useful to take a step back and think
about what the results tell us, and to look at some cases where they can be very misleading. In
particular, there is a collection of four datasets referred to as Anscombe's Quartet, displayed in
Figure 1, that serves to highlight the importance of exploring the data carefully before blindly
running a regression (Anscombe 1973). Despite looking very different when plotted, these four
datasets share many identical statistical properties; in particular, the OLS coefficients, and
therefore the estimated regression lines, are identical. Plotting the data
first might suggest other functional forms (e.g., perhaps we should regress on Z = X²) or might
suggest that we should worry about outliers (e.g., perhaps we should be using a more advanced
technique such as LAD, which is beyond the scope of this course).
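As a quick illustration, here is a minimal Python sketch that fits OLS to each of the four datasets and prints the coefficients. It assumes seaborn is installed and uses seaborn's bundled copy of the quartet (fetched over the network on first use); all four fits come out to roughly β̂0 = 3.00 and β̂1 = 0.50 despite the very different scatter plots.

```python
import seaborn as sns

# Anscombe's Quartet: columns are 'dataset' ('I'..'IV'), 'x', and 'y'.
quartet = sns.load_dataset("anscombe")

for name, group in quartet.groupby("dataset"):
    n = len(group)
    sum_x, sum_y = group["x"].sum(), group["y"].sum()
    sum_xy = (group["x"] * group["y"]).sum()
    sum_x2 = (group["x"] ** 2).sum()
    # Same OLS formulas as in the notes.
    beta1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    beta0 = sum_y / n - beta1 * sum_x / n
    print(f"dataset {name}: beta0 = {beta0:.2f}, beta1 = {beta1:.2f}")
```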
References
Anscombe, F. J. (1973). Graphs in statistical analysis. The American Statistician 27, 17–21.