Week 2 - Simple Linear Regression

The document defines the objectives of simple linear regression as estimating regression parameters and introducing OLS estimators. It then provides definitions of key terms like dependent and independent variables. The document derives the OLS estimates by minimizing the sum of squared residuals to obtain the normal equations and estimates beta_0 and beta_1. An example calculates the OLS estimates for a dataset on corn yield and fertilizer amount over 11 years.

Uploaded by

Brave Kamuzeri

Week 2: Lectures 3-4

1
Objectives:
To define simple linear regression

To introduce the OLS estimators

To estimate the regression parameters

To use an example to estimate the OLS parameters.

2
Definitions:
We are interested in explaining y in terms of x or how y varies with changes
in x.
In writing a model that explains this there are three issues:

First, since there is never an exact relationship, how do we allow for other factors to
affect y?

Second, what is the functional form between y and x?

Third, how can we be sure that we are capturing a ceteris paribus relationship between y
and x?
3
Simple Linear Regression
Some of these issues can be addressed by writing the equation y_t = β0 + β1 x_t + μ_t,
which is known as the simple linear regression (SLR) model, the two-variable regression
model, or the bivariate regression model.

4
Simple Linear Regression …..
The variables y and x have several different names that are used interchangeably:

-The y variable is called the dependent variable, the explained variable, the response
variable, the predicted variable, and the regressand
-The x variable is called the independent variable, the explanatory variable, the
control variable, the predictor variable, and the regressor
-The variable, μ, represents all factors other than x that affect y.
It is known as the error term, the disturbance term, the stochastic term, and the
random term

5
Linear Regression….
• Therefore, if the change in μ is zero, the change in y equals β1 times the change in x: Δy = β1Δx when Δμ = 0.
• β0 represents the intercept and β1 represents the slope parameter in the SLR.

• Using the zero conditional mean assumption E(μ|x) = E(μ) = 0, and noting that β0 and β1 are constants so that E(β0) = β0 and E(β1) = β1, we can obtain what is
known as the ‘population regression function’ (PRF):
• E(y|x) = E(β0 + β1x + μ | x)
• E(y|x) = E(β0 + β1x | x) + E(μ|x)
• E(y|x) = E(β0) + E(β1x | x) + E(μ|x)
• Therefore: E(y|x) = β0 + β1x.
• The PRF is linear in x: a one-unit increase in x changes the expected value of y by the slope, β1.
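The PRF can be illustrated with a small simulation. This is only a sketch: the population values β0 = 2 and β1 = 0.5, the normally distributed error, and the function name `conditional_mean` are all assumptions made for illustration and do not come from the lecture. The code averages many simulated y values at a fixed x and compares the average with β0 + β1x.

```python
import random

def conditional_mean(beta0, beta1, x, sigma=1.0, draws=20000, seed=0):
    """Average many simulated y values at a fixed x, where E(mu | x) = 0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        mu = rng.gauss(0.0, sigma)       # error term with mean zero
        total += beta0 + beta1 * x + mu  # y = beta0 + beta1*x + mu
    return total / draws

beta0, beta1 = 2.0, 0.5                  # hypothetical population parameters
m10 = conditional_mean(beta0, beta1, x=10)
m11 = conditional_mean(beta0, beta1, x=11)

print(m10, beta0 + beta1 * 10)           # simulated mean of y at x = 10 vs. PRF value
print(m11 - m10, beta1)                  # effect of a one-unit increase in x vs. slope
```

Both calls reuse the same seed, so the simulated errors are identical at x = 10 and x = 11. The difference between the two averages therefore isolates the slope, which mirrors the case where the change in μ is zero.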

6
Deriving the OLS estimates
The line of best fit drawn through the sample is called the sample regression function (SRF):

$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$

• For any $x_i$ value, the difference between the actual value of $Y_i$ and the value given
by the sample regression function is called the residual, $\hat{\mu}_i$, where:
• $\hat{\mu}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$

7
Deriving the OLS estimates…..
For a correctly specified model, the residual is the sample estimate of the error
term.

Like the error term (μ), it represents the part of the value of the variable y that
the estimated linear model is unable to explain.

The OLS method states that we should choose the SRF line that has the smallest
residuals, that is, the line that minimises the amount of variation in y that cannot be explained by
the model.

8
Deriving the OLS estimates…..
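To measure the unexplained variation, we could sum the residuals.
However, the sum would equal 0 because the positive and negative residuals cancel out.
To avoid this we sum their squared values:

$\sum_{i=1}^{n} \hat{\mu}_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$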
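To see why the residuals are squared before summing, the sketch below uses the corn yield and fertilizer figures from this lecture's example. The helper names `residuals` and `ssr` are illustrative only. A horizontal line through the mean of Y clearly fits poorly, yet its residuals still sum to zero; the sum of squared residuals is what separates it from the line reported in the lecture.

```python
fertilizer = [6, 10, 12, 14, 16, 18, 22, 24, 26, 32, 38]   # X, from the example
corn_yield = [40, 44, 46, 48, 52, 58, 60, 68, 74, 80, 85]  # Y, from the example

def residuals(b0, b1, xs, ys):
    """Residual for each observation: actual y minus the value on the line."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def ssr(b0, b1, xs, ys):
    """Sum of squared residuals for the line with intercept b0 and slope b1."""
    return sum(e ** 2 for e in residuals(b0, b1, xs, ys))

y_bar = sum(corn_yield) / len(corn_yield)

# Candidate 1: a flat line at the mean of Y (slope zero).
flat = residuals(y_bar, 0.0, fertilizer, corn_yield)
ssr_flat = ssr(y_bar, 0.0, fertilizer, corn_yield)

# Candidate 2: the line reported in the lecture, with rounded coefficients.
ssr_lecture = ssr(28.6447, 1.559, fertilizer, corn_yield)

print(sum(flat))      # essentially zero, even though the flat line fits badly
print(ssr_flat)       # large
print(ssr_lecture)    # much smaller
```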

9
Deriving the OLS estimates…..
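The OLS procedure minimises the sum of squared residuals with respect to the two unknowns $\hat{\beta}_0$ and
$\hat{\beta}_1$.

Using calculus, the necessary conditions for a local extremum are:

$\frac{\partial \sum_{i=1}^{n} \hat{\mu}_i^2}{\partial \hat{\beta}_0} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$

$\frac{\partial \sum_{i=1}^{n} \hat{\mu}_i^2}{\partial \hat{\beta}_1} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) x_i = 0$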
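The two partial derivatives can be checked numerically. The sketch below, using the corn and fertilizer figures from this lecture's example, compares the analytic expressions with central finite differences at an arbitrary trial line. The trial values (10, 2), the step size, and the function names are assumptions chosen for illustration.

```python
fertilizer = [6, 10, 12, 14, 16, 18, 22, 24, 26, 32, 38]   # X, from the example
corn_yield = [40, 44, 46, 48, 52, 58, 60, 68, 74, 80, 85]  # Y, from the example

def ssr(b0, b1, xs, ys):
    """Sum of squared residuals for the line b0 + b1*x."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))

def ssr_gradient(b0, b1, xs, ys):
    """Analytic partial derivatives of the SSR with respect to b0 and b1."""
    d_b0 = -2 * sum(y - b0 - b1 * x for x, y in zip(xs, ys))
    d_b1 = -2 * sum((y - b0 - b1 * x) * x for x, y in zip(xs, ys))
    return d_b0, d_b1

def numeric_gradient(b0, b1, xs, ys, h=1e-4):
    """Central finite-difference approximation of the same derivatives."""
    d_b0 = (ssr(b0 + h, b1, xs, ys) - ssr(b0 - h, b1, xs, ys)) / (2 * h)
    d_b1 = (ssr(b0, b1 + h, xs, ys) - ssr(b0, b1 - h, xs, ys)) / (2 * h)
    return d_b0, d_b1

b0_trial, b1_trial = 10.0, 2.0   # arbitrary trial line, not the OLS estimates
analytic = ssr_gradient(b0_trial, b1_trial, fertilizer, corn_yield)
numeric = numeric_gradient(b0_trial, b1_trial, fertilizer, corn_yield)

print(analytic)
print(numeric)
```

At the OLS estimates both derivatives equal zero. At the trial values they do not, which signals that the trial line can still be improved.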

10
Deriving the OLS estimates…..
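Expanding these conditions gives the “normal equations”:

$\sum_{i=1}^{n} y_i - n\hat{\beta}_0 - \hat{\beta}_1 \sum_{i=1}^{n} x_i = 0$

$\sum_{i=1}^{n} x_i y_i - \hat{\beta}_0 \sum_{i=1}^{n} x_i - \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = 0$

which are then solved for $\hat{\beta}_0$ and $\hat{\beta}_1$.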
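The two normal equations form a linear system in the two unknowns. As a sketch, the system can be solved directly with Cramer's rule; the data are the corn and fertilizer figures from this lecture's example, and the variable names are illustrative.

```python
fertilizer = [6, 10, 12, 14, 16, 18, 22, 24, 26, 32, 38]   # X, from the example
corn_yield = [40, 44, 46, 48, 52, 58, 60, 68, 74, 80, 85]  # Y, from the example

n = len(fertilizer)
sum_x = sum(fertilizer)
sum_y = sum(corn_yield)
sum_xy = sum(x * y for x, y in zip(fertilizer, corn_yield))
sum_xx = sum(x * x for x in fertilizer)

# Normal equations rearranged as a 2x2 linear system:
#   n     * b0 + sum_x  * b1 = sum_y
#   sum_x * b0 + sum_xx * b1 = sum_xy
det = n * sum_xx - sum_x ** 2                  # determinant of the coefficient matrix
b0 = (sum_y * sum_xx - sum_x * sum_xy) / det   # Cramer's rule for the intercept
b1 = (n * sum_xy - sum_x * sum_y) / det        # Cramer's rule for the slope

print(b0, b1)
```

The determinant is zero only when every x value is identical, in which case no unique line can be fitted.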

11
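Deriving the OLS estimates…..
$\hat{\beta}_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$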
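The raw-sums form and the deviations-from-means form of the slope are algebraically identical. A short sketch with exact rational arithmetic (Python's `fractions.Fraction`) confirms this on the corn and fertilizer figures from this lecture's example without any rounding; the variable names are illustrative.

```python
from fractions import Fraction

fertilizer = [6, 10, 12, 14, 16, 18, 22, 24, 26, 32, 38]   # X, from the example
corn_yield = [40, 44, 46, 48, 52, 58, 60, 68, 74, 80, 85]  # Y, from the example

n = len(fertilizer)
x_bar = Fraction(sum(fertilizer), n)
y_bar = Fraction(sum(corn_yield), n)

# Form 1: raw sums.
numerator = n * sum(x * y for x, y in zip(fertilizer, corn_yield)) \
    - sum(fertilizer) * sum(corn_yield)
denominator = n * sum(x * x for x in fertilizer) - sum(fertilizer) ** 2
slope_raw = Fraction(numerator, denominator)

# Form 2: deviations from the sample means.
slope_dev = sum((x - x_bar) * (y - y_bar) for x, y in zip(fertilizer, corn_yield)) \
    / sum((x - x_bar) ** 2 for x in fertilizer)

intercept = y_bar - slope_raw * x_bar

print(slope_raw, float(slope_raw))   # exact fraction and its decimal value
print(slope_dev == slope_raw)        # the two forms agree exactly
print(intercept, float(intercept))
```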

12
Examples: Corn produced with fertilizer used
Year n Yi Xi
2001 1 40 6
2002 2 44 10
2003 3 46 12
2004 4 48 14
2005 5 52 16
2006 6 58 18
2007 7 60 22
2008 8 68 24
2009 9 74 26
2010 10 80 32
2011 11 85 38

13
Examples: Corn produced with fertilizer used
• The table gives the corn yield in kilograms per hectare, Y, resulting from the use of
various amounts of fertilizer in kg per hectare, X, on a farm over the 11 years
from 2001 to 2011.

• These are plotted in a scatter diagram.

• The relationship between X and Y is approximately linear (i.e., the points fall
on or near a straight line).

14
Examples: Corn produced with fertilizer used: Scatter plot
[Figure: scatter plot of corn yield, Y (vertical axis, 0 to 90) against fertilizer, X (horizontal axis, 0 to 40)]

15
The Ordinary Least Squares Methods
OLS is a technique for fitting the “best” straight line to the sample of XY
observations.

It involves minimizing the sum of squared (vertical) deviations of points from the
line:

$\min \sum (Y_i - \hat{Y}_i)^2$

where $Y_i$ refers to the actual observations, and $\hat{Y}_i$ refers to the corresponding
fitted values.
Remember: $Y_i - \hat{Y}_i = \hat{\mu}_i$, the residual.
16
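The Ordinary Least Squares Methods…
• $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$

• It is often useful to use an equivalent formula for estimating $\hat{\beta}_1$:

• $\hat{\beta}_1 = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{cov(X,Y)}{\sigma_X^2}$

• where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$.
• The estimated least squares (OLS) regression equation is then:
• $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$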
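The covariance-over-variance form of the slope gives the same answer whether the covariance and variance are divided by n or by n - 1, because the divisor cancels in the ratio. The following sketch checks this on the corn and fertilizer figures from this lecture's example; the helper functions and the `ddof` parameter name are assumptions made for illustration.

```python
fertilizer = [6, 10, 12, 14, 16, 18, 22, 24, 26, 32, 38]   # X, from the example
corn_yield = [40, 44, 46, 48, 52, 58, 60, 68, 74, 80, 85]  # Y, from the example

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys, ddof):
    """Covariance with divisor n - ddof (ddof=0: population, ddof=1: sample)."""
    x_bar, y_bar = mean(xs), mean(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / (len(xs) - ddof)

def variance(xs, ddof):
    """Variance is the covariance of a variable with itself."""
    return covariance(xs, xs, ddof)

slope_sample = covariance(fertilizer, corn_yield, 1) / variance(fertilizer, 1)
slope_population = covariance(fertilizer, corn_yield, 0) / variance(fertilizer, 0)

print(slope_sample, slope_population)   # identical up to floating-point rounding
```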

17
The calculations to estimate the regression for the corn-fertilizer problem
Year      n    Yi        Xi        yi         xi         xi·yi      xi²
2001      1    40        6         -19.5455   -13.8182   270.0826   190.9421
2002      2    44        10        -15.5455   -9.81818   152.6281   96.39669
2003      3    46        12        -13.5455   -7.81818   105.9008   61.12397
2004      4    48        14        -11.5455   -5.81818   67.17355   33.85124
2005      5    52        16        -7.54545   -3.81818   28.80992   14.57851
2006      6    58        18        -1.54545   -1.81818   2.809917   3.305785
2007      7    60        22        0.454545   2.181818   0.991736   4.760331
2008      8    68        24        8.454545   4.181818   35.35537   17.4876
2009      9    74        26        14.45455   6.181818   89.35537   38.21488
2010      10   80        32        20.45455   12.18182   249.1736   148.3967
2011      11   85        38        25.45455   18.18182   462.8099   330.5785
Sum       11   655       218       0.00       0.00       1465.09    939.64
Mean           59.54545  19.81818

Note: yi and xi are deviations of Yi and Xi from their sample means, so each of these columns sums to zero.
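The working columns of the table can be reproduced with a few lines of code. This is a sketch in which the row layout and variable names are illustrative; the data are the Yi and Xi columns of the table.

```python
fertilizer = [6, 10, 12, 14, 16, 18, 22, 24, 26, 32, 38]   # Xi
corn_yield = [40, 44, 46, 48, 52, 58, 60, 68, 74, 80, 85]  # Yi

n = len(fertilizer)
x_bar = sum(fertilizer) / n
y_bar = sum(corn_yield) / n

rows = []
for year, (x, y) in enumerate(zip(fertilizer, corn_yield), start=2001):
    dx = x - x_bar                      # xi: deviation of Xi from its mean
    dy = y - y_bar                      # yi: deviation of Yi from its mean
    rows.append((year, y, x, dy, dx, dx * dy, dx * dx))

sum_dy = sum(r[3] for r in rows)
sum_dx = sum(r[4] for r in rows)
sum_xy = sum(r[5] for r in rows)
sum_xx = sum(r[6] for r in rows)

for r in rows:
    print(f"{r[0]} {r[1]:>4} {r[2]:>4} {r[3]:>10.4f} {r[4]:>10.4f} {r[5]:>10.4f} {r[6]:>10.4f}")
print(f"Sum  {sum(corn_yield):>4} {sum(fertilizer):>4} {sum_dy:>10.2f} {sum_dx:>10.2f} {sum_xy:>10.2f} {sum_xx:>10.2f}")
print(f"Mean {y_bar:.5f} {x_bar:.5f}")
```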

18
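Calculations
$\hat{\beta}_1 = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{1465.09}{939.64} = 1.559$ (slope of the estimated regression line)

$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \cong 59.55 - 1.559(19.82) \cong 28.6447$ (Y intercept; the value 28.6447 is obtained with the unrounded means and slope)


The estimated regression equation:

$\hat{Y}_i = 28.6447 + 1.559 X_i$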

19
Calculations….
• Thus, when $X_i = 0$, $\hat{Y}_i = 28.6447 = \hat{\beta}_0$

• When $X_i = 19.82 = \bar{X}$, $\hat{Y}_i = 28.6447 + 1.559(19.82) \cong 59.5454 = \bar{Y}$

• As a result, the regression line passes through the point $(\bar{X}, \bar{Y})$
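The property that the fitted line passes through the point of means can be verified directly. In the sketch below, `fit_ols` and `predict` are illustrative helper names, and the data are the corn and fertilizer figures from this lecture's example. The last line repeats the check with the rounded coefficients, which is why that result matches the mean of Y only approximately.

```python
fertilizer = [6, 10, 12, 14, 16, 18, 22, 24, 26, 32, 38]   # X, from the example
corn_yield = [40, 44, 46, 48, 52, 58, 60, 68, 74, 80, 85]  # Y, from the example

def fit_ols(xs, ys):
    """Return (intercept, slope) using the deviations-from-means formula."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
        / sum((x - x_bar) ** 2 for x in xs)
    return y_bar - slope * x_bar, slope

def predict(b0, b1, x):
    """Fitted value on the estimated regression line."""
    return b0 + b1 * x

b0, b1 = fit_ols(fertilizer, corn_yield)
x_bar = sum(fertilizer) / len(fertilizer)
y_bar = sum(corn_yield) / len(corn_yield)

print(predict(b0, b1, 0.0))               # equals the intercept
print(predict(b0, b1, x_bar), y_bar)      # fitted value at the mean of X vs. mean of Y
print(predict(28.6447, 1.559, 19.82))     # same check with rounded coefficients
```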

20
“Best line-of-fit”
[Figure: fitted regression line; the axis labels in the source are Corn Yield, Y (ticks 0 to 40) and Fertilizer, X (ticks 0 to 90)]

21
Summary
Simple regression is used for testing hypotheses about the relationship between a
dependent variable Y and an independent or explanatory variable X, and for
prediction.

Linear regression analysis assumes that there is an approximately linear relationship
between X and Y.

• The set of random sample values of X and Y falls on or near a straight line.

The error term (disturbance or stochastic term) measures the deviation of each
observed Y value from the true (but unobserved) regression line.

22
Summary….
The error term arises because of:

• Numerous explanatory variables with only slight and irregular effects on Y that are
omitted from the exact linear relationship given by the equation

• Possible errors of measurement in Y

• Random human behavior

23
Review questions
• The data in the following table report aggregate consumption (in millions of N$)
and disposable income (in millions of N$) for the Namibian economy for the 20
years from 2002 to 2021. Populate the data for 2013 to 2021 by making use of a
trend equation.
1. Draw a scatter diagram for the data and determine by inspection whether there exists
an approximate linear relationship between Y and X.
2. State the general relationship between consumption and disposable income in
a) exact linear form,
b) stochastic form.
c) Why would you expect most observed values of Y not to fall exactly on a straight
line?

24
Review Questions….data
Year   n    Xi    Yi
2002   1    114   102
2003   2    118   106
2004   3    126   108
2005   4    130   110
2006   5    136   122
2007   6    140   124
2008   7    148   128
2009   8    156   130
2010   9    160   142
2011   10   164   148
2012   11   170   150
2013   12   178   154
2014   13   188   153
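One possible way to populate the missing years is a linear time trend. The lecture does not specify the form of the trend equation, so the sketch below rests on assumptions: it fits Xi = a + b·n by least squares to the rows listed above and extrapolates to n = 14 through 20 (2015 to 2021), the years after the last listed row. The function name `fit_line` is illustrative, and the same approach could be applied to the Yi column.

```python
# Xi column from the review data, for n = 1..13 (2002 to 2014)
x_values = [114, 118, 126, 130, 136, 140, 148, 156, 160, 164, 170, 178, 188]
t = list(range(1, len(x_values) + 1))   # time index n

def fit_line(ts, vs):
    """Least-squares intercept and slope of vs regressed on ts."""
    t_bar = sum(ts) / len(ts)
    v_bar = sum(vs) / len(vs)
    slope = sum((a - t_bar) * (v - v_bar) for a, v in zip(ts, vs)) \
        / sum((a - t_bar) ** 2 for a in ts)
    return v_bar - slope * t_bar, slope

a, b = fit_line(t, x_values)

# Assumed linear trend extrapolated to n = 14..20, i.e. 2015..2021.
projected = [(2001 + k, a + b * k) for k in range(14, 21)]

for year, value in projected:
    print(year, round(value, 1))
```

A linear trend is only one choice; if the series grows at a roughly constant rate, fitting the trend to the logarithm of the series may be more suitable.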
