Assignment 7 Regression
BSAF - HCC
Regression
Regression is a technique that relates one dependent variable to one or more independent (or
explanatory) variables. A regression model estimates an equation that helps predict the dependent
variable for specific values of the explanatory variables. The equation tracks the average change in the
dependent variable associated with changes in the explanatory variables. The relationship can be linear
or non-linear, and it can be estimated using techniques such as Ordinary Least Squares (OLS) and
Maximum Likelihood.
This assignment focuses on Ordinary Least Squares estimation of a simple regression line Y = a + bX + e
(where e is an error term) that tries to capture a linear relationship. Remember that Y = a + bX is the
equation of a straight line where ‘a’ is the intercept and ‘b’ is the slope (the rate of change in Y per unit
change in X). The parameter ‘b’ is also called the slope coefficient. The formulas used to compute ‘a’ and
‘b’ from sample data are called estimators, and the values they produce are called estimates.
The regression equation here is estimated using the Ordinary Least Squares technique, which chooses the
values of ‘a’ and ‘b’ that minimize the sum of squared residuals $\sum e^2$, where $e = Y - \hat{Y}$. Here Y
denotes the observed values and $\hat{Y}$ denotes the values estimated by the regression equation, called
the estimated or fitted values. The difference between the observed and estimated values of the
dependent variable, e, is called the residual or error.
We can use two techniques here. First, we can estimate the parameters ‘a’ and ‘b’ by solving the two
normal equations, which are derived by minimizing the sum of squared residuals $\sum e^2$ with respect to
‘a’ and ‘b’. Second, we can solve those equations algebraically to obtain a direct formula for the slope
coefficient b.
We can simultaneously solve the following two equations (called the normal equations):
$$\sum Y = na + b \sum X$$
$$\sum XY = a \sum X + b \sum X^2$$
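As a brief sketch of where these two equations come from (a standard calculus argument, added here for
illustration), write the quantity being minimized as S(a, b) and set its partial derivatives with respect
to a and b to zero:

```latex
\begin{align*}
S(a, b) &= \sum (Y - a - bX)^2 \\
\frac{\partial S}{\partial a} &= -2 \sum (Y - a - bX) = 0
  \;\Rightarrow\; \sum Y = na + b \sum X \\
\frac{\partial S}{\partial b} &= -2 \sum X (Y - a - bX) = 0
  \;\Rightarrow\; \sum XY = a \sum X + b \sum X^2
\end{align*}
```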
We can directly solve for b by using the following formula (which can be derived from the above equations):
$$b = \frac{n \sum XY - \sum X \sum Y}{n \sum X^2 - (\sum X)^2}$$
Or alternatively
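One standard equivalent form, shown here on the assumption that it is the alternative meant, writes the
slope in terms of deviations from the sample means $\bar{X}$ and $\bar{Y}$:

```latex
b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}
```

The two forms agree because dividing $n \sum XY - \sum X \sum Y$ by n gives $\sum XY - n\bar{X}\bar{Y}$, which
equals the numerator above, and dividing $n \sum X^2 - (\sum X)^2$ by n gives $\sum X^2 - n\bar{X}^2$, which
equals the denominator.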
Solved Example
Given below are the marks in a test in logical reasoning (denoted by Y) and the marks in mathematics in
the previous degree (denoted by X) for 10 students. Estimate the regression line of Y on X by OLS. Using
the estimated regression line, estimate the logical reasoning marks for a student who scored 80 marks in
mathematics.
X 66 85 75 67 92 80 85 73 98 65
Y 70 80 75 73 88 73 86 70 92 82
Solution
          X      Y      XY     X^2
         66     70    4620    4356
         85     80    6800    7225
         75     75    5625    5625
         67     73    4891    4489
         92     88    8096    8464
         80     73    5840    6400
         85     86    7310    7225
         73     70    5110    5329
         98     92    9016    9604
         65     82    5330    4225
Total   786    789   62638   62942
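The column totals can be reproduced with a few lines of Python. This is only a sketch for checking the
arithmetic, and the variable names are illustrative:

```python
# Marks from the solved example: X = mathematics, Y = logical reasoning.
X = [66, 85, 75, 67, 92, 80, 85, 73, 98, 65]
Y = [70, 80, 75, 73, 88, 73, 86, 70, 92, 82]

# Build the XY and X^2 columns row by row.
XY = [x * y for x, y in zip(X, Y)]
X_squared = [x * x for x in X]

# Column totals used in the formula for b.
n = len(X)
sum_x, sum_y = sum(X), sum(Y)
sum_xy, sum_x2 = sum(XY), sum(X_squared)

print(n, sum_x, sum_y, sum_xy, sum_x2)  # 10 786 789 62638 62942
```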
$$b = \frac{n \sum XY - \sum X \sum Y}{n \sum X^2 - (\sum X)^2} = \frac{10(62638) - (786)(789)}{10(62942) - (786)^2} = 0.535616$$
Interpretation: The value of b shows that if X increases by one unit, Y increases, on average, by 0.535616
units. The two move in the same direction because b is positive.
The intercept follows from $a = \bar{Y} - b\bar{X}$:
$$\bar{Y} = \frac{\sum Y}{n} = \frac{789}{10} = 78.9$$
$$\bar{X} = \frac{\sum X}{n} = \frac{786}{10} = 78.6$$
$$a = \bar{Y} - b\bar{X} = 78.9 - (0.535616)(78.6) = 36.80058$$
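A minimal Python sketch of the slope and intercept formulas, using the totals from the table. The function
names ols_slope and ols_intercept are illustrative and not part of the assignment:

```python
# Totals taken from the worked table in the solved example.
n = 10
sum_x, sum_y = 786, 789
sum_xy, sum_x2 = 62638, 62942

def ols_slope(n, sum_x, sum_y, sum_xy, sum_x2):
    """Slope b from the closed-form OLS formula."""
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

def ols_intercept(n, sum_x, sum_y, b):
    """Intercept a = Y-bar - b * X-bar."""
    return sum_y / n - b * (sum_x / n)

b = ols_slope(n, sum_x, sum_y, sum_xy, sum_x2)
a = ols_intercept(n, sum_x, sum_y, b)
print(round(b, 6), round(a, 5))  # 0.535616 36.80058
```

Substituting these values of a and b back into the two normal equations reproduces the totals 789 and
62638 (up to floating-point rounding), which is a quick way to check the arithmetic.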
So the regression line can be written as $\hat{Y} = 36.80058 + 0.535616X$
Substituting X = 80 gives $\hat{Y} = 36.80058 + 0.535616(80) = 79.65$. This means that we expect a student
with 80 marks in mathematics to score 79.65 in the logical reasoning test, on average.
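The prediction for X = 80 can be checked with a short Python sketch. The function name predict is
illustrative, and the coefficients are the rounded values reported in the solved example. The sketch also
computes the residual e = Y - Y-hat for the student in the data who actually scored X = 80 and Y = 73:

```python
# Rounded coefficients as reported in the solved example.
a, b = 36.80058, 0.535616

def predict(x, a=a, b=b):
    """Fitted value Y-hat = a + b * x."""
    return a + b * x

y_hat = predict(80)
print(round(y_hat, 2))  # 79.65

# Residual for the observed student with X = 80, Y = 73.
residual = 73 - y_hat
print(round(residual, 2))  # -6.65
```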
Alternative Formula
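A minimal sketch of one alternative calculation, assuming the alternative formula meant here is the
deviation form, in which b is the sum of (X - X-bar)(Y - Y-bar) divided by the sum of (X - X-bar)^2. The
variable names are illustrative:

```python
# Marks from the solved example: X = mathematics, Y = logical reasoning.
X = [66, 85, 75, 67, 92, 80, 85, 73, 98, 65]
Y = [70, 80, 75, 73, 88, 73, 86, 70, 92, 82]

n = len(X)
x_bar = sum(X) / n  # 78.6
y_bar = sum(Y) / n  # 78.9

# Sums of cross-deviations and squared deviations from the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
s_xx = sum((x - x_bar) ** 2 for x in X)

b = s_xy / s_xx          # slope from the deviation form
a = y_bar - b * x_bar    # intercept, as before
y_hat_80 = a + b * 80    # fitted value for X = 80

print(round(b, 6), round(a, 5), round(y_hat_80, 2))
```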
The alternative formula gives the same line, so we again expect a student with 80 marks in mathematics to
score 79.65 in the logical reasoning test, on average.
Questions to be solved
Estimate the regression line of Y on X (Y = a + bX) from the following data and interpret the value of ‘b’.
1.
2.
3.
4.
5.
6.