Lesson 11 Simple Regression PDF
REGRESSION 11
Definition of Regression
Regression literally means “return” or “go back”. In the 19th century, Francis Galton
first used the term in his paper “Regression towards Mediocrity in Hereditary Stature”,
a study of hereditary characteristics.
The modern use of regression is not limited to hereditary characteristics; it is
widely used to study the expected dependence of one variable on another.
Regression is therefore the method by which the best probable values of one variable are
calculated from the known values of the other variable.
Regression Lines
A regression line is the line that gives the best estimate of the dependent variable for any
given value of the independent variable. For two variables X and Y, there are
two regression lines: the regression of X on Y and the regression of Y on X.
Regression Line X on Y: Here Y is the independent variable and X is the dependent
variable, and the best expected value of X is calculated for a given value of Y.
Regression Line Y on X: Here Y is the dependent variable and X is the independent variable, and the best
expected value of Y is estimated for a given value of X.
An important reason for having two regression lines is that they are drawn on the least squares
assumption, which stipulates that the sum of the squares of the deviations of the points
from the line is a minimum. The deviations of the points from the line of best fit can be
measured in two ways: vertically, i.e. parallel to the Y-axis, and horizontally, i.e. parallel to the
X-axis.
The regression line of Y on X minimizes the sum of squares of the vertical deviations, and the
regression line of X on Y minimizes the sum of squares of the horizontal deviations. Minimizing
the two sums separately therefore requires two regression lines.
Single Line of Regression: When there is perfect positive or perfect negative correlation
between the two variables (r = ±1), the two regression lines coincide and form
a single regression line.
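The difference between the two lines, and the way they merge when r = ±1, can be checked numerically. The following is a minimal sketch, assuming small made-up data sets and a hypothetical helper named slopes; it is not taken from the lesson. It computes the slope of Y on X (byx) and the slope of X on Y (bxy) from deviations about the means. Drawn on the same axes, the X on Y line has slope 1/bxy, so the two lines coincide only when byx equals 1/bxy.

```python
def slopes(xs, ys):
    """Return (byx, bxy): regression coefficients of Y on X and of X on Y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / syy


# Illustrative data, made up for this sketch.
xs = [1, 2, 3, 4, 5]
ys_scattered = [2, 4, 5, 4, 5]   # imperfect correlation
ys_perfect = [3, 5, 7, 9, 11]    # exactly y = 2x + 1, so r = +1

byx_s, bxy_s = slopes(xs, ys_scattered)
byx_p, bxy_p = slopes(xs, ys_perfect)

# Slopes of both lines expressed on the same (x, y) axes.
print("scattered:", byx_s, 1 / bxy_s)   # two different slopes, two lines
print("perfect:  ", byx_p, 1 / bxy_p)   # equal slopes, a single line
```

Both lines pass through the point of the means, so equal slopes are enough for them to be the same line.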
In bivariate data, an independent variable is one that occurs naturally or is specially chosen in
order to investigate another (dependent) variable. Relationships involving the dependence of one
variable on the other are called causal relationships.
Graphical Method
It can be proved mathematically that the regression line must pass through the point representing
the arithmetic means of the two variables. The procedure for the graphical method is as follows:
(a) Calculate the means, x̄ and ȳ, of the two variables.
(b) Plot the point corresponding to this pair of values on the scatter diagram.
(c) Using a ruler, draw a straight line through the point you have just plotted and lying, as evenly
as you can judge, among the other points on the diagram.
If someone else were to do it, they might well get a line of a slightly different slope, but it would
still go through the point of the means.
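To see how much the choice of slope matters, the sketch below compares a few candidate lines that all pass through the point of the means. It assumes made-up data and a hypothetical helper named sum_sq_vertical; the candidate slopes 0.4 and 0.8 stand in for lines drawn by eye. For each line it totals the squared vertical deviations of the points, and the least squares slope of Y on X is included for comparison.

```python
# Illustrative data, made up for this sketch (not from the lesson).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Step (a): the means of the two variables.
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)


def sum_sq_vertical(slope):
    """Sum of squared vertical deviations from a line through (mean_x, mean_y)."""
    return sum((y - (mean_y + slope * (x - mean_x))) ** 2
               for x, y in zip(xs, ys))


# Least squares slope of Y on X, to compare with the hand-drawn guesses.
best_slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
              / sum((x - mean_x) ** 2 for x in xs))

for slope in (0.4, best_slope, 0.8):
    print(f"slope {slope:.2f}: sum of squared deviations = "
          f"{sum_sq_vertical(slope):.2f}")
```

Every candidate passes through the point of the means, but only the least squares slope gives the smallest total, which is why lines drawn by eye are approximations.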
$\Sigma (X - \bar{X})(Y - \bar{Y})$ = Sum of products of deviations taken from the actual means of X and Y.
$\Sigma (Y - \bar{Y})^2$ = Sum of squares of deviations from the actual mean of Y.
$\Sigma (X - \bar{X})^2$ = Sum of squares of deviations from the actual mean of X.
2. Short Cut Method:
$$b_{xy} = \frac{N \Sigma XY - (\Sigma X)(\Sigma Y)}{N \Sigma Y^2 - (\Sigma Y)^2} \qquad b_{yx} = \frac{N \Sigma XY - (\Sigma X)(\Sigma Y)}{N \Sigma X^2 - (\Sigma X)^2}$$
$\Sigma XY$ = Sum of products of observations of X and Y.
$\Sigma Y^2$ = Sum of squares of observations of Y.
$\Sigma X^2$ = Sum of squares of observations of X.
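The short cut method needs only raw sums of the observations: the number of pairs N, ΣX, ΣY, ΣXY, ΣX² and ΣY². The numerator N·ΣXY − ΣX·ΣY is shared by both coefficients; bxy divides it by N·ΣY² − (ΣY)², and byx divides it by N·ΣX² − (ΣX)². The following is a minimal sketch, assuming made-up data and a hypothetical function named short_cut_coefficients.

```python
def short_cut_coefficients(xs, ys):
    """Return (bxy, byx) computed from raw sums of the observations."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y       # shared by both coefficients
    bxy = numerator / (n * sum_y2 - sum_y ** 2)  # regression of X on Y
    byx = numerator / (n * sum_x2 - sum_x ** 2)  # regression of Y on X
    return bxy, byx


# Illustrative data, made up for this sketch.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

bxy, byx = short_cut_coefficients(xs, ys)
print("bxy =", bxy)
print("byx =", byx)
```

For this data N = 5, ΣX = 15, ΣY = 20, ΣXY = 66, ΣX² = 55 and ΣY² = 86, so the shared numerator is 30, bxy = 30/30 and byx = 30/50.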
1. Interpolation: the name given to estimation carried out within the range of values given for
the independent variable. Such estimates can be regarded with a degree of confidence.