DSC 402
BUSINESS STATISTICS-II
UNIT-1
REGRESSION
Introduction:
The concept 'Regression' is a continuation of 'Correlation' which we
learnt in Business Statistics–I.
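The term was first used by Sir Francis Galton in his studies of heredity,
a branch of biometry. Today, however, the word regression as used in
Statistics has a much wider perspective, without any reference to biometry.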
Regression:
Regression analysis is a mathematical measure of the average
relationship between two or more variables in terms of the original
units of the data.
For instance, the yield of a crop depends on the rainfall, the cost or
price of a product depends on the production and advertising
expenditure, the demand for a particular product depends on its price,
expenditure of a person depends on his income, and so on.
Simple Regression:
Regression analysis confined to the study of only two variables at a
time is termed simple regression.
Quite often, however, the value of a particular phenomenon is affected
by a multiplicity of factors.
Multiple regression:
The regression analysis for studying more than two variables at a time
is known as multiple regression. However, in this chapter we shall
confine ourselves to simple regression only.
When the points of a scatter diagram cluster around some curve, that curve is
called the 'curve of regression'. Often such a curve is not distinct and is
quite confusing and sometimes complicated too. The mathematical
equation of the regression curve, usually called the regression
equation, enables us to study the average change in the value of the
dependent variable for any given value of the independent variable.
When the two variables have a linear relationship, the regression lines
can be used to find out the values of the dependent variable.
When we plot two variables (say X and Y) on a scatter diagram and draw
the two lines of best fit through the plotted points, these lines
are called regression lines. In linear regression, these lines are
straight. They are based on two equations, called regression equations,
which give the best estimate of one variable when the other is exactly
known or given.
LINES OF REGRESSION
Line of regression is the line which gives the best estimate of one
variable for any given value of the other variable. In case of two
variables x and y, we shall have two lines of regression; one of y on x
and the other of x on y.
Line of regression of y on x is the line which gives the best estimate for
the value of y for any specified value of x. Similarly, line of regression of
x on y is the line which gives the best estimate for the value of x for any
specified value of y.
The term best fit is interpreted in accordance with the Principle of Least
Squares, which consists in minimising the sum of the squares of the
residuals or errors of estimate, i.e., the deviations between the
observed values of the variable and the corresponding estimated values
given by the line of best fit. We may minimise the sum of the squares of
the errors parallel to the y-axis or parallel to the x-axis. The former
gives the equation of the line of regression of y on x; the latter gives
the equation of the line of regression of x on y.
We shall explain below the technique of deriving the equation of the
line of regression of y on x.
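A standard sketch of that derivation is given here, writing the line as y = a + bx. The symbols a, b and the index i are notation assumed for this sketch; they are not fixed by the text above.

```latex
\begin{align*}
S(a, b) &= \sum_{i=1}^{n} (y_i - a - b x_i)^2
  && \text{(sum of squared errors parallel to the y-axis)} \\
\frac{\partial S}{\partial a} = 0 &\;\Rightarrow\; \sum y_i = n a + b \sum x_i \\
\frac{\partial S}{\partial b} = 0 &\;\Rightarrow\; \sum x_i y_i = a \sum x_i + b \sum x_i^2 \\
b &= \frac{\sum x_i y_i - n \bar{x} \bar{y}}{\sum x_i^2 - n \bar{x}^2},
  \qquad a = \bar{y} - b \bar{x}
\end{align*}
```

The two middle equations are the normal equations. Dividing the first by n shows that the line passes through the point of means, so it can be written as y - ȳ = b(x - x̄). Interchanging the roles of x and y gives the line of regression of x on y in the same way.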
Regression Analysis
For bivariate data on 'x' and 'y', the regression equation obtained with
the assumption that 'x' is dependent on 'y' is called the regression
equation of x on y.
The regression equation of x on y is:
$(X - \bar{X}) = b_{xy} (Y - \bar{Y})$
The regression equation obtained with the assumption that 'y' is
dependent on 'x' is called the regression equation of y on x.
The regression equation of y on x is:
$(Y - \bar{Y}) = b_{yx} (X - \bar{X})$
Here, the constants $b_{xy}$ and $b_{yx}$ are the regression coefficients,
and $\bar{X}$ and $\bar{Y}$ are the means of X and Y. The regression
equation of x on y is used for the estimation of x values and the
regression equation of y on x is used for the estimation of y values.
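As a minimal sketch, the two regression equations can be evaluated as follows. The least-squares formulas used for the coefficients (sum of products of deviations divided by the sum of squared deviations) are the standard ones and are not stated in the text above; the function names and the small data set are hypothetical and serve only as illustration.

```python
def regression_lines(xs, ys):
    """Return (x_bar, y_bar, b_yx, b_xy) for paired observations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of products and squares of deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    b_yx = s_xy / s_xx  # regression coefficient of y on x
    b_xy = s_xy / s_yy  # regression coefficient of x on y
    return x_bar, y_bar, b_yx, b_xy


def estimate_y(x, x_bar, y_bar, b_yx):
    """Estimate y from the line (Y - Y_bar) = b_yx (X - X_bar)."""
    return y_bar + b_yx * (x - x_bar)


def estimate_x(y, x_bar, y_bar, b_xy):
    """Estimate x from the line (X - X_bar) = b_xy (Y - Y_bar)."""
    return x_bar + b_xy * (y - y_bar)


# Hypothetical illustrative data, not taken from the text
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
x_bar, y_bar, b_yx, b_xy = regression_lines(xs, ys)
print("b_yx =", round(b_yx, 4), " b_xy =", round(b_xy, 4))
print("estimated y at x = 6:", round(estimate_y(6, x_bar, y_bar, b_yx), 4))
print("estimated x at y = 5:", round(estimate_x(5, x_bar, y_bar, b_xy), 4))
```

Note that the two lines use different coefficients, so estimating x from the y-on-x line by rearranging it would generally give a different (and not least-squares) answer.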
83 = 24a + 108b
10 = -36b
b = -0.28 approx.
Put this in Eq. (1)
8a + (24) (-0.28) = 31
8a – 6.72 = 31
8a = 31 + 6.72 = 37.72
a = 4.715
Thus the regression equation of X on Y is
X = 4.715 - 0.28Y (Ans)
a) Now let us predict the value of Y if X = 20
The regression equation of Y on X is
Y = 4.1625 - 0.3X
= 4.1625 - 0.3 (20)
= -1.8375 (Ans)
b) Now let us predict X if Y = 5
X = 4.715 - 0.28 (5)
= 4.715 - 1.40
X = 3.315 (Ans)
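The elimination above can be checked with a short sketch that solves the two normal equations directly. The coefficients are read off the equations as printed (8a + 24b = 31 and 24a + 108b = 83); interpreting them as n = 8, ΣX = 31, ΣY = 24, ΣY² = 108 and ΣXY = 83 is an assumption, since the original data table is not shown. The function name is hypothetical.

```python
def solve_normal_equations(n, sum_x, sum_y, sum_yy, sum_xy):
    """Solve the normal equations for the line X = a + bY:
         n*a     + sum_y*b  = sum_x
         sum_y*a + sum_yy*b = sum_xy
    using Cramer's rule."""
    det = n * sum_yy - sum_y * sum_y
    a = (sum_x * sum_yy - sum_y * sum_xy) / det
    b = (n * sum_xy - sum_y * sum_x) / det
    return a, b


# Sums assumed from the equations printed in the worked example
a, b = solve_normal_equations(n=8, sum_x=31, sum_y=24, sum_yy=108, sum_xy=83)
print("a =", round(a, 4), " b =", round(b, 4))

# Estimate X when Y = 5 using the unrounded coefficients
x_at_5 = a + b * 5
print("X at Y = 5:", round(x_at_5, 4))
```

Without intermediate rounding, b is about -0.278 and a about 4.708, so the estimate of X at Y = 5 is about 3.319. The small differences from 4.715 and 3.315 arise because the worked solution rounds b to -0.28 before substituting.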
  X    Y    dx    dy   dx²   dy²   dx·dy
 28   26    -7    -4    49    16      28
 30   29    -5    -1    25     1       5
 32   30    -3     0     9     0       0
 35   25     0    -5     0    25       0
 36   18     1   -12     1   144     -12
 38   26     3    -4     9    16     -12
 39   35     4     5    16    25      20
 42   35     7     5    49    25      35
 45   46    10    16   100   256     160
350  290     0   -10   358   608     324   (totals)
where dx = X - 35 and dy = Y - 30.
$\bar{X} = \frac{\sum X}{n} = \frac{350}{10} = 35$
$\bar{Y} = \frac{\sum Y}{n} = \frac{290}{10} = 29$
Bxy =
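As a hedged sketch, both regression coefficients can be computed from the column totals of the table (n = 10, Σdx = 0, Σdy = -10, Σdx² = 358, Σdy² = 608, Σdxdy = 324). It assumes that the third and fourth columns are deviations from 35 and 30 respectively, which is consistent with the rows shown, and it uses the standard shortcut formula for deviations taken from assumed values; the variable names are hypothetical.

```python
# Column totals taken from the table
n = 10
sum_dx, sum_dy = 0, -10          # deviations from 35 and 30 (assumed)
sum_dx2, sum_dy2 = 358, 608
sum_dxdy = 324
mean_x, mean_y = 350 / n, 290 / n

# Shortcut formula: valid for deviations from any assumed values
numerator = n * sum_dxdy - sum_dx * sum_dy
b_xy = numerator / (n * sum_dy2 - sum_dy ** 2)  # coefficient of X on Y
b_yx = numerator / (n * sum_dx2 - sum_dx ** 2)  # coefficient of Y on X

print("b_xy =", round(b_xy, 4))
print("b_yx =", round(b_yx, 4))
print("means:", mean_x, mean_y)
```

With these totals b_xy is roughly 0.54 and b_yx roughly 0.91, so the lines are approximately X - 35 = 0.54 (Y - 29) and Y - 29 = 0.91 (X - 35).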
EXERCISE:
1. What is regression? What is the difference between regression and
correlation?
2. What are the properties and limitations of regression?
3.
4. Find the regression equation of y on x from the following data.
X 16 22 18 04 03 10 05 12
Y 87 88 89 68 78 80 75 83
Age (yrs) (X)         66  38  56  42  72  36  63  47  55  45
Blood pressure (Y)   145 124 147 125 160 118 149 128 150 124
4. Calculate coefficient of correlation and obtain the lines of regression for the
following data
Advertisement expenditure (in lakhs) X   10 12 15 23 20
5. Calculate coefficient of correlation and obtain the lines of regression for the
following data
Adv. expenditure (in lakhs)   60 62 65 70 73 75 71
13. Obtain the equation of the lines of regression for data given below
X 1 2 3 4 5 6 7 8 9
Y 9 8 10 12 11 13 14 16 15
Ans. Y = 7.25 + 0.95X; X = -6.4 + 0.95Y
14. The following results were worked out from scores in statistics and
mathematics in a certain examination:
Score in statistics Score in mathematics