Correlation Regression
Prepared By
Nitin Varshney
Assistant Professor
Agricultural Statistics
CoA, NAU, Waghai.
The study of the characteristics of a single variable,
such as height, weight, age, marks, or wages, is known as
univariate analysis.
The study of the relationship between two variables,
such as height and weight, is known as bivariate analysis.
CORRELATION
When we study two or more variables simultaneously, we
often observe that movements in one variable are accompanied
by movements in the other variable.
Examples:
Husband’s age and wife’s age move together.
Scores on an IQ test move with scores in university
examinations.
Relation between income and household expenditure.
Relation between price and demand of a commodity.
Meaning of Correlation
Types of Correlation
X 2 4 6 8 10 12 14 16
Y 3 6 9 12 15 18 21 24
X Y
2 2
4 6
6 8
8 12
10 18
12 24
14 36
16 44
18 54
20 67
22 75
24 89
Multiple and Partial Correlation
When many variables are interrelated, the value of one
variable may be influenced by several other variables.
For example, the yield of a crop per acre (X1) may depend
on the quality of seed (X2), fertility of soil (X3), fertilizer
used (X4), irrigation facilities (X5), weather conditions (X6)
and so on.
When we are interested in studying the joint effect of
a group of variables upon a single variable, the correlation
is known as multiple correlation.
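$$\mathrm{Cov}(X, Y) = \frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y})$$

$$\text{Variance: } \sigma_X^2 = E\{X - E(X)\}^2 = \frac{1}{n}\sum (x_i - \bar{x})^2$$

$$\text{Variance: } \sigma_Y^2 = E\{Y - E(Y)\}^2 = \frac{1}{n}\sum (y_i - \bar{y})^2$$

Expanding the variance of X:

$$\sigma_X^2 = \frac{1}{n}\sum (x_i - \bar{x})^2 = \frac{1}{n}\sum (x_i^2 + \bar{x}^2 - 2 x_i \bar{x}) = \frac{1}{n}\sum x_i^2 + \bar{x}^2 - 2\bar{x}\,\frac{1}{n}\sum x_i = \frac{1}{n}\sum x_i^2 - \bar{x}^2$$

Expanding the covariance:

$$\mathrm{Cov}(X, Y) = \frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y}) = \frac{1}{n}\sum (x_i y_i - x_i \bar{y} - \bar{x} y_i + \bar{x}\bar{y}) = \frac{1}{n}\sum x_i y_i - \bar{y}\,\frac{1}{n}\sum x_i - \bar{x}\,\frac{1}{n}\sum y_i + \bar{x}\bar{y} = \frac{1}{n}\sum x_i y_i - \bar{x}\bar{y}$$

Similarly,

$$\text{Variance: } \sigma_Y^2 = E\{Y - E(Y)\}^2 = \frac{1}{n}\sum (y_i - \bar{y})^2 = \frac{1}{n}\sum y_i^2 - \bar{y}^2$$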
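The shortcut forms of the variance and covariance can be checked numerically. The sketch below is only an illustration: the function names are made up for this example, and it uses the first X, Y table from "Types of Correlation" together with r = Cov(X, Y) / sqrt(V(X) V(Y)).

```python
import math

def mean(values):
    return sum(values) / len(values)

def variance(values):
    # Shortcut form: (1/n) * sum(v_i^2) - mean^2
    n = len(values)
    return sum(v * v for v in values) / n - mean(values) ** 2

def covariance(xs, ys):
    # Shortcut form: (1/n) * sum(x_i * y_i) - mean(x) * mean(y)
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) / n - mean(xs) * mean(ys)

def correlation(xs, ys):
    # r = Cov(X, Y) / sqrt(V(X) * V(Y))
    return covariance(xs, ys) / math.sqrt(variance(xs) * variance(ys))

# First X, Y table from the section (Y is a constant multiple of X)
x = [2, 4, 6, 8, 10, 12, 14, 16]
y = [3, 6, 9, 12, 15, 18, 21, 24]

print("Cov(X, Y) =", covariance(x, y))
print("V(X) =", variance(x), " V(Y) =", variance(y))
print("r =", round(correlation(x, y), 6))
```

Because every Y value here is 1.5 times the matching X value, r comes out as 1 (perfect positive correlation), up to floating-point rounding.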
PROPERTIES OF CORRELATION COEFFICIENT
The correlation coefficient r ranges from -1 to +1.
r is independent of change of origin and scale.
Two independent variables are uncorrelated.
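The second property can be illustrated numerically. In this sketch the shift and scale constants (a, h, b, k) are hypothetical values chosen only for the example, and both scale factors are assumed positive (if h and k had opposite signs, the sign of r would flip). The data are the second X, Y table from "Types of Correlation".

```python
import math

def correlation(xs, ys):
    # r = sum((x - mean_x)(y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
y = [2, 6, 8, 12, 18, 24, 36, 44, 54, 67, 75, 89]

# Hypothetical change of origin (a, b) and scale (h, k), with h, k > 0
a, h = 12, 2
b, k = 30, 5
u = [(xi - a) / h for xi in x]
v = [(yi - b) / k for yi in y]

r_xy = correlation(x, y)
r_uv = correlation(u, v)
print(round(r_xy, 6), round(r_uv, 6))  # the two values agree
```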
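Let x_i and y_i be the ranks of the i-th individual in the two characteristics, i = 1, 2, 3, …, n, and let d_i = x_i - y_i be the difference of the ranks. The rank correlation coefficient is

$$\rho = 1 - \frac{\sum_{i=1}^{n} d_i^2}{2 n \sigma_X^2} = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$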
Price of Tea 88 90 95 70 60 75 50
Price of Coffee 120 134 150 115 110 140 100
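The text lists the tea and coffee prices without stating the task. Assuming the rank correlation coefficient is wanted, a minimal sketch is given below. The helper names are made up for this example, rank 1 is given to the highest price, and the ranking helper assumes there are no tied values (true for this data).

```python
def ranks(values):
    # Rank 1 goes to the largest value; assumes all values are distinct
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    # rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), with d_i = rank(x_i) - rank(y_i)
    n = len(xs)
    d_sq = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_sq / (n * (n * n - 1))

tea = [88, 90, 95, 70, 60, 75, 50]
coffee = [120, 134, 150, 115, 110, 140, 100]

print("Tea ranks:   ", ranks(tea))
print("Coffee ranks:", ranks(coffee))
rho = spearman_rho(tea, coffee)
print("rho =", round(rho, 4))  # 0.8929
```

Here the rank differences are -1, -1, 0, 0, 0, 2, 0, so the sum of d_i^2 is 6 and rho = 1 - 36/336.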
REGRESSION
The term “regression” literally means “stepping back towards
the average”.
The term was introduced by Sir Francis Galton.
Dependent variable, OR regressed variable, OR explained variable.
Independent variable, OR regressor, OR predictor, OR explanatory variable.
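The line of regression of Y on X, Y = a + b_yx X, is fitted by the principle of least squares: a and b_yx are chosen so that

$$E = \sum_{i=1}^{n} (y_i - a - b_{yx} x_i)^2$$

is minimum.
Equating the partial derivatives of E with respect to a and b_yx to zero gives two normal
equations for estimating a and b_yx:

$$\sum_{i=1}^{n} y_i = n a + b_{yx} \sum_{i=1}^{n} x_i \qquad \text{(i)}$$

$$\sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i + b_{yx} \sum_{i=1}^{n} x_i^2 \qquad \text{(ii)}$$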
If we divide eqn. (i) by n, we get

$$\bar{y} = a + b_{yx} \bar{x}$$

Thus the line of regression of Y on X passes through the point
$(\bar{x}, \bar{y})$.
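So the regression coefficient (slope) of the line of regression of Y
on X is given by

$$b_{yx} = \frac{\mathrm{Cov}(x, y)}{V(x)}$$

For computation,

$$b_{yx} = \frac{\sum xy - \dfrac{(\sum x)(\sum y)}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}} \quad \text{and} \quad a = \bar{y} - b_{yx} \bar{x}$$

Similarly, for the line of regression of X on Y,

$$b_{xy} = \frac{\sum xy - \dfrac{(\sum x)(\sum y)}{n}}{\sum y^2 - \dfrac{(\sum y)^2}{n}} \quad \text{and} \quad a = \bar{x} - b_{xy} \bar{y}$$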
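The computational formulas for the slope and intercept of the line of Y on X can be sketched as follows. The function name is made up for this example, the data are the first X, Y table from "Types of Correlation", and the value X = 5 used for prediction is an arbitrary illustrative choice.

```python
def fit_y_on_x(xs, ys):
    # Slope:     b_yx = (sum(xy) - sum(x)*sum(y)/n) / (sum(x^2) - sum(x)^2/n)
    # Intercept: a    = mean(y) - b_yx * mean(x)
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b_yx = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b_yx * sum_x / n
    return a, b_yx

x = [2, 4, 6, 8, 10, 12, 14, 16]
y = [3, 6, 9, 12, 15, 18, 21, 24]

a, b_yx = fit_y_on_x(x, y)
print("a =", a, " b_yx =", b_yx)

# Estimate Y for a new X value (X = 5 is an arbitrary example)
predicted = a + b_yx * 5
print("Estimated Y at X = 5:", predicted)
```

For this data the fitted line is Y = 0 + 1.5 X, and the fitted values satisfy both normal equations.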
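Since b_yx is the slope of the line of regression of Y on X and since the
line passes through the point $(\bar{x}, \bar{y})$, its equation
is

$$Y - \bar{y} = b_{yx}(X - \bar{x}) = \frac{\mathrm{Cov}(X, Y)}{V(X)}(X - \bar{x}) = r\,\frac{\sigma_Y}{\sigma_X}(X - \bar{x})$$

Similarly, the line of regression of X on Y is

$$X - \bar{x} = b_{xy}(Y - \bar{y}) = \frac{\mathrm{Cov}(X, Y)}{V(Y)}(Y - \bar{y}) = r\,\frac{\sigma_X}{\sigma_Y}(Y - \bar{y})$$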
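$$r = \frac{\mathrm{Cov}(X, Y)}{\sqrt{V(X)\,V(Y)}} \;\Rightarrow\; \mathrm{Cov}(X, Y) = r\,\sigma_X \sigma_Y$$

$$b_{YX} = \frac{\mathrm{Cov}(X, Y)}{V(X)} \;\Rightarrow\; \mathrm{Cov}(X, Y) = b_{YX}\,\sigma_X^2$$

$$r\,\sigma_X \sigma_Y = b_{YX}\,\sigma_X^2$$

$$b_{YX} = r\,\frac{\sigma_Y}{\sigma_X}, \quad \text{similarly} \quad b_{XY} = r\,\frac{\sigma_X}{\sigma_Y}$$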
PROPERTIES OF REGRESSION COEFFICIENT
1. Fundamental Property: Correlation coefficient is the geometric
mean between the regression coefficients.
$$b_{XY} \cdot b_{YX} = r\,\frac{\sigma_X}{\sigma_Y} \cdot r\,\frac{\sigma_Y}{\sigma_X} = r^2$$

$$r = \pm\sqrt{b_{XY} \cdot b_{YX}}$$
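The fundamental property can be checked numerically. The sketch below is an illustration only: the function name is made up, and the tea and coffee prices from the rank correlation example are reused as sample data. The sign of r is taken to be the common sign of the two regression coefficients, since both share the sign of Cov(X, Y).

```python
import math

def regression_coefficients(xs, ys):
    # Returns (b_yx, b_xy, r) from the covariance and the two variances
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    b_yx = cov / var_x   # regression coefficient of Y on X
    b_xy = cov / var_y   # regression coefficient of X on Y
    r = cov / math.sqrt(var_x * var_y)
    return b_yx, b_xy, r

tea = [88, 90, 95, 70, 60, 75, 50]
coffee = [120, 134, 150, 115, 110, 140, 100]

b_yx, b_xy, r = regression_coefficients(tea, coffee)

# Geometric mean of the two coefficients, carrying their common sign
r_from_b = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
print(round(r, 6), round(r_from_b, 6))  # the two values agree
```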
Let θ be the angle between the two lines of regression.
If r = 0, tan θ = ∞ → θ = 90°. Thus if the two variables are uncorrelated,
the lines of regression are perpendicular to each other.
If r = ±1, tan θ = 0 → θ = 0° or 180°. Thus if the two variables are
perfectly correlated, the lines of regression coincide.
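The two cases above follow from the standard expression for the acute angle θ between the two lines of regression. That expression does not appear in the text here, so the block below is a reconstruction from the slopes r σ_Y/σ_X (Y on X) and σ_Y/(r σ_X) (X on Y), offered as a sketch rather than as the author's own derivation.

```latex
% Slopes of the two lines in the (X, Y) plane:
%   Y on X:  m_1 = r \sigma_Y / \sigma_X
%   X on Y:  m_2 = \sigma_Y / (r \sigma_X)
\tan\theta = \left| \frac{m_2 - m_1}{1 + m_1 m_2} \right|
           = \frac{1 - r^2}{|r|} \cdot \frac{\sigma_X \sigma_Y}{\sigma_X^2 + \sigma_Y^2}
% r = 0     : the right-hand side is unbounded, so \theta = 90^\circ
% r = \pm 1 : the factor (1 - r^2) vanishes, so \tan\theta = 0
```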
Q.3. From a paddy field, 15 plants were selected at random. The length of
panicle (cm) and the number of grains per panicle were recorded. Fit the
regression line for the given dataset and estimate the number of
grains per panicle if the panicle length is 25.2 cm.
Length of Panicle (cm) 22.4 23.3 24.1 24.3 23.5 23.1 21 20.6 26.4 25.4 23.4 21.4 23.6 24.5 22.5