Correlation
Correlation – Module 4
Introduction:
Measures of central tendency and dispersion are helpful for comparing and analysing
distributions involving only one variable, i.e. univariate distributions.
However, describing the relationship between two or more variables is another important part of
statistics. When for every value of a variable X we know a corresponding value of a second
variable Y (i.e. the data are in the form of paired measurements), we are interested in the
relationship between these two variables. In many business research situations, the key to decision
making lies in understanding the relationships between two or more variables. For example, in an
effort to predict the behavior of the bond market, a broker might find it useful to know whether
the interest rate of bonds is related to the prime interest rate. While studying the effect of
advertising on sales, an account executive may find it useful to know whether there is a strong
relationship between advertising cost and sales for a company.
Meaning:
Correlation is a statistical method that determines the degree of relationship between two
different variables. It is also known as a “bivariate” statistic, with bi- meaning two and variate
indicating variable or variance. The methods used to determine whether a relationship exists
between two variables, and to express that relationship numerically, come under
correlation analysis. Correlation analysis was developed by Francis Galton & Karl Pearson. Here
we should consider only a logical relationship.
“Correlation means that between two series or groups of data there exists some causal
connection.” By W. I. King
“The whole subject of correlation refers to that inter-relation between separate characters by
which they tend, in some degree at least, to move together.” By E. Davenport
Quantitative Techniques - II
Utility of Correlation
The study of correlation is very useful in practical life, as the following points show.
1. With the help of correlation analysis, we can measure in one figure the degree of relationship
existing between variables like price, demand, supply, income, expenditure etc. Once we know
that two variables are correlated, we can estimate the value of one variable, given the
value of the other.
2. Correlation analysis is of great use to economists and businessmen. It reveals to
economists the disturbing factors and suggests to them the stabilizing forces. In business, it enables
the executive to estimate costs, sales etc. and plan accordingly.
3. Correlation analysis is helpful to scientists, since nature has been found to be a multiplicity of
inter-related forces.
[Chart: types of correlation. Positive & Negative; Partial & Total]
Positive or direct correlation refers to the movement of variables in the same direction. The
correlation is said to be positive when an increase (decrease) in the value of one variable is
accompanied by an increase (decrease) in the value of the other variable. Negative or inverse
correlation refers to the movement of the variables in opposite directions. Correlation is said to be
negative if an increase (decrease) in the value of one variable is accompanied by a decrease
(increase) in the value of the other.
Under simple correlation, we study the relationship between two variables only, e.g. between the
yield of wheat and the amount of rainfall, or between demand and supply of a commodity. In case
of multiple correlation, the relationship is studied among three or more variables. For example,
the relationship of yield of wheat may be studied with both chemical fertilizers and
pesticides.
There are two categories of multiple correlation analysis. Under partial correlation, the
relationship of two or more variables is studied in such a way that only one dependent variable
and one independent variable are considered and all others are kept constant. For example, the
coefficient of correlation between yield of wheat and chemical fertilizers, excluding the effects of
pesticides and manures, is called partial correlation. Total correlation is based upon all the
variables.
When the amount of change in one variable tends to keep a constant ratio to the amount of
change in the other variable, the correlation is said to be linear. But if the amount of
change in one variable does not bear a constant ratio to the amount of change in the other
variable, the correlation is said to be non-linear. The distinction between linear and non-linear
correlation is based upon the consistency of the ratio of change between the variables.
There are different methods which help us to find out whether the variables are related or not.
1. Scatter Diagram: A scatter diagram is drawn to visualise the relationship between two
variables. The values of the more important variable are plotted on the X-axis while the values of
the other variable are plotted on the Y-axis. Each pair of values is plotted as a dot on the
graph; when all the pairs have been plotted, we get a scatter diagram. The way the dots
scatter indicates the kind of relationship which exists between the two variables. While
drawing a scatter diagram, it is not necessary to start the axes at the zero values of X and Y
(the point of origin); the minimum values of the variables considered may be taken instead.
When there is a positive correlation between the variables, the dots on the scatter
diagram run from the lower left-hand corner to the upper right-hand corner. In case of
perfect positive correlation, all the dots lie on a straight line.
When a negative correlation exists between the variables, the dots on the scatter
diagram run from the upper left-hand corner to the lower right-hand corner. In
case of perfect negative correlation, all the dots lie on a straight line.
If a scatter diagram is drawn and the dots form no path, there is no correlation.
Merits
1. It is a very simple method of studying correlation. It is easy to draw, understand and
interpret a scatter diagram.
2. It is not affected by the values of extreme items.
Limitations
1. It does not give the precise degree of relationship between the variables. It only gives an
idea about the degree of correlation.
2. It is not amenable to mathematical treatment.
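Covariance Method:

$$r = \frac{\sum d_x d_y}{N \sigma_x \sigma_y}$$

Where
dx = (X − X̄)
dy = (Y − Ȳ)
σx = Standard Deviation of Series X.
σy = Standard Deviation of Series Y.
r = The Correlation Coefficient.
N = Number of Pairs of Observations.

Short-Cut Method:

$$r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \cdot \sum d_y^2}}$$

Where dx = (X − X̄) and dy = (Y − Ȳ) as above.

In terms of values x and y measured from any origin (the raw observations or deviations from an
assumed mean), with n pairs of observations:

$$r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{\left[n \sum x^2 - (\sum x)^2\right]\left[n \sum y^2 - (\sum y)^2\right]}}$$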
Probable Error:

$$\text{P.E.} = 0.6745 \, \frac{1 - r^2}{\sqrt{N}}$$

Where r is the correlation coefficient of the random sample and N is the number of pairs of
observations in the sample.
Interpretation of ‘r’ Based on Probable Error
Interpretation of r with the help of probable error is done as follows:
If r is less than the probable error, there is no correlation, and
if r is greater than six times the probable error, there is decided evidence of correlation and the
value of r is significant.
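The rule above can be sketched in a few lines of Python. This is only an illustration: it assumes the probable error formula P.E. = 0.6745 (1 − r²)/√N that the notes use in Q16, it compares the magnitude of r with the probable error so that negative coefficients are handled, and the label "inconclusive" for the in-between case is a name chosen here, not a term from the notes. The function names are hypothetical.

```python
import math

def probable_error(r, n):
    """Probable error of r for n pairs: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

def interpret(r, n):
    """Apply the two rules from the notes to the magnitude of r (assumption)."""
    pe = probable_error(r, n)
    if abs(r) < pe:
        return "no correlation"
    if abs(r) > 6 * pe:
        return "significant"
    return "inconclusive"  # label chosen for this sketch, not from the notes

# Values taken from Q17 in these notes: r = -0.732 with N = 12 pairs.
pe = probable_error(-0.732, 12)
verdict = interpret(-0.732, 12)
print(round(pe, 2), verdict)
```

With the Q17 values the probable error comes to about 0.09, so six times the probable error (0.54) is below the magnitude of r and the coefficient is treated as significant.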
“Rank correlation” is the study of relationships between different rankings on the same set of
items. It deals with measuring correspondence between two rankings, and assessing the
significance of this correspondence. Spearman’s rank correlation coefficient is defined as:

$$r = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$

Where r denotes the rank coefficient of correlation,
D = the difference between the ranks of paired items in the two series, and
N = the number of pairs of observations.
Features
The rank method has principal uses:
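Sol:
  X     dx = X − X̄     Y     dy = Y − Ȳ     dx²     dy²     dxdy
 11         3         20         9           9      81       27
 10         2         18         7           4      49       14
  9         1         12         1           1       1        1
  8         0          8        -3           0       9        0
  7        -1         10        -1           1       1        1
  6        -2          5        -6           4      36       12
  5        -3          4        -7           9      49       21
ΣX = 56  Σdx = 0   ΣY = 77   Σdy = 0   Σdx² = 28  Σdy² = 226  Σdxdy = 76

X̄ = ΣX/N = 56/7 = 8
Ȳ = ΣY/N = 77/7 = 11

$$r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \cdot \sum d_y^2}} = \frac{76}{\sqrt{28 \times 226}} = \frac{76}{\sqrt{6328}} = \frac{76}{79.55} = 0.96$$

Alternative Method:

$$r = \frac{\sum d_x d_y}{N \sigma_x \sigma_y} = \frac{76}{7 \times 2 \times 5.68} = \frac{76}{79.52} = 0.96$$

Standard deviations of the X and Y series:

$$\sigma_x = \sqrt{\frac{\sum d_x^2}{N}} = \sqrt{\frac{28}{7}} = \sqrt{4} = 2$$

$$\sigma_y = \sqrt{\frac{\sum d_y^2}{N}} = \sqrt{\frac{226}{7}} = \sqrt{32.28} = 5.68$$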
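The worked solution above can be checked with a short Python sketch. It follows the same steps (deviations from the actual means, then the ratio of the sum of products to the square root of the product of the sums of squares); the function name `pearson_r` is a hypothetical helper, not something defined in the notes.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's r using deviations taken from the actual means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]              # dx = X - mean of X
    dy = [y - mean_y for y in ys]              # dy = Y - mean of Y
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)

# Data from the worked solution above.
X = [11, 10, 9, 8, 7, 6, 5]
Y = [20, 18, 12, 8, 10, 5, 4]
r = pearson_r(X, Y)
print(round(r, 2))
```

For these seven pairs the sketch reproduces the hand result of 0.96 after rounding to two decimals.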
Exercise Sums:
Q1) Find out Coefficient of Correlation between X & Y series:
X    17  18  19  19  20  20  21  21  22  23
Y    12  16  14  11  15  19  22  16  15  20
(Ans: X̄ = 20, Ȳ = 16, r = 0.61)
Q2) Calculate Coefficient of Correlation between the marks obtained by 10 students in
Accountancy & Statistics:
Student        1   2   3   4   5   6   7   8   9  10
Accountancy   45  70  65  30  90  40  50  75  85  60
Statistics    35  90  70  40  95  40  60  80  80  50
(Ans: X̄ = 61, Ȳ = 64, r = 0.903)
Q3) Find out Coefficient of Correlation between X & Y series:
X 58 50 53 60 63 55 60 59 61 51
Y 115 110 121 120 124 112 118 115 118 117
(Ans: X̄ = 57, Ȳ = 117, r = 0.56)
Q4) Calculate Karl Pearson’s Coefficient of Correlation by taking the actual means 52 & 44
respectively:
X 44 46 46 48 52 54 54 56 60 60
Y 36 40 42 40 42 44 46 48 50 52
(r = 0.95)
Q5) Calculate Karl Pearson’s Correlation Coefficient from the data given below:
                        X       Y
Mean                    31      61
Standard Deviation      3.25    3.35
Sum of products of deviations of X & Y from their respective means = 75
Number of pairs of X & Y = 10
(Ans: r = 0.69)
Q6) Determine σx, σy & the coefficient of correlation between X & Y series:
                                          X Series    Y Series
No of Items                                  15          15
Arithmetic Mean                              25          18
Sum of Squares of Deviation from mean       136         138
Sum of products of deviations of X & Y series from mean = 122
(Ans: r = 0.89)
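Short-Cut Method (Assumed Mean):

$$r = \frac{n \sum d_x d_y - (\sum d_x)(\sum d_y)}{\sqrt{\left[n \sum d_x^2 - (\sum d_x)^2\right]\left[n \sum d_y^2 - (\sum d_y)^2\right]}}$$

Where:
dx = (X − A), with A the assumed mean of the X series
dy = (Y − A), with A the assumed mean of the Y series
n = number of pairs of observations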
Q7) Calculate Coefficient of Correlation between the values of X & Y given below:
X 78 89 97 69 54 79 60 65
Y 125 137 156 112 107 136 120 110
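Sol: Assumed means A = 69 for X and A = 120 for Y.
  X     dx = X − 69     dx²      Y     dy = Y − 120     dy²     dxdy
 78          9           81     125          5           25       45
 89         20          400     137         17          289      340
 97         28          784     156         36         1296     1008
 69          0            0     112         -8           64        0
 54        -15          225     107        -13          169      195
 79         10          100     136         16          256      160
 60         -9           81     120          0            0        0
 65         -4           16     110        -10          100       40
Total       39         1687                 43         2199     1788

$$r = \frac{n \sum d_x d_y - (\sum d_x)(\sum d_y)}{\sqrt{\left[n \sum d_x^2 - (\sum d_x)^2\right]\left[n \sum d_y^2 - (\sum d_y)^2\right]}}$$

$$r = \frac{8 \times 1788 - 39 \times 43}{\sqrt{\left[8 \times 1687 - 39^2\right]\left[8 \times 2199 - 43^2\right]}}$$

$$r = \frac{14304 - 1677}{\sqrt{\left[13496 - 1521\right]\left[17592 - 1849\right]}} = \frac{12627}{\sqrt{11975 \times 15743}}$$

$$r = \frac{12627}{109.43 \times 125.47} = \frac{12627}{13730.17} = 0.92$$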
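A minimal Python sketch of the assumed-mean calculation in Q7 follows. The function name and its parameters `a` and `b` (the two assumed means) are hypothetical; the formula is the short-cut formula used in the solution. The second call, with both assumed means set to zero, illustrates that the choice of assumed mean does not change r.

```python
import math

def pearson_r_assumed_mean(xs, ys, a, b):
    """Short-cut method: deviations are taken from assumed means a and b."""
    n = len(xs)
    dx = [x - a for x in xs]
    dy = [y - b for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dx2 = sum(d * d for d in dx)
    sum_dy2 = sum(d * d for d in dy)
    sum_dxdy = sum(p * q for p, q in zip(dx, dy))
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = math.sqrt((n * sum_dx2 - sum_dx ** 2) * (n * sum_dy2 - sum_dy ** 2))
    return numerator / denominator

# Data from Q7, with the assumed means used in the solution.
X = [78, 89, 97, 69, 54, 79, 60, 65]
Y = [125, 137, 156, 112, 107, 136, 120, 110]
r = pearson_r_assumed_mean(X, Y, a=69, b=120)
r_other = pearson_r_assumed_mean(X, Y, a=0, b=0)  # a different origin gives the same r
print(round(r, 2))
```

The result rounds to 0.92, matching the hand calculation.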
Exercise Sums:
Q8) Calculate Karl Pearson’s Correlation Coefficient between X & Y:
X 58 43 41 39 43 46 43 45 41 47 45 44
Y 11 27 31 42 30 28 28 20 19 20 32 30
(Ans: r = - 0.733)
Q9) Calculate the coefficient of correlation between X & Y variables for the data given below:
X    17  18  19  19  20  20  21  21  22  23
Y    12  16  14  11  15  19  22  16  15  20
(Ans: r = 0.614)
Q10) Calculate Karl Pearson’s Coefficient of Correlation between the ages of husbands (X) &
wives (Y):
Given: ΣXY = 3040, ΣX = −170, ΣX² = 8,288, ΣY = −20, ΣY² = 2,264, N = 10 (Ans: 0.78)
Q11) Calculate the number of items for which r = 0.8, Σxy = 200, standard deviation of Y = 5
& Σx² = 100, where x & y denote deviations of items from the actual means (so that Σx = Σy = 0).
(Ans: N = 25)
Q12) If the covariance between x & y is 9.6 & the variances of x & y are respectively 16 &
9, find the coefficient of correlation. (Hint: Covariance = Σxy/N = 9.6, σ = √variance; Ans: r = 0.8)
Q13) The following table gives the distribution of the total population & those who are blind
among them. Find out if there is any relation between age & blindness:
Age                 0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80
Population (’000)    150     90     50     32     20     12      5      2
Blind                 90     63     50     40     30     30     20     12
Sol: Let X be the mid-value of each age group (assumed mean A = 35) and Y the number of blind
per 1,00,000 of population (assumed mean A = 150).
Age     Mid-value (X)   Y = blind per 1,00,000            dx = (X − 35)/10   dy = (Y − 150)/5   dx²    dy²     dxdy
0-10          5         90 × 1,00,000/1,50,000 = 60             -3                 -18            9     324      54
10-20        15         63 × 1,00,000/90,000 = 70               -2                 -16            4     256      32
20-30        25         50 × 1,00,000/50,000 = 100              -1                 -10            1     100      10
30-40        35         40 × 1,00,000/32,000 = 125               0                  -5            0      25       0
40-50        45         30 × 1,00,000/20,000 = 150               1                   0            1       0       0
50-60        55         30 × 1,00,000/12,000 = 250               2                  20            4     400      40
60-70        65         20 × 1,00,000/5,000 = 400                3                  50            9    2500     150
70-80        75         12 × 1,00,000/2,000 = 600                4                  90           16    8100     360
Total   ΣX = 320        ΣY = 1755                           Σdx = 4           Σdy = 111     Σdx² = 44  Σdy² = 11,705  Σdxdy = 646
(Ans: r = 0.9038)
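The step-deviation working in Q13 can be reproduced with the sketch below. It assumes, as the solution does, that X is the mid-value of each age group and Y is the number of blind per 1,00,000 of population; the variable names are chosen for this illustration only. Dividing the deviations by 10 and 5 only rescales them, so the coefficient is unaffected.

```python
import math

mid_age = [5, 15, 25, 35, 45, 55, 65, 75]
population_thousands = [150, 90, 50, 32, 20, 12, 5, 2]
blind = [90, 63, 50, 40, 30, 30, 20, 12]

# Y = number of blind per 1,00,000 (one lakh) of population in each age group.
rate = [b * 100000 / (p * 1000) for b, p in zip(blind, population_thousands)]

# Step deviations: assumed means 35 and 150, common factors 10 and 5.
dx = [(x - 35) / 10 for x in mid_age]
dy = [(y - 150) / 5 for y in rate]

n = len(dx)
sum_dxdy = sum(p * q for p, q in zip(dx, dy))
sum_dx2 = sum(d * d for d in dx)
sum_dy2 = sum(d * d for d in dy)
numerator = n * sum_dxdy - sum(dx) * sum(dy)
denominator = math.sqrt((n * sum_dx2 - sum(dx) ** 2) * (n * sum_dy2 - sum(dy) ** 2))
r = numerator / denominator
print(round(r, 2))
```

The column totals computed here (Σdx = 4, Σdy = 111, Σdx² = 44, Σdy² = 11,705, Σdxdy = 646) agree with the table, and r comes out at about 0.90.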
Q14) From the following data find out if there is any relationship between density of population
& death rate:
District Area (in sq km) Population No of Deaths
A 120 24,000 288
B 150 75,000 1,125
C 80 48,000 768
D 50 40,000 720
E 250 50,000 650
(Hint: X = Population/Area, Y = No of Deaths/Population*1,000) (Ans r = 0.99)
Q15) Calculate Karl Pearson’s Coefficient of Correlation, using 38 as the assumed mean for
Commodity A & 75 for Commodity B. (Ans: r = 0.827)
Months        Jan  Feb  March  April  May  June  July  Aug  Sep  Oct
Commodity A    35   36     40     38   37    39    41   40   36   38
Commodity B    65   72     78     77   76    77    80   79   76   75
Q16) If Probable Error = 0.05 & N = 16, find the Coefficient of Correlation & point out its
significance.
Sol:

$$\text{P.E.} = 0.6745 \, \frac{1 - r^2}{\sqrt{N}}$$

$$0.05 = 0.6745 \, \frac{1 - r^2}{\sqrt{16}} = 0.6745 \, \frac{1 - r^2}{4}$$

$$1 - r^2 = \frac{0.05 \times 4}{0.6745} = 0.2965$$

$$r^2 = 1 - 0.2965 = 0.7035$$

$$r = \sqrt{0.7035} = 0.84$$

6 P.E. = 6 × 0.05 = 0.3
Since r is more than 6 times the probable error, the value of r is significant.
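The algebra in Q16 amounts to inverting the probable error formula. A small Python sketch, under the assumption that the positive root is wanted (the probable error alone cannot tell the sign of r), is given below; the function name is hypothetical.

```python
import math

def r_from_probable_error(pe, n):
    """Solve P.E. = 0.6745 * (1 - r^2) / sqrt(n) for r, taking the positive root."""
    r_squared = 1 - pe * math.sqrt(n) / 0.6745
    return math.sqrt(r_squared)

# Values from Q16: P.E. = 0.05, N = 16.
r = r_from_probable_error(0.05, 16)
is_significant = r > 6 * 0.05  # rule from the notes: r greater than six times P.E.
print(round(r, 2), is_significant)
```

This gives r of about 0.84, which exceeds 6 × 0.05 = 0.3, in line with the conclusion of the solution.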
Q17) The following table gives the values of X (soil temperature at 4 inches below ground, in
degrees F) & Y (germination interval in days) for winter wheat at 12 places:
X    57  42  40  38  42  45  42  44  40  46  44  43
Y    10  26  30  41  29  27  27  19  18  19  31  29
Calculate the correlation coefficient between soil temperature & germination interval & interpret
the results.
Sol: Assumed means A = 44 for X and A = 26 for Y.
  X     dx = X − 44     dx²      Y     dy = Y − 26     dy²     dxdy
 57         13          169     10        -16          256     -208
 42         -2            4     26          0            0        0
 40         -4           16     30         +4           16      -16
 38         -6           36     41        +15          225      -90
 42         -2            4     29         +3            9       -6
 45         +1            1     27         +1            1       +1
 42         -2            4     27         +1            1       -2
 44          0            0     19         -7           49        0
 40         -4           16     18         -8           64      +32
 46         +2            4     19         -7           49      -14
 44          0            0     31         +5           25        0
 43         -1            1     29         +3            9       -3
ΣX = 523  Σdx = -5   Σdx² = 255  ΣY = 306  Σdy = -6   Σdy² = 704  Σdxdy = -306

$$r = \frac{n \sum d_x d_y - (\sum d_x)(\sum d_y)}{\sqrt{\left[n \sum d_x^2 - (\sum d_x)^2\right]\left[n \sum d_y^2 - (\sum d_y)^2\right]}}$$

$$r = \frac{12 \times (-306) - (-5)(-6)}{\sqrt{\left[12 \times 255 - (-5)^2\right]\left[12 \times 704 - (-6)^2\right]}}$$

$$r = \frac{-3672 - 30}{\sqrt{\left[3060 - 25\right]\left[8448 - 36\right]}} = \frac{-3702}{\sqrt{3035 \times 8412}}$$

$$r = \frac{-3702}{55.09 \times 91.71} = \frac{-3702}{5052.38} = -0.732$$

P.E. = 0.09
6 P.E. = 6 × 0.09 = 0.54
Since the magnitude of r is more than 6 times the P.E., the negative correlation is significant.
Q18) Calculate Karl Pearson’s coefficient of correlation between the ages of husbands &
wives & comment on the result:
X    20  25  30  35  40  45  50  55  60  65  70
Y    17  24  28  32  35  38  42  51  56  60  62
(Ans: r = 0.99, P.E. = 0.004, result is significant)
Q19) From the data given below calculate the coefficient of correlation & interpret it:
                                          X     Y
Number of items                           8     8
Mean                                     68    69
Sum of squares of deviation from mean    36    44
Sum of the products of deviations = 24
(Ans: r = 0.603, P.E. = 0.15, not significant)
SPEARMAN’S RANKING METHOD:
Q20) Calculate the Rank Correlation coefficient from the following data:
Rank X    5  3  4  8  2  1  7  10  6  9
Rank Y    3  7  5  9  2  4  1  10  8  6
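Sol:
Rank X    Rank Y    D = Rx − Ry    D²
   5         3           2          4
   3         7          -4         16
   4         5          -1          1
   8         9          -1          1
   2         2           0          0
   1         4          -3          9
   7         1           6         36
  10        10           0          0
   6         8          -2          4
   9         6           3          9
N = 10               ΣD = 0     ΣD² = 80

$$r = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 80}{10 \times 99} = 1 - \frac{480}{990} = 0.515$$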
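As a check on the table above, Spearman's formula r = 1 − 6ΣD²/(N(N² − 1)) can be applied to the two rank series directly. The sketch below assumes the ranks contain no ties, which holds for Q20; the function name is hypothetical.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman's rank correlation for two rank series without ties."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Ranks from Q20.
rank_x = [5, 3, 4, 8, 2, 1, 7, 10, 6, 9]
rank_y = [3, 7, 5, 9, 2, 4, 1, 10, 8, 6]
rs = spearman_from_ranks(rank_x, rank_y)
print(round(rs, 3))
```

With ΣD² = 80 and N = 10 this gives 1 − 480/990, or about 0.515.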
Q21) Find out the coefficient of correlation between X & Y by the method of Rank differences:
X    22  24  27  35  21  20  27  25  27  23
Y    30  38  40  50  38  25  38  36  41  32
Sol:
  X     Y     Rx    Ry    D = Rx − Ry    D²
 22    30      8     9        -1          1
 24    38      6     5        +1          1
 27    40      3     3         0          0
 35    50      1     1         0          0
 21    38      9     5        +4         16
 20    25     10    10         0          0
 27    38      3     5        -2          4
 25    36      5     7        -2          4
 27    41      3     2        +1          1
 23    32      7     8        -1          1
                                     ΣD² = 28
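The ranks in the table above give tied values the average of the positions they occupy (for example, the three values of 27 in X share rank 3). The sketch below reproduces that ranking in Python. The notes stop at ΣD² = 28, so the final step here rests on an assumption: it uses the common textbook adjustment for ties, which adds (m³ − m)/12 to ΣD² for every group of m tied values, and that adjustment is not stated in these notes. The function names are hypothetical.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 = largest value; tied values share the mean of their positions."""
    positions = {}
    for pos, v in enumerate(sorted(values, reverse=True), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of (m^3 - m)/12 over groups of m tied values (assumed adjustment)."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

# Data from Q21.
X = [22, 24, 27, 35, 21, 20, 27, 25, 27, 23]
Y = [30, 38, 40, 50, 38, 25, 38, 36, 41, 32]
rx, ry = average_ranks(X), average_ranks(Y)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))

n = len(X)
adjusted = sum_d2 + tie_correction(X) + tie_correction(Y)
rs = 1 - 6 * adjusted / (n * (n ** 2 - 1))
print(sum_d2, round(rs, 3))
```

The ranks and ΣD² = 28 match the table. Under the assumed tie adjustment (two groups of three tied values, adding 2 each) the coefficient works out to about 0.806.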
Q24) The value of Spearman’s Rank Correlation Coefficient for a certain number of pairs of
observations was found to be 2/3. The sum of squares of the differences between corresponding
ranks was 55. Find the number of pairs. (Ans: N = 10)
Q25) Find out the coefficient of correlation between X & Y by the method of Rank differences:
X 78 89 97 69 59 79 68 57
Y 125 137 156 112 107 136 123 108 (Ans: rs = 0.95)
Q26) (a) If r = 0.4 Cov (x,y) = 10 & σy = 5, then find the value of σx. (Ans: σx = 5)
(b) Find the correlation coefficient between x & y when variance of x is 2.25, variance of y is 1
& covariance of x & y is 0.9. (Ans: r = 0.6)
(c) The correlation coefficient & covariance of two variables X & Y are 0.28 & 7.6 respectively.
If the variance of X be 9, find the standard deviation of Y. (Ans: σy = 9)
(d) Find R (Spearman’s Rank Correlation Coefficient) when Σd² = 30 & N = 10. (Ans: r = 0.82)
(e) Find the correlation coefficient if σx² = 6.25, σy² = 4 & Cov (x,y) = 0.9. (Ans: r = 0.18)