
Correlation


Quantitative Techniques - II

Correlation – Module 4

Introduction:

Measures of central tendency and dispersion are helpful for comparing and analysing
distributions involving only one variable, i.e., univariate distributions.
However, describing the relationship between two or more variables is another important part of
statistics. When for every value of a variable X we know a corresponding value of a second
variable Y (i.e., the data are in the form of paired measurements), we are interested in the
relationship between these two variables. In many business research situations, the key to decision
making lies in understanding the relationships between two or more variables. For example, in an
effort to predict the behavior of the bond market, a broker might find it useful to know whether
the interest rate of bonds is related to the prime interest rate. While studying the effect of
advertising on sales, an account executive may find it useful to know whether there is a strong
relationship between advertising cost and sales for a company.

Meaning:

Correlation is a statistical method that determines the degree of relationship between two
different variables. It is also known as a "bivariate" statistic, with bi- meaning two and variate
indicating variable or variance. The methods employed to determine whether any relationship
exists between two variables, and to express this relationship numerically, come under
correlation analysis. Correlation analysis was developed by Francis Galton & Karl Pearson. Only
logically meaningful relationships should be considered.
"Correlation means that between two series or groups of data there exists some causal
connection." By W. I. King
"The whole subject of correlation refers to that inter-relation between separate characters by
which they tend, in some degree at least, to move together." By E. Davenport

Utility of Correlation

The study of correlation is very useful in practical life as revealed by these points.

1. With the help of correlation analysis, we can measure in one figure the degree of relationship
existing between variables like price, demand, supply, income, expenditure etc. Once we know
that two variables are correlated, we can estimate the value of one variable given the
value of the other.
2. Correlation analysis is of great use to economists and businessmen. It reveals to
economists the disturbing factors and suggests the stabilizing forces. In business, it enables
the executive to estimate costs, sales etc. and plan accordingly.
3. Correlation analysis is helpful to scientists. Nature has been found to be a multiplicity of
inter-related forces.

Types of Correlation

1. Positive and Negative
2. Simple and Multiple
3. Partial and Total
4. Linear and Non-Linear

Positive and Negative Correlation

Positive or direct correlation refers to the movement of variables in the same direction. The
correlation is said to be positive when an increase (decrease) in the value of one variable is
accompanied by an increase (decrease) in the value of the other variable. Negative or inverse
correlation refers to the movement of the variables in opposite directions. Correlation is said to be
negative if an increase (decrease) in the value of one variable is accompanied by a decrease
(increase) in the value of the other.

Simple and Multiple Correlation

Under simple correlation, we study the relationship between two variables only, e.g., between the
yield of wheat and the amount of rainfall, or between demand and supply of a commodity. In case
of multiple correlation, the relationship is studied among three or more variables. For example,
the relationship of yield of wheat may be studied with both chemical fertilizers and
pesticides.

Partial and Total Correlation

When more than two variables are involved, the analysis may be partial or total. Under partial
correlation, the relationship is studied between one dependent variable and one independent
variable only, while the influence of all the other variables is kept constant. For example, the
coefficient of correlation between yield of wheat and chemical fertilizers, excluding the effects of
pesticides and manures, is called partial correlation. Total correlation is based upon all the
variables.

Linear and Non-Linear Correlation



When the amount of change in one variable tends to keep a constant ratio to the amount of
change in the other variable, the correlation is said to be linear. If the amount of
change in one variable does not bear a constant ratio to the amount of change in the other
variable, the correlation is said to be non-linear. The distinction between linear and
non-linear is based upon the consistency of the ratio of change between the variables.

Methods of Studying Correlation

There are different methods which help us to find out whether the variables are related or not.

1. Scatter Diagram Method.
2. Karl Pearson's Coefficient of Correlation.
3. Spearman's Rank Correlation Method.

1. Scatter Diagram: A scatter diagram is drawn to visualise the relationship between two
variables. The values of the more important variable are plotted on the X-axis while the values of
the other variable are plotted on the Y-axis. Each pair of values is plotted as a dot on the
graph; when dots have been plotted for all the pairs, we get a scatter
diagram. The way the dots scatter gives an indication of the kind of relationship which exists
between the two variables. While drawing a scatter diagram, it is not necessary to take zero
as the point of origin for the X and Y variables; the minimum values of the variables
considered may be taken instead.
• When there is a positive correlation between the variables, the dots on the scatter
diagram run from the lower left-hand corner to the upper right-hand corner. In case of
perfect positive correlation, all the dots lie on a rising straight line.
• When a negative correlation exists between the variables, the dots on the scatter
diagram run from the upper left-hand corner to the lower right-hand corner. In
case of perfect negative correlation, all the dots lie on a falling straight line.
• If a scatter diagram is drawn and the dots form no discernible path, there is no correlation.

Merits
1. It is a very simple method of studying correlation. It is easy to draw, understand and
interpret a scatter diagram.
2. It is not affected by the values of extreme items.
Limitations
1. It does not give the precise degree of relationship between the variables. It only gives an
idea about the degree of correlation.
2. It is not amenable to mathematical treatment.

2. Karl Pearson's Coefficient of Correlation: Karl Pearson's method, popularly known as the
Pearsonian coefficient of correlation, is most widely applied in practice to measure correlation.
The Pearsonian coefficient of correlation is represented by the symbol "r".

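Covariance Method:
$$ r = \frac{\sum d_x d_y}{N \, \sigma_x \sigma_y} $$
Where:
$d_x = (X - \bar{X})$, the deviation of X from its mean
$d_y = (Y - \bar{Y})$, the deviation of Y from its mean
$\sigma_x$ = Standard Deviation of Series X.
$\sigma_y$ = Standard Deviation of Series Y.
r = The Correlation Coefficient.
N = Number of Pairs of Observations.

Short-Cut Method:
$$ r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \cdot \sum d_y^2}} $$
$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2]\,[n\sum y^2 - (\sum y)^2]}} $$
Where n = N. In the second formula x and y may be the original values or deviations from any
assumed mean. When they are deviations from the actual means, $\sum x = \sum y = 0$ and the
second formula reduces to the first.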
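The three forms of the formula can be checked against one another with a short Python sketch. The function names are illustrative and not part of the module; the data are those of Q1 in the worked sums of this module. All three forms should return the same value of r.

```python
import math

def r_deviation(xs, ys):
    """r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2)), deviations from actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    return sum_dxdy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def r_covariance(xs, ys):
    """r = sum(dx*dy) / (N * sigma_x * sigma_y), using population standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sigma_x = math.sqrt(sum(a * a for a in dx) / n)
    sigma_y = math.sqrt(sum(b * b for b in dy) / n)
    return sum(a * b for a, b in zip(dx, dy)) / (n * sigma_x * sigma_y)

def r_raw(xs, ys):
    """r from raw sums: n*sum(xy) - sum(x)*sum(y) over the matching square root."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    return (n * sum_xy - sum_x * sum_y) / math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

xs = [11, 10, 9, 8, 7, 6, 5]   # X series of Q1
ys = [20, 18, 12, 8, 10, 5, 4]  # Y series of Q1

for method in (r_deviation, r_covariance, r_raw):
    print(method.__name__, round(method(xs, ys), 2))  # each prints 0.96
```

The raw-sums form avoids computing the means first, which is why it is convenient for hand calculation when the means are fractional.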

Properties Of Correlation Coefficient (r)


1. Correlation coefficient is a pure number and written without any units of measurement.

2. The value of the coefficient of correlation lies between -1 and +1. Symbolically, -1 ≤ r ≤ +1.
3. The correlation coefficient is independent of change of origin and scale of the variables X and Y. If
all the values in the X series and/or Y series are increased (or decreased) by a constant, and/or
multiplied (or divided) by a positive constant, the correlation coefficient (r) will not be affected.
(Multiplying one series by a negative constant reverses the sign of r but not its magnitude.)
4. The correlation coefficient is the geometric mean of the two regression coefficients, i.e.,
$$ r = \pm\sqrt{b_{xy} \cdot b_{yx}} $$
where r takes the common sign of the two regression coefficients.
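Properties 3 and 4 can be illustrated numerically. The following is a minimal sketch on a small made-up data set (the numbers and function names are assumptions for illustration only): it checks that shifting and positively rescaling the series leaves r unchanged, that a negative scale factor flips its sign, and that r equals the signed geometric mean of the two regression coefficients.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    return sum_dxdy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy): regression coefficient of Y on X and of X on Y."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    return sum_dxdy / sum(a * a for a in dx), sum_dxdy / sum(b * b for b in dy)

xs = [1, 2, 3, 4, 5]   # illustrative data, not from the module
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
# Change of origin and (positive) scale on both series.
r_transformed = pearson_r([2 * x + 10 for x in xs], [y / 5 - 3 for y in ys])
# A negative scale factor on one series reverses the sign of r.
r_flipped = pearson_r([-x for x in xs], ys)

b_yx, b_xy = regression_coefficients(xs, ys)
r_from_regression = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

print(round(r, 4), round(r_transformed, 4), round(r_flipped, 4), round(r_from_regression, 4))
```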
Merits
It is the most popular method of studying correlation. It gives direction as well as the
degree of relationship between the two variables.
The correlation coefficient along with regression analysis helps in estimating the value of
the dependent variable from the known value of an independent variable.
Limitations
Compared to other methods this method takes more time to compute the value of
coefficient of correlation.
The value of correlation coefficient is unduly affected by the presence of extreme values.
It is based on a large number of assumptions (like linear relationship, normality of the
distributions, cause and effect relationship) which may not always hold well.
It is very much likely to be misinterpreted.

Probable Error of Correlation Coefficient

Meaning & Calculation


One of the measures that helps in interpreting the value of the correlation coefficient is its
probable error. It helps in testing the reliability of an observed value of r so far as it depends
upon the conditions of random sampling. It is an amount which, when added to and subtracted from
the correlation coefficient, produces limits within which the population coefficient of correlation
has a 50% chance of lying. The probable error, denoted by P.E., is given by the following formula:
$$ P.E. = 0.6745 \times \frac{1 - r^2}{\sqrt{N}} $$


Where r is the correlation coefficient of the random sample and N is the number of pairs of
observations in the sample.
Interpretation of 'r' Based on Probable Error

Interpretation of r with the help of probable error is done as follows:
• If r is less than the probable error, there is no evidence of correlation, and
• If r is greater than six times the probable error, there is decided evidence of correlation and the
value of r is significant.
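The rule can be written as a small Python sketch. The function names and the wording of the returned labels are illustrative; comparing the absolute value of r (so that negative coefficients are handled) and the label for the range between P.E. and 6 P.E., which the rule above does not cover, are assumptions.

```python
import math

def probable_error(r, n):
    """P.E. = 0.6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

def interpret(r, n):
    """Apply the probable-error rule to the magnitude of r (assumption: |r| is compared)."""
    pe = probable_error(r, n)
    if abs(r) < pe:
        return "no evidence of correlation"
    if abs(r) > 6 * pe:
        return "significant"
    # The rule leaves this range open; the label below is an assumption.
    return "not established (between P.E. and 6 P.E.)"

print(round(probable_error(0.84, 16), 3), interpret(0.84, 16))
print(round(probable_error(0.603, 8), 3), interpret(0.603, 8))
```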

Conditions for the use of Probable Error


The following conditions must be fulfilled for the use of probable error:
The data must have been drawn from a normal population,
The conditions of random sampling should prevail in selecting the observations for the
sample,
The number of observations in the sample should be large.

3. Spearman's Rank Correlation Coefficient:


It is possible to avoid making any assumptions about the populations being studied by
ranking the observations according to size and basing the calculations on the ranks rather
than upon the original observations. It does not matter which way the items are ranked, item
number one may be the largest or it may be the smallest. Using ranks rather than actual
observations gives the coefficient of rank correlations.
This method of finding out covariability or the lack of it between two variables was
developed by the British Psychologist Charles Edward Spearman in 1904. A ranking is a
relationship between a set of items such that, for any two items, the first is either 'ranked
higher than', 'ranked lower than' or 'ranked equal to' the second. In mathematics, this is
known as a weak order or total preorder of objects. It is not necessarily a total order of
objects because two different objects can have the same ranking. The rankings themselves
are totally ordered. For example, materials are totally preordered by hardness, while degrees
of hardness are totally ordered.

“Rank correlation” is the study of relationships between different rankings on the same set of
items. It deals with measuring correspondence between two rankings, and assessing the
significance of this correspondence. Spearman's rank correlation coefficient is defined as:
$$ r_s = 1 - \frac{6\sum D^2}{N(N^2 - 1)} $$
where $r_s$ denotes the rank coefficient of correlation,
D refers to the difference between the ranks of paired items in the two series, and
N is the number of pairs of observations.
Features
The rank method has the following principal features:
• The sum of the differences between the two sets of ranks is zero, i.e., ∑D = 0.
• Spearman's rank correlation coefficient ρ is the Pearsonian correlation coefficient between
the ranks.
• The rank correlation can be interpreted in the same way as Karl Pearson's correlation
coefficient. Karl Pearson's correlation coefficient assumes that the sample observations are
drawn from a normal population. The rank correlation coefficient is a distribution-free measure,
since no strict assumption is made about the population from which the sample is drawn.
• The values obtained from the two formulae generally differ because ranking discards
some of the information contained in the original observations.
• Spearman's formula can be used to find the correlation between
qualitative characters, which can be ranked but not measured.
Types of Rank Methods
In rank correlation we may have three types of problems:
• Where ranks are given
• Where ranks are not given
• Where repeated ranks occur
Note:
If r = 1 then there is a perfect Positive correlation
If r = 0 then the variables are uncorrelated
If r=-1 then there is a perfect Negative Correlation

Sums on Karl Pearson's Coefficient of Correlation: (Based on Arithmetic Mean)

Q1 Calculate Karl Pearson's coefficient of correlation from the following data:
X 11 10 9 8 7 6 5
Y 20 18 12 8 10 5 4

Sol:
X        dx = X − X̄   Y        dy = Y − Ȳ   dx²        dy²         dxdy
11        3           20        9           9          81          27
10        2           18        7           4          49          14
9         1           12        1           1          1           1
8         0           8        -3           0          9           0
7        -1           10       -1           1          1           1
6        -2           5        -6           4          36          12
5        -3           4        -7           9          49          21
∑X = 56  ∑dx = 0      ∑Y = 77  ∑dy = 0      ∑dx² = 28  ∑dy² = 226  ∑dxdy = 76

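$$ \bar{X} = \frac{\sum X}{N} = \frac{56}{7} = 8, \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{77}{7} = 11 $$
$$ r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \cdot \sum d_y^2}} = \frac{76}{\sqrt{28 \times 226}} = \frac{76}{\sqrt{6328}} = \frac{76}{79.55} = 0.96 $$
Alternative Method:
$$ r = \frac{\sum d_x d_y}{N \, \sigma_x \sigma_y} = \frac{76}{7 \times 2 \times 5.68} = \frac{76}{79.52} = 0.96 $$
Standard Deviations of the X and Y Series:
$$ \sigma_x = \sqrt{\frac{\sum d_x^2}{N}} = \sqrt{\frac{28}{7}} = \sqrt{4} = 2 $$
$$ \sigma_y = \sqrt{\frac{\sum d_y^2}{N}} = \sqrt{\frac{226}{7}} = \sqrt{32.29} = 5.68 $$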

Exercise Sums:
Q1) Find out the Coefficient of Correlation between the X & Y series:
X 17 18 19 19 20 20 21 21 22 23
Y 12 16 14 11 15 19 22 16 15 20
(Ans: X̄ = 20, Ȳ = 16, r = 0.61)
Q2) Calculate Coefficient of Correlation between the marks obtained by 10 students in
Accountancy & Statistics:
Student 1 2 3 4 5 6 7 8 9 10

Accountancy 45 70 65 30 90 40 50 75 85 60
Statistics 35 90 70 40 95 40 60 80 80 50
(Ans: X̄ = 61, Ȳ = 64, r = 0.903)
Q3) Find out Coefficient of Correlation between X & Y series:
X 58 50 53 60 63 55 60 59 61 51
Y 115 110 121 120 124 112 118 115 118 117
(Ans: X̄ = 57, Ȳ = 117, r = 0.56)
Q4) Calculate Karl Pearson's Coefficient of Correlation by taking the actual means, 52 & 44
respectively:
X 44 46 46 48 52 54 54 56 60 60
Y 36 40 42 40 42 44 46 48 50 52
(r = 0.95)

Q5) Calculate Karl Pearson's Correlation Coefficient from the data given below:
                      X       Y
Mean                  31      61
Standard Deviation    3.25    3.35
Sum of products of deviations of X & Y from their respective means = 75
Number of pairs of X & Y = 10
(Ans: r = 0.69)
Q6) Determine σx, σy & the coefficient of correlation between the X & Y series:
                                        X Series    Y Series
No. of Items                            15          15
Arithmetic Mean                         25          18
Sum of Squares of Deviations from Mean  136         138
Sum of products of deviations of X & Y series from mean = 122
(Ans: r = 0.89)
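Short-Cut Method (Assumed Mean):
$$ r = \frac{n\sum d_x d_y - (\sum d_x)(\sum d_y)}{\sqrt{[n\sum d_x^2 - (\sum d_x)^2]\,[n\sum d_y^2 - (\sum d_y)^2]}} $$
Where:
$d_x = (X - A)$, with A the assumed mean of the X series
$d_y = (Y - A)$, with A the assumed mean of the Y series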

Q7) Calculate Coefficient of Correlation between the values of X & Y given below:
X 78 89 97 69 54 79 60 65
Y 125 137 156 112 107 136 120 110

Sol:
X       dx = X − A (A = 69)   dx²          Y       dy = Y − A (A = 120)   dy²          dxdy
78        9                   81           125       5                    25           45
89       20                   400          137      17                    289          340
97       28                   784          156      36                    1296         1008
69        0                   0            112      -8                    64           0
54      -15                   225          107     -13                    169          195
79       10                   100          136      16                    256          160
60       -9                   81           120       0                    0            0
65       -4                   16           110     -10                    100          40
Total   ∑dx = 39              ∑dx² = 1687          ∑dy = 43               ∑dy² = 2199  ∑dxdy = 1788

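$$ r = \frac{n\sum d_x d_y - (\sum d_x)(\sum d_y)}{\sqrt{[n\sum d_x^2 - (\sum d_x)^2]\,[n\sum d_y^2 - (\sum d_y)^2]}} $$
$$ r = \frac{8 \times 1788 - (39 \times 43)}{\sqrt{[8 \times 1687 - (39)^2]\,[8 \times 2199 - (43)^2]}} $$
$$ r = \frac{14304 - 1677}{\sqrt{[13496 - 1521]\,[17592 - 1849]}} = \frac{12627}{\sqrt{[11975]\,[15743]}} $$
$$ r = \frac{12627}{109.43 \times 125.47} = \frac{12627}{13730.17} = 0.92 $$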
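The assumed-mean calculation of Q7 can be reproduced with a short Python sketch. The function name is illustrative; the second call uses different assumed means (an arbitrary choice) to show that the result does not depend on which assumed means are picked.

```python
import math

def r_assumed_mean(xs, ys, a_x, a_y):
    """Short-cut formula with deviations taken from assumed means a_x and a_y."""
    n = len(xs)
    dx = [x - a_x for x in xs]
    dy = [y - a_y for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(p * q for p, q in zip(dx, dy))
    sum_dx2 = sum(p * p for p in dx)
    sum_dy2 = sum(q * q for q in dy)
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = math.sqrt((n * sum_dx2 - sum_dx ** 2) * (n * sum_dy2 - sum_dy ** 2))
    return numerator / denominator

xs = [78, 89, 97, 69, 54, 79, 60, 65]          # X series of Q7
ys = [125, 137, 156, 112, 107, 136, 120, 110]  # Y series of Q7

r_q7 = r_assumed_mean(xs, ys, 69, 120)    # assumed means used in the solution
r_other = r_assumed_mean(xs, ys, 80, 130)  # arbitrary alternative assumed means

print(round(r_q7, 2), round(r_other, 2))  # 0.92 0.92
```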
Exercise Sums:
Q8) Calculate Karl Pearson's Correlation Coefficient between X & Y:
X 58 43 41 39 43 46 43 45 41 47 45 44
Y 11 27 31 42 30 28 28 20 19 20 32 30
(Ans: r = - 0.733)
Q9) Calculate the coefficient of correlation between X & Y variables for the data given below:
X 17 18 19 19 20 20 21 21 22 23

Y 12 16 14 11 15 19 22 16 15 20
(Ans: r = 0.614)
Q10) Calculate Karl Pearson's Coefficient of Correlation between the ages of husbands (X) &
wives (Y):
Given: ∑XY = 3040, ∑X = -170, ∑X² = 8,288, ∑Y = -20, ∑Y² = 2,264, N = 10 (Ans: 0.78)
Q11) Calculate the number of items for which r = 0.8, ∑xy = 200, standard deviation of Y = 5
& ∑x² = 100, where x & y denote deviations of items from the actual means (so that ∑x = ∑y = 0).
(Ans: N = 25)
Q12) If the covariance between x & y is 9.6 and the variances of x & y are respectively 16 &
9, find the coefficient of correlation. (Hint: Covariance = ∑xy/N = 9.6, σ = √variance, r = 0.8)
Q13) The following table gives the distribution of the total population & those who are blind
among them. Find out if there is any relation between age & blindness:
Age 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80
Population 150 90 50 32 20 12 5 2
(„000)
Blind 90 63 50 40 30 30 20 12

Sol:
Age     Mid-Value (X)   Y = Blind per 1,00,000          dx = (X − A)/10   dy = (Y − A)/5   dx²         dy²            dxdy
                                                        (A = 35)          (A = 150)
0-10    5               90 × 1,00,000/1,50,000 = 60     -3                -18              9           324            54
10-20   15              63 × 1,00,000/90,000 = 70       -2                -16              4           256            32
20-30   25              50 × 1,00,000/50,000 = 100      -1                -10              1           100            10
30-40   35              40 × 1,00,000/32,000 = 125       0                 -5              0           25             0
40-50   45              30 × 1,00,000/20,000 = 150       1                  0              1           0              0
50-60   55              30 × 1,00,000/12,000 = 250       2                 20              4           400            40
60-70   65              20 × 1,00,000/5,000 = 400        3                 50              9           2500           150
70-80   75              12 × 1,00,000/2,000 = 600        4                 90              16          8100           360
Total   ∑X = 320        ∑Y = 1755                       ∑dx = 4           ∑dy = 111        ∑dx² = 44   ∑dy² = 11,705  ∑dxdy = 646
(Ans: r = 0.9038)
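The steps of this solution (convert the blind counts to a rate per 1,00,000 of population, take step deviations, then apply the short-cut formula) can be sketched in Python as follows. Variable and function names are illustrative; the step sizes 10 and 5 and the assumed means 35 and 150 are those used in the table above.

```python
import math

def shortcut_r(dx, dy):
    """Short-cut (assumed mean) formula applied to the deviations dx and dy."""
    n = len(dx)
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = math.sqrt((n * sum_dx2 - sum_dx ** 2) * (n * sum_dy2 - sum_dy ** 2))
    return numerator / denominator

mid_values = [5, 15, 25, 35, 45, 55, 65, 75]         # mid-points of the age classes
population_thousands = [150, 90, 50, 32, 20, 12, 5, 2]
blind = [90, 63, 50, 40, 30, 30, 20, 12]

# Blind persons per 1,00,000 of population in each age class.
blind_per_lakh = [b * 100_000 / (p * 1_000) for b, p in zip(blind, population_thousands)]

# Step deviations: dividing by a common factor does not change r.
dx = [(x - 35) / 10 for x in mid_values]
dy = [(y - 150) / 5 for y in blind_per_lakh]

r = shortcut_r(dx, dy)
print(round(r, 2))  # 0.9
```

Because r is independent of change of origin and scale, the step deviations give the same coefficient as the original mid-values and rates would.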
Q14) From the following data find out if there is any relationship between density of population
& death rate:
District Area (in sq km) Population No of Deaths
A 120 24,000 288
B 150 75,000 1,125
C 80 48,000 768
D 50 40,000 720
E 250 50,000 650
(Hint: X = Population/Area, Y = No of Deaths/Population*1,000) (Ans r = 0.99)
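The hint amounts to deriving two new series before correlating them. A minimal Python sketch of that preparation step, with illustrative names and the deviation form of the formula, could look like this:

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    return sum_dxdy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

areas = [120, 150, 80, 50, 250]                      # sq km, districts A to E
populations = [24_000, 75_000, 48_000, 40_000, 50_000]
deaths = [288, 1_125, 768, 720, 650]

# X = density of population (persons per sq km).
density = [p / a for p, a in zip(populations, areas)]
# Y = death rate per 1,000 of population.
death_rate = [d * 1_000 / p for d, p in zip(deaths, populations)]

r = pearson_r(density, death_rate)
print(density, death_rate, round(r, 2))
```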
Q15) Calculate Karl Pearson's Coefficient of Correlation, using 38 as the assumed mean for
Commodity A & 75 for Commodity B. (Ans: r = 0.827)
Months        Jan  Feb  March  April  May  June  July  Aug  Sep  Oct
Commodity A   35   36   40     38     37   39    41    40   36   38
Commodity B   65   72   78     77     76   77    80    79   76   75
Q16) If Probable Error = 0.05 & N = 16, find the Coefficient of Correlation & comment on its
significance.
Sol:
$$ P.E. = 0.6745 \times \frac{1 - r^2}{\sqrt{N}} $$
$$ 0.05 = 0.6745 \times \frac{1 - r^2}{\sqrt{16}} = 0.6745 \times \frac{1 - r^2}{4} $$
$$ 1 - r^2 = \frac{0.05 \times 4}{0.6745} = 0.2965 $$
$$ r^2 = 0.7035, \qquad r = \sqrt{0.7035} = 0.84 $$
$$ 6 \, P.E. = 6 \times 0.05 = 0.3 $$
Since r is more than 6 times the probable error, the value of r is significant.
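The same inversion of the probable-error formula can be sketched in Python. The function name is illustrative, and note that the formula only determines the magnitude of r, not its sign.

```python
import math

def r_from_probable_error(pe, n):
    """Solve P.E. = 0.6745 * (1 - r^2) / sqrt(N) for the magnitude of r."""
    r_squared = 1 - pe * math.sqrt(n) / 0.6745
    if r_squared < 0:
        raise ValueError("no real r satisfies this P.E. and N")
    return math.sqrt(r_squared)

r = r_from_probable_error(0.05, 16)
print(round(r, 2), r > 6 * 0.05)  # 0.84 True
```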
Q17) The following table gives the values of X (soil temperature in °F at 4 inches below ground)
& Y (germination interval in days) for winter wheat at 12 places:
X 57 42 40 38 42 45 42 44 40 46 44 43
Y 10 26 30 41 29 27 27 19 18 19 31 29
Calculate the correlation coefficient between soil temperature & germination interval & interpret
the results.
Sol:
X          dx = X − 44   dx²          Y          dy = Y − 26   dy²          dxdy
57          13           169          10         -16           256          -208
42          -2           4            26           0           0            0
40          -4           16           30          +4           16           -16
38          -6           36           41         +15           225          -90
42          -2           4            29          +3           9            -6
45          +1           1            27          +1           1            +1
42          -2           4            27          +1           1            -2
44           0           0            19          -7           49           0
40          -4           16           18          -8           64           +32
46          +2           4            19          -7           49           -14
44           0           0            31          +5           25           0
43          -1           1            29          +3           9            -3
∑X = 523   ∑dx = -5      ∑dx² = 255   ∑Y = 306   ∑dy = -6      ∑dy² = 704   ∑dxdy = -306

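$$ r = \frac{n\sum d_x d_y - (\sum d_x)(\sum d_y)}{\sqrt{[n\sum d_x^2 - (\sum d_x)^2]\,[n\sum d_y^2 - (\sum d_y)^2]}} $$
$$ r = \frac{12 \times (-306) - (-5)(-6)}{\sqrt{[12 \times 255 - (-5)^2]\,[12 \times 704 - (-6)^2]}} $$
$$ r = \frac{-3672 - 30}{\sqrt{[3060 - 25]\,[8448 - 36]}} = \frac{-3702}{\sqrt{[3035]\,[8412]}} $$
$$ r = \frac{-3702}{55.09 \times 91.72} = \frac{-3702}{5052.8} = -0.733 $$
$$ P.E. = 0.6745 \times \frac{1 - r^2}{\sqrt{N}} = 0.09, \qquad 6 \, P.E. = 6 \times 0.09 = 0.54 $$
Since the magnitude of r (0.733) is more than 6 times the P.E., the negative correlation is significant.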
Q18) Calculate Karl Pearson's coefficient of correlation between the ages of husbands &
wives & comment on the result:
X 20 25 30 35 40 45 50 55 60 65 70
Y 17 24 28 32 35 38 42 51 56 60 62
(Ans: r = 0.99, PE = 0.004, result is significant)
Q19) From the data given below calculate the coefficient of correlation & interpret it:
                                        X     Y
Number of items                         8     8
Mean                                    68    69
Sum of squares of deviation from mean   36    44
Sum of the products of deviations = 24
(Ans: r = 0.603, PE = 0.15, not significant)
SPEARMAN’s RANKING METHOD:
Q20) Calculate Rank Correlation coefficient from the following data:
Rank X   5  3  4  8  2  1  7  10  6  9
Rank Y   3  7  5  9  2  4  1  10  8  6

Sol:
Rank X   Rank Y   D = Rx − Ry   D²
5        3         2            4
3        7        -4            16
4        5        -1            1
8        9        -1            1
2        2         0            0
1        4        -3            9
7        1         6            36
10       10        0            0
6        8        -2            4
9        6         3            9
N = 10            ∑D = 0        ∑D² = 80

$$ r_s = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6(80)}{10(100 - 1)} = 1 - \frac{480}{990} = 1 - 0.48 = 0.52 $$
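When the ranks are given, as in Q20, the calculation is a direct application of the formula. A minimal Python sketch (the function name is illustrative):

```python
def spearman_from_ranks(rank_x, rank_y):
    """r_s = 1 - 6 * sum(D^2) / (N * (N^2 - 1)) for untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

rank_x = [5, 3, 4, 8, 2, 1, 7, 10, 6, 9]   # ranks of Q20
rank_y = [3, 7, 5, 9, 2, 4, 1, 10, 8, 6]

# The rank differences must sum to zero, which is a useful arithmetic check.
print(sum(a - b for a, b in zip(rank_x, rank_y)))       # 0
print(round(spearman_from_ranks(rank_x, rank_y), 2))     # 0.52
```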

Q21) Find out the coefficient of correlation between X & Y by the method of Rank differences:
X 22 24 27 35 21 20 27 25 27 23
Y 30 38 40 50 38 25 38 36 41 32
Sol:

X     Y     Rx    Ry    D = Rx − Ry   D²
22    30    8     9     -1            1
24    38    6     5     +1            1
27    40    3     3      0            0
35    50    1     1      0            0
21    38    9     5     +4            16
20    25    10    10     0            0
27    38    3     5     -2            4
25    36    5     7     -2            4
27    41    3     2     +1            1
23    32    7     8     -1            1
                                      ∑D² = 28

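$$ r_s = 1 - \frac{6\left\{\sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2)\right\}}{N(N^2 - 1)} $$
where $m_1$ and $m_2$ are the numbers of times a value is repeated (here 27 occurs 3 times in X
and 38 occurs 3 times in Y).
$$ r_s = 1 - \frac{6\left\{28 + \frac{1}{12}(3^3 - 3) + \frac{1}{12}(3^3 - 3)\right\}}{10 \times 99} = 1 - \frac{6(32)}{990} = 1 - \frac{192}{990} = 0.81 $$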
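When ranks are not given and values repeat, the ranking and the tie correction can both be automated. The sketch below uses illustrative function names, ranks the largest value as 1 (as in the solution above), gives tied values the average of the ranks they occupy, and adds (m³ − m)/12 for every group of m tied values.

```python
from collections import Counter

def average_ranks(values):
    """Rank with 1 = largest; tied values share the average of their positions."""
    positions = {}
    for position, value in enumerate(sorted(values, reverse=True), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every value repeated m > 1 times."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

def spearman_with_ties(xs, ys):
    """Spearman's formula with the correction factor for repeated ranks."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    adjusted = sum_d2 + tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * adjusted / (n * (n ** 2 - 1))

xs = [22, 24, 27, 35, 21, 20, 27, 25, 27, 23]   # X series of Q21
ys = [30, 38, 40, 50, 38, 25, 38, 36, 41, 32]   # Y series of Q21

print(average_ranks(xs))                       # 27 receives rank 3.0 three times
print(round(spearman_with_ties(xs, ys), 2))    # 0.81
```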
Q22) Find out the value of rs: (Ans: - 0.54)
X 15 14 25 14 14 20 22
Y 25 12 18 25 40 10 7
Q23) Find Spearman's Rank Coefficient of Correlation between Sales & Profits of the
following 10 firms: (rs = +0.75)
Firms A B C D E F G H I J
Sales 50 50 55 60 65 65 65 60 60 50
Profit 11 13 14 16 16 15 15 14 13 13

Q24) The value of Spearman's Rank Correlation Coefficient for a certain number of pairs of
observations was found to be 2/3. The sum of squares of the differences between corresponding
ranks was 55. Find the number of pairs. (Ans: N = 10)
Q25) Find out the coefficient of correlation between X & Y by the method of Rank differences:
X 78 89 97 69 59 79 68 57
Y 125 137 156 112 107 136 123 108
(Ans: rs = 0.95)
Q26) (a) If r = 0.4 Cov (x,y) = 10 & σy = 5, then find the value of σx. (Ans: σx = 5)

(b) Find the correlation coefficient between x & y when variance of x is 2.25, variance of y is 1
& covariance of x & y is 0.9. (Ans: r = 0.6)
(c) The correlation coefficient & covariance of two variables X & Y are 0.28 & 7.6 respectively.
If the variance of X be 9, find the standard deviation of Y. (Ans: σy = 9)
(d) Find R (Spearman's Rank Correlation Coefficient) when ∑d² = 30 & N = 10. (Ans: r = 0.82)
(e) Find the correlation coefficient if σx² = 6.25, σy² = 4 & Cov (x,y) = 0.9. (Ans: r = 0.18)
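Parts (a), (b), (c) and (e) all rest on the relation r = Cov(x, y) / (σx σy), the same relation used in the hint to Q12. A minimal Python sketch with an illustrative function name, taking the two variances as inputs:

```python
import math

def r_from_covariance(covariance, variance_x, variance_y):
    """r = Cov(x, y) / (sigma_x * sigma_y), with sigma = sqrt(variance)."""
    return covariance / (math.sqrt(variance_x) * math.sqrt(variance_y))

print(round(r_from_covariance(9.6, 16, 9), 2))     # Q12: 0.8
print(round(r_from_covariance(0.9, 2.25, 1), 2))   # Q26 (b): 0.6
print(round(r_from_covariance(0.9, 6.25, 4), 2))   # Q26 (e): 0.18

# Q26 (a) rearranges the same relation: sigma_x = Cov / (r * sigma_y).
sigma_x = 10 / (0.4 * 5)
print(round(sigma_x, 2))                           # 5.0
```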
