Correlation
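Karl Pearson's coefficient of correlation is given by
\[ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}} \]
Where,
$\bar{x}$ = Mean of the x variable.
$\bar{y}$ = Mean of the y variable.
$(x_i - \bar{x})$ = Deviation of each data point of x from the mean.
$(y_i - \bar{y})$ = Deviation of each data point of y from the mean.
$(x_i - \bar{x})^2$ = Squared deviation of each data point of x from the mean.
$(y_i - \bar{y})^2$ = Squared deviation of each data point of y from the mean.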
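The quantities defined above can be combined in a few lines of code. The following is a minimal Python sketch, assuming the standard deviation form of Pearson's coefficient (sum of products of deviations divided by the square root of the product of the summed squared deviations). The function name `pearson_r` and the small perfectly linear data set are illustrative only and are not part of these notes.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r from deviations about the means (illustrative helper)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviation of each data point from its mean
    dev_x = [xi - mean_x for xi in x]
    dev_y = [yi - mean_y for yi in y]
    # Sum of products of deviations and sums of squared deviations
    sum_dxdy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    sum_dx2 = sum(dx * dx for dx in dev_x)
    sum_dy2 = sum(dy * dy for dy in dev_y)
    # Note: this divides by zero if either variable is constant
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)

# Hypothetical data with an exact linear relation y = 2x
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

A value of +1 or -1 indicates an exact linear relation; real data normally fall strictly between the two.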
Other Formulae of Correlation
\[ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} \]
Where,
𝑛 = sample size.
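The raw-sum formula needs only five running totals, which makes it convenient to compute by machine. Below is a hedged Python sketch of that formula applied to the four (X, Y) pairs tabulated just below; the function name `pearson_r_raw_sums` is an illustrative choice, not something defined in these notes.

```python
from math import sqrt

def pearson_r_raw_sums(x, y):
    """Pearson's r from the raw sums: n, Σx, Σy, Σxy, Σx², Σy²."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

X = [25, 30, 36, 43]
Y = [30, 44, 52, 70]
print(round(pearson_r_raw_sums(X, Y), 4))  # 0.9923
```

With integer data the numerator and both bracketed terms are exact integers (here 1544, 724 and 3344), so rounding only enters at the square root and the final division.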
X 25 30 36 43
Y 30 44 52 70
Solution:
Mean $\bar{x} = \frac{\sum x_i}{n} = \frac{25 + 30 + 36 + 43}{4} = \frac{134}{4} = 33.5$
Mean $\bar{y} = \frac{\sum y_i}{n} = \frac{30 + 44 + 52 + 70}{4} = \frac{196}{4} = 49$
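Since $\bar{x} = 33.5$ is not an integer, use the assumed mean $A = 34$ for x. Since $\bar{y} = 49$ is an integer, take the deviations of y from the actual mean:
$d_x = x - A$
$d_y = y - \bar{y}$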
x      y      dx = x − 34   dy = y − 49   dx²    dy²    dx·dy
25     30     −9            −19           81     361    171
30     44     −4            −5            16     25     20
36     52     2             3             4      9      6
43     70     9             21            81     441    189
Total  134    196           Σdx = −2      Σdy = 0       Σdx² = 182    Σdy² = 836    Σdx·dy = 386
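Correlation Coefficient r:
\[ r = \frac{n \sum d_x d_y - \sum d_x \cdot \sum d_y}{\sqrt{n \sum d_x^2 - (\sum d_x)^2} \cdot \sqrt{n \sum d_y^2 - (\sum d_y)^2}} \]
\[ = \frac{4 \cdot 386 - (-2) \cdot 0}{\sqrt{4 \cdot 182 - (-2)^2} \cdot \sqrt{4 \cdot 836 - (0)^2}} \]
\[ = \frac{1544 + 0}{\sqrt{728 - 4} \cdot \sqrt{3344 - 0}} = \frac{1544}{\sqrt{724} \cdot \sqrt{3344}} = \frac{1544}{1555.9743} \approx 0.9923 \]
The correlation value is 0.9923; it implies a very strong positive relationship between X and Y.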
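The assumed-mean working above can be checked numerically. This short Python sketch rebuilds the deviation columns with A = 34 for x and 49 for y, then applies the same formula; the variable names are illustrative only.

```python
from math import sqrt

x = [25, 30, 36, 43]
y = [30, 44, 52, 70]
A = 34  # assumed mean for x (the actual mean is 33.5)
B = 49  # actual mean of y

# Deviation columns of the working table
dx = [xi - A for xi in x]
dy = [yi - B for yi in y]

n = len(x)
sum_dx, sum_dy = sum(dx), sum(dy)
sum_dx2 = sum(d * d for d in dx)
sum_dy2 = sum(d * d for d in dy)
sum_dxdy = sum(a * b for a, b in zip(dx, dy))
print(sum_dx, sum_dy, sum_dx2, sum_dy2, sum_dxdy)  # -2 0 182 836 386

# The terms in Σdx and Σdy correct for A not being the true mean
numerator = n * sum_dxdy - sum_dx * sum_dy
denominator = sqrt(n * sum_dx2 - sum_dx ** 2) * sqrt(n * sum_dy2 - sum_dy ** 2)
r = numerator / denominator
print(round(r, 4))  # 0.9923
```

Because the formula subtracts the Σdx and Σdy terms, any other choice of A would lead to the same r.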
Example 3: Calculate Karl Pearson’s coefficient of correlation for the following data using 20
as the working mean for price and 70 as the working mean for demand:
Price 14 16 17 18 19 20 21 22 23
Demand 84 78 70 75 66 67 62 58 60
Solution:
Let the variables X and Y denote price and demand, respectively, with working means a = 20 and b = 70.
Price X   Demand Y   X − a   Y − b   (X − a)(Y − b)   (X − a)²   (Y − b)²
14        84         −6      14      −84              36         196
16        78         −4      8       −32              16         64
17        70         −3      0       0                9          0
18        75         −2      5       −10              4          25
19        66         −1      −4      4                1          16
20        67         0       −3      0                0          9
21        62         1       −8      −8               1          64
22        58         2       −12     −24              4          144
23        60         3       −10     −30              9          100
Total     170        620     −10     −10              −184       80         618
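Here, $n = 9$, $\sum (X_i - a) = -10$, $\sum (Y_i - b) = -10$, $\sum (X_i - a)(Y_i - b) = -184$, $\sum (X_i - a)^2 = 80$ and $\sum (Y_i - b)^2 = 618$. Because the working means are not the actual means, the correction terms must be retained:
\[ r = \frac{n \sum (X_i - a)(Y_i - b) - \sum (X_i - a) \cdot \sum (Y_i - b)}{\sqrt{\left[n \sum (X_i - a)^2 - \left(\sum (X_i - a)\right)^2\right]\left[n \sum (Y_i - b)^2 - \left(\sum (Y_i - b)\right)^2\right]}} \]
\[ = \frac{9(-184) - (-10)(-10)}{\sqrt{[9(80) - (-10)^2][9(618) - (-10)^2]}} = \frac{-1756}{\sqrt{620 \times 5462}} = \frac{-1756}{1840.23} \approx -0.954 \]
The correlation value is −0.954; it implies that demand and price are strongly negatively related.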
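A useful check on any working-mean calculation is that r must not depend on the working means chosen. The Python sketch below applies the step-deviation formula, including its correction terms in Σ(X − a) and Σ(Y − b), to the price and demand data. It uses a = 20, b = 70 and then, for comparison, a = b = 0. The function name `step_deviation_r` is illustrative only.

```python
from math import sqrt

def step_deviation_r(x, y, a, b):
    """Pearson's r from deviations about working means a and b."""
    n = len(x)
    u = [xi - a for xi in x]
    v = [yi - b for yi in y]
    sum_u, sum_v = sum(u), sum(v)
    sum_uv = sum(ui * vi for ui, vi in zip(u, v))
    sum_u2 = sum(ui * ui for ui in u)
    sum_v2 = sum(vi * vi for vi in v)
    # The sum_u and sum_v terms correct for a, b not being the true means
    numerator = n * sum_uv - sum_u * sum_v
    denominator = sqrt((n * sum_u2 - sum_u ** 2) * (n * sum_v2 - sum_v ** 2))
    return numerator / denominator

price = [14, 16, 17, 18, 19, 20, 21, 22, 23]
demand = [84, 78, 70, 75, 66, 67, 62, 58, 60]

r_working = step_deviation_r(price, demand, 20, 70)  # working means from the example
r_origin = step_deviation_r(price, demand, 0, 0)     # no shift at all
print(round(r_working, 3), round(r_origin, 3))
```

Both calls print the same value. Dropping the correction terms is valid only when the working means coincide with the actual means, which is not the case here (the actual means are 170/9 and 620/9).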
Example 4: A researcher wants to determine if there is a correlation between the number of
hours students spend studying and their exam scores. The researcher collects data from 10
students and records the number of hours they studied and their corresponding exam scores.
The data is as follows:
Hours Studied: 4, 6, 5, 3, 7, 8, 6, 2, 4, 5
Exam Scores: 70, 75, 68, 62, 80, 85, 77, 60, 72, 68
Calculate the correlation coefficient for this data set. Also calculate the probable error and
interpret it.
(Ans. 0.945)
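The stated answer can be verified with a short script. The sketch below computes r in deviation form and then the probable error, assuming the usual textbook formula P.E. = 0.6745 (1 − r²) / √n, which is not stated in this section. The rule that r is regarded as significant when it exceeds six times the probable error is likewise the common textbook convention. Variable names are illustrative.

```python
from math import sqrt

hours = [4, 6, 5, 3, 7, 8, 6, 2, 4, 5]
scores = [70, 75, 68, 62, 80, 85, 77, 60, 72, 68]

n = len(hours)
mean_h = sum(hours) / n    # 5.0
mean_s = sum(scores) / n   # 71.7

# Sum of products of deviations and sums of squared deviations
sum_dhds = sum((h - mean_h) * (s - mean_s) for h, s in zip(hours, scores))
sum_dh2 = sum((h - mean_h) ** 2 for h in hours)
sum_ds2 = sum((s - mean_s) ** 2 for s in scores)

r = sum_dhds / sqrt(sum_dh2 * sum_ds2)
# Assumed textbook formula for the probable error of r
probable_error = 0.6745 * (1 - r ** 2) / sqrt(n)

print(round(r, 3))               # 0.945
print(round(probable_error, 4))  # 0.0227
print(r > 6 * probable_error)    # True
```

Under that convention, r = 0.945 is far larger than 6 × P.E. ≈ 0.136, so the positive correlation between hours studied and exam score would be treated as significant.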