
Correlation


Module-3

Correlation

KARL PEARSON’S COEFFICIENT OF CORRELATION


Karl Pearson's coefficient of correlation, also known as Pearson's correlation coefficient or
simply Pearson's r, is a measure of the linear relationship between two variables. It quantifies
the strength and direction of the correlation between two sets of data points.
Karl Pearson's correlation coefficient, denoted by the symbol "r," can range from -1 to +1.
Degree of Correlation

Degree                    Positive           Negative
Absence of correlation    0                  0
Perfect correlation       +1                 -1
High degree               +0.75 to +1        -0.75 to -1
Moderate degree           +0.25 to +0.75     -0.25 to -0.75
Low degree                0 to +0.25         0 to -0.25
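
The bands in the table can be turned into a small lookup. The sketch below is illustrative only: the function name `degree_of_correlation` is hypothetical, and because the table's ranges share the boundary values 0.25 and 0.75, the code assumes a boundary value belongs to the higher band.

```python
def degree_of_correlation(r: float) -> str:
    """Label a correlation coefficient using the bands in the table above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size == 0:
        return "absence of correlation"
    sign = "positive" if r > 0 else "negative"
    if size == 1:
        return f"perfect {sign} correlation"
    # Assumption: 0.25 and 0.75 are assigned to the higher band.
    if size >= 0.75:
        degree = "high"
    elif size >= 0.25:
        degree = "moderate"
    else:
        degree = "low"
    return f"{degree} degree of {sign} correlation"


print(degree_of_correlation(0.9923))
print(degree_of_correlation(-0.5))
```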

Formulae of Correlation Coefficient


$$ r(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} $$

Where,

𝐶𝑜𝑣(𝑋, 𝑌) = Covariance of X and Y


𝑉𝑎𝑟(𝑋) = Variance of X
𝑉𝑎𝑟(𝑌) = Variance of Y
Or
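$$ r(X, Y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left\{\sum_{i=1}^{n}(x_i - \bar{x})^2\right\}\left\{\sum_{i=1}^{n}(y_i - \bar{y})^2\right\}}} $$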

Where,

$\bar{x}$ = Mean of the x variable.
$\bar{y}$ = Mean of the y variable.
$(x_i - \bar{x})$ = Deviation of each data point of x from the mean.
$(y_i - \bar{y})$ = Deviation of each data point of y from the mean.
$(x_i - \bar{x})^2$ = Squared deviation of each data point of x from the mean.
$(y_i - \bar{y})^2$ = Squared deviation of each data point of y from the mean.
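
The deviation form translates almost line for line into code. The following is a minimal sketch using only the Python standard library; the function name `pearson_r` and the error handling are illustrative choices, not part of the notes.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations of each data point from its mean.
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    sum_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    sum_xx = sum(dx * dx for dx in dev_x)
    sum_yy = sum(dy * dy for dy in dev_y)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sum_xy / sqrt(sum_xx * sum_yy)


# A perfectly linear increasing relationship gives r = +1.
print(pearson_r([1, 2, 3], [2, 4, 6]))
```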
Other Formulae of Correlation

$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$
Where,

𝑛 = sample size.
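
The sum-based formula is the deviation formula rewritten so that the means never have to be computed explicitly. A brief derivation sketch, using $\bar{x} = \sum x / n$ and $\bar{y} = \sum y / n$:

$$
\begin{aligned}
\sum (x_i - \bar{x})(y_i - \bar{y}) &= \sum xy - n\bar{x}\bar{y} = \frac{n\sum xy - (\sum x)(\sum y)}{n} \\
\sum (x_i - \bar{x})^2 &= \sum x^2 - n\bar{x}^2 = \frac{n\sum x^2 - (\sum x)^2}{n}
\end{aligned}
$$

The second identity holds for y as well. Substituting all three into the deviation formula, the factors of 1/n in the numerator and denominator cancel, leaving the sum-based expression.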

Example 1: Evaluate the correlation coefficient for the following data:


$\sum X = 24$; $\sum Y = 44$; $n = 4$; $\sum X^2 = 164$; $\sum Y^2 = 574$ and $\sum XY = 306$.
Solution:
Substituting the given sums into the sum-based formula,

$$ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} $$

$$ r = \frac{4 \times 306 - 24 \times 44}{\sqrt{\left[4 \times 164 - 24^2\right]\left[4 \times 574 - 44^2\right]}} = \frac{168}{\sqrt{80 \times 360}} = \frac{168}{169.71} = 0.99 $$

Since r = 0.99 is close to +1, the variables have a high degree of positive correlation.
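
The arithmetic in Example 1 can be checked with a short function that works directly from the summary totals. This is only a sketch; the name `r_from_sums` and its parameter names are hypothetical.

```python
from math import sqrt


def r_from_sums(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy):
    """Pearson's r from n, the sums of x and y, the sums of squares, and the sum of products."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator


# Totals given in Example 1.
r = r_from_sums(n=4, sum_x=24, sum_y=44, sum_xx=164, sum_yy=574, sum_xy=306)
print(round(r, 2))  # 0.99
```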


Example 2: Find Karl Pearson's coefficient of correlation for the following data:

X 25 30 36 43
Y 30 44 52 70

Solution:

$$ \bar{x} = \frac{\sum x_i}{n} = \frac{25 + 30 + 36 + 43}{4} = \frac{134}{4} = 33.5 $$

$$ \bar{y} = \frac{\sum y_i}{n} = \frac{30 + 44 + 52 + 70}{4} = \frac{196}{4} = 49 $$
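Since $\bar{x} = 33.5$ is not an integer, take the deviations of x from an assumed mean A = 34. The deviations of y are taken from the actual mean $\bar{y} = 49$.

$$ dx = x - A, \qquad dy = y - \bar{y} $$

x      y      dx = x - 34    dy = y - 49    dx²     dy²     dx·dy
25     30     -9             -19            81      361     171
30     44     -4             -5             16      25      20
36     52     2              3              4       9       6
43     70     9              21             81      441     189
134    196    ∑dx = -2       ∑dy = 0        182     836     386

Totals: ∑dx² = 182, ∑dy² = 836, ∑dx·dy = 386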

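Correlation coefficient r:

$$
\begin{aligned}
r &= \frac{n\sum dx\,dy - \sum dx \sum dy}{\sqrt{n\sum dx^2 - (\sum dx)^2}\,\sqrt{n\sum dy^2 - (\sum dy)^2}} \\
  &= \frac{4 \times 386 - (-2)(0)}{\sqrt{4 \times 182 - (-2)^2}\,\sqrt{4 \times 836 - 0^2}} \\
  &= \frac{1544 - 0}{\sqrt{728 - 4}\,\sqrt{3344 - 0}} \\
  &= \frac{1544}{\sqrt{724}\,\sqrt{3344}} \\
  &= \frac{1544}{1555.9743} \\
  &= 0.9923
\end{aligned}
$$

This is a high degree of positive correlation.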
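
Example 2 relies on the fact that r does not change when a constant is subtracted from every x or every y, which is why an assumed mean can be used. The sketch below checks this numerically by applying the same formula to the raw data and to the deviations; the helper name `r_from_pairs` is hypothetical.

```python
from math import sqrt


def r_from_pairs(us, vs):
    """Pearson's r via the n*sum formula; works for raw values or deviations."""
    n = len(us)
    sum_u, sum_v = sum(us), sum(vs)
    sum_uv = sum(u * v for u, v in zip(us, vs))
    sum_uu = sum(u * u for u in us)
    sum_vv = sum(v * v for v in vs)
    numerator = n * sum_uv - sum_u * sum_v
    return numerator / (sqrt(n * sum_uu - sum_u ** 2) * sqrt(n * sum_vv - sum_v ** 2))


x = [25, 30, 36, 43]
y = [30, 44, 52, 70]
dx = [xi - 34 for xi in x]  # deviations from the assumed mean A = 34
dy = [yi - 49 for yi in y]  # deviations from the actual mean of y

r_raw = r_from_pairs(x, y)
r_shifted = r_from_pairs(dx, dy)
print(round(r_raw, 4), round(r_shifted, 4))
```

Both calls should print the same value, about 0.9923, because the shift cancels in the numerator and in each factor of the denominator.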

Example 3: Calculate Karl Pearson’s coefficient of correlation for the following data using 20
as the working mean for price and 70 as the working mean for demand:

Price 14 16 17 18 19 20 21 22 23

Demand 84 78 70 75 66 67 62 58 60

Solution:
Let the variables X and Y denote price and demand, respectively, with working means a = 20 and b = 70.

Price X   Demand Y   X − a   Y − b   (X − a)(Y − b)   (X − a)²   (Y − b)²
14        84         −6      14      −84              36         196
16        78         −4      8       −32              16         64
17        70         −3      0       0                9          0
18        75         −2      5       −10              4          25
19        66         −1      −4      4                1          16
20        67         0       −3      0                0          9
21        62         1       −8      −8               1          64
22        58         2       −12     −24              4          144
23        60         3       −10     −30              9          100
Total                −10     −10     −184             80         618

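Here, n = 9. The deviations are taken from working means rather than from the actual means, so their sums are not zero: ∑(X − a) = −10 and ∑(Y − b) = −10. The correction terms must therefore be retained:

$$ r = \frac{n\sum (X - a)(Y - b) - \sum (X - a)\sum (Y - b)}{\sqrt{\left[n\sum (X - a)^2 - \left(\sum (X - a)\right)^2\right]\left[n\sum (Y - b)^2 - \left(\sum (Y - b)\right)^2\right]}} $$

$$ r = \frac{9(-184) - (-10)(-10)}{\sqrt{\left[9 \times 80 - (-10)^2\right]\left[9 \times 618 - (-10)^2\right]}} = \frac{-1756}{\sqrt{620 \times 5462}} = \frac{-1756}{1840.23} = -0.954 $$

The correlation value is −0.954, which implies a high degree of negative correlation between price and demand.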
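
As a cross-check on Example 3, the sketch below recomputes r from the raw price and demand figures using the working means a = 20 and b = 70. It evaluates both the working-mean formula with its correction terms and the shortcut that omits them. The two agree only when the deviation sums are zero, which is not the case here. Variable names are illustrative.

```python
from math import sqrt

price = [14, 16, 17, 18, 19, 20, 21, 22, 23]
demand = [84, 78, 70, 75, 66, 67, 62, 58, 60]
a, b = 20, 70  # working means given in the example

u = [p - a for p in price]   # X - a
v = [d - b for d in demand]  # Y - b
n = len(u)

sum_u, sum_v = sum(u), sum(v)
sum_uv = sum(ui * vi for ui, vi in zip(u, v))
sum_uu = sum(ui * ui for ui in u)
sum_vv = sum(vi * vi for vi in v)

# Working-mean formula with the correction terms retained.
r_full = (n * sum_uv - sum_u * sum_v) / sqrt(
    (n * sum_uu - sum_u ** 2) * (n * sum_vv - sum_v ** 2)
)

# Shortcut without correction terms; valid only if sum_u == sum_v == 0.
r_uncorrected = sum_uv / sqrt(sum_uu * sum_vv)

print(sum_u, sum_v, sum_uv, sum_uu, sum_vv)
print(round(r_full, 3), round(r_uncorrected, 3))
```

The first value, about −0.954, is Pearson's r for these data; the second, about −0.828, is what the shortcut gives when the nonzero deviation sums are ignored.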
Example 4: A researcher wants to determine if there is a correlation between the number of
hours students spend studying and their exam scores. The researcher collects data from 10
students and records the number of hours they studied and their corresponding exam scores.
The data is as follows:
Hours Studied: 4, 6, 5, 3, 7, 8, 6, 2, 4, 5
Exam Scores: 70, 75, 68, 62, 80, 85, 77, 60, 72, 68
Calculate the correlation coefficient for this data set. Also calculate the probable error and
interpret it.
(Ans. 0.945)
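
The stated answer to Example 4 can be reproduced with the deviation formula. The notes do not define the probable error, so the sketch assumes the usual textbook expression PE = 0.6745(1 − r²)/√n and the common rule of thumb that r is treated as significant when it exceeds 6 × PE. Both are assumptions and should be checked against the course text.

```python
from math import sqrt

hours = [4, 6, 5, 3, 7, 8, 6, 2, 4, 5]
scores = [70, 75, 68, 62, 80, 85, 77, 60, 72, 68]
n = len(hours)

mean_h = sum(hours) / n
mean_s = sum(scores) / n

# Sums of products and squares of deviations from the means.
s_hs = sum((h - mean_h) * (s - mean_s) for h, s in zip(hours, scores))
s_hh = sum((h - mean_h) ** 2 for h in hours)
s_ss = sum((s - mean_s) ** 2 for s in scores)

r = s_hs / sqrt(s_hh * s_ss)

# Assumed textbook definition of the probable error of r.
probable_error = 0.6745 * (1 - r ** 2) / sqrt(n)

print(round(r, 3))
print(round(probable_error, 4))
print(r > 6 * probable_error)  # rule-of-thumb significance check (assumption)
```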
