
Pearson's Correlation Coefficient


• In this lesson, we will find a quantitative measure to describe the strength of a linear relationship (instead of using the terms strong or weak). A quantitative measure is important when comparing sets of data.
• The strength of a linear relationship is an indication of how
closely the points in a scatter diagram fit a straight line. A
measure of this strength is given by a correlation coefficient.
• There are several correlation coefficients in use. We will use Pearson's product-moment correlation coefficient, usually shortened to Pearson's correlation coefficient. It is represented by r.
• Important properties of r:
1. r does not depend on the units of measurement, or on which variable is chosen as x and which as y.
2. r always lies in the range [–1, 1].
3. A positive r indicates a positive association between the variables. A negative r indicates a negative association between the variables.
4. A perfect linear correlation occurs if r = –1 or r = 1, that is, when the points of the scatter plot lie exactly on a straight line.
5. r measures only the strength of a linear association between two variables. Thus, if there is no evidence that a linear relationship exists, it does not make sense to calculate r, because r would not be meaningful.
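Properties 1, 2 and 4 can be checked numerically with a short Python sketch. The helper `pearson_r` and the height/mass data are hypothetical and exist only for illustration. The helper computes r from deviations about the means.

```python
import math


def pearson_r(x, y):
    """Pearson's r from deviations about the means (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: heights in centimetres and masses in kilograms.
height_cm = [150, 160, 165, 170, 180]
mass_kg = [50, 56, 61, 63, 72]
r = pearson_r(height_cm, mass_kg)

# Property 1: converting to inches and pounds does not change r ...
r_new_units = pearson_r([h / 2.54 for h in height_cm], [m * 2.20462 for m in mass_kg])
# ... and neither does swapping which variable is x and which is y.
r_swapped = pearson_r(mass_kg, height_cm)

# Property 2: r lies in [-1, 1].
print(round(r, 6), round(r_new_units, 6), round(r_swapped, 6))

# Property 4: points exactly on a straight line give r = 1 or r = -1.
print(pearson_r([1, 2, 3], [2, 4, 6]), pearson_r([1, 2, 3], [6, 4, 2]))  # → 1.0 -1.0
```

The three values on the first printed line agree apart from tiny floating-point rounding differences.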
• The properties of r and corresponding scatter plots can be
summarized as follows:

[Figure: scatter plots illustrating r = +1, r = +0.9, r = +0.5 and r = 0 (top row), and r = –1, r = –0.9, r = –0.5 and r = 0 (bottom row).]

• The following table gives a good indication of the qualitative description of the strength of a linear relationship for each range of values of r.

Value of r          Qualitative description of the strength
–1                  perfect negative
(–1, –0.75)         strong negative
(–0.75, –0.5)       moderate negative
(–0.5, –0.25)       weak negative
(–0.25, 0.25)       no linear association
(0.25, 0.5)         weak positive
(0.5, 0.75)         moderate positive
(0.75, 1)           strong positive
1                   perfect positive
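The table can be turned into a small lookup function. The name `describe_r` is hypothetical. The table uses open intervals and does not say where the boundary values ±0.25, ±0.5 and ±0.75 belong. This sketch assumes each boundary goes into the stronger category.

```python
def describe_r(r):
    """Qualitative label for r, following the table of ranges.

    Assumption: boundary values (+/-0.25, +/-0.5, +/-0.75) are placed in the
    stronger category; the table itself leaves them unassigned.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie in [-1, 1]")
    size = abs(r)
    if size == 1:
        strength = "perfect"
    elif size >= 0.75:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    elif size >= 0.25:
        strength = "weak"
    else:
        return "no linear association"
    # r cannot be 0 here, so its sign gives the direction.
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


print(describe_r(0.9507))  # → strong positive
print(describe_r(-0.6))    # → moderate negative
```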

• Just because two variables are highly correlated does not mean that one necessarily causes the other. For one variable to cause changes in the other, it has to influence the other consistently. Ask yourself: would changing x bring about a change in y?
• For example, your fatigue during the summer months may be influenced by the day's warm temperature. However, feeling a lot of fatigue on some other day does not tell you what the temperature was that day: temperature can affect fatigue, but fatigue does not affect temperature.
• There are several ways to determine the value of r from a data
set.
• Remember that you should not calculate Pearson's correlation coefficient until you have determined that it is appropriate. First use a scatter plot to establish whether the data suggest a linear relationship. If they suggest a possible linear relationship, then calculating the value of r is appropriate.
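Checking the scatter plot first matters because r can miss a strong relationship that is not a straight line. In this sketch, y is completely determined by x through a parabola, yet r comes out as 0. The data are hypothetical, and `pearson_r` is the same deviation-based helper used for illustration throughout.

```python
import math


def pearson_r(x, y):
    """Pearson's r from deviations about the means (hypothetical helper)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: a perfect parabola y = x**2, not a straight line.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

print(pearson_r(x, y))  # → 0.0
```

A scatter plot of these points would show the curve immediately. The value r = 0 on its own would wrongly suggest there is no relationship at all.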
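• To compute the value of r, we can use either of the following formulas:

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}} \qquad \text{or} \qquad r = \frac{s_{xy}}{s_x s_y} $$

where $s_{xy}$ is the covariance of x and y, and $s_x$ and $s_y$ are the standard deviations of x and y. Rearranging the second formula gives

$$ s_{xy} = r \, s_x s_y $$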

Although the first formula is useful for finding the value of r by hand, the work is very tedious. A graphic display calculator (GDC) can produce the value of r with the push of a few buttons.
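Both formulas can be written out directly to confirm that they agree. The function names and the small data set are hypothetical. The sketch uses divisor n for the covariance and both standard deviations. Any divisor would work, provided all three use the same one, because it cancels in the ratio.

```python
import math


def r_by_sums(x, y):
    """First formula: sum of products of deviations divided by the square
    root of the product of the sums of squared deviations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = math.sqrt(sum((xi - x_bar) ** 2 for xi in x)
                    * sum((yi - y_bar) ** 2 for yi in y))
    return num / den


def r_by_covariance(x, y):
    """Second formula: r = s_xy / (s_x * s_y), all three using divisor n."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / n)
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / n)
    return s_xy / (s_x * s_y)


# Hypothetical data set used only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(r_by_sums(x, y), r_by_covariance(x, y))  # both about 0.7746
```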
• Example 1:
Find the Pearson correlation coefficient for the following set of
data.

x 2 3 4 6 8 9 10
y 21 19 18 17 15 13 12

First we input the data into the GDC and draw a scatter plot to
determine if it is appropriate to use Pearson's correlation
coefficient.

• Clear the home screen by pressing <2nd><mode>
• Press <2nd><0> to access the Catalog menu
• Locate DiagnosticOn
• Press <Enter><Enter>
• Press <Stat>, choose Calc, select LinReg(a+bx), enter L1,L2 and press <Enter>
The screen should now display the value of r.
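The quantities that LinReg(a+bx) reports (the intercept a, the slope b, r² and r) can be reproduced for the Example 1 data with a least-squares sketch. This is a plain re-implementation for checking answers and is not the calculator's own algorithm.

```python
import math

# Example 1 data, as entered in L1 and L2.
x = [2, 3, 4, 6, 8, 9, 10]
y = [21, 19, 18, 17, 15, 13, 12]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = sxy / sxx                    # slope of the least-squares line y = a + bx
a = y_bar - b * x_bar            # intercept
r = sxy / math.sqrt(sxx * syy)   # Pearson's correlation coefficient

print(f"a = {a:.4f}, b = {b:.4f}, r = {r:.4f}, r^2 = {r * r:.4f}")
# → a = 22.6355, b = -1.0345, r = -0.9870, r^2 = 0.9742
```

Using the table of qualitative descriptions, r ≈ –0.987 describes a strong negative linear relationship.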

There is also an r² on the calculator screen. We need to look at what this r² represents.
• r² is called the coefficient of determination.

$$ r^2 = \frac{\text{explained variation}}{\text{total variation}} $$

• r² is a proportion, whereas r is the square root of a proportion (taken with the sign that shows the direction of the association).
• r² is more informative when interpreting the magnitude of the relationship between two variables, regardless of direction.
• For two linearly related variables, r² gives the proportion of the variation in one variable that can be explained by the variation in the other variable.
• For example, r² = 0.947 (94.7%) means that approximately 95% of the variation in y can be explained by the variation in x. The higher this value, the better a straight line describes the data.
• For example, if r = 0.6, then r² = 0.36, so only 36% of the variation in one variable is explained by the variation in the other. A correlation of r = 0.6 is therefore weaker than it might first appear.
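The ratio "explained variation / total variation" can be computed directly and compared with r². This sketch uses the Example 1 data. Total variation is taken as the sum of squared deviations of y from its mean. Explained variation is the same sum for the values predicted by the least-squares line.

```python
import math

x = [2, 3, 4, 6, 8, 9, 10]
y = [21, 19, 18, 17, 15, 13, 12]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = sxy / sxx
a = y_bar - b * x_bar
y_hat = [a + b * xi for xi in x]          # values predicted by the line

total = sum((yi - y_bar) ** 2 for yi in y)           # total variation in y
explained = sum((yh - y_bar) ** 2 for yh in y_hat)   # variation explained by the line

r = sxy / math.sqrt(sxx * total)
print(explained / total, r ** 2)
```

Both printed values are about 0.974, so the line accounts for roughly 97% of the variation in y for this data set.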
• Example 2:
The following table gives the final exam scores and the final averages for 10 students in a biology class.

Final exam score    Final average
66                  73
81                  86
75                  68
62                  71
93                  95
82                  89
51                  48
54                  63
50                  51
90                  94
Enter the data into L1 and L2 in the GDC. When we plot the
scatter diagram on the GDC, we find that it does appear
reasonable to assume that a linear relationship exists.
Therefore, calculating the value of r is appropriate.

Remember, if there does not appear to be a linear relationship, then calculating the value of r does not make sense.

Next, we need to find r and r².

Remember that to find r and r²:
• Clear the home screen
• Go to the Catalog menu
• Turn on Diagnostics (DiagnosticOn)
• Press <Stat>, choose Calc, and select LinReg(a+bx)
• Press <Enter>

We see that r = 0.9507. This indicates a very strong positive linear relationship between the final exam grade and the final average grade.

Also, r² = 0.9037953898 tells us that about 90% of the variation in the final average grade can be accounted for by the variation in the final exam grade. That is, only about 10% of the variation is attributable to other factors.
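The Example 2 results can be cross-checked away from the calculator with a short sketch. It uses the same deviation-based formula and reproduces the values quoted above.

```python
import math

# Example 2: final exam scores (x) and final averages (y) for 10 students.
exam = [66, 81, 75, 62, 93, 82, 51, 54, 50, 90]
average = [73, 86, 68, 71, 95, 89, 48, 63, 51, 94]


def pearson_r(x, y):
    """Pearson's r from deviations about the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(exam, average)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")  # → r = 0.9507, r^2 = 0.9038
```

From Python 3.10 onward, the standard library's `statistics.correlation(x, y)` also returns Pearson's r and can be used as an independent check.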
• In the next example, the covariance $s_{xy}$ is given. We are going to use the formula below to calculate the correlation coefficient r, and then compare this value with the r found using the GDC.

$$ r = \frac{s_{xy}}{s_x s_y} \qquad s_{xy} = r \, s_x s_y $$
• Example 3:
The following table represents the final averages of ten students in math and science. Given that the covariance is $s_{xy} = 465.66$, calculate the product-moment correlation coefficient, r, correct to 2 decimal places.

First we need to make sure that there appears to be a linear relationship between math and science grades. We enter the grades in the GDC and use a scatter diagram to decide this.

Then, find the standard deviations for x and y. Use <Stat>, Calc, and then select 2-Var Stats L1,L2.

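WARNING!
We want $s_x$ and $s_y$, but on the calculator we use σx and σy, not Sx and Sy. The calculator's σx and σy use divisor n, while its Sx and Sy use divisor n − 1. The formula r = $s_{xy}$/($s_x s_y$) gives the correct r only when the standard deviations are computed with the same divisor as the covariance.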
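This sketch shows why the WARNING matters. It uses the Example 2 scores, chosen only for illustration. The helper `cov_and_sd` is hypothetical. It computes the covariance and both standard deviations with a chosen divisor: n, which matches the calculator's σx and σy, or n − 1, which matches Sx and Sy.

```python
import math

exam = [66, 81, 75, 62, 93, 82, 51, 54, 50, 90]
average = [73, 86, 68, 71, 95, 89, 48, 63, 51, 94]


def cov_and_sd(x, y, divisor):
    """Covariance and standard deviations of x and y using the given divisor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / divisor
    s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / divisor)
    s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / divisor)
    return s_xy, s_x, s_y


n = len(exam)
cov_n, sigma_x, sigma_y = cov_and_sd(exam, average, n)   # like σx, σy
_, big_sx, big_sy = cov_and_sd(exam, average, n - 1)     # like Sx, Sy

r_consistent = cov_n / (sigma_x * sigma_y)   # same divisor throughout
r_mixed = cov_n / (big_sx * big_sy)          # divisor n on top, n - 1 below

print(round(r_consistent, 4), round(r_mixed, 4))  # → 0.9507 0.8556
```

Mixing the divisors shrinks r by a factor of (n − 1)/n, which is 0.9 for ten students. This is large enough to change an answer given to 2 decimal places.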

σx ≈ ____________

σy ≈ ____________

$$ r = \frac{s_{xy}}{s_x s_y} = \underline{\qquad\qquad\qquad\qquad} \approx \underline{\qquad\quad} $$

Now, use the GDC to check this value of r. Use Stat, Calc, and
LinReg.
Comments about the findings:

• There is strong evidence from a scatter diagram that a positive linear relationship exists between the final averages for math and science.
• The value of the correlation coefficient is _______________.
• This value of r indicates that there is a strong correlation
between math and science final averages. This means that
those students who do well in mathematics are likely to also
do well in science.
• The value of r² is _______________, which indicates that _______________% of the variation in the science averages can be attributed to the variation in the math averages.
