
Correlation Qmt-Students - 13 May 2022


CHAPTER 3: PEARSON PRODUCT-MOMENT CORRELATION COEFFICIENT

Research Problem:

What is the relationship between two variables?

- Relationship between hours studying (X) and grades on a midterm (Y)?
- Relationship between self-esteem (X) and depression (Y)?

How can we explore the relationship between two quantitative variables?

- Graphically: we can construct a scatter plot.
- Numerically: we can calculate a correlation coefficient and a regression equation.

Correlation = the direction (+/-) and strength of the relationship between two variables, summarized by a coefficient that ranges from -1 to +1.

EXAMPLE:

What is the relationship between hours studying (X) and scores on a test (Y)?

STUDENT    HOURS    SCORE (QMT181)
Nazmi        1          1
Yasmin       1          3
Sheikh       3          2
Syuhada      4          5
Niza         6          4
Sajat        7          5

*** You can use a Pearson Correlation to examine a linear relationship
between two interval/ratio variables. Before calculating the Pearson correlation
coefficient, it is a good idea to create a scatterplot.

Scatter Plot/Diagram:
A scatter plot (scatter diagram) is a graph in which each data pair (x, y) is plotted as an individual point on a grid, with x on the horizontal axis and y on the vertical axis. Inspecting the scatter diagram shows whether there may be a linear relationship between the x and y values.

[Figure: scatter plot of Score (vertical axis) against Hours Studying (horizontal axis, 0 to 7) for the example data]
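A scatter diagram can also be sketched in plain text. The Python sketch below plots the hours/score example this way. `text_scatter` is a hypothetical helper written for illustration (not a library function), and it assumes small, non-negative whole-number data such as this table.

```python
def text_scatter(xs, ys):
    """Draw a plain-text scatter diagram: '*' marks a data point, '.' an empty cell.

    Assumes small, non-negative integer data (single-digit x values keep the
    columns aligned), as in the hours/score example.
    """
    points = set(zip(xs, ys))
    rows = []
    # Print from the largest y down to 0 so that y increases upward.
    for y in range(max(ys), -1, -1):
        cells = ["*" if (x, y) in points else "." for x in range(max(xs) + 1)]
        rows.append(f"{y:>2} | " + " ".join(cells))
    # Horizontal axis line and its tick labels.
    rows.append("     " + "-" * (2 * max(xs) + 1))
    rows.append("     " + " ".join(str(x) for x in range(max(xs) + 1)))
    return rows


hours = [1, 1, 3, 4, 6, 7]   # X: hours studying
scores = [1, 3, 2, 5, 4, 5]  # Y: test score
for line in text_scatter(hours, scores):
    print(line)
```

Reading the stars from left to right, they tend to drift upward, which suggests (but does not prove) a positive linear relationship.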
Characteristics of a Relationship:

Direction

Positive: As X goes up, Y goes up; the variables “move” in the same direction.

Negative: As X goes up, Y goes down; the variables “move” in opposite directions.

[Figure: two scatter plots. Positive: Income against Education. Negative: Rainfall and Hours Spent Outdoors.]
Form of the Relationship

(a) Linear    (b) Non-linear (“curvilinear”)

[Figure: (a) Weight against Height, showing a linear pattern; (b) Performance against Arousal, showing a curvilinear pattern]

Degree/Strength of Relationship

How well do the data fit a specific form? Typically, we look at how well the data fit a straight line.

Pearson Correlation Coefficient:

- Symbol: r
- r can range from -1.0 to +1.0
- Sign (+/-) indicates “direction”
- Value indicates “strength”
- Measures a “linear” relationship only.

Computational Formula:

Pearson Correlation Coefficient,
$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}} $$
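To make the formula concrete, here is a minimal Python sketch that evaluates it term by term. The function name `pearson_r` is illustrative (not from any library), and the demonstration uses the hours/score example data from earlier in this chapter.

```python
import math


def pearson_r(xs, ys):
    """Pearson r via the computational formula:
    r = (n*Sxy - Sx*Sy) / sqrt([n*Sxx - Sx^2] * [n*Syy - Sy^2]),
    where Sx = sum of x, Sxx = sum of x^2, Sxy = sum of x*y, and so on.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when all x values (or all y values) are equal")
    return numerator / denominator


hours = [1, 1, 3, 4, 6, 7]   # X: hours studying
scores = [1, 3, 2, 5, 4, 5]  # Y: test score
print(round(pearson_r(hours, scores), 3))  # 0.766
```

Here n = 6, Σx = 22, Σy = 20, Σx² = 112, Σy² = 80 and Σxy = 89, so r = 94 / √(188 × 80) ≈ 0.766, a fairly strong positive linear relationship.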

(a) Direction of relationship between x, y

Positive (+r) = As X goes up, Y goes up


Negative (-r) = As X goes up, Y goes down

(b) Strength of a relationship between X, Y

Closer to ±1.0, stronger

Closer to 0, weaker

When r = 0, there is no linear relationship: the X, Y relationship is not described by a straight line.

Pearson Correlation Coefficient

-1.0 = perfect negative relationship
0 = no linear relationship
+1.0 = perfect positive relationship

Example: r = +0.9 indicates a strong positive correlation between the x and y variables.

Example: r = -0.2 indicates a weak negative correlation between the x and y variables.

- Closer to 0 = weaker
- Closer to ±1.0 = stronger
- r ≈ 0 could mean many things:
  - No relationship at all between X and Y
  - A non-linear relationship between X and Y
  - An outlier may be distorting the coefficient
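The non-linear case in this list can be demonstrated with a small, made-up data set (not from these notes) in which y = x² exactly. Y is completely determined by X, yet r comes out as 0 because the pattern is curved rather than straight. This sketch uses the deviation form of r, r = Σ(x - x̄)(y - ȳ) / √[Σ(x - x̄)² Σ(y - ȳ)²], which is algebraically equivalent to the computational formula; `pearson_r` is an illustrative name.

```python
import math


def pearson_r(xs, ys):
    """Pearson r computed from deviations about the means."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical illustration data: a perfect but curved (U-shaped) relationship.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]  # y = x squared -> [4, 1, 0, 1, 4]
print(pearson_r(xs, ys))  # 0.0
```

A scatter plot of these points would immediately show the U shape, which is why plotting before computing r is recommended.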

EXAMPLE 1
Computing the Pearson r:

STUDENT     HOURS (X)   SCORE (Y)   X²    Y²    XY
Darwisyah   1           1
Nabila      1           3
Aidil       3           2
Khalif      4           5
Luqman      6           4
Asyraf      7           5
Nurhaliza   8           7
Masidayu    8           8
            Σx =        Σy =        Σx² = Σy² = Σxy =

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}} $$

r = ______________

Conclusion: There is a _____________ relationship/correlation between the number of hours studying and scores on the test.
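Once the X², Y² and XY columns have been filled in by hand, the totals and r can be checked with a short Python sketch like this one. The variable names are illustrative only; the data are the Example 1 table.

```python
import math

# Example 1 data from the worksheet.
students = ["Darwisyah", "Nabila", "Aidil", "Khalif",
            "Luqman", "Asyraf", "Nurhaliza", "Masidayu"]
hours = [1, 1, 3, 4, 6, 7, 8, 8]    # X
scores = [1, 3, 2, 5, 4, 5, 7, 8]   # Y

# Fill in the X^2, Y^2 and XY columns row by row.
print(f"{'STUDENT':<10} {'X':>2} {'Y':>2} {'X2':>3} {'Y2':>3} {'XY':>3}")
for name, x, y in zip(students, hours, scores):
    print(f"{name:<10} {x:>2} {y:>2} {x * x:>3} {y * y:>3} {x * y:>3}")

# Column totals needed by the computational formula.
n = len(hours)
sum_x, sum_y = sum(hours), sum(scores)
sum_x2 = sum(x * x for x in hours)
sum_y2 = sum(y * y for y in scores)
sum_xy = sum(x * y for x, y in zip(hours, scores))
print("totals:", sum_x, sum_y, sum_x2, sum_y2, sum_xy)  # totals: 38 35 240 193 209

r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
print(f"r = {r:.3f}")  # r = 0.878
```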

Example 2:

Let’s say we want to examine the relationship between years of education and salary. We might expect a positive linear relationship between these variables (salary is positively correlated with education). In other words, we would expect that people with more education tend to have higher salaries.

Data for 10 participants:

Years of education   Salary (RM)
16                   38000
14                   40000
12                   40000
14                   32000
20                   45000
22                   50000
12                   28000
12                   44000
16                   40000
16                   38000

a) Draw a Scatter Diagram to show the association, if any, between these two variables; can you draw any
conclusion/observation without doing any calculation?
b) Calculate the Pearson product-moment correlation coefficient for the above data. What is the direction and strength
of the relationship?

Regression – line of best fit
Sometimes it makes sense to find the Regression Line / Prediction Line / Line of Best Fit / Least-Squares Line / Least-Squares Regression Line. (They are all the same thing.) This is the line through the scatter plot that minimizes the sum of the squared vertical distances between the points and the line.

- Simple regression is used to examine the relationship between one dependent and one independent variable. After performing
an analysis, the regression statistics can be used to predict the dependent variable when the independent variable is known.
- The regression line (known as the least squares line) is a plot of the expected value of the dependent variable for all values of
the independent variable.
- The regression line is the one that best fits the data on a scatterplot.
- Using the regression equation, the dependent variable may be predicted from the independent variable. The slope of the regression line (b) is defined as the rise divided by the run. The y intercept (a) is the value at which the regression line crosses the y axis, that is, the predicted y when x = 0. The slope and y intercept are incorporated into the regression equation. The intercept is usually called the constant, and the slope is referred to as the coefficient.
- In the regression equation, y is always the dependent variable and x is always the independent variable.

The following equations give the line of best fit for sample data:

$$ y = a + bx $$

where

$$ a = \frac{\sum y}{n} - b\,\frac{\sum x}{n}, \qquad b = \frac{n\sum xy - \sum x \sum y}{n\sum x^{2} - \left(\sum x\right)^{2}} $$
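As a minimal sketch, the slope and intercept formulas can be coded directly in Python. `least_squares_line` is a hypothetical helper name, and the data are the hours/score example from the start of the chapter.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + bx."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Slope: b = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = Sy/n - b*Sx/n, written over a common denominator
    a = (sum_y - b * sum_x) / n
    return a, b


hours = [1, 1, 3, 4, 6, 7]   # X: hours studying (independent)
scores = [1, 3, 2, 5, 4, 5]  # Y: test score (dependent)
a, b = least_squares_line(hours, scores)
print(f"y = {a:.2f} + {b:.2f}x")   # y = 1.50 + 0.50x
print(f"{a + b * 5:.2f}")          # predicted score for 5 hours: 4.00
```

With this fit, a student who studies 5 hours has a predicted score of 1.5 + 0.5(5) = 4. Predictions are only sensible within the range of the observed x values (here 1 to 7 hours).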

Coefficient of Determination (r²)
- the overall magnitude of the relationship between two variables.
- the proportion of variation in a dependent variable that is explained by the independent variable.
- the Pearson Correlation Coefficient squared.
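The link between r² and the proportion of variation explained can be checked numerically. The sketch below (hypothetical helper name, hours/score example data) computes r² and also 1 - SSE/SST, where SST is the total variation of y about its mean and SSE is the variation left over around the least-squares line. For a least-squares fit the two agree.

```python
def r_squared_two_ways(xs, ys):
    """Return (r**2, 1 - SSE/SST) for the least-squares line; the two should agree."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)  # SST: total variation in y
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    r = sxy / (sxx * syy) ** 0.5
    b = sxy / sxx        # least-squares slope (same value as the formula for b)
    a = my - b * mx      # least-squares intercept
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    return r ** 2, 1 - sse / syy


r2, explained = r_squared_two_ways([1, 1, 3, 4, 6, 7], [1, 3, 2, 5, 4, 5])
print(f"r^2 = {r2:.4f}, 1 - SSE/SST = {explained:.4f}")
```

Both print as 0.5875, so roughly 59% of the variation in scores in this small sample is explained by hours studied.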

Example 3
A veterinary science study was conducted on the weight of ponies. The question posed was “How much should a healthy pony weigh?” The following data were observed and used to develop a correlation for the situation. It was then desired to construct a line of best fit for the data.

X= _____________________ = independent variable


Y= _____________________ = dependent variable

Age of the pony (months)   Weight of the pony (kg)
3                          60
6                          95
12                         140
18                         170
24                         185

a) Draw a Scatter Diagram to show the association between the two variables.
b) Calculate the Pearson product-moment correlation coefficient for the above data. What is the direction and strength of the
relationship?
c) Find the regression equation.
