Note Seminar 4 - Topic 3 Correlation and Regression
3.1 CORRELATION
Correlation is a statistical method used to determine whether a relationship between variables
exists. The coefficient used to measure the strength of the relationship between two
variables is called the correlation coefficient. The sample correlation coefficient takes
values ranging from -1 to 1.
a) Positive correlation
Positive correlation shows the existence of a positive relationship between two variables, x and y.
The direction of change for both variables is the same. If x increases, then y would increase too.
Chapter 3 STA104 : Correlation and Simple Linear Regression DEC2016-APRIL2017
b) Negative correlation
Negative correlation shows the existence of a negative relationship between two variables, x
and y; that is, x and y change in opposite directions. If x increases, then y
would decrease.
c) No correlation
No correlation simply means that no relationship exists between two variables, x and y. We
cannot relate the changes that occur in x to the changes in y in any way.
Example 3.1
The table shows the finance and mathematics test scores of students in a faculty.
Finance (x)      39 12 21 64 57 47 28 75 34
Mathematics (y)  65 35 52 82 92 89 73 98 56
Plot a scatter diagram and determine whether there is a relationship between finance test scores
and mathematics test scores.
Example 3.2
Fuzi Company supplies prawns to restaurants. The demand for prawns depends on the price per
kg. The data are shown below.
Price per kg (RM)  20  22  24  26  28  30  32
Sales (kg)         600 550 480 450 400 330 250
Draw a scatter diagram and state the type of relationship between the price per kg and the sales.
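The direction of the relationship in Examples 3.1 and 3.2 can also be checked numerically, as a complement to reading the scatter diagram. The sketch below is an illustration only: the function name `direction` is hypothetical, and it looks solely at the sign of the sum of cross-deviations, so it says nothing about how strong the relationship is. A sum of exactly zero is reported as "none", which real data will rarely produce.

```python
def direction(x, y):
    """Return the direction of the linear relationship between x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations: positive when x and y tend to move together,
    # negative when they tend to move in opposite directions.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if s_xy > 0:
        return "positive"
    if s_xy < 0:
        return "negative"
    return "none"


# Data from Example 3.1 and Example 3.2
finance = [39, 12, 21, 64, 57, 47, 28, 75, 34]
mathematics = [65, 35, 52, 82, 92, 89, 73, 98, 56]
price = [20, 22, 24, 26, 28, 30, 32]
sales = [600, 550, 480, 450, 400, 330, 250]

print(direction(finance, mathematics))  # positive
print(direction(price, sales))          # negative
```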
Correlation Analysis is a statistical method used to measure the strength of the relationship
between two variables. The Coefficient of Correlation (r) is a measure of the strength of the
relationship between two variables. It requires interval or ratio-scaled data. The sample
correlation coefficient would take on values ranging from -1 to 1.
[Figure: scale of the correlation coefficient r, running from -1 through 0 to 1]
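$$r_{xy} = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}}$$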
Example 3.3
The table shows the finance and mathematics test scores of students in a faculty.
Finance (x)      39 12 21 64 57 47 28 75 34
Mathematics (y)  65 35 52 82 92 89 73 98 56
Calculate the Pearson’s Product Moment Correlation Coefficient and comment on the value
obtained.
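The computational formula for the Pearson coefficient can be sketched in Python as follows. This is a minimal illustration, not the official solution: the function name `pearson_r` and the variable names are assumptions, and the data are those of Example 3.3.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product moment correlation coefficient (computational formula)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    # Numerator: sum(xy) - sum(x) * sum(y) / n
    numerator = sum_xy - sum_x * sum_y / n
    # Denominator: square root of the product of the two corrected sums of squares
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


# Data from Example 3.3
finance = [39, 12, 21, 64, 57, 47, 28, 75, 34]
mathematics = [65, 35, 52, 82, 92, 89, 73, 98, 56]

r = pearson_r(finance, mathematics)
print(round(r, 4))  # approximately 0.8976
```

A value this close to 1 suggests a strong positive linear relationship between the two sets of scores.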
Example 3.4
Fuzi Company supplies prawns to restaurants. The demand for prawns depends on the price per
kg. The data are shown below.
Price per kg (RM)  20  22  24  26  28  30  32
Sales (kg)         600 550 480 450 400 330 250
Calculate the correlation coefficient. What can be said about the relationship?
Step 2: Calculate the difference, d, between the ranks for each pair of values.
d= rank x – rank y
Step 3: Determine the value of Spearman’s coefficient of rank correlation, r_s, as follows:
$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$
Example 3.5
The table shows the finance and mathematics test scores of students in a faculty.
Finance (x)      80 70 60 35 80 48 80
Mathematics (y)  50 60 80 40 75 60 90
Calculate the Spearman’s Coefficient of Rank Correlation and interpret its meaning.
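The ranking, difference and formula steps can be sketched in Python for the data of Example 3.5. Two assumptions are made here that the notes do not state explicitly: values are ranked from smallest (rank 1) to largest, and tied values share the average of the ranks they occupy. The function names `average_ranks` and `spearman_rs` are hypothetical.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) to largest; ties share the average rank."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1            # lowest position occupied by v
        count = ordered.count(v)                # how many values are tied with v
        ranks.append(first + (count - 1) / 2)   # average of the tied positions
    return ranks


def spearman_rs(x, y):
    """Spearman's coefficient of rank correlation: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    rank_x = average_ranks(x)
    rank_y = average_ranks(y)
    n = len(x)
    # d is the difference between the ranks of each pair of values
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Data from Example 3.5
finance = [80, 70, 60, 35, 80, 48, 80]
mathematics = [50, 60, 80, 40, 75, 60, 90]

print(average_ranks(finance))      # [6.0, 4.0, 3.0, 1.0, 6.0, 2.0, 6.0]
print(average_ranks(mathematics))  # [2.0, 3.5, 6.0, 1.0, 5.0, 3.5, 7.0]
rs = spearman_rs(finance, mathematics)
print(round(rs, 4))                # approximately 0.4732
```

Ranking both variables from largest to smallest instead would give the same r_s, as long as the same direction is used for x and y.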
Example 3.6
Two women rank eight flower bouquets in a competition as follows:
Flower     A B C D E F G H
1st woman  2 5 3 6 1 4 7 8
2nd woman  4 3 2 6 1 8 5 7
Calculate the Spearman’s Coefficient of Rank Correlation and interpret the meaning of the value
obtained.
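Because the data in Example 3.6 are already ranks with no ties, the formula can be applied directly. The working below is a sketch of the arithmetic, taking d as the first woman's rank minus the second woman's rank for bouquets A to H, with n = 8.

```latex
\sum d^2 = (-2)^2 + 2^2 + 1^2 + 0^2 + 0^2 + (-4)^2 + 2^2 + 1^2 = 30

r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}
    = 1 - \frac{6(30)}{8(8^2 - 1)}
    = 1 - \frac{180}{504}
    \approx 0.643
```

A value of about 0.643 suggests a moderate positive agreement between the two rankings.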
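Thus the regression equation y = a + bx can be calculated using the following formulas:
$$b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}}, \qquad a = \frac{\sum y}{n} - b\,\frac{\sum x}{n}$$
where n = number of observations.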
Interpretation:
Slope b
If the slope has a positive value, a positive relationship exists between the two
variables; if it has a negative value, a negative relationship exists.
A positive slope of b indicates that for every one-unit increase in the independent variable (x),
the dependent variable (y) would increase by b units.
A negative slope of -b indicates that for every one-unit increase in the independent variable (x),
the dependent variable (y) would decrease by b units.
Intercept a
When the independent variable (x) = 0, the dependent variable (y) would be a.
Example 3.7
The table gives the daily production quantity and the estimated production cost of Butik Ceria Sdn.
Bhd. The data were collected continuously over a period of 10 days. Using the least squares method,
find the regression line for cost against quantity. Interpret the meaning of the slope and intercept
of the regression line.
Day  Quantity ('000 units)  Cost (RM'000)
1    10                     20
2    13                     28
3    20                     38
4    16                     35
5    17                     32
6    15                     30
7    18                     31
8    14                     29
9    11                     23
10   12                     25
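The least squares formulas for the slope b and intercept a can be sketched in Python using the data of Example 3.7. This is an illustrative sketch rather than the official solution; the function name `least_squares` and the variable names are assumptions.

```python
def least_squares(x, y):
    """Return (a, b) for the least squares regression line y = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    # Slope: corrected sum of cross-products over corrected sum of squares of x
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Intercept: mean of y minus slope times mean of x
    a = sum_y / n - b * sum_x / n
    return a, b


# Data from Example 3.7: quantity in '000 units, cost in RM'000
quantity = [10, 13, 20, 16, 17, 15, 18, 14, 11, 12]
cost = [20, 28, 38, 35, 32, 30, 31, 29, 23, 25]

a, b = least_squares(quantity, cost)
print(f"cost = {a:.4f} + {b:.4f} * quantity")  # cost = 5.9675 + 1.5844 * quantity
```

Read in the units of the table, the slope suggests that each additional 1,000 units produced raises the estimated cost by about RM1,584, and the intercept suggests an estimated cost of about RM5,968 when nothing is produced.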
Example 3.8
The number of kilograms of steam used per month by a chemical plant is thought to be related to
the average ambient temperature (in °F) for that month. The past year’s usage and temperature
are shown in the following table. Using the least squares method, find the regression line for
temperature against steam usage. Interpret the meaning of the slope of the regression line.
The coefficient of determination (r²) is the proportion of the total variation in the
dependent variable (Y) that is explained or accounted for by the variation in the
independent variable (X).
It is the square of the coefficient of correlation.
It ranges from 0 to 1.
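The two descriptions of r² (proportion of variation explained, and square of the correlation coefficient) can be checked against each other numerically. The sketch below is an illustration under stated assumptions: it reuses the quantity and cost data of Example 3.7, the function names are hypothetical, and "explained variation" is computed as 1 minus the ratio of the residual sum of squares to the total sum of squares.

```python
from math import sqrt


def explained_proportion(x, y):
    """Proportion of the total variation in y explained by the fitted line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx              # least squares slope
    a = mean_y - b * mean_x      # least squares intercept
    total = sum((yi - mean_y) ** 2 for yi in y)                        # total variation
    unexplained = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # residual variation
    return 1 - unexplained / total


def correlation(x, y):
    """Pearson correlation coefficient from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)


# Data from Example 3.7: quantity in '000 units, cost in RM'000
quantity = [10, 13, 20, 16, 17, 15, 18, 14, 11, 12]
cost = [20, 28, 38, 35, 32, 30, 31, 29, 23, 25]

r2 = explained_proportion(quantity, cost)
r = correlation(quantity, cost)
print(round(r2, 4), round(r ** 2, 4))  # both approximately 0.8756
```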
Example 3.9
The correlation coefficient measuring the strength of the relationship between the marks in Additional
Mathematics and Physics for 9 students was found to equal 0.8976. Find the coefficient of
determination and interpret the value.
Solution 3.9
Coefficient of determination, r² = (0.8976)² = 0.8057
This means that 80.57% of the variation between the variables is explained by the model,
while the remaining 0.1943, or 19.43%, is caused by random errors. Therefore, the model built is quite
good.
Example 3.10
The correlation coefficient between the marks of 10 students in Malay and English Language
quizzes is 0.4909. Find the coefficient of determination and interpret the value.
Example 3.11
The table below shows the advertising expenditures and company sales for Exandra Company
over the past 12 weeks.