
Module 12. Linear Corr & Reg Analysis

Module 12 covers regression and correlation analysis, focusing on establishing relationships between variables. Students will learn to construct regression lines, formulate predicting equations, and perform linear regression and correlation analysis. Key concepts include scatter diagrams, least squares estimation, and the Pearson and Spearman correlation coefficients.

Uploaded by

Carl John
Copyright
© © All Rights Reserved

CBMEC 107 - Business Statistics Module 12. Regression and Correlation Analysis

Module 12. REGRESSION AND CORRELATION ANALYSIS

Overview: In this module we shall establish the association or relationship between
variables through the study of regression and correlation analysis.
Learning Outcomes: At the end of this module, students should be able to:
• Construct a regression line
• Formulate a predicting equation using least squares estimation
• Perform at least simple linear regression and correlation analysis
Indicative Contents:
• Regression Line
• Scatter Diagram
• Least Squares Estimation
• Correlation
• Coefficient of Correlation
• Interpretation of the Coefficient of Correlation
• Pearson’s Product-Moment Coefficient of Correlation
• Spearman’s Rank-Order Correlation


Regression Analysis
Regression analysis is a statistical technique used for determining the probable form of the
relationship between variables. The ultimate objective when using this method of analysis is usually to
predict or estimate the value of one variable corresponding to a given value of another variable.

Simple regression analysis is a form of linear relationship that uses only one independent variable X
to predict the dependent variable Y. Objective: To find the possible relationship between two
variables X and Y, where X and Y are paired variables.

Regression Line
We assume that the value of X is known in advance and that the value assumed by Y depends in part on
the particular value of X under consideration. Y is called the dependent or response variable; the
variable X, whose value is used to help predict the behavior of Y, is called the independent variable,
predictor variable, or regressor.
In the simple linear regression model

y = α + βx

α denotes the intercept and β the slope of the regression line.
Scatter Diagram

Scatter diagrams (also called scatter plots, scatter graphs, and correlation charts) are similar to line
graphs. A line graph uses a line on an X-Y axis to plot a continuous function, while a scatter plot uses
dots to represent individual pieces of data.
In statistics, these plots are useful to see if two variables are related to each other. For example, a
scatter chart can suggest a linear relationship (i.e. a straight line).

Scatter diagrams are useful for determining the relationship between two variables. This relationship
can be between two causes, or between a cause and an effect. It can be positive, negative, or absent
altogether. We draw the graph with two variables: the first variable is independent, and the second
variable depends on the first (Figure 1).

Figure 1. Scatter Diagram


Types of Scatter Diagram
a) Scatter Diagram with No Correlation
o This diagram is also known as “Scatter Diagram with Zero Degree of Correlation”.
o Here, the data point spread is so random that you cannot draw a line through them.
o Therefore, you can say that these variables have no correlation.

b) Scatter Diagram with Moderate Correlation


o This diagram is also known as “Scatter Diagram with a Low Degree of Correlation”.
o Here, the data points are a little closer and you can see that some kind of relationship exists between
these variables.

c) Scatter Diagram with Strong Correlation


o This diagram is also known as “Scatter Diagram with a High Degree of Correlation”.
o In this diagram, data points are close to each other and you can draw a line by following their pattern.
o In this case, you say that these variables are closely related.

a) No Correlation b) Moderate Correlation c) Strong Correlation

Figure 2. Types of Scatter Diagram


Example 1
The table below summarizes the amount of solvent compound used and the drying time.

Amount (ml)         17  11   10  18  20    5  22  14  17  25  22   8  11  18  21
Time (in minutes)   56  50  120  70  80  120  30  45  55  60  64  56  76  48  92

Solution:
Draw the scatter diagram (Figure 3)

Figure 3. Scatter diagram of hypothetical data on the amount of solvent compound and the drying time.

Least Squares Estimation

The parameters α and β are estimated by the method of least squares. From the many straight lines
that can be drawn through a scatter diagram, we choose the one that “best fits” the data. The estimated
regression line takes the form

$$ y = a + bx $$

If we let $e_i$ denote the vertical distance from a point $(x_i, y_i)$ to the estimated regression line,
then each data point satisfies the equation

$$ y_i = a + b x_i + e_i $$

The term $e_i$ is called the residual. Figure 4 illustrates this idea.

Figure 4. Five data points (x1, y1), ..., (x5, y5) and their residuals e1, ..., e5 about the estimated
regression line. The least-squares procedure minimizes the sum of the squares of the residuals.
The residual for a data point that lies above the estimated regression line is positive; for a
point that lies below the regression line, the residual is negative. When the residuals about the
least-squares line are summed, the negative and positive values cancel one another and the sum is
always zero.
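
Why the residuals cancel can be seen from a short derivation. This is a standard sketch rather than part of the module itself, and the symbol S for the sum of squared residuals is introduced here only for illustration.

```latex
% Sum of squared residuals as a function of the intercept a and slope b
S(a, b) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2

% Setting both partial derivatives to zero gives the minimum
\frac{\partial S}{\partial a} = -2 \sum_{i=1}^{n} (y_i - a - b x_i) = 0
\quad\Longrightarrow\quad \sum_{i=1}^{n} e_i = 0

\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0

% Rearranged, these are the two normal equations
\sum y_i = n a + b \sum x_i
\qquad
\sum x_i y_i = a \sum x_i + b \sum x_i^2
```

The first condition is exactly the statement that the residuals sum to zero, and the two normal equations are linear in the two unknowns a and b.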

The estimates for α and β can be solved easily by

$$ a = \bar{y} - b\bar{x} $$

$$ b = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2} $$
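
The least-squares estimates can also be computed directly from raw data. The following is a minimal sketch, assuming the Example 1 data on solvent amount and drying time; the function name `fit_line` and the variable names are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: b = (n*Sxy - Sx*Sy) / (n*Sx2 - Sx^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = y-bar - b * x-bar
    a = sum_y / n - b * sum_x / n
    return a, b


# Example 1 data: amount of solvent compound (ml) and drying time (minutes)
amount = [17, 11, 10, 18, 20, 5, 22, 14, 17, 25, 22, 8, 11, 18, 21]
time = [56, 50, 120, 70, 80, 120, 30, 45, 55, 60, 64, 56, 76, 48, 92]

a, b = fit_line(amount, time)

# Residual e_i = observed y_i minus the value on the fitted line
residuals = [y - (a + b * x) for x, y in zip(amount, time)]

print(f"a = {a:.2f}, b = {b:.2f}")
print(f"sum of residuals = {sum(residuals):.6f}")
```

The printed sum of residuals should be zero up to floating-point rounding, which is the cancellation property of the least-squares line.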

The graphs of the regression line are shown below, with relative positive, negative, and zero
slopes.

Positive slope (b > 0)        Negative slope (b < 0)        Zero slope (b = 0)

Figure 5. Relative positive, negative, and zero slopes for a regression line.

Example 2
The relationship between energy consumption and household income was studied, yielding the
following data on household income X (in units of P1,000/month) and energy consumption Y (in
kilowatt-hours/month). The data are shown in Table 1.

Table 1

Household Household Income (x) Energy Consumption (y)

1 70 200
2 85 175
3 45 100
4 56 120
5 60 80
6 100 350
7 93 255
8 81 400
9 48 70
10 115 450
11 90 320
12 57 125

Summary statistics for these data are:


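$$ n = 12 $$

$$ \sum x = 900, \quad \sum x^2 = 72{,}974, \quad \bar{x} = 75, \quad \sigma_x = 21.36 $$

$$ \sum y = 2{,}645, \quad \sum y^2 = 774{,}375, \quad \bar{y} = 220.42, \quad \sigma_y = 126.28 $$

$$ \sum xy = 227{,}045 $$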

To estimate the simple linear regression line, we estimate the slope b and the intercept a. These
estimates are

$$ b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2} = \frac{(12)(227{,}045) - (900)(2{,}645)}{(12)(72{,}974) - (900)^2} = 5.24 $$

$$ a = \bar{y} - b\bar{x} = 220.42 - 5.24(75) = -172.58 $$

Hence the estimated regression equation is

$$ y = -172.58 + 5.24x $$

The graph of this equation is shown in Figure 6.

Figure 6. A graph of the estimated regression line of Y, the energy consumption, on X, the household
income (horizontal axis: Income).

To predict the energy consumption when the household income is P200,000 per month, we
substitute the value 200 for x in the equation

$$ y = -172.58 + 5.24x $$

to obtain y = -172.58 + 5.24(200) = 875.42 kilowatt-hours per month.
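
The Example 2 arithmetic can be checked from the summary statistics alone. The sketch below is illustrative, with hypothetical variable names; it follows the hand computation by rounding the slope and the mean of y to two decimals before computing the intercept.

```python
# Summary statistics from Table 1 (household income x, energy consumption y)
n = 12
sum_x, sum_y = 900, 2645
sum_xy, sum_x2 = 227045, 72974

# Slope from the least-squares formula, kept at full precision
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Hand-computation style: round b and y-bar to two decimals first
b_rounded = round(b, 2)
a_rounded = round(sum_y / n, 2) - b_rounded * (sum_x / n)

# Intercept when b is not rounded beforehand
a_exact = sum_y / n - b * (sum_x / n)

# Predicted consumption for an income of 200 (thousand pesos per month)
prediction = a_rounded + b_rounded * 200

print(f"b = {b:.4f} (rounded: {b_rounded:.2f})")
print(f"a = {a_rounded:.2f} with rounded b, {a_exact:.2f} with unrounded b")
print(f"predicted y at x = 200: {prediction:.2f}")
```

With the rounded coefficients the prediction evaluates to -172.58 + 1048 = 875.42. Keeping b unrounded shifts the intercept to about -172.39, so small differences in the last digits are rounding effects rather than errors.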

Correlation
It is desirable to observe and measure the association which occurs between two statistical series. For
example, it is desirable to know whether there is a relationship between changes in the cost of living
and changes in wages; between the grades on an examination and the intelligence quotient of a group of
students; between the academic material retained in memory and the time elapsed since it was learned;
and in many other similar sets of associated data.

The relationship between two sets of data may be established and measured by means of the correlation
method.

Coefficient of Correlation

The coefficient of correlation is used as the comparative measure of association. Its magnitude ranges
from 0 to 1.00, so the coefficient itself lies between -1.00 and +1.00. A value of 1 or -1 indicates a
perfect positive or negative linear relationship, respectively. A value of 0 indicates no linear
relationship. When this happens, we say that X and Y are uncorrelated.

The coefficient of correlation will be positive or negative: positive when the compared variables are
directly proportional, i.e., as one variable increases the other variable also increases, or as one variable
decreases the other variable also decreases; negative when the compared variables are inversely
proportional, that is to say, an increase in the value of X results in a corresponding decrease in the
value of Y. Under these circumstances the line of regression slopes downward.

Figure 7. Perfect Positive Correlation
Figure 8. Perfect Negative Correlation
Figure 9. Uncorrelated: points indicate a relationship between x and y, but the relationship is not linear
Figure 10. Uncorrelated: points are randomly scattered

Interpretation of the Coefficient of Correlation

Table 2

Coefficient of Correlation     Interpretation

0 to ±0.20                     Negligible relationship
±0.21 to ±0.40                 Slight relationship
±0.41 to ±0.70                 Moderate relationship
±0.71 to ±0.90                 Marked or high relationship
±0.91 to ±1.00                 Very high to perfect relationship
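
The interpretation table can be expressed as a small lookup. This is a sketch only: the function name `interpret_correlation` is hypothetical, and assigning a value that falls between two listed bands (for example 0.405) to the lower band is an assumption, since the table does not say how such values are treated.

```python
def interpret_correlation(r):
    """Return the Table 2 label for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    size = abs(r)  # the table uses the magnitude; the sign gives the direction
    # Assumption: each upper bound belongs to the lower band
    if size <= 0.20:
        return "Negligible relationship"
    if size <= 0.40:
        return "Slight relationship"
    if size <= 0.70:
        return "Moderate relationship"
    if size <= 0.90:
        return "Marked or high relationship"
    return "Very high to perfect relationship"


for r in (0.24, -0.55, 0.95):
    print(r, "->", interpret_correlation(r))
```

The sign of r is reported separately from the label: a negative coefficient with the same magnitude has the same strength but the opposite direction.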


The Pearson’s Product-Moment Coefficient of Correlation

The Pearson’s Product-Moment coefficient of correlation measures the linear relationship of two
variables, defined by

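$$ r = \frac{\sigma_{xy}}{\sigma_x \sigma_y} $$

where $\sigma_{xy}$, the covariance of x and y, is

$$ \sigma_{xy} = \frac{\sum xy}{N} - \left(\frac{\sum x}{N}\right)\left(\frac{\sum y}{N}\right) $$

The standard deviation of x,

$$ \sigma_x = \sqrt{\frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2} $$

The standard deviation of y,

$$ \sigma_y = \sqrt{\frac{\sum y^2}{N} - \left(\frac{\sum y}{N}\right)^2} $$

Consolidating the formulas,

$$ r = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \frac{N \sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{N \sum x^2 - \left(\sum x\right)^2}\,\sqrt{N \sum y^2 - \left(\sum y\right)^2}} $$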

Example 3
Table 3

          x      y      xy      x²      y²
         36     21     756    1296     441
         42     18     756    1764     324
         37     15     555    1369     225
         31     11     341     961     121
         25     15     375     625     225
         28      9     252     784      81
         33     10     330    1089     100
         28     20     560     784     400
         42     16     672    1764     256
         39     11     429    1521     121
         38     21     798    1444     441
         40     14     560    1600     196
Total   419    181    6384   15001    2931
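Computation of the coefficient of correlation:

$$ \sigma_{xy} = \frac{\sum xy}{N} - \left(\frac{\sum x}{N}\right)\left(\frac{\sum y}{N}\right) = \frac{6384}{12} - \left(\frac{419}{12}\right)\left(\frac{181}{12}\right) = 532 - (34.92)(15.08) = 5.41 $$

The standard deviation of x,

$$ \sigma_x = \sqrt{\frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2} = \sqrt{\frac{15001}{12} - \left(\frac{419}{12}\right)^2} = \sqrt{1250.08 - 1219.41} = \sqrt{30.67} = 5.54 $$

The standard deviation of y,

$$ \sigma_y = \sqrt{\frac{\sum y^2}{N} - \left(\frac{\sum y}{N}\right)^2} = \sqrt{\frac{2931}{12} - \left(\frac{181}{12}\right)^2} = \sqrt{244.25 - 227.51} = \sqrt{16.74} = 4.09 $$

Then, the coefficient of correlation is

$$ r = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \frac{5.41}{(5.54)(4.09)} = \frac{5.41}{22.66} = 0.24 $$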

There is a slight positive relationship between the two variables.
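
The consolidated formula lends itself to a direct check of Example 3. The sketch below uses the x and y columns of Table 3; the function name `pearson_r` is hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation via the consolidated formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# x and y columns of Table 3
x = [36, 42, 37, 31, 25, 28, 33, 28, 42, 39, 38, 40]
y = [21, 18, 15, 11, 15, 9, 10, 20, 16, 11, 21, 14]

r = pearson_r(x, y)
print(f"r = {r:.4f}")
```

Carried at full precision the coefficient is about 0.235, slightly below the 0.24 obtained when intermediate values are rounded to two decimals. Either way the value falls in the "slight relationship" band.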

Spearman’s Rank-Order Correlation

The strength of the relationship between two ranked variables can be measured by Spearman’s
Rank-Order coefficient of correlation, defined by

$$ r = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} $$

where $d_i$ is the difference between the two ranks of the i-th item and n is the number of ranked pairs.
Example 4

The tables below show the scores and judge’s ranks of 8 contestants.

Table 4
Ranks and Scores of 8 Contestants

Contestant       1    2    3    4    5    6    7    8
Judge’s Rank     3    1    6    2    4    7    8    5
Score           26   40   52   25   20   60   37   48

Table 5
Ranks of the Data in Table 4 (the highest score receives rank 1)

Contestant           1    2    3    4    5    6    7    8
Judge’s Rank (xi)    3    1    6    2    4    7    8    5
Score’s Rank (yi)    6    4    2    7    8    1    5    3

Table 6
Differences and Squares of Differences for the Contestant Ranks

Contestant    xi    yi    di = xi - yi    di²
1              3     6        -3            9
2              1     4        -3            9
3              6     2         4           16
4              2     7        -5           25
5              4     8        -4           16
6              7     1         6           36
7              8     5         3            9
8              5     3         2            4
Total                                  ∑di² = 124

Substituting values into the formula for r, we have

$$ r = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6(124)}{8(64 - 1)} = 1 - \frac{744}{504} = 1 - 1.476 = -0.476 $$

There is a moderate inverse relationship between the judge’s ranks and the contestants’ scores.
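
The ranking and substitution steps of Example 4 can be sketched in code. The helper names `ranks_descending` and `spearman` are hypothetical, and the ranking helper assumes there are no tied scores, which holds for these eight contestants. The score ranks are derived from the raw scores rather than typed in, so each rank from 1 to 8 is used exactly once.

```python
def ranks_descending(values):
    """Rank values so that the largest gets rank 1 (assumes no ties)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman(rank_x, rank_y):
    """Spearman rank-order coefficient: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Judge's ranks and raw scores for contestants 1 to 8
judge_rank = [3, 1, 6, 2, 4, 7, 8, 5]
score = [26, 40, 52, 25, 20, 60, 37, 48]

score_rank = ranks_descending(score)
r_s = spearman(judge_rank, score_rank)

print("score ranks:", score_rank)
print(f"Spearman r = {r_s:.3f}")
```

A negative result means that contestants ranked well by the judge tend to have lower scores, and the reverse.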

SUGGESTED ENRICHMENT ACTIVITY
• Watch video clips on relevant topics on YouTube or elsewhere on the internet.



CBMEC 107: BUSINESS STATISTICS

EXERCISE NO. 12
REGRESSION AND CORRELATION ANALYSIS

Name: ________________________________ Date: __________________


Course and Year: _________________________ Rating: _________________

Direction: Answer the following problems on a separate answer sheet. Use black pen only.
PROBLEMS:

1. A student wants to determine how grades in College Algebra and in Statistics are related. From a
random sample of eight students she obtained the following scores.

Student Algebra, X Statistics, Y


1 85 84
2 67 68
3 52 60
4 60 65
5 91 82
6 91 87
7 60 62
8 75 77

Calculate the coefficient of correlation using the Pearson Product Moment coefficient of
correlation.

2. The values in X below are hours spent studying, and the values in Y are grades on a test.

X = {3.2, 3.0, 1.0, 2.5, 1.9, 1.6, 3.1, 3.5, 4.2, 3.0}
Y = { 90, 88, 57, 86, 79, 71, 84, 97, 90, 91}

The Pearson Product Moment Correlation Coefficient of the above data is


A. 0.706
B. 0.737
C. 0.803
D. 0.889


CBMEC 107: BUSINESS STATISTICS

EXERCISE NO. 12
REGRESSION AND CORRELATION ANALYSIS

Name: ________________________________ Date submitted __________________

SOLUTIONS TO PROBLEMS


REVIEW QUESTIONS
REGRESSION AND CORRELATION ANALYSIS

Multiple Choice Questions. Encircle the letter of the best answer


1. r² is known as
A. Coefficient of determination
B. Multiple correlation coefficient
C. Partial correlation coefficient
D. Semi-partial correlation coefficient

2. The sum of the residuals for all data points that lie in the regression is always
A. negative
B. positive
C. zero
D. none of the above

3. Correlation coefficients lie between
A. -2 and +2
B. -1 and +1
C. 0 and 1
D. -1 and 0

4. In the regression equation y = a + bx, x is called
A. regressor
B. predictor
C. independent variable
D. Any of these

5. The estimation of the linear regression line is dependent on
A. origin and intercept
B. slope and intercept
C. horizontal and vertical axes
D. point and slope

6. When the compared variables are inversely proportional, the line of regression
A. is horizontal
B. slopes downward
C. slopes upward
D. is vertical

7. The least squares method minimizes which of the following?
A. sum of residuals
B. sum of squared residuals
C. sum of squares error
D. total sum of squares

8. Spearman rank-order correlation is used for what type of data?
A. Nominal
B. Ordinal
C. Interval
D. Ratio

9. The lines of regression intersect at the point
A. (X, Y)
B. (x, y)
C. (0, 0)
D. (1, 1)

10. Regression coefficient is independent of
A. Origin
B. Scale
C. Both origin and scale
D. Neither origin nor scale

11. If the two lines of regression are perpendicular to each other, the correlation coefficient r is:
A. 0
B. -1
C. 1
D. Nothing can be said

12. Which of the following statements about outliers is not true?
A. Outliers are values very different from the rest of the data.
B. Influential cases will always show up as outliers.
C. Outliers have an effect on the mean.
D. Outliers have an effect on regression parameters.

13. What is b0 in regression analysis?
A. The value of the outcome when all the predictors are 0.
B. The relationship between a predictor and the outcome variable.
C. The value of the predictor variable when the outcome is zero.
D. The gradient of the regression line.

14. What is the degree of freedom used to calculate the test statistic t for a correlation test?
A. n (n - 1)
B. n - 1
C. n - 2
D. n - 3

15. If ρ = 0, the lines of regression are:
A. Coincident
B. Parallel
C. Perpendicular to each other
D. None of the above


16. The estimate of β in the regression equation Y = α + βX + e by the method of least squares is:
A. Biased
B. Unbiased
C. Consistent
D. Efficient

17. Which of the following is not one of the assumptions required for the t test for determining
whether the correlation is significant?
A. The data are interval or ratio level.
B. The variances are equal or σ1² = σ2²
C. The two variables are distributed as a bivariate normal distribution.
D. All three are required assumptions.

18. Which of the following is not true regarding the error term ε?
A. Individual values of the error term εi are statistically dependent on each other.
B. For a given value of x, there can exist many values of εi.
C. The distribution of possible εi values for any x value is normal.
D. The distribution of possible εi values has equal variance for all values of x.

19. An investigator reports that the arithmetic mean of two regression coefficients of a regression
line is 0.7 and the correlation coefficient is 0.75. The investigation results are:
A. Valid
B. Invalid
C. Inconclusive
D. None of these

20. Homogeneity of three or more population correlation coefficients can be tested by
A. t-test
B. Z-test
C. χ²-test
D. F-test

21. In multiple linear regression analysis, the square root of Mean Squared Error (MSE) is called the:
A. Multiple correlation coefficient
B. Standard error of estimate
C. Coefficient of determination
D. None of these

PROBLEMS
1. Calculate the test statistic t for a correlation hypothesis test when the sample correlation coefficient
is r = 0.889 and the sample size is n = 10.
A. 5.337
B. 5.491
C. 5.519
D. 5.664

2. Compute the slope of the regression equation based on these sample data.

X = {3.2, 3.0, 1.0, 2.5, 1.9, 1.6, 3.1, 3.5, 4.2, 3.0}
Y = { 90, 88, 57, 86, 79, 71, 84, 97, 90, 91}

A. 9.638
B. 10.144
C. 10.835
D. 11.169

3. Compute the y intercept of the regression equation based on these sample data.
X = {3.2, 3.0, 1.0, 2.5, 1.9, 1.6, 3.1, 3.5, 4.2, 3.0}
Y = { 90, 88, 57, 86, 79, 71, 84, 97, 90, 91}

A. 52.338
B. 54.045
C. 55.159
D. 56.779
