Regression
1.2 Regression Analysis
1.2.1 Introduction to Regression Analysis
The goal in regression analysis is to develop a statistical model that can be used
to predict the values of a dependent variable based on the values of at least one
independent variable.
Examples:
To describe the effect of income on expenditure.
To increase the export of rubber by controlling other factors such as price.
To predict the price of houses based on lot size and location.
The simple linear regression (SLR) model is a basic regression model with exactly one
independent variable and one response (dependent) variable.
Modeling
Simple Linear Regression Model: $\hat{Y} = a + bX$
Multiple Linear Regression Model: $\hat{Y} = a + bX_1 + cX_2$
The larger the value of r², the better the model fits the data. However, r² should
not be the only criterion for judging how well the model fits the data; other
measures, for example the adjusted R², should also be considered.
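Coefficient of Determination
$$r^2 = (r)^2 = \frac{S_{xy}^2}{S_{xx}\,S_{yy}} = \frac{\left(\sum xy - \frac{\sum x \sum y}{n}\right)^2}{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}$$
0 ≤ r² ≤ 1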
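The coefficient of determination can be computed directly from the sums S_xy, S_xx and S_yy. The following is a minimal sketch; the function name `coefficient_of_determination` and the small demo data set are illustrative assumptions and do not come from the notes.

```python
def coefficient_of_determination(x, y):
    """Return r^2 = S_xy^2 / (S_xx * S_yy) for paired observations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    # Corrected sums of products and squares, as in the formula above
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    s_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    s_yy = sum(yi * yi for yi in y) - sum_y ** 2 / n
    return s_xy ** 2 / (s_xx * s_yy)


# Hypothetical illustration data (not from the notes)
x_demo = [1, 2, 3, 4, 5]
y_demo = [2, 4, 5, 4, 5]
r2_demo = coefficient_of_determination(x_demo, y_demo)
print(round(r2_demo, 4))
```

The sketch does not guard against S_xx or S_yy being zero (all x or all y values identical), in which case r² is undefined.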
E(Y) = 25 + 0.5X
[Figure: regression line of Final Marks against Test Marks, crossing the Y-axis at β0 = 25]
β0 = 25 (Y-intercept)
β1 = 0.5 (slope)
For every 1-mark increase in Test Marks, the average Final Marks increase by 0.5.
Equivalently, for every 10-mark increase in Test Marks, the average Final Marks
increase by 5.
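The slope interpretation can be checked numerically. This short sketch evaluates E(Y) = 25 + 0.5X at a few Test Marks values; the function name `expected_final_marks` and the starting value of 40 marks are assumptions chosen only for illustration.

```python
BETA_0 = 25.0  # Y-intercept from the notes
BETA_1 = 0.5   # slope from the notes


def expected_final_marks(test_marks):
    """Average Final Marks predicted by E(Y) = 25 + 0.5X."""
    return BETA_0 + BETA_1 * test_marks


# Hypothetical starting point of 40 Test Marks
increase_per_1 = expected_final_marks(41) - expected_final_marks(40)
increase_per_10 = expected_final_marks(50) - expected_final_marks(40)
print(increase_per_1, increase_per_10)  # 0.5 5.0
```

Because the model is linear, the same differences are obtained from any starting value of X.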
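$\hat{Y} = a + bX$
$$b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} \quad \text{OR} \quad b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$
$$a = \bar{y} - b\bar{x} \quad \text{OR} \quad a = \frac{\sum y}{n} - b\,\frac{\sum x}{n}$$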
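The least squares formulas for b and a translate almost line by line into code. The following is a sketch under stated assumptions: the function name `least_squares_fit` is hypothetical, and the demo points are made up so that they lie exactly on the line Y = 3 + 2X.

```python
def least_squares_fit(x, y):
    """Return (a, b) for the fitted line Y-hat = a + bX."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # b = (n*Sum(xy) - Sum(x)*Sum(y)) / (n*Sum(x^2) - (Sum(x))^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # a = Sum(y)/n - b * Sum(x)/n
    a = sum_y / n - b * sum_x / n
    return a, b


# Hypothetical points lying exactly on Y = 3 + 2X
a_demo, b_demo = least_squares_fit([0, 1, 2, 3], [3, 5, 7, 9])
print(a_demo, b_demo)  # 3.0 2.0
```

The slope is computed first because the intercept formula depends on it.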
Example 2
A recent magazine article listed the "Best Small and Medium Companies" for the year 2006. A random
sample of 8 companies was selected, and the sales and earnings, in million ringgit, are reported below:
Sales (RM million)   Earnings (RM million)
89.2                 4.9
18.6                 4.4
18.2                 1.3
71.7                 8.0
58.6                 6.6
46.8                 4.1
17.5                 2.6
11.9                 1.7
i) Construct a scatter diagram. Hence, describe the relationship that might exist
between earnings and sales based on the scatter diagram.
There exists a strong positive linear relationship between Sales (X) and Earnings
(Y).
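ii) Correlation coefficient, r:
$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
$$r = \frac{8(1760.55) - (332.5)(33.6)}{\sqrt{\left[8(19846.79) - (332.5)^2\right]\left[8(179.08) - (33.6)^2\right]}} = \frac{2912.4}{\sqrt{(48218.07)(303.68)}}$$
$$r = 0.7611$$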
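The correlation coefficient for the eight companies can be reproduced from the raw data. This is a minimal sketch using only the sums that appear in the formula; the function name `pearson_r` and the variable names are assumptions, while the data are the Sales and Earnings values from Example 2.

```python
from math import sqrt

# Example 2 data, in RM million
sales = [89.2, 18.6, 18.2, 71.7, 58.6, 46.8, 17.5, 11.9]
earnings = [4.9, 4.4, 1.3, 8.0, 6.6, 4.1, 2.6, 1.7]


def pearson_r(x, y):
    """Correlation coefficient computed from raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(sales, earnings)
print(round(r, 4))
```

Rounded to four decimal places, the result should agree with the hand calculation of r.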
iii) Compute the coefficient of determination and interpret the value obtained.
$r^2 = 0.5793$
This indicates that Sales (X) can explain 57.93% of the total variation in
Earnings (Y).
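The "explained variation" reading of r² can be illustrated by comparing the variation left over after fitting the line with the total variation in Earnings. The sketch below is one way to do this, assuming the usual decomposition r² = 1 − SSE/SST for simple linear regression; the names `sst`, `sse` and `explained_share` are illustrative and not taken from the notes.

```python
# Example 2 data, in RM million
sales = [89.2, 18.6, 18.2, 71.7, 58.6, 46.8, 17.5, 11.9]
earnings = [4.9, 4.4, 1.3, 8.0, 6.6, 4.1, 2.6, 1.7]

n = len(sales)
mean_x = sum(sales) / n
mean_y = sum(earnings) / n

# Least squares slope and intercept (deviation form, equivalent to the sum form)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sales, earnings))
s_xx = sum((x - mean_x) ** 2 for x in sales)
b = s_xy / s_xx
a = mean_y - b * mean_x

# Total variation in Y and variation left unexplained by the fitted line
sst = sum((y - mean_y) ** 2 for y in earnings)
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(sales, earnings))

explained_share = 1 - sse / sst
print(f"{explained_share:.2%}")
```

The printed share should land close to the 57.93% quoted above; the remaining variation in Earnings is not accounted for by Sales in this model.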
iv) Determine the regression equation by using the least squares method.
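$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} = \frac{8(1760.55) - (332.5)(33.6)}{8(19846.79) - (332.5)^2} = 0.0604$$
$$a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n} = \frac{33.6}{8} - 0.0604\left(\frac{332.5}{8}\right) = 1.6897$$
$$\hat{Y} = a + bX$$
$$\hat{Y} = 1.6897 + 0.0604X$$
For X = 50:
$$\hat{Y} = 1.6897 + 0.0604(50)$$
$$\hat{Y} = 4.7097$$
Predicted earnings: RM 4.7097 million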
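The fitted equation and the prediction at X = 50 can be reproduced as follows. This is a sketch rather than a definitive implementation: the helper names `fit_line` and `predict` are assumptions, and because the code keeps full precision for a and b, its output may differ from the hand calculation in the last decimal place.

```python
# Example 2 data, in RM million
sales = [89.2, 18.6, 18.2, 71.7, 58.6, 46.8, 17.5, 11.9]
earnings = [4.9, 4.4, 1.3, 8.0, 6.6, 4.1, 2.6, 1.7]


def fit_line(x, y):
    """Least squares estimates (a, b) for Y-hat = a + bX."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


def predict(a, b, x_new):
    """Predicted Y for a new X value."""
    return a + b * x_new


a, b = fit_line(sales, earnings)
predicted_earnings = predict(a, b, 50)  # sales of RM 50 million

print(f"Y-hat = {a:.4f} + {b:.4f}X")
print(f"Predicted earnings at X = 50: RM {predicted_earnings:.4f} million")
```

A sales value of 50 lies inside the observed range of the sample (11.9 to 89.2), so the prediction is an interpolation rather than an extrapolation.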
TUTORIAL
8 64
10 86
12 92
2. The following data represent birth weights (oz) of babies and their percentage
increase between 70 and 100 days after birth.
School             A   B   C   D   E   F   G   H   I   J
Research dollars   1   2   4   6   3   5   9   7   10  8
Football rankings  4   5   3   1   9   7   6   8   2   10
4. The data in the table below show 10 students' results for two tests. Calculate
Spearman's rank correlation coefficient and interpret your result.
Student        1   2   3   4   5   6   7   8   9   10
English Test   A+  B-  C   A   C   A-  B+  B   C   A
Mandarin Test  B   A   C   B   B+  C   A+  A-  B-  A
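One possible way to approach question 4 in code is sketched below. The notes do not state how the grades are ordered or how tied grades are ranked, so two assumptions are made here: the grades run from best to worst as A+, A, A-, B+, B, B-, C, and tied grades share the average of the ranks they occupy. Spearman's coefficient is then taken as the ordinary correlation coefficient of the two sets of ranks. All function and variable names are hypothetical.

```python
from math import sqrt

# Assumed grade ordering, best to worst (not stated in the notes)
GRADE_ORDER = ["A+", "A", "A-", "B+", "B", "B-", "C"]


def average_ranks(values):
    """Rank values (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Correlation coefficient of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_rx, mean_ry = sum(rx) / n, sum(ry) / n
    s_xy = sum((p - mean_rx) * (q - mean_ry) for p, q in zip(rx, ry))
    s_xx = sum((p - mean_rx) ** 2 for p in rx)
    s_yy = sum((q - mean_ry) ** 2 for q in ry)
    return s_xy / sqrt(s_xx * s_yy)


english = ["A+", "B-", "C", "A", "C", "A-", "B+", "B", "C", "A"]
mandarin = ["B", "A", "C", "B", "B+", "C", "A+", "A-", "B-", "A"]

# Convert each grade to its position in the assumed ordering (0 = best)
english_pos = [GRADE_ORDER.index(g) for g in english]
mandarin_pos = [GRADE_ORDER.index(g) for g in mandarin]

rho = spearman_rho(english_pos, mandarin_pos)
print(round(rho, 4))
```

When there are no ties, this calculation gives the same value as the shortcut formula 1 − 6Σd²/(n(n² − 1)); with tied grades, as in this table, the two can differ slightly.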