Lesson 3-Multiple Linear Regression
MATH 212
ENGINEERING DATA ANALYSIS
Faculty Information:
Getting help
TABLE OF CONTENTS
Lesson 3
Application 3
References
Learning Outcome:
o Estimate the value of the response variable from given values of the independent
variables.
o Conduct a test of hypothesis on the significance of the multiple regression model.
Introduction
In most research problems where regression analysis is applied, more than one
independent variable is needed in the regression model. The complexity of most
scientific mechanisms is such that, in order to predict an important response,
a multiple regression model is needed. When this model is linear in the coefficients, it
is called a multiple linear regression model.
Activity
Analysis
Sketch the graph and the regression line and interpret the model.
Abstraction
From Lesson 2 (Simple Linear Regression), we learned that when there is one
independent variable or predictor, the regression equation for predicting y from x is
$\hat{y} = a + bx$. When there are two independent variables, $x_1$ and $x_2$, the
regression equation becomes

$$\hat{y}_i = a + b_1 x_{1i} + b_2 x_{2i}$$

where:
$\hat{y}_i$ = the predicted value
a = the y-intercept
$b_1$ = the expected change in y when $x_1$ changes by one unit and $x_2$ remains
constant,
$x_{1i}$ = the value of the first independent variable,
$b_2$ = the expected change in y when $x_2$ changes by one unit and $x_1$ remains
constant,
$x_{2i}$ = the value of the second independent variable, and
i = the index of the observation (i = 1, 2, ..., n).
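The equation for two independent variables can be extended to any number of
independent variables, say k, namely $x_1, x_2, \ldots, x_k$. The mean of
$y \mid x_1, x_2, \ldots, x_k$ (read as "y given $x_1, x_2, \ldots, x_k$") is given by the
multiple linear regression model
$\mu_{y \mid x_1, \ldots, x_k} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$,
which is estimated from the sample by

$$\hat{y} = a + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$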
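As a small illustration of how the estimated equation is evaluated for any number k of predictors, here is a minimal Python sketch. The function name `predict` and the coefficient values are hypothetical, chosen only to show the arithmetic; they do not come from the lesson's data.

```python
from typing import Sequence


def predict(a: float, b: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate y-hat = a + b1*x1 + b2*x2 + ... + bk*xk."""
    if len(b) != len(x):
        raise ValueError("need exactly one coefficient per independent variable")
    return a + sum(bj * xj for bj, xj in zip(b, x))


# Hypothetical intercept and slopes for k = 3 predictors (not from the lesson's data)
a = 2.0
b = [0.5, -1.0, 3.0]
print(predict(a, b, [4.0, 1.0, 2.0]))  # 2 + 0.5*4 - 1*1 + 3*2 = 9.0
```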
Estimating Coefficients
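Each observation $(x_{1i}, x_{2i}, \ldots, x_{ki}, y_i)$ satisfies

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \varepsilon_i$$

or

$$y_i = \hat{y}_i + e_i = a + b_1 x_{1i} + b_2 x_{2i} + \cdots + b_k x_{ki} + e_i$$

where $\varepsilon_i$ and $e_i$ are the random error and residual, respectively, associated with
the response $y_i$.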
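The coefficients are estimated by the method of least squares, that is, by choosing
$a, b_1, \ldots, b_k$ to minimize

$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_i - a - b_1 x_{1i} - b_2 x_{2i} - \cdots - b_k x_{ki}\right)^2.$$

Differentiating SSE with respect to each coefficient and setting the derivatives equal to
zero gives the normal equations:

$$\begin{aligned}
n a + b_1 \sum x_{1i} + b_2 \sum x_{2i} + \cdots + b_k \sum x_{ki} &= \sum y_i \\
a \sum x_{1i} + b_1 \sum x_{1i}^2 + b_2 \sum x_{1i} x_{2i} + \cdots + b_k \sum x_{1i} x_{ki} &= \sum x_{1i} y_i \\
&\;\;\vdots \\
a \sum x_{ki} + b_1 \sum x_{ki} x_{1i} + b_2 \sum x_{ki} x_{2i} + \cdots + b_k \sum x_{ki}^2 &= \sum x_{ki} y_i
\end{aligned}$$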
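The normal equations above form a (k+1) x (k+1) linear system. The following is a minimal Python sketch, assuming plain lists as input, that builds the sums and solves the system by Gaussian elimination with partial pivoting. The helper names `normal_equations` and `solve` are hypothetical, and the data are made up so that the exact answer is known.

```python
from typing import List, Sequence, Tuple


def normal_equations(xs: Sequence[Sequence[float]],
                     y: Sequence[float]) -> Tuple[List[List[float]], List[float]]:
    """Build the (k+1) x (k+1) normal-equation system for (a, b1, ..., bk).

    xs[i] holds the k predictor values of observation i. Prepending a 1 to each
    row makes entry [j][m] equal to the sum of z_j * z_m, which reproduces the
    sums in the normal equations (row 0 is n*a + b1*sum(x1) + ... = sum(y)).
    """
    rows = [[1.0] + [float(v) for v in xi] for xi in xs]
    p = len(rows[0])
    A = [[sum(r[j] * r[m] for r in rows) for m in range(p)] for j in range(p)]
    c = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    return A, c


def solve(A: List[List[float]], c: List[float]) -> List[float]:
    """Solve A * coef = c by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [ci] for row, ci in zip(A, c)]  # augmented matrix [A | c]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[piv][col] == 0:
            raise ValueError("normal equations are singular")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for m in range(col, n + 1):
                M[r][m] -= f * M[col][m]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][m] * coef[m] for m in range(r + 1, n))
        coef[r] = (M[r][n] - s) / M[r][r]
    return coef


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2
xs = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, 4, 6, 8]
print(solve(*normal_equations(xs, y)))  # approximately [1.0, 2.0, 3.0]
```

Solving the normal equations directly mirrors the hand method in this lesson; for larger or nearly collinear problems, library least-squares routines are usually preferred for numerical stability.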
Here, for ease of manual (algebraic) computation, we consider only two independent
variables, for which the normal equations reduce to:
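$$\begin{aligned}
n a + b_1 \sum x_{1i} + b_2 \sum x_{2i} &= \sum y_i \\
a \sum x_{1i} + b_1 \sum x_{1i}^2 + b_2 \sum x_{1i} x_{2i} &= \sum x_{1i} y_i \\
a \sum x_{2i} + b_1 \sum x_{1i} x_{2i} + b_2 \sum x_{2i}^2 &= \sum x_{2i} y_i
\end{aligned}$$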
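For the two-predictor case, the manual computation amounts to forming the nine sums and solving a 3 x 3 system. A minimal Python sketch that does this with Cramer's rule is shown below; the function names and the data (constructed from y = 1 + 2x1 + 3x2) are hypothetical.

```python
from typing import List, Sequence, Tuple


def det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_two_predictors(x1: Sequence[float], x2: Sequence[float],
                       y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (a, b1, b2) by solving the three normal equations with Cramer's rule."""
    n = len(y)
    s1, s2, sy = sum(x1), sum(x2), sum(y)
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * w for u, w in zip(x1, y))
    s2y = sum(v * w for v, w in zip(x2, y))

    A = [[n, s1, s2],
         [s1, s11, s12],
         [s2, s12, s22]]
    rhs = [sy, s1y, s2y]
    d = det3(A)
    if d == 0:
        raise ValueError("normal equations have no unique solution")

    coefs = []
    for col in range(3):
        Ai = [row[:] for row in A]
        for r in range(3):
            Ai[r][col] = rhs[r]  # replace one column with the right-hand side
        coefs.append(det3(Ai) / d)
    return coefs[0], coefs[1], coefs[2]


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2
x1 = [0, 1, 0, 1, 2]
x2 = [0, 0, 1, 1, 1]
y = [1, 3, 4, 6, 8]
print(fit_two_predictors(x1, x2, y))  # (1.0, 2.0, 3.0)
```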
The estimated regression equation based on the data is:
Interpretation:
For every one-unit increase in the ambient temperature, the average monthly electric
power consumption increases by 0.39, holding the number of working days in a month
constant. Likewise, for every additional working day in a month, the average monthly
power consumption increases by 10.80, holding the ambient temperature constant.
b. Estimate the average monthly electric power consumption if the plant's average
ambient temperature is 48 and the number of working days in a month is 22.

$$\hat{y} = a + 0.39(48) + 10.80(22) = 222.48$$
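The interpretation in part (a) can be checked numerically. Because the intercept a cancels when two predictions are subtracted, the change in the predicted consumption depends only on the slopes 0.39 and 10.80 reported in the lesson. This is a minimal Python sketch; the constant and function names are hypothetical.

```python
B1_TEMP = 0.39   # slope for ambient temperature, as reported in the lesson
B2_DAYS = 10.80  # slope for working days per month, as reported in the lesson


def change_in_prediction(d_temp: float, d_days: float) -> float:
    """Change in y-hat when temperature changes by d_temp and working days by d_days.

    y-hat(new) - y-hat(old) = b1*d_temp + b2*d_days; the intercept a cancels.
    """
    return B1_TEMP * d_temp + B2_DAYS * d_days


print(change_in_prediction(1, 0))  # one more unit of temperature, days fixed: 0.39
print(change_in_prediction(0, 1))  # one more working day, temperature fixed: 10.8
```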
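where $SSE = \sum e_i^2 = \sum (y_i - \hat{y}_i)^2$. The identity

$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2,$$

that is, $SST = SSR + SSE$, continues to hold.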
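The identity SST = SSR + SSE holds for least-squares fitted values of a model with an intercept. The minimal Python sketch below checks it on hypothetical data. To keep it short, it solves the two-predictor normal equations in their centered form (the first normal equation gives a = ȳ − b1·x̄1 − b2·x̄2, and substituting it leaves a 2 x 2 system in deviations); the function names are hypothetical.

```python
from statistics import fmean
from typing import Sequence, Tuple


def fit_two_predictors(x1: Sequence[float], x2: Sequence[float],
                       y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares (a, b1, b2) using the centered form of the normal equations."""
    m1, m2, my = fmean(x1), fmean(x2), fmean(y)
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    d = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / d
    b2 = (s2y * s11 - s1y * s12) / d
    return my - b1 * m1 - b2 * m2, b1, b2


def sums_of_squares(x1, x2, y) -> Tuple[float, float, float]:
    """Return (SST, SSR, SSE) for the least-squares fit of y on x1 and x2."""
    a, b1, b2 = fit_two_predictors(x1, x2, y)
    yhat = [a + b1 * u + b2 * v for u, v in zip(x1, x2)]
    ybar = fmean(y)
    sst = sum((w - ybar) ** 2 for w in y)
    ssr = sum((h - ybar) ** 2 for h in yhat)
    sse = sum((w - h) ** 2 for w, h in zip(y, yhat))
    return sst, ssr, sse


# Hypothetical data (not from the lesson)
x1 = [0, 1, 0, 1, 2, 2]
x2 = [0, 0, 1, 1, 1, 2]
y = [1, 3, 5, 6, 8, 10]
sst, ssr, sse = sums_of_squares(x1, x2, y)
print(round(sst, 6), round(ssr + sse, 6))  # the two values agree
```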
To find a statistic that measures how well a multiple regression model fits a set of
data, we use the multiple regression equivalent of $r^2$, the coefficient of determination
for the straight-line model. Thus, we define the multiple coefficient of determination,
$R^2$, as

$$R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2} = 1 - \frac{SSE}{SST}$$
Thus, $R^2 = 0$ implies a complete lack of fit of the model to the data, and
$R^2 = 1$ implies a perfect fit, with the model passing through every data point.
In general, $0 \le R^2 \le 1$, and the larger the value of $R^2$, the better the model fits the data.
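The two extreme cases described above can be reproduced with a minimal Python sketch of R² = 1 − SSE/SST. The function name is hypothetical, and the sketch assumes `yhat` holds least-squares fitted values of a model with an intercept; for arbitrary predictions the value can fall outside [0, 1].

```python
from statistics import fmean
from typing import Sequence


def r_squared(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Multiple coefficient of determination R^2 = 1 - SSE/SST."""
    ybar = fmean(y)
    sse = sum((w - h) ** 2 for w, h in zip(y, yhat))
    sst = sum((w - ybar) ** 2 for w in y)
    return 1 - sse / sst


y = [1, 2, 3, 4]
print(r_squared(y, y))           # perfect fit, model passes through every point: 1.0
print(r_squared(y, [2.5] * 4))   # predicting the mean everywhere, no fit: 0.0
```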
The fact that $R^2$ is a sample statistic implies that it can be used to make inferences
about the utility of the entire model for predicting the population of y values at each
setting of the independent variables.
$$F = \frac{R^2/k}{(1 - R^2)/[n - (k+1)]}$$
Conditions:
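The analysis of variance table for a multiple regression problem provides a test of the
null hypothesis $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ against the alternative
$H_1$: at least one $\beta_j \neq 0$. Under $H_0$, F has an F-distribution with
$\nu_1 = k$ and $\nu_2 = n - (k+1)$ degrees of freedom, and $H_0$ is rejected at level
$\alpha$ when $F > F_{\alpha}(k, n - k - 1)$.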
The tail values of the F-distribution are given in Tables . The F-test statistic becomes
large as the coefficient of determination becomes large.
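The F statistic can be computed either from R² or from the ANOVA mean squares; the two routes agree because R² = SSR/SST and 1 − R² = SSE/SST. A minimal Python sketch follows. The function names and the sums of squares are hypothetical, and the critical value is passed in by hand because it is read from an F table rather than computed here.

```python
def f_from_r2(r2: float, n: int, k: int) -> float:
    """F = (R^2 / k) / ((1 - R^2) / (n - (k + 1)))."""
    return (r2 / k) / ((1 - r2) / (n - (k + 1)))


def f_from_anova(sse: float, sst: float, n: int, k: int) -> float:
    """F = MSR / MSE with MSR = SSR/k and MSE = SSE/(n - k - 1)."""
    msr = (sst - sse) / k
    mse = sse / (n - k - 1)
    return msr / mse


def reject_h0(f: float, f_crit: float) -> bool:
    """Reject H0: beta_1 = ... = beta_k = 0 when F exceeds the tabled critical value."""
    return f > f_crit


# Hypothetical sums of squares (not from the lesson's data)
sse, sst, n, k = 10.0, 50.0, 20, 3
r2 = 1 - sse / sst
print(f_from_r2(r2, n, k), f_from_anova(sse, sst, n, k))  # both about 21.33
```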
Example 2. From the given data in Example 1, decide, at the 5% significance level,
whether the data provide sufficient evidence to conclude that the ambient temperature
and the number of working days in a month (predictor variables) are useful for
predicting the average monthly power consumption (response variable).
Solution:
k = 2, n = 12
$SSE = \sum (y_i - \hat{y}_i)^2 = 2004.7456$
$SST = \sum (y_i - \bar{y})^2 = 6707.667$
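Finding $R^2$:

$$R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2} = 1 - \frac{SSE}{SST} = 1 - \frac{2004.7456}{6707.667} \approx 0.7011$$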
So

$$F = \frac{R^2/k}{(1 - R^2)/[n - (k+1)]} = \frac{0.7011/2}{0.2989/9} \approx 10.56$$
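Equivalently, $F = \dfrac{MSR}{MSE}$, where

$$MSR = \frac{SSR}{k} = \frac{SST - SSE}{k} = \frac{4702.9214}{2} = 2351.4607$$

$$MSE = \frac{SSE}{n - (k+1)} = \frac{2004.7456}{9} = 222.7495$$

$$F = \frac{2351.4607}{222.7495} \approx 10.56$$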
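Compare F with the critical value $F_{0.05}(2, 9)$, where $\nu_1 = k = 2$ and
$\nu_2 = n - (k+1) = 9$. Since $F = 10.56 > F_{0.05}(2, 9) = 4.2565$, we reject $H_0$ at the
5% significance level. The data provide sufficient evidence that the ambient temperature
and the number of working days in a month, taken together, are useful for predicting the
average monthly power consumption.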
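The arithmetic of Example 2 can be reproduced with a short Python script. It uses only the values stated in the example (n = 12, k = 2, SSE, SST, and the tabled critical value 4.2565) and computes F both from R² and from MSR/MSE, so the two routes can be compared.

```python
# Values taken from Example 2 of the lesson
n, k = 12, 2
sse = 2004.7456
sst = 6707.667
f_crit = 4.2565  # F_0.05(2, 9), as quoted in the lesson

r2 = 1 - sse / sst
f_r2 = (r2 / k) / ((1 - r2) / (n - (k + 1)))

ssr = sst - sse
msr = ssr / k
mse = sse / (n - k - 1)
f_anova = msr / mse

print(f"R^2 = {r2:.4f}")                      # R^2 = 0.7011
print(f"MSR = {msr:.4f}, MSE = {mse:.4f}")    # MSR = 2351.4607, MSE = 222.7495
print(f"F = {f_r2:.2f} (from R^2), {f_anova:.2f} (from MSR/MSE)")  # both 10.56
print("reject H0" if f_anova > f_crit else "do not reject H0")     # reject H0
```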
Multiple Correlation
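The multiple correlation coefficient is denoted by $R_{y \cdot 12 \ldots k}$. The dot after y in
the notation separates the dependent variable, y, from the independent variables,
$x_1, x_2, \ldots, x_k$. For two independent variables,

$$R_{y \cdot 12} = \sqrt{\frac{r_{y1}^2 + r_{y2}^2 - 2 r_{y1} r_{y2} r_{12}}{1 - r_{12}^2}}$$

where $r_{y1}$, $r_{y2}$, and $r_{12}$ are the correlation coefficients for the respective pairs of
variables.
The multiple correlation coefficient can assume values from 0 to 1, where 0 indicates
the absence of a linear multiple correlation between y and the independent variables
and 1 indicates a perfect linear multiple correlation in which all of the observed y's
fall on the regression plane.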
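For $r_{y1}$:
$$r_{y1} = \frac{n \sum x_1 y - \sum x_1 \sum y}{\sqrt{\left[n \sum x_1^2 - \left(\sum x_1\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}}$$
For $r_{y2}$:
$$r_{y2} = \frac{n \sum x_2 y - \sum x_2 \sum y}{\sqrt{\left[n \sum x_2^2 - \left(\sum x_2\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}}$$
For $r_{12}$:
$$r_{12} = \frac{n \sum x_1 x_2 - \sum x_1 \sum x_2}{\sqrt{\left[n \sum x_1^2 - \left(\sum x_1\right)^2\right]\left[n \sum x_2^2 - \left(\sum x_2\right)^2\right]}}$$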
Variable   y        x1       x2
y          1.000
x1         r_y1     1.000
x2         r_y2     r_12     1.000
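The computational formula for each pairwise correlation and the formula for $R_{y \cdot 12}$ can be combined in a minimal Python sketch. The function names are hypothetical, and the data are made up so that y = x1 + x2 exactly, in which case all observed y's lie on the regression plane and R should equal 1.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """r = (n*Sxy - Sx*Sy) / sqrt([n*Sxx - Sx^2][n*Syy - Sy^2])."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(u * v for u, v in zip(x, y))
    sxx = sum(u * u for u in x)
    syy = sum(v * v for v in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def multiple_r(ry1: float, ry2: float, r12: float) -> float:
    """R_{y.12} = sqrt((ry1^2 + ry2^2 - 2*ry1*ry2*r12) / (1 - r12^2))."""
    r2 = (ry1 ** 2 + ry2 ** 2 - 2 * ry1 * ry2 * r12) / (1 - r12 ** 2)
    return sqrt(max(r2, 0.0))  # guard against a tiny negative value from rounding


# Hypothetical data in which y = x1 + x2 exactly
x1 = [-1, 1, -1, 1]
x2 = [-1, -1, 1, 1]
y = [u + v for u, v in zip(x1, x2)]
ry1, ry2, r12 = pearson(x1, y), pearson(x2, y), pearson(x1, x2)
print(round(ry1, 4), round(ry2, 4), r12)    # 0.7071 0.7071 0.0
print(round(multiple_r(ry1, ry2, r12), 6))  # 1.0
```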
The coefficient of multiple determination will be relatively large when the correlation
of each of the predictors with y is large and the correlations among the predictors are
0 or very small.
If correlations exist among some or all of the independent variables, it is usually the
case that
In the latter case, the inclusion of $x_3$ in the regression equation would not account for
any variance in y not already accounted for by $x_1$ and $x_2$. Ideally, you would like to
have predictors that have high correlations with the dependent variable and zero
correlations with each other. Unfortunately, in the behavioral sciences, health sciences,
and education, it is difficult to find predictors that meet these criteria. Once you have
found three or four good predictors, it is often difficult to find additional predictors
that are not highly correlated with at least one of the original predictors.
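The effect of correlation among predictors can be seen by evaluating the two-predictor formula for $R^2_{y \cdot 12}$ (the square of $R_{y \cdot 12}$) while holding the correlations with y fixed. In this minimal Python sketch the values $r_{y1} = 0.6$ and $r_{y2} = 0.5$ are hypothetical, and the function name is also hypothetical.

```python
def r2_two_predictors(ry1: float, ry2: float, r12: float) -> float:
    """R^2_{y.12} = (ry1^2 + ry2^2 - 2*ry1*ry2*r12) / (1 - r12^2)."""
    return (ry1 ** 2 + ry2 ** 2 - 2 * ry1 * ry2 * r12) / (1 - r12 ** 2)


# Hypothetical correlations of each predictor with y
ry1, ry2 = 0.6, 0.5
for r12 in (0.0, 0.5, 0.8):
    print(r12, round(r2_two_predictors(ry1, ry2, r12), 4))
```

With uncorrelated predictors ($r_{12} = 0$), $R^2$ is simply $r_{y1}^2 + r_{y2}^2 = 0.61$. As $r_{12}$ grows to 0.5 and 0.8, $R^2$ falls to about 0.4133 and 0.3611, barely above $r_{y1}^2 = 0.36$, so the second predictor adds little that the first did not already explain.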
Application
Closure
Congratulations! You have successfully completed the tasks and activities
for Lesson 3. Your knowledge of correlation and regression should help you
solve other real-life problems and practical applications involving
prediction or estimation.
You are almost done with this module. The module summary and assessment will
follow.
SUMMARY
MODULE ASSESSMENT
1. Regression methods were used to analyze the data from a study investigating
the relationship between roadway surface temperature (x) and pavement
deflection (y). The data follow:
2. The article “How to Optimize and Control the Wire Bonding Process”
described an experiment carried out to assess the impact of the variables
a. Find the regression equation representing the ball bond shear strength in terms
of force and temperature.
b. Determine whether the data provide sufficient evidence to conclude that
force and temperature are useful for predicting the ball bond shear strength at the
5% level of significance.
References
Peck, R., Olsen, C., & Devore, J. L. (2012). Introduction to Statistics and Data
Analysis (4th ed.). Boston, MA: Brooks/Cole, Cengage Learning.
Walpole, R. E., & Myers, R. H. (1993). Probability and Statistics for Engineers and
Scientists (5th ed.). New York: Macmillan Publishing Company.