Chapter 3 Multiple Linear Regression - We Use This One
[This handout is only a brief description of the topic. Students are advised to refer to the
recommended textbook and reading material.]
Y = β0 + β1 X + e
where
• β0 is called the intercept
• β1 is called the slope or regression coefficient
• the e’s represent the departures of the observed values from the true line.
β0 and β1 are the unknown parameters in the model. They are estimated from the data.
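As a minimal sketch of how β0 and β1 might be estimated from data, the usual least-squares formulas can be applied directly. The function name fit_simple_regression and the data values below are hypothetical, chosen only for illustration; they are not taken from this handout.

```python
def fit_simple_regression(x, y):
    """Return least-squares estimates (b0, b1) for Y = b0 + b1*X + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared X deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: makes the fitted line pass through the point of means.
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data lying exactly on the line Y = 1 + 2X.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_simple_regression(x, y)
print(b0, b1)
```

Because the illustrative points lie exactly on a line, the estimates recover that line; with real data the e’s are nonzero and the estimates only approximate the unknown parameters.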
An Illustrative Example
Data on the average number of cigarettes (X) smoked per adult in 1980 and the death rate per
million (Y) in 2002 for sixteen countries are used for illustration.
The question of interest is whether there is a relationship between the death
rate (Y) and the level of smoking (X).
Slope (regression coefficient): If cigarettes smoked increases by 1 unit per year, the death rate is
estimated to increase, on average, by 0.24 units. In other words, if cigarettes smoked increases by
100 units, the estimated death rate increases by 24 units.
The intercept of 28.31 only has meaning if the range of X values (cigarettes smoked) under study
includes the value of zero. Here, zero cigarettes smoked still gives an estimated death rate of 28.31
per million.
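The interpretation of the slope and intercept can be checked with a short calculation. The sketch below uses only the two estimates quoted in the text (28.31 and 0.24); the smoking levels of 1000 and 1100 and the function name predicted_death_rate are hypothetical and serve only to show a 100-unit difference.

```python
# Estimates quoted in the text: intercept 28.31 and slope 0.24.
INTERCEPT = 28.31
SLOPE = 0.24


def predicted_death_rate(cigarettes):
    """Estimated death rate per million for a given level of smoking."""
    return INTERCEPT + SLOPE * cigarettes


# Hypothetical smoking levels, 100 units apart, used only to illustrate the slope.
low = predicted_death_rate(1000)
high = predicted_death_rate(1100)
difference = high - low          # 100 units of X times the slope
at_zero = predicted_death_rate(0)  # reduces to the intercept

print(round(difference, 2))
print(at_zero)
```

The difference between the two predictions depends only on the slope, while the prediction at X = 0 is the intercept itself.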
Multiple linear regression (MLR), also known simply as multiple regression, is a statistical
technique that uses two or more explanatory variables (X’s) to predict the outcome of a response
variable (Y). Multiple regression is an extension of simple linear regression, which uses just one
explanatory variable.
ii) If there is a relationship, using the information in the independent variables will improve
our accuracy in predicting values for the dependent variable.
iv) Regression analysis also helps to fine-tune manufacturing and delivery processes.
Examples:
i) The selling price of a house (Y) can depend on the desirability of the location (X1), the
number of bedrooms (X2), the number of bathrooms (X3), the year the house was built
(X4), the square footage of the lot (X5) and a number of other factors.
ii) The height of a child (Y) can depend on the height of the mother (X1), the height of the
father (X2), nutrition (X3), and environmental factors (X4).
The usual linear regression is a method of measuring the type and magnitude of the linear relations
that exist between a dependent variable (Y) and a set of independent/explanatory/predictor
variables (say X1 and X2).
Y = β0 + β1 X1 + β2 X2 + e
where
• β0 is called the intercept
• β1 and β2 are called partial regression coefficients.
β0, β1 and β2 are the unknown parameters in the model. They are estimated from the data.
In addition to assuming a linear form for the model, the random error components ei are assumed
to be
i. independent,
ii. with zero mean and constant variance σ²,
iii. and normally distributed.
Given a sample of n values of (Y, X1, X2), the sample regression (prediction equation) is
Ŷ = β̂0 + β̂1 X1 + β̂2 X2
where β̂0, β̂1 and β̂2 are the estimates of β0, β1 and β2 respectively.
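β̂1 measures the average or expected change in Y when X1 increases by 1 unit while X2
remains unchanged. Similarly, β̂2 measures the average or expected change in Y when X2
increases by 1 unit while X1 remains unchanged.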
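For the two-predictor case, the least-squares estimates can be written in terms of sums of squares and cross-products of deviations from the means. The sketch below is one way to compute them by hand; the function name fit_two_predictor_regression and the data are hypothetical and not taken from this handout, and software such as EXCEL may arrive at the same estimates by a different numerical route.

```python
def fit_two_predictor_regression(x1, x2, y):
    """Return least-squares estimates (b0, b1, b2) for Y = b0 + b1*X1 + b2*X2 + e."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Deviations from the means.
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    # Sums of squares and cross-products.
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2  # zero when X1 and X2 are perfectly collinear
    if det == 0:
        raise ValueError("X1 and X2 are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Hypothetical data generated exactly from Y = 2 + 3*X1 - 1*X2 (no error term).
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [3, 7, 7, 11, 12]
b0, b1, b2 = fit_two_predictor_regression(x1, x2, y)
print(b0, b1, b2)
```

Each partial coefficient is adjusted for the other predictor through the cross-product term s12, which is why it describes a change in Y with the other variable held unchanged.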
Example: Two Independent Variables using “EXCEL Data Analysis”
Let Ŷ = β̂0 + β̂1 X1 + β̂2 X2 be the multiple regression equation.
EXCEL can be used to generate the coefficients and measures of goodness of fit for multiple
regression.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.7221
R Square 0.5215
Adjusted R Square 0.4417
Standard Error 47.4634
Observations 15
ANOVA
Source of variation   df   SS             MS             F             Significance F
Regression             2   29460.02687    14730.01343    6.538606789   0.012006372
Residual              12   27033.30647    2252.775539
Total                 14   56493.33333
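The entries of the ANOVA table and the regression statistics are linked by simple arithmetic. The sketch below recomputes them from the sums of squares and degrees of freedom reported in the output; the variable names are illustrative. The Significance F line relies on a closed form that holds only when the regression has exactly 2 degrees of freedom, as it does here, so it should be read as a special case rather than a general method.

```python
import math

# Values copied from the ANOVA output.
ss_regression, df_regression = 29460.02687, 2
ss_residual, df_residual = 27033.30647, 12
ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual

# Mean squares and the F statistic.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

# Goodness-of-fit measures.
r_square = ss_regression / ss_total
adj_r_square = 1 - (1 - r_square) * df_total / df_residual
standard_error = math.sqrt(ms_residual)

# Special case: with exactly 2 numerator degrees of freedom, the upper-tail
# probability of the F distribution has this closed form.
significance_f = (1 + 2 * f_stat / df_residual) ** (-df_residual / 2)

print(round(f_stat, 4), round(r_square, 4), round(adj_r_square, 4))
print(round(standard_error, 4), round(significance_f, 6))
```

Since the Significance F value is below 0.05, the regression as a whole would be judged significant at the 5% level.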
β̂1 = -24.975: sales will decrease, on average, by 24.975 pizzas per day for each P1 increase in
selling price, with advertising held unchanged.
β̂2 = 74.131: sales will increase, on average, by 74.131 pizzas per day for each P100 increase in
advertising, with price held unchanged.
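The two partial coefficients can be combined to estimate the effect of changing both variables at once. In the sketch below the coefficients are those quoted in the text, while the scenario (price up by P2, advertising up by P100) and the function name expected_sales_change are hypothetical. The intercept is not needed, because it cancels when two predictions are subtracted.

```python
# Estimated partial regression coefficients quoted in the text.
B_PRICE = -24.975        # change in daily sales per P1 increase in price
B_ADVERTISING = 74.131   # change in daily sales per P100 increase in advertising


def expected_sales_change(price_change, advertising_change_hundreds):
    """Expected change in pizzas sold per day for the given changes.

    advertising_change_hundreds is measured in units of P100.
    """
    return B_PRICE * price_change + B_ADVERTISING * advertising_change_hundreds


# Hypothetical scenario: price up by P2, advertising up by P100 (one unit).
change = expected_sales_change(2, 1)
print(round(change, 3))
```

A positive result means the estimated gain from the extra advertising outweighs the estimated loss from the higher price in this scenario.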
Coefficient of Determination
About 52.1% of the variation in pizza sales is explained by the variation in price and advertising
(R² = 0.5215). Other factors may also influence the response variable, since about 47.9% of the
variability is left unexplained.
3. EXERCISES
1. What are the two primary uses for regression in business?
2. What are the assumptions of random error component e?
3. The coefficient of determination, R², measures the proportion of variation in Y that is
explained by X, and is often expressed as a percentage. (True/False)
4. Interpret a coefficient of determination of R² = 70%.
5. In the multiple linear equation
Y = β0 + β1 X1 + β2 X2 + e
β 1 and β 2 are called ________________________
6. In Ŷ = 16.4769 + 0.3899 X1 − 0.6233 X2,
interpret 0.3899 and −0.6233.