
Chapter 3 Multiple Linear Regression


Handout: Multiple Linear Regression

[Multiple linear regression model and assumptions. Estimation of model parameters. Model assessment using the coefficient of determination.]

[This handout is only a brief description of the topic. Students are advised to refer to the
recommended textbooks and reading material.]

Contents

1. SIMPLE LINEAR REGRESSION MODEL
2. MULTIPLE LINEAR REGRESSION
   Purpose of multiple regression
   OLS Estimators (fitting a multiple linear regression model)
   Interpretation of the coefficients
   Example: Two Independent Variables using “EXCEL Data Analysis”
   Coefficient of Determination
3. EXERCISES
4. RECOMMENDED TEXTS AND READING MATERIALS

1. SIMPLE LINEAR REGRESSION MODEL


If there is only one explanatory variable, we have a Simple Linear Regression Model.

$$Y = \beta_0 + \beta_1 X + e$$
where
• β0 is called the intercept,
• β1 is called the slope or regression coefficient, and
• e is the random error: the departure of an observed value of Y from the true line.

β0 and β1 are the unknown parameters in the model. They are estimated from the data.

The random errors e are assumed to be
(i) independent,
(ii) normally distributed with zero mean, and
(iii) of constant variance (whatever the value of X).

Given a sample of n values of (Y, X), the sample regression (prediction) equation is

$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$$

where $\hat{\beta}_0$ and $\hat{\beta}_1$ are the estimates of $\beta_0$ and $\beta_1$ respectively.
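The estimates are obtained by least squares, the same criterion EXCEL's Regression tool applies. As a minimal sketch, the closed-form estimates $\hat{\beta}_1 = S_{xy}/S_{xx}$ and $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}$ can be coded directly; the function name `fit_simple_ols` and the five data points are hypothetical, chosen only for illustration.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (b0, b1), the least-squares estimates for Y = b0 + b1*X."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar, y_bar = mean(x), mean(y)
    # Sxy: sum of cross-products of deviations; Sxx: sum of squared X deviations
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx          # slope estimate
    b0 = y_bar - b1 * x_bar   # intercept: fitted line passes through (x_bar, y_bar)
    return b0, b1


# Hypothetical illustrative data (not the cigarette data used in this handout)
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1 = fit_simple_ols(x, y)
print(f"Y-hat = {b0:.3f} + {b1:.3f} X")  # prints: Y-hat = 1.090 + 1.970 X
```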

An Illustrative Example

Data on the average number of cigarettes smoked per adult in 1980 (X) and the death rate per
million in 2002 (Y) for sixteen countries are used for illustration.

The question of interest for these data is whether there is a relationship between the death
rate (Y) and the level of smoking (X).

Using the EXCEL Data Analysis Regression tool, we obtain the estimated equation

$$\hat{Y} = 28.31 + 0.241X$$

Interpreting model parameters

Slope (regression coefficient): for each additional cigarette smoked per adult, the estimated death
rate increases on average by 0.241 deaths per million. In other words, an increase of 100 cigarettes
per adult is associated with an increase of about 24 deaths per million.

Intercept: the value 28.31 only has a practical meaning if the range of X values (cigarettes
smoked) under study includes zero. Taken literally, zero cigarettes smoked still gives an estimated
death rate of 28.31 per million.
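The slope interpretation can be checked with a little arithmetic on the estimated equation $\hat{Y} = 28.31 + 0.241X$. In the short sketch below, the function name `predicted_death_rate` and the X values 1000 and 1100 are hypothetical; it shows that two predictions 100 units apart differ by 100 × 0.241 = 24.1 deaths per million, and that the prediction at X = 0 is the intercept.

```python
B0, B1 = 28.31, 0.241  # intercept and slope of the estimated equation


def predicted_death_rate(cigarettes):
    """Estimated death rate per million for X cigarettes smoked per adult."""
    return B0 + B1 * cigarettes


# Predictions 100 units apart differ by 100 * B1, whatever the starting X
diff = predicted_death_rate(1100) - predicted_death_rate(1000)
print(round(diff, 2))  # 24.1

# At X = 0 the prediction equals the intercept
print(predicted_death_rate(0))  # 28.31
```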

Remark: These are estimates of the coefficients of the regression equation, not the true (unknown) values β0 and β1.

Note: The regression of Y on a single independent variable X is often inadequate. When two or
more X’s are available, they can give additional information about Y by means of a multiple
regression on the X’s.

2. MULTIPLE LINEAR REGRESSION


Multiple Linear Regression

Multiple linear regression (MLR), also known simply as multiple regression, is a statistical
technique that uses two or more explanatory variables (X’s) to predict the outcome of a response
variable (Y). Multiple regression is an extension of simple linear regression, which uses just one
explanatory variable.

Purpose of multiple regression


i) The purpose of multiple regression is to analyze the relationship between metric (interval,
ratio, and continuous) or dichotomous independent variables (X) and a metric dependent
variable (Y).

ii) If there is a relationship, using the information in the independent variables will improve
our accuracy in predicting values for the dependent variable.

iii) The two primary uses for regression in business are:


(a) Forecasting and
(b) Optimization.

iv) Regression analysis also helps to fine-tune manufacturing and delivery processes.

Examples:
i) The selling price of a house (Y) can depend on the desirability of the location (X1), the
number of bedrooms (X2), the number of bathrooms (X3), the year the house was built
(X4), the square footage of the lot (X5) and a number of other factors.

ii) The height of a child (Y) can depend on the height of the mother (X1), the height of the
father (X2), nutrition (X3), and environmental factors (X4).

OLS Estimators (fitting multiple linear regression model)

The usual linear regression is a method of measuring the type and magnitude of linear relations
that exist between a dependent variable (Y) and a set of independent/explanatory/predictor
variables (say X1 and X2).

The multiple linear regression model with two explanatory variables is

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + e$$

where
• β0 is called the intercept, and
• β1 and β2 are called partial regression coefficients.

β0, β1 and β2 are the unknown parameters in the model. They are estimated from the data.

In addition to assuming a linear form for the model, the random error components ei are assumed
to be
i. independent,
ii. of zero mean and constant variance σ², and
iii. normally distributed.

Given a sample of n values of (Y, X1, X2), the sample regression (prediction) equation is

$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2$$

where $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$ are the estimates of $\beta_0$, $\beta_1$ and $\beta_2$ respectively.
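In matrix form, the OLS estimates solve the normal equations $(X^\top X)\hat{\beta} = X^\top Y$, where the design matrix X has a leading column of ones for the intercept. The sketch below assumes NumPy is available. It simulates data that satisfy the model assumptions (independent, normal errors with zero mean and constant variance) using hypothetical true values β0 = 2, β1 = 0.5, β2 = −1.5 and σ = 1. It then recovers estimates close to those values with `numpy.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical "true" parameters, chosen only for this simulation
beta_true = np.array([2.0, 0.5, -1.5])  # beta0, beta1, beta2
sigma = 1.0
n = 10_000

x1 = rng.uniform(0, 10, size=n)
x2 = rng.uniform(0, 10, size=n)
# Errors: independent, normal, zero mean, constant variance sigma^2
e = rng.normal(loc=0.0, scale=sigma, size=n)
y = beta_true[0] + beta_true[1] * x1 + beta_true[2] * x2 + e

# Design matrix: a column of ones (intercept) followed by X1 and X2
X = np.column_stack([np.ones(n), x1, x2])

# OLS: least-squares solution of X @ beta = y (numerically safer than inverting X'X)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 2))
```

With a large n, the estimates land very close to the true values. With real data, only $\hat{\beta}$ is observed, never the true β.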

Interpretation of the coefficients



β0 is the intercept of the model.

β1 measures the average (expected) change in Y when X1 increases by 1 unit while X2 remains
unchanged.

Similarly, β2 measures the average (expected) change in Y when X2 increases by 1 unit while X1
remains unchanged.

Example: Two Independent Variables using “EXCEL Data Analysis”

A pizza distributor wants to evaluate factors thought to influence demand.

Dependent variable: pizza sales (units per day)

Independent variables: (1) price (in Pula) and (2) advertising (in P100’s)

Data were collected over 15 days.


Day   Pizza sales (Y)   Price in Pula (X1)   Advertising in P100’s (X2)
1     350               5.5                  3.3
2     460               7.5                  3.3
3     350               8.0                  3.0
4     430               8.0                  4.5
5     350               6.8                  3.0
6     380               7.5                  4.0
7     430               4.5                  3.0
8     470               6.4                  3.7
9     450               7.0                  3.5
10    490               5.0                  4.0
11    340               7.2                  3.5
12    300               7.9                  3.2
13    440               5.9                  4.0
14    450               5.0                  3.5
15    300               7.0                  2.7

Let $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2$ be the estimated multiple regression equation.

Estimating a Multiple Linear Regression Equation

EXCEL can be used to compute the coefficients and the measures of goodness of fit for a multiple
regression.

In EXCEL: Data -> Data Analysis -> Regression, then follow the instructions.

Multiple Regression Output

SUMMARY OUTPUT
Regression Statistics
Multiple R 0.7221
R Square 0.5215
Adjusted R Square 0.4417
Standard Error 47.4634
Observations 15

ANOVA
Source of variation   df   SS            MS            F             Significance F
Regression            2    29460.02687   14730.01343   6.538606789   0.012006372
Residual              12   27033.30647   2252.775539
Total                 14   56493.33333

Term                                 Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept                            306.5262       114.2539         2.6829    0.0199    57.5883     555.4640
X Variable 1 (Price in Pula)         -24.9751       10.8321          -2.3057   0.0398    -48.5763    -1.3739
X Variable 2 (Advertising, P100’s)   74.1310        25.9673          2.8548    0.0145    17.5530     130.7089
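The coefficient table does not depend on EXCEL. As a sketch, assuming Python with NumPy, the OLS coefficients and standard errors for the pizza data can be computed from $\hat{\beta} = (X^\top X)^{-1}X^\top Y$ and $\mathrm{SE}(\hat{\beta}_j) = \sqrt{MSE \cdot [(X^\top X)^{-1}]_{jj}}$. Up to rounding, the results should agree with the Coefficients, Standard Error and t Stat columns of the EXCEL output.

```python
import numpy as np

# Pizza data from the table: sales (Y), price in Pula (X1), advertising in P100's (X2)
sales = np.array([350, 460, 350, 430, 350, 380, 430, 470, 450, 490,
                  340, 300, 440, 450, 300], dtype=float)
price = np.array([5.5, 7.5, 8.0, 8.0, 6.8, 7.5, 4.5, 6.4, 7.0, 5.0,
                  7.2, 7.9, 5.9, 5.0, 7.0])
advert = np.array([3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0,
                   3.5, 3.2, 4.0, 3.5, 2.7])

n = sales.size
X = np.column_stack([np.ones(n), price, advert])  # intercept column + two regressors
k = X.shape[1] - 1                                 # number of explanatory variables (2)

xtx_inv = np.linalg.inv(X.T @ X)
beta_hat = xtx_inv @ X.T @ sales                   # OLS coefficients

residuals = sales - X @ beta_hat
mse = residuals @ residuals / (n - k - 1)          # residual mean square, df = 12
std_err = np.sqrt(mse * np.diag(xtx_inv))          # standard errors of the coefficients
t_stat = beta_hat / std_err                        # t Stat = coefficient / standard error

for name, b, se, t in zip(["Intercept", "Price", "Advertising"], beta_hat, std_err, t_stat):
    print(f"{name:<12} {b:10.4f} {se:10.4f} {t:8.4f}")
```

For a small, well-conditioned problem like this, explicitly inverting $X^\top X$ is acceptable. The inverse is also needed for the standard errors.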

The estimated equation is

$$\hat{Y} = 306.5262 - 24.9751X_1 + 74.131X_2$$


$\hat{\beta}_1 = -24.975$: sales will decrease, on average, by 24.975 pizzas per day for each P1 increase in
selling price, with advertising held constant.

$\hat{\beta}_2 = 74.131$: sales will increase, on average, by 74.131 pizzas per day for each P100 increase in
advertising, with price held constant.
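The estimated equation can also be used for forecasting, one of the two primary business uses of regression listed earlier. In the sketch below, the function name `predicted_sales` and the inputs (a price of P5.50 and advertising of 3.5, i.e. P350) are hypothetical. It evaluates $\hat{Y} = 306.5262 - 24.9751X_1 + 74.1310X_2$. It also confirms that raising price by P1 while advertising stays fixed changes predicted sales by exactly $\hat{\beta}_1$.

```python
B0, B1, B2 = 306.5262, -24.9751, 74.1310  # estimated coefficients


def predicted_sales(price, advertising):
    """Predicted pizza sales for a price (Pula) and advertising level (P100's)."""
    return B0 + B1 * price + B2 * advertising


print(round(predicted_sales(5.50, 3.5), 2))  # 428.62

# Partial effect of price: advertising held constant at 3.5
change = predicted_sales(6.50, 3.5) - predicted_sales(5.50, 3.5)
print(round(change, 4))  # -24.9751
```

Both inputs lie inside the range of the observed data (price 4.5 to 8.0, advertising 2.7 to 4.5). Predictions far outside that range are less reliable.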

Coefficient of Determination

The coefficient of determination, R², measures the proportion of the variation in Y that is explained
by the X’s, and is often expressed as a percentage.

$$R^2 = \frac{\text{Regression SS}}{\text{Total SS}} \times 100\% = \frac{29460.02687}{56493.33333} \times 100\% = 52.15\%$$

Thus 52.15% of the variation in pizza sales is explained by the variation in price and advertising.
Clearly there may be other factors that influence the response variable, since 47.85% of the
variability is left unexplained.
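The statistics in the SUMMARY OUTPUT all follow from the ANOVA sums of squares. The minimal sketch below uses the SS values from the ANOVA table, with n = 15 observations and k = 2 explanatory variables. It uses the standard adjusted R² formula $1 - \frac{SSE/(n-k-1)}{SST/(n-1)}$, which reproduces the Adjusted R Square of 0.4417 reported by EXCEL.

```python
import math

# Sums of squares from the ANOVA table
ss_reg, ss_res, ss_tot = 29460.02687, 27033.30647, 56493.33333
n, k = 15, 2  # observations, explanatory variables

r2 = ss_reg / ss_tot                                      # coefficient of determination
adj_r2 = 1 - (ss_res / (n - k - 1)) / (ss_tot / (n - 1))  # adjusted for the number of X's
multiple_r = math.sqrt(r2)                                # "Multiple R" in the output
std_error = math.sqrt(ss_res / (n - k - 1))               # "Standard Error" = sqrt(MSE)
f_stat = (ss_reg / k) / (ss_res / (n - k - 1))            # F = MS regression / MS residual

print(f"R^2 = {r2:.2%}, adjusted R^2 = {adj_r2:.4f}, F = {f_stat:.4f}")
# R^2 = 52.15%, adjusted R^2 = 0.4417, F = 6.5386
```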

3. EXERCISES
1. What are the two primary uses for regression in business?
2. What are the assumptions about the random error component e?
3. Coefficient of Determination, R², measures the proportion of variation in Y that is
explained by X, and is often expressed as a percentage. (True/False)
4. Interpret a coefficient of determination of R² = 70%.
5. In the multiple linear equation
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + e,$$
β1 and β2 are called ________________________.

6. In $\hat{Y} = 16.4769 + 0.3899X_1 - 0.6233X_2$, interpret 0.3899 and -0.6233.

4. RECOMMENDED TEXTS AND READING MATERIALS


1. Ama, N.O., Mokgatlhe, L.L., Ramanathan, T.V. and Sediakgotla, K. (2008). Introduction to
Statistics. Zebra Publishing (Pty) Ltd, Windhoek, Namibia.

2. Ramanathan, T.V. and Sediakgotla, K. (2003). Lecture Notes on Introductory Business
Statistics. Bay Publishers, Gaborone.
