
Virtual Learning Environment - New Era University
College of Engineering and Architecture

Module Nine: Multiple Linear Regression

Luzale D. Henson, R.E.E., M.B.A., Ph.D.



LEARNING OUTCOMES
When you have completed this chapter, you will be able to:

1. Check the assumptions required to carry out multiple linear regression.
2. Interpret the results from the multiple regression procedure.
3. Interpret scatterplots and partial regression plots.
4. Interpret the generated tables from SPSS output.
Definition of Terms

Correlation – a statistical tool that measures the association between two or
more quantitative variables. It is concerned with the relationship between the
changes and movements of two variables. For two random variables X and Y, it
measures the strength of the linear relationship, is denoted by r, and
indicates the extent to which the points cluster about a straight line.

Regression analysis – a statistical process for estimating the relationships
among variables. It includes many techniques for modeling and analyzing
several variables when the focus is on the relationship between a dependent
variable and one or more independent variables.

Regression analysis is widely used for prediction and forecasting, where its
use overlaps substantially with the field of machine learning. It is also used
to understand which of the independent variables are related to the dependent
variable and to explore the forms of these relationships. In restricted
circumstances, regression analysis can be used to infer causal relationships
between the independent and dependent variables. However, this can suggest
illusory or spurious relationships, so caution is advisable: correlation does
not imply causation.
Multiple Linear Regression

 Used to determine how a dependent variable y is related to two or more
  independent variables.
 The multiple regression model takes the form
  y = β₀ + β₁x₁ + β₂x₂ + … + βₚxₚ + ε
 The estimated multiple regression equation is
  ŷ = b₀ + b₁x₁ + b₂x₂ + … + bₚxₚ

Note: In this topic, we focus on how to interpret the SPSS output rather than
on how to make the multiple regression computations.
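For readers working outside SPSS, here is a minimal sketch of fitting such a
model in Python with statsmodels. The file name sales.csv and the column names
sales, area_size, and competitors are hypothetical stand-ins for the module's
example data, not part of the original module.

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("sales.csv")                          # hypothetical data file
X = sm.add_constant(df[["area_size", "competitors"]])  # adds the b0 (intercept) term
y = df["sales"]                                        # dependent variable y

model = sm.OLS(y, X).fit()    # ordinary least squares estimates of b0..bp
print(model.summary())        # coefficients, R-squared, F-test, etc.
```

The least-squares estimates printed here are the same quantities SPSS reports
in its Coefficients table.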

Assumptions Underlying Multiple Linear Regression

 The dependent variable is measured at the continuous level (if not, use
  ordinal regression).
 There are at least two independent variables, which can be either
  continuous or categorical.
 Independence of observations (use the Durbin-Watson statistic).
 There is a linear relationship between the dependent variable and each of
  the independent variables (use scatterplots).
 No significant outliers, high-leverage points, or highly influential points.
 The standard deviations of these normal distributions are equal
  (homoscedasticity).
 For each value of X, there is a group of Y values, and these Y values are
  approximately normally distributed.
 The means of these normal distributions of Y values all lie on the straight
  line of regression.
 The data must not show multicollinearity (high correlation between two or
  more variables); r-values of .9 or higher indicate redundant predictor
  variables (a check is sketched after this list).
 Adequate sample size (at least 15 observations per predictor variable).
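A hedged sketch of the multicollinearity check referenced above, using
variance inflation factors from statsmodels; a VIF well above 10 roughly
corresponds to the r ≈ .9 redundancy threshold. Data and column names are
hypothetical, as before.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("sales.csv")                          # hypothetical data file
X = sm.add_constant(df[["area_size", "competitors"]])

# One VIF per column (the const row can be ignored); a large VIF means that
# predictor is nearly a linear combination of the others (multicollinearity).
for i, name in enumerate(X.columns):
    print(name, variance_inflation_factor(X.values, i))
```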
In SPSS

Analyze → Regression → Linear

Use the stepwise method: it first chooses the IV with the highest correlation
with the DV, then sequentially looks for the next predictor using its
semi-partial correlation with the DV.
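SPSS's stepwise algorithm also removes variables whose contribution becomes
non-significant as others enter; the following is only a simplified
forward-selection sketch of the idea, assuming a DataFrame df with a
hypothetical sales column.

```python
import pandas as pd
import statsmodels.api as sm

def forward_select(df, target, candidates):
    """Greedy forward selection by adjusted R-squared (simplified stepwise)."""
    chosen, best_adj = [], float("-inf")
    candidates = list(candidates)
    while candidates:
        # Score each remaining candidate when added to the chosen set.
        scores = {c: sm.OLS(df[target],
                            sm.add_constant(df[chosen + [c]])).fit().rsquared_adj
                  for c in candidates}
        best = max(scores, key=scores.get)
        if scores[best] <= best_adj:
            break                          # no candidate improves the model
        best_adj = scores[best]
        chosen.append(best)
        candidates.remove(best)
    return chosen

# Example: forward_select(df, "sales", ["area_size", "competitors"])
```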

In SPSS
Multiple Linear Regression

 A model fits the data well if the differences between the observed values
  and the model's predicted values are small (check residual plots; a sketch
  follows this list).
 R-squared (between 0% and 100%) reveals how close the data are to the fitted
  regression line. The higher the R-squared, the better the model fits your
  data, but it cannot determine whether coefficient estimates and predictions
  are biased, and it does not by itself indicate whether a regression model is
  adequate.
 Partial correlation accounts for the influence of the control variables on
  both the dependent and the independent variable.
 Part (semi-partial) correlation takes into account the influence of the
  control variables on the independent variable only.
 Collinearity diagnostics flag cases where two predictor variables are
  near-perfect linear combinations of one another.
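A minimal sketch of the residual and R-squared checks described in the list
above, refitting the hypothetical sales.csv model; small, patternless
residuals scattered around zero indicate a good fit.

```python
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt

df = pd.read_csv("sales.csv")                          # hypothetical data file
model = sm.OLS(df["sales"],
               sm.add_constant(df[["area_size", "competitors"]])).fit()

print("R-squared:", model.rsquared)      # closeness to the fitted line

plt.scatter(model.fittedvalues, model.resid)   # residual plot
plt.axhline(0, color="grey")
plt.xlabel("Predicted values")
plt.ylabel("Residuals")
plt.show()
```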

In SPSS
Multiple Linear Regression

This plot tests the assumption of homoscedasticity (constant error variance).

 *ZRESID (standardized residuals)
 *ZPRED (standardized predicted values)
 Plotting *ZRESID against *ZPRED checks linearity and homogeneity of error
  variance; *ZRESID also helps identify outliers. A sketch of this plot
  follows.
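A hedged Python analogue of the SPSS *ZRESID-versus-*ZPRED plot, on the same
hypothetical data (SPSS's exact standardization differs slightly, but the
diagnostic reading is the same): a shapeless band around zero supports
linearity and equal error variance.

```python
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt

df = pd.read_csv("sales.csv")                          # hypothetical data file
model = sm.OLS(df["sales"],
               sm.add_constant(df[["area_size", "competitors"]])).fit()

# Standardize residuals and predicted values, analogous to *ZRESID / *ZPRED.
zresid = (model.resid - model.resid.mean()) / model.resid.std()
zpred = (model.fittedvalues - model.fittedvalues.mean()) / model.fittedvalues.std()

plt.scatter(zpred, zresid)
plt.axhline(0, color="grey")
plt.xlabel("Standardized predicted values (*ZPRED)")
plt.ylabel("Standardized residuals (*ZRESID)")
plt.show()
```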
In SPSS
Normality Test:
Go to Analyze → Descriptive Statistics → Explore
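Outside SPSS, a comparable normality check on the residuals is the
Shapiro-Wilk test, which the Explore dialog also reports; a sketch with scipy,
again on the hypothetical data:

```python
import pandas as pd
import statsmodels.api as sm
from scipy import stats

df = pd.read_csv("sales.csv")                          # hypothetical data file
model = sm.OLS(df["sales"],
               sm.add_constant(df[["area_size", "competitors"]])).fit()

stat, p = stats.shapiro(model.resid)     # Shapiro-Wilk test on the residuals
print(f"W = {stat:.3f}, p = {p:.3f}")    # p > .05: normality not rejected
```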

In SPSS

It appears that competing scores has a negative Pearson correlation with the
DV, while the variable areaSizeper1000families has the highest r-value.

In SPSS

Adjusted R-square takes into account the sample size and the number of
predictors. The Durbin-Watson statistic tests the hypothesis that there is
serial correlation in the data. As a rule of thumb, if the value is within
1.5-2.5, there is no meaningful serial correlation.
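Both quantities can be read off a statsmodels fit as well (hypothetical data
as before):

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

df = pd.read_csv("sales.csv")                          # hypothetical data file
model = sm.OLS(df["sales"],
               sm.add_constant(df[["area_size", "competitors"]])).fit()

print("Adjusted R-squared:", model.rsquared_adj)
print("Durbin-Watson:", durbin_watson(model.resid))   # ~1.5-2.5: no serial correlation
```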

In SPSS

The most important row here is row 5 (model 5), as it takes into account all
five IVs, with p < 0.001, F(5, 21) = 611.59.
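For comparison, the overall F-test that SPSS's ANOVA table reports can be
pulled from a statsmodels fit like so (hypothetical two-predictor data; the
slide's F(5, 21) = 611.59 comes from its own five-predictor model):

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("sales.csv")                          # hypothetical data file
model = sm.OLS(df["sales"],
               sm.add_constant(df[["area_size", "competitors"]])).fit()

# Overall F-test of the regression, as reported in SPSS's ANOVA table.
print(f"F({int(model.df_model)}, {int(model.df_resid)}) = "
      f"{model.fvalue:.2f}, p = {model.f_pvalue:.4g}")
```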

In SPSS

The coefficient for areasizeper1000families (model 1) means that for a
one-unit increase in areasizeper1000families, we would expect a 35.635
increase in annualnetsalesper1000usd. Also, from the standardized beta
coefficients, a one-SD increase in areasizeper1000families leads to a
.954 SD increase in annualnetsalesper1000usd.
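To see how the standardized (beta) coefficients relate to the unstandardized
ones, one hedged approach is to refit on z-scored variables (hypothetical
data; the .954 above comes from the slide's own dataset, not this sketch):

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("sales.csv")        # hypothetical, all-numeric data file
z = (df - df.mean()) / df.std()      # z-score every column

# The slope on z-scored data is the standardized beta: the SD change in the
# DV per one-SD change in the predictor.
beta = sm.OLS(z["sales"], sm.add_constant(z[["area_size"]])).fit().params
print(beta)
```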


In SPSS

To report the result in APA format:

A multiple regression was run to predict <DV> from <IV1, IV2, IV3, ...>. These
variables statistically significantly predicted the DV, F(df1, df2) =
<F-value>, p < .05, R² = <value>. All IVs added statistically significantly to
the prediction, p < .05.
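As an illustration only, a small helper (not part of the module) that fills
this template from a fitted statsmodels result:

```python
def apa_sentence(model, dv, ivs):
    """Format a fitted statsmodels OLS result in the APA style shown above."""
    return (f"A multiple regression was run to predict {dv} from "
            f"{', '.join(ivs)}. These variables statistically significantly "
            f"predicted {dv}, F({int(model.df_model)}, {int(model.df_resid)}) "
            f"= {model.fvalue:.2f}, p < .05, R² = {model.rsquared:.3f}.")

# Example: print(apa_sentence(model, "annual net sales",
#                             ["area size", "competing stores"]))
```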
