Bio2 Module 4 - Multiple Linear Regression
Multivariate Analysis
Getu Degu
November 2008
Multivariate Analysis
Multivariate analysis refers to the analysis of data that
takes into account a number of explanatory variables
and one outcome variable simultaneously.
I) Multiple linear regression

Multiple linear regression (we often refer to this method simply as multiple regression) is an extension of the most fundamental model describing the linear relationship between two variables. It relates several explanatory variables, X1, X2, . . ., Xn, to a single response variable, denoted Y, by a linear equation. The general linear equation is:

Y = a + b1X1 + b2X2 + . . . + bnXn

Where:

a is the constant (intercept), and the regression coefficients b1 . . . bn represent the independent contributions of each explanatory variable to the prediction of the dependent variable. X1 . . . Xn represent the individual’s particular set of values for the independent variables.
Assumptions
a) First of all, as is evident in the name multiple linear regression, it is assumed that the relationship between the dependent variable and each continuous explanatory variable is linear. We can examine this assumption for any variable by plotting the residuals (the differences between observed values of the dependent variable and those predicted by the regression equation) against that variable (i.e., by using bivariate scatter plots). Any curvature in the pattern will indicate that a non-linear relationship is more appropriate; transformation of the explanatory variable may be considered.
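The linearity check just described can be sketched numerically. The data below are invented purely for illustration; in practice one would inspect the scatter plot of residuals against the variable:

```python
# Sketch of the linearity check: fit a straight line, then see whether
# the residuals still carry a systematic pattern in X.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 + 0.5 * x**2 + rng.normal(0, 1, 50)   # truly curved relationship

# Simple linear fit y ~ a + b*x (intercept via a column of ones).
A = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ coef

# Curvature left in the residuals shows up as an association with a
# curved term such as x**2; here we summarise it with a correlation,
# though a scatter plot is the usual diagnostic.
curvature = abs(np.corrcoef(resid, x**2)[0, 1])
```

A clearly non-zero association of the residuals with the squared term suggests the linear form is inadequate and a transformation may be worth considering.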
Predicted and Residual Scores
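As a minimal sketch of predicted and residual scores: predicted scores are the fitted values (Y-hat) produced by the regression equation, and residual scores are the differences Y minus Y-hat. The example below uses the first six survey rows from the data later in this handout, with X2 and X3 chosen arbitrarily as illustrative predictors:

```python
# Minimal sketch: predicted scores are the fitted values from the
# regression equation; residual scores are observed minus predicted.
import numpy as np

# First six survey rows: birth weight (X1) with X2 and X3 as predictors.
X = np.array([[170.0, 32.0],
              [156.0, 35.0],
              [166.0, 32.0],
              [168.0, 32.0],
              [162.0, 28.0],
              [172.0, 29.0]])
y = np.array([3.6, 2.5, 3.0, 3.2, 2.8, 4.0])

# Prepend a column of ones so the intercept a is estimated as well.
A = np.column_stack([np.ones(len(y)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

predicted = A @ coef          # predicted scores (Y-hat)
residuals = y - predicted     # residual scores (Y - Y-hat)
```

With an intercept in the model, the residual scores sum to zero by construction.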
Residual Variance and R-square
♣ The R-square value is an indicator of how well the model fits the data.

The sum of squares due to regression (SSR) divided by the total sum of squares (TSS), R-square = SSR/TSS, is the proportion of the variability accounted for by the regression. Therefore, the percentage variability accounted for or explained by the regression is 100 times this proportion.
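A small numeric sketch of this computation; the predicted values here are invented purely for illustration:

```python
# Sketch: R-square as SSR/TSS for a set of observed and predicted values.
import numpy as np

y = np.array([3.6, 2.5, 3.0, 3.2, 2.8, 4.0])          # observed
y_hat = np.array([3.5, 2.6, 3.1, 3.1, 2.9, 3.9])      # predicted (illustrative)

tss = np.sum((y - y.mean()) ** 2)        # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)    # sum of squares due to regression

r_square = ssr / tss                     # proportion of variability explained
percent_explained = 100 * r_square       # same quantity as a percentage
```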
Interpreting the Correlation Coefficient R
Choice of the Number of Variables
In large samples the omission of non-significant
variables will have little effect on the other regression
coefficients.
Stepwise regression
Forward stepwise regression: The first step in many analyses of multivariate data is to examine the simple relation between each potential explanatory variable and the outcome variable of interest, ignoring all the other variables. Forward stepwise regression analysis uses this analysis as its starting point. Steps in applying this method are:
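The individual steps can be sketched as follows. This sketch uses an R-square improvement threshold as a simple stand-in for the formal significance test usually applied at each step:

```python
# Sketch of forward stepwise selection: start from the single best
# predictor, then keep adding the variable that most improves R-square
# until no candidate improves it by more than a chosen threshold.
import numpy as np

def r_square(X, y):
    """R-square of an OLS fit of y on X (intercept included)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

def forward_stepwise(X, y, min_gain=0.01):
    selected, remaining = [], list(range(X.shape[1]))
    best_r2 = 0.0
    while remaining:
        # Try each remaining variable alongside those already selected.
        gains = [(r_square(X[:, selected + [j]], y), j) for j in remaining]
        r2, j = max(gains)
        if r2 - best_r2 < min_gain:      # no worthwhile improvement left
            break
        selected.append(j)
        remaining.remove(j)
        best_r2 = r2
    return selected, best_r2
```

In practice statistical packages use an F-test or p-value criterion for entry rather than a raw R-square gain; the structure of the loop is the same.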
Backward stepwise regression:
Multicollinearity
This is a common problem in many correlation
analyses. Imagine that you have two predictors (X
variables) of a person's height:
Multicollinearity may exist, and it may only manifest itself after several variables have already been entered into the regression equation.
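One common numeric diagnostic (not shown in the original slides) is the variance inflation factor, VIF_j = 1/(1 - R²_j), where R²_j is the R-square from regressing X_j on the remaining explanatory variables; values above about 10 are a widely used warning sign. A sketch:

```python
# Sketch: variance inflation factors (VIF) as a multicollinearity check.
# VIF_j = 1 / (1 - R2_j), with R2_j from regressing X_j on the others.
import numpy as np

def vif(X):
    n, p = X.shape
    out = []
    for j in range(p):
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        tss = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        r2 = 1 - np.sum(resid ** 2) / tss
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Two nearly identical predictors inflate each other's VIF, while an
# independent predictor stays near 1.
rng = np.random.default_rng(1)
a = rng.normal(size=100)
X = np.column_stack([a, a + rng.normal(0, 0.01, size=100),
                     rng.normal(size=100)])
v = vif(X)
```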
Example on multiple regression
The following data were taken from a survey of women
attending an antenatal clinic. The objectives of the study
were to identify the factors responsible for low birth weight
and to predict women 'at risk' of having a low birth weight
baby.
Notations: BW = Birth weight (kg) of the child = X1
Number X1 X2 X3 X4 X5 X6
1 3.6 170 32 30 800 280
2 2.5 156 35 40 300 255
3 3.0 166 32 40 700 265
4 3.2 168 32 32 900 275
5 2.8 162 28 25 400 268
6 4.0 172 29 27 1000 280
7 3.4 170 27 35 650 276
8 1.8 152 22 40 180 255
9 2.4 156 18 24 250 263
10 3.6 169 24 24 750 278
32 3.4 169 29 35 750 275
33 3.8 175 32 45 780 276
34 2.8 167 26 40 330 266
35 3.0 168 36 42 350 275
36 3.3 169 38 50 400 276
37 2.5 160 19 48 250 259
38 1.8 156 17 27 150 253
39 3.6 175 31 40 1200 278
40 3.2 170 29 39 1000 274
Answer the following questions based on the above data
mother's age of 49 years.
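Only some of the survey rows are reproduced above, and the question fragment suggests a prediction task. As a sketch, birth weight (X1) can be regressed on X2 through X6 by ordinary least squares using just those rows:

```python
# Sketch: OLS fit of birth weight (X1) on X2..X6, using only the survey
# rows reproduced in this handout (row numbers dropped).
import numpy as np

data = np.array([
    # X1   X2   X3  X4   X5   X6
    [3.6, 170, 32, 30,  800, 280],
    [2.5, 156, 35, 40,  300, 255],
    [3.0, 166, 32, 40,  700, 265],
    [3.2, 168, 32, 32,  900, 275],
    [2.8, 162, 28, 25,  400, 268],
    [4.0, 172, 29, 27, 1000, 280],
    [3.4, 170, 27, 35,  650, 276],
    [1.8, 152, 22, 40,  180, 255],
    [2.4, 156, 18, 24,  250, 263],
    [3.6, 169, 24, 24,  750, 278],
    [3.4, 169, 29, 35,  750, 275],
    [3.8, 175, 32, 45,  780, 276],
    [2.8, 167, 26, 40,  330, 266],
    [3.0, 168, 36, 42,  350, 275],
    [3.3, 169, 38, 50,  400, 276],
    [2.5, 160, 19, 48,  250, 259],
    [1.8, 156, 17, 27,  150, 253],
    [3.6, 175, 31, 40, 1200, 278],
    [3.2, 170, 29, 39, 1000, 274],
])

y = data[:, 0]                               # X1: birth weight (kg)
X = data[:, 1:]                              # X2..X6
A = np.column_stack([np.ones(len(y)), X])    # intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

fitted = A @ coef
r_square = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
```

Substituting a woman's values of X2 through X6 into the fitted equation gives her predicted birth weight, which is how the 'at risk' prediction mentioned in the study objectives would be made.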