BA Module 5 Summary
Multiple Regression
→ We use single variable linear regression to investigate the relationship between a dependent variable and one
independent variable.
• A coefficient in a single variable linear regression characterizes the gross relationship between the
independent variable and the dependent variable.
→ We use multiple regression to investigate the relationship between a dependent variable and multiple
independent variables.
→ Forecasting with a multiple regression equation is similar to forecasting with a single variable linear model.
However, instead of entering only one value for a single independent variable, we input a value for each of the
independent variables.
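A minimal sketch of this workflow in Python (statsmodels), for illustration only: the data file and the column names "price", "house_size", and "distance" are hypothetical placeholders, not part of the course materials.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Fit a multiple regression of price on two independent variables.
    data = pd.read_csv("housing.csv")                  # hypothetical data set
    model = smf.ols("price ~ house_size + distance", data=data).fit()

    # Forecast: supply a value for EACH independent variable, not just one.
    new_point = pd.DataFrame({"house_size": [2000], "distance": [5]})
    print(model.predict(new_point))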
→ As with single variable linear regression, it is important to evaluate several metrics to determine whether a multiple
variable linear regression model is a good fit for our data.
• For multiple regression we rely less on scatter plots and more on numerical values and residual plots because
visualizing three or more variables can be difficult.
→ Because R² never decreases when independent variables are added to a regression, it is important to adjust it when
assessing and comparing the fit of a multiple regression model. This adjustment compensates for the increase in R²
that results solely from increasing the number of independent variables (the standard formula is given after the
bullets below).
• Adjusted R² is provided in the regression output.
• It is particularly important to look at Adjusted R², rather than R², when comparing regression models with
different numbers of independent variables.
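For reference, the standard formula behind this adjustment (a well-known result, though not spelled out in the summary itself) is Adjusted R² = 1 − (1 − R²) × (n − 1) / (n − k − 1), where n is the number of observations and k is the number of independent variables.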
→ In addition to analyzing Adjusted R², we must test whether the relationship between the independent and
dependent variables is linear and significant. We do this by analyzing the regression’s residual plots and the
p-values associated with each independent variable’s coefficient.
→ For multiple regression models, because it is difficult to view the data in a simple scatter plot, residual plots are an
indispensable tool for detecting whether the linear model is a good fit.
• There is a residual plot for each independent variable included in the regression model.
• We can graph a residual plot for each independent variable to help detect patterns such as
heteroskedasticity and nonlinearity.
• As with single variable regression models, if the underlying relationship is linear, the residuals follow a normal
distribution with a mean of zero and fixed variance.
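As a rough illustration, residual plots can be drawn in Python with matplotlib, reusing the hypothetical model and data from the earlier sketch:

    import matplotlib.pyplot as plt

    # One residual plot per independent variable: look for funnel shapes
    # (heteroskedasticity) or curvature (nonlinearity).
    for var in ["house_size", "distance"]:          # hypothetical variable names
        plt.scatter(data[var], model.resid)
        plt.axhline(0, color="grey")
        plt.xlabel(var)
        plt.ylabel("Residual")
        plt.show()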
→ We should also analyze the p-values of the independent variables’ coefficients to determine whether there is a
significant relationship between the variables in the model. If the p-value of an independent variable’s coefficient
is less than 0.05, we can be 95% confident that there is a significant linear relationship between that independent
variable and the dependent variable.
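Continuing the same hypothetical statsmodels example, the coefficient p-values can be read directly from the regression output:

    # model.summary() prints the full coefficient table, including p-values;
    # model.pvalues exposes the same numbers programmatically.
    print(model.summary())
    print(model.pvalues < 0.05)   # True where a coefficient is significant at the 5% level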
→ Multiple regression requires us to be aware of the possibility of multicollinearity among the independent variables.
• Multicollinearity occurs when there is a strong linear relationship among two or more of the independent
variables.
• Indications of multicollinearity include seeing an independent variable’s p-value increase when one or more
other independent variables are added to a regression model.
• We may be able to reduce multicollinearity by either increasing the sample size or removing one (or more) of
the collinear variables.
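One common numerical check, sketched here with the same hypothetical data (variance inflation factors are not discussed in the summary above; they are shown only as an example diagnostic):

    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    # Pairwise correlations among the independent variables.
    print(data[["house_size", "distance"]].corr())

    # Variance inflation factors; values well above 5 to 10 suggest collinearity.
    X = sm.add_constant(data[["house_size", "distance"]])
    for i, name in enumerate(X.columns):
        print(name, variance_inflation_factor(X.values, i))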
EXCEL SUMMARY
Recall the Excel functions and analyses covered in this course and make sure to familiarize yourself with all of the
necessary steps, syntax, and arguments. We have provided some additional information for the more complex
functions listed below. As usual, the arguments shown in square brackets are optional.