6. Simple and Multiple Regression
Regression Analysis
• Independent variable: The variable that explains, or predicts, the other variable. Its values do not depend on the other variable, hence the name independent variable. We denote it as X.
LINEAR REGRESSION: The line of regression is the graphical representation of the best estimate of one variable for any given value of the other variable.
If X and Y are two variables whose relationship is to be described, the line that gives the best estimate of Y for any value of X is called the regression line of Y on X.
Assumptions Underlying Linear Regression
For each value of X, there is a group of Y values, and these Y values satisfy the following assumptions:
1. Normality: For any fixed value of X, Y is normally distributed. The means of these normal distributions of Y values all lie on the straight line of regression. The standard deviations of these normal distributions are equal.
2. Independence: The Y values are statistically independent. This means that in the selection of a sample, the Y values chosen for a particular X value do not depend on the Y values for any other X value.
The least square method of fitting a line of best fit requires minimizing the
sum of the squares of vertical deviations of each observed Y value from the
fitted line.
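As a minimal sketch of the least squares idea, the snippet below fits a line Ŷ = a + bX to a small made-up data set and checks that nudging the fitted coefficients only increases the sum of squared vertical deviations. The function name `least_squares_line` and the data values are illustrative assumptions, not taken from the slides.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line Y-hat = a + b*X that minimizes
    the sum of squared vertical deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b


def sum_squared_deviations(xs, ys, a, b):
    """Sum of squared vertical distances from each point to the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, chosen only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
sse = sum_squared_deviations(xs, ys, a, b)

# Any other line gives a larger sum of squared deviations
sse_shifted = sum_squared_deviations(xs, ys, a + 0.1, b)
sse_tilted = sum_squared_deviations(xs, ys, a, b + 0.1)

print(f"a = {a:.3f}, b = {b:.3f}, SSE = {sse:.3f}")
```

For this data set the fitted line is approximately Ŷ = 2.2 + 0.6X.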
$$s_{y.x} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}$$

$$s_{y.x} = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{n - 2}}$$
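The standard error of the estimate can be computed either from the residuals or from the shortcut form in terms of ΣY², ΣY and ΣXY. The sketch below evaluates both on a small made-up data set. The shortcut only agrees with the residual form when a and b are the least squares estimates; the values a = 2.2 and b = 0.6 used here are the least squares estimates for this illustrative data, and the function names are assumptions.

```python
import math


def standard_error_from_residuals(xs, ys, a, b):
    """s_yx = sqrt( sum (Y - Y-hat)^2 / (n - 2) )."""
    n = len(xs)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2))


def standard_error_shortcut(xs, ys, a, b):
    """s_yx = sqrt( (sum Y^2 - a*sum Y - b*sum XY) / (n - 2) ).
    Valid only when a and b are the least squares estimates."""
    n = len(xs)
    sum_y2 = sum(y * y for y in ys)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))


# Hypothetical data; a and b are its least squares intercept and slope
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

s_residual = standard_error_from_residuals(xs, ys, a, b)
s_shortcut = standard_error_shortcut(xs, ys, a, b)
print(f"{s_residual:.4f} {s_shortcut:.4f}")
```

Both forms give the same value up to floating-point rounding.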
Finding the Regression Equation - Example
Standard Error of the Estimate - Example
Standard Error of the Estimate
Graphical illustration of the differences between actual Y and estimated Y: $(Y - \hat{Y})$
PROPERTIES OF REGRESSION COEFFICIENTS
1. The slope of a regression line is called the regression coefficient. It gives the change in the dependent variable for a unit change in the independent variable. For paired data on X and Y there are two regression lines, the regression line of Y on X and the regression line of X on Y, so there are two regression coefficients: $b_{YX}$ (Y on X) and $b_{XY}$ (X on Y).
The two regression coefficients always have the same sign, and the correlation coefficient r has that sign as well.
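The sign property can be checked numerically. The sketch below computes both regression coefficients and r for a small made-up data set and confirms that all three share a sign, and also that their product equals r², a standard identity. The names `b_yx`, `b_xy` and the data are illustrative assumptions.

```python
import math

# Hypothetical paired data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sums of squares and cross-products about the means
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

b_yx = sxy / sxx                 # slope of the regression line of Y on X
b_xy = sxy / syy                 # slope of the regression line of X on Y
r = sxy / math.sqrt(sxx * syy)   # correlation coefficient

# All three quantities share the sign of sxy
same_sign = (b_yx > 0) == (b_xy > 0) == (r > 0)

print(f"b_yx = {b_yx:.3f}, b_xy = {b_xy:.3f}, r = {r:.3f}")
```

Because all three quantities have the same numerator, the sum of cross-products, and strictly positive denominators, they cannot differ in sign.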
Problem
Multiple Regression Analysis
Regression Plane for a 2-Independent Variable Linear
Regression Equation
Problem
• Here,
• Z = Financial distress
• X1 (Liquidity Ratio) = Net Working Capital / Total Assets
• X2 (Profitability Ratio) = Retained Earnings / Total Assets
• X3 (Return on Assets) = EBIT / Total Assets
• X4 (Solvency Ratio) = Market Value of Equity / Book Value of Total Liabilities
Multiple Linear Regression - Example
The Multiple Regression Equation – Interpreting the
Regression Coefficients
Multiple Standard Error of Estimate
Multiple Regression and Correlation Assumptions
The ANOVA Table
Minitab – the ANOVA Table
Coefficient of Multiple Determination ($R^2$)
$$R^2 = \frac{SSR}{SS\ total} = \frac{171{,}220}{212{,}916} = 0.804$$
Adjusted Coefficient of Determination
Evaluating the Assumptions of Multiple Regression
Analysis of Residuals
Scatter Diagram
Residual Plot
Distribution of Residuals
Both MINITAB and Excel offer another graph that helps to evaluate the assumption of normally distributed residuals. It is called a normal probability plot and is shown to the right of the histogram.
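To show what a normal probability plot contains, the sketch below computes its coordinates by hand: the sorted residuals paired with the standard normal quantiles at the same ranks. Points that fall close to a straight line support the normality assumption. The plotting position (i - 0.5)/n is one common convention and is an assumption here; MINITAB and Excel may use a different one. The residual values are made up.

```python
from statistics import NormalDist


def normal_probability_points(residuals):
    """Pair each sorted residual with the standard normal quantile
    at its plotting position (i - 0.5) / n."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, value in enumerate(ordered, start=1):
        p = (i - 0.5) / n  # assumed plotting-position convention
        points.append((std_normal.inv_cdf(p), value))
    return points


# Hypothetical residuals from a fitted regression
residuals = [-1.2, 0.4, 0.1, -0.3, 1.0]
points = normal_probability_points(residuals)

for theoretical, observed in points:
    print(f"{theoretical:8.3f} {observed:8.3f}")
```

Plotting the first column on the horizontal axis against the second on the vertical axis gives the normal probability plot.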
Multicollinearity
• The variance inflation factor (VIF) for the j-th independent variable is $VIF = \frac{1}{1 - R_j^2}$.
• The term $R_j^2$ refers to the coefficient of determination, where the selected independent variable is used as a dependent variable and the remaining independent variables are used as independent variables.
• A VIF greater than 10 is considered unsatisfactory, indicating that the independent variable should be removed from the analysis.
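A minimal sketch of the VIF calculation follows, using the standard definition VIF = 1 / (1 - R_j²). To stay dependency-free it covers only the case of two independent variables, where R_j² is simply the squared correlation between them; with more variables R_j² would come from a full multiple regression. The function names and data values are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def vif(r_squared_j):
    """Variance inflation factor from R_j^2 (must be in [0, 1))."""
    if not 0 <= r_squared_j < 1:
        raise ValueError("R_j^2 must satisfy 0 <= R_j^2 < 1")
    return 1.0 / (1.0 - r_squared_j)


# Hypothetical values of two independent variables
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]

# With only two independent variables, R_j^2 is the squared correlation
r_squared = pearson_r(x1, x2) ** 2
vif_x1 = vif(r_squared)

# Rule of thumb from the text: a VIF above 10 is unsatisfactory
is_unsatisfactory = vif_x1 > 10
print(f"R_j^2 = {r_squared:.3f}, VIF = {vif_x1:.2f}")
```

An R_j² of 0.90 corresponds to the cutoff VIF of 10, so the rule of thumb flags a variable once the other independent variables explain more than 90% of its variation.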
Multicollinearity – Example
Refer to the data in the table, which relates the heating cost to the independent variables outside temperature, amount of insulation, and age of furnace.
Develop a correlation matrix for all the independent variables.
Does it appear there is a problem with multicollinearity?
Find and interpret the variance inflation factor for each of the independent variables.
Correlation Matrix - Minitab
VIF – Minitab Example
Coefficient of Determination
Residual Plot versus Fitted Values
• The graph below shows the residuals plotted on the vertical axis and the fitted values on the horizontal axis.
• Note the run of residuals above the mean of the residuals, followed by a run below the mean. A scatter plot such as this would indicate possible autocorrelation.
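Runs of residuals can also be checked numerically. The slides rely on the plot alone; the sketch below swaps in the Durbin-Watson statistic, a standard measure that is not part of the original material. Values near 2 suggest no first-order autocorrelation, values well below 2 suggest positive autocorrelation (long runs), and values well above 2 suggest negative autocorrelation. Both residual series are made up.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals. Ranges from 0 to 4."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical residuals: a run above zero followed by a run below zero
run_pattern = [1.0, 1.2, 0.8, 0.5, -0.4, -0.9, -1.1, -1.1]

# Hypothetical residuals that alternate in sign
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

dw_runs = durbin_watson(run_pattern)
dw_alternating = durbin_watson(alternating)
print(f"runs: {dw_runs:.2f}, alternating: {dw_alternating:.2f}")
```

The run pattern yields a statistic far below 2, matching the visual impression of possible autocorrelation.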
Qualitative Independent Variables
• Frequently we wish to use nominal-scale variables (such as gender, whether the home has a swimming pool, or whether the sports team was the home or the visiting team) in our analysis. These are called qualitative variables.
• To use a qualitative variable in regression analysis, we use a scheme of dummy variables in which one of the two possible conditions is coded 0 and the other 1.
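The dummy-variable scheme can be sketched as follows: a yes/no attribute is coded 1/0 and then used like any other independent variable. With a single dummy regressor, the fitted intercept is the mean of the group coded 0 and the slope is the difference between the two group means. The data (selling prices in thousands, with or without a pool) and all names are made up for illustration.

```python
# Hypothetical homes: whether each has a pool, and its price in $000
has_pool = ["yes", "no", "yes", "no"]
price = [250, 200, 270, 210]

# Dummy coding: one condition is coded 1, the other 0
pool_dummy = [1 if answer == "yes" else 0 for answer in has_pool]

# Simple least squares fit of price on the dummy variable
n = len(price)
mean_x = sum(pool_dummy) / n
mean_y = sum(price) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pool_dummy, price))
sxx = sum((x - mean_x) ** 2 for x in pool_dummy)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Group means for comparison
with_pool = [y for x, y in zip(pool_dummy, price) if x == 1]
without_pool = [y for x, y in zip(pool_dummy, price) if x == 0]
mean_with = sum(with_pool) / len(with_pool)
mean_without = sum(without_pool) / len(without_pool)

print(f"intercept = {intercept:.1f}, slope = {slope:.1f}")
```

Which condition receives the 1 is arbitrary; swapping the coding flips the sign of the slope and moves the intercept to the other group's mean.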