A Tutorial On How To Run A Simple Linear Regression in Excel
The objective of this tutorial is to demonstrate how to run a simple linear regression using Excel
2013 and how to interpret the regression results. This guide will be useful for your group project
in FICB 213 Corporate Finance.
A simple linear regression fits the equation Y = a + bX, where:
a = the constant (intercept); the value of Y where the regression line crosses the Y axis.
b = Beta, the coefficient of X; the slope of the regression line; how much Y changes
for each one-unit change in X.
For example, if you want to examine how leverage affects dividends, then leverage is the
independent variable X and dividend is the dependent variable Y.
The main reasons to run a simple regression between X and Y are as follows:
1. To find out the relationship between X and Y. Is there any significant relationship
between X and Y? If there is a relationship between X and Y, is it a positive or negative
relationship?
Whether the relationship between X and Y is positive or negative is given by the
sign of b (beta).
For example, if the estimated equation is Y = 0.2 + 0.5 X
then b = +0.5; this means that the higher the value of X, the higher the value of Y, i.e. X
has a positive relationship with Y.
On the other hand, if the estimated equation is Y = 0.2 - 0.5 X
then b = -0.5; this means that the higher the value of X, the lower the value of Y, i.e. X
has a negative relationship with Y.
The strength and direction of the relationship between X and Y are given by the correlation
coefficient R. For example, if R = +0.70 and the p-value is lower than 0.05, variable X is
positively correlated with variable Y at the 5 percent significance level. If instead
R = -0.70 (again with a p-value below 0.05), variable X is negatively correlated with
variable Y at the 5 percent significance level. If R is close to zero, e.g. R = 0.00001,
there is practically no linear relationship between X and Y.
2. To find out how much the independent variable X influences the dependent variable Y. The
proportion of the variation in Y that is explained by the variation in X is given by the value
of R-squared (R²). For example, if R² = 0.20, only 20 percent of the variation in Y is
explained by the variation in X. The remaining 80 percent could be due to other factors or
unexplained variables.
3. To predict the value of Y when you know X. For example, if the estimated equation is
Y = 0.2 + 0.5 X and you know that X = 30, then substituting into the equation gives the
predicted value Y = 0.2 + 0.5(30) = 15.2.
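The prediction step can be sketched in a few lines of Python. This is only an illustration: the function name predict_y is made up for this example, and the default coefficients are the ones from the example equation Y = 0.2 + 0.5 X.

```python
def predict_y(x, a=0.2, b=0.5):
    """Return the predicted Y for a given X, using Y = a + bX.

    The defaults a=0.2 and b=0.5 come from the example equation in the text;
    replace them with the intercept and slope from your own regression.
    """
    return a + b * x


print(round(predict_y(30), 2))  # 15.2
```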
The example data in Table 1 are plotted in Figure 1. You can see that there is roughly a
positive relationship between X and Y.
If you want to predict Y from X, you need to find the regression line, which gives the predicted Y.
Table 1. Example data.
X       Y
1.00    1.00
2.00    2.00
3.00    1.30
4.00    3.75
5.00    2.25
Linear regression consists of finding the best-fitting straight line through the points. The
best-fitting line is called a regression line. The black diagonal line in Figure 2 is the regression
line and consists of the predicted score on Y for each possible value of X. The vertical lines from
the points to the regression line represent the errors of prediction. As you can see, the red point is
very near the regression line; its error of prediction is small. By contrast, the yellow point is
much higher than the regression line and therefore its error of prediction is large.
Table 2. Predicted values and errors of prediction for the example data.
X       Y       Y'       Y-Y'      (Y-Y')²
1.00    1.00    1.210    -0.210    0.044
2.00    2.00    1.635     0.365    0.133
3.00    1.30    2.060    -0.760    0.578
4.00    3.75    2.485     1.265    1.600
5.00    2.25    2.910    -0.660    0.436
The last column in Table 2 shows the squared errors of prediction. The sum of the squared errors
of prediction shown in Table 2 is lower than it would be for any other regression line.
The formula for a regression line is
Y' = a + bX
where Y' is the predicted score, b is the slope of the line, and a is the Y intercept. The equation
for the line in Figure 2 is
Y' = 0.785 + 0.425X
For X = 1,
Y' = 0.785 +(0.425)(1) = 1.21.
For X = 2,
Y' = 0.785 + (0.425)(2) = 1.64.
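As a cross-check, the intercept and slope can be recomputed from the Table 1 data. The sketch below assumes the textbook closed-form least-squares solution (slope = sum of cross-deviations divided by the sum of squared X deviations, intercept = mean of Y minus slope times mean of X). The function names are illustrative, and this is not necessarily how Excel computes the line internally.

```python
# Example data from Table 1
xs = [1.00, 2.00, 3.00, 4.00, 5.00]
ys = [1.00, 2.00, 1.30, 3.75, 2.25]


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y' = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of X
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def sum_squared_errors(a, b, xs, ys):
    """Sum of squared errors of prediction, i.e. the sum of (Y - Y')^2."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


a, b = fit_line(xs, ys)
print(round(a, 3), round(b, 3))  # 0.785 0.425

best = sum_squared_errors(a, b, xs, ys)
print(round(best, 3))  # 2.791

# Nudging the intercept or the slope away from the fitted values
# makes the sum of squared errors larger.
print(best < sum_squared_errors(a + 0.1, b, xs, ys))  # True
print(best < sum_squared_errors(a, b + 0.1, xs, ys))  # True
```

The fitted values match the equation Y' = 0.785 + 0.425X, and the sum of squared errors agrees with the total of the last column of Table 2, apart from rounding.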
A REAL EXAMPLE
The case study "SAT and College GPA", conducted in the U.S., contains high school and
university grades for 105 computer science majors at a local state school. We now consider how
we could predict a student's university GPA if we knew his or her high school GPA.
Figure 3 shows a scatter plot of University GPA as a function of High School GPA. You can see
from the figure that there is a strong positive relationship. The correlation is 0.78. The regression
equation is
University GPA' = 1.097 + (0.675)(High School GPA)
Therefore, a student with a high school GPA of 3 would be predicted to have a university GPA of
University GPA' = 1.097 + (0.675)(3) = 3.12.
To run a regression in Excel, the Data Analysis menu must be installed in Excel.
Instructions on how to install the Data Analysis menu are given in the Appendix.
Open the Excel spreadsheet and click the DATA menu. If you see the Data Analysis
menu at the top right, as in the screenshot below, you can start running the regression.
To run a regression, click Data Analysis, choose Regression and click OK, as below:
The following screen will appear when you click OK. Now you have to decide
what to use as the Input Y Range and the Input X Range. Y should be the dependent
variable and X should be the independent variable. If you want to find out how
Leverage (Debt Ratio) affects Dividend Policy, then Leverage (Debt Ratio) should be
your X variable and Dividend Policy (you can choose whether to use DPS, Dividend
Payout or Dividend Yield) your Y variable. You may try all possible combinations,
because some of your regressions may produce non-significant results, from which
you cannot conclude anything. We can conclude something only if
the regression results are significant at the 5 percent level.
If you choose DPS as your Y range, click the range-selection button at the right of
the Input Y Range box, use the mouse to highlight the range of DPS data (e.g. from
D2 to D31), and click the button again. Repeat the same procedure for the X range;
here you can choose the Debt/Equity Ratio as your X variable. Tick Line Fit Plots,
as it will show you the regression line. Then click OK.
The following results will appear. Now what you need to do is to interpret the
regression results below.
Remember our X variable is Debt/Equity ratio and Y variable is Dividend Per Share.
The orange line in the graph represents the regression line which shows a positive
linear relationship between X and Y.
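Important figures for interpretation:
Multiple R = 0.4047 => This is the correlation coefficient between
variable X and Y.
This means that there is a positive linear relationship but a weak correlation
(because the value is less than 0.5) between X and Y. Note that Excel always reports
Multiple R as a non-negative value, so the positive direction is confirmed by the
upward-sloping regression line rather than by the sign of Multiple R.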
R Square = 0.1638 => This is the Coefficient of Determination, which
means that variable X explains only 16.38% of the variation in variable Y.
Here, this means that the Debt/Equity ratio explains only 16.38% of the variation in
Dividend Per Share. The remaining 83.62% is due to other factors which
are not included in the model.
The closer the value of R Square is to 1.0, the higher the ability of
the model to predict Y. That is why many research studies include more than
one X factor in the model, to increase its goodness of fit or predictive
ability.
Significance F = 0.02651 => This is the p-value of the F-test. Because it is less
than 0.05, the model is valid, i.e. statistically significant at the 5 percent level.
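To see where a Significance F value comes from, the sketch below rebuilds it from R Square. It rests on two assumptions that you should check against your own output: that the sample has n = 30 observations (the example range D2 to D31), and that the regression has a single X variable, in which case F = R²(n - 2) / (1 - R²) and the F-test p-value equals the two-sided p-value of a t-test with n - 2 degrees of freedom. The tail area is approximated with Simpson's rule using only the standard library; the function names are illustrative.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, upper=60.0, steps=20000):
    """Approximate P(|T| > |t|) by Simpson's rule on [|t|, upper].

    The tail beyond `upper` is negligible for the degrees of freedom
    used here; `steps` must be even.
    """
    a = abs(t)
    h = (upper - a) / steps
    total = t_pdf(a, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    return 2 * total * h / 3


r_squared = 0.1638  # R Square from the Excel output
n = 30              # ASSUMPTION: 30 observations (rows D2 to D31)

# F statistic for a regression with one X variable
f_stat = r_squared * (n - 2) / (1 - r_squared)
# With one X variable, F = t^2, so the p-value can be taken from the t distribution
p_value = two_sided_p(math.sqrt(f_stat), n - 2)

print(round(f_stat, 2))  # 5.48
print(p_value)           # should land near the reported 0.02651 if n = 30 is right
print(p_value < 0.05)    # True: significant at the 5 percent level
```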
Intercept = 0.07229 => This is the value of the constant, a, where the regression
line intercepts the Y axis.
Therefore the relationship between the Debt/Equity ratio and DPS can be written
as:
DPS = 0.07229 + b x (Debt/Equity ratio), where b is the X variable coefficient in the Excel output
(remember Y = a + bX)
However, if your Significance F is more than 0.05, the model is not valid, and
you cannot use the equation Y = a + bX because it has no ability to predict
the value of Y. If that happens, you can try changing the variable that you
use for Y or X. For example, try running the regression using Dividend Payout or Dividend Yield
in place of DPS. You can also try using Market Value as the Y variable to find out how leverage
affects Firm Value.
Correlation Coefficient, r:
The quantity r, called the linear correlation coefficient, measures the strength and
the direction of a linear relationship between two variables. The linear correlation
coefficient is sometimes referred to as the Pearson product moment correlation
coefficient in honor of its developer Karl Pearson.
The mathematical formula for computing r is:
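A minimal sketch, assuming the standard computational form of the Pearson correlation coefficient:

r = [nΣxy - (Σx)(Σy)] / sqrt([nΣx² - (Σx)²][nΣy² - (Σy)²])

where n is the number of (x, y) pairs. The Python code below applies this form to the Table 1 example data purely as an illustration; the function name pearson_r is made up for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient, using the computational formula."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Example data from Table 1
xs = [1.00, 2.00, 3.00, 4.00, 5.00]
ys = [1.00, 2.00, 1.30, 3.75, 2.25]

r = pearson_r(xs, ys)
print(round(r, 3))  # 0.627
```

The positive sign agrees with the upward slope of the regression line fitted to the same data.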
Coefficient of Determination, r² or R²:
The coefficient of determination, r², is useful because it gives the proportion of
the variance (fluctuation) of one variable that is predictable from the other
variable.
It is a measure that allows us to determine how certain one can be in making
predictions from a certain model/graph.
The coefficient of determination is the ratio of the explained variation to the total
variation.
The coefficient of determination is such that 0 ≤ r² ≤ 1, and denotes the strength
of the linear association between x and y.
The coefficient of determination represents the percentage of the variation in y that is
accounted for by the line of best fit. For example, if r = 0.922, then r² = 0.850, which
means that 85% of the total variation in y can be explained by the linear relationship
between x and y (as described by the regression equation). The other 15% of the
total variation in y remains unexplained.
The coefficient of determination is a measure of how well the regression line
represents the data. If the regression line passes exactly through every point on
the scatter plot, it would be able to explain all of the variation. The further the line is
away from the points, the less it is able to explain.
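The ratio of explained variation to total variation can be worked through on the Table 1 example data. The sketch below is illustrative only: it takes the fitted line Y' = 0.785 + 0.425X from the earlier example as given, and the variable names are made up.

```python
# Example data from Table 1 and the fitted line Y' = 0.785 + 0.425X
xs = [1.00, 2.00, 3.00, 4.00, 5.00]
ys = [1.00, 2.00, 1.30, 3.75, 2.25]
a, b = 0.785, 0.425

mean_y = sum(ys) / len(ys)
predicted = [a + b * x for x in xs]

# Explained variation: how far the predictions move away from the mean of Y
explained = sum((p - mean_y) ** 2 for p in predicted)
# Total variation: how far the actual Y values move away from the mean of Y
total = sum((y - mean_y) ** 2 for y in ys)
# Unexplained variation: the squared errors of prediction
unexplained = sum((y - p) ** 2 for y, p in zip(ys, predicted))

r_squared = explained / total
print(round(r_squared, 3))  # 0.393

# For a least-squares line, explained + unexplained = total
print(abs(explained + unexplained - total) < 1e-9)  # True
```

Here about 39 percent of the variation in Y is explained by the regression line; the remaining 61 percent or so is unexplained.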
APPENDIX: INSTALLING THE DATA ANALYSIS MENU
Here we see no Data Analysis menu at the top right of the screen.
Step 1: Click the File menu at the top left of the screen, then click Options.
The Data Analysis menu will now appear at the top right of the screen. You can now
begin your regression analysis.
For other versions of Excel (newer than Excel 2013, or older, e.g. Excel 2007 or below),
you can search YouTube, where many videos show how
to install Data Analysis step by step.