Module 3 PoM-Forecasting
Forecasting is an important tool for making informed business decisions. Regardless of the
size and profile of a company, forecasting helps the organization's management anticipate
trends in important business indicators, such as sales expectations or customer behaviour.
Forecasting is a valuable asset, but it requires specific skills and accurate data.
The data used in forecasting can come from either primary sources or secondary
sources.
Secondary sources provide information that has already been gathered and processed by a
third-party organization. Receiving the data in an organized and compiled way makes the
forecasting process quicker.
Simple linear regression and multiple linear regression are two forecasting techniques.
Linear Regression
In a cause and effect relationship, the independent variable is the cause, and the dependent
variable is the effect. Least squares linear regression is a method for predicting the value
of a dependent variable Y, based on the value of an independent variable X.
Simple linear regression is appropriate when the following conditions are satisfied.
The dependent variable Y has a linear relationship to the independent variable X. To check
this, make sure that the XY scatterplot is linear and that the residual plot shows a random
pattern.
For each value of X, the probability distribution of Y has the same standard deviation σ. When
this condition is satisfied, the variability of the residuals will be relatively constant across all
values of X, which is easily checked in a residual plot.
For any given value of X:
The Y values are independent, as indicated by a random pattern on the residual plot.
The Y values are roughly normally distributed (i.e., symmetric and unimodal). A
little skewness is acceptable if the sample size is large. A histogram or a dotplot will
show the shape of the distribution.
Residual Plot
A residual plot is a graph that shows the residuals on the vertical axis and the independent variable
on the horizontal axis. If the points in a residual plot are randomly dispersed around the horizontal
axis, a linear regression model is appropriate for the data; otherwise, a nonlinear model is more
appropriate.
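The residuals described above can be computed directly as observed minus predicted values. The following is a minimal sketch, not a definitive implementation: the data points and the line coefficients are hypothetical values chosen only for illustration, and the helper name `residuals` is an assumption.

```python
def residuals(xs, ys, b0, b1):
    """Return observed minus predicted values for the line y = b0 + b1 * x."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Hypothetical illustrative data (not taken from the text).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Assumed line for this example: intercept 0, slope 2.
res = residuals(xs, ys, b0=0.0, b1=2.0)

# Each pair is one point on a residual plot: x on the horizontal axis,
# the residual on the vertical axis.
for x, r in zip(xs, res):
    print(f"x={x}  residual={r:+.2f}")
```

In this made-up example the residuals alternate in sign and stay small, which is the kind of patternless scatter around the horizontal axis that suggests a linear model is reasonable.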
Regression Line
Linear regression finds the straight line, called the least squares regression line or LSRL, that best
represents observations in a bivariate data set. Suppose Y is a dependent variable, and X is an
independent variable. The population regression line is:
$Y = \beta_0 + \beta_1 X$
where $\beta_0$ is a constant, $\beta_1$ is the regression coefficient, X is the value of the independent variable, and
Y is the value of the dependent variable.
Given a random sample of observations, the population regression line is estimated by:
$\hat{y} = b_0 + b_1 x$
where $b_0$ is a constant, $b_1$ is the regression coefficient, x is the value of the independent variable, and $\hat{y}$
is the predicted value of the dependent variable.
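As background on what "least squares" means here (a standard definition, stated as an aside rather than taken from the text above): the sample coefficients are the values that minimize the sum of squared residuals over the n observations.

```latex
(b_0, b_1) = \arg\min_{b_0,\, b_1} \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2
```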
Problem 1
A researcher uses a regression equation to predict home heating bills (dollar cost), based on home
size (square feet). The correlation between predicted bills and home size is 0.70. What is the correct
interpretation of this finding?
(A) 70% of the variability in home heating bills can be explained by home size.
(B) 49% of the variability in home heating bills can be explained by home size.
(C) For each added square foot of home size, heating bills increased by 70 cents.
(D) For each added square foot of home size, heating bills increased by 49 cents.
(E) None of the above.
Solution
The correct answer is (B). The coefficient of determination measures the proportion of variation in the
dependent variable that is predictable from the independent variable. The coefficient of determination
is equal to $R^2$; in this case, $(0.70)^2$ or 0.49. Therefore, 49% of the variability in heating bills can be
explained by home size.
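The arithmetic in this solution can be checked in a couple of lines. This is only a sketch; the variable names are illustrative.

```python
r = 0.70            # correlation given in Problem 1
r_squared = r ** 2  # coefficient of determination

# Share of variability in heating bills explained by home size.
print(f"{r_squared:.2f}")
```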
Problem Statement
Last year, five randomly selected students took a math aptitude test before they began their statistics
course. The Statistics Department has three questions.
What linear regression equation best predicts statistics performance, based on math aptitude
scores?
If a student made an 80 on the aptitude test, what grade would we expect her to make in
statistics?
How well does the regression equation fit the data?
In the table below, the $x_i$ column shows scores on the aptitude test. Similarly, the $y_i$ column shows
statistics grades. The last two columns show deviation scores: the difference between the student's
score and the average score on each test. The last two rows show sums and mean scores that we
will use to conduct the regression analysis.
Student   x_i   y_i   (x_i - x̄)   (y_i - ȳ)
1          95    85       17           8
2          85    95        7          18
3          80    70        2          -7
4          70    65       -8         -12
5          60    70      -18          -7
Sum       390   385        0           0
Mean       78    77
And for each student, we also need to compute the squares of the deviation scores (the last two
columns in the table below).
Student   x_i   y_i   (x_i - x̄)^2   (y_i - ȳ)^2
1          95    85       289            64
2          85    95        49           324
3          80    70         4            49
4          70    65        64           144
5          60    70       324            49
Sum       390   385       730           630
Mean       78    77
And finally, for each student, we need to compute the product of the deviation scores.
Student   x_i   y_i   (x_i - x̄)(y_i - ȳ)
1          95    85        136
2          85    95        126
3          80    70        -14
4          70    65         96
5          60    70        126
Sum       390   385        470
Mean       78    77
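The three tables above can be reproduced with a short script. This is a minimal sketch using the five students' scores from the problem; the variable names (`dx`, `dy`, `sxx`, `syy`, `sxy`) are illustrative and not from the text.

```python
# Scores for the five students, taken from the tables above.
x = [95, 85, 80, 70, 60]  # math aptitude test scores
y = [85, 95, 70, 65, 70]  # statistics grades

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Deviation scores: each score minus the mean for that test.
dx = [xi - x_mean for xi in x]
dy = [yi - y_mean for yi in y]

# Column sums used in the regression analysis.
sxx = sum(d * d for d in dx)              # sum of squared x deviations
syy = sum(d * d for d in dy)              # sum of squared y deviations
sxy = sum(a * b for a, b in zip(dx, dy))  # sum of deviation products

print(x_mean, y_mean)  # 78.0 77.0
print(sxx, syy, sxy)   # 730.0 630.0 470.0
```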
To conduct the regression analysis, we need to solve for $b_0$ and $b_1$. Computations are shown below. Notice that all of our inputs
for the regression analysis come from the above three tables.
$b_1 = \sum [(x_i - \bar{x})(y_i - \bar{y})] / \sum [(x_i - \bar{x})^2]$
$b_1 = 470 / 730$
$b_1 = 0.644$
Once we know the value of the regression coefficient ($b_1$), we can solve for the intercept ($b_0$):
$b_0 = \bar{y} - b_1 \bar{x}$
$b_0 = 77 - (0.644)(78)$
$b_0 = 26.768$
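The hand computation above can be cross-checked in code, and the same quantities give tentative answers to the department's other two questions. This is a sketch under stated assumptions: the correlation formula r = Sxy / sqrt(Sxx * Syy) is the standard one but is not shown in this section, and the variable names are illustrative. Because the script does not round $b_1$ before computing $b_0$, it gives an intercept of about 26.781 rather than the 26.768 obtained by hand with the rounded slope.

```python
import math

# Scores for the five students, taken from the problem statement.
x = [95, 85, 80, 70, 60]  # math aptitude test scores
y = [85, 95, 70, 65, 70]  # statistics grades

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

sxx = sum((xi - x_mean) ** 2 for xi in x)
syy = sum((yi - y_mean) ** 2 for yi in y)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

b1 = sxy / sxx                # slope (regression coefficient)
b0 = y_mean - b1 * x_mean     # intercept

# Predicted statistics grade for an aptitude score of 80.
predicted_80 = b0 + b1 * 80

# Coefficient of determination via the sample correlation (assumed formula).
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

print(f"b1 = {b1:.3f}")                   # about 0.644
print(f"b0 = {b0:.3f}")                   # about 26.781 (unrounded slope)
print(f"prediction at x=80: {predicted_80:.1f}")
print(f"R^2 = {r_squared:.2f}")
```

Under these assumptions the predicted grade is roughly 78, and the aptitude score accounts for roughly 48% of the variation in statistics grades.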