
Lecture 7: Exponential Smoothing Methods

Please read Chapter 4 and Chapter 2 of MWH Book

Big Picture

1. In Lecture 6, a smoothing (averaging) method was used to estimate the trend-cycle
component (decomposition)

2. Now a modified smoothing method is used to forecast future values. That means the
averaging is, in general, one-sided rather than two-sided

3. Another difference is that we focus on out-of-sample forecasting errors rather than the
in-sample fitting errors (residuals)

4. Again, which method works best depends on whether trend or seasonality is present

Forecasting Scenario

1. The present is the t-th period

2. Past (in-sample) observations Yt, Yt−1, ..., Y1 are used to estimate the model

3. Then forecasts Ft+1, Ft+2, ... are computed for the future (out-of-sample) values
Yt+1, Yt+2, ...

4. Do not confuse the out-of-sample forecasting errors Yt+1 − Ft+1, Yt+2 − Ft+2, ... with the
in-sample fitting errors (residuals) Yt − Ft, ..., Y1 − F1

Averaging Method—Mean

This method uses all available past data:

Ft+1 = (1/t) ∑_{i=1}^{t} Yi    (1)

Ft+2 = (1/(t+1)) ∑_{i=1}^{t+1} Yi    (2)

The mean method works well only when there is no trend, seasonality, or change in the mean
value. In other words, it assumes the underlying process has a constant mean (stationarity).
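
In R the mean forecast is a one-liner. A minimal sketch, where the vector y of hypothetical
data stands in for the in-sample observations:

y <- c(42, 40, 43, 41, 45, 44)   # hypothetical in-sample data Y1, ..., Yt
f_next <- mean(y)                # equation (1): the forecast is the sample mean
f_next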

Averaging Method—Moving Average

Moving average, to some degree, can allow for a time-varying mean value. It does so by including
only the most recent k observations. More explicitly, the MA(k) forecast (not smoother) is
Ft+1 = (1/k) ∑_{i=t−k+1}^{t} Yi    (3)

Ft+2 = (1/k) ∑_{i=t−k+2}^{t+1} Yi    (4)

It follows that

Ft+2 = Ft+1 + (1/k)(Yt+1 − Yt−k+1)    (5)

So each new forecast Ft+2 is an adjustment of the immediately preceding forecast Ft+1. The
adjustment term is (1/k)(Yt+1 − Yt−k+1). A bigger k leads to a smaller adjustment (or smoother
forecasts). Figure 4-5 in the book clearly shows that the mean method cannot capture the change
in the mean value, but the moving average can, though with some lag.
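
The corresponding R computation is equally short; a sketch, assuming y as defined above and a
window of k = 3:

k <- 3
f_next <- mean(tail(y, k))   # equation (3): average of the most recent k observations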

Exponential Smoothing Method—Single Exponential Smoothing (SES) I

We hope the forecast can catch up with changes in the mean value more quickly. One way is to
assign a bigger weight to the latest observation. Consider

Ft+1 = Ft + α(Yt − Ft)    (6)

Ft+2 = Ft+1 + α(Yt+1 − Ft+1)    (7)

F1 = Y1 (initialization)    (8)

where 0 < α < 1. There is substantial adjustment when α is close to one. In fact, single
exponential smoothing utilizes the idea of error correction: we revise the forecast upward,
i.e., Ft+1 > Ft, whenever Ft underestimates the true value, i.e., Yt − Ft > 0. The
error-correcting process is also called negative feedback.

Single Exponential Smoothing II

Equation (6) can be rewritten as

Ft+1 = (1 − α)Ft + αYt    (9)

By using the lag operator L, we can show that single exponential smoothing is equivalent to an
MA(∞) process:

Ft+1 = α[1 − (1 − α)L]^{−1} Yt = αYt + α(1 − α)Yt−1 + α(1 − α)^2 Yt−2 + ...    (10)

Moreover, notice that the weights α, α(1 − α), α(1 − α)^2, ... decay exponentially toward zero
(and sum to one).
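
A quick numerical check of this weight decay; a minimal sketch with α = 0.5:

alpha <- 0.5
j <- 0:9
w <- alpha * (1 - alpha)^j   # weights on Yt, Yt-1, Yt-2, ...
round(w, 4)                  # each weight is (1 - alpha) times the previous one
sum(w)                       # approaches 1 as more terms are included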

Mean Square Error (MSE)

Different smoothing parameters α yield different forecasts. We can evaluate or rank various
forecasts using the mean square error (MSE), which is the average of the squared errors
(in-sample or out-of-sample):

MSE = (1/n) ∑_{i} (Yi − Fi)^2

where n is the total number of errors being averaged.

We square the errors in order to avoid cancellation between positive and negative errors. The
forecast with the smallest MSE is the best one. This fact suggests that we can obtain the
optimal smoothing parameter α by minimizing the MSE.
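
In R this is a one-liner; a sketch, assuming aligned vectors y (actuals) and f (forecasts):

mse <- mean((y - f)^2)   # average of the squared errors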

Theil’s U-Statistic

An alternative way to rank forecasts is to consider Theil's U-statistic:


U = √[ ∑_{t=1}^{n−1} ((Yt+1 − Ft+1)/Yt)^2 / ∑_{t=1}^{n−1} ((Yt+1 − Yt)/Yt)^2 ]    (11)

where (Yt+1 − Ft+1)/Yt is the forecast relative change based on a particular method, while
(Yt+1 − Yt)/Yt is the forecast relative change based on the naive method Ft+1 = Yt. A
U-statistic less than one indicates a method better than the naive method. The method with the
smallest U-statistic is the best one.
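
A direct R translation of (11); a sketch, assuming y holds Y1, ..., Yn and f holds one-step
forecasts aligned so that f[t+1] is the forecast of y[t+1]:

theil_u <- function(y, f) {
  n <- length(y)
  num <- sum(((y[2:n] - f[2:n]) / y[1:(n-1)])^2)      # method's squared relative errors
  den <- sum(((y[2:n] - y[1:(n-1)]) / y[1:(n-1)])^2)  # naive method's squared relative errors
  sqrt(num / den)                                     # equation (11)
}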

Autocorrelation of Forecasting Error

The forecasting error should be serially uncorrelated if no pattern is left unexploited. So a
method that produces a white-noise forecasting error is better than one that produces a serially
correlated forecasting error.
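
Base R makes this check easy; a sketch, assuming e holds the forecasting errors:

e <- y - f                                  # forecasting errors
acf(e)                                      # spikes outside the bands suggest leftover pattern
Box.test(e, lag = 10, type = "Ljung-Box")   # portmanteau test of the white-noise hypothesis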

Single Exponential Smoothing III

Equation (6) for SES is recursive. This suggests that we may use a loop. The R code below, for
example, computes the SES forecasts and, at the end, computes the MSE for α = 0.5.

alpha = 0.5
f = rep(0, length(y))   # one-step forecasts
f[1] = y[1]             # initialization: F1 = Y1, equation (8)
tse = 0                 # running total of squared errors
for (j in 2:length(y)) {
  f[j] = f[j-1] + alpha*(y[j-1] - f[j-1])   # equation (6)
  tse = tse + (y[j] - f[j])^2
}
mse = tse/(length(y) - 1)
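
Rather than fixing α = 0.5, the MSE-minimizing α suggested earlier can be found numerically
with optimize(); a sketch wrapping the loop above in a function:

ses_mse <- function(alpha, y) {
  f <- y[1]
  tse <- 0
  for (j in 2:length(y)) {
    f <- f + alpha*(y[j-1] - f)   # update the one-step forecast
    tse <- tse + (y[j] - f)^2
  }
  tse/(length(y) - 1)
}
optimize(ses_mse, interval = c(0, 1), y = y)   # returns the minimizing alpha and its MSE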

Adaptive-Response-Rate SES (ARRSES)

SES has three limitations: (i) it is arbitrary regarding which α to use; (ii) it employs a
single α to compute all forecasts (the third limitation is discussed under Holt's Linear Method
below). By contrast, ARRSES allows the value of α to be determined in a data-driven fashion.
That is, α is adjusted automatically when there is a change in the pattern of the data, which is
signaled by a big forecasting error. Intuitively, we prefer a bigger α when bigger forecasting
errors arise:

Ft+1 = (1 − αt)Ft + αt Yt    (12)

αt+1 = |At / Mt|    (13)

At = β Et + (1 − β)At−1    (14)

Mt = β|Et| + (1 − β)Mt−1    (15)

Et = Yt − Ft    (16)

So Et is the forecasting error, Mt is a smoothed estimate of the absolute forecasting error, and
At is a smoothed estimate of the forecasting error. The absolute value in (13) keeps αt+1
between zero and one. Notice that it is αt+1, not αt, in (13).
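
A loop implementation of (12)-(16); a minimal sketch in which β and the initial values of F and
α are assumptions, since the recursion needs start-up values:

arrses <- function(y, beta = 0.2) {
  n <- length(y)
  f <- a <- numeric(n)
  f[1] <- f[2] <- y[1]   # assumed initialization, as in SES
  a[1] <- a[2] <- beta   # assumed starting value for alpha
  A <- M <- 0
  for (t in 2:(n-1)) {
    E <- y[t] - f[t]                        # equation (16)
    A <- beta*E + (1 - beta)*A              # equation (14)
    M <- beta*abs(E) + (1 - beta)*M         # equation (15)
    a[t+1] <- abs(A/M)                      # equation (13); M = 0 would need guarding
    f[t+1] <- (1 - a[t])*f[t] + a[t]*y[t]   # equation (12)
  }
  list(forecast = f, alpha = a)
}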
Holt’s Linear Method (HLM)

The third limitation of SES is that it cannot catch up with a trend quickly enough, while Holt's
Linear Method (HLM) is able to account for trend.

Lt = αYt + (1 − α)(Lt−1 + bt−1)    (17)

bt = β(Lt − Lt−1) + (1 − β)bt−1    (18)

Ft+m = Lt + bt m    (19)

L1 = Y1    (20)

b1 = Y2 − Y1    (21)

where 0 < α < 1, 0 < β < 1, and m is the number of periods ahead to be forecast. Here Lt
denotes the level of the series, and bt the slope of the trend. HLM is also called double
exponential smoothing. HLM has the drawback of failing to account for seasonal effects.

Holt-Winter’s Method (HWM)

We use HWM to account for both trend and seasonality. HWM involves three equations: one for the
level, one for the trend, and one for the seasonality.
Lt = α(Yt / St−s) + (1 − α)(Lt−1 + bt−1)    (22)

bt = β(Lt − Lt−1) + (1 − β)bt−1    (23)

St = γ(Yt / Lt) + (1 − γ)St−s    (24)

Ft+m = (Lt + bt m)St−s+m    (25)

where s is the length of seasonality (e.g., the number of months in a year), and the seasonality
has a multiplicative form.
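
Base R ships a Holt-Winters implementation in the stats package; a sketch using the built-in
AirPassengers monthly series, with the smoothing parameters chosen by minimizing in-sample
squared error:

fit <- HoltWinters(AirPassengers, seasonal = "multiplicative")
fit$alpha; fit$beta; fit$gamma   # estimated smoothing parameters
predict(fit, n.ahead = 12)       # forecasts for the next 12 months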

Prediction Intervals

We can use prediction intervals (PI) to illustrate the inherent uncertainty in the point forecast:

95% PI = Ft+1 ± 1.96 √MSE    (26)

where Ft+1 is the point forecast, and MSE is the mean square error for the method that produces
Ft+1.
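
In R, continuing the SES example; a sketch, assuming f_next and mse were computed earlier:

c(lower = f_next - 1.96*sqrt(mse),   # equation (26)
  upper = f_next + 1.96*sqrt(mse))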
