Lecture10 Regression2 TS PDF
TIME-SERIES ANALYSIS
EVALUATING SIMPLE LINEAR REGRESSION
Number of ads per week is the independent variable (X), and the
number of customers per week is the dependent variable (Y).
The purpose of advertising in the local newspaper is to boost sales, so
we expect a positive relationship between X and Y.
a) Graph the paired observations of X and Y.
[Figure: scatter diagram of Customer (vertical axis, 0 to 800) against Ads (horizontal axis, 0 to 8).]
This scatter diagram shows that, at least in the sample, there is a rather
weak positive linear relationship between the numbers of ads and customers.
Regression Statistics
Multiple R 0.292
R Square 0.085
Adjusted R Square 0.047
Standard Error 132.960
Observations 26.000
d) Find and interpret the coefficient of determination.
R Square is 0.085, which suggests that only 8.5% of the total variation in the
number of customers can be explained by, or is due to, the variation in the
number of advertisements.
The remaining 91.5% is unexplained, i.e. it is due to factors other than ads.
This model has a rather poor overall fit.
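As a quick check of this interpretation, R Square can be recomputed from the sums of squares quoted in part e) of this example (SSy = 463802, SSE = 424280.2). The Python sketch below is only an illustration; the variable names are assumptions of the sketch and are not part of the regression output.

```python
import math

# Sums of squares quoted in part e) of this example
ss_y = 463802.0   # total variation in Y (number of customers)
sse = 424280.2    # unexplained variation (sum of squares for error)

r_square = 1 - sse / ss_y          # share of the variation in Y explained by X
multiple_r = math.sqrt(r_square)   # in simple regression, |correlation of X and Y|

print(round(r_square, 3), round(multiple_r, 3))
```

Rounded to three decimals, both values agree with the R Square and Multiple R entries of the Regression Statistics output.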
$$y_j = \beta_0 + \beta_1 x_j + \varepsilon_j$$
Given xj, the properties of the {Yj } sub-population are determined by
the εj error/random variable.
As regards the probability distributions of εj (j = 1,…, n), it is assumed that:
i. Each εj is normally distributed, so Yj is also normal;
ii. Each εj has zero mean, so E(Yj) = β0 + β1 xj;
iii. Each εj has the same variance, σε², so Var(Yj) = σε² is also constant;
iv. The errors are independent of each other, so {Yi} and {Yj}, i ≠ j, are also independent;
v. The error does not depend on the independent variable(s), so the effects of X and ε on Y can be separated from each other.
The first three assumptions can be illustrated as follows:
[Figure: the line E(Y) = β0 + β1X plotted against X, with normal distributions Yi : N(β0+β1xi ; σ) and Yj : N(β0+β1xj ; σ) centred on the line at xi and xj.]
• If all 5 assumptions are met, then the β0-hat and β1-hat least squares
estimators are also normally distributed:
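$$\hat{\beta}_0 \sim N(\beta_0 ;\ \sigma_{\hat{\beta}_0}), \qquad \hat{\beta}_1 \sim N(\beta_1 ;\ \sigma_{\hat{\beta}_1})$$

Consequently,

$$\frac{\hat{\beta}_0 - \beta_0}{\sigma_{\hat{\beta}_0}} \quad \text{and} \quad \frac{\hat{\beta}_1 - \beta_1}{\sigma_{\hat{\beta}_1}}$$

are standard normal random variables.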
However, the standard errors (standard deviations) of the β0-hat and β1-hat
estimators are unknown and they depend on the standard deviation of ε, σε,
which is also unknown. They have to be estimated from the sample,
similarly to β0 and β1.
In the case of simple linear regression (k = 1), the estimate of σε is

$$s_\varepsilon = \sqrt{\frac{SSE}{n-2}}$$
Replacing σε with its estimate, sε, the estimated standard errors of
β0-hat and β1-hat are
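$$s_{\hat{\beta}_0} = s_\varepsilon \sqrt{\frac{\sum x_i^2 / n}{SS_x}} \qquad \text{and} \qquad s_{\hat{\beta}_1} = \frac{s_\varepsilon}{\sqrt{SS_x}}$$

and

$$\frac{\hat{\beta}_0 - \beta_0}{s_{\hat{\beta}_0}} \quad \text{and} \quad \frac{\hat{\beta}_1 - \beta_1}{s_{\hat{\beta}_1}}$$

are t random variables with n - 2 degrees of freedom (df).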
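The estimated standard errors can be computed directly from paired observations. The following Python sketch assumes that SSx denotes the sum of squared deviations of x from its mean, which is consistent with the numbers in Ex 1. The function name `regression_standard_errors` and the five data points are made up for illustration; they are not the ads/customers sample.

```python
import math

def regression_standard_errors(x, y):
    """Return (b0, b1, s_eps, s_b0, s_b1) for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_x                      # least squares slope
    b0 = y_bar - b1 * x_bar                # least squares intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_eps = math.sqrt(sse / (n - 2))       # standard error of estimate
    s_b0 = s_eps * math.sqrt(sum(xi ** 2 for xi in x) / n / ss_x)
    s_b1 = s_eps / math.sqrt(ss_x)
    return b0, b1, s_eps, s_b0, s_b1

# Made-up illustrative data (an assumption of this sketch)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1, s_eps, s_b0, s_b1 = regression_standard_errors(x, y)
print(b0, b1, s_eps, s_b0, s_b1)
```

Dividing an estimate minus its hypothesised value by the matching standard error then gives a t statistic with n - 2 degrees of freedom.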
(Ex 1)
e) Find the standard error of estimate and the standard error of the slope
estimator.
Solving by hand, we have to start with the sum of squares for error, SSE.
A useful computational formula for SSE is
$$SSE = SS_y - \frac{SS_{xy}^2}{SS_x}, \qquad SSE = 463802 - \frac{1850.6^2}{86.654} = 424280.2$$
Standard Error of Estimate
Hence

$$s_\varepsilon = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{424280.2}{26-2}} = 132.96$$

and

$$s_{\hat{\beta}_1} = \frac{s_\varepsilon}{\sqrt{SS_x}} = \frac{132.96}{\sqrt{86.654}} = 14.283$$

The first value matches the Standard Error (132.960) reported in the Regression Statistics output.
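The hand calculation in part e) can be retraced in a few lines of Python. This is only a sketch: it uses the summary statistics quoted in the example (n = 26, SSx = 86.654, SSy = 463802, SSxy = 1850.6), and the variable names are assumptions of the sketch.

```python
import math

# Summary statistics quoted in Ex 1
n = 26
ss_x = 86.654
ss_y = 463802.0
ss_xy = 1850.6

sse = ss_y - ss_xy ** 2 / ss_x       # computational formula for SSE
s_eps = math.sqrt(sse / (n - 2))     # standard error of estimate
s_b1 = s_eps / math.sqrt(ss_x)       # standard error of the slope estimator

print(round(sse, 1), round(s_eps, 2), round(s_b1, 3))
```

Up to rounding, the results reproduce SSE = 424280.2, the standard error of estimate 132.96 and the standard error of the slope 14.283.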
f) Determine the 95% confidence interval estimate of β1.
C = 95%, so α/2 = 0.025; n = 26, so df = n - 2 = 24, and

$$t_{0.025,\,24} = 2.064$$

Thus

$$\hat{\beta}_1 \pm t_{\alpha/2,\,n-2}\, s_{\hat{\beta}_1} = 21.356 \pm 2.064 \times 14.283 = (-8.124 ;\ 50.836)$$

and with 95% confidence the slope coefficient is within this interval.
Notice that this interval has a negative lower limit and a positive upper limit,
i.e. the slope coefficient can be negative, positive or zero.
The number of advertisements does not necessarily have a positive
impact on the number of customers, so it might not be worth spending
on advertisements in the local newspaper.
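The interval in part f) can be reproduced with the short Python sketch below. The critical value 2.064 is taken from the example rather than computed, and the variable names are assumptions of the sketch.

```python
b1 = 21.356      # estimated slope from Ex 1
s_b1 = 14.283    # estimated standard error of the slope
t_crit = 2.064   # t critical value for alpha/2 = 0.025 and df = 24 (from Ex 1)

margin = t_crit * s_b1
lower, upper = b1 - margin, b1 + margin
includes_zero = lower < 0 < upper    # True when a zero slope cannot be ruled out

print(round(lower, 3), round(upper, 3), includes_zero)
```

Because the interval contains zero, the sample does not rule out a zero slope at the 95% confidence level.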
• The sampling distributions of the β0-hat and β1-hat least squares
estimators can also be used for testing the regression coefficients.
Recall (see the week 9 notes) that in regression analysis the most
important question is whether there is a linear relationship between
X and Y (β1 ≠ 0), and if there is, whether this relationship is positive
(β1> 0) or negative (β1< 0).
To this end we can conduct two-tail or one-tail t-tests about β1, the
same way as we test µ when σ is unknown.
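The test statistic is

$$t = \frac{\hat{\beta}_1 - \beta_{0,1}}{s_{\hat{\beta}_1}}$$

where β0,1 is the value of β1 specified by the null hypothesis.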
(Ex 1)
g) Is there sufficient evidence at the 5% level to conclude that the number of
advertisements and the number of customers are linearly related?
The question suggests that H0 : β1 = 0 and HA : β1 ≠ 0, so β0,1 = 0.
Since this is a two-tail test, there are two critical values: -tα/2 = -2.064 and
tα/2 = 2.064 (see part f).
Accordingly, reject H0 if the value of the test statistic calculated from the sample
is either smaller than -2.064 or greater than 2.064.
$$t_{obs} = \frac{\hat{\beta}_1}{s_{\hat{\beta}_1}} = \frac{21.356}{14.283} = 1.495$$

Since 1.495 is in the non-rejection region, we maintain H0.
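The decision rule of part g) can be written out as a short Python sketch. The inputs are the values quoted in the example; the variable names are assumptions of the sketch.

```python
b1 = 21.356      # estimated slope from Ex 1
s_b1 = 14.283    # estimated standard error of the slope
beta_h0 = 0.0    # value of the slope under H0
t_crit = 2.064   # two-tail critical value at the 5% level, df = 24 (from Ex 1)

t_obs = (b1 - beta_h0) / s_b1
reject_h0 = abs(t_obs) > t_crit      # reject only if t_obs is beyond +/- t_crit

print(round(t_obs, 3), reject_h0)
```

The observed statistic is smaller in absolute value than the critical value, so `reject_h0` is false, in line with the conclusion above.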
USING THE SAMPLE REGRESSION EQUATION
• If the fit of the sample regression equation is satisfactory, it can be
used to predict the dependent variable or to estimate its mean value.
• Traditionally, there are two types of methods for identifying the pattern.
Smoothing: The random fluctuations are removed from the data by smoothing
the time series.
Decomposition: The time series is broken into its components and the pattern
is the combination of the systematic parts.
• The pattern itself is likely to contain some, or all, of the following three
components: trend, seasonal and cyclical.
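To make the idea of smoothing concrete, the sketch below applies a centred moving average, which is one common smoother. The notes do not say which smoothing method they have in mind, so the choice of method, the function name `centred_moving_average` and the data are all assumptions made for illustration.

```python
def centred_moving_average(series, window=3):
    """Centred moving average for an odd window; end points are left as None."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = [None] * len(series)
    for t in range(half, len(series) - half):
        # Average the observation at t together with its neighbours
        smoothed[t] = sum(series[t - half:t + half + 1]) / window
    return smoothed

# Made-up series: an upward drift plus irregular fluctuations
y = [10, 14, 9, 13, 17, 12, 16]
smoothed = centred_moving_average(y, window=3)
print(smoothed)  # [None, 11.0, 12.0, 13.0, 14.0, 15.0, None]
```

In this made-up series the averaging cancels the fluctuations and leaves a steadily rising pattern.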
Trend: The long-term general change in the level of the data with a
duration of longer than a year.
e.g. quadratic: $Y_t = a + bt + ct^2$, etc.
[Figure: two example series with a trend. Left panel: "Hourly earnings: Manufacturing: Major seven countries" (1995=100, Sep-70 to Sep-00). Right panel: "Broad money: (sa): Sweden" (1995=100, Jan-61 to Jan-01).]
Seasonal variations: Regular wavelike fluctuations of constant
length, repeating themselves within a period of
no longer than a year.
[Figure: two example series Yt with seasonal variations (Dec-82 to Dec-00 and Jan-83 to Jan-01).]
Cyclical variations: Wavelike movements, quasi regular
fluctuations around the long-term trend,
lasting longer than a year.
[Figure: stylised cycle of Yt with the peak marked, and two example series with cyclical variations. Left panel: "Aus: Dwelling units approved: Private: New houses" (Number, Dec-70 to Dec-00). Right panel: "Expenditure on GDP: Construction: United States: (sa)" (bln 96 USD, Dec-60 to Dec-00).]
The time period between the beginning trough and the peak is called the
expansion phase, while the period between the peak and the ending trough is
termed the contraction phase.
Cyclical variations are often attributed to business cycles, i.e. to the ups and
downs in the general level of business activity.
Seasonal and cyclical variations might be very similar in their appearance.
However, while seasonal variations are absolutely regular and occur over
calendar periods no longer than a year, cyclical variations might and do change
both in their intensity (amplitude) and duration, and they last longer than a year.
It is far more difficult to study and predict the cyclical component than
the seasonal component.
Additive:

$$Y_t = T_t + S_t + C_t + R_t$$

Multiplicative:

$$Y_t = T_t \times S_t \times C_t \times R_t$$
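The difference between the two models can be seen by composing a series from its components. In the Python sketch below all component values are made up. It also assumes, as is commonly done, that in the multiplicative model the seasonal, cyclical and random components are expressed as indices around 1, while in the additive model they are in the units of Y.

```python
# Made-up trend values for four periods (an assumption of this sketch)
trend = [100.0, 102.0, 104.0, 106.0]

# Additive model: S, C and R are in the same units as Y
seasonal_add = [5.0, -5.0, 5.0, -5.0]
cyclical_add = [1.0, 1.0, -1.0, -1.0]
random_add = [0.0, 0.0, 0.0, 0.0]

# Multiplicative model: S, C and R are indices around 1
seasonal_mul = [1.05, 0.95, 1.05, 0.95]
cyclical_mul = [1.0, 1.0, 1.0, 1.0]
random_mul = [1.0, 1.0, 1.0, 1.0]

y_add = [t + s + c + r for t, s, c, r
         in zip(trend, seasonal_add, cyclical_add, random_add)]
y_mul = [t * s * c * r for t, s, c, r
         in zip(trend, seasonal_mul, cyclical_mul, random_mul)]

print(y_add)
print(y_mul)
```

With these values the additive seasonal swing stays at 5 units in every period, whereas the multiplicative swing is 5% of the trend and therefore grows as the trend rises.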
[Figure: two example series. Left panel: "Austria: Domestic demand: Retail sales: Volume" (1995=100, Dec-76 to Dec-00). Right panel: "Australia: Retail turnover: Recreational goods" ($m, Jan-83 to Jan-01).]