A Review of Basic Statistical Concepts: Answers to Odd-Numbered Problems
1.  Descriptive Statistics

    Variable   N    Mean    Median   Tr Mean   StDev   SE Mean
    C1         28   21.32   17.00    20.69     13.37   2.53

    Variable   Min    Max     Q1      Q3
    C1         5.00   54.00   11.25   28.75

    a. X̄ = 21.32
    b. S = 13.37, S² = 178.76
    c. If the policy is successful, smaller orders will be eliminated and the mean will increase.
    d. If the change causes all customers to consolidate a number of small orders into large orders, the standard deviation will probably decrease. Otherwise, it is very difficult to tell how the standard deviation will be affected.
    e. The best forecast over the long term is the mean of 21.32.
    f.

3.  a. Point estimate: X̄ = 10.76%
    b. 1 − α = .95, Z = 1.96, n = 30, S = 13.71
       X̄ ± 1.96 S/√30 = 10.76 ± 4.91, or (5.85%, 15.67%)
    c. X̄ ± 2.045 S/√30 = 10.76 ± 5.12, or (5.64%, 15.88%)
    d. We see that the 95% confidence intervals in b and c are not much different. This explains why a sample of size n = 30 is often taken as the cutoff between large and small samples.
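A minimal Python sketch (an illustration, not part of the original answer) reproduces the two intervals in parts b and c from the summary statistics above:

```python
from math import sqrt
from scipy import stats

n, xbar, s = 30, 10.76, 13.71      # summary statistics from Problem 3

# Part b: large-sample (z) interval
z = stats.norm.ppf(0.975)                      # 1.96
half_z = z * s / sqrt(n)                       # about 4.91
print(xbar - half_z, xbar + half_z)            # about (5.85, 15.67)

# Part c: t interval with n - 1 = 29 degrees of freedom
t = stats.t.ppf(0.975, df=n - 1)               # about 2.045
half_t = t * s / sqrt(n)                       # about 5.12
print(xbar - half_t, xbar + half_t)            # about (5.64, 15.88)
```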
5.  n = 100, S = 1.7, α = .05, X̄ = 13.5

    Since |2.67| = 2.67 > 1.96, reject H0 at the 5% level. The mean satisfaction rating is different from 5.9.
    p-value: P(Z < −2.67 or Z > 2.67) = 2 P(Z > 2.67) = 2(.0038) = .0076, very strong evidence against H0.

9.  H0: μ = 700    H1: μ ≠ 700
    n = 50, S = 50, α = .05, X̄ = 715
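The test statistic for Problem 9 is not shown in the extracted answer; the sketch below (assuming only the stated figures n = 50, X̄ = 715, S = 50) shows how the two-sided z test would be carried out:

```python
from math import sqrt
from scipy import stats

n, xbar, s, mu0, alpha = 50, 715, 50, 700, 0.05

z = (xbar - mu0) / (s / sqrt(n))                 # about 2.12
z_crit = stats.norm.ppf(1 - alpha / 2)           # 1.96
p_value = 2 * (1 - stats.norm.cdf(abs(z)))       # about .034

print(z, z_crit, p_value)   # |z| > 1.96, so H0 would be rejected at the 5% level
```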
11. a.
    b.
    c.

13. This is a good population for showing how random samples are taken. If three-digit random numbers are generated from Minitab as demonstrated in Problem 10, the selected items for the sample can easily be found. In this population, μ = 0.06, so most students will not reject a null hypothesis that states this value. The few students who erroneously reject H0 demonstrate Type I error.

15. n = 175, X̄ = 45.2, S = 10.3
    Point estimate: X̄ = 45.2
    98% confidence interval: 1 − α = .98, Z = 2.33
    X̄ ± 2.33 S/√175 = 45.2 ± 1.8, or (43.4, 47.0)

    Hypothesis test:
    H0: μ = 44    H1: μ ≠ 44
    Test statistic: Z = (X̄ − 44)/(S/√175) = (45.2 − 44)/(10.3/√175) = 1.54
    Since |Z| = 1.54 < 2.33, do not reject H0 at the 2% level. As expected, the results of the hypothesis test are consistent with the confidence interval for μ; μ = 44 is not ruled out by either procedure.
CHAPTER 3

1.  Qualitative forecasting techniques rely on human judgment and intuition. Quantitative forecasting techniques rely more on manipulation of past historical data.

3.  The secular trend of a time series is the long-term component that represents the growth or decline in the series over an extended period of time. The cyclical component is the wavelike fluctuation around the trend. The seasonal component is a pattern of change that repeats itself year after year. The irregular component measures the variability of the time series after the other components have been removed.

5.  The autocorrelation coefficient measures the correlation between a variable, lagged one or more periods, and itself.

7.  a. nonstationary series
    b. stationary series
    c. nonstationary series
    d. stationary series

9.  Naive methods, simple averaging methods, moving averages, simple exponential smoothing, and Box-Jenkins methods. Examples are: the number of breakdowns per week on an assembly line having a uniform production rate; the unit sales of a product or service in the maturation stage of its life cycle; and the number of sales resulting from a constant level of effort.

11. Classical decomposition, Census II, Winters' exponential smoothing, time series multiple regression, and Box-Jenkins methods. Examples are: electrical consumption; summer/winter activities (sports like skiing); clothing; agricultural growing seasons; and retail sales influenced by holidays, three-day weekends, and school calendars.

13. Year   Value   Change
    1985   2,413
    1986   2,407     -6
    1987   2,403     -4
    1988   2,396     -7
    1989   2,403      7
    1990   2,448     45
    1991   2,371    -77
    1992   2,362     -9
    1993   2,334    -28
    1994   2,362     28
    1995   2,336    -26
    1996   2,344      8
    1997   2,384     40
    1998   2,244   -140

    Yes! The original series has a decreasing trend.

15. a. MPE
    b. MAPE
    c. MSE
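The numerical values for Problem 15 are not reproduced in this extraction; for reference, here is a short sketch of how MPE, MAPE, and MSE (MSD in Minitab output) are computed from actual values and forecasts (the data below are hypothetical):

```python
import numpy as np

def error_measures(actual, forecast):
    """Mean percentage error, mean absolute percentage error, and mean squared error."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    e = actual - forecast
    mpe = 100 * np.mean(e / actual)
    mape = 100 * np.mean(np.abs(e) / actual)
    mse = np.mean(e ** 2)
    return mpe, mape, mse

# Hypothetical example
actual = [200, 210, 215, 220, 230]
forecast = [195, 212, 210, 225, 228]
print(error_measures(actual, forecast))
```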
17. Σ(Yt − Ȳ)² = 4465.7
    Σ(Yt − Ȳ)(Yt−1 − Ȳ) = 3996.1

    r1 = Σ(Yt − Ȳ)(Yt−1 − Ȳ) / Σ(Yt − Ȳ)² = 3996.1/4465.7 = .895
    H0: ρ1 = 0    H1: ρ1 ≠ 0

    SE(rk) = √[(1 + 2 Σ ri², i = 1 to k−1) / n]

    SE(r1) = √(1/24) = .204
    t = (r1 − 0)/SE(r1) = .895/.204 = 4.39
    Since the computed t (4.39) is greater than the critical t (2.069), reject the null hypothesis.

    H0: ρ2 = 0    H1: ρ2 ≠ 0

    SE(r2) = √[(1 + 2(.895)²)/24] = √(2.6/24) = .33
    t = (r2 − 0)/SE(r2)
    Since the computed t is greater than the critical t (2.069), reject the null hypothesis.

    b. The data are nonstationary.
    [Autocorrelation function output (Lag, Corr, LBQ for lags 1-6) not reproduced in this extraction.]
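A sketch of the calculations in Problem 17 (the original 24 observations are not reproduced here, so a hypothetical trended series is used; the functions follow the formulas above):

```python
import numpy as np

def autocorr(y, k):
    """Lag-k autocorrelation: sum((Y_t - Ybar)(Y_{t-k} - Ybar)) / sum((Y_t - Ybar)^2)."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()
    return np.sum(dev[k:] * dev[:-k]) / np.sum(dev ** 2)

def se_rk(r, k, n):
    """Approximate standard error of r_k: sqrt((1 + 2 * sum of r_i^2 for i < k) / n)."""
    return np.sqrt((1 + 2 * sum(ri ** 2 for ri in r[:k - 1])) / n)

# Hypothetical trended series of n = 24 observations
rng = np.random.default_rng(1)
y = np.arange(24) + rng.normal(0, 2, 24)
n = len(y)

r = [autocorr(y, k) for k in (1, 2)]
t1 = r[0] / se_rk(r, 1, n)   # compare with the critical t, 2.069 for df = 23 and alpha = .05
t2 = r[1] / se_rk(r, 2, n)
print(r, t1, t2)
```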
19. Figure 3.18 - The data are nonstationary (trended data).
    Figure 3.19 - The data are random.
    Figure 3.20 - The data are seasonal (monthly data).
    Figure 3.21 - The data are stationary and have some pattern that could be modeled.
21. b. The sales time series appears to vary about a fixed level, so it is stationary.
    c. The sample autocorrelation function for the sales series follows.
The sample autocorrelations die out rapidly. This behavior is consistent with a stationary series. Note that the sales data are not random. Sales in adjacent weeks tend to be positively correlated.
Since, in this case, the residuals differ from the original observations by a constant, the residual autocorrelations are the same as the autocorrelations of the original observations.
CHAPTER 4
MAD: 0.94222
9.  a. 3-month moving average
       Period   Yield   MA
       3        10.16    9.8133
       4        10.25   10.1333
       5        10.61   10.3400
       6        11.07   10.6433
       7        11.52   11.0667
       8        11.09   11.2267
       9        10.80   11.1367
       10       10.50   10.7967
       11       10.86   10.7200
       12        9.97   10.4433

       Accuracy Measures
       MAD:  0.49111
       MSD:  0.31931
       MPE:  .6904
       [Time series plot: actual, predicted, and forecast yield versus time; moving average length 3.]
    b. 5-month moving average

       Period   Yield    MA       Predict   Error
       1         9.29    *        *          *
       2         9.99    *        *          *
       3        10.16    *        *          *
       4        10.25    *        *          *
       5        10.61   10.060    *          *
       6        11.07   10.416   10.060      1.010
       7        11.52   10.722   10.416      1.104
       8        11.09   10.908   10.722      0.368
       9        10.80   11.018   10.908     -0.108
       10       10.50   10.996   11.018     -0.518
       11       10.86   10.954   10.996     -0.136
       12        9.97   10.644   10.954     -0.984

       Row   Period   Forecast   Lower     Upper
       1     13       10.644     9.23041   12.0576

       Accuracy Measures
       MAPE: 5.58295
       MAD:  0.60400
       MSD:  0.52015
       MPE:  .7100
       [Time series plot: actual, predicted, and forecast yield versus time; moving average length 5.]
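Using the yield data from the tables above, a short sketch shows how the k-period moving-average forecasts and the MAD for part b could be reproduced (an illustration, not the original Minitab calculation):

```python
import numpy as np

yields = [9.29, 9.99, 10.16, 10.25, 10.61, 11.07,
          11.52, 11.09, 10.80, 10.50, 10.86, 9.97]

def moving_average_forecasts(y, k):
    """k-period moving averages; the average ending at period t forecasts period t+1."""
    y = np.asarray(y, dtype=float)
    ma = np.convolve(y, np.ones(k) / k, mode="valid")   # averages ending at periods k..n
    forecasts = ma[:-1]                                  # one-step-ahead forecasts for periods k+1..n
    errors = y[k:] - forecasts
    mad = np.mean(np.abs(errors))
    return ma, forecasts, mad

ma5, fc5, mad5 = moving_average_forecasts(yields, 5)
print(ma5[0], fc5[0])   # 10.060 and 10.060, as in the part (b) table
print(mad5)             # about 0.604
```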
    g.

11.
    Time   Demand   Smooth    Predict    Error
    1      205      205.000   205.000      0.0000
    2      251      228.000   205.000     46.0000
    3      304      266.000   228.000     76.0000
    4      284      275.000   266.000     18.0000
    5      352      313.500   275.000     77.0000
    6      300      306.750   313.500    -13.5000
    7      241      273.875   306.750    -65.7500
    8      284      278.938   273.875     10.1250
    9      312      295.469   278.938     33.0625
    10     289      292.234   295.469     -6.4688
    11     385      338.617   292.234     92.7656
    12     256      297.309   338.617    -82.6172

    Accuracy Measures
    MAD:  43.44
    MSD:  2943.24
    [Time series plot: actual, predicted, and forecast demand versus time for the single exponential smoothing model; the plot also lists MAPE, MAD, and MSD.]
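The smoothing constant is not visible in the extracted answer for Problem 11; α = .5 is assumed below because it reproduces the tabled Smooth and Predict values. A minimal sketch of the recursion, using the demand values from the table above:

```python
def single_exponential_smoothing(y, alpha):
    """S_t = alpha*Y_t + (1 - alpha)*S_{t-1}, initialized with S_1 = Y_1.
    The forecast (Predict) for period t is the smoothed value from period t-1."""
    s = y[0]
    smooth, predict = [], []
    for obs in y:
        predict.append(s)                    # forecast made before observing obs
        s = alpha * obs + (1 - alpha) * s
        smooth.append(s)
    return smooth, predict

demand = [205, 251, 304, 284, 352, 300, 241, 284, 312, 289, 385, 256]
smooth, predict = single_exponential_smoothing(demand, alpha=0.5)
print(smooth[:4])    # 205.0, 228.0, 266.0, 275.0 -- matches the table
print(predict[:4])   # 205.0, 205.0, 228.0, 266.0
```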
13. a. α = .4

       Accuracy Measures
       MAPE: 14.05
       MAD:  24.02
       MSD:  1174.50

       Row   Period   Forecast   Lower     Upper
       1     57       326.367    267.513   385.221

    b. α = 0.6

       Accuracy Measures
       MAPE: 14.68
       MAD:  24.56
       MSD:  1080.21

       Row   Period   Forecast   Lower     Upper
       1     57       334.07     273.889   394.251

    c.

    d. No! When the residual autocorrelations shown above are examined, some of them were found to be significant.
15. The autocorrelation function shows that the data are seasonal with a slight trend. Therefore, the Winters model is used to forecast revenues.
Smoothing Constants - Alpha (level): 0.8 Beta (trend): 0.1 Gamma (seasonal): 0.1
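A minimal sketch of the Winters (multiplicative) recursion using the smoothing constants quoted above; the quarterly revenue data and the simple initialization here are hypothetical, not the textbook's:

```python
def winters_multiplicative(y, alpha, beta, gamma, m):
    """Level:  L_t = alpha*(Y_t/S_{t-m}) + (1-alpha)*(L_{t-1} + T_{t-1})
       Trend:  T_t = beta*(L_t - L_{t-1}) + (1-beta)*T_{t-1}
       Season: S_t = gamma*(Y_t/L_t) + (1-gamma)*S_{t-m}"""
    level = sum(y[:m]) / m                               # crude initialization from the first season
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    season = [y[i] / level for i in range(m)]
    fitted = []
    for t in range(m, len(y)):
        s_old = season[t - m]
        fitted.append((level + trend) * s_old)           # one-step-ahead forecast for period t
        new_level = alpha * (y[t] / s_old) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
        season.append(gamma * (y[t] / level) + (1 - gamma) * s_old)
    return fitted, level, trend, season

# Hypothetical quarterly revenues with trend and seasonality
revenue = [120, 96, 90, 112, 128, 102, 95, 119, 137, 109, 101, 127, 146, 116, 108, 135]
fitted, level, trend, season = winters_multiplicative(revenue, 0.8, 0.1, 0.1, 4)
next_q = (level + trend) * season[len(revenue) - 4]      # forecast for the next quarter
print(next_q)
```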
An examination of the autocorrelation coefficients for the residuals of this Winters model shown below indicates that none of them are significantly different from zero.
CHAPTER 5
3. 5. 7.

    Year   Y       T         C
    1980   11424   11105.2   102.871
    1981   12811   12900.6    99.306
    1982   14566   14695.9    99.116
    1983   16542   16491.3   100.307
    1984   19670   18286.7   107.565
    1985   20770   20082.1   103.426
    1986   22585   21877.4   103.234
    1987   23904   23672.8   100.977
    1988   25686   25468.8   100.855
    1989   26891   27263.6    98.633
    1990   29073   29059.0   100.048
    1991   28189   30854.3    91.362
    1992           32649.7    93.263
    1993           34445.1    92.025
    1994           36240.5    97.777
    1995           38035.8    99.454
    1996           39831.2   106.660
    1997           41626.6   107.095
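The C column is the ratio of the actual value to the trend value (times 100); a short sketch reproducing it for the 1980-1991 rows given above:

```python
# C_t = 100 * Y_t / T_t for the years with both Y and T listed above
years = range(1980, 1992)
Y = [11424, 12811, 14566, 16542, 19670, 20770,
     22585, 23904, 25686, 26891, 29073, 28189]
T = [11105.2, 12900.6, 14695.9, 16491.3, 18286.7, 20082.1,
     21877.4, 23672.8, 25468.8, 27263.6, 29059.0, 30854.3]

for yr, y, t in zip(years, Y, T):
    print(yr, round(100 * y / t, 3))    # e.g. 1980 -> 102.871
```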
    Y = T × S = 850(1.12) = $952

    All of the statements are correct except d.

    a. The regression equation and seasonal indexes are shown below. The trend is the most important component, but both the trend and seasonal components should be used to forecast.

       Trend Line Equation: Y = 2241.77 + 25.5111X

       Seasonal Indices
       Period   Index
       1        0.96392
       2        1.02559
       3        1.00323
       4        1.00726

       Accuracy of Model
       MAPE: 3.2
       MAD:  87.3
       MSD:  11114.7

    b. Forecasts

       Row   Period   Forecast
       1     43       3349.53
       2     44       3388.67

    c. The forecast for the third quarter is very accurate (3,349.5 versus 3,340). The forecast for the fourth quarter is high compared to Value Line (3,388.7 versus 3,300). Following are computer-generated graphs for this problem:
       [Time series plot of sales with predicted and forecast values from the decomposition model; the plot also lists MAPE, MAD, and MSD.]
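The quarterly forecasts in part b follow from combining the trend equation and the seasonal indices; a small sketch (using only the numbers quoted above) reproduces them:

```python
# Forecast = (trend value) * (seasonal index), multiplicative decomposition
seasonal_index = {1: 0.96392, 2: 1.02559, 3: 1.00323, 4: 1.00726}

def forecast(period, quarter):
    trend = 2241.77 + 25.5111 * period
    return trend * seasonal_index[quarter]

print(forecast(43, 3))   # about 3349.5 (third quarter)
print(forecast(44, 4))   # about 3388.7 (fourth quarter)
```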
15. a. Data: LnCavSales, Length: 77, NMissing: 0

       Trend Line Equation: Yt = 4.61322 + 2.19E-02*t

       Seasonal Indices
       Period   Index
       1         0.33461
       2        -0.01814
       3        -0.40249
       4        -0.63699
       5        -0.71401
       6        -0.57058
       7        -0.27300
       8        -0.00120
       9         0.46996
       10        0.72291
       11        0.74671
       12        0.34220

    b. Pronounced trend and seasonal components. Would use both to forecast.

    c. Forecasts for periods 79-84: Date, Period, Forecast(Ln(Sales)), and Forecast(Sales). [Forecast values not reproduced in this extraction.]
See the last column of the table in part c. Forecasts of sales developed from the additive decomposition are higher (for all months June 2000 through December 2000) than those developed from the multiplicative decomposition. Forecasts from the multiplicative decomposition appear to be a little more consistent with the recent behavior of the Cavanaugh sales time series.
17. a. Variation appears to be increasing with the level of the series. A multiplicative decomposition may be appropriate, or an additive decomposition applied to the logarithms of demand.

    b. Neither a multiplicative decomposition nor an additive decomposition with a linear trend works well for this series. This time series is best modeled with other methods. The multiplicative decomposition is pictured below.

    c. Seasonal Indices (Multiplicative Decomposition for Demand)
       Period: 9  10  11  12  [index values not reproduced in this extraction]

       Demand tends to be relatively high in the summer months.

    d. Forecasts derived from a multiplicative decomposition of demand (see plot below):

       Date   Period   Forecast
       Oct.   130      172.343
       Nov.   131      175.773
       Dec.   132      178.803
19. a. Forecast = (trend value) × (seasonal index):

       JAN: Ŷ = (140 + 5(72))(1.20) = 500(1.20) = 600
       FEB: Ŷ = (140 + 5(73))(1.37) = 505(1.37) = 692
       MAR: Ŷ = (140 + 5(74))(1.00) = 510(1.00) = 510
       APR: Ŷ = (140 + 5(75))(0.33) = 515(0.33) = 170
       MAY: Ŷ = (140 + 5(76))(0.47) = 520(0.47) = 244
       JUN: Ŷ = (140 + 5(77))(1.25) = 525(1.25) = 656
       JUL: Ŷ = (140 + 5(78))(1.53) = 530(1.53) = 811
       AUG: Ŷ = (140 + 5(79))(1.51) = 535(1.51) = 808
       SEP: Ŷ = (140 + 5(80))(0.95) = 540(0.95) = 513
       OCT: Ŷ = (140 + 5(81))(0.60) = 545(0.60) = 327
       NOV: Ŷ = (140 + 5(82))(0.82) = 550(0.82) = 451
       DEC: Ŷ = (140 + 5(83))(0.97) = 555(0.97) = 538

       1289.73(2.847) = 3,671.86
3.  The regression equation is Sales = 828 + 10.8 Expend.

    Predictor   Coef     StDev   T      P
    Constant    828.1    136.1   6.08   0.000
    Expend.     10.787   2.384   4.52   0.000

    S = 67.19    R-Sq = 71.9%    R-Sq(adj) = 68.4%

    Analysis of Variance
    Source       DF   SS       MS      F       P
    Regression   1    92432    92432   20.47   0.000
    Error        8    36121    4515
    Total        9    128552

    a. b. c. d. e.
    72% since r² = .719.
    Unexplained sum of squares = 36,121. Divide by df = (n − 2) = 8 to get 4515.
    √4515 = 67.19 = sy.x

    f. Total sum of squares is 128,552. Divide by df = (n − 1) = 9 to get the variance, 14,283.6. The square root is Y's standard deviation, sy = 119.5.
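The sum-of-squares relationships used above (SSE/(n − 2) for sy.x² and SST/(n − 1) for sy²) can be checked with a short sketch; the data below are hypothetical, since the textbook's sales and expenditure figures are not reproduced here:

```python
import numpy as np

# Hypothetical advertising expenditures and sales
x = np.array([10, 12, 14, 15, 17, 20, 22, 25, 27, 30], dtype=float)
y = np.array([930, 940, 960, 985, 1000, 1030, 1070, 1100, 1120, 1160], dtype=float)

n = len(y)
b1, b0 = np.polyfit(x, y, 1)              # slope and intercept of the least squares line
residuals = y - (b0 + b1 * x)

sse = np.sum(residuals ** 2)              # unexplained sum of squares
sst = np.sum((y - y.mean()) ** 2)         # total sum of squares
r_sq = 1 - sse / sst                      # proportion of variation explained
s_yx = np.sqrt(sse / (n - 2))             # standard error of the estimate
s_y = np.sqrt(sst / (n - 1))              # standard deviation of Y

print(b0, b1, r_sq, s_yx, s_y)
```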
5.  The regression equation is Cost = 208 + 70.9 Age

    Predictor   Coef     StDev   T       P
    Constant    208.20   75.00   2.78    0.027
    Age         70.918   9.934   7.14    0.000

    S = 111.6    R-Sq = 87.9%    R-Sq(adj) = 86.2%

    Analysis of Variance

    a.
       [Scatter plot of maintenance cost versus machine age.]
    b. c.
       ΣY² = 4,799,724    ΣXY = 48,665    ΣX = 59    r = .938

    d. e.
       Ŷ = 208.2033 + 70.9181X

       t = b1 / sb1 = 70.918 / 9.934 = 7.14, where sb1 = sy.x / √Σ(X − X̄)²
       Reject the null hypothesis. Age and maintenance cost are linearly related in the population.

    f.
7.  The regression equation is Orders = 15.8 + 1.11 Catalogs

    Predictor   Coef     StDev    T      P
    Constant    15.846   3.092    5.13   0.000
    Catalogs    1.1132   0.3596   3.10   0.011

    S = 5.757    R-Sq = 48.9%    R-Sq(adj) = 43.8%

    Analysis of Variance
    Source       DF   SS       MS       F      P
    Regression   1    317.53   317.53   9.58   0.011
    Error        10   331.38   33.14
    Total        11   648.92

    a. Ŷ = 15.846 + 1.1132X
    b. sy.x = 5.757
    c. 49% since r² = .489
    d. H0: β1 = 0    H1: β1 ≠ 0
       Reject H0 if t < −3.169 or t > 3.169
       sb1 = sy.x / √Σ(X − X̄)²
    e. f.
       t = b1 / sb1 = 1.1132 / 0.3596 = 3.10
       Fail to reject the null hypothesis at the 1% level. Orders received and catalogs distributed are not linearly related in the population.

    g. H0: β1 = 0    H1: β1 ≠ 0
       Reject H0 if F > 10.04
       F = MSR/MSE = 317.53/33.14 = 9.58
       Since 9.58 < 10.04, fail to reject H0.
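The critical values used in parts d and g can be verified from the t and F distributions; a short sketch using only the Minitab figures above:

```python
from scipy import stats

b1, se_b1 = 1.1132, 0.3596        # slope and its standard error from the output above
msr, mse = 317.53, 33.14
n, alpha = 12, 0.01

t = b1 / se_b1                                       # about 3.10
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)        # about 3.169
F = msr / mse                                        # about 9.58
F_crit = stats.f.ppf(1 - alpha, dfn=1, dfd=n - 2)    # about 10.04

print(t, t_crit, F, F_crit)   # |t| < t_crit and F < F_crit: fail to reject H0 in both tests
```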
9.  a. The two firms seem to be using very similar rationale, since r = .959.

    b. If ABC bids 101, the prediction for Y becomes COMP = −3.257 + 1.03435(101) = 101.212, the point estimate. For an interval estimate using 95% confidence, the appropriate standard error is sy.x, since only the variability of the data points around the (population) regression line needs to be considered:

       101.205 ± 1.96(.7431) = 101.205 ± 1.456, or 99.749 to 102.661

    c.

    d. If the data constitute a sample, the point estimate for Y would be the same, but the interval estimate would use the standard error of the forecast instead of sy.x, since the variability of sample regression lines around the population regression line would have to be considered along with the scatter of sample Y values around the calculated line. The probability of winning the bid would then involve the same problem as in c above, except that sf would be used in the calculation: Z = (101 − 101.212)/.768 = −.276, where sf = .768, giving an area of .1103; .5 − .1103 = .3897.
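A sketch of the calculations described above, using only the coefficients and standard errors quoted in the answer (the answer key rounds the point estimate to 101.205 before building its interval, so the endpoints differ slightly):

```python
from scipy import stats

b0, b1 = -3.257, 1.03435
s_yx, s_f = 0.7431, 0.768        # standard error of estimate and of the forecast, as quoted
bid = 101

point = b0 + b1 * bid                                  # about 101.212
ci = (point - 1.96 * s_yx, point + 1.96 * s_yx)        # interval when the data are the population

# Probability the competitor's bid falls below ABC's bid (sample case, using s_f)
z = (bid - point) / s_f                                # about -0.28
prob_win = stats.norm.cdf(z)                           # about .39
print(point, ci, prob_win)
```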
11. The regression equation is Permits = 2217 − 145 Rate

    Predictor   Coef      StDev   T       P
    Constant    2217.4    316.2   7.01    0.000
    Rate        -144.95   27.96   -5.18   0.000

    S = 144.3    R-Sq = 79.3%    R-Sq(adj) = 76.4%

    Analysis of Variance
    Source       DF   SS       MS       F       P
    Regression   1    559607   559607   26.88   0.000
    Error        7    145753   20822
    Total        8    705360

    a.
    b. c.
       Reject H0 if t < −2.365 or t > 2.365. Computed t = −5.1842; reject the null hypothesis: interest rates and building permits are linearly related in the population.
       44.9474
       r² = .7934. Using knowledge of the linear relationship between interest rates and building permits (r = −0.8907), we can explain 79.3% of the variance of the building permits variable.

    d. e. f.
13. The regression equation is Defects = −17.7 + 0.355 Size

    Predictor   Coef      StDev     T       P
    Constant    -17.731   4.626     -3.83   0.003
    Size        0.35495   0.02332   15.22   0.000

    S = 7.863    R-Sq = 95.5%

    Analysis of Variance
    Source       DF   SS
    Regression   1    14331
    Error        11   680
    Total        12   15011

    a.
    b. Defects = −17.7 + 0.355 Size
    c. The slope coefficient, .355, is significantly different from zero.
    d.
e.
The best model transforms the predictor variable and uses X² as its independent variable. The regression equation is Defects = 4.70 + 0.00101 Sizesqr

    Predictor   Coef         StDev        T       P
    Constant    4.6973       0.9997       4.70    0.000
    Sizesqr     0.00100793   0.00001930   52.22   0.000

    S = 2.341    R-Sq = 99.6%    R-Sq(adj) = 99.6%

    Analysis of Variance
    Source       DF   SS      MS      F         P
    Regression   1    14951   14951   2727.00   0.000
    Error        11   60      5
    Total        12   15011
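A minimal sketch of the transformation used here, fitting defects against the squared size variable (the data are hypothetical; only the form of the model follows the answer above):

```python
import numpy as np

rng = np.random.default_rng(0)
size = np.linspace(100, 700, 13)                      # hypothetical sizes
defects = 4.7 + 0.001 * size ** 2 + rng.normal(0, 2, size.size)

b1, b0 = np.polyfit(size ** 2, defects, 1)            # regress defects on Size^2
print(b0, b1)                                         # intercept and slope of Defects = b0 + b1*Size^2
```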
    Examination of the residuals shows a random pattern.

    h. R denotes an observation with a large standardized residual.
       Fit: 95.411

    i.
    b. c. assessed values (as predictor variable). There is a considerable amount of unexplained variation.

15. a. The regression equation is: OpExpens = 18.9 + 1.30 PlayCosts

    b. r² = .751. About 75% of the variation in operating expenses is explained by player costs.

    c. F = 72.6, p-value = .000 < .10. The regression is clearly significant at the α = .10 level.

    d. The coefficients suggest that operating expenses have a fixed cost component, represented by the intercept b0 = 18.9, and then are about 1.3 times player costs.

    e. f. Ŷ = 58.6; Ŷ ± 2 sf gives 58.6 ± 2(5.5), or (47.6, 69.6).

    Unusual Observations
    Obs   PlayCost   OpExpens   Fit     SE Fit   Residual   St Resid
    7     18.0       60.00      42.31   1.64     17.69      3.45R

    R denotes an observation with a large standardized residual. Team 7 has unusually low player costs relative to operating expenses.