Detection of Influential Observation in Linear Regression
R. Dennis Cook
Technometrics, Vol. 19, No. 1. (Feb., 1977), pp. 15-18.
Stable URL:
http://links.jstor.org/sici?sici=0040-1706%28197702%2919%3A1%3C15%3ADOIOIL%3E2.0.CO%3B2-8
Technometrics is currently published by American Statistical Association.
University of Minnesota
A new measure based on confidence ellipsoids is developed for judging the contribution of
each data point to the determination of the least squares estimate of the parameter vector in full
rank linear regression models. It is shown that the measure combines information from the
studentized residuals and the variances of the residuals and predicted values. Two examples are
presented.
KEY WORDS
Influential observations
Confidence ellipsoids
Variances of residuals
Outliers
1. INTRODUCTION
and
$\hat{\beta}_{(-i)}$ and $\hat{\beta}$
$+\,(X'X)^{-1} x_i x_i' (X'X)^{-1} / (1 - v_i)$,
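As a hedged illustration of the measure summarized in the abstract, the sketch below computes it two ways for a straight-line fit: from the confidence-ellipsoid definition, by refitting without each point and measuring the shift of the estimate in the metric $X'X$ scaled by $p s^2$, and from the closed form $D_i = (t_i^2/p)\,V(\hat{Y}_i)/V(R_i)$ that combines the studentized residual with the variance ratio. Both formulas are the standard statements of the measure rather than text recovered from this copy, and the data, the function names and the restriction to $p = 2$ are assumptions made for illustration.

```python
import math

P = 2  # ASSUMPTION: straight-line model, intercept plus one slope

def fit_line(x, y):
    """Least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1

def residuals_and_s2(x, y):
    """Residuals R_i and s^2 = R'R / (n - p) for the full fit."""
    b0, b1 = fit_line(x, y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return resid, sum(r * r for r in resid) / (len(x) - P)

def cook_by_deletion(x, y):
    """D_i from the definition: refit without point i each time."""
    b0, b1 = fit_line(x, y)
    _, s2 = residuals_and_s2(x, y)
    # Entries of X'X for the design matrix with columns [1, x].
    n, sx, sxx = len(x), sum(x), sum(xi * xi for xi in x)
    out = []
    for i in range(n):
        c0, c1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        d0, d1 = c0 - b0, c1 - b1
        quad = n * d0 * d0 + 2.0 * sx * d0 * d1 + sxx * d1 * d1
        out.append(quad / (P * s2))
    return out

def cook_closed_form(x, y):
    """D_i = (t_i^2 / p) * v_i / (1 - v_i), with v_i the leverage."""
    resid, s2 = residuals_and_s2(x, y)
    n = len(x)
    xbar = sum(x) / n
    sxx_c = sum((xi - xbar) ** 2 for xi in x)
    out = []
    for xi, ri in zip(x, resid):
        v = 1.0 / n + (xi - xbar) ** 2 / sxx_c  # leverage of point i
        t = ri / math.sqrt(s2 * (1.0 - v))      # studentized residual
        out.append(t * t / P * v / (1.0 - v))
    return out

# Made-up data with one high-leverage point at x = 10.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 7.0]
d_deletion = cook_by_deletion(x, y)
d_closed = cook_closed_form(x, y)
```

The two lists agree up to floating-point error, and on this made-up data the point at x = 10 receives by far the largest value. The closed form needs only one fit, which is why it is the usual way to compute the measure in practice.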
TECHNOMETRICS©, VOL. 19, NO. 1, FEBRUARY 1977
3. EXAMPLES
Example 1-Longley Data
Longley [7] presented a data set relating six economic variables to total derived employment for the
years 1947 to 1962. Table 1 lists the residuals standardized by $s$, $t_i$, $V(\hat{Y}_i)/V(R_i)$, $D_i$ and the year. Notice
first that there are considerable differences between $R_i/s$ and $t_i$. Second, the point with the largest $D_i$
value corresponds to 1951. Removal of this point will move the least squares estimate to the edge of a 35%
confidence region around $\hat{\beta}$. The second largest $D_i$ value is at 1962 and its removal will move the estimate of $\beta$ to approximately the edge of a 15% confidence region. Clearly, 1951 and 1962 have the greatest impact on the determination of $\hat{\beta}$. The point with
the largest studentized residual is 1956; however, the effect of this point on $\hat{\beta}$ is not important relative to
the effects of 1951 and 1962. The identification of the points with max $|t_i|$ and max $V(\hat{Y}_i)/V(R_i)$ (or max
$v_i$) would not have isolated 1951. (It is interesting to note that 1951 was the first full year of the Korean
conflict.)
Example 2-Hald Data
The data for this example were previously published by Hald (see [2] and p. 165 of [5]). There are 4
regressors and 13 observation points. Table 2 lists $R_i/s$, $t_i$, $V(\hat{Y}_i)/V(R_i)$ and $D_i$. In contrast to the previous example the data here seem well behaved. Observation number 8 has the largest $D_i$ value but its
removal moves the least squares estimate to the edge of only the 10% confidence region for $\beta$.
4. EXTENSIONS
TABLE 1-Longley Data

Year    $R_i/s$    $|t_i|$    $V(\hat{Y}_i)/V(R_i)$    $D_i$
1947     0.88       1.15       0.74                     0.14
48      -0.31       0.48       1.30                     0.04
49       0.15       0.19       0.57                     *
50      -1.34       1.70       0.59                     0.24
51       1.02       1.64       1.60                     0.61
52      -0.82       1.03       0.59                     0.09
53      -0.54       0.75       0.97                     0.08
54      -0.04       0.06       1.02                     *
55       0.05       0.07       0.84                     *
56       1.48       1.83       0.49                     0.23
57      -0.06       0.07       0.56                     *
58      -0.13       0.18       0.93                     *
59      -0.51       0.64       0.60                     0.04
60      -0.28       0.32       0.30                     *
61       1.12       1.42       0.59                     0.17
62      -0.68       1.21       2.21                     0.47

*: smaller than $5 \times 10^{-3}$
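The $D_i$ column of Table 1 can be cross-checked against the other two columns. The sketch below recomputes it from the standard closed form $D_i = (t_i^2/p)\,V(\hat{Y}_i)/V(R_i)$, assuming $p = 7$ (six regressors plus an intercept, which the text does not state explicitly) and treating the 1949 entry as starred. It also reports which year maximizes each column. The variable names are illustrative, and agreement is only expected to within the two-decimal rounding of the printed values.

```python
# Columns transcribed from Table 1 (years 1947 to 1962).
# None marks a starred D_i entry; 1949 is ASSUMED to be starred.
years = [1947 + k for k in range(16)]
abs_t = [1.15, 0.48, 0.19, 1.70, 1.64, 1.03, 0.75, 0.06,
         0.07, 1.83, 0.07, 0.18, 0.64, 0.32, 1.42, 1.21]
var_ratio = [0.74, 1.30, 0.57, 0.59, 1.60, 0.59, 0.97, 1.02,
             0.84, 0.49, 0.56, 0.93, 0.60, 0.30, 0.59, 2.21]
d_printed = [0.14, 0.04, None, 0.24, 0.61, 0.09, 0.08, None,
             None, 0.23, None, None, 0.04, None, 0.17, 0.47]

P = 7  # ASSUMPTION: six regressors plus an intercept

# Closed form: D_i = t_i^2 / p * V(Yhat_i) / V(R_i).
d_check = [t * t / P * r for t, r in zip(abs_t, var_ratio)]

def year_of_max(column):
    """Year at which a column of Table 1 attains its maximum."""
    return years[max(range(len(column)), key=lambda k: column[k])]

for yr, printed, check in zip(years, d_printed, d_check):
    shown = "*" if printed is None else f"{printed:.2f}"
    print(f"{yr}  printed={shown:>4}  recomputed={check:.4f}")

print("max |t_i|  :", year_of_max(abs_t))
print("max ratio  :", year_of_max(var_ratio))
print("max D_i    :", year_of_max(d_check))
```

Under these assumptions the largest $|t_i|$ falls at 1956, the largest variance ratio at 1962, and the largest recomputed $D_i$ at 1951, which matches the observation that neither of the first two criteria alone would have isolated 1951.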
TABLE 2-Hald Data

Observation    $R_i/s$    $|t_i|$    $V(\hat{Y}_i)/V(R_i)$    $D_i$
1               0.002      0.003      1.22                     *
2               0.62       0.76       0.50                     0.06
3              -0.68       1.05       1.36                     0.30
4              -0.71       0.84       0.42                     0.06
5               0.10       0.13       0.56                     *
6               1.61       1.71       0.14                     0.08
7              -0.59       0.74       0.58                     0.06
8              -1.24       1.69       0.69                     0.31
9               0.56       0.67       0.42                     0.04
10              0.12       0.21       2.34                     0.02
11              0.81       1.07       0.74                     0.17
12              0.40       0.46       0.36                     0.02
13             -0.94       1.12       0.44                     0.11

*: smaller than $2 \times 10^{-3}$
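The variance-ratio column of Table 2 also admits a simple consistency check. Since $V(\hat{Y}_i)$ and $V(R_i)$ are proportional to $v_i$ and $1 - v_i$, each ratio $r_i$ gives back the leverage as $v_i = r_i/(1 + r_i)$, and for a full rank model the leverages sum to $p$. The sketch below applies this to the printed ratios, assuming $p = 5$ (four regressors plus an intercept). The names are illustrative, and the sum is only expected to match $p$ up to rounding of the printed values.

```python
# V(Yhat_i) / V(R_i) column transcribed from Table 2, observations 1 to 13.
var_ratio = [1.22, 0.50, 1.36, 0.42, 0.56, 0.14, 0.58,
             0.69, 0.42, 2.34, 0.74, 0.36, 0.44]

P = 5  # ASSUMPTION: four regressors plus an intercept

# Invert r = v / (1 - v) to recover each leverage v_i.
leverage = [r / (1.0 + r) for r in var_ratio]

# The leverages are the diagonal of the projection matrix, whose trace is p.
trace = sum(leverage)

# Observation (1-based) with the largest leverage.
most_remote = 1 + max(range(len(leverage)), key=lambda k: leverage[k])

print(f"sum of recovered leverages = {trace:.3f} (expected about {P})")
print("largest leverage at observation", most_remote)
```

A sum close to 5 suggests the ratio column was transcribed without gross errors. The same check applied to Table 1 should give a value close to 7.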
where
*,-,, = @,-,,.
Since
it follows that
The author would like to thank Professor C. Bingham for his suggestions and criticisms.
REFERENCES
[1] Beckman, R. J. and Trussell, H. J., (1974). The distribution of an arbitrary studentized residual and the effects of updating in multiple regression. J. Amer. Statist. Assoc., 69, 199-201.
[2] Behnken, D. W. and Draper, N. R., (1972). Residuals and their variance patterns. Technometrics, 14, 102-111.
[3] Box, G. E. P. and Draper, N. R., (1975). Robust design. Biometrika, 62, 347-352.
[4] Davies, R. B. and Hutton, B., (1975). The effects of errors in the independent variables in linear regression. Biometrika, 62, 383-391.
[5] Draper, N. R. and Smith, H., (1966). Applied Regression Analysis. Wiley, New York.
[6] Huber, P. J., (1975). Robustness and Designs. A Survey of Statistical Design and Linear Models. North-Holland, Amsterdam.
[7] Longley, J. W., (1967). An appraisal of least squares programs for the electronic computer from the point of view of the user. J. Amer. Statist. Assoc., 62, 819-841.
[8] Lund, R. E., (1975). Tables for an approximate test for outliers in linear models. Technometrics, 17, 473-476.