A fast algorithm for the minimum covariance determinant estimator

PJ Rousseeuw, K Van Driessen - Technometrics, 1999 - Taylor & Francis
The minimum covariance determinant (MCD) method of Rousseeuw is a highly robust estimator of multivariate location and scatter. Its objective is to find h observations (out of n) whose covariance matrix has the lowest determinant. Until now, applications of the MCD were hampered by the computation time of existing algorithms, which were limited to a few hundred objects in a few dimensions. We discuss two important applications of larger size, one about a production process at Philips with n = 677 objects and p = 9 variables, and a dataset from astronomy with n = 137,256 objects and p = 27 variables. To deal with such problems we have developed a new algorithm for the MCD, called FAST-MCD. The basic ideas are an inequality involving order statistics and determinants, and techniques that we call “selective iteration” and “nested extensions.” For small datasets, FAST-MCD typically finds the exact MCD, whereas for larger datasets it gives more accurate results than existing algorithms and is faster by orders of magnitude. Moreover, FAST-MCD is able to detect an exact fit, that is, a hyperplane containing h or more observations. The new algorithm makes the MCD method available as a routine tool for analyzing multivariate data. We also propose the distance-distance plot (D-D plot), which displays MCD-based robust distances versus Mahalanobis distances, and illustrate it with some examples.
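The abstract does not spell out the inequality, so the following is only a minimal sketch under stated assumptions. It assumes the inequality refers to a concentration step: given an h-subset, compute its mean and covariance, then keep the h observations with the smallest distances to that fit; the determinant of the new subset's covariance cannot increase. The sketch is restricted to p = 2 so that it needs only the Python standard library, uses a single random starting subset, and does not reproduce "selective iteration" or "nested extensions." The function names (`mean_cov`, `concentrate`, and so on) and the simulated data are illustrative, not taken from the paper. The last lines compute the two coordinates of a D-D plot from the raw subset, without any consistency factor or reweighting.

```python
import math
import random

def mean_cov(pts):
    """Sample mean and 2x2 covariance (sxx, sxy, syy) of (x, y) pairs."""
    m = len(pts)
    mx = sum(x for x, _ in pts) / m
    my = sum(y for _, y in pts) / m
    sxx = sum((x - mx) ** 2 for x, _ in pts) / (m - 1)
    syy = sum((y - my) ** 2 for _, y in pts) / (m - 1)
    sxy = sum((x - mx) * (y - my) for x, y in pts) / (m - 1)
    return (mx, my), (sxx, sxy, syy)

def det(cov):
    """Determinant of a symmetric 2x2 matrix stored as (sxx, sxy, syy)."""
    sxx, sxy, syy = cov
    return sxx * syy - sxy * sxy

def distances(pts, mean, cov):
    """Mahalanobis-type distance of every point relative to (mean, cov)."""
    (mx, my), (sxx, sxy, syy), d = mean, cov, det(cov)
    out = []
    for x, y in pts:
        dx, dy = x - mx, y - my
        # Quadratic form with the explicit inverse of the 2x2 covariance.
        q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / d
        out.append(math.sqrt(max(q, 0.0)))
    return out

def concentrate(pts, start, h, max_iter=100):
    """Repeat concentration steps from `start` until the subset is stable.

    Returns the final h-subset (as sorted indices) and the determinant
    recorded at each iteration.
    """
    idx, dets = sorted(start), []
    for _ in range(max_iter):
        mean, cov = mean_cov([pts[i] for i in idx])
        dets.append(det(cov))
        dist = distances(pts, mean, cov)
        # Keep the h observations closest to the current fit.
        new_idx = sorted(sorted(range(len(pts)), key=dist.__getitem__)[:h])
        if new_idx == idx:
            break
        idx = new_idx
    return idx, dets

# Simulated data (an assumption for illustration): 80 regular points
# plus a cluster of 20 outliers.
rng = random.Random(0)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(80)]
points += [(rng.gauss(8, 0.5), rng.gauss(8, 0.5)) for _ in range(20)]
h = 75
subset, dets = concentrate(points, rng.sample(range(len(points)), h), h)

# D-D plot coordinates: classical distance vs. distance from the h-subset fit.
classical = distances(points, *mean_cov(points))
robust = distances(points, *mean_cov([points[i] for i in subset]))
dd_pairs = list(zip(classical, robust))
```

Plotting `dd_pairs` with the classical distance on the horizontal axis and the subset-based distance on the vertical axis gives a rough D-D plot. A single start can end in a poor local minimum, which is why a practical implementation would try many starting subsets and keep the one with the smallest determinant.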