Robust Detection of Multiple Outliers in A Multivariate Data Set
Abstract

Many methods are available for detecting multiple outliers in a single multivariate sample, but very few for the case where there may be more than one group in the data. We propose a method for detecting outliers, which are points that are distant from every group. The use of the method is illustrated by its application to two real data sets, and by a simulation study of its performance for various configurations of groups and outliers in the data. When the number of groups is not known in advance, the analysis can be repeated over a range of values in order to choose a satisfactory solution.
CHAPTER ONE
Introduction
cally present in data of this type and tend to cause problems in the
this grouping are not known beforehand. Outliers in this context are
Rocke & Woodruff, 1996; Penny & Jolliffe, 2001). Caroni (1998)
with a common covariance matrix, but she assumed that the number
of populations was known. Thus, her results are not applicable to
al. (1999) removed this restriction. Recently, Hardin & Rocke (2004)
here. The only other methods in the literature that appear to have the
this in a single group, using Wilks' statistic, was given by Caroni
& Prescott (1992) but it is not obvious how it can be extended to the
published real data sets and a simulation study of its performance for
that has been reduced by omitting points that are even further from
the mean than the one currently being tested. Points sufficiently
masking: if there are actually more outliers than the number being
tested for, then the covariance matrix will be inflated by these extra
outliers, thus reducing the distances d_i and making it less likely that
procedure.
These considerations make it desirable to construct a robust version of the Mahalanobis distance,

    d_i(T(X), C(X)) = [(x_i − T(X))′ C(X)⁻¹ (x_i − T(X))]^{1/2},

where T(X) and C(X) are robust estimators of the center and the scatter of the data, respectively. Well-known choices include the minimum volume ellipsoid (MVE) (Rousseeuw & van Zomeren, 1990) and the minimum covariance determinant (MCD) (Rousseeuw & van Driessen, 1999), although for estimators of this kind the computation required to be sure of reaching the solution was, in many cases, infeasible.
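As a concrete illustration of such robust distances, the following is a minimal sketch using scikit-learn's MinCovDet (a fast MCD implementation) in the role of T(X) and C(X); the simulated data, the planted outliers and the 0.975 chi-squared cutoff are our assumptions for the example, not part of the method described in this work.

    # Robust Mahalanobis distances with MCD standing in for T(X), C(X).
    import numpy as np
    from scipy.stats import chi2
    from sklearn.covariance import MinCovDet

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    X[:3] += 6.0                              # three planted outliers

    mcd = MinCovDet(random_state=0).fit(X)    # robust center and scatter
    d2 = mcd.mahalanobis(X)                   # squared robust distances
    flagged = np.where(d2 > chi2.ppf(0.975, df=X.shape[1]))[0]
    print(flagged)                            # indices of suspected outliers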
Step 1. Using either of the following two versions of the method, determine an initial basic subset of m > p
observations where m is an integer chosen by the data analyst: m = cp with c = 3, 4 or 5 is suggested.
Version 1: Initial Subset Selected Based on Mahalanobis Distances
For all i = 1, ..., n points, compute the Mahalanobis distances d_i(x̄, S), where x̄ and S are the mean and covariance matrix of all n observations. The initial basic subset consists of the m points with the smallest distances. (In Version 2 of the method, distances are instead measured from the coordinate-wise medians, a more robust but not affine-equivariant start. A code sketch of the complete single-sample iteration is given after Step 4 below.)
Step 2. Compute the discrepancies d_i(x̄_b, S_b) for all i = 1, ..., n, where x̄_b and S_b are the mean and covariance matrix of the observations in the current basic subset.
Step 3. Determine a new basic subset consisting of the observations with the smallest discrepancies, specifically those satisfying d_i(x̄_b, S_b) < C_npr · χ_{p,α/n}, where χ²_{p,α} is the 100(1 − α) percentile of the chi-squared distribution with p degrees of freedom, and C_npr = C_np + C_hr is a variance inflation factor, with C_hr = max{0, (h − r)/(h + r)} and

    C_np = 1 + (p + 1)/(n − p) + 1/(n − h − p),

where r is the size of the current basic subset and h = ⌊(n + p + 1)/2⌋.
Step 4. Iterate Steps 2 and 3 until the basic subset no longer grows. Outliers are those points, if any, still excluded from the basic subset when the algorithm terminates.
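To make the procedure concrete, here is a minimal NumPy/SciPy sketch of the single-sample iteration. The helper names (mahalanobis, bacon_cutoff, bacon) and the simplified stopping rule are ours; this is a sketch of the published algorithm, not the authors' own code.

    import numpy as np
    from scipy.stats import chi2

    def mahalanobis(X, center, cov):
        """d_i(center, cov): Mahalanobis distance of every row of X."""
        diff = X - center
        d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
        return np.sqrt(d2)

    def bacon_cutoff(n, p, r, alpha=0.05):
        """Step 3 threshold C_npr * chi_{p, alpha/n}, where chi is the
        square root of the chi-squared percentile and h = (n+p+1)//2."""
        h = (n + p + 1) // 2
        c_np = 1 + (p + 1) / (n - p) + 1 / (n - h - p)
        c_hr = max(0.0, (h - r) / (h + r))
        return (c_np + c_hr) * chi2.ppf(1 - alpha / n, df=p) ** 0.5

    def bacon(X, c=4, alpha=0.05):
        """Single-sample BACON sketch; returns a boolean outlier mask."""
        n, p = X.shape
        # Step 1 (Version 1): the m = c*p points closest to the classical mean.
        d = mahalanobis(X, X.mean(axis=0), np.cov(X, rowvar=False))
        basic = np.sort(np.argsort(d)[:c * p])
        while True:
            # Step 2: discrepancies from the current basic subset.
            d = mahalanobis(X, X[basic].mean(axis=0),
                            np.cov(X[basic], rowvar=False))
            # Step 3: new basic subset = points below the corrected cutoff.
            new = np.where(d < bacon_cutoff(n, p, len(basic), alpha))[0]
            if np.array_equal(new, basic):   # Step 4: subset stopped changing
                break
            basic = new
        mask = np.ones(n, dtype=bool)
        mask[basic] = False                  # points left outside are outliers
        return mask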
Extension of the BACON Algorithm to Grouped Data
In the unequal-covariances case there is a separate covariance matrix for each group, and hence a basic subset will be needed within each group. The first step of the algorithm for both cases, that is, with equal or unequal covariance matrices, uses the k-medoids method to select k representative points (objects) from the data set. The corresponding clusters are built around these medoids, which are chosen so that they are centrally located in the clusters they represent. In other words, they indicate which group each point is most likely to come from.
Step 1. Start the analysis by using the k-medoids method to find the
centers of k subgroups, where k has been chosen in advance.
Step 2. Compute the distance of each point from its nearest center, and form the initial basic subset(s) from the m points with the smallest of these distances, d_(1), ..., d_(m), irrespective of which group they belong to. In the unequal-covariances case, each selected point joins the basic subset of the group whose center it is nearest.
Step 3. For every point i and every group j, compute the discrepancies D_ij = d_i(x̄_jb, S_jb), where x̄_jb and S_jb are the mean and covariance, respectively, of the current basic subset of group j.
Step 4. Assign each point to the group for which it has the smallest discrepancy, D_i = min_j D_ij.
Step 5. Enlarge the basic subset(s) by including all points with D_i < C_npr · χ_{p,α/n}.
Step 6. Iterate Steps 3 to 5 until the basic subsets do not change. Points remaining outside the basic subset(s) when the algorithm terminates are declared to be outliers.
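A sketch of Steps 2 to 6 for the grouped case follows (the unequal-covariances variant, with one basic subset per group). It reuses the mahalanobis and bacon_cutoff helpers above, takes the Step 1 medoid indices as given, and assumes every basic subset keeps more than p points so that the covariance matrices stay invertible; these simplifications are ours.

    import numpy as np

    def extended_bacon(X, medoids, m, alpha=0.05):
        """Grouped BACON sketch. `medoids`: indices from Step 1 (k-medoids)."""
        n, p = X.shape
        k = len(medoids)
        # Step 2: the m points nearest to any center start the basic subsets.
        eu = np.linalg.norm(X[:, None, :] - X[medoids][None, :, :], axis=2)
        nearest = eu.argmin(axis=1)
        start = np.sort(np.argsort(eu.min(axis=1))[:m])
        groups = [start[nearest[start] == j] for j in range(k)]
        while True:
            # Step 3: discrepancies D_ij from every group's basic subset.
            D = np.column_stack([
                mahalanobis(X, X[g].mean(axis=0), np.cov(X[g], rowvar=False))
                for g in groups])
            # Step 4: match each point to its closest group.
            best, Di = D.argmin(axis=1), D.min(axis=1)
            # Step 5: admit all points below the corrected cutoff
            # (r taken as the total size of all basic subsets).
            inside = Di < bacon_cutoff(n, p, sum(len(g) for g in groups), alpha)
            new = [np.where(inside & (best == j))[0] for j in range(k)]
            # Step 6: stop when the basic subsets no longer change.
            if all(np.array_equal(a, b) for a, b in zip(groups, new)):
                break
            groups = new
        return np.where(~inside)[0]          # declared outliers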
Examples of Applications
For numerical illustration, we take two real data sets in which it is suspected that there may be outliers. We will not carry out detailed analyses of the data, but simply demonstrate how easily our method reveals the outliers. The first set consists of five socioeconomic measures for each of 47 cases; the data are available within the S-PLUS package. The cluster analyses and plots of the data suggest that there are three main groups in the data, with one clear outlier (point number 45). Figure 1 shows the data plotted in the space of the first two principal components, with this point marked. The extended BACON algorithm with k = 3 (and p = 5) detected this outlier after only a few rounds of iteration.
The second data set consists of measurements on 76 young bulls of three breeds. Here the analysis draws attention to points 16 and 51, which are possible outliers. In fact, our method works without difficulty even in this case, where the groups are not clearly separated.
Figure 1. Data set consisting of five socioeconomic measures on 47 cases, plotted in the space of the first two principal components, with the outlying point marked.
Figure 2. Data set consisting of measurements on 76 young bulls of three breeds, plotted in the space of the first two principal components.
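For readers who prefer Python to S-PLUS, a hypothetical snippet for reproducing this kind of display might look as follows; the file name data.csv, the placeholder medoids and the choice of m are ours, and in practice the medoids would come from a genuine k-medoids run (Step 1).

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA

    X = pd.read_csv('data.csv').to_numpy(dtype=float)   # hypothetical export
    out = extended_bacon(X, medoids=[0, 1, 2], m=15)    # placeholder medoids
    Z = PCA(n_components=2).fit_transform(X)
    plt.scatter(Z[:, 0], Z[:, 1], c='k')
    plt.scatter(Z[out, 0], Z[out, 1], c='r', label='flagged outliers')
    plt.xlabel('PC 1'); plt.ylabel('PC 2'); plt.legend(); plt.show()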
Simulation Study
In all cases, the covariance matrix used for data generation was the identity, the mean of the first group was zero and the mean of the second group was at (5, 0, ..., 0)′. For k = 3, the mean of the third group was placed at (2.5, 4.33, 0, ..., 0)′ so that the means of the three groups formed an equilateral triangle with sides of length 5. The results were obtained using the 5% level of significance for the chi-squared criterion.
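The null configuration just described is straightforward to generate; a sketch follows (equal group sizes are our assumption, since the text does not state how n was split between groups).

    import numpy as np

    def simulate_null(n, p, k, rng):
        """k identity-covariance groups with the Table 1 means; no outliers."""
        means = [np.zeros(p), np.r_[5.0, np.zeros(p - 1)]]
        if k == 3:
            means.append(np.r_[2.5, 4.33, np.zeros(p - 2)])  # equilateral triangle
        sizes = np.full(k, n // k)
        sizes[: n % k] += 1                  # split n as evenly as possible
        return np.vstack([rng.normal(size=(s, p)) + mu
                          for s, mu in zip(sizes, means)])

    X = simulate_null(n=100, p=5, k=3, rng=np.random.default_rng(1))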
for the starting basic subset, we examined only n = 100 points. The covariance matrix was the identity matrix for the first group, and also for the third if k = 3, while for the second group the variance in the first
zero and that of the second was (0, 8.66, 0, ..., 0)′, so that the
The third
Table 1. Percentage of 1000 sets of n = 50 or n = 100 simulated data points in which any outliers were declared at the nominal 5% level of significance: equal covariances assumed
(p = dimensions, k = groups, m = size of initial basic subset)

                          n = 50                      n = 100
                  No. of outliers detected    No. of outliers detected
  p    k    m       0       1      2+           0       1      2+
  2    2   10     97.0%   2.8%   0.0%         96.3%   3.5%   0.2%
           20     96.8    3.1    0.1          95.7    4.1    0.2
           30     96.8    3.1    0.1          95.8    4.0    0.2
       3   10     97.6    2.3    0.1          97.0    3.0    0.0
           20     98.2    1.7    0.1          96.5    3.5    0.0
           30     96.9    2.9    0.2          96.6    3.4    0.0
  5    2   15     97.2    2.7    0.1          96.1    3.9    0.0
           20     98.1    1.9    0.0          97.5    2.4    0.1
           30     97.6    2.3    0.1          97.5    2.5    0.0
       3   15     95.2    4.6    0.2          96.8    3.1    0.1
           20     97.8    2.2    0.0          97.0    2.9    0.1
           30     96.8    3.1    0.0          97.3    2.6    0.1
  8    2   20     98.4    1.6    0.0          98.3    1.6    0.1
           30     99.0    1.0    0.0          98.4    1.6    0.0
       3   20     96.9    3.0    0.1          97.5    2.3    0.2
           30     97.6    2.4    0.0          98.3    1.6    0.1
Table 2. Percentage of 1000 sets of n = 100 simulated data points in which any outliers were declared at the nominal 5% level of significance: unequal covariances
group for k = 3. In the case of two groups, the slippages were (−x, 0)′, (0, x)′ and (0, −x)′ (and zero in the remaining dimensions if p > 2) in the first group, and (x, 0)′, (0, x)′ and (0, −x)′ in the second group, with x = 4 and x = 6 for equal and unequal covariances, respectively. In the case of three groups, the slippages were (−x, 0)′ and (−y, y)′ in the first group, (x, 0)′ and (y, y)′ in the second, and (y, −y)′ and (−y, −y)′ in the third, where x = 6 and y = 4.24 for p = 2 or 5, and the same values multiplied by 1.5 for p = 8 (otherwise detection probabilities became very low). These combinations of values were chosen to give a symmetric arrangement of outliers relative to groups, although the symmetry does not hold with respect to generalized distance for unequal covariances. The results in Table 3 serve to show that both versions of the algorithm work well, despite the apparently lower power for k = 3, p = 8, which is due to the greater conservatism of the test in this case.
It is possible that the performance of our algorithm could be improved by adjusting the correction factors used
in Step 5. We used the same factors as in the single-sample case. These should be applicable when the basic subset is small in size, but not necessarily when it is large. A point will be declared to be an outlier if the
minimum of its distances to the k groups exceeds the criterion, so the factors could possibly be adjusted to
allow for this, although in fact this consideration only seems relevant for outliers that fall ‘in between’ the groups
in the multidimensional space. The correction factors used in the single-group case were derived empirically
from very extensive simulation studies. Given the huge range of possible configurations that need to be
examined, it would be rather difficult to carry out an equivalent study in the case of several groups.
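Under the same assumptions as the sketches above, the attained size can at least be checked empirically. For example (the replication count is reduced from 1000 for brevity, and the medoids are crudely taken as the points nearest the known group centers, which is our shortcut rather than a genuine k-medoids run):

    import numpy as np

    rng = np.random.default_rng(2)
    hits = 0
    for _ in range(200):                  # 1000 replicates in the study
        X = simulate_null(n=100, p=5, k=3, rng=rng)
        centers = [np.zeros(5), np.r_[5.0, np.zeros(4)],
                   np.r_[2.5, 4.33, 0, 0, 0]]
        meds = [np.linalg.norm(X - mu, axis=1).argmin() for mu in centers]
        hits += extended_bacon(X, medoids=meds, m=20, alpha=0.05).size > 0
    print(f'any outliers declared in {100 * hits / 200:.1f}% of replicates')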
CHAPTER FOUR
Conclusions
detect multiple outliers in data containing several groups, as the original BACON algorithm did for one group, without requiring almost 100 points that we are sure are not outliers in order to start the estimation. Few of the data sets listed in Table 1 of Baxter (1995) have more than 100 points and 12 of them do not even have 50 points, while p ranges from seven to 20. In several of these examples there would be too few observations to apply even the equal-covariances version of the method, let alone to estimate separate covariance matrices. The extended BACON algorithm, by contrast, requires only a small initial subset of points that are assumed to be free of outliers.
References
Barnett, V. & Lewis, T. (1994) Outliers in Statistical Data, 3rd edn (Chichester:
Wiley).
Billor, N., Hadi, A.S. & Velleman, P.F. (2000) BACON: blocked adaptive computationally efficient outlier nominators, Computational Statistics & Data Analysis, 34, pp. 279–298.
Hadi, A.S. (1992) Identifying multiple outliers in multivariate data, Journal of the Royal Statistical Society, Series B, 54, pp. 761–771.
Hardin, J. & Rocke, D.M. (2004) Outlier detection in the multiple cluster setting using the minimum covariance determinant estimator, Computational Statistics & Data Analysis, 44, pp. 625–638.
Prentice Hall).
Mosteller, F. & Tukey, J.W. (1977) Data Analysis and Regression (Reading,
MA: Addison-Wesley).
Rousseeuw, P.J. & van Driessen, K. (1999) A fast algorithm for the minimum covariance determinant estimator, Technometrics, 41, pp. 212–223.
Rousseeuw, P.J. & van Zomeren, B. (1990) Unmasking multivariate outliers and leverage points, Journal of the American Statistical Association, 85, pp. 633–639.
Sain, S.R., Gray, H.L., Woodward, W.A. & Fisk, M.D. (1999) Outlier detection from a mixture distribution when training data are unlabeled, Bulletin of the Seismological Society of America, 89, pp. 294–304.
Venables, W.N. & Ripley, B.D. (1999) Modern Applied Statistics with S-PLUS, 3rd edn (New York: Springer).
Wang, S., Woodward, W.A., Gray, H.L., Wiechecki, S. & Sain, S.R. (1997) A new test for outlier detection from a multivariate mixture distribution, Journal of Computational and Graphical Statistics, 6, pp. 285–299.