Abstract
Sometimes novel or outlier data has to be detected. Outliers may indicate an interesting rare event, or they may need to be disregarded because they cannot be reliably processed further. In the ideal case, where the objects are represented by very good features, the genuine data form a compact cluster, and a good outlier measure is the distance to the cluster center. This paper proposes three new formulations for finding a good cluster center together with an optimized ℓp-distance measure. Experiments show that very good classification results are obtained on some real-world datasets and, more specifically, that the ℓ1-distance is particularly suited to datasets containing discrete feature values.
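The basic outlier measure described above can be sketched in code. It takes the ℓp distance of an object to a cluster center and compares it against a radius. This is a minimal illustration under stated assumptions, not the authors' method.

- The paper's three formulations optimize the center and p together. Here, p is fixed by the user.
- The center is a simple coordinate-wise estimate.
- The radius is chosen so that roughly a given fraction of the training objects falls outside the ball.
- The names `LpBall`, `fit_center`, `lp_distance` and `reject_fraction` are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def lp_distance(x: Sequence[float], c: Sequence[float], p: float) -> float:
    """Return the lp (Minkowski) distance between x and c; p >= 1."""
    return sum(abs(xi - ci) ** p for xi, ci in zip(x, c)) ** (1.0 / p)


def fit_center(data: List[Sequence[float]], p: float) -> List[float]:
    """Hypothetical center estimate, NOT the paper's formulation.

    For p == 1 the coordinate-wise median is used, since it minimizes the
    summed l1 distance to the data. For any other p the coordinate-wise mean
    is used as a simple heuristic (it minimizes the summed squared l2 distance).
    """
    columns = list(zip(*data))  # one tuple of values per feature
    if p == 1:
        return [float(statistics.median(col)) for col in columns]
    return [statistics.fmean(col) for col in columns]


class LpBall:
    """Ball-shaped outlier detector: center + lp distance + radius threshold."""

    def __init__(self, p: float = 2.0, reject_fraction: float = 0.05) -> None:
        if p < 1:
            raise ValueError("p must be >= 1 for the lp distance to be a metric")
        if not 0.0 <= reject_fraction < 1.0:
            raise ValueError("reject_fraction must lie in [0, 1)")
        self.p = p
        self.reject_fraction = reject_fraction
        self.center: List[float] = []
        self.radius = 0.0

    def fit(self, data: List[Sequence[float]]) -> "LpBall":
        self.center = fit_center(data, self.p)
        dists = sorted(self.score(x) for x in data)
        # Choose the radius so that roughly reject_fraction of the training
        # objects lie strictly outside the ball (ties can change this).
        idx = math.ceil((1.0 - self.reject_fraction) * len(dists)) - 1
        self.radius = dists[min(max(idx, 0), len(dists) - 1)]
        return self

    def score(self, x: Sequence[float]) -> float:
        """Outlier score: lp distance to the fitted center."""
        return lp_distance(x, self.center, self.p)

    def is_outlier(self, x: Sequence[float]) -> bool:
        """An object is flagged as an outlier if it lies outside the ball."""
        return self.score(x) > self.radius


if __name__ == "__main__":
    train = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]]
    ball = LpBall(p=1, reject_fraction=0.0).fit(train)
    print(ball.center, ball.radius)     # [0.5, 0.5] 1.0
    print(ball.is_outlier([3, 3]))      # True  (l1 distance 5.0)
    print(ball.is_outlier([0.5, 1.0]))  # False (l1 distance 0.5)
```

The ℓ1 ball is a diamond, while the ℓ2 ball is a sphere. The same training data therefore give a differently shaped acceptance region for each value of p.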
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tax, D.M.J., Juszczak, P., Pękalska, E., Duin, R.P.W. (2006). Outlier Detection Using Ball Descriptions with Adjustable Metric. In: Yeung, DY., Kwok, J.T., Fred, A., Roli, F., de Ridder, D. (eds) Structural, Syntactic, and Statistical Pattern Recognition. SSPR/SPR 2006. Lecture Notes in Computer Science, vol 4109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11815921_64
DOI: https://doi.org/10.1007/11815921_64
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37236-3
Online ISBN: 978-3-540-37241-7
eBook Packages: Computer Science, Computer Science (R0)