Detecting outlying subspaces for high-dimensional data: the new task, algorithms, and performance

J Zhang, H Wang - Knowledge and information systems, 2006 - Springer
Abstract
In this paper, we identify a new task for studying the outlying degree (OD) of high-dimensional data, i.e. finding the subspaces (subsets of features) in which the given points are outliers, which are called their outlying subspaces. Since the state-of-the-art outlier detection techniques fail to handle this new problem, we propose a novel detection algorithm, called High-Dimension Outlying subspace Detection (HighDOD), to detect the outlying subspaces of high-dimensional data efficiently. The intuitive idea of HighDOD is to measure the OD of a point as the sum of distances between this point and its k nearest neighbors. Two heuristic pruning strategies are proposed to realize fast pruning in the subspace search, and an efficient dynamic subspace search method with a sample-based learning process has been implemented. Experimental results show that HighDOD is efficient and outperforms other searching alternatives such as the naive top-down, bottom-up and random search methods, and that the existing outlier detection methods cannot fulfill this new task effectively.
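The OD measure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the Euclidean metric, the function names `outlying_degree` and `naive_outlying_subspaces`, and the fixed `threshold` parameter are all assumptions made for illustration. The search shown is a naive exhaustive enumeration of subspaces; it does not include HighDOD's pruning strategies or its sample-based learning process.

```python
import math
from itertools import combinations


def outlying_degree(data, idx, subspace, k):
    """Sum of distances from data[idx] to its k nearest neighbors,
    computed only over the feature indices in `subspace`.
    Euclidean distance is an assumption, not taken from the abstract."""
    point = data[idx]

    def dist(a, b):
        return math.sqrt(sum((a[i] - b[i]) ** 2 for i in subspace))

    # Distances to every other point, smallest first.
    distances = sorted(
        dist(point, other) for j, other in enumerate(data) if j != idx
    )
    return sum(distances[:k])


def naive_outlying_subspaces(data, idx, k, threshold):
    """Exhaustively list the subspaces in which the OD of data[idx]
    reaches `threshold` (a hypothetical cut-off chosen by the caller).
    Cost grows as 2^d in the number of features d."""
    n_features = len(data[idx])
    result = []
    for size in range(1, n_features + 1):
        for subspace in combinations(range(n_features), size):
            if outlying_degree(data, idx, subspace, k) >= threshold:
                result.append(subspace)
    return result


if __name__ == "__main__":
    # Toy data: the last point is ordinary in feature 0 but extreme in feature 1.
    data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 10.0)]
    print(outlying_degree(data, 4, (0,), k=2))
    print(outlying_degree(data, 4, (1,), k=2))
    print(naive_outlying_subspaces(data, 4, k=2, threshold=5.0))
```

In the toy data the last point looks unremarkable when only feature 0 is considered and becomes an outlier once feature 1 is included, which is the situation the outlying-subspace task is meant to expose. The exponential cost of the exhaustive loop is what makes pruning necessary in high dimensions.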