DOI: 10.1145/375663.375668

Outlier detection for high dimensional data

Published: 01 May 2001

Abstract

The outlier detection problem has important applications in the field of fraud detection, network robustness analysis, and intrusion detection. Most such applications are high dimensional domains in which the data can contain hundreds of dimensions. Many recent algorithms use concepts of proximity in order to find outliers based on their relationship to the rest of the data. However, in high dimensional space, the data is sparse and the notion of proximity fails to retain its meaningfulness. In fact, the sparsity of high dimensional data implies that every point is an almost equally good outlier from the perspective of proximity-based definitions. Consequently, for high dimensional data, the notion of finding meaningful outliers becomes substantially more complex and non-obvious. In this paper, we discuss new techniques for outlier detection which find the outliers by studying the behavior of projections from the data set.
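
The sparsity argument in the abstract can be illustrated numerically. The sketch below is not the method proposed in the paper; it only measures how the relative gap between the nearest and farthest neighbor of a query point shrinks as dimensionality grows. The uniform synthetic data, the function names `relative_contrast` and `contrast_by_dimension`, and the chosen dimensions and sample size are all assumptions made for illustration.

```python
import math
import random


def relative_contrast(num_points, dim, rng):
    """Return (d_max - d_min) / d_min for the Euclidean distances from a
    random query point to num_points uniform random points in [0, 1]^dim."""
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(num_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    d_min, d_max = min(distances), max(distances)
    return (d_max - d_min) / d_min


def contrast_by_dimension(dims=(2, 10, 100, 500), num_points=200, seed=0):
    """Map each dimensionality to its relative contrast (illustrative only)."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    return {dim: relative_contrast(num_points, dim, rng) for dim in dims}


if __name__ == "__main__":
    for dim, contrast in contrast_by_dimension().items():
        print(f"dim={dim:4d}  relative contrast={contrast:.3f}")
```

When the relative contrast is small, the nearest and farthest points lie at nearly the same distance from the query, so a purely distance-based score separates points only weakly. That is the sense in which every point becomes an almost equally good outlier under proximity-based definitions.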

Published In

SIGMOD '01: Proceedings of the 2001 ACM SIGMOD international conference on Management of data
May 2001
630 pages
ISBN: 1581133324
DOI: 10.1145/375663

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGMOD/PODS01

Acceptance Rates

SIGMOD '01 Paper Acceptance Rate 44 of 293 submissions, 15%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%
