DOI: 10.5555/1760894.1760948
Article

HOT: hypergraph-based outlier test for categorical data

Published: 30 April 2003

Abstract

Outlier detection is a widely used data mining technique that aims to find anomalies together with good explanations for them. Most existing methods are designed for numeric data and run into problems in real-life applications that contain categorical data. In this paper, we introduce a novel outlier mining method based on a hypergraph model. Since hypergraphs precisely capture the distribution characteristics in data subspaces, this method is effective in identifying anomalies in dense subspaces and gives good interpretations of local outlierness. By selecting the most relevant subspaces, the "curse of dimensionality" in very large databases can also be ameliorated. Furthermore, the connectivity property replaces distance metrics, so distance-based computation is no longer needed, which improves robustness when handling data with missing values. Because the connectivity computation can be carried out with the aggregation operations supported by most SQL-compatible database systems, the mining process is much more efficient. Finally, experiments and analysis show that our method can find outliers in categorical data with good performance and quality.
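
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it describes: count how often each categorical value occurs inside a selected subspace, then flag values whose count deviates strongly from the others. No distance metric is involved, only grouping and counting, which is the kind of work a SQL `GROUP BY ... COUNT(*)` could do. The function name `deviation_scores`, the `context` and `target` parameters, and the z-score style formula are assumptions made for illustration, not the authors' definitions.

```python
from collections import Counter
from statistics import mean, pstdev


def deviation_scores(records, context, target):
    """Score each value of `target` by how unusual its frequency is.

    records: list of dicts mapping attribute name -> categorical value
    context: dict of attribute -> value selecting a subspace (hypothetical
             stand-in for a group of records that share common values)
    target:  attribute whose values are scored inside that subspace

    Returns a dict value -> score; strongly negative scores mark rare values.
    """
    # Keep only the records that match every attribute value in the context.
    members = [
        r for r in records
        if all(r.get(attr) == val for attr, val in context.items())
    ]

    # Equivalent of: SELECT target, COUNT(*) ... GROUP BY target.
    # Records with a missing target value are skipped, not imputed.
    counts = Counter(
        r[target] for r in members if r.get(target) is not None
    )

    # With fewer than two distinct values there is nothing to compare.
    if len(counts) < 2:
        return {}

    mu = mean(counts.values())
    sigma = pstdev(counts.values())
    if sigma == 0:
        return {}  # all values equally frequent, no outlier signal

    return {value: (n - mu) / sigma for value, n in counts.items()}
```

Given records in which one value of `target` appears once among many occurrences of the other values, the rare value receives the lowest score. A threshold on that score would be a further assumption and is left to the caller.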



Published In

PAKDD '03: Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining
April 2003
610 pages
ISBN: 3540047603
  • Editors: Kyu-Young Whang, Jongwoo Jeon, Kyuseok Shim, Jaideep Srivastava

Sponsors

  • Statistical Research Center for Complex Systems
  • KAIST: Korea Advanced Institute of Science and Technology
  • The Korean Datamining Society
  • Advanced Information Technology Research Center
  • Korea Info Sci Society: Korea Information Science Society
  • Air Force Office of Scientific Research/Asian Office of Aerospace R&D

Publisher

Springer-Verlag

Berlin, Heidelberg

Qualifiers

  • Article

Cited By

  • (2019) Anomaly Detection Methods for Categorical Data. ACM Computing Surveys 52:2 (1-35). DOI: 10.1145/3312739. Online publication date: 30-May-2019
  • (2016) DBSherlock. Proceedings of the 2016 International Conference on Management of Data (1599-1614). DOI: 10.1145/2882903.2915218. Online publication date: 26-Jun-2016
  • (2015) Minimal infrequent pattern based approach for mining outliers in data streams. Expert Systems with Applications: An International Journal 42:4 (1998-2012). DOI: 10.1016/j.eswa.2014.09.053. Online publication date: 1-Mar-2015
  • (2011) Parameter-free anomaly detection for categorical data. Proceedings of the 7th international conference on Machine learning and data mining in pattern recognition (112-126). DOI: 10.5555/2033831.2033841. Online publication date: 30-Aug-2011
  • (2009) Anomaly detection. ACM Computing Surveys 41:3 (1-58). DOI: 10.1145/1541880.1541882. Online publication date: 30-Jul-2009
  • (2009) Detecting outlying properties of exceptional objects. ACM Transactions on Database Systems 34:1 (1-62). DOI: 10.1145/1508857.1508864. Online publication date: 23-Apr-2009
  • (2009) Projected outlier detection in high-dimensional mixed-attributes data set. Expert Systems with Applications: An International Journal 36:3 (7104-7113). DOI: 10.1016/j.eswa.2008.08.030. Online publication date: 1-Apr-2009
  • (2005) An optimization model for outlier detection in categorical data. Proceedings of the 2005 international conference on Advances in Intelligent Computing - Volume Part I (400-409). DOI: 10.1007/11538059_42. Online publication date: 23-Aug-2005
