DOI: 10.1007/978-3-540-73499-4_6
Article

Outlier Detection with Kernel Density Functions

Published: 18 July 2007

    Abstract

    Outlier detection has recently become an important problem in many industrial and financial applications. In this paper, a novel unsupervised algorithm for outlier detection with a solid statistical foundation is proposed. First, we modify a nonparametric density estimate with a variable kernel to yield a robust local density estimate. Outliers are then detected by comparing the local density of each point to the local density of its neighbors. Experiments on several simulated data sets demonstrate that the proposed approach can outperform two widely used outlier detection algorithms (LOF and LOCI).
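The abstract names two steps, a variable-kernel local density estimate and a comparison of each point's density with that of its neighbors, without giving formulas. The following is a minimal sketch of that general idea, not the paper's actual estimator. It assumes a Gaussian kernel, a per-point bandwidth equal to `h` times the distance to the k-th nearest neighbor, and a score defined as the mean neighbor density divided by the point's own density. The function names and the parameters `k` and `h` are hypothetical.

```python
import numpy as np

def knn(X, k):
    """Distances and indices of the k nearest neighbors of each row (self excluded)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)              # a point is not its own neighbor
    idx = np.argsort(dist, axis=1)[:, :k]
    return np.take_along_axis(dist, idx, axis=1), idx

def local_density(X, k, h=1.0):
    """Variable-bandwidth Gaussian kernel density at each point, using only its k neighbors.

    Assumption: the kernel centered at a neighbor has bandwidth h times that
    neighbor's own k-th nearest-neighbor distance.
    """
    d = X.shape[1]
    dist, idx = knn(X, k)
    bandwidth = h * dist[:, -1] + 1e-12         # per-point bandwidth, guarded against zero
    bw = bandwidth[idx]                         # bandwidth of each neighbor, shape (n, k)
    kern = np.exp(-0.5 * (dist / bw) ** 2) / ((2 * np.pi) ** (d / 2) * bw ** d)
    return kern.mean(axis=1), idx

def outlier_scores(X, k=4, h=1.0):
    """Ratio of mean neighbor density to own density; larger means more outlying."""
    dens, idx = local_density(X, k, h)
    return dens[idx].mean(axis=1) / (dens + 1e-12)

# Usage: a regular 5x5 grid of inliers plus one distant point.
grid = np.array([[i, j] for i in range(5) for j in range(5)], dtype=float)
X = np.vstack([grid, [[20.0, 20.0]]])
scores = outlier_scores(X, k=4)
print(int(np.argmax(scores)))                   # index of the highest-scoring point
```

Points inside the grid have densities close to those of their neighbors, so their scores stay near 1, while the distant point sits far from every kernel center and receives a much larger score. Because the score is a ratio of local densities, it is meant to remain comparable across regions of different density, which is the motivation the abstract gives for comparing each point with its neighbors.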

    Published In

    MLDM '07: Proceedings of the 5th International Conference on Machine Learning and Data Mining in Pattern Recognition
    July 2007
    910 pages
    ISBN: 9783540734987

    Publisher

    Springer-Verlag, Berlin, Heidelberg
