DOI: 10.1145/3317640.3317653
research-article

A Fast and Efficient Local Outlier Detection in Data Streams

Published: 25 February 2019

Abstract

Outlier detection in data streams is used in many applications, such as network flow monitoring, stock trading fluctuation detection, and network intrusion detection [1]. These applications require algorithms that detect outliers effectively within limited time and memory. Local Outlier Factor (LOF) is a fundamental density-based outlier detection algorithm [2]; it decides whether an object is an outlier by computing an LOF score for each observed object. Many LOF-based algorithms achieve excellent results for outlier detection in data streams, but most of them suffer from excessive computation. In this paper, we propose a fast outlier detection algorithm for data streams that uses Z-score pruning to avoid calculating LOF for all of the data. The algorithm consists of three phases. First, a generator produces a predicted value for each observation. Second, the Z-score of the residual between the observed value and the predicted value determines whether the observation is a potential outlier. Finally, the LOF of the observation in the current time window is calculated according to the judgment made in the second phase. Experiments show that Z-score pruning effectively reduces detection time while maintaining detection accuracy.
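The three phases can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the abstract does not name the generator, so simple exponential smoothing stands in for it; the stream is univariate; the Z-score is taken over the residuals in the current sliding window; each point uses exactly k neighbours rather than the tie-inclusive k-distance neighbourhood of [2]; and the function names (`ses_predictions`, `k_nearest`, `lof`, `detect`) and defaults (window of 20, k = 3, Z-score threshold 3.0, LOF threshold 1.5, smoothing factor 0.3) are all hypothetical.

```python
import statistics
from typing import List, Tuple

# Hypothetical sketch of the three phases; names and defaults are assumptions.


def ses_predictions(values: List[float], alpha: float = 0.3) -> List[float]:
    """Phase 1 stand-in: one-step-ahead simple exponential smoothing forecasts.

    preds[t] depends only on values[:t]; preds[0] is values[0] itself.
    """
    preds, level = [], values[0]
    for v in values:
        preds.append(level)  # forecast made before observing v
        level = alpha * v + (1 - alpha) * level
    return preds


def k_nearest(i: int, pts: List[float], k: int) -> List[int]:
    """Indices of the k nearest neighbours of pts[i]; ties broken by index."""
    others = [j for j in range(len(pts)) if j != i]
    return sorted(others, key=lambda j: (abs(pts[i] - pts[j]), j))[:k]


def lof(i: int, pts: List[float], k: int) -> float:
    """LOF of pts[i] in the spirit of Breunig et al. [2], for 1-D points.

    Simplification: exactly k neighbours are used, not the tie-inclusive
    k-distance neighbourhood of the original definition.
    """
    def k_dist(p: int) -> float:
        return abs(pts[p] - pts[k_nearest(p, pts, k)[-1]])

    def lrd(p: int) -> float:
        nbrs = k_nearest(p, pts, k)
        reach = [max(k_dist(o), abs(pts[p] - pts[o])) for o in nbrs]
        # Guard against division by zero when neighbours coincide.
        return 1.0 / max(sum(reach) / len(reach), 1e-12)

    nbrs = k_nearest(i, pts, k)
    return sum(lrd(o) for o in nbrs) / (len(nbrs) * lrd(i))


def detect(values: List[float], window: int = 20, k: int = 3,
           z_thresh: float = 3.0, lof_thresh: float = 1.5,
           alpha: float = 0.3) -> Tuple[List[int], int]:
    """Return (indices flagged as outliers, number of LOF evaluations)."""
    preds = ses_predictions(values, alpha)
    residuals = [v - p for v, p in zip(values, preds)]
    outliers, lof_calls = [], 0
    for t in range(window - 1, len(values)):
        lo = t - window + 1
        win_res = residuals[lo:t + 1]
        # Phase 2: Z-score of the newest residual within the current window.
        mu, sd = statistics.fmean(win_res), statistics.pstdev(win_res)
        z = 0.0 if sd == 0 else (residuals[t] - mu) / sd
        if abs(z) <= z_thresh:
            continue  # pruned: LOF is never computed for this point
        # Phase 3: LOF only for potential outliers, within the current window.
        lof_calls += 1
        if lof(t - lo, values[lo:t + 1], k) > lof_thresh:
            outliers.append(t)
    return outliers, lof_calls
```

For instance, with `values = [10.0 + 0.1 * (i % 5) for i in range(60)]` and `values[40] = 30.0`, `detect(values)` returns `([40], 1)`: of the 41 evaluated points, only the injected spike passes the Z-score filter, so LOF is computed once. In this sketch the saving comes from running the cheap window statistics for every point while reserving the costlier neighbourhood computation for candidates.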

References

[1] Tan, P. N., Steinbach, M., & Kumar, V. (2006). Introduction to Data Mining.
[2] Breunig, M. M., Kriegel, H. P., Ng, R. T., & Sander, J. (2000, May). LOF: Identifying density-based local outliers. In ACM SIGMOD Record (Vol. 29, No. 2, pp. 93-104). ACM.
[3] Han, J., Pei, J., & Kamber, M. (2011). Data Mining: Concepts and Techniques. Elsevier.
[4] Salehi, M., Leckie, C., Bezdek, J. C., Vaithianathan, T., & Zhang, X. (2016). Fast memory efficient local outlier detection in data streams. IEEE Transactions on Knowledge and Data Engineering, 28(12), 3246-3260.
[5] Na, G. S., Kim, D., & Yu, H. (2018, July). DILOF: Effective and memory efficient local outlier detection in data streams. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 1993-2002). ACM.
[6] Pokrajac, D., Lazarevic, A., & Latecki, L. J. (2007, March). Incremental local outlier detection for data streams. In 2007 IEEE Symposium on Computational Intelligence and Data Mining (pp. 504-515). IEEE.
[7] Hawkins, D. M. (1980). Identification of Outliers (Vol. 11). London: Chapman and Hall.
[8] Ester, M., Kriegel, H. P., Sander, J., & Xu, X. (1996, August). A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD (Vol. 96, No. 34, pp. 226-231).
[9] Liu, F. T., Ting, K. M., & Zhou, Z. H. (2008, December). Isolation forest. In 2008 Eighth IEEE International Conference on Data Mining (pp. 413-422). IEEE.
[10] Liu, F. T., Ting, K. M., & Zhou, Z. H. (2012). Isolation-based anomaly detection. ACM Transactions on Knowledge Discovery from Data (TKDD), 6(1), 3.
[11] Barnett, V. (1978). Outliers in Statistical Data.
[12] Leroy, A. M., & Rousseeuw, P. J. (1987). Robust Regression and Outlier Detection. New York: J. Wiley & Sons.
[13] Tang, J., Chen, Z., Fu, A. W. C., & Cheung, D. W. (2002, May). Enhancing effectiveness of outlier detections for low density patterns. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 535-548). Springer, Berlin, Heidelberg.
[14] Papadimitriou, S., Kitagawa, H., Gibbons, P. B., & Faloutsos, C. (2003, March). LOCI: Fast outlier detection using the local correlation integral. In Proceedings 19th International Conference on Data Engineering (Cat. No. 03CH37405) (pp. 315-326). IEEE.
[15] Goldstein, M. (2012, November). FastLOF: An expectation-maximization based local outlier detection algorithm. In Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012) (pp. 2282-2285). IEEE.
[16] Olah, C. (2015). Understanding LSTM networks. URL http://colah.github.io/posts/2015-08-Understanding-LSTMs
[17] Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
[18] Valenzuela, O., Rojas, I., Rojas, F., Pomares, H., Herrera, L. J., Guillén, A., ... & Pasadas, M. (2008). Hybridization of intelligent techniques and ARIMA models for time series prediction. Fuzzy Sets and Systems, 159(7), 821-845.
[19] Cipra, T., & Hanzák, T. (2008). Exponential smoothing for irregular time series. Kybernetika, 44(3), 385-399.

Information

Published In

ACM Other conferences
IVSP '19: Proceedings of the 2019 International Conference on Image, Video and Signal Processing
February 2019
140 pages
ISBN:9781450361750
DOI:10.1145/3317640
© 2019 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

In-Cooperation

  • Wuhan University, China
  • City University of Hong Kong

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Data streams
  2. LOF-based
  3. Outlier Detection
  4. Z-score Pruning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

IVSP 2019

Article Metrics

  • Downloads (last 12 months): 25
  • Downloads (last 6 weeks): 4
Reflects downloads up to 01 Sep 2024.

Cited By

  • (2023) An improved k-NN anomaly detection framework based on locality sensitive hashing for edge computing environment. Intelligent Data Analysis, 27(5), 1267-1285. DOI: 10.3233/IDA-216461. Online publication date: 6 Oct 2023.
  • (2023) Searching Density-Increasing Path to Local Density Peaks for Unsupervised Anomaly Detection. IEEE Transactions on Big Data, 9(4), 1198-1209. DOI: 10.1109/TBDATA.2023.3265509. Online publication date: 1 Aug 2023.
  • (2022) Towards a deep learning-based outlier detection approach in the context of streaming data. Journal of Big Data, 9(1). DOI: 10.1186/s40537-022-00670-8. Online publication date: 16 Dec 2022.
  • (2022) LINK-GUARD: An Effective and Scalable Security Framework for Link Discovery in SDN Networks. IEEE Access, 10, 130233-130252. DOI: 10.1109/ACCESS.2022.3229899. Online publication date: 2022.
  • (2021) Towards Sustainable Energy Efficiency With Intelligent Electricity Theft Detection in Smart Grids Emphasising Enhanced Neural Networks. IEEE Access, 9, 25036-25061. DOI: 10.1109/ACCESS.2021.3056566. Online publication date: 2021.
  • (2021) Low-cost sensor outlier detection framework for on-line monitoring of particle pollutants in multiple scenarios. Environmental Science and Pollution Research, 28(38), 52963-52980. DOI: 10.1007/s11356-021-14419-y. Online publication date: 21 May 2021.
  • (2020) A Review of Local Outlier Factor Algorithms for Outlier Detection in Big Data Streams. Big Data and Cognitive Computing, 5(1), 1. DOI: 10.3390/bdcc5010001. Online publication date: 29 Dec 2020.
  • (2020) Benchmarking performance of different noise detection techniques on data stream clustering. Proceedings of the 10th Euro-American Conference on Telematics and Information Systems, 1-6. DOI: 10.1145/3401895.3401898. Online publication date: 25 Nov 2020.
  • (2020) Electricity Theft Detection using Pipeline in Machine Learning. 2020 International Wireless Communications and Mobile Computing (IWCMC), 2138-2142. DOI: 10.1109/IWCMC48107.2020.9148453. Online publication date: Jun 2020.
