A Review of Local Outlier Factor Algorithms for Outlier Detection in Big Data Streams
Abstract
1. Introduction
2. Related Works and Background
2.1. Statistical Outlier Detection
2.1.1. Parametric Method
2.1.2. Non-Parametric Method
3. Literature Review Methodology
4. Literature Review Results
4.1. Static Environment
4.1.1. Local Outlier Factor (LOF)
- (1) At least k data points (records) o’ ∈ D \ {p} satisfy d(p,o’) ≤ d(p,o);
- (2) At most k − 1 data points (records) o’ ∈ D \ {p} satisfy d(p,o’) < d(p,o).
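As a minimal sketch, the two conditions above can be read as a recipe for the k-distance of a point p: sort the distances from p to every other record in D and take the k-th smallest. The Python below assumes Euclidean distance and records stored as tuples of floats; the function names `k_distance` and `k_neighborhood` are illustrative and do not come from the reviewed papers.

```python
import math

def k_distance(i, data, k):
    """Distance from record i to its k-th nearest other record in data."""
    # Distances from p = data[i] to every o in D \ {p}, in ascending order.
    dists = sorted(math.dist(data[i], data[j]) for j in range(len(data)) if j != i)
    if not 1 <= k <= len(dists):
        raise ValueError("k must be between 1 and |D| - 1")
    # At least k records are no farther than this value,
    # and at most k - 1 records are strictly closer.
    return dists[k - 1]

def k_neighborhood(i, data, k):
    """Indices of all records within the k-distance of record i (ties included)."""
    kd = k_distance(i, data, k)
    return [j for j in range(len(data)) if j != i and math.dist(data[i], data[j]) <= kd]

data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 4.0)]
print(k_distance(0, data, 1))      # distance to the nearest record
print(k_neighborhood(0, data, 1))  # two records tie at that distance
```

Because ties are kept, the neighborhood can hold more than k records, which is why condition (1) says "at least k" rather than "exactly k".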
4.1.2. Connectivity-Based Outlier Factor (COF)
4.1.3. Local Correlation Integral (LOCI)
4.1.4. Approximate Local Correlation Integral (aLOCI)
4.1.5. Cluster-Based Local Outlier Factor (CBLOF)
4.1.6. Influenced Outlierness (INFLO)
4.1.7. Local Outlier Probability (LoOP)
4.1.8. Local Density Cluster-Based Outlier Factor (LDCOF)
4.1.9. Other Local Outlier Algorithms
4.2. Stream Environment
4.2.1. Incremental Local Outlier Factor (ILOF)
4.2.2. Memory Efficient Incremental Local Outlier Factor (MILOF)
4.2.3. Density Summarization Incremental Local Outlier Factor (DILOF)
4.2.4. Other Algorithms
4.3. Applications of Local Outlier Detection
4.3.1. Intrusion Detection
4.3.2. Fraud Detection
4.3.3. Medical Applications
5. Analysis and Discussion
5.1. Computational Complexity
5.2. Strengths and Weaknesses of Existing Methods
5.2.1. Nearest Neighbor-Based Techniques
- Techniques that use the distance from a data point to its nearest neighbors as the outlier score.
- Techniques that calculate the relative density of each data point and use it as the outlier score.
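The two families above can be contrasted with a small sketch. It assumes Euclidean distance, a tiny in-memory data set, and a brute-force neighbor search; the helper names (`knn_info`, `knn_distance_scores`, `lof_scores`) are hypothetical. The relative-density score follows the usual LOF formulation (reachability distance, local reachability density, ratio to the neighbors' densities), but it is only an illustration, not an optimized or reference implementation.

```python
import math

def knn_info(i, data, k):
    """Return (k-distance, neighbor indices) of record i; ties at the k-distance are kept."""
    dists = sorted((math.dist(data[i], data[j]), j) for j in range(len(data)) if j != i)
    kd = dists[k - 1][0]
    return kd, [j for d, j in dists if d <= kd]

def knn_distance_scores(data, k):
    """Distance-based score: the k-distance of each record (larger = more outlying)."""
    return [knn_info(i, data, k)[0] for i in range(len(data))]

def lof_scores(data, k):
    """Relative-density score in the style of LOF (values near 1 = inlier)."""
    n = len(data)
    info = [knn_info(i, data, k) for i in range(n)]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance
        # from record i to its neighbors. Assumes the data has no group of more
        # than k identical records, otherwise the sum below can be zero.
        _, nbrs = info[i]
        reach = [max(info[j][0], math.dist(data[i], data[j])) for j in nbrs]
        return len(nbrs) / sum(reach)

    dens = [lrd(i) for i in range(n)]
    # LOF: mean ratio of the neighbors' densities to the record's own density.
    return [sum(dens[j] for j in info[i][1]) / (len(info[i][1]) * dens[i])
            for i in range(n)]

# Four records on a unit square plus one distant record.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
print(knn_distance_scores(data, k=2))
print(lof_scores(data, k=2))
```

Both scores rank the distant record highest here. The difference shows up when clusters have different densities: the distance-based score is absolute, while the relative-density score compares each record with its own neighborhood.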
5.2.2. Clustering-Based Techniques
- They are simple to adapt to an incremental model.
- No supervision is required.
- They are suitable for detecting outliers in temporal data.
- The test step is fast because the number of clusters that need to be compared is typically small.
- They depend strongly on how effectively the clustering algorithm captures the normal data points.
- In most of these approaches, outlier identification is a by-product of clustering, so they are not designed to perform well at detecting outliers.
- Several clustering approaches force every data point into some cluster. Outliers can therefore end up in a large cluster and be treated as normal data points by techniques that presume outliers do not belong to any cluster.
- Some algorithms require every data point to be assigned to a cluster, so outliers may be absorbed into a large cluster and missed by methods that assume outliers are isolated.
- Several clustering-based approaches are applicable only when outliers are not part of the main clusters.
- Computing the clustering itself is costly.
5.3. New Methods to Be Explored
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Boukerche, A.; Zheng, L.; Alfandi, O. Outlier Detection: Methods, Models, and Classification. ACM Comput. Surv. 2020, 53, 1–37.
- Cios, K.J.; Pedrycz, W.; Swiniarski, R.W. Data Mining and Knowledge Discovery; Springer: Boston, MA, USA, 1998; pp. 1–26.
- Ramírez-Gallego, S.; Krawczyk, B.; García, S.; Woźniak, M.; Herrera, F. A survey on data preprocessing for data stream mining: Current status and future directions. Neurocomputing 2017, 239, 39–57.
- Kumar, V. Parallel and distributed computing for cybersecurity. IEEE Distrib. Syst. Online 2005, 6.
- Spence, C.; Parra, L.; Sajda, P. Detection, synthesis and compression in mammographic image analysis with a hierarchical image probability model. In Proceedings of the IEEE Workshop on Mathematical Methods in Biomedical Image Analysis (MMBIA 2001), Kauai, HI, USA, 9–10 December 2001; IEEE: New York, NY, USA, 2002.
- Fujimaki, R.; Yairi, T.; Machida, K. An approach to spacecraft anomaly detection problem using kernel feature space. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, Chicago, IL, USA, 21–24 August 2005.
- Knox, E.M.; Ng, R.T. Algorithms for mining distance-based outliers in large datasets. In Proceedings of the International Conference on Very Large Data Bases, New York, NY, USA, 24–27 August 1998; pp. 392–403.
- Souiden, I.; Brahmi, Z.; Toumi, H. A Survey on Outlier Detection in the Context of Stream Mining: Review of Existing Approaches and Recommadations. In Intelligent Systems Design and Applications; Springer: Cham, Switzerland, 2017; pp. 372–383.
- Patcha, A.; Park, J.-M. An overview of anomaly detection techniques: Existing solutions and latest technological trends. Comput. Netw. 2007, 51, 3448–3470.
- Snyder, D. Online Intrusion Detection Using Sequences of System Calls. Master’s Thesis, Department of Computer Science, Florida State University, Tallahassee, FL, USA, 2001.
- Markou, M.; Singh, S. Novelty detection: A review—Part 1: Statistical approaches. Signal Process. 2003, 83, 2481–2497.
- Markou, M.; Singh, S. Novelty detection: A review—Part 2: Neural network based approaches. Signal Process. 2003, 83, 2499–2521.
- Hodge, V.; Austin, J. A Survey of Outlier Detection Methodologies. Artif. Intell. Rev. 2004, 22, 85–126.
- Goldstein, M.; Uchida, S. A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data. PLoS ONE 2016, 11, e0152173.
- Wang, H.; Bah, M.J.; Hammad, M. Progress in Outlier Detection Techniques: A Survey. IEEE Access 2019, 7, 107964–108000.
- Tellis, V.M.; D’souza, D.J. Detecting Anomalies in Data Stream Using Efficient Techniques: A Review. In Proceedings of the 2018 International Conference on Control, Power, Communication and Computing Technologies (ICCPCCT), Kannur, India, 23–24 March 2018.
- Park, C.H. Outlier and anomaly pattern detection on data streams. J. Supercomput. 2019, 75, 6118–6128.
- Pimentel, M.A.; Clifton, D.A.; Clifton, L.; Tarassenko, L. A review of novelty detection. Signal Process. 2014, 99, 215–249.
- Chauhan, P.; Shukla, M. A review on outlier detection techniques on data stream by using different approaches of K-Means algorithm. In Proceedings of the 2015 International Conference on Advances in Computer Engineering and Applications, Ghaziabad, India, 19–20 March 2015.
- Salehi, M.; Rashidi, L. A Survey on Anomaly detection in Evolving Data. ACM Sigkdd Explor. Newsl. 2018, 20, 13–23.
- Chandola, V.; Banerjee, A.; Kumar, V. Anomaly detection: A survey. ACM Comput. Surv. 2009, 41, 1–58.
- Domingues, R.; Filippone, M.; Michiardi, P.; Zouaoui, J. A comparative evaluation of outlier detection algorithms: Experiments and analyses. Pattern Recognit. 2018, 74, 406–421.
- Safaei, M.; Asadi, S.; Driss, M.; Boulila, W.; Alsaeedi, A.; Chizari, H.; Abdullah, R.; Safaei, M. A Systematic Literature Review on Outlier Detection in Wireless Sensor Networks. Symmetry 2020, 12, 328.
- Barnett, V.; Lewis, T. Outliers in Statistical Data; Wiley: Chichester, UK, 1994.
- Zimek, A.; Filzmoser, P. There and Back Again: Outlier Detection between Statistical Reasoning and Data Mining Algorithms. 2018. Available online: https://onlinelibrary.wiley.com/doi/full/10.1002/widm.1280 (accessed on 30 November 2020).
- Eskin, E. Anomaly Detection over Noisy Data Using Learned Probability Distributions. In Proceedings of the 17th International Conference Machine Learning, Stanford, CA, USA, 17–22 July 2000; pp. 255–262.
- Maximum Likelihood Estimation. 2015. Available online: https://en.wikipedia.org/w/index.php?title=Maximum_likelihood_estimation&oldid=857905834 (accessed on 30 November 2020).
- Yang, X.; Latecki, L.J.; Pokrajac, D. Outlier Detection with Globally Optimal Exemplar-Based GMM. In Proceedings of the 2009 SIAM International Conference on Data Mining, Sparks, NV, USA, 30 April–2 May 2009.
- Tang, X.; Yuan, R.; Chen, J. Outlier detection in energy disaggregation using subspace learning and Gaussian mixture model. Int. J. Control Autom. 2015, 8, 161–170.
- Zhang, J. Advancement of outlier detection: A survey. ICST Trans. Scalable Inf. Syst. 2013, 13, 1–26.
- Satman, M.H. A new algorithm for detecting outliers in linear regression. Int. J. Stat. Probab. 2013, 2, 101–109.
- Park, C.M.; Jeon, J. Regression-Based Outlier Detection of Sensor Measurements Using Independent Variable Synthesis. In Proceedings of the International Conference on Data Science, New York, NY, USA, 7–11 August 2015; pp. 78–86.
- Pavlidou, M.; Zioutas, G. Kernel Density Outlier Detector. In Proceedings of the Mathematics & Statistics Topics in Nonparametric Statistics, Chalkidiki, Greece, 15–19 June 2014; pp. 241–250.
- Latecki, L.J.; Lazarevic, A.; Pokrajac, D. Outlier Detection with Kernel Density Functions. In Machine Learning and Data Mining in Pattern Recognition; Springer: Berlin, Germany, 2007; pp. 61–75.
- Gao, J.; Hu, W.; Zhang, Z.; Wu, O. RKOF: Robust kernel-based local outlier detection. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Delhi, India, 11–14 May 2011; pp. 270–283.
- Samparthi, V.S.K.; Verma, H.K. Outlier Detection of Data in Wireless Sensor Networks Using Kernel Density Estimation. Int. J. Comput. Appl. 2010, 5, 28–32.
- Edgeworth, F.Y. XLI. On discordant observations. Lond. Edinb. Dublin Philos. Mag. J. Sci. 1887, 23, 364–375.
- Rousseeuw, P.J.; Leroy, A.M. Robust Regression and Outlier Detection. In Wiley Series in Probability and Statistics; Wiley Online Library: Hoboken, NJ, USA, 1987.
- Hawkins, D.M. Identification of Outliers. In Monographs on Applied Probability and Statistics; Springer: Berlin/Heidelberg, Germany, 1980.
- Barnett, V.; Lewis, T. Statistical Interpretation of Data; John Wiley and Sons: Hoboken, NJ, USA, 1994.
- Bakar, Z.; Mohemad, R.; Ahmad, A.; Deris, M. A Comparative Study for Outlier Detection Techniques in Data Mining. In Proceedings of the 2006 IEEE Conference on Cybernetics and Intelligent Systems, Bangkok, Thailand, 7–9 June 2006.
- Aggarwal, C.C. Outlier Analysis. In Data Mining; Springer: Berlin/Heidelberg, Germany, 2015; pp. 237–263.
- Meng, F.; Yuan, G.; Lv, S.; Wang, Z.; Xia, S. An overview on trajectory outlier detection. Artif. Intell. Rev. 2019, 52, 2437–2456.
- Gökalp, E.; Güngör, O.; Boz, Y. Evaluation of Different Outlier Detection Methods for GPS Networks. Sensors 2008, 8, 7344–7358.
- Joshi, M.V.; Agarwal, R.C.; Kumar, V. Mining needle in a haystack: Classifying rare classes via two-phase rule induction. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data–SIGMOD ‘01, Santa Barbara, CA, USA, 16–18 May 2001.
- Joshi, M.V.; Agarwal, R.C.; Kumar, V. Predicting rare classes: Can boosting make any weak learner strong? In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining—KDD ‘02, Edmonton, AB, Canada, 23–26 July 2002.
- Chawla, N.V.; Japkowicz, N.; Kotcz, A. Editorial: Special issue on learning from imbalanced data sets. ACM Sigkdd Explor. Newsl. 2004, 6, 1–6.
- Sen, P.C.; Hajra, M.; Ghosh, M. Supervised Classification Algorithms in Machine Learning: A Survey and Review. In Emerging Technology in Modelling and Graphics; Springer: Berlin, Germany, 2020; pp. 99–111.
- Quinlan, J.R. C4.5: Programs for Machine Learning; Morgan Kaufmann Publishers Inc.: Burlington, MA, USA, 1993.
- Mehrotra, K.; Mohan, C.; Ranka, S. Elements of Artificial Neural Networks; MIT Press: Cambridge, MA, USA, 1996.
- Moya, M.M.; Hush, D.R. Network constraints and multi-objective optimization for one-class classification. Neural Netw. 1996, 9, 463–474.
- Schölkopf, B.; Platt, J.C.; Shawe-Taylor, J.; Smola, A.J.; Williamson, R.C. Estimating the Support of a High-Dimensional Distribution. Neural Comput. 2001, 13, 1443–1471.
- Breunig, M.M.; Kriegel, H.-P.; Ng, R.T.; Sander, J. LOF: Identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data—SIGMOD ‘00, Dallas, TX, USA, 16–18 May 2000.
- Tang, J.; Chen, Z.; Fu, A.; Cheung, D. Enhancing Effectiveness of Outlier Detections for Low Density Patterns. In Advances in Knowledge Discovery and Data Mining. Vol. 2336 of Lecture Notes in Computer Science; Chen, M.S., Yu, P., Liu, B., Eds.; American Association for Artificial Intelligence: Menlo Park, CA, USA, 2002; pp. 535–548.
- Papadimitriou, S.; Kitagawa, H.; Gibbons, P.; Faloutsos, C. LOCI: Fast outlier detection using the local correlation integral. In Proceedings of the 19th International Conference on Data Engineering, Bangalore, India, 5–8 March 2003.
- He, Z.; Xu, X.; Deng, S. Discovering cluster-based local outliers. Pattern Recognit. Lett. 2003, 24, 1641–1650.
- Jin, W.; Tung, A.K.H.; Han, J.; Wang, W. Ranking Outliers Using Symmetric Neighborhood Relationship. In Proceedings of the Advances in Knowledge Discovery and Data Mining Lecture Notes in Computer Science 10th Pacific-Asia Conference, Singapore, 9–12 April 2006; Volume 3918, pp. 577–593.
- Kriegel, H.-P.; Kröger, P.; Schubert, E.; Zimek, A. LoOP: Local Outlier Probabilities. In Proceedings of the 18th ACM Conference on Information and Knowledge Management—CIKM ‘09, Hong Kong, China, 2–6 November 2009.
- Amer, M.; Goldstein, M. Nearest-Neighbor and Clustering based Anomaly Detection Algorithms for RapidMiner. In Proceedings of the 3rd RapidMiner Community Meeting and Conference, Budapest, Hungary, 28–31 August 2012; pp. 1–12.
- Chiu, A.; Fu, A.W.-C. Enhancements on local outlier detection. In Proceedings of the Seventh International Database Engineering and Applications Symposium, Hong Kong, China, 16–18 July 2003; IEEE: New York, NY, USA, 2003.
- Jiang, S.Y.; Li, Q.H.; Li, K.L.; Wang, H.; Meng, Z.L. GLOF: A new approach for mining local outlier. In Proceedings of the 2003 International Conference on Machine Learning and Cybernetics, Xi’an, China, 5 November 2003; pp. 157–162.
- Ren, D.; Wang, B.; Perrizo, W. RDF: A density-based outlier detection method using vertical data representation. In Proceedings of the Fourth IEEE International Conference on Data Mining ICDM’04, Brighton, UK, 1–4 November 2004; pp. 503–506.
- Lozano, E.; Acuna, E. Parallel Algorithms for Distance-Based and Density-Based Outliers. In Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM’05), Washington, DC, USA, 27–30 November 2005.
- Fan, H.; Zaïane, O.R.; Foss, A.; Wu, J. Resolution-based outlier factor: Detecting the top-n most outlying data points in engineering data. Knowl. Inf. Syst. 2008, 19, 31–51.
- Momtaz, R.; Mohssen, N.; Gowayyed, M.A. DWOF: A Robust Density-Based Outlier Detection Approach. In Pattern Recognition and Image Analysis Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2013; pp. 517–525.
- Cao, K.; Shi, L.; Wang, G.; Han, D.; Bai, M. Density-based local outlier detection on uncertain data. In Proceedings of the International Conference on Web-Age Information Management, Macau, China, 16–18 June 2014; pp. 67–71.
- Goldstein, M. Anomaly Detection in Large Datasets. Ph.D. Thesis, University of Kaiserslautern, Kaiserslautern, Germany, 2016.
- Tang, B.; He, H. A local density-based approach for outlier detection. Neurocomputing 2017, 241, 171–180.
- Vazquez, F.I.; Zseby, T.; Zimek, A. Outlier Detection Based on Low Density Models. In Proceedings of the 2018 IEEE International Conference on Data Mining Workshops (ICDMW), Singapore, 17–20 November 2018.
- Ning, J.; Chen, L.; Chen, J. Relative Density-Based Outlier Detection Algorithm. In Proceedings of the 2018 2nd International Conference on Computer Science and Artificial Intelligence-CSAI ‘18, Shenzhen, China, 11–13 December 2018.
- Su, S.; Xiao, L.; Ruan, L.; Gu, F.; Li, S.; Wang, Z.; Xu, R. An Efficient Density-Based Local Outlier Detection Approach for Scattered Data. IEEE Access 2019, 7, 1006–1020.
- Zhao, Y.; Nasrullah, Z.; Hryniewicki, M.K.; Li, Z. LSCP: Locally Selective Combination in Parallel Outlier Ensembles. In Proceedings of the 2019 SIAM International Conference on Data Mining, Calgary, AB, Canada, 2–4 May 2019; pp. 585–593.
- Xu, Z.; Kakde, D.; Chaudhuri, A. Automatic Hyperparameter Tuning Method for Local Outlier Factor, with Applications to Anomaly Detection. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019.
- Yang, P.; Wang, D.; Wei, Z.; Du, X.; Li, T. An Outlier Detection Approach Based on Improved Self-Organizing Feature Map Clustering Algorithm. IEEE Access 2019, 7, 115914–115925.
- Pokrajac, D.; Lazarevic, A.; Latecki, L.J. Incremental Local Outlier Detection for Data Streams. In Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining, Honolulu, HI, USA, 1 March–5 April 2007.
- Salehi, M.; Leckie, C.; Bezdek, J.C.; Vaithianathan, T.; Zhang, X. Fast Memory Efficient Local Outlier Detection in Data Streams. IEEE Trans. Knowl. Data Eng. 2016, 28, 3246–3260.
- Na, G.S.; Kim, D.; Yu, H. DILOF: Effective and memory efficient local outlier detection in data streams. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018.
- Pokrajac, D.; Reljin, N.; Pejcic, N.; Lazarevic, A. Incremental Connectivity-Based Outlier Factor Algorithm. In Proceedings of the Visions of Computer Science-BCS International Academic Conference, London, UK, 22–24 September 2008; pp. 211–223.
- Ren, J.; Wu, Q.; Zhang, J.; Hu, C. Efficient outlier detection algorithm for heterogeneous data streams. Int. Conf. Fuzzy Syst. Knowl. Discov. 2009, 5, 259–264.
- Karimian, S.H.; Kelarestaghi, M.; Hashemi, S. I-IncLOF: Improved incremental local outlier detection for data streams. In Proceedings of the 16th CSI International Symposium on Artificial Intelligence and Signal Processing, Shiraz, Iran, 2–3 May 2012.
- Wang, Z.; Zhao, Z.; Weng, S.; Zhang, C. Incremental multiple instance outlier detection. Neural Comput. Appl. 2015, 26, 957–968.
- Salehi, M.; Leckie, C.; Bezdek, J.C.; Vaithianathan, T. Local outlier detection for data streams in sensor networks: Revisiting the utility problem invited paper. In Proceedings of the 2015 IEEE Tenth International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), Singapore, 7–9 April 2015.
- Kontaki, M.; Gounaris, A.; Papadopoulos, A.N.; Tsichlas, K.; Manolopoulos, Y. Efficient and flexible algorithms for monitoring distance-based outliers over data streams. Inf. Syst. 2016, 55, 37–53.
- Zhang, L.; Lin, J.; Karim, R. Sliding Window-Based Fault Detection From High-Dimensional Data Streams. IEEE Trans. Syst. Man Cybern. Syst. 2016, 47, 1–15.
- Hamlet, C.; Straub, J.; Russell, M.; Kerlin, S. An incremental and approximate local outlier probability algorithm for intrusion detection and its evaluation. J. Cyber Secur. Technol. 2017, 1, 75–87.
- Siffer, A.; Fouque, P.-A.; Termier, A.; Largouet, C. Anomaly Detection in Streams with Extreme Value Theory. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017.
- Mu, X.; Ting, K.M.; Zhou, Z.-H. Classification Under Streaming Emerging New Classes: A Solution Using Completely-Random Trees. IEEE Trans. Knowl. Data Eng. 2017, 29, 1605–1618.
- Ishimtsev, V.; Bernstein, A.; Burnaev, E.; Nazarov, I. Conformal k-NN Anomaly Detector for Univariate Data Streams. In Proceedings of the Conformal and Probabilistic Prediction and Applications, Stockholm, Sweden, 14–16 June 2017; pp. 213–227.
- Chen, Q.; Luley, R.; Wu, Q.; Bishop, M.; Linderman, R.W.; Qiu, Q. AnRAD: A Neuromorphic Anomaly Detection Framework for Massive Concurrent Data Streams. IEEE Trans. Neural Netw. Learn. Syst. 2017, 29, 1622–1636.
- Yao, H.; Fu, X.; Yang, Y.; Postolache, O. An Incremental Local Outlier Detection Method in the Data Stream. Appl. Sci. 2018, 8, 1248.
- Munir, M.; Siddiqui, S.A.; Dengel, A.; Ahmed, S. DeepAnT: A Deep Learning Approach for Unsupervised Anomaly Detection in Time Series. IEEE Access 2018, 7, 1991–2005.
- Manzoor, E.; Lamba, H.; Akoglu, L. xStream: Outlier detection in feature-evolving data streams. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018.
- Yang, X.; Zhou, W.; Shu, N.; Zhang, H. A Fast and Efficient Local Outlier Detection in Data Streams. In Proceedings of the 2019 International Conference on Image, Video and Signal Processing, Shanghai, China, 29–30 October 2019.
- Qin, X.; Cao, L.; Rundensteiner, E.A.; Madden, S. Scalable Kernel Density Estimation-based Local Outlier Detection over Large Data Streams. EDBT 2019, 421–432.
- Kalliantzis, I.; Papadopoulos, A.; Gounaris, A.; Tsichlas, K. Efficient Distributed Outlier Detection in Data Streams; Research Report; Aristotle University of Thessaloniki: Thessaloniki, Greece, 2019.
- Cai, S.; Li, Q.; Li, S.; Yuan, G.; Sun, R. An efficient maximal frequent-pattern-based outlier detection approach for weighted data streams. Inf. Technol. Control 2019, 48, 505–521.
- Reunanen, N.; Räty, T.; Jokinen, J.J.; Hoyt, T.; Culler, D. Unsupervised online detection and prediction of outliers in streams of sensor data. Int. J. Data Sci. Anal. 2019, 9, 285–314.
- Din, S.U.; Shao, J. Exploiting evolving micro-clusters for data stream classification with emerging class detection. Inf. Sci. 2020, 507, 404–420.
- Alsini, R.; Alghushairy, O.; Ma, X.; Soule, T. A Grid Partition-based Local Outlier Factor for Data Stream Processing. In Proceedings of the 4th International Conference on Applied Cognitive Computing, Las Vegas, NV, USA, 27–30 July 2020.
- Portnoy, L. Intrusion detection with unlabeled data using clustering. Ph.D. Dissertation, Columbia University, New York, NY, USA, 2000.
- García-Teodoro, P.; Díaz-Verdejo, J.; Maciá-Fernández, G.; Vázquez, E. Anomaly-based network intrusion detection: Techniques, systems and challenges. Comput. Secur. 2009, 28, 18–28.
- Yeung, D.-Y.; Ding, Y. Host-based intrusion detection using dynamic and static behavioral models. Pattern Recognit. 2003, 36, 229–243.
- Phua, C.; Lee, V.; Smith, K.; Gayler, R. A comprehensive survey of data mining-based fraud detection research. arXiv 2010, arXiv:1009.6119.
- Thiprungsri, S.; Vasarhelyi, M. Cluster Analysis for Anomaly Detection in Accounting Data: An Audit Approach. Int. J. Digit. Account. Res. 2011, 11.
- Bolton, R.J.; Hand, D.J. Unsupervised profiling methods for fraud detection. In Proceedings of the Credit Scoring and Credit Control XII Conference, Edinburgh, UK, 24–26 August 2011; pp. 235–255.
- Bansal, R.; Gaur, N.; Singh, S.N. Outlier detection: Applications and techniques in data mining. In Proceedings of the 6th International Conference-Cloud System and Big Data Engineering, Noida, India, 14–15 January 2016; pp. 373–377.
- Lin, J.; Keogh, E.; Fu, A.; Herle, H.V. Approximations to Magic: Finding Unusual Medical Time Series. In Proceedings of the 18th IEEE Symposium on Computer-Based Medical Systems (CBMS’05), Dublin, Ireland, 23–24 June 2005.
- Alsini, R.; Ma, X. Data Streaming. In Encyclopedia of Big Data; Schintler, L., McNeely, C., Eds.; Springer: Cham, Switzerland, 2019.
- Alghushairy, O.; Ma, X. Data Storage. In Encyclopedia of Big Data; Schintler, L., McNeely, C., Eds.; Springer: Cham, Switzerland, 2019.
- Balcázar, J.L.; Bonchi, F.; Gionis, A.; Sebag, M. Machine Learning and Knowledge Discovery in Databases; Springer: Cham, Switzerland, 2010; Volume 6321.
- Fawzy, A.; Mokhtar, H.M.O.; Hegazy, O. Outliers detection and classification in wireless sensor networks. Egypt. Inf. J. 2013, 14, 157–164.
- Jain, A.K.; Dubes, R.C. Algorithms for Clustering Data; Prentice-Hall, Inc.: Upper Saddle River, NJ, USA, 1988.
- Boulila, W.; Farah, I.R.; Ettabaa, K.S.; Solaiman, B.; Ghézala, H.B. A data mining based approach to predict spatiotemporal changes in satellite images. Int. J. Appl. Earth Obs. Geoinf. 2011, 13, 386–395.
- Han, J.; Pei, J.; Kamber, M. Data Mining: Concepts and Techniques; Elsevier: Amsterdam, The Netherlands, 2011.
- Alghushairy, O.; Alsini, R.; Ma, X.; Soule, T. A Genetic-Based Incremental Local Outlier Factor Algorithm for Efficient Data Stream Processing. In Proceedings of the 2020 4th International Conference on Compute and Data Analysis, Silicon Valley, CA, USA, 9–12 March 2020; pp. 38–49.
- Simulation of Genetic Based Incremental Local Outlier Factor. Available online: https://www.youtube.com/watch?v=YY-lHhhe2Ew&t=15s (accessed on 30 July 2019).
- Alghushairy, O.; Alsini, R.; Ma, X.; Soule, T. Improving the Efficiency of Genetic based Incremental Local Outlier Factor Algorithm for Network Intrusion Detection. In Proceedings of the 4th International Conference on Applied Cognitive Computing, Las Vegas, NV, USA, 27–30 July 2020.
Authors and Year | Algorithm | Features | Time Complexity | Taxonomy | Remarks |
---|---|---|---|---|---|
Breunig et al., 2000 [53] | LOF | Better in spherical data | O(n^2) | Nearest neighbor based | Introduced local outlier detection; uses Euclidean distance and kNN to estimate local density. |
Tang et al., 2002 [54] | COF | Better in linear data | O(n^2) | Nearest neighbor based | Handles linearly distributed data; uses the chaining distance to find local outliers. |
Papadimitriou et al., 2003 [55] | LOCI | Presumes a “half-Gaussian” distribution of the number of data points in the neighborhood | O(n^3) | Nearest neighbor based | Follows the same process as LoOP, except that it uses the number of instances rather than the distance. Long computation time, but does not need parameters. |
Papadimitriou et al., 2003 [55] | aLOCI | Uses quad trees to speed up counts | O(NLdg + NL(dg + 2^d)) | Nearest neighbor based | Simple approximation of density based on occupancy and depth. Overcomes the high time complexity of LOCI. |
He et al., 2003 [56] | CBLOF | Uses a heuristic procedure for small and large clusters | O(n^2) | Clustering-based | Many parameters. Ineffective in detecting local outliers. Takes the local variation of clusters into consideration. |
Jin et al., 2006 [57] | INFLO | Uses the reverse nearest neighbors of a data point | O(n^2) | Nearest neighbor based | Overcomes the issue of data points on the boundaries of clusters. Effective for data sets that include clusters with diverse densities. |
Kriegel et al., 2009 [58] | LoOP | Presumes a “half-Gaussian” distribution of distances | O(n^2) | Nearest neighbor based | Estimates the local density with a probabilistic set distance. Combines probabilistic and statistical approaches to provide the outlier score. |
Amer et al., 2012 [59] | LDCOF | Estimates the clusters’ densities. A spherical distribution is presumed for the cluster members | O(n^2) | Clustering-based | Many parameters. Effective in detecting local outliers. A threshold determines whether or not a point is an outlier. |
Authors and Year | Algorithm | Method | Window Form | Time Complexity | S. Technique: Opt | S. Technique: Clus | Remark |
---|---|---|---|---|---|---|---|
Pokrajac et al., 2007 [75] | ILOF | Updates the data when a new data point np is inserted. | Landmark window | O(N log N) | | | Needs to store all data points in memory, which leads to high memory complexity and high time complexity. |
Salehi et al., 2016 [76] | MILOF | Summarizes the data by using k-means. | Sliding window | O(N log Ws) | | ✓ | Summarizes data points using k-means, which cannot preserve the density of the data. Addresses the issue of high time complexity in ILOF. |
Na et al., 2018 [77] | DILOF | Summarizes the data by using gradient descent. | Sliding window | O(N Ws) | ✓ | | Summarizes data points using gradient descent. Addresses the issue of preserving the density of data points in MILOF. However, it might get stuck in local minima. |
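The difference between the two window forms in the table can be sketched as follows. This is not ILOF, MILOF, or DILOF: it is a hypothetical `WindowedKnnScorer` class that scores each arriving point by its distance to the k-th nearest stored point, assuming Euclidean distance. It only illustrates why a landmark window grows without bound while a sliding window of size Ws keeps memory fixed.

```python
import math
from collections import deque

class WindowedKnnScorer:
    """Score stream points against the points currently held in memory."""

    def __init__(self, k, window_size=None):
        self.k = k
        # window_size=None keeps every point (landmark window);
        # an integer Ws keeps only the most recent Ws points (sliding window).
        self.window = deque(maxlen=window_size)

    def score_and_insert(self, point):
        # Distance to the k-th nearest stored point, or None while fewer
        # than k points are available for comparison.
        dists = sorted(math.dist(point, q) for q in self.window)
        score = dists[self.k - 1] if len(dists) >= self.k else None
        self.window.append(point)  # the oldest point is dropped when the window is full
        return score

stream = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
landmark = WindowedKnnScorer(k=2)
sliding = WindowedKnnScorer(k=2, window_size=3)
for p in stream:
    print(p, landmark.score_and_insert(p), sliding.score_and_insert(p))
print(len(landmark.window), len(sliding.window))
```

Simply discarding the oldest points, as done here, loses the density information they carried; the summarization step in MILOF and DILOF is the part that tries to retain that information within the same memory bound.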
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Alghushairy, O.; Alsini, R.; Soule, T.; Ma, X. A Review of Local Outlier Factor Algorithms for Outlier Detection in Big Data Streams. Big Data Cogn. Comput. 2021, 5, 1. https://doi.org/10.3390/bdcc5010001