research-article

Explainable contextual anomaly detection using quantile regression forests

Published: 09 August 2023

Abstract

    Traditional anomaly detection methods aim to identify objects that deviate from most other objects by treating all features equally. In contrast, contextual anomaly detection methods aim to detect objects that deviate from other objects within a context of similar objects by dividing the features into contextual features and behavioral features. In this paper, we develop connections between dependency-based traditional anomaly detection methods and contextual anomaly detection methods. Based on the resulting insights, we propose a novel approach to inherently interpretable contextual anomaly detection that uses Quantile Regression Forests to model dependencies between features. Extensive experiments on various synthetic and real-world datasets demonstrate that our method outperforms state-of-the-art anomaly detection methods in identifying contextual anomalies, in terms of both accuracy and interpretability.
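
To make the idea of a contextual anomaly concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes a single numeric behavioral feature, and it swaps in a k-nearest-neighbour estimate of conditional quantiles as a simple stand-in for a Quantile Regression Forest. The function names (`empirical_quantile`, `contextual_score`), the parameters `k`, `lower`, and `upper`, the scoring rule, and the toy data are all illustrative assumptions.

```python
def empirical_quantile(sorted_values, q):
    """Return the q-quantile of an ascending list, using linear interpolation."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def contextual_score(contexts, behaviors, query_context, query_behavior,
                     k=20, lower=0.05, upper=0.95):
    """Score a behavioral value against the quantile interval of similar contexts.

    Hypothetical scoring rule (an assumption, not taken from the paper):
    0.0 inside the [lower, upper] interval, otherwise the distance to the
    nearest interval bound divided by the interval width.
    Returns (score, (q_lo, q_hi)); the interval doubles as an explanation.
    """
    # Rank training objects by squared Euclidean distance in context space.
    order = sorted(
        range(len(contexts)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(contexts[i], query_context)),
    )
    # Behavioral values of the k most similar objects, in ascending order.
    neighbours = sorted(behaviors[i] for i in order[:k])
    q_lo = empirical_quantile(neighbours, lower)
    q_hi = empirical_quantile(neighbours, upper)
    width = max(q_hi - q_lo, 1e-12)  # guard against a degenerate interval
    if query_behavior < q_lo:
        return (q_lo - query_behavior) / width, (q_lo, q_hi)
    if query_behavior > q_hi:
        return (query_behavior - q_hi) / width, (q_lo, q_hi)
    return 0.0, (q_lo, q_hi)


# Toy data (made up): the behavioral value is roughly twice the context value,
# with a small deterministic perturbation in [-2, 2].
contexts = [(float(i),) for i in range(100)]
behaviors = [2.0 * i + ((i * 7) % 5 - 2) for i in range(100)]

# A value of 150 lies well inside the global range of behaviors, but it is
# unusual for objects whose context is close to 50.
score, (q_lo, q_hi) = contextual_score(contexts, behaviors, (50.0,), 150.0)
print(f"score={score:.2f}, expected interval=[{q_lo:.1f}, {q_hi:.1f}]")
```

The returned interval is what makes the score interpretable in this sketch: a flagged value can be reported together with the range expected for objects in a similar context. A Quantile Regression Forest would replace the nearest-neighbour step with forest-derived weights over the training objects.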

    Published In

    Data Mining and Knowledge Discovery, Volume 37, Issue 6
    Nov 2023
    450 pages

    Publisher

    Kluwer Academic Publishers

    United States

    Publication History

    Published: 09 August 2023
    Accepted: 15 July 2023
    Received: 01 September 2022

    Author Tags

    1. Anomaly detection
    2. Anomaly explanation
    3. Outlier detection
    4. Contextual anomaly detection
    5. Quantile regression forests

    Qualifiers

    • Research-article
