DOI:10.1145/3338501.3357368
research-article

CADENCE: Conditional Anomaly Detection for Events Using Noise-Contrastive Estimation

Published: 11 November 2019

    Abstract

    Many interactions between computer systems and users are recorded as event records, such as login events, API call records, and bank transaction records. These records often consist of high-dimensional categorical variables, such as user name, zip code, and autonomous system number. In this work, we consider anomaly detection for such data sets, where each record consists of multi-dimensional categorical variables of potentially very high cardinality. Our proposed technique, named CADENCE, combines neural networks, low-dimensional representation learning, and noise-contrastive estimation. Our approach is based on estimating the conditional probability density functions governing observed events, which are assumed to be mostly normal. This conditional modeling approach allows CADENCE to consider each event in its own context, thereby significantly improving its accuracy. We evaluate our proposed method using both synthetic and real-world data sets. Our results show that CADENCE performs significantly better than existing methods at real-world anomaly detection tasks.
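
The abstract names noise-contrastive estimation (NCE) as the way the conditional model is fitted. The sketch below is not the CADENCE architecture; it only illustrates the generic NCE objective for a single event, assuming a hypothetical scoring function that returns an unnormalized log-probability of a value given its context, and a uniform noise distribution. All names (`nce_loss`, `anomaly_score`, the toy tables) are illustrative assumptions.

```python
import math

def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def nce_loss(score, context, observed, noise_samples, noise_prob):
    """Generic NCE loss for one (context, observed) event.

    score(context, value) -> unnormalized log-probability (hypothetical model)
    noise_prob(value)     -> probability of value under the noise distribution
    """
    k = len(noise_samples)

    def logit(value):
        # Log-odds that `value` came from the data rather than from the noise.
        return score(context, value) - math.log(k * noise_prob(value))

    loss = -log_sigmoid(logit(observed))      # observed value labeled "data"
    for value in noise_samples:
        loss -= log_sigmoid(-logit(value))    # noise values labeled "noise"
    return loss

def anomaly_score(score, context, value):
    # Lower estimated conditional log-probability means more anomalous.
    return -score(context, value)

# Toy data (hypothetical): login country conditioned on user name.
GOOD = {("alice", "US"): 0.90, ("alice", "DE"): 0.05, ("alice", "BR"): 0.05}
BAD = {("alice", "US"): 0.05, ("alice", "DE"): 0.90, ("alice", "BR"): 0.05}

def good_score(context, value):
    return math.log(GOOD[(context, value)])

def bad_score(context, value):
    return math.log(BAD[(context, value)])

def uniform_noise(value):
    return 1.0 / 3.0

good_loss = nce_loss(good_score, "alice", "US", ["DE", "BR"], uniform_noise)
bad_loss = nce_loss(bad_score, "alice", "US", ["DE", "BR"], uniform_noise)
```

In this toy setting the scorer that matches the observed behavior (`good_score`) gets a lower NCE loss than the mismatched one, and `anomaly_score` ranks a login from a rarely seen country above one from the usual country. A trained model would replace the lookup tables with a learned scoring function.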

    Cited By

    • (2022) Contextual anomaly detection for high-dimensional data using Dirichlet process variational autoencoder. IISE Transactions 55:5 (433-444). DOI:10.1080/24725854.2021.2024925. Online publication date: 15-Feb-2022
    • (2022) Deep embedding kernel mixture networks for conditional anomaly detection in high-dimensional data. International Journal of Production Research 61:4 (1101-1113). DOI:10.1080/00207543.2022.2027040. Online publication date: 18-Feb-2022

        Published In

        AISec'19: Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security
        November 2019
        123 pages
        ISBN:9781450368339
        DOI:10.1145/3338501

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Conference

        CCS '19

        Acceptance Rates

        Overall Acceptance Rate 94 of 231 submissions, 41%
