research-article

CADENCE: Conditional Anomaly Detection for Events Using Noise-Contrastive Estimation

Authors:

Mohammad Ruhul Amin,

Baris Coskun

AISec'19: Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security

Pages 71 - 82

https://doi.org/10.1145/3338501.3357368

Published: 11 November 2019

Abstract

Many interactions between computer systems and users are recorded as event records, such as login events, API call records, and bank transaction records. These records often consist of high-dimensional categorical variables, such as user name, zip code, and autonomous system number. In this work, we consider anomaly detection for such data sets, where each record consists of multi-dimensional, potentially very high-cardinality, categorical variables. Our proposed technique, named CADENCE, combines neural networks, low-dimensional representation learning, and noise-contrastive estimation. Our approach is based on estimating the conditional probability density functions governing observed events, which are assumed to be mostly normal. This conditional modeling approach allows CADENCE to consider each event in its own context, thereby significantly improving its accuracy. We evaluate our proposed method on both synthetic and real-world data sets. Our results show that CADENCE performs significantly better than existing methods at real-world anomaly detection tasks.
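
To make the idea of conditional scoring with noise-contrastive estimation concrete, here is a minimal sketch and not the authors' implementation. It assumes a plain table of unnormalized log-scores `s[c, x]` in place of the neural network and learned embeddings mentioned in the abstract, and it uses exact expectations rather than sampled noise so the result is deterministic. The names `fit_nce_scores` and `anomaly_score`, the uniform noise distribution, and the toy probabilities are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_nce_scores(p_data, q_noise, k=5, steps=2000, lr=1.0):
    """Fit unnormalized log-scores s[c, x] by gradient ascent on the
    expected NCE objective (assumption: exact expectations, no sampling).

    p_data[c, x]: empirical P(x | c) of observed, mostly normal events.
    q_noise[x]:   noise distribution over x, with k noise draws per event.
    """
    s = np.zeros_like(p_data)
    log_kq = np.log(k * q_noise)  # shape (n_x,), broadcasts over contexts
    for _ in range(steps):
        # Probability that the classifier labels (c, x) as real data.
        d = sigmoid(s - log_kq)
        # Real events push the score up, noise draws push it down.
        grad = p_data * (1.0 - d) - k * q_noise * d
        s += lr * grad
    return s

def anomaly_score(s, c, x):
    """Higher means more anomalous: negative estimated log P(x | c)."""
    return -s[c, x]

# Toy events: 2 contexts (say, users) and 3 values (say, login countries).
p_data = np.array([[0.90, 0.05, 0.05],
                   [0.05, 0.05, 0.90]])
q_noise = np.full(3, 1.0 / 3.0)
scores = fit_nce_scores(p_data, q_noise)
```

In this sketch the fitted scores approach `log P(x | c)` without ever summing over all values of `x` to normalize, which is the property that makes noise-contrastive estimation attractive for very high-cardinality variables. The same value `x = 2` is ordinary for context 1 but receives a high anomaly score in context 0, which illustrates what judging each event in its own context means.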

Cited By

Kim H, Kim H (2022). Contextual anomaly detection for high-dimensional data using Dirichlet process variational autoencoder. IISE Transactions 55:5 (433-444). Online publication date: 15-Feb-2022.
https://doi.org/10.1080/24725854.2021.2024925
Kim H, Kim H (2022). Deep embedding kernel mixture networks for conditional anomaly detection in high-dimensional data. International Journal of Production Research 61:4 (1101-1113). Online publication date: 18-Feb-2022.
https://doi.org/10.1080/00207543.2022.2027040

Index Terms

CADENCE: Conditional Anomaly Detection for Events Using Noise-Contrastive Estimation
1. Computing methodologies
  1. Artificial intelligence
2. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation

Published In

AISec'19: Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security

November 2019

123 pages

ISBN:9781450368339

DOI:10.1145/3338501

General Chairs: Lorenzo Cavallaro (King's College London), Johannes Kinder (Bundeswehr University Munich)

Program Chairs: Sadia Afroz (UC Berkeley), Battista Biggio (University of Cagliari / Pluribus One), Nicholas Carlini (Google Brain), Yuval Elovici (Ben-Gurion University), Asaf Shabtai (Ben-Gurion University)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGSAC: ACM Special Interest Group on Security, Audit, and Control

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

CCS '19

Sponsor:

SIGSAC

CCS '19: 2019 ACM SIGSAC Conference on Computer and Communications Security

November 15, 2019

London, United Kingdom

Acceptance Rates

Overall Acceptance Rate 94 of 231 submissions, 41%
