research-article

A comparative evaluation of novelty detection algorithms for discrete sequences

Authors:

Rémi Domingues,

Pietro Michiardi,

Jérémie Barlet,

Maurizio Filippone

Artificial Intelligence Review, Volume 53, Issue 5

Pages 3787 - 3812

https://doi.org/10.1007/s10462-019-09779-4

Published: 01 June 2020

Abstract

The identification of anomalies in temporal data is a core component of numerous research areas such as intrusion detection, fault prevention, genomics and fraud detection. This article provides an experimental comparison of candidate methods for the novelty detection problem applied to discrete sequences. The objective of this study is to identify which state-of-the-art methods are efficient and appropriate candidates for a given use case. Our recommendations rely on extensive novelty detection experiments based on a variety of public datasets in addition to novel industrial datasets. We also perform thorough scalability and memory usage tests, which provide additional insight into the methods’ performance. Scalability and memory usage are key selection criteria for problems that rely on large volumes of data and for applications subject to strict response time constraints.

Cited By

Gupta N, Jindal V, Bedi P (2023) A Survey on Intrusion Detection and Prevention Systems. SN Computer Science 4:5. doi:10.1007/s42979-023-01926-7. Online publication date: 10-Jun-2023
https://dl.acm.org/doi/10.1007/s42979-023-01926-7
Fu C, Wu Z, Xue M, Liu W (2023) Cross-domain decision making based on TrAdaBoost for diagnosis of breast lesions. Artificial Intelligence Review 56:5 (3987-4017). doi:10.1007/s10462-022-10267-5. Online publication date: 1-May-2023
https://dl.acm.org/doi/10.1007/s10462-022-10267-5

Index Terms

A comparative evaluation of novelty detection algorithms for discrete sequences
1. Theory of computation
  1. Design and analysis of algorithms
    1. Online algorithms

Index terms have been assigned to the content through auto-classification.

Published In

Artificial Intelligence Review Volume 53, Issue 5

Jun 2020

714 pages

ISSN: 0269-2821

© Springer Nature B.V. 2019.

Publisher

Kluwer Academic Publishers

United States
