LogDAPT: Log Data Anomaly Detection with Domain-Adaptive Pretraining (industry track)

Authors:

Hao Yang

Middleware '23: Proceedings of the 24th International Middleware Conference: Industrial Track

Pages 15 - 21

https://doi.org/10.1145/3626562.3626830

Published: 11 December 2023

Abstract

Large software systems usually record important runtime information in system logs, which are a vital resource for engineers who need to understand system behavior and identify fatal errors. However, given the scale and complexity of modern industrial software systems, manually sifting through logs has become impractical. Recently, many deep learning models, including pre-trained models, have been applied to detect system anomalies from logs automatically. These methods normally require large amounts of labeled data to achieve satisfactory results, yet abundant labeled data is rarely available in typical industrial scenarios. To address this issue, we propose LogDAPT, a novel framework based on domain-adaptive pretraining (DAPT). To improve performance in the few-shot setting, LogDAPT learns the semantics of logs by applying a second, in-domain phase of pretraining to the BERT base model. In our experiments, we use two different schemes for DAPT: a masked language modeling (MLM) scheme and a Span scheme. Experimental results on three public log datasets (HDFS, BGL, and Thunderbird) show that LogDAPT achieves satisfactory F1-scores under few-shot industrial conditions.
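
As a rough illustration of the two DAPT schemes named in the abstract, the sketch below contrasts token-level masking (as in BERT's masked language modeling) with contiguous span masking (in the spirit of SpanBERT). The abstract does not specify LogDAPT's masking rates, span lengths, or tokenization, so the 15% rate, the uniformly sampled span length of up to three tokens, the whitespace tokenization, and the function names `mlm_mask` and `span_mask` are all assumptions made for this example, not the authors' implementation.

```python
import random

MASK = "[MASK]"  # BERT's mask token


def mlm_mask(tokens, mask_prob=0.15, seed=0):
    """Token-level masking: each token is hidden independently.

    Simplification (assumption): every selected token becomes [MASK];
    BERT's 80/10/10 replacement rule is omitted.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)   # the model must recover this token
        else:
            masked.append(tok)
            labels.append(None)  # position ignored by the loss
    return masked, labels


def span_mask(tokens, mask_ratio=0.15, max_span=3, seed=0):
    """Span-level masking: hide contiguous runs until a budget is met.

    Simplification (assumption): span lengths are drawn uniformly from
    1..max_span rather than from a geometric distribution.
    """
    rng = random.Random(seed)
    n = len(tokens)
    masked, labels = list(tokens), [None] * n
    if n == 0:
        return masked, labels
    budget = max(1, round(n * mask_ratio))  # number of tokens to hide
    covered = 0
    while covered < budget:
        length = min(rng.randint(1, max_span), budget - covered)
        start = rng.randint(0, n - length)
        span = range(start, start + length)
        if any(labels[i] is not None for i in span):
            continue  # overlaps an earlier span; resample
        for i in span:
            labels[i] = tokens[i]
            masked[i] = MASK
        covered += length
    return masked, labels


if __name__ == "__main__":
    # Hypothetical log line, made up for this example.
    line = "Receiving block blk_1 src: /10.0.0.1:50010 dest: /10.0.0.2:50010"
    print(mlm_mask(line.split(), mask_prob=0.3))
    print(span_mask(line.split(), mask_ratio=0.3))
```

In both functions the `labels` list holds the original token at each masked position and `None` elsewhere, which is the shape a masked-prediction loss would consume during the second pretraining phase.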

Index Terms

1. Software and its engineering
  1. Software creation and management
    1. Software post-development issues
      1. Maintaining software

Published In

Middleware '23: Proceedings of the 24th International Middleware Conference: Industrial Track

December 2023

52 pages

ISBN:9798400704277

DOI:10.1145/3626562

Copyright © 2023 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

ACM: Association for Computing Machinery

In-Cooperation

IFIP: International Federation for Information Processing

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

Middleware '23

Sponsor:

ACM

Middleware '23: 24th International Middleware Conference

December 11 - 15, 2023

Bologna, Italy

Acceptance Rates

Overall Acceptance Rate 203 of 948 submissions, 21%
