DOI: 10.1145/3626562.3626830
research-article

LogDAPT: Log Data Anomaly Detection with Domain-Adaptive Pretraining (industry track)

Published: 11 December 2023

Abstract

Large software systems typically record important runtime information in system logs, which are a vital resource for engineers seeking to understand system behavior and identify fatal errors. However, given the scale and complexity of modern industrial software systems, manually sifting through logs has become impractical. Recently, many deep learning models, including pre-trained models, have been applied to detect system anomalies automatically from logs. These methods normally require large amounts of labeled data to achieve satisfactory results, yet abundant labeled data is rarely available in typical industrial scenarios. To address this issue, we propose LogDAPT, a novel framework based on domain-adaptive pretraining (DAPT). To improve performance in the few-shot setting, LogDAPT learns the semantics of logs by applying a second phase of in-domain pretraining to the BERT base model. In our experiments, we use two different schemes for DAPT: an MLM scheme and a Span scheme. Experimental results on three public log datasets (HDFS, BGL, and Thunderbird) show that LogDAPT achieves satisfactory F1-scores under few-shot industrial conditions.
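The abstract names the two DAPT schemes but does not describe them. As a rough illustration only, the sketch below assumes the MLM scheme masks individual tokens independently (as in BERT-style masked language modeling) and the Span scheme masks contiguous runs of tokens (in the spirit of SpanBERT). The function names, the 15% ratio, the `max_span` value, and the sample log line are hypothetical and not taken from the paper. The sketch also omits details such as BERT's random-replacement rule and SpanBERT's geometric span lengths.

```python
import random

MASK = "[MASK]"

def mlm_mask(tokens, mask_prob=0.15, rng=None):
    """Token-level masking: each token is masked independently with probability mask_prob."""
    rng = rng or random.Random(0)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)    # the model would be trained to recover this token
        else:
            masked.append(tok)
            labels.append(None)   # position ignored by the pretraining loss
    return masked, labels

def span_mask(tokens, mask_ratio=0.15, max_span=3, rng=None):
    """Span-level masking: mask contiguous runs until a fixed budget of tokens is covered."""
    rng = rng or random.Random(0)
    n = len(tokens)
    if n == 0:
        return [], []
    budget = min(n, max(1, round(n * mask_ratio)))
    covered = set()
    while len(covered) < budget:
        # Never pick a span longer than the remaining budget.
        length = min(rng.randint(1, max_span), budget - len(covered))
        start = rng.randint(0, n - length)
        covered.update(range(start, start + length))
    masked = [MASK if i in covered else t for i, t in enumerate(tokens)]
    labels = [t if i in covered else None for i, t in enumerate(tokens)]
    return masked, labels

# Hypothetical log line, tokenized on whitespace for simplicity.
line = "Receiving block blk_1 src: /10.0.0.1:50010 dest: /10.0.0.2:50010".split()
print(mlm_mask(line))
print(span_mask(line))
```

In an actual pipeline, the masked sequences and labels would feed a second pretraining phase of a BERT model on unlabeled in-domain logs, before fine-tuning on the few labeled examples available.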


    Published In

    Middleware '23: Proceedings of the 24th International Middleware Conference: Industrial Track
    December 2023
    52 pages
    ISBN:9798400704277
    DOI:10.1145/3626562
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Sponsors

    In-Cooperation

    • IFIP: International Federation for Information Processing

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. anomaly detection
    2. deep learning
    3. domain-adaptive pretraining
    4. log analysis

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    Middleware '23
    Acceptance Rates

    Overall Acceptance Rate 203 of 948 submissions, 21%

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 261
    • Downloads (Last 12 months): 180
    • Downloads (Last 6 weeks): 7
    Reflects downloads up to 19 Feb 2025