research-article

AS-Parser: Log Parsing Based on Adaptive Segmentation

Published: 12 December 2023

Abstract

System logs have long been recognized as valuable data for analyzing and diagnosing system failures. One fundamental task of log processing is log parsing: converting unstructured logs into structured logs. All previous log parsing approaches follow a general framework that first segments each log into a token sequence and then computes the similarity between two sequences. However, all existing approaches share a common drawback: flat segmentation with fixed delimiters fails to capture the structural information of logs, which leads to low parsing accuracy. To address this problem, we propose a novel log parsing approach, AS-Parser. Our approach introduces a hierarchical log segmentation mechanism that adaptively segments logs into a tree structure. It automatically recognizes the appropriate delimiters and captures the common structural information. Moreover, we propose three improvements that enhance both the effectiveness and the efficiency of our approach. On the public benchmark, AS-Parser performs best on 14 out of 16 datasets, with an average parsing accuracy of 0.943, far exceeding existing approaches.
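
The contrast between flat and hierarchical segmentation described above can be illustrated with a minimal sketch. This is not AS-Parser's algorithm: the abstract does not say how delimiters are recognized, so the fixed delimiter order, the function names `flat_segment` and `tree_segment`, and the sample log line are all assumptions made for illustration only.

```python
# Hypothetical delimiter levels, coarsest first. AS-Parser is described as
# recognizing delimiters adaptively; this fixed list is an assumption.
DELIMITER_LEVELS = [" ", ",", "="]


def flat_segment(log, delimiter=" "):
    """Flat segmentation: split once on a single fixed delimiter."""
    return [tok for tok in log.split(delimiter) if tok]


def tree_segment(text, levels=DELIMITER_LEVELS):
    """Hierarchical segmentation into a nested list (a simple tree).

    Split on the first delimiter that actually divides the text, then
    recurse into each piece using only the finer delimiters that remain.
    A piece that no delimiter divides is returned as a leaf string.
    """
    for i, delim in enumerate(levels):
        parts = [p for p in text.split(delim) if p]
        if len(parts) > 1:
            return [tree_segment(p, levels[i + 1:]) for p in parts]
    return text


# Made-up log line, used only to show the difference in output shape.
log = "Received block size=1024,src=10.0.0.1"

flat_tokens = flat_segment(log)
# ['Received', 'block', 'size=1024,src=10.0.0.1']

log_tree = tree_segment(log)
# ['Received', 'block', [['size', '1024'], ['src', '10.0.0.1']]]
```

In the flat result the key-value pairs stay fused in one token, so two logs that differ only in `size` would look like entirely different tokens. The nested result keeps `size` and `src` as separate branches, which is the kind of structural information the abstract says flat segmentation loses.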

    Published In

    Proceedings of the ACM on Management of Data, Volume 1, Issue 4
    PACMMOD
    December 2023
    1317 pages
    EISSN: 2836-6573
    DOI: 10.1145/3637468
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. adaptive hierarchical segmentation
    2. log parsing
    3. log tree

    Qualifiers

    • Research-article

    Funding Sources

    • The work is supported by the Ministry of Science and Technology of China, National Key Research and Development Program

    Article Metrics

    • Downloads (Last 12 months): 171
    • Downloads (Last 6 weeks): 16
    Reflects downloads up to 18 Aug 2024

    Cited By

    • (2024) CGgraph: An Ultra-Fast Graph Processing System on Modern Commodity CPU-GPU Co-processor. Proceedings of the VLDB Endowment 17:6 (1405-1417). DOI: 10.14778/3648160.3648179. Online publication date: 3-May-2024
    • (2024) PreLog: A Pre-trained Model for Log Analytics. Proceedings of the ACM on Management of Data 2:3 (1-28). DOI: 10.1145/3654966. Online publication date: 30-May-2024
    • (2024) Auto-Formula: Recommend Formulas in Spreadsheets using Contrastive Learning for Table Representations. Proceedings of the ACM on Management of Data 2:3 (1-27). DOI: 10.1145/3654925. Online publication date: 30-May-2024
    • (2023) Homomorphic Compression: Making Text Processing on Compression Unlimited. Proceedings of the ACM on Management of Data 1:4 (1-28). DOI: 10.1145/3626765. Online publication date: 12-Dec-2023
    • (2023) RECom: A Compiler Approach to Accelerating Recommendation Model Inference with Massive Embedding Columns. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4 (268-286). DOI: 10.1145/3623278.3624761. Online publication date: 25-Mar-2023
