research-article

AS-Parser: Log Parsing Based on Adaptive Segmentation

Authors:

Wei Wang

Proceedings of the ACM on Management of Data, Volume 1, Issue 4

Article No.: 232, Pages 1 - 26

https://doi.org/10.1145/3626719

Published: 12 December 2023

Abstract

System logs have long been recognized as valuable data for analyzing and diagnosing system failures. One fundamental task of log processing is to convert unstructured logs into structured logs through log parsing. All previous log parsing approaches follow a general framework that first segments each log into a token sequence and then computes the similarity between two sequences. However, these approaches share a common drawback: flat segmentation with fixed delimiters fails to capture the structural information of logs, which lowers parsing accuracy. To address this problem, we propose a novel log parsing approach, AS-Parser. Our approach introduces a hierarchical log segmentation mechanism that adaptively segments logs into a tree structure. It automatically recognizes the appropriate delimiters and captures the common structural information. Moreover, we propose three improvements that enhance both the effectiveness and the efficiency of our approach. On the public benchmark, AS-Parser performs best on 14 out of 16 datasets, with an average parsing accuracy of 0.943, far exceeding existing approaches.
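
The contrast the abstract draws between flat and hierarchical segmentation can be illustrated with a minimal sketch. This is not the AS-Parser algorithm: the delimiter list DELIMITER_LEVELS, the function names, and the sample log line are all assumptions made for illustration, and the delimiters here are fixed by hand, whereas the abstract describes recognizing them adaptively.

```python
from typing import List, Union

# A segmentation tree: a leaf token, or a list of subtrees.
Tree = Union[str, List["Tree"]]

# Hypothetical delimiter hierarchy, coarsest first. The abstract says AS-Parser
# recognizes delimiters adaptively; this fixed list is only for illustration.
DELIMITER_LEVELS = [" ", ",", "=", ":"]


def flat_segment(log: str) -> List[str]:
    """Flat segmentation: split once on a fixed delimiter (whitespace)."""
    return log.split()


def hierarchical_segment(text: str, level: int = 0) -> Tree:
    """Recursively split text, using one delimiter per tree level."""
    if level == len(DELIMITER_LEVELS):
        return text  # no finer delimiter left: this is a leaf token
    parts = [p for p in text.split(DELIMITER_LEVELS[level]) if p]
    if len(parts) <= 1:
        # Delimiter does not divide the text here; try the next, finer one.
        return hierarchical_segment(text, level + 1)
    return [hierarchical_segment(p, level + 1) for p in parts]


if __name__ == "__main__":
    # Made-up log line, not taken from the paper or its benchmark.
    log = "Received block blk_123 src=10.0.0.1:50010,dest=10.0.0.2:50010"
    print(flat_segment(log))
    # ['Received', 'block', 'blk_123', 'src=10.0.0.1:50010,dest=10.0.0.2:50010']
    print(hierarchical_segment(log))
    # ['Received', 'block', 'blk_123',
    #  [['src', ['10.0.0.1', '50010']], ['dest', ['10.0.0.2', '50010']]]]
```

In the flat result the whole key-value group stays one opaque token, so two logs that differ only in an address or port look entirely different in that position. In the tree, the keys src and dest are separated from their variable values, which is the kind of structural information the abstract says flat segmentation misses.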

Cited By

Cui P, Liu H, Tang B, Yuan Y (2024). CGgraph: An Ultra-Fast Graph Processing System on Modern Commodity CPU-GPU Co-processor. Proceedings of the VLDB Endowment 17:6 (1405-1417). DOI: 10.14778/3648160.3648179. Online publication date: 3-May-2024
https://dl.acm.org/doi/10.14778/3648160.3648179
Le V, Zhang H (2024). PreLog: A Pre-trained Model for Log Analytics. Proceedings of the ACM on Management of Data 2:3 (1-28). DOI: 10.1145/3654966. Online publication date: 30-May-2024
https://dl.acm.org/doi/10.1145/3654966
Chen S, He Y, Cui W, Fan J, Ge S, Zhang H, Zhang D, Chaudhuri S (2024). Auto-Formula: Recommend Formulas in Spreadsheets using Contrastive Learning for Table Representations. Proceedings of the ACM on Management of Data 2:3 (1-27). DOI: 10.1145/3654925. Online publication date: 30-May-2024
https://dl.acm.org/doi/10.1145/3654925

Index Terms

AS-Parser: Log Parsing Based on Adaptive Segmentation
1. Information systems
  1. Information systems applications
    1. Data mining

Information & Contributors

Information

Published In

Proceedings of the ACM on Management of Data Volume 1, Issue 4

PACMMOD

December 2023

1317 pages

EISSN:2836-6573

DOI:10.1145/3637468

Editor:
Divyakant Agrawal
UC Santa Barbara, United States

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

The work is supported by the Ministry of Science and Technology of China, National Key Research and Development Program

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 5
Total Downloads: 171
Downloads (Last 12 months): 171
Downloads (Last 6 weeks): 16

Reflects downloads up to 18 Aug 2024

Citations

Cited By

Guan J, Zhang F, Ma S, Chen K, Hu Y, Chen Y, Pan A, Du X (2023). Homomorphic Compression: Making Text Processing on Compression Unlimited. Proceedings of the ACM on Management of Data 1:4 (1-28). DOI: 10.1145/3626765. Online publication date: 12-Dec-2023
https://dl.acm.org/doi/10.1145/3626765
Pan Z, Zheng Z, Zhang F, Wu R, Liang H, Wang D, Qiu X, Bai J, Lin W, Du X, Aamodt T, Swift M, Jerger N (2023). RECom: A Compiler Approach to Accelerating Recommendation Model Inference with Massive Embedding Columns. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4 (268-286). DOI: 10.1145/3623278.3624761. Online publication date: 25-Mar-2023
https://dl.acm.org/doi/10.1145/3623278.3624761
