research-article

UniParser: A Unified Log Parser for Heterogeneous Log Data

Authors:

Saravan Rajmohan,

Dongmei Zhang

WWW '22: Proceedings of the ACM Web Conference 2022

Pages 1893 - 1901

https://doi.org/10.1145/3485447.3511993

Published: 25 April 2022

Abstract

Logs provide first-hand information for engineers to diagnose failures in large-scale online service systems. Log parsing, which transforms semi-structured raw log messages into structured data, is a prerequisite for automated log analysis tasks such as log-based anomaly detection and diagnosis. Almost all existing log parsers follow the general idea of extracting the common part as templates and the dynamic part as parameters. However, these log parsing methods often neglect the semantic meaning of log messages. Furthermore, the high diversity among log sources poses an obstacle to generalizing log parsing across different systems. In this paper, we propose UniParser to capture the common logging behaviours from heterogeneous log data. UniParser utilizes a Token Encoder module and a Context Encoder module to learn the patterns from the log token and its neighbouring context. A Context Similarity module is specially designed to model the commonalities of learned patterns. We have performed extensive experiments on 16 public log datasets, and our results show that UniParser outperforms state-of-the-art log parsers by a large margin.
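The "common part as templates, dynamic part as parameters" idea mentioned in the abstract can be illustrated with a deliberately naive sketch. This is not UniParser's learned model; it is a purely syntactic baseline of the kind the abstract says neglects semantics. The function name `parse_logs`, the `<*>` wildcard, and the sample messages are assumptions made for illustration only.

```python
from collections import defaultdict

WILDCARD = "<*>"  # illustrative placeholder for a parameter slot


def parse_logs(messages):
    """Split each message into a template and its parameters.

    Naive heuristic (an assumption, not the paper's method): messages
    with the same token count are grouped together, and any position
    whose token differs within the group is treated as a parameter.
    """
    groups = defaultdict(list)
    for msg in messages:
        tokens = msg.split()
        groups[len(tokens)].append(tokens)

    results = []
    for token_lists in groups.values():
        # A position is constant if all messages in the group agree on it.
        template = [
            column[0] if len(set(column)) == 1 else WILDCARD
            for column in zip(*token_lists)
        ]
        for tokens in token_lists:
            params = [t for t, slot in zip(tokens, template) if slot == WILDCARD]
            results.append((" ".join(template), params))
    return results


if __name__ == "__main__":
    sample = [
        "Connection from 10.0.0.1 closed",
        "Connection from 10.0.0.2 closed",
        "Disk usage at 91 percent",
        "Disk usage at 87 percent",
    ]
    for template, params in parse_logs(sample):
        print(template, params)
```

A heuristic like this breaks as soon as two unrelated events happen to share a token count, or when a parameter takes the same value in every observed message, which is one way to see why purely syntactic grouping can fall short.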

Published In

WWW '22: Proceedings of the ACM Web Conference 2022

April 2022

3764 pages

ISBN:9781450390965

DOI:10.1145/3485447

Editors:
Frédérique Laforest (INSA Lyon, France)
Raphaël Troncy (EURECOM, France)
Elena Simperl (King’s College London, UK)
Deepak Agarwal (Pinterest, USA)
Aristides Gionis (KTH Royal Institute of Technology, Sweden)
Ivan Herman (W3C / retired)
Lionel Médini (Université Lyon 1, France)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '22

Sponsor:

SIGWEB

WWW '22: The ACM Web Conference 2022

April 25 - 29, 2022

Virtual Event, Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
