DOI: 10.1145/3624062.3624128

Heterogeneous Syslog Analysis: There Is Hope

Published: 12 November 2023

Abstract

Identifying system hardware failures and anomalies is a unique challenge in heterogeneous testbed clusters because of variation in the ways that the system log reports errors and warnings. We present a novel approach for the real-time classification of syslog messages generated by a heterogeneous testbed cluster to proactively identify potential hardware issues and security events. By integrating machine learning models with high-performance computing systems, our system facilitates continuous system health monitoring. The paper introduces a taxonomy for classifying system issues into actionable categories of problems, while filtering out groups of messages that system administrators would consider unimportant "noise". Finally, we experiment with using large language models as message classifiers and share our results and experience. The results show promising performance and greater explainability than currently available techniques, but the computational costs may offset the benefits.

Supplemental Material

MP4 File
Recording of "Heterogeneous Syslog Analysis: There Is Hope" presentation at HPCSYSPROS23.

Published In

SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis
November 2023
2180 pages
ISBN: 9798400707858
DOI: 10.1145/3624062
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Applications of Large-Language-Models
  2. Cross-platform Software
  3. Error detection
  4. Failure detection
  5. Heterogeneous Clusters
  6. Log Analysis
  7. Monitoring
  8. Syslog
  9. Testbeds

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SC-W 2023
