KAD: a knowledge formalization-based anomaly detection approach for distributed systems

Wei, Xinjie; Sun, Chang-ai; Zhang, Xiao-Yi

doi:10.1007/s11219-024-09670-8

KAD: a knowledge formalization-based anomaly detection approach for distributed systems

Research
Published: 10 May 2024

Volume 32, pages 821–845, (2024)
Cite this article

Software Quality Journal Aims and scope Submit manuscript

Xinjie Wei¹,
Chang-ai Sun¹ &
Xiao-Yi Zhang¹

187 Accesses
Explore all metrics

Abstract

Large-scale distributed systems are becoming key engines of the IT industry due to their scalability and extensibility. A distributed system often involves numerous complex interactions among components, suffering anomalies such as data inconsistencies between components and unanticipated delays in response times. Existing anomaly detection techniques, which extract knowledge from system logs using either statistical or machine learning techniques, exhibit limitations. Statistical techniques often miss implicit anomalies that are related to complex interactions manifested by logs, whereas machine learning techniques lack explainability and they are usually sensitive to log variations. In this paper, we propose KAD, a knowledge formalization-based anomaly detection approach for distributed systems. KAD includes a general knowledge description language (KDL), leveraging the general structure of system logs and extended Backus-Naur form (EBNF) for complex knowledge extraction. Particularly, the semantic set is constructed based on the bidirectional encoder representation from the transformer (BERT) model to improve the expressive capabilities of KDL in knowledge description. In addition, KAD incorporates distributed scheduling computation module to improve the efficiency of anomaly detection processes. Experimental results based on two widely used benchmarks show that KAD can accurately describe the knowledge associated with anomalies, with a high F1-score in detecting various anomaly types.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 2

Fig. 5

Fig. 7

Behavior Flow Graph Construction from System Logs for Anomaly Analysis

Multi-source Anomaly Detection in Distributed IT Systems

A Confidence-Guided Anomaly Detection Approach Jointly Using Multiple Machine Learning Algorithms

Availability of data and materials

HDFS is a public data set and can be obtained from (Du et al., 2017). Ray is a privacy data set that involves the privacy of a partner company (the Ant Group) and cannot be made public for the time being.

Code availability

The code involves the privacy of a partner company (the Ant Group) and cannot be made public for the time being.

References

Ali, A., Ali, A., Abaluof, H., et al. (2023). Moisture detection in tree trunks in semiarid lands using low-cost non-invasive capacitive sensors with statistical based anomaly detection approach. Sensors, 23(4), 21–31.
Article Google Scholar
Apache Hadoop. (2023). Apache Hadoop Home. http://hadoop.apache.org/
Apache Spark. (2023). What is Apache Spark? http://spark.apache.org/
Bertero, C., Roy, M., Sauvanaud, C., et al. (2017). Experience report: Log mining using natural language processing and application to anomaly detection. In: Proceedings of the 28th IEEE International Symposium on Software Reliability Engineering, pp 351–360.
Breier, J., & Branišová, J. (2015). Anomaly detection from log files using data mining techniques. In: Proceedings of the 2015 Information Science and Applications, pp 449–457.
Chen, L., Dang, Q., Chen, M., et al. (2023). BertHTLG: Graph-based microservice anomaly detection through sentence-Bert enhancement. In: Proceedings of the 2023 International Conference on Web Information Systems and Applications, pp 427–439.
Devlin, J., Chang, M. W., Lee, K., et al. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp 4171–4186.
Du, M., Li, F., Zheng, G., et al. (2017). DeepLog: Anomaly detection and diagnosis from system logs through deep learning. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp 1285–1298.
Farshchi, M., Schneider, J. G., Weber, I., et al. (2015). Experience report: Anomaly detection of cloud application operations using log and cloud metric correlation analysis. In: Proceedings of the 26th IEEE International Symposium on Software Reliability Engineering, pp 24–34.
Fu, Y., Yan, M., Xu, Z., et al. (2023). An empirical study of the impact of log parsers on the performance of log-based anomaly detection. Empirical Software Engineering, 28(1), 1–39.
Article Google Scholar
Gómez, Á. L. P., Maimó, L. F., Celdrán, A. H., et al. (2023). SUSAN: A deep learning based anomaly detection framework for sustainable industry. Sustainable Computing: Informatics and Systems, 37(3), 834–842.
Google Scholar
Haoming, L., & Yuguo, L. (2020). LogSpy: System log anomaly detection for distributed systems. In: Proceedings of the 2020 International Conference on Artificial Intelligence and Computer Engineering, pp 347–352.
He, P., Zhu, J., Zheng, Z., et al. (2017). Drain: An online log parsing approach with fixed depth tree. In: Proceedings of the 2017 IEEE International Conference on Web Services, pp 33–40.
He, S., Lin, Q., Lou, J. G., et al. (2018). Identifying impactful service system problems via log analysis. In: Proceedings of the 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp 60–70.
Hidayati, J., Vamelia, R., Hammami, J., et al. (2023). Transparent distribution system design of halal beef supply chain. Uncertain Supply Chain Management, 11(1), 31–40.
Article Google Scholar
Hogan, A., Blomqvist, E., Cochez, M., et al. (2021). Knowledge graphs. ACM Computing Surveys, 54(4), 1–37.
Article Google Scholar
Hristov, M., Nenova, M., Iliev, G., et al. (2021). Integration of Splunk enterprise SIEM for DDoS attack detection in IoT. In: Proceedings of the 20th IEEE International Symposium on Network Computing and Applications, pp 1–5.
Huang, S., Liu, Y., Fung, C., et al. (2023). Improving log-based anomaly detection by pre-training hierarchical transformers. IEEE Transactions on Computers, 72(9), 2656–2667.
Article Google Scholar
IBM. (2023). Ariel Query Language Guide. https://www.ibm.com/docs/en/SS42VS_7.4/pdf/b_qradar_aql.pdf
Le, V. H., & Zhang, H. (2022). Log-based anomaly detection with deep learning: How far are we? In: Proceedings of the 44th international conference on software engineering, pp 1356–1367.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
Article Google Scholar
Liang, E., Nishihara, R., Mika, S., et al. (2023). Ray. https://github.com/ray-project/ray
Lou, J. G., Fu, Q., Yang, S., et al. (2010). Mining invariants from console logs for system problem detection. In: Proceedings of the 2010 USENIX Annual Technical Conference, pp 24–37.
Lu, S., Wei, X., Li, Y., et al. (2018). Detecting anomaly in big data system logs using convolutional neural network. In: Proceedings of the 16th IEEE Intlernational Conference on Dependable, Autonomic and Secure Computing, pp 151–158.
Ma, X., Keung, J., He, P., et al. (2023). A semi-supervised approach for industrial anomaly detection via self-adaptive clustering. IEEE Transactions on Industrial Informatics, 6(2), 1–12.
Google Scholar
Majeed, A., ur Rasool R, Ahmad F, et al. (2019). Near-miss situation based visual analysis of SIEM rules for real time network security monitoring. Journal of Ambient Intelligence and Humanized Computing, 10(7), 1509–1526.
Meng, W., Liu, Y., Zhu, Y., et al. (2019). LogAnomaly: Unsupervised detection of sequential and quantitative anomalies in unstructured logs. In: Proceedings of the 2019 International Joint Conference on Artificial Intelligence, pp 4739–4745.
Moritz, P., Nishihara, R., Wang, S., et al. (2018). Ray: A distributed framework for emerging AI applications. In: Proceedings of the 13th Operating Systems Design and Implementation, pp 561–577.
Nedelkoski, S., Bogatinovski, J., Acker, A., et al. (2020). Self-attentive classification-based anomaly detection in unstructured logs. In: Proceedings of the 2020 IEEE International Conference on Data Mining, pp 1196–1201.
Qi, J., Luan, Z., Huang, S., et al. (2023). LogEncoder: Log-based contrastive representation learning for anomaly detection. IEEE Transactions on Network and Service Management, 20(2), 1378–1391.
Article Google Scholar
Splunk Enterprise. (2023). Search Tutorial-Use the search language. https://docs.splunk.com/Documentation/Splunk/9.1.1/SearchTutorial/Usethesearchlanguage
Tietz, V., & Annighoefer, B. (2022). A formally defined and formally provable EBNF-based constraint language for use in qualifiable software. In: Proceedings of the 25th International Conference on Model Driven Engineering Languages and Systems: Companion Proceedings, pp 862–871.
Vinayakumar, R., Soman, K., & Poornachandran, P. (2017). Long short-term memory based operation log anomaly detection. In: Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics, pp 236–242.
Xu, W., Huang, L., Fox, A., et al. (2009). Detecting large-scale system problems by mining console logs. In: Proceedings of the 22nd ACM Symposium on Operating Systems Principles, pp 117–132.
Zhang, K., Xu, J., Min, M. R., et al. (2016). Automated it system failure prediction: A deep learning approach. In: Proceedings of the 2016 IEEE International Conference on Big Data, pp 1291–1300.
Zhang, X., Xu, Y., Lin, Q., et al. (2019). Robust log-based anomaly detection on unstable log data. In: Proceedings of the 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp 807–817.
Zhang, Y., & Sivasubramaniam, A. (2007). Failure prediction in IBM BlueGene/L event logs. In: Proceedings of the 7th International Conference on Data Mining, pp 583–588.

Download references

Funding

This article is supported by the National Natural Science Foundation of China under Grant Nos. 62272037, 61872039, and 62302035, and CCF-Ant Research Fund.

Author information

Authors and Affiliations

School of Computer and Communication Engineering, University of Science and Technology Beijing, 30 Xueyuan Road, Haidian District, Beijing, 100083, China
Xinjie Wei, Chang-ai Sun & Xiao-Yi Zhang

Authors

Xinjie Wei
View author publications
You can also search for this author in PubMed Google Scholar
Chang-ai Sun
View author publications
You can also search for this author in PubMed Google Scholar
Xiao-Yi Zhang
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Xinjie Wei initiated the project, wrote the manuscript, and conducted the experiment; Chang-ai Sun proposed the main idea, discussed the settings of the experiment, and made a revision of the paper; Xiao-Yi Zhang discussed the settings of the experiment and made a revision of the paper.

Corresponding author

Correspondence to Chang-ai Sun.

Ethics declarations

Ethics approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Consent to participate

Not applicable

Consent for publication

The results/data/figures in this manuscript have not been published elsewhere, nor are they under consideration (from you or one of your contributing authors) by another publisher.

Competing interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Wei, X., Sun, Ca. & Zhang, XY. KAD: a knowledge formalization-based anomaly detection approach for distributed systems. Software Qual J 32, 821–845 (2024). https://doi.org/10.1007/s11219-024-09670-8

Download citation

Accepted: 21 March 2024
Published: 10 May 2024
Issue Date: June 2024
DOI: https://doi.org/10.1007/s11219-024-09670-8

Keywords

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

KAD: a knowledge formalization-based anomaly detection approach for distributed systems

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Behavior Flow Graph Construction from System Logs for Anomaly Analysis

Multi-source Anomaly Detection in Distributed IT Systems

A Confidence-Guided Anomaly Detection Approach Jointly Using Multiple Machine Learning Algorithms

Availability of data and materials

Code availability

References

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethics approval

Consent to participate

Consent for publication

Competing interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

KAD: a knowledge formalization-based anomaly detection approach for distributed systems

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Behavior Flow Graph Construction from System Logs for Anomaly Analysis

Multi-source Anomaly Detection in Distributed IT Systems

A Confidence-Guided Anomaly Detection Approach Jointly Using Multiple Machine Learning Algorithms

Availability of data and materials

Code availability

References

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethics approval

Consent to participate

Consent for publication

Competing interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation