Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Advertisement

KAD: a knowledge formalization-based anomaly detection approach for distributed systems

  • Research
  • Published:
Software Quality Journal Aims and scope Submit manuscript

Abstract

Large-scale distributed systems are becoming key engines of the IT industry due to their scalability and extensibility. A distributed system often involves numerous complex interactions among components, suffering anomalies such as data inconsistencies between components and unanticipated delays in response times. Existing anomaly detection techniques, which extract knowledge from system logs using either statistical or machine learning techniques, exhibit limitations. Statistical techniques often miss implicit anomalies that are related to complex interactions manifested by logs, whereas machine learning techniques lack explainability and they are usually sensitive to log variations. In this paper, we propose KAD, a knowledge formalization-based anomaly detection approach for distributed systems. KAD includes a general knowledge description language (KDL), leveraging the general structure of system logs and extended Backus-Naur form (EBNF) for complex knowledge extraction. Particularly, the semantic set is constructed based on the bidirectional encoder representation from the transformer (BERT) model to improve the expressive capabilities of KDL in knowledge description. In addition, KAD incorporates distributed scheduling computation module to improve the efficiency of anomaly detection processes. Experimental results based on two widely used benchmarks show that KAD can accurately describe the knowledge associated with anomalies, with a high F1-score in detecting various anomaly types.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Algorithm 1
Fig. 8
Algorithm 2

Similar content being viewed by others

Availability of data and materials

HDFS is a public data set and can be obtained from (Du et al., 2017). Ray is a privacy data set that involves the privacy of a partner company (the Ant Group) and cannot be made public for the time being.

Code availability

The code involves the privacy of a partner company (the Ant Group) and cannot be made public for the time being.

References

  • Ali, A., Ali, A., Abaluof, H., et al. (2023). Moisture detection in tree trunks in semiarid lands using low-cost non-invasive capacitive sensors with statistical based anomaly detection approach. Sensors, 23(4), 21–31.

    Article  Google Scholar 

  • Apache Hadoop. (2023). Apache Hadoop Home. http://hadoop.apache.org/

  • Apache Spark. (2023). What is Apache Spark? http://spark.apache.org/

  • Bertero, C., Roy, M., Sauvanaud, C., et al. (2017). Experience report: Log mining using natural language processing and application to anomaly detection. In: Proceedings of the 28th IEEE International Symposium on Software Reliability Engineering, pp 351–360.

  • Breier, J., & Branišová, J. (2015). Anomaly detection from log files using data mining techniques. In: Proceedings of the 2015 Information Science and Applications, pp 449–457.

  • Chen, L., Dang, Q., Chen, M., et al. (2023). BertHTLG: Graph-based microservice anomaly detection through sentence-Bert enhancement. In: Proceedings of the 2023 International Conference on Web Information Systems and Applications, pp 427–439.

  • Devlin, J., Chang, M. W., Lee, K., et al. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp 4171–4186.

  • Du, M., Li, F., Zheng, G., et al. (2017). DeepLog: Anomaly detection and diagnosis from system logs through deep learning. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp 1285–1298.

  • Farshchi, M., Schneider, J. G., Weber, I., et al. (2015). Experience report: Anomaly detection of cloud application operations using log and cloud metric correlation analysis. In: Proceedings of the 26th IEEE International Symposium on Software Reliability Engineering, pp 24–34.

  • Fu, Y., Yan, M., Xu, Z., et al. (2023). An empirical study of the impact of log parsers on the performance of log-based anomaly detection. Empirical Software Engineering, 28(1), 1–39.

    Article  Google Scholar 

  • Gómez, Á. L. P., Maimó, L. F., Celdrán, A. H., et al. (2023). SUSAN: A deep learning based anomaly detection framework for sustainable industry. Sustainable Computing: Informatics and Systems, 37(3), 834–842.

    Google Scholar 

  • Haoming, L., & Yuguo, L. (2020). LogSpy: System log anomaly detection for distributed systems. In: Proceedings of the 2020 International Conference on Artificial Intelligence and Computer Engineering, pp 347–352.

  • He, P., Zhu, J., Zheng, Z., et al. (2017). Drain: An online log parsing approach with fixed depth tree. In: Proceedings of the 2017 IEEE International Conference on Web Services, pp 33–40.

  • He, S., Lin, Q., Lou, J. G., et al. (2018). Identifying impactful service system problems via log analysis. In: Proceedings of the 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp 60–70.

  • Hidayati, J., Vamelia, R., Hammami, J., et al. (2023). Transparent distribution system design of halal beef supply chain. Uncertain Supply Chain Management, 11(1), 31–40.

    Article  Google Scholar 

  • Hogan, A., Blomqvist, E., Cochez, M., et al. (2021). Knowledge graphs. ACM Computing Surveys, 54(4), 1–37.

    Article  Google Scholar 

  • Hristov, M., Nenova, M., Iliev, G., et al. (2021). Integration of Splunk enterprise SIEM for DDoS attack detection in IoT. In: Proceedings of the 20th IEEE International Symposium on Network Computing and Applications, pp 1–5.

  • Huang, S., Liu, Y., Fung, C., et al. (2023). Improving log-based anomaly detection by pre-training hierarchical transformers. IEEE Transactions on Computers, 72(9), 2656–2667.

    Article  Google Scholar 

  • IBM. (2023). Ariel Query Language Guide. https://www.ibm.com/docs/en/SS42VS_7.4/pdf/b_qradar_aql.pdf

  • Le, V. H., & Zhang, H. (2022). Log-based anomaly detection with deep learning: How far are we? In: Proceedings of the 44th international conference on software engineering, pp 1356–1367.

  • LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.

    Article  Google Scholar 

  • Liang, E., Nishihara, R., Mika, S., et al. (2023). Ray. https://github.com/ray-project/ray

  • Lou, J. G., Fu, Q., Yang, S., et al. (2010). Mining invariants from console logs for system problem detection. In: Proceedings of the 2010 USENIX Annual Technical Conference, pp 24–37.

  • Lu, S., Wei, X., Li, Y., et al. (2018). Detecting anomaly in big data system logs using convolutional neural network. In: Proceedings of the 16th IEEE Intlernational Conference on Dependable, Autonomic and Secure Computing, pp 151–158.

  • Ma, X., Keung, J., He, P., et al. (2023). A semi-supervised approach for industrial anomaly detection via self-adaptive clustering. IEEE Transactions on Industrial Informatics, 6(2), 1–12.

    Google Scholar 

  • Majeed, A., ur Rasool R, Ahmad F, et al. (2019). Near-miss situation based visual analysis of SIEM rules for real time network security monitoring. Journal of Ambient Intelligence and Humanized Computing, 10(7), 1509–1526.

  • Meng, W., Liu, Y., Zhu, Y., et al. (2019). LogAnomaly: Unsupervised detection of sequential and quantitative anomalies in unstructured logs. In: Proceedings of the 2019 International Joint Conference on Artificial Intelligence, pp 4739–4745.

  • Moritz, P., Nishihara, R., Wang, S., et al. (2018). Ray: A distributed framework for emerging AI applications. In: Proceedings of the 13th Operating Systems Design and Implementation, pp 561–577.

  • Nedelkoski, S., Bogatinovski, J., Acker, A., et al. (2020). Self-attentive classification-based anomaly detection in unstructured logs. In: Proceedings of the 2020 IEEE International Conference on Data Mining, pp 1196–1201.

  • Qi, J., Luan, Z., Huang, S., et al. (2023). LogEncoder: Log-based contrastive representation learning for anomaly detection. IEEE Transactions on Network and Service Management, 20(2), 1378–1391.

    Article  Google Scholar 

  • Splunk Enterprise. (2023). Search Tutorial-Use the search language. https://docs.splunk.com/Documentation/Splunk/9.1.1/SearchTutorial/Usethesearchlanguage

  • Tietz, V., & Annighoefer, B. (2022). A formally defined and formally provable EBNF-based constraint language for use in qualifiable software. In: Proceedings of the 25th International Conference on Model Driven Engineering Languages and Systems: Companion Proceedings, pp 862–871.

  • Vinayakumar, R., Soman, K., & Poornachandran, P. (2017). Long short-term memory based operation log anomaly detection. In: Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics, pp 236–242.

  • Xu, W., Huang, L., Fox, A., et al. (2009). Detecting large-scale system problems by mining console logs. In: Proceedings of the 22nd ACM Symposium on Operating Systems Principles, pp 117–132.

  • Zhang, K., Xu, J., Min, M. R., et al. (2016). Automated it system failure prediction: A deep learning approach. In: Proceedings of the 2016 IEEE International Conference on Big Data, pp 1291–1300.

  • Zhang, X., Xu, Y., Lin, Q., et al. (2019). Robust log-based anomaly detection on unstable log data. In: Proceedings of the 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp 807–817.

  • Zhang, Y., & Sivasubramaniam, A. (2007). Failure prediction in IBM BlueGene/L event logs. In: Proceedings of the 7th International Conference on Data Mining, pp 583–588.

Download references

Funding

This article is supported by the National Natural Science Foundation of China under Grant Nos. 62272037, 61872039, and 62302035, and CCF-Ant Research Fund.

Author information

Authors and Affiliations

Authors

Contributions

Xinjie Wei initiated the project, wrote the manuscript, and conducted the experiment; Chang-ai Sun proposed the main idea, discussed the settings of the experiment, and made a revision of the paper; Xiao-Yi Zhang discussed the settings of the experiment and made a revision of the paper.

Corresponding author

Correspondence to Chang-ai Sun.

Ethics declarations

Ethics approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Consent to participate

Not applicable

Consent for publication

The results/data/figures in this manuscript have not been published elsewhere, nor are they under consideration (from you or one of your contributing authors) by another publisher.

Competing interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Wei, X., Sun, Ca. & Zhang, XY. KAD: a knowledge formalization-based anomaly detection approach for distributed systems. Software Qual J 32, 821–845 (2024). https://doi.org/10.1007/s11219-024-09670-8

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11219-024-09670-8

Keywords