Abstract
Microservice architectures are increasingly adopted to design large-scale applications. However, the highly distributed nature and complex dependencies of microservices complicate automatic performance diagnosis and make it challenging to guarantee service level agreements (SLAs). In particular, identifying the culprits of a microservice performance issue is extremely difficult because the set of potential root causes is large and issues can manifest themselves in complex ways. This paper presents an application-agnostic system that locates the culprits of microservice performance degradation at fine granularity: it identifies not only the anomalous service from which the performance issue originates but also the culprit metrics that correlate with the service abnormality. Our method first finds potential culprit services by constructing a service dependency graph, and then applies an autoencoder to identify abnormal service metrics based on a ranked list of reconstruction errors. Our experimental evaluation, based on injecting performance anomalies into a microservice benchmark deployed in the cloud, shows that our system achieves 92% precision in locating the culprit service and 85.5% precision in locating the culprit metrics.
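The metric-ranking step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the metric names, and the sample values are all hypothetical, and the trained autoencoder is replaced by a trivial stand-in that reconstructs every normalized metric as its normal-operation mean (zero). Only the idea of ranking metrics by per-metric reconstruction error is taken from the abstract.

```python
from statistics import mean, pstdev


def fit_normalizer(normal_rows):
    """Per-metric mean and standard deviation from anomaly-free samples."""
    cols = list(zip(*normal_rows))
    # Fall back to 1.0 when a metric is constant, to avoid dividing by zero.
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]


def mean_reconstructor(z):
    """Stand-in for a trained autoencoder (assumption, not the paper's model).

    Normalized normal data has mean zero, so this "reconstructs" every
    metric as its normal-operation average.
    """
    return [0.0] * len(z)


def rank_metrics(sample, names, reconstruct, mu, sigma):
    """Rank metrics by squared reconstruction error, largest first."""
    z = [(x - m) / s for x, m, s in zip(sample, mu, sigma)]
    z_hat = reconstruct(z)
    errors = [(a - b) ** 2 for a, b in zip(z, z_hat)]
    return sorted(zip(names, errors), key=lambda pair: pair[1], reverse=True)


# Hypothetical metrics of one candidate culprit service.
names = ["cpu", "memory", "latency"]
normal = [
    [0.30, 0.50, 20.0],
    [0.32, 0.52, 22.0],
    [0.28, 0.48, 18.0],
    [0.30, 0.50, 20.0],
]
mu, sigma = fit_normalizer(normal)

# A sample taken while a CPU anomaly is assumed to be injected.
anomalous = [0.95, 0.51, 21.0]
ranking = rank_metrics(anomalous, names, mean_reconstructor, mu, sigma)
for name, err in ranking:
    print(f"{name}: {err:.2f}")
```

In this toy run the CPU metric tops the ranking because it deviates most from what the stand-in model reconstructs. A real autoencoder would be trained on normal data and passed in place of `mean_reconstructor`, leaving the ranking logic unchanged.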
Notes
1. Sock-shop - https://microservices-demo.github.io/.
2. Google Cloud Engine - https://cloud.google.com/compute/.
3. Istio - https://istio.io/.
4. Node-exporter - https://github.com/prometheus/node_exporter.
5. Cadvisor - https://github.com/google/cadvisor.
6. Prometheus - https://prometheus.io/.
7. stress-ng - https://kernel.ubuntu.com/~cking/stress-ng/.
Acknowledgment
This work is part of the FogGuru project which has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 765452. The information and views set out in this publication are those of the author(s) and do not necessarily reflect the official opinion of the European Union. Neither the European Union institutions and bodies nor any person acting on their behalf may be held responsible for the use which may be made of the information contained therein.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Wu, L., Bogatinovski, J., Nedelkoski, S., Tordsson, J., Kao, O. (2021). Performance Diagnosis in Cloud Microservices Using Deep Learning. In: Hacid, H., et al. Service-Oriented Computing – ICSOC 2020 Workshops. ICSOC 2020. Lecture Notes in Computer Science, vol 12632. Springer, Cham. https://doi.org/10.1007/978-3-030-76352-7_13
DOI: https://doi.org/10.1007/978-3-030-76352-7_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-76351-0
Online ISBN: 978-3-030-76352-7
eBook Packages: Computer Science, Computer Science (R0)