DOI: 10.1145/3540250.3549146
research-article

TraceCRL: contrastive representation learning for microservice trace analysis

Published: 09 November 2022

Abstract

Because trace data is large in volume and highly complex, microservice trace analysis tasks such as anomaly detection, fault diagnosis, and tail-based sampling widely adopt machine learning. These trace analysis approaches usually rely on a preprocessing step that maps the structured features of traces to vector representations in an ad hoc way. As a result, they may lose important information such as the topological dependencies between service operations. In this paper, we propose TraceCRL, a trace representation learning approach based on contrastive learning and a graph neural network, which can incorporate graph-structured information into downstream trace analysis tasks. Given a trace, TraceCRL constructs an operation invocation graph in which nodes represent service operations and edges represent operation invocations, together with predefined features for invocation status and related metrics. Based on the operation invocation graphs of traces, TraceCRL uses a contrastive learning method to train a graph neural network-based model for trace representation. In particular, TraceCRL employs six trace data augmentation strategies to alleviate the problems of class collision and uniformity of representation in contrastive learning. Our experimental studies show that TraceCRL can significantly improve the performance of trace anomaly detection and offline trace sampling. They also confirm the effectiveness of the trace augmentation strategies and the efficiency of TraceCRL.
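To make the graph construction step concrete, the following is a minimal sketch of how an operation invocation graph might be built from the spans of a single trace. It is not the authors' implementation: the `Span` schema, the function `build_invocation_graph`, and the three edge features (call count, mean duration, error rate) are assumptions chosen for illustration, since the abstract only says that edges carry "predefined features for invocation status and related metrics".

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One span of a trace (hypothetical, simplified schema)."""
    span_id: str
    parent_id: Optional[str]  # None for the root span
    service: str
    operation: str
    duration_ms: float
    is_error: bool


def node_name(span):
    """A node is a service operation, identified here as 'service:operation'."""
    return f"{span.service}:{span.operation}"


def build_invocation_graph(spans):
    """Return (nodes, edges) for one trace.

    nodes: sorted list of service operation names.
    edges: dict mapping (caller, callee) to illustrative aggregated features.
    """
    by_id = {s.span_id: s for s in spans}
    nodes = sorted({node_name(s) for s in spans})

    # Group child spans by the (caller operation, callee operation) pair.
    calls_by_edge = defaultdict(list)
    for s in spans:
        parent = by_id.get(s.parent_id)
        if parent is None:
            continue  # root span, or parent missing from the trace
        calls_by_edge[(node_name(parent), node_name(s))].append(s)

    # Assumed edge features: invocation status and one latency metric.
    edges = {}
    for key, calls in calls_by_edge.items():
        edges[key] = {
            "calls": len(calls),
            "mean_duration_ms": sum(c.duration_ms for c in calls) / len(calls),
            "error_rate": sum(c.is_error for c in calls) / len(calls),
        }
    return nodes, edges


# Toy trace: the gateway calls order:query twice, one of which queries the db.
spans = [
    Span("1", None, "gateway", "GET /order", 120.0, False),
    Span("2", "1", "order", "query", 40.0, False),
    Span("3", "1", "order", "query", 60.0, True),
    Span("4", "2", "db", "select", 10.0, False),
]
nodes, edges = build_invocation_graph(spans)
```

In this sketch, repeated invocations between the same pair of operations collapse into one edge with aggregated features, so the graph stays small even for long traces. A graph neural network encoder would then consume the nodes and edge features; that part is omitted here.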


Published In

ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
November 2022
1822 pages
ISBN:9781450394130
DOI:10.1145/3540250

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Contrastive Learning
  2. Deep Learning
  3. Graph Neural Network
  4. Microservice
  5. Tracing

Qualifiers

  • Research-article

Conference

ESEC/FSE '22
Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%
