- research-article, November 2022
Metadata-based retrieval for resolution recommendation in AIOps
ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Pages 1379–1389
https://doi.org/10.1145/3540250.3558964
For a cloud service provider, the goal is to proactively identify signals that can help reduce outages and/or reduce the mean-time-to-detect and mean-time-to-resolve. After an incident is reported, the Site Reliability Engineers diagnose the fault ...
- research-article, November 2022
Industry practice of configuration auto-tuning for cloud applications and services
- Runzhe Wang,
- Qinglong Wang,
- Yuxi Hu,
- Heyuan Shi,
- Yuheng Shen,
- Yu Zhan,
- Ying Fu,
- Zheng Liu,
- Xiaohai Shi,
- Yu Jiang
ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Pages 1555–1565
https://doi.org/10.1145/3540250.3558962
Auto-tuning attracts increasing attention in industry practice to optimize the performance of a system with many configurable parameters. It is particularly useful for cloud applications and services since they have complex system hierarchies and ...
- research-article, November 2022
Trace analysis based microservice architecture measurement
ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Pages 1589–1599
https://doi.org/10.1145/3540250.3558951
Microservice architecture design highly relies on expert experience and may often result in improper service decomposition. Moreover, a microservice architecture is likely to degrade with the continuous evolution of services. Architecture measurement ...
- research-article, November 2022
An empirical investigation of missing data handling in cloud node failure prediction
- Minghua Ma,
- Yudong Liu,
- Yuang Tong,
- Haozhe Li,
- Pu Zhao,
- Yong Xu,
- Hongyu Zhang,
- Shilin He,
- Lu Wang,
- Yingnong Dang,
- Saravanakumar Rajmohan,
- Qingwei Lin
ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Pages 1453–1464
https://doi.org/10.1145/3540250.3558946
Cloud computing systems have become increasingly popular in recent years. A typical cloud system utilizes millions of computing nodes as the basic infrastructure. Node failure has been identified as one of the most prevalent causes of cloud system ...
- research-article, November 2022
Infrastructure as code for dynamic deployments
ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Pages 1775–1779
https://doi.org/10.1145/3540250.3558912
Modern DevOps organizations require a high degree of automation to achieve software stability at frequent changes. Further, there is a need for flexible, timely reconfiguration of the infrastructure, e.g., to use pay-per-use infrastructure efficiently ...
- research-article, November 2022
TraceCRL: contrastive representation learning for microservice trace analysis
ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Pages 1221–1232
https://doi.org/10.1145/3540250.3549146
Due to the large amount and high complexity of trace data, microservice trace analysis tasks such as anomaly detection, fault diagnosis, and tail-based sampling widely adopt machine learning technology. These trace analysis approaches usually use a ...
Actionable and interpretable fault localization for recurring failures in online service systems
- Zeyan Li,
- Nengwen Zhao,
- Mingjie Li,
- Xianglin Lu,
- Lixin Wang,
- Dongdong Chang,
- Xiaohui Nie,
- Li Cao,
- Wenchi Zhang,
- Kaixin Sui,
- Yanhua Wang,
- Xu Du,
- Guoqiang Duan,
- Dan Pei
ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Pages 996–1008
https://doi.org/10.1145/3540250.3549092
Fault localization is challenging in an online service system due to its monitoring data's large volume and variety and complex dependencies across/within its components (e.g., services or databases). Furthermore, engineers require fault localization ...