- research-article, October 2024
AML: An accuracy metric model for effective evaluation of log parsing techniques
Journal of Systems and Software (JSSO), Volume 216, Issue C
https://doi.org/10.1016/j.jss.2024.112154
Abstract: Logs are essential for the maintenance of large software systems. Software engineers often analyze logs for debugging, root cause analysis, and anomaly detection tasks. Logs, however, are partly structured, making the extraction of useful ...
Highlights:
  - We propose AML, a new metric for evaluating the performance of a log parser.
  - AML uses the concepts of omission and commission to measure log parsing inaccuracies.
  - AML was used to evaluate the accuracy of 14 parsers applied to eight ...
- Article, August 2024
LogRCA: Log-Based Root Cause Analysis for Distributed Services
Abstract: To assist IT service developers and operators in managing their increasingly complex service landscapes, there is a growing effort to leverage artificial intelligence in operations. To speed up troubleshooting, log anomaly detection has received ...
- research-article, August 2024
On the Model Update Strategies for Supervised Learning in AIOps Solutions
ACM Transactions on Software Engineering and Methodology (TOSEM), Volume 33, Issue 7, Article No.: 184, Pages 1–38
https://doi.org/10.1145/3664599
AIOps (Artificial Intelligence for IT Operations) solutions leverage the massive data produced during the operation of large-scale systems and machine learning models to assist software engineers in their system operations. As operation data produced in ...
- research-article, August 2024
Cluster-Wide Task Slowdown Detection in Cloud System
KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 266–277
https://doi.org/10.1145/3637528.3671936
Slow task detection is a critical problem in cloud operation and maintenance since it is highly related to user experience and can bring substantial liquidated damages. Most anomaly detection methods detect it from a single-task aspect. However, ...
- research-article, August 2024
LogParser-LLM: Advancing Efficient Log Parsing with Large Language Models
- Aoxiao Zhong,
- Dengyao Mo,
- Guiyang Liu,
- Jinbu Liu,
- Qingda Lu,
- Qi Zhou,
- Jiesheng Wu,
- Quanzheng Li,
- Qingsong Wen
KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 4559–4570
https://doi.org/10.1145/3637528.3671810
Logs are ubiquitous digital footprints, playing an indispensable role in system diagnostics, security analysis, and performance optimization. The extraction of actionable insights from logs is critically dependent on the log parsing process, which ...
- research-article, July 2024
LogSD: Detecting Anomalies from System Logs through Self-Supervised Learning and Frequency-Based Masking
Proceedings of the ACM on Software Engineering (PACMSE), Volume 1, Issue FSE, Article No.: 93, Pages 2098–2120
https://doi.org/10.1145/3660800
Log analysis is one of the main techniques that engineers use for troubleshooting large-scale software systems. Over the years, many supervised, semi-supervised, and unsupervised log analysis methods have been proposed to detect system anomalies by ...
- research-article, July 2024
LM-PACE: Confidence Estimation by Large Language Models for Effective Root Causing of Cloud Incidents
FSE 2024: Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, Pages 388–398
https://doi.org/10.1145/3663529.3663858
Major cloud providers have employed advanced AI-based solutions like large language models to aid humans in identifying the root causes of cloud incidents. Even though AI-driven assistants are becoming more common in the process of analyzing root ...
- research-article, July 2024
Exploring LLM-Based Agents for Root Cause Analysis
FSE 2024: Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, Pages 208–219
https://doi.org/10.1145/3663529.3663841
The growing complexity of cloud based software systems has resulted in incident management becoming an integral part of the software development lifecycle. Root cause analysis (RCA), a critical part of the incident management process, is a demanding task ...
- research-article, October 2024
An AIOps Approach to Data Cloud Based on Large Language Models
CAIBDA '24: Proceedings of the 2024 4th International Conference on Artificial Intelligence, Big Data and Algorithms, Pages 634–641
https://doi.org/10.1145/3690407.3690515
In order to overcome the challenges of inefficiency, high error rate and poor scalability in the traditional operations model, and to ensure that the data cloud can provide efficient and stable quality of service, this study explores AIOps strategies for ...
- research-article, May 2024
Network security AIOps for online stream data monitoring
Neural Computing and Applications (NCAA), Volume 36, Issue 24, Pages 14925–14949
https://doi.org/10.1007/s00521-024-09863-z
Abstract: In cybersecurity, live production data for predictive analysis pose a significant challenge due to the inherently secure nature of the domain. Although there are publicly available, synthesized, and artificially generated datasets, authentic ...
- research-article, April 2024
Towards AIOps enabled services in continuously evolving software‐intensive embedded systems
Journal of Software: Evolution and Process (WSMR), Volume 36, Issue 5
https://doi.org/10.1002/smr.2592
Abstract: Continuous deployment has been practiced for many years by companies developing web‐ and cloud‐based applications. To succeed with continuous deployment, these companies have a strong collaboration culture between the operations and development ...
With continuous deployment, the complexity of software‐intensive embedded systems increases. Therefore, the efforts needed to support and service these systems increase. AIOps can be a critical enabler in managing the increasing complexity without ...
- research-article, June 2024
Is Your Anomaly Detector Ready for Change? Adapting AIOps Solutions to the Real World
CAIN '24: Proceedings of the IEEE/ACM 3rd International Conference on AI Engineering - Software Engineering for AI, Pages 222–233
https://doi.org/10.1145/3644815.3644961
Anomaly detection techniques are essential in automating the monitoring of IT systems and operations. These techniques imply that machine learning algorithms are trained on operational data corresponding to a specific period of time and that they are ...
- research-article, May 2024
Dynamic Alert Suppression Policy for Noise Reduction in AIOps
ICSE-SEIP '24: Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice, Pages 178–188
https://doi.org/10.1145/3639477.3639752
As IT environments evolve in both size and complexity, observability tools are needed to monitor their health. As the anomalous events are detected, alerts are generated, leading to alert notifications to the Site Reliability Engineers (SREs). However, ...
- research-article, March 2024
ServiceAnomaly: An anomaly detection approach in microservices using distributed traces and profiling metrics
Journal of Systems and Software (JSSO), Volume 209, Issue C
https://doi.org/10.1016/j.jss.2023.111917
Abstract: Anomaly detection is an essential activity for identifying abnormal behaviours in microservice-based systems. A common approach is to model the system behaviour during normal operation using either distributed traces or profiling metrics. The ...
Highlights:
  - Combining distributed traces with profiling metrics enhances anomaly detection.
  - Relying solely on a single metric is insufficient for studying system performance.
  - Both linear & nonlinear metric relationships are vital for system ...
- short-paper, November 2023
Privacy-Centric Log Parsing for Timely, Proactive Personal Data Protection
ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Pages 2204–2206
https://doi.org/10.1145/3611643.3617847
This paper presents a privacy-centric approach to log parsing, addressing the growing need for privacy compliance in log management. We propose a novel log parser that focuses on data minimization, a key principle in privacy protection. By integrating ...
- research-article, November 2023
On-Premise AIOps Infrastructure for a Software Editor SME: An Experience Report
ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Pages 1820–1831
https://doi.org/10.1145/3611643.3613876
Information Technology has become a critical component in various industries, leading to an increased focus on software maintenance and monitoring. With the complexities of modern software systems, traditional maintenance approaches have become ...
- survey, October 2023
A Joint Study of the Challenges, Opportunities, and Roadmap of MLOps and AIOps: A Systematic Survey
ACM Computing Surveys (CSUR), Volume 56, Issue 4, Article No.: 84, Pages 1–30
https://doi.org/10.1145/3625289
Data science projects represent a greater challenge than software engineering for organizations pursuing their adoption. The diverse stakeholders involved emphasize the need for a collaborative culture in organizations. This article aims to offer joint ...
- research-article, October 2023
Studying the characteristics of AIOps projects on GitHub
Empirical Software Engineering (KLU-EMSE), Volume 28, Issue 6
https://doi.org/10.1007/s10664-023-10382-z
Abstract: Artificial Intelligence for IT Operations (AIOps) leverages AI approaches to handle the massive amount of data generated during the operations of software systems. Prior works have proposed various AIOps solutions to support different tasks in ...
- research-article, September 2023
Using Digital Twins for Software Change Risk Assessment Toward Proactive AIOps
- Luis F. Rivera,
- Norha M. Villegas,
- Gabriel Tamura,
- Hausi A. Muller,
- Ian Watts,
- Eric Erpenbach,
- Laura Shwartz,
- Xiaotong Liu
CASCON '23: Proceedings of the 33rd Annual International Conference on Computer Science and Software Engineering, Pages 211–216
The increasing structural and behavioural complexity of modern IT systems and environments (IT-Sys|Envs) calls for adopting automated fault anticipation and forecasting mechanisms to mitigate risks, limit system disturbances and damage, and improve ...
- research-article, September 2023
Meta-learning Generalized AIOps Models for Multi-cloud Computer using Digital Twins
CASCON '23: Proceedings of the 33rd Annual International Conference on Computer Science and Software Engineering, Pages 206–210
Multi-cloud computing is a vitally important topic from both business and technical perspectives since it guarantees resiliency, availability, and security. Due to the vast number of configurations among cloud providers, it is quite challenging to ...