Interdependent Causal Networks for Root Cause Localization

Authors:

Haifeng Chen

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Pages 5051 - 5060

https://doi.org/10.1145/3580305.3599849

Published: 04 August 2023

Abstract

The goal of root cause analysis is to identify the underlying causes of system problems by discovering and analyzing the causal structure from system monitoring data. It is indispensable for maintaining the stability and robustness of large-scale complex systems. Existing methods mainly focus on the construction of a single effective isolated causal network, whereas many real-world systems are complex and exhibit interdependent structures (i.e., multiple networks of a system are interconnected by cross-network links). In interdependent networks, the malfunctioning effects of problematic system entities can propagate to other networks or different levels of system entities. Consequently, ignoring the interdependency results in suboptimal root cause analysis outcomes.
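To make the interdependent structure concrete, here is a minimal sketch that assumes a hypothetical two-level system in which machines host pods. The entity names and edges are invented for illustration and do not come from the paper. The sketch merges intra-level and inter-level links into one directed graph and shows that a fault on machine m1 reaches pod-level entities only through a cross-network link, which a model of the machine network alone would miss.

```python
from collections import deque

# Hypothetical two-level system: machines (higher level) host pods (lower level).
# A directed edge (a, b) means a malfunction in a can affect b.
intra_edges = {
    "machine": [("m1", "m2")],            # within the machine-level network
    "pod": [("p1", "p2"), ("p3", "p4")],  # within the pod-level network
}
inter_edges = [("m1", "p3"), ("p2", "m2")]  # cross-network (inter-level) links


def build_adjacency(intra, inter):
    """Merge intra-level and inter-level edges into one adjacency map."""
    adj = {}
    for edges in intra.values():
        for src, dst in edges:
            adj.setdefault(src, set()).add(dst)
    for src, dst in inter:
        adj.setdefault(src, set()).add(dst)
    return adj


def affected_entities(adj, fault):
    """Entities reachable from `fault` by breadth-first search."""
    seen = {fault}
    queue = deque([fault])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {fault}


full = build_adjacency(intra_edges, inter_edges)
machine_only = build_adjacency({"machine": intra_edges["machine"]}, [])
print(sorted(affected_entities(full, "m1")))          # ['m2', 'p3', 'p4']
print(sorted(affected_entities(machine_only, "m1")))  # ['m2']
```

Reachability is only a stand-in for "malfunctioning effects can propagate"; it ignores edge strength and timing, which a causal model would capture.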

In this paper, we propose REASON, a novel framework that enables the automatic discovery of both intra-level (i.e., within-network) and inter-level (i.e., across-network) causal relationships for root cause localization. REASON consists of two components: Topological Causal Discovery (TCD) and Individual Causal Discovery (ICD). The TCD component models fault propagation in order to trace faults back to their root causes. To achieve this, we propose novel hierarchical graph neural networks that construct interdependent causal networks by modeling both intra-level and inter-level non-linear causal relations. Based on the learned interdependent causal networks, we then leverage random walks with restart to model how a system fault propagates through the network. The ICD component focuses on capturing abrupt change patterns of individual system entities. It examines the temporal patterns of each entity's metric data (i.e., time series) and estimates the entity's likelihood of being a root cause based on Extreme Value Theory. By combining the topological and individual causal scores, REASON identifies the top K system entities as root causes. Extensive experiments on three real-world datasets validate the effectiveness of the proposed framework.
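The abstract names three computational steps: a random walk with restart over a causal network, an Extreme Value Theory score per entity, and a fusion of the two into a top-K ranking. The sketch below illustrates each step under stated assumptions; it is not REASON's implementation. The causal graph is written by hand instead of being learned by hierarchical graph neural networks, the walk is assumed to move against causal edges starting from a symptom entity, the EVT score is assumed to be a peaks-over-threshold fit with method-of-moments estimates of a generalized Pareto distribution, and the weighted-sum fusion is an assumption too. All function names and the parameters `restart`, `q`, `recent`, and `lam` are hypothetical.

```python
import numpy as np


def rwr_scores(causal_adj, start, restart=0.15, tol=1e-10, max_iter=1000):
    """Random walk with restart that moves against causal edges.

    Assumed convention: causal_adj[i, j] > 0 means entity i causes entity j.
    Walking from effects back to causes lets probability mass drift upstream;
    the stationary visiting probabilities serve as topological scores.
    """
    back = np.asarray(causal_adj, dtype=float).T.copy()  # back[j, i]: effect j -> cause i
    n = back.shape[0]
    idx = np.flatnonzero(back.sum(axis=1) == 0)
    back[idx, idx] = 1.0  # entities with no upstream cause keep their mass (self-loop)
    trans = back / back.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    restart_vec = np.zeros(n)
    restart_vec[start] = 1.0
    p = restart_vec.copy()
    for _ in range(max_iter):
        nxt = (1 - restart) * trans.T @ p + restart * restart_vec
        if np.abs(nxt - p).sum() < tol:
            return nxt
        p = nxt
    return p


def evt_tail_score(series, recent=5, q=0.9):
    """Peaks-over-threshold score: 1 - estimated P(X > max of the recent window).

    Excesses over the q-quantile of the history are fitted with a generalized
    Pareto distribution using method-of-moments estimates.
    """
    series = np.asarray(series, dtype=float)
    history, current = series[:-recent], series[-recent:].max()
    t = np.quantile(history, q)
    excess = history[history > t] - t
    if current <= t or excess.size < 2:
        return 0.0  # not in the tail: no evidence of an abrupt change
    m, v = excess.mean(), max(excess.var(), 1e-12)
    xi = 0.5 * (1 - m * m / v)         # GPD shape estimate
    sigma = 0.5 * m * (m * m / v + 1)  # GPD scale estimate
    rate = excess.size / history.size  # empirical P(X > t)
    z = current - t
    if abs(xi) < 1e-9:
        tail = rate * np.exp(-z / sigma)
    else:
        base = 1 + xi * z / sigma
        tail = 0.0 if base <= 0 else rate * base ** (-1 / xi)
    return float(1 - tail)


def minmax(x):
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span


def rank_root_causes(topo, indiv, k=2, lam=0.5):
    """Blend normalized scores with a hypothetical weight `lam`; return top-k."""
    combined = lam * minmax(topo) + (1 - lam) * minmax(indiv)
    return [int(i) for i in np.argsort(-combined)[:k]], combined


# Toy causal graph (hypothetical): 0 -> 1 -> 2, entity 3 unrelated, 2 is the symptom.
adj = np.zeros((4, 4))
adj[0, 1] = adj[1, 2] = 1.0
topo = rwr_scores(adj, start=2)  # approx. [0.7225, 0.1275, 0.15, 0.0]

rng = np.random.default_rng(0)
metrics = rng.normal(size=(4, 200))  # synthetic metric series, one row per entity
metrics[0, -1] += 8.0                # injected abrupt change in entity 0
indiv = np.array([evt_tail_score(row) for row in metrics])
top_k, combined = rank_root_causes(topo, indiv, k=2)
print(top_k)
```

On this toy graph the walk concentrates most of its mass on entity 0, the only upstream cause of the symptom, and the injected spike gives the same entity an individual score close to 1, so it ranks first. A weighted sum of min-max normalized scores is only one plausible way to fuse the two signals.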

Supplementary Material

MP4 File (apfp246-2min-promo.mp4, 5.22 MB)

In this video, Dongjie Wang presents a game-changing approach to system failure analysis: 'Interdependent Causal Networks for Root Cause Localization'. Uncover the novel REASON framework that captures both individual and topological properties of interdependent networks. Learn how this approach has outperformed traditional methods in real-world tests, marking a significant advancement in root cause analysis. Join us on this journey into a new frontier of system maintenance.

