Adaptive system anomaly prediction for large-scale hosting infrastructures

Y Tan, X Gu, H Wang - Proceedings of the 29th ACM SIGACT-SIGOPS …, 2010 - dl.acm.org
Large-scale hosting infrastructures require automatic system anomaly management to achieve continuous system operation. In this paper, we present a novel adaptive runtime anomaly prediction system, called ALERT, to achieve robust hosting infrastructures. In contrast to traditional anomaly detection schemes, ALERT aims to raise advance anomaly alerts to enable just-in-time anomaly prevention. We propose a novel context-aware anomaly prediction scheme to improve prediction accuracy in dynamic hosting infrastructures. We have implemented the ALERT system and deployed it on several production hosting infrastructures, such as the IBM System S stream processing cluster and PlanetLab. Our experiments show that ALERT can achieve high prediction accuracy for a range of system anomalies while imposing low overhead on the hosting infrastructure.