Predicting computer system failures using support vector machines

Published: 07 December 2008 Publication History

Abstract

Mitigating the impact of computer failure is possible if accurate failure predictions are provided. Resources, applications, and services can be scheduled around a predicted failure to limit its impact. Such strategies are especially important for multi-computer systems, such as compute clusters, that experience a higher failure rate due to their large number of components. However, providing accurate predictions with sufficient lead time remains a challenging problem.
This paper describes a new spectrum-kernel Support Vector Machine (SVM) approach to predicting failure events from system log files. These files contain messages that represent a change of system state. While a single message in the file may not be sufficient for predicting failure, a sequence or pattern of messages may be. The approach uses a sliding window (sub-sequence) of messages to predict the likelihood of failure. A frequency representation of the observed message sub-sequences is then used as input to the SVM, which assigns the messages to one of two classes: failed or non-failed system. Experimental results using actual system log files from a Linux-based compute cluster indicate that the proposed spectrum-kernel SVM approach has promise and can predict hard disk failure with an accuracy of 73% two days in advance.
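
The sliding-window and frequency-representation steps can be illustrated with a minimal sketch. It is not the authors' implementation: the message labels, the window width of 5, the sub-sequence length k = 2, and all function names are assumptions chosen for illustration. The kernel shown is the standard k-spectrum kernel, that is, the inner product of two sub-sequence count vectors.

```python
from collections import Counter
from typing import Hashable, Iterator, Sequence


def sliding_windows(messages: Sequence[Hashable], width: int) -> Iterator[tuple]:
    """Yield every contiguous window of `width` messages."""
    for start in range(len(messages) - width + 1):
        yield tuple(messages[start:start + width])


def spectrum(window: Sequence[Hashable], k: int) -> Counter:
    """Count every contiguous length-k sub-sequence in a window."""
    return Counter(tuple(window[i:i + k]) for i in range(len(window) - k + 1))


def spectrum_kernel(a: Sequence[Hashable], b: Sequence[Hashable], k: int) -> int:
    """Inner product of the two length-k sub-sequence count vectors."""
    sa, sb = spectrum(a, k), spectrum(b, k)
    return sum(count * sb[sub] for sub, count in sa.items())


# Hypothetical message-type labels standing in for parsed log lines.
log = ["net", "disk", "disk", "smart", "disk", "disk", "smart"]

# Assumed parameters: windows of 5 messages, sub-sequences of length 2.
windows = list(sliding_windows(log, width=5))

# Gram matrix: pairwise kernel values between all windows.
gram = [[spectrum_kernel(x, y, k=2) for y in windows] for x in windows]
```

A Gram matrix of this kind could be handed, together with failed/non-failed labels for each window, to any SVM implementation that accepts a precomputed kernel. How the paper labels windows and chooses the window width and sub-sequence length is not stated in the abstract.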


    Published In

    WASL'08: Proceedings of the First USENIX conference on Analysis of system logs
    December 2008
    8 pages

    Publisher

    USENIX Association

    United States
