DOI:10.1145/1217935.1217972

Automated known problem diagnosis with event traces

Published: 18 April 2006

Abstract

Computer problem diagnosis remains a serious challenge for users and support professionals. Traditional troubleshooting methods rely heavily on human intervention, which makes the process inefficient and the results inaccurate even for problems that have already been solved, and this contributes significantly to user dissatisfaction. We propose using system behavior information, such as system event traces, to build correlations with solved problems, instead of relying only on vague text descriptions as existing practices do. The goal is to automatically identify the root cause of a problem when it is a known one, which in turn leads to its resolution. By applying statistical learning techniques to classify system call sequences, we show through four case studies that our approach can achieve considerable accuracy in root cause recognition.
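
The idea of classifying system call sequences against known problems can be illustrated with a minimal sketch. It is not the authors' implementation: the author tags name support vector machines, whereas this example swaps in a simpler nearest-profile match using cosine similarity over n-gram counts so that it needs only the Python standard library. The root-cause labels, the call names in the traces, and the helper functions (`ngram_counts`, `cosine`, `train`, `diagnose`) are all hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def ngram_counts(calls, n=2):
    """Count contiguous n-grams in one system call sequence."""
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labeled_traces, n=2):
    """Sum n-gram counts per known root cause, giving one profile per label."""
    profiles = {}
    for label, calls in labeled_traces:
        profiles.setdefault(label, Counter()).update(ngram_counts(calls, n))
    return profiles


def diagnose(profiles, calls, n=2):
    """Return the known root cause whose profile best matches the new trace."""
    features = ngram_counts(calls, n)
    return max(profiles, key=lambda label: cosine(profiles[label], features))


# Hypothetical traces of already-solved problems (labels and calls are made up).
TRACES = [
    ("missing_registry_key",
     ["RegOpenKey", "RegQueryValue", "RegCloseKey", "RegOpenKey", "RegQueryValue"]),
    ("file_permission",
     ["CreateFile", "ReadFile", "CloseHandle", "CreateFile", "ReadFile"]),
]

profiles = train(TRACES)
new_trace = ["CreateFile", "ReadFile", "CloseHandle"]
predicted = diagnose(profiles, new_trace)
print(predicted)
```

A real system would need far longer traces, many traces per problem, and a classifier that copes with noise shared across all traces; the sketch only shows the shape of the pipeline: trace, n-gram features, then a match against known problems.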


Published In

EuroSys '06: Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
April 2006
420 pages
ISBN:1595933220
DOI:10.1145/1217935
  • ACM SIGOPS Operating Systems Review, Volume 40, Issue 4
    Proceedings of the 2006 EuroSys conference
    October 2006
    383 pages
    ISSN:0163-5980
    DOI:10.1145/1218063

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. root cause analysis
  2. support vector machine
  3. system call sequences

Qualifiers

  • Article

Conference

EUROSYS06
Sponsor:
EUROSYS06: Eurosys 2006 Conference
April 18 - 21, 2006
Leuven, Belgium

Acceptance Rates

Overall Acceptance Rate 241 of 1,308 submissions, 18%
