research-article

SherLog: error diagnosis by connecting clues from run-time logs

Published: 13 March 2010

Abstract

Computer systems often fail for many reasons, such as software bugs or administrator errors. Diagnosing such production-run failures is important but challenging, because they are difficult to reproduce in house for several reasons: (1) users' inputs and file contents are often unavailable due to privacy concerns; (2) it is hard to build exactly the same execution environment; and (3) concurrent executions on multiprocessors are non-deterministic.
Therefore, programmers often have to diagnose a production-run failure using the logs collected from customers together with the corresponding source code. Narrowing down the root cause this way requires expert knowledge and is tedious and time-consuming. To address this problem, we propose a tool called SherLog, which analyzes source code, guided by the information in run-time logs, to infer what must or may have happened during the failed production run. SherLog requires neither re-execution of the program nor knowledge of the log's semantics. It infers both control-flow and data-value information about the failed execution.
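The abstract names two kinds of inference: recovering data values by connecting each log message to the logging statement that printed it, and bounding which code must or may have executed between logged points. The toy Python sketch below illustrates both ideas under loudly stated assumptions; it is not SherLog's implementation, which works directly on the program's source code. Everything in it is hypothetical: the `LOG_FORMATS` table, the hand-written `CFG`, and the helpers `format_to_regex`, `match_log_line`, `reachable`, and `must_may_between`. Only the `%s` and `%d` specifiers are handled, "must" and "may" are approximated by plain graph reachability rather than path conditions, and paths that would have printed additional log messages are not excluded.

```python
import re
from collections import deque

# HYPOTHETICAL logging statements: statement id -> printf-style format string.
# A real tool would extract these from the program's source code.
LOG_FORMATS = {
    "L_open": "opening file %s",
    "L_fail": "read failed: fd=%d errno=%d",
}

# HYPOTHETICAL control-flow graph: block -> successor blocks.
# A block named after a LOG_FORMATS key is the block containing that log call.
CFG = {
    "L_open": ["check"],
    "check": ["retry", "read"],
    "retry": ["read"],
    "read": ["L_fail", "done"],
    "L_fail": [],
    "done": [],
}

# Regex fragments for the only two specifiers this sketch supports.
_SPEC = {"%s": r"(.+?)", "%d": r"(-?\d+)"}


def format_to_regex(fmt):
    """Compile a printf-style format (only %s and %d) into an anchored regex."""
    parts = re.split(r"(%s|%d)", fmt)  # capturing group keeps the specifiers
    body = "".join(_SPEC.get(p, re.escape(p)) for p in parts)
    return re.compile("^" + body + "$")


PATTERNS = {lid: format_to_regex(f) for lid, f in LOG_FORMATS.items()}


def match_log_line(line):
    """Map a log line to (statement id, captured argument strings), or None."""
    for lid, pat in PATTERNS.items():
        m = pat.match(line)
        if m:
            return lid, m.groups()
    return None


def reachable(src, dst, cfg, banned=None):
    """Breadth-first search from src to dst that never enters `banned`."""
    if src == banned:
        return False
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in cfg.get(node, []):
            if nxt != banned and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def must_may_between(a, b, cfg):
    """Blocks on some path from a to b ('may') and on every such path ('must')."""
    may = {n for n in cfg
           if n not in (a, b) and reachable(a, n, cfg) and reachable(n, b, cfg)}
    # A block is 'must' if deleting it disconnects a from b.
    must = {n for n in may if not reachable(a, b, cfg, banned=n)}
    return must, may


if __name__ == "__main__":
    log = ["opening file /etc/app.conf", "read failed: fd=3 errno=5"]
    hits = [match_log_line(line) for line in log]
    print(hits)  # [('L_open', ('/etc/app.conf',)), ('L_fail', ('3', '5'))]
    must, may = must_may_between(hits[0][0], hits[1][0], CFG)
    print("must:", sorted(must))  # must: ['check', 'read']
    print("may:", sorted(may))    # may: ['check', 'read', 'retry']
```

In the usage at the bottom, `check` and `read` lie on every path between the two logging points, so they must have run, while `retry` only may have run. The captured `errno` value `5` is the kind of data-value clue that matching messages against format strings can recover without understanding what the message means.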
We evaluate SherLog on 8 representative real-world software failures (6 software bugs and 2 configuration errors) from 7 applications, including 3 servers. The information inferred by SherLog is very useful for programmers diagnosing these failures. Our results also show that SherLog can analyze large server applications such as Apache, with thousands of logging messages, within 40 minutes.

    Published In

    ACM SIGPLAN Notices  Volume 45, Issue 3
    ASPLOS '10
    March 2010
    399 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/1735971
    • ACM Conferences
      ASPLOS XV: Proceedings of the fifteenth International Conference on Architectural support for programming languages and operating systems
      March 2010
      422 pages
      ISBN:9781605588391
      DOI:10.1145/1736020
      • General Chair: James C. Hoe
      • Program Chair: Vikram S. Adve
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 13 March 2010
    Published in SIGPLAN Volume 45, Issue 3

    Author Tags

    1. failure diagnostics
    2. log
    3. static analysis

    Qualifiers

    • Research-article

    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 114
    • Downloads (last 6 weeks): 18
    Reflects downloads up to 13 Jan 2025

    Cited By

    • (2024) A Review of Software Testing Process Log Parsing and Mining. 2024 IEEE International Conference on Software Services Engineering (SSE), pp. 334-343. DOI: 10.1109/SSE62657.2024.00055. Online publication date: 7-Jul-2024
    • (2024) Log statements generation via deep learning: Widening the support provided to developers. Journal of Systems and Software, 210, 111947. DOI: 10.1016/j.jss.2023.111947. Online publication date: Apr-2024
    • (2024) A blockchain integration to support failures prediction from log files in multi-agent systems technology. Expert Systems with Applications, 240, 122122. DOI: 10.1016/j.eswa.2023.122122. Online publication date: Apr-2024
    • (2023) HCLPars: A New Hierarchical Clustering Log Parsing Method. Engineering, Technology & Applied Science Research, 13(4), pp. 11130-11138. DOI: 10.48084/etasr.6013. Online publication date: 9-Aug-2023
    • (2023) SLocator: Localizing the Origin of SQL Queries in Database-Backed Web Applications. IEEE Transactions on Software Engineering, 49(6), pp. 3376-3390. DOI: 10.1109/TSE.2023.3253700. Online publication date: 1-Jun-2023
    • (2023) On the effectiveness of log representation for log-based anomaly detection. Empirical Software Engineering, 28(6). DOI: 10.1007/s10664-023-10364-1. Online publication date: 9-Oct-2023
    • (2022) Transparent DIFC: Harnessing Innate Application Event Logging for Fine-Grained Decentralized Information Flow Control. 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P), pp. 487-501. DOI: 10.1109/EuroSP53844.2022.00037. Online publication date: Jun-2022
    • (2022) An empirical study of the impact of log parsers on the performance of log-based anomaly detection. Empirical Software Engineering, 28(1). DOI: 10.1007/s10664-022-10214-6. Online publication date: 8-Nov-2022
    • (2022) Log message anomaly detection with fuzzy C-means and MLP. Applied Intelligence, 52(15), pp. 17708-17717. DOI: 10.1007/s10489-022-03300-1. Online publication date: 4-Apr-2022
    • (2022) Improved Software Reliability Through Failure Diagnosis Based on Clues from Test and Production Logs. New Advances in Dependability of Networks and Systems, pp. 42-49. DOI: 10.1007/978-3-031-06746-4_5. Online publication date: 27-May-2022