research-article

Improving software diagnosability via log enhancement

Published: 05 March 2011

Abstract

Diagnosing software failures in the field is notoriously difficult, in part due to the fundamental complexity of troubleshooting any complex software system, but further exacerbated by the paucity of information that is typically available in the production setting. Indeed, for reasons of both overhead and privacy, it is common that only the run-time log generated by a system (e.g., syslog) can be shared with the developers. Unfortunately, the ad-hoc nature of such reports means they are frequently insufficient for detailed failure diagnosis. This paper seeks to improve this situation within the rubric of existing practice. We describe a tool, LogEnhancer, that automatically "enhances" existing logging code to aid in future post-failure debugging. We evaluate LogEnhancer on eight large, real-world applications and demonstrate that it can dramatically reduce the set of potential root failure causes that must be considered during diagnosis while imposing negligible overheads.
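
The abstract does not say how LogEnhancer decides what to record, so the following is only an illustrative sketch of the general idea, not the tool's implementation. It assumes that "enhancing" a log statement means appending the values of the variables that led to it. The function names (`remove_original`, `remove_enhanced`, `capture`), the parameters, and the message text are all hypothetical.

```python
import io
import logging


def capture(fn, *args):
    """Run fn(*args, log) with a fresh in-memory logger and return what it logged."""
    buf = io.StringIO()
    log = logging.Logger("demo", level=logging.INFO)  # standalone logger
    log.addHandler(logging.StreamHandler(buf))
    fn(*args, log)
    return buf.getvalue().strip()


def remove_original(path, is_dir, recursive, read_only, log):
    """Hypothetical operation with an ad-hoc log message."""
    ok = not (is_dir and not recursive) and not read_only
    if not ok:
        # The message is the same whichever condition caused the failure.
        log.error("cannot remove %s", path)
    return ok


def remove_enhanced(path, is_dir, recursive, read_only, log):
    """Same logic; the log message also records the values behind the failure."""
    ok = not (is_dir and not recursive) and not read_only
    if not ok:
        log.error(
            "cannot remove %s [is_dir=%s recursive=%s read_only=%s]",
            path, is_dir, recursive, read_only,
        )
    return ok


# Two different causes: a directory without the recursive flag, and a read-only file.
dir_failure = ("/data/x", True, False, False)
ro_failure = ("/data/x", False, False, True)

# The original log cannot tell the two causes apart; the enhanced log can.
same_in_original = capture(remove_original, *dir_failure) == capture(remove_original, *ro_failure)
same_in_enhanced = capture(remove_enhanced, *dir_failure) == capture(remove_enhanced, *ro_failure)
```

With the original message, a developer reading the log must consider both failure conditions. With the extra values, only one remains, which is the kind of reduction in candidate root causes the abstract refers to.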

    Published In

    ACM SIGPLAN Notices, Volume 46, Issue 3
    ASPLOS '11
    March 2011
    407 pages
    ISSN: 0362-1340
    EISSN: 1558-1160
    DOI: 10.1145/1961296

    ASPLOS XVI: Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
    March 2011
    432 pages
    ISBN: 9781450302661
    DOI: 10.1145/1950365
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 05 March 2011
    Published in SIGPLAN Volume 46, Issue 3

    Author Tags

    1. log
    2. software diagnosability
    3. static analysis

    Qualifiers

    • Research-article

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 31
    • Downloads (Last 6 weeks): 2
    Reflects downloads up to 13 Jan 2025

    Citations

    Cited By

    • (2024) Automatic Configurator to Prevent Attacks for Azure Cloud System. 2024 IEEE/ACIS 22nd International Conference on Software Engineering Research, Management and Applications (SERA), pp. 174-181. DOI: 10.1109/SERA61261.2024.10685612. Online publication date: 30-May-2024
    • (2021) DeepLV. Proceedings of the 43rd International Conference on Software Engineering, pp. 1461-1472. DOI: 10.1109/ICSE43902.2021.00131. Online publication date: 22-May-2021
    • (2021) An Evolutionary Study of Configuration Design and Implementation in Cloud Systems. Proceedings of the 43rd International Conference on Software Engineering, pp. 188-200. DOI: 10.1109/ICSE43902.2021.00029. Online publication date: 22-May-2021
    • (2024) A systematic mapping study of bug reproduction and localization. Information and Software Technology, 165:C. DOI: 10.1016/j.infsof.2023.107338. Online publication date: 1-Jan-2024
    • (2024) The impact of concept drift and data leakage on log level prediction models. Empirical Software Engineering, 29:5. DOI: 10.1007/s10664-024-10518-9. Online publication date: 25-Jul-2024
    • (2023) Log-it: Supporting Programming with Interactive, Contextual, Structured, and Visual Logs. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, pp. 1-16. DOI: 10.1145/3544548.3581403. Online publication date: 19-Apr-2023
    • (2023) An Empirical Study on Log Level Prediction for Multi-Component Systems. IEEE Transactions on Software Engineering, 49:2, pp. 473-484. DOI: 10.1109/TSE.2022.3154672. Online publication date: 1-Feb-2023
    • (2023) Studying and Complementing the Use of Identifiers in Logs. 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), pp. 97-107. DOI: 10.1109/SANER56733.2023.00019. Online publication date: Mar-2023
    • (2023) LogReducer: Identify and Reduce Log Hotspots in Kernel on the Fly. 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), pp. 1763-1775. DOI: 10.1109/ICSE48619.2023.00151. Online publication date: May-2023
    • (2023) Did We Miss Something Important? Studying and Exploring Variable-Aware Log Abstraction. 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), pp. 830-842. DOI: 10.1109/ICSE48619.2023.00078. Online publication date: May-2023
