Research article
DOI: 10.1145/2884781.2884844

RETracer: triaging crashes by reverse execution from partial memory dumps

Published: 14 May 2016

Abstract

Many software providers operate crash reporting services to automatically collect crashes from millions of customers and file bug reports. Precise crash triage is essential for these providers because the millions of crashes that may be reported every day are critical in identifying high-impact bugs. However, the triaging accuracy of existing systems is limited, as they rely only on the syntactic information of the stack trace at the moment of a crash without analyzing program semantics.
In this paper, we present RETracer, the first system to triage software crashes based on program semantics reconstructed from memory dumps. RETracer was designed to meet the requirements of large-scale crash reporting services. To understand how functions on the stack contribute to a crash, RETracer performs binary-level backward taint analysis without a recorded execution trace. The main challenge is that the machine state at an earlier time cannot be recovered completely from a memory dump, since most instructions are information-destroying.
We have implemented RETracer for x86 and x86-64 native code, and compared it with the existing crash triaging tool used by Microsoft. Based on a manual analysis of 140 bugs fixed in Microsoft Windows and Office, we found that RETracer eliminates two-thirds of triage errors. RETracer has been deployed as the main crash triaging system on Microsoft's crash reporting service.
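
The idea of backward taint analysis mentioned in the abstract can be illustrated with a small sketch. This is not RETracer's implementation: it assumes a made-up three-operation instruction format (`const`, `mov`, `add`), hypothetical function names, and a fully known instruction sequence, whereas the real system works on x86 binaries and has no recorded trace. The sketch starts from the register that held the bad value at the crash and walks backward to find the instruction, and therefore the function, where that value originated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    func: str                   # function containing the instruction (hypothetical names)
    op: str                     # toy opcode: "const", "mov", or "add"
    dst: str                    # register written by the instruction
    src: Optional[str] = None   # register read by the instruction, if any


def backward_taint(trace, crash_reg):
    """Walk the instructions before the crash in reverse order and return the
    instruction where the value in crash_reg originated (None if not found)."""
    tainted = {crash_reg}
    for ins in reversed(trace):
        if ins.dst not in tainted:
            continue                  # instruction does not affect the bad value
        if ins.op == "const":
            return ins                # the bad value was created here
        if ins.op == "mov":
            tainted.discard(ins.dst)  # dst was overwritten, so its older value is irrelevant
            tainted.add(ins.src)
        elif ins.op == "add":
            tainted.add(ins.src)      # dst depends on both its old value and src
    return None


# Instructions executed before a crash that dereferences rcx (assumed example).
trace = [
    Instr("parse_header", "const", "rax"),       # rax = 0: bad pointer created
    Instr("parse_header", "mov", "rbx", "rax"),  # rbx = rax
    Instr("copy_field", "mov", "rcx", "rbx"),    # rcx = rbx
    Instr("copy_field", "const", "rax"),         # unrelated overwrite of rax
]

origin = backward_taint(trace, "rcx")
print(origin.func if origin else "unknown")
```

In this toy trace the crash happens in `copy_field`, but the analysis attributes the bad value to `parse_header`, which is the kind of distinction a triage based only on the top stack frame would miss. The `mov` case also hints at why reverse execution is hard: once a register is overwritten, its earlier contents cannot be read back from the final state.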

Published In

ICSE '16: Proceedings of the 38th International Conference on Software Engineering
May 2016
1235 pages
ISBN: 9781450339001
DOI: 10.1145/2884781
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. backward taint analysis
  2. reverse execution
  3. triaging

Qualifiers

  • Research-article

Conference

ICSE '16
Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%

Cited By

  • (2024) Who Should We Blame for Android App Crashes? An In-Depth Study at Scale and Practical Resolutions. ACM Transactions on Sensor Networks 20:3 (1-24). DOI: 10.1145/3649895. Online publication date: 13-Apr-2024
  • (2024) A Survey of Software Dynamic Analysis Methods. Programming and Computing Software 50:1 (90-114). DOI: 10.1134/S0361768824010079. Online publication date: 1-Feb-2024
  • (2024) Benzene: A Practical Root Cause Analysis System with an Under-Constrained State Mutation. 2024 IEEE Symposium on Security and Privacy (SP) (1865-1883). DOI: 10.1109/SP54263.2024.00074. Online publication date: 19-May-2024
  • (2024) Enhanced Fast and Reliable Statistical Vulnerability Root Cause Analysis with Sanitizer. 2024 IEEE Conference on Software Testing, Verification and Validation (ICST) (47-58). DOI: 10.1109/ICST60714.2024.00014. Online publication date: 27-May-2024
  • (2023) A Survey on Bug Deduplication and Triage Methods from Multiple Points of View. Applied Sciences 13:15 (8788). DOI: 10.3390/app13158788. Online publication date: 29-Jul-2023
  • (2023) Alligator in Vest: A Practical Failure-Diagnosis Framework via Arm Hardware Features. Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (917-928). DOI: 10.1145/3597926.3598106. Online publication date: 12-Jul-2023
  • (2023) Virtual Device Farms for Mobile App Testing at Scale: A Pursuit for Fidelity, Efficiency, and Accessibility. Proceedings of the 29th Annual International Conference on Mobile Computing and Networking (1-17). DOI: 10.1145/3570361.3613259. Online publication date: 2-Oct-2023
  • (2023) Diagnosing Kernel Concurrency Failures with AITIA. Proceedings of the Eighteenth European Conference on Computer Systems (94-110). DOI: 10.1145/3552326.3567486. Online publication date: 8-May-2023
  • (2023) Capturing Invalid Input Manipulations for Memory Corruption Diagnosis. IEEE Transactions on Dependable and Secure Computing 20:2 (917-930). DOI: 10.1109/TDSC.2022.3145022. Online publication date: 1-Mar-2023
  • (2022) FuzzerAid: Grouping Fuzzed Crashes Based On Fault Signatures. Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (1-12). DOI: 10.1145/3551349.3556959. Online publication date: 10-Oct-2022
