DOI: 10.1145/1519065.1519083

First-aid: surviving and preventing memory management bugs during production runs

Published: 01 April 2009

Abstract

Memory bugs in C/C++ programs severely affect system availability and security. This paper presents First-Aid, a lightweight runtime system that survives software failures caused by common memory management bugs and prevents future failures by the same bugs during production runs. Upon a failure, First-Aid diagnoses the bug type and identifies the memory objects that trigger the bug. To do so, it rolls back the program to previous checkpoints and uses two types of environmental changes that can prevent or expose memory bug manifestation during re-execution. Based on the diagnosis, First-Aid generates and applies runtime patches to avoid the memory bug and prevent its recurrence. Furthermore, First-Aid validates the consistent effects of the runtime patches and generates on-site diagnostic reports to assist developers in fixing the bugs.
We have implemented First-Aid on Linux and evaluated it with seven applications that contain various types of memory bugs, including buffer overflow, uninitialized read, dangling pointer read/write, and double free. The results show that First-Aid can quickly diagnose the tested bugs and recover applications from failures (in 0.084 to 3.978 seconds). The results also show that the runtime patches generated by First-Aid can prevent future failures caused by the diagnosed bugs. Additionally, First-Aid provides detailed diagnostic information on both the root cause and the manifestation of the bugs. Furthermore, First-Aid incurs low overhead (0.4% to 11.6%, with an average of 3.7%) during normal execution for the tested buggy applications, SPEC INT2000, and four allocation-intensive programs.
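
The diagnosis idea in the abstract (re-execute after a failure under an environmental change that both prevents and exposes the bug) can be illustrated with a toy model. The sketch below is not First-Aid's implementation: the heap is a Python bytearray, "rollback" is simply a fresh run, and the names ToyHeap, buggy_program, diagnose, and the canary value are all hypothetical. The environmental change assumed here is padding each allocation and filling the padding with a canary byte, so that an overflow lands harmlessly in the padding (prevention) and leaves a detectable mark next to the offending object (exposure).

```python
CANARY = 0xAB  # assumed fill byte for padding; any recognizable value works


class ToyHeap:
    """A flat bytearray heap with a bump allocator and optional padding."""

    def __init__(self, size=256, pad=0):
        self.mem = bytearray(size)
        self.pad = pad        # padding bytes appended after each object
        self.top = 0          # next free offset
        self.objects = {}     # name -> (offset, requested size)

    def malloc(self, name, size):
        off = self.top
        self.objects[name] = (off, size)
        # Fill the padding with the canary so overflows can be spotted later.
        for i in range(off + size, off + size + self.pad):
            self.mem[i] = CANARY
        self.top = off + size + self.pad
        return off

    def write(self, off, data):
        # No per-object bounds check, like a raw pointer write in C.
        self.mem[off:off + len(data)] = data

    def read(self, name):
        off, size = self.objects[name]
        return bytes(self.mem[off:off + size])

    def overflowed_objects(self):
        """Names of objects whose trailing padding no longer holds the canary."""
        bad = []
        for name, (off, size) in self.objects.items():
            padding = self.mem[off + size:off + size + self.pad]
            if any(b != CANARY for b in padding):
                bad.append(name)
        return bad


def buggy_program(heap):
    """Writes 12 bytes into an 8-byte buffer; returns False if the run fails."""
    buf = heap.malloc("buf", 8)
    secret = heap.malloc("secret", 4)
    heap.write(secret, b"SAFE")
    heap.write(buf, b"A" * 12)             # the bug: 4 bytes too many
    return heap.read("secret") == b"SAFE"  # corrupted neighbour means failure


def diagnose(program, pad=16):
    """Re-run a failing program on a padded heap and report what was found."""
    if program(ToyHeap(pad=0)):
        return {"failed": False, "bug": None, "objects": []}
    retry = ToyHeap(pad=pad)               # stands in for rollback + re-execution
    survived = program(retry)
    culprits = retry.overflowed_objects()
    bug = "buffer overflow" if survived and culprits else "unknown"
    return {"failed": True, "bug": bug, "objects": culprits}
```

Calling diagnose(buggy_program) first reproduces the failure on an unpadded heap, then re-runs on a padded one and reports which object's padding was overwritten. In this toy setting, a "runtime patch" would amount to keeping the padding only for the reported object on later runs.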

Published In

EuroSys '09: Proceedings of the 4th ACM European conference on Computer systems
April 2009
342 pages
ISBN: 9781605584829
DOI: 10.1145/1519065

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. error prevention
  2. memory bug diagnosis
  3. software failure
  4. software reliability

Qualifiers

  • Research-article

Conference

EuroSys '09
EuroSys '09: Fourth EuroSys Conference 2009
April 1 - 3, 2009
Nuremberg, Germany

Acceptance Rates

Overall Acceptance Rate 241 of 1,308 submissions, 18%
