research-article

ASSURE: automatic software self-healing using rescue points

Authors:

Angelos D. Keromytis

ACM SIGPLAN Notices, Volume 44, Issue 3

Pages 37 - 48

https://doi.org/10.1145/1508284.1508250

Published: 07 March 2009

Abstract

Software failures in server applications are a significant problem for preserving system availability. We present ASSURE, a system that introduces rescue points that recover software from unknown faults while maintaining both system integrity and availability, by mimicking system behavior under known error conditions. Rescue points are locations in existing application code for handling a given set of programmer-anticipated failures, which are automatically repurposed and tested for safely enabling fault recovery from a larger class of (unanticipated) faults. When a fault occurs at an arbitrary location in the program, ASSURE restores execution to an appropriate rescue point and induces the program to recover execution by virtualizing the program's existing error-handling facilities. Rescue points are identified using fuzzing, implemented using a fast coordinated checkpoint-restart mechanism that handles multi-process and multi-threaded applications, and, after testing, are injected into production code using binary patching. We have implemented an ASSURE Linux prototype that operates without application source code and without base operating system kernel changes. Our experimental results on a set of real-world server applications and bugs show that ASSURE enabled recovery for all of the bugs tested with fast recovery times, has modest performance overhead, and provides automatic self-healing orders of magnitude faster than current human-driven patch deployment methods.
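The recovery idea in the abstract (roll back to a rescue point, then force the error return that the caller already handles) can be illustrated with a toy model. This is a minimal sketch, not ASSURE's implementation: the real system checkpoints whole processes and patches binaries, whereas here `copy.deepcopy` stands in for a checkpoint and a Python exception stands in for a fault. The names `RescuePoint`, `handle_request`, and `store_header` are hypothetical.

```python
import copy


class RescuePoint:
    """Hypothetical wrapper: turns an unanticipated fault into a known error return."""

    def __init__(self, func, error_value):
        self.func = func
        self.error_value = error_value  # value func already returns for anticipated failures

    def __call__(self, state, *args):
        checkpoint = copy.deepcopy(state)  # stand-in for a process checkpoint
        try:
            return self.func(state, *args)
        except Exception:
            # Unanticipated fault: discard partial side effects by rolling back in place,
            state.clear()
            state.update(checkpoint)
            # then force the error return that the caller already knows how to handle.
            return self.error_value


def store_header(state, raw):
    state["pending"].append(raw)      # partial update made before the fault
    name, value = raw.split(":", 1)   # raises ValueError when ':' is missing
    state["headers"][name.strip()] = value.strip()
    state["pending"].pop()


def handle_request(state, raw):
    if not raw:
        return -1                     # programmer-anticipated failure
    store_header(state, raw)
    return 0


guarded = RescuePoint(handle_request, error_value=-1)
state = {"headers": {}, "pending": []}
ok = guarded(state, "Host: example.org")
failed = guarded(state, "no-colon-here")
print(ok, failed, state)
```

In this sketch the second call faults deep inside `store_header`, yet the caller only sees the same `-1` it would get for an empty request, and `state` is left as it was before the faulty call.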

Reviews

Reviewer: Rafael Corchuelo

Software crashes due to programming bugs are a major problem for systems that need to be available 24 hours a day, seven days a week. Many authors are researching techniques that allow applications to detect their own failures and recover from them automatically. "Recovering" means that the application rolls back to a safe state and returns a reasonable error code to the client; recovering in this way keeps the system available while a programmer creates an appropriate patch.

Sidiroglou et al. have devised a tool called ASSURE that allows server applications running on Linux 2.6 systems to recover from their failures. ASSURE is innovative insofar as it can deal with applications that are available in binary form only, run on multiple threads and processes, handle polymorphic or encrypted input, or have deterministic bugs (not necessarily memory leaks); furthermore, it does not require any modifications to the underlying operating system. The authors have tested ASSURE on a number of actual bugs in well-known server applications, such as Apache, Squid, and MySQL. Their results show that the tool is very efficient.

ASSURE builds on so-called rescue points, which are functions that return integer error codes or null pointers when a known error is detected. Rescue points and error codes are identified automatically by running the application in a testing environment where it is fed invalid inputs. When a failure is detected for the first time, ASSURE analyzes the problem in a sandbox. First, it determines which function failed. Then, it identifies the closest rescue point to which the application can be rolled back and still keep working correctly. A piece of code is then inserted at this rescue point that returns an error code, thus preventing the application from continuing and failing. Finally, the application is restarted.

The paper is not at all difficult to read, although it does not provide enough detail for other scientists to repeat the work. The authors' writing style is didactic and they get to the point very directly. They also make it very clear what their original contributions are. I recommend this paper and ASSURE for system administrators who need to keep their servers highly available.

Online Computing Reviews Service
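The two steps the review describes, finding candidate rescue points by feeding invalid inputs and then picking the candidate closest to the failing function, can be sketched as follows. This is a simplified illustration under stated assumptions, not the tool's method: it inspects Python return values instead of instrumenting a binary, it treats `None` or a negative integer as an "error return", and the sample functions and the call stack are made up for the example.

```python
def discover_rescue_points(functions, bad_inputs):
    """Map function name -> error value it was seen to return on invalid input."""
    candidates = {}
    for name, func in functions.items():
        for bad in bad_inputs:
            try:
                result = func(bad)
            except Exception:
                continue  # crashed instead of reporting an error: not evidence
            # Assumed convention: None or a negative int signals a handled error.
            if result is None or (isinstance(result, int) and result < 0):
                candidates[name] = result
                break
    return candidates


def closest_rescue_point(call_stack, candidates):
    """call_stack lists frames outermost first; search from the faulting frame outward."""
    for frame in reversed(call_stack):
        if frame in candidates:
            return frame, candidates[frame]
    return None


# Hypothetical application functions used only for this illustration.
def parse_length(text):
    return int(text) if text.isdigit() else -1   # -1 on malformed input


def lookup_user(text):
    return {"alice": 1}.get(text)                # None for an unknown user


def copy_payload(text):
    return text[0].upper()                       # IndexError on empty input


functions = {
    "parse_length": parse_length,
    "lookup_user": lookup_user,
    "copy_payload": copy_payload,
}
candidates = discover_rescue_points(functions, ["", "???"])

# Suppose a fault is observed in copy_payload with this (made-up) call stack.
stack = ["main", "lookup_user", "parse_length", "copy_payload"]
choice = closest_rescue_point(stack, candidates)
print(candidates, choice)
```

Here `copy_payload` never shows an error return under the invalid inputs, so it is not a candidate, and the search settles on `parse_length`, the nearest enclosing frame with a known error value.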

Published In

ACM SIGPLAN Notices Volume 44, Issue 3

ASPLOS 2009

March 2009

346 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/1508284

ASPLOS XIV: Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
March 2009
358 pages
ISBN:9781605584065
DOI:10.1145/1508244
General Chair: Mary Lou Soffa, University of Virginia, USA
Program Chair: Mary Jane Irwin, Penn State University, USA

Publisher

Association for Computing Machinery

New York, NY, United States
