DOI: 10.1145/2420950.2421005
research-article

Self-healing multitier architectures using cascading rescue points

Published: 03 December 2012

Abstract

Software bugs and vulnerabilities cause serious problems for both home users and the Internet infrastructure, limiting the availability of Internet services, causing loss of data, and reducing system integrity. Software self-healing using rescue points (RPs) is a known mechanism for recovering from unforeseen errors. However, applying it to multitier architectures can be problematic because certain actions, like transmitting data over the network, cannot be undone. We propose cascading rescue points (CRPs) to address the state inconsistency issues that can arise when using traditional RPs to recover from errors in interconnected applications. With CRPs, when an application executing within an RP transmits data, the remote peer is notified to also perform a checkpoint, so the communicating entities checkpoint in a coordinated but loosely coupled way. Notifications are also sent when RPs successfully complete execution and when recovery is initiated, so that the remote parties perform the appropriate action. We developed a tool that implements CRPs by dynamically instrumenting binaries and transparently injecting notifications into the already established TCP channels between applications. We tested our tool with various applications, including the MySQL and Apache servers, and show that it allows them to successfully recover from errors, while incurring moderate overhead between 4.54% and 71.56%.
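
The three notifications mentioned in the abstract (checkpoint on transmit, completion, and recovery) can be illustrated with a toy model. The Python sketch below is not the authors' tool, which instruments binaries and injects notifications into TCP channels. The `Node` class, its method names, and the use of a dictionary as the whole application state are hypothetical choices made only for illustration; direct method calls stand in for network messages.

```python
from copy import deepcopy


class Node:
    """Toy application tier; its entire state is one dictionary (an assumption)."""

    def __init__(self, name, state=None):
        self.name = name
        self.state = dict(state or {})
        self.snapshot = None   # checkpoint taken at rescue-point entry, if any
        self.notified = []     # peers this node sent data to inside the RP

    def checkpoint(self):
        # Keep the earliest snapshot if a checkpoint is already active.
        if self.snapshot is None:
            self.snapshot = deepcopy(self.state)

    def enter_rp(self):
        self.checkpoint()

    def send(self, peer, key, value):
        # Inside an RP, tell the peer to checkpoint before it consumes the data.
        if self.snapshot is not None and peer not in self.notified:
            peer.checkpoint()
            self.notified.append(peer)
        peer.state[key] = value   # stands in for the peer processing the message

    def _finish(self, restore):
        if restore and self.snapshot is not None:
            self.state = self.snapshot
        peers, self.notified, self.snapshot = self.notified, [], None
        for peer in peers:        # cascade the same notification downstream
            peer._finish(restore)

    def exit_rp(self):
        """RP completed successfully: discard checkpoints everywhere."""
        self._finish(restore=False)

    def recover(self):
        """Error inside the RP: roll back this node and every notified peer."""
        self._finish(restore=True)


web, db = Node("web"), Node("db", {"rows": 1})
web.enter_rp()
web.send(db, "rows", 2)   # db checkpoints first, then applies the update
web.recover()             # simulated fault in web: both tiers roll back
print(db.state)           # {'rows': 1}
```

Because a peer that has checkpointed is itself inside an active checkpoint, any data it forwards to a third tier triggers a further checkpoint there, which is the cascading behaviour in this simplified model.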

Published In

ACSAC '12: Proceedings of the 28th Annual Computer Security Applications Conference
December 2012
464 pages
ISBN: 9781450313124
DOI: 10.1145/2420950
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Sponsors

  • ACSA: Applied Computing Security Assoc

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. error recovery
  2. multitier applications
  3. reliable software
  4. software self-healing

Qualifiers

  • Research-article

Conference

ACSAC '12
Sponsor:
  • ACSA
ACSAC '12: Annual Computer Security Applications Conference
December 3 - 7, 2012
Orlando, Florida, USA

Acceptance Rates

ACSAC '12 Paper Acceptance Rate 44 of 231 submissions, 19%;
Overall Acceptance Rate 104 of 497 submissions, 21%

Cited By

  • (2019) Fast in-memory CRIU for docker containers. Proceedings of the International Symposium on Memory Systems, pages 53-65. DOI: 10.1145/3357526.3357542. Online publication date: 30-Sep-2019
  • (2016) Peeking into the Past: Efficient Checkpoint-Assisted Time-Traveling Debugging. 2016 IEEE 27th International Symposium on Software Reliability Engineering (ISSRE), pages 455-466. DOI: 10.1109/ISSRE.2016.9. Online publication date: Oct-2016
  • (2015) Speculative Memory Checkpointing. Proceedings of the 16th Annual Middleware Conference, pages 197-209. DOI: 10.1145/2814576.2814802. Online publication date: 24-Nov-2015
  • (2015) Lightweight Memory Checkpointing. Proceedings of the 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, pages 474-484. DOI: 10.1109/DSN.2015.45. Online publication date: 22-Jun-2015
  • (2013) Chronicler: lightweight recording to reproduce field failures. Proceedings of the 2013 International Conference on Software Engineering, pages 362-371. DOI: 10.5555/2486788.2486836. Online publication date: 18-May-2013
  • (2013) Chronicler: Lightweight recording to reproduce field failures. 2013 35th International Conference on Software Engineering (ICSE), pages 362-371. DOI: 10.1109/ICSE.2013.6606582. Online publication date: May-2013
