Transparent fault-tolerant network services using off-the-shelf components
Publisher:
  • University of California at Los Angeles
  • Computer Science Department 405 Hilgard Avenue Los Angeles, CA
  • United States
ISBN: 978-0-542-56791-9
Order Number: AAI3208405
Pages: 170
Abstract

The growth of the Internet has led to the development of critical network services where erroneous processing or outages are unacceptable. The availability and reliability of services such as online banking, stock trading, reservation processing, and online shopping have become increasingly important as their popularity grows. Downtime and failures lead to dissatisfied customers and translate directly into lost revenue for the service providers.

Fault-tolerance techniques use redundant components and/or redundant processing to ensure continued correct operation despite component failures. Most existing fault-tolerance solutions for network services have at least one of three limitations: they do not preserve connections that are active at failure time, they expect servers to be deterministic, or they require changes to the clients. These limitations are unacceptable for many current and future network service applications. We propose a methodology for providing fault-tolerance without these limitations. Our solution, based on a standby backup approach, is transparent to the clients and requires minimal changes to the server OS and application.

We have used our methodology to add fault-tolerance features to two popular types of network services: web service and video conferencing. Off-the-shelf hardware and software components were used as the basis for both implementations. Modifications to the OS network stack, implemented as Linux kernel modules, provide fault-tolerance at the connection level. At the application level, modifications to the web server and multi-conferencing unit, respectively, provide application-level synchronization and allow handling of non-deterministic server behavior. The associated issues, challenges, and tradeoffs of our methodology are presented in this work. The evaluation of our prototype implementations shows that client-transparent fault-tolerance can be achieved with relatively low overhead.

Contributors
  • University of California, Los Angeles
  • Electronic Arts Inc
