Transparent fault-tolerant network services using off-the-shelf components

January 2005

Author:
Navid Aghdaie
University of California, Los Angeles
Adviser:
Yuval Tamir
University of California, Los Angeles

Publisher:

University of California at Los Angeles
Computer Science Department 405 Hilgard Avenue Los Angeles, CA
United States

ISBN: 978-0-542-56791-9

Order Number: AAI3208405

Pages: 170

Abstract

The growth of the Internet has led to the development of critical network services where erroneous processing or outages are unacceptable. The availability and reliability of services such as online banking, stock trading, reservation processing, and online shopping have become increasingly important as their popularity grows. Downtime and failures lead to dissatisfied customers and translate directly into lost revenue for the service providers.

Fault-tolerance techniques use redundant components and/or redundant processing to ensure continued correct operation despite component failures. Most existing fault-tolerance solutions for network services have at least one of three limitations: they do not provide fault-tolerance for connections that are active at failure time, they expect servers to be deterministic, or they require changes to the clients. These limitations are unacceptable for many current and future network service applications. We propose a methodology for providing fault-tolerance without these limitations. Our solution, based on a standby backup approach, is transparent to the clients and requires minimal changes to the server OS and application.

We have used our methodology to add fault-tolerance features to two popular types of network services: web service and video conferencing. Off-the-shelf hardware and software components were used as the basis for both implementations. Modifications to the OS network stack using Linux kernel modules provide fault-tolerance at the connection level. At the application level, modifications to the web server and multi-conferencing unit, respectively, provide application-level synchronization and allow handling of non-deterministic server behavior. The associated issues, challenges, and tradeoffs of our methodology are presented in this work. The evaluation of our prototype implementations shows that client-transparent fault-tolerance can be achieved with relatively low overheads.

Cited By

Hasircioglu B, Pignolet Y and Sivanthi T. Transparent Fault Tolerance for Real-Time Automation Systems. Proceedings of the 1st International Workshop on Internet of People, Assistive Robots and Things, (7-12)

Contributors

Yuval Tamir
University of California, Los Angeles
- Publication Years: 1983 - 2021
- Publication count: 27
- Citation count: 584
- Available for Download: 9
- Downloads (cumulative): 6,088
- Downloads (12 months): 943
- Downloads (6 weeks): 158
- Average Downloads per Article: 676
- Average Citation per Article: 22

Navid Aghdaie
Electronic Arts Inc
- Publication Years: 2005 - 2019
- Publication count: 7
- Citation count: 84
- Available for Download: 5
- Downloads (cumulative): 3,605
- Downloads (12 months): 483
- Downloads (6 weeks): 74
- Average Downloads per Article: 721
- Average Citation per Article: 12

Index Terms

Transparent fault-tolerant network services using off-the-shelf components
1. Computer systems organization
  1. Dependable and fault-tolerant systems and networks
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
    2. Extra-functional properties
      1. Software fault tolerance

Recommendations

Engineering fault-tolerant tcp/ip services
Middleware-Based Failure Detection and Recovery Services for Fault-Tolerant E-services
DESE '09: Proceedings of the 2009 Second International Conference on Developments in eSystems Engineering

The runtime detection of failure and recovery from failure is a major challenge facing e-business and e-commerce applications. Different types of failure are well understood through the failure model, but the detection and differentiation between these ...
Fault Tolerant Video on Demand Services
ICDCS '99: Proceedings of the 19th IEEE International Conference on Distributed Computing Systems

This paper describes a highly available distributed video on demand (VoD) service which is inherently fault tolerant. The VoD service is provided by multiple servers that reside at different sites. New servers may be brought up "on the fly" to ...
