Critical distributed programs require robust fault tolerance. One method of achieving it is distributed exception handling. An exception is an abnormal condition, typically an error. When a cooperating process in a distributed program signals an exception, the program determines whether the exception can be handled locally or requires a coordinated recovery with other processes in the program. If coordinated recovery is required, the exception is signaled in all the processes and is called a global exception.
Existing distributed exception handling models focus on multiple concurrently signaled exceptions and on how to structure a program so that the correct exception handler is invoked in each process. These models resolve multiple concurrent exceptions into a single exception that represents all the signaled exceptions, and they invoke the correct handler by using a transaction-like program structure with synchronized entry and exit points. However, these models are not easily applied to a wide range of applications, such as monitoring, reconfiguration, and workflow, because of four limitations. First, concurrently signaled exceptions are assumed to be related; otherwise, a single global exception cannot represent them all. Second, a synchronized program structure may not suit every program. Third, existing models do not detect global exception conditions; the program must detect them itself. Finally, there is little separation between local and global exception handling, which makes it difficult to update recovery actions without changing the program.
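The resolution step described above can be illustrated with a minimal sketch. It assumes one common approach, in which exceptions are arranged in a class hierarchy and concurrent exceptions resolve to their most specific shared ancestor; the abstract does not say which resolution scheme the existing models use, and every class and function name here (`GlobalError`, `StorageError`, `resolve`, and so on) is hypothetical.

```python
# Hypothetical exception hierarchy, invented for illustration only.
class GlobalError(Exception): ...
class StorageError(GlobalError): ...
class DiskFull(StorageError): ...
class DiskOffline(StorageError): ...
class NetworkError(GlobalError): ...


def resolve(signaled):
    """Return the most specific class that covers every signaled exception class.

    Walks the method resolution order of the first class, from most to least
    specific, and returns the first ancestor shared by all the others.
    """
    if not signaled:
        raise ValueError("no exceptions to resolve")
    first, *rest = signaled
    for candidate in first.__mro__:
        if all(issubclass(other, candidate) for other in rest):
            return candidate
    # Not reachable in practice: every class shares `object` as a base.
    raise AssertionError("no common base class found")


# Related exceptions resolve to an informative common ancestor.
related = resolve([DiskFull, DiskOffline])      # StorageError
# Unrelated exceptions collapse to a generic root that says little.
unrelated = resolve([DiskFull, NetworkError])   # GlobalError
```

Under this assumption, the second call shows the first limitation in miniature: when the concurrent exceptions are unrelated, the resolved exception is so general that it gives the handlers little to act on.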
The guardian model for exception handling addresses all four limitations and thereby allows general distributed exception handling. The guardian is a global exception handler. It uses the concept of a context to define an execution or recovery stage of a program, together with program-defined rules that determine which exception handler to invoke in each of the program's processes. Incorporating the guardian model into a program removes the need for the program to detect global exception conditions. The rules separate local exception handling from global exception handling, and can determine causality and priority among multiple concurrently signaled exceptions. The use of contexts removes the requirement of a transaction-like program structure.
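To make the roles of contexts and rules concrete, here is a minimal single-process sketch. It is not the thesis's interface: the `Guardian` class, its methods `enter_context`, `leave_context`, and `signal`, the rule signature, and the exception and context names are all assumptions made for illustration, and real message passing between processes is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical rule shape: given a signaled global exception and one process's
# current context, return the name of the exception to raise in that process,
# or None if the rule does not apply.
Rule = Callable[[str, str], Optional[str]]


@dataclass
class Guardian:
    rules: List[Rule] = field(default_factory=list)
    # Maps each process name to its stack of contexts (innermost last).
    contexts: Dict[str, List[str]] = field(default_factory=dict)

    def enter_context(self, process: str, context: str) -> None:
        self.contexts.setdefault(process, []).append(context)

    def leave_context(self, process: str) -> None:
        self.contexts[process].pop()

    def signal(self, exception: str) -> Dict[str, str]:
        """Decide, for each process, which exception its handler should see."""
        decisions: Dict[str, str] = {}
        for process, stack in self.contexts.items():
            current = stack[-1] if stack else "none"
            for rule in self.rules:
                target = rule(exception, current)
                if target is not None:
                    decisions[process] = target
                    break  # first matching rule wins
        return decisions


def primary_failed_rule(exception: str, context: str) -> Optional[str]:
    # Hypothetical rule: the backup takes over, everyone else pauses.
    if exception != "PrimaryFailed":
        return None
    return "BecomePrimary" if context == "backup" else "PauseWork"


guardian = Guardian(rules=[primary_failed_rule])
guardian.enter_context("p1", "backup")
guardian.enter_context("p2", "worker")
decisions = guardian.signal("PrimaryFailed")
# decisions == {"p1": "BecomePrimary", "p2": "PauseWork"}
```

In this sketch the recovery policy lives entirely in the rule, so changing which process takes over means editing `primary_failed_rule` rather than the processes' local handlers, and the processes need no synchronized entry or exit points because each is addressed through whatever context it currently occupies.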
Index Terms
- The guardian model for exception handling in distributed systems