2013 25th Euromicro Conference on Real-Time Systems, 2013
Real-time computing and communication systems are often required to operate with prespecified levels of reliability in harsh environments, which may lead to the exposure of the system to random errors and random bursts of errors. The classical fault-tolerant schedulability analysis in such cases assumes a pseudo-periodic arrival of errors, and does not effectively capture any underlying randomness or burst characteristics. More modern approaches employ much richer stochastic error models to capture these behaviors, but this is at the expense of greatly increased complexity. In this paper, we develop a quantile-based approach to probabilistic schedulability analysis in a bid to improve efficiency whilst still retaining a rich stochastic error model capturing random errors and random bursts of errors. Our principal contribution is the derivation of a simple closed-form expression that tightly bounds the number of errors that a system must be able to tolerate at any time subsequent to its critical instant in order to achieve a specified level of reliability. We apply this technique to develop an efficient 'one-shot' schedulability analysis for a simple fault-tolerant EDF scheduler. The paper concludes that the proposed method is capable of giving efficient probabilistic scheduling guarantees, and may easily be coupled with more representative higher-level job failure models, giving rise to efficient analysis procedures for safety-critical fault-tolerant real-time systems.
This paper takes a distributed fault-tolerant architecture made of N real-time nodes and introduces schedulability tests for real-time systems running on this architecture. This architecture has been specifically designed to fulfill real-time and fault-tolerance requirements. Replicas of the real-time system are distributed on the different nodes and interchange partial results for voting through a CAN-based communication link. The schedulability analysis presented is based on computing worst-case response times of the tasks of the system. This includes the time for interchanging results and voting. Finally, it is shown that with this architecture it is possible to tolerate not only value faults but also timing faults, so that hard deadlines are met. Keywords: schedulability analysis, fault-tolerance, N-version programming, real-time systems
Dynamic Reconfiguration (DR) has been generating substantial interest since it allows improving efficiency in the use of system resources, which can impact on the maximum functionality that the system can execute, on the level of resources needed for a given functionality, on the number of instantaneous users that the system can support, or even on the capacity to adapt to changes in the environment or in the system operational architecture such as those caused by hazardous events. However, DR also requires extra mechanisms to manage the reconfiguration itself, which can increase system complexity and reduce a priori knowledge, increasing the potential for lower reliability. Therefore, DR has not been considered in safety-critical systems. In this paper we argue that adequately preventing specific error situations at the lower levels of the architecture simplifies the upper-level system-wide Fault Tolerance mechanisms, and may compensate for the extra complexity and lowe...
Fault-tolerance and real-time computing have traditionally been considered different domains. However, missing deadlines is a fault in a real-time system. In order to build fault-free real-time systems, a real-time fault-tolerant architecture based on a Redundancy eXecutive (RX) is presented. The timing properties of such an executive are predictable. On this basis, a technique, based on fixed priority schedulability analysis, for predicting the temporal behaviour of a system is provided. Moreover, this analysis can be applied to hard and soft real-time systems that present bounded transient overloads. In these systems the number of missed deadlines over a given period of time is bounded. The architecture, together with a dual time-out scheme, masks both value errors and timing errors, thus providing a feasible mechanism for achieving fault-tolerance for both the functional aspects and the timing aspects. Its application leads to cost-effective systems because the resources do not have to be sized for the worst case and, moreover, the response times are sometimes better than in the non-fault-tolerant equivalent system.
Parallel and Distributed Processing Techniques and Applications, 1997
This paper takes a distributed fault-tolerant architecture made of N real-time nodes and introduces schedulability tests for real-time systems running on this architecture. This architecture has been specifically designed to fulfill real-time and fault-tolerance requirements. Replicas of the real-time system are distributed on the different nodes and interchange partial results for voting through a CAN-based
Although the Controller Area Network (CAN) protocol is increasingly used for real-time critical applications, its original specification does not provide a clock synchronization service. In this paper we introduce the architecture of a clock subsystem that provides any CAN system with a clock synchronized with high precision. The main advantage of our subsystem is that, unlike previous solutions, it provides this service without replacing the CAN circuitry or significantly changing the software of the nodes. For this reason, we consider our subsystem to be orthogonal to the rest of the network. Another advantage of our clock subsystem is the presence of specific fault tolerance mechanisms, which improve on those existing in previous solutions. As a result, our subsystem is able to tolerate its own faults without affecting the nodes of the system.
Proceedings of the Symposium on Reliable Distributed Systems, 2008
The main goal of a clock synchronization service is to keep a consistent perception of time among the nodes of the system. In this context, consistency means that at any instant, the values of all the clocks in the system do not differ by more than a given amount, which is called the precision. Moreover, clock synchronization is said to be
Proceedings of the IEEE International Conference on Electronics, Circuits, and Systems, 2002
There is a growing interest in using the controller area network (CAN) protocol for critical control applications. In many studies considering this possibility, it is assumed that CAN controllers, which are the circuits that implement most of the protocol specification, never present ...
There is a growing interest in using the controller area network (CAN) protocol for critical control applications. In many articles studying this possibility, it is assumed that CAN controllers, which are the circuits that implement most of the protocol specification, never present faults. In this paper we present the architecture of a fault-tolerant CAN controller subsystem that allows substantiation of this assumption. Our subsystem is made up of three standard CAN controllers and a specifically designed circuit, called redundancy manager (RM), which guarantees the coordinated operation of the introduced redundancy. This circuit has been designed to adapt to the specific characteristics of the CAN protocol and is fully compatible with other fault tolerance mechanisms that have been designed in the past for other parts of the system. After presenting the behaviour and architecture of the fault-tolerant CAN controller subsystem, we present a complete design for the RM. This design has been implemented and simulated using a VHDL tool. Results of this simulation are also shown.
Papers by Julian Proenza