In avionics, such as glide computers, No Faults Found (NFF) is a serious and extremely costly problem. Faults that occur rarely and in short bursts are the most difficult to detect and diagnose during testing. We are developing several on-chip techniques to cope with one particular category of NFFs, intermittent resistive faults (IRFs). The reuse of these on-chip embedded instruments for detecting such faults at the board level has been investigated, together with possible enhancements to the mixed-signal boundary-scan standard IEEE 1149.4. This paper explores how this can be accomplished.
2017 International Test Conference in Asia (ITC-Asia), 2017
In safety-critical systems, many-processor Systems-on-Chip are increasingly employed. An example is an imminent collision detection System-on-Chip for cars. Such a system requires zero downtime and very high reliability despite aging under harsh environmental conditions. We accomplished this goal by monitoring the health status of processor cores and other IPs via IJTAG-compatible embedded instruments and taking appropriate counteractions when required. This paper shows the design of the required IJTAG network and a number of new IJTAG-compatible embedded instruments, such as slack-delay, power-supply current (IDDT) and intermittent resistive fault monitors. In addition, we discuss their numbers and optimal locations in a processor core and provide a PDL description for one of our embedded instruments. For instance, in a four-processor implementation that requires only two processors for actual data processing, the lifetime can increase by a factor of roughly three.
The dependability of highly dependable systems relies on the reliability of their components and interconnections. Among the most challenging faults that threaten the reliability of interconnections in a system are intermittent resistive faults (IRFs). They may occur randomly in time, duration and amplitude in every interconnection. The interval between occurrences can vary from a few nanoseconds to months. As a result, evoking and detecting such faults is a major challenge. This paper tackles IRF detection at the chip level using a fully digital in situ IRF monitor and introduces a new algorithm for inserting IRF monitors in a design. The goal of this algorithm is to minimise the number of IRF monitors while providing high fault coverage for IRFs. The algorithm has been validated using software-based fault injection. The simulation results show that the proposed algorithm improves the IRF coverage at the chip level at the cost of a small area and power-consumption o...
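The abstract does not describe the insertion algorithm itself. As a rough illustration of the stated goal (few monitors, high IRF coverage), the problem can be framed as a set-cover instance and approached with a generic greedy heuristic. This is a minimal sketch under that assumption, not the authors' method; the function `place_monitors`, the site names and the fault names are all hypothetical.

```python
from typing import Dict, List, Set


def place_monitors(coverage: Dict[str, Set[str]], target: float = 1.0) -> List[str]:
    """Greedy set-cover sketch (hypothetical, not the paper's algorithm).

    coverage maps each candidate monitor site to the set of IRFs a monitor
    placed there would observe; target is the required fraction of all
    observable faults that must be covered.
    """
    all_faults: Set[str] = set().union(*coverage.values())
    needed = target * len(all_faults)
    covered: Set[str] = set()
    chosen: List[str] = []
    while len(covered) < needed:
        # Pick the site that adds the most uncovered faults;
        # iterating over sorted names keeps tie-breaking deterministic.
        best = max(sorted(coverage), key=lambda s: len(coverage[s] - covered))
        gain = coverage[best] - covered
        if not gain:
            break  # no remaining site improves coverage
        chosen.append(best)
        covered |= gain
    return chosen


# Hypothetical example: four candidate sites, five injected faults.
sites = {
    "n1": {"f1", "f2"},
    "n2": {"f2", "f3", "f4"},
    "n3": {"f4"},
    "n4": {"f1", "f5"},
}
print(place_monitors(sites))
```

In this toy instance two monitors are enough to observe all five faults. A greedy heuristic does not guarantee the minimum number of monitors in general, and a real flow would derive the coverage sets from fault-injection simulation rather than from a hand-written table.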
2018 IEEE 21st International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS), 2018
Interconnection reliability issues threaten the dependability of highly dependable systems. One of the most challenging interconnection-induced reliability threats is intermittent resistive faults (IRFs). They may occur randomly in time, duration and amplitude in every interconnection. The interval between occurrences can vary from a few nanoseconds to months. As a result, evoking and detecting such faults is a major challenge. This paper investigates IRF detection at the board level by introducing a new digital IRF monitor, which has been validated using hardware-based fault injection. Two widely used on-board transmission protocols, UART and SPI, serve as case studies. In addition, a fault management framework based on the IJTAG standard has been implemented to collect and characterize information from the monitors. The experimental results show that the proposed monitor is effective in detecting IRFs at the board level.
2018 IEEE Industrial Cyber-Physical Systems (ICPS), 2018
Safety-critical cyber-physical systems-on-chip, consisting of analog/mixed-signal front- and back-ends combined with massive digital many-processor cores, are increasingly being applied. The imminent collision detection chip for cars is one example; such a complex system requires zero downtime and very high dependability. We have accomplished this goal by monitoring the health status of processor cores and IPs on-line and taking counteractions, using IJTAG-compatible embedded instruments and appropriate embedded software. An IJTAG-compatible Iddt monitor has been designed, together with a slack-delay embedded instrument for detecting timing issues and a monitor for detecting intermittent resistive faults in interconnections. By replacing degraded (non-healthy) cores on-chip, the lifetime of our mixed-signal cyber-physical systems-on-chip can be increased by a factor of around four.
The reliability of board-level data communications depends heavily on the reliability of the interconnections on a board. One of the most challenging interconnection reliability threats is intermittent resistive faults (IRFs). Detecting such faults is a major challenge, mainly because of their random behavior: they may occur randomly in time, duration and amplitude. The interval between occurrences can vary from a few nanoseconds to months. This paper investigates IRF detection at the board level by introducing a new digital in situ IRF monitor. Hardware-based fault injection has been used to validate the proposed IRF monitor. Two widely used on-board transmission protocols, namely the Universal Asynchronous Receiver Transmitter (UART) and the Serial Peripheral Interface bus (SPI), have been used as case studies. In addition, a fault management framework based on the IJTAG standard has been implemented to collect and characterize information from the monitors. The experimenta...
SRAM-based FPGAs suffer from soft errors caused by cosmic particles. This paper introduces a new switch box architecture to mitigate soft errors. In this architecture, the number of SRAM bits required for programming the switch boxes is reduced by means of switch reduction, with only a slight impact on the routing capability of the switch box. The architecture does not require any modification of the existing placement and routing algorithms. It was evaluated on several MCNC benchmarks using the VPR tool. The experimental results show that this architecture decreases the susceptibility of switch boxes to single event upsets by about 18% on average compared to traditional ones. It also decreases the probability of bridging and short faults in the switch boxes by about 32% on average.
Papers by Hassan Ebrahimi