Mechanisms for runtime fault tolerance in Multi-Processor Systems-on-Chip (MPSoCs) are mandatory to cope with transient and permanent faults. This issue is even more relevant in nanotechnologies due to process variability, aging effects, and susceptibility to upsets, among other factors. The literature presents isolated solutions to deal with faults in the MPSoC communication infrastructure. In this context, one gap to be filled is the integration of all layers, resulting in a solution that copes with NoC faults from the physical layer up to the application layer. The goal of this work is to present an integrated runtime approach to cope with NoC faults in MPSoCs. The original contribution is a set of hardware and software mechanisms that ensure both efficient and reliable communication in NoC-based MPSoCs. The proposal has an acceptable silicon area overhead and a small memory footprint. In the experiments, benchmarks (synthetic and real MPSoC applications) were simulated with thousands of random fault injections, and all of them executed correctly. Moreover, the average application execution time overhead is lower than 0.5%. This suggests the proposed fault-tolerant method could be used in applications with reliability and performance constraints.
Fifteenth International Symposium on Quality Electronic Design, 2014
The design of MPSoCs is a complex task. From the designer's point of view, a new feature inserted into the system (e.g., a mapping heuristic or a new function in the operating system) must be validated with a large set of MPSoC configurations. From the application developer's point of view, the performance of a set of applications running simultaneously on the MPSoC platform must also be evaluated for different MPSoC configurations. Therefore, for both designers and application developers, a framework enabling automatic MPSoC generation and simulation is mandatory for design space exploration. The goal of the present work is to present a parameterizable MPSoC, including distributed management, and a framework to generate and simulate several MPSoC configurations automatically. Results show that it is feasible to simulate large platforms, up to 400 processing elements, using a cycle-accurate SystemC description.
Fifteenth International Symposium on Quality Electronic Design, 2014
The design of reliable MPSoCs is mandatory to cope with faults during fabrication or product lifetime. For instance, permanent faults on the interconnect network can stall or crash applications even though the network has alternative fault-free paths to a given destination. This paper presents a novel fault-tolerant communication protocol that takes advantage of NoC parallelism to provide alternative paths between any source-target pair of processors, even in the presence of multiple faults. At the application layer, the method is seen as a typical MPI-like message-passing protocol. At the lower layers, the method consists of a software kernel layer that monitors the regularity of message exchanges between pairs of tasks. If a message is not delivered within a certain time, the software fires a path-finding mechanism implemented in hardware, which guarantees complete network reachability. The proposed approach determines new paths quickly, and the costs in extra silicon area and memory usage are small.
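As a rough illustration of the timeout-then-reroute idea in the abstract above, the following is a minimal software sketch only. It assumes a 2D mesh NoC and uses a plain breadth-first search as a stand-in for the paper's hardware path-finding mechanism, whose actual algorithm is not described here. The names `find_path`, `needs_new_path`, and `faulty_links` are hypothetical.

```python
from collections import deque


def find_path(width, height, src, dst, faulty_links):
    """Breadth-first search for a fault-free route on a width x height mesh.

    Assumption: faulty_links is a set of frozenset({a, b}) pairs of adjacent
    router coordinates whose connecting link is treated as broken.
    Returns the list of (x, y) hops from src to dst, or None if unreachable.
    """
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the parent pointers back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if not (0 <= nx < width and 0 <= ny < height):
                continue  # outside the mesh
            if nxt in parent or frozenset((node, nxt)) in faulty_links:
                continue  # already visited, or the link is faulty
            parent[nxt] = node
            queue.append(nxt)
    return None


def needs_new_path(sent_at, now, timeout):
    """Watchdog check: True when a message has been pending longer than timeout."""
    return (now - sent_at) > timeout


# Hypothetical usage: the direct link out of (0, 0) toward (1, 0) is broken.
faults = {frozenset(((0, 0), (1, 0)))}
if needs_new_path(sent_at=0, now=10, timeout=5):
    route = find_path(3, 3, (0, 0), (2, 0), faults)
```

A breadth-first search returns a shortest fault-free route whenever one exists, which mirrors the reachability property the abstract claims, but it says nothing about the cost or latency of the real hardware mechanism.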
Papers by Augusto Erichsen