Reunion: Complexity-effective multicore redundancy

JC Smolens, BT Gold, B Falsafi, JC Hoe
2006 39th Annual IEEE/ACM International Symposium on …, 2006 - ieeexplore.ieee.org
To protect processor logic from soft errors, multicore redundant architectures execute two copies of a program on separate cores of a chip multiprocessor (CMP). Maintaining identical instruction streams is challenging because redundant cores operate independently, yet must still receive the same inputs (e.g., load values and shared-memory invalidations). Past proposals strictly replicate load values across two cores, requiring significant changes to the highly optimized core. We make the key observation that, in the common case, both cores load identical values without special hardware. When the cores do receive different load values (e.g., due to a data race), the same mechanisms employed for soft error detection and recovery can correct the difference. This observation permits designs that relax input replication, while still providing correct redundant execution. In this paper, we present Reunion, an execution model that provides relaxed input replication and preserves the existing memory interface, coherence protocols, and consistency models. We evaluate a CMP-based implementation of the Reunion execution model with full-system, cycle-accurate simulation. We show that the performance overhead of relaxed input replication is only 5% and 6% for commercial and scientific workloads, respectively.
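The idea of relaxed input replication can be illustrated with a toy model. The sketch below is not the paper's hardware design: `Core`, `redundant_step`, and the `agreed_value` parameter are hypothetical names, the "program" is a single accumulate instruction, and recovery is assumed to be a rollback to a checkpoint followed by re-execution with one value supplied to both cores. It only shows that a single output comparison can catch both a soft error and a load-value mismatch caused by a data race.

```python
from dataclasses import dataclass


@dataclass
class Core:
    """Toy core (hypothetical): architectural state is one accumulator."""
    acc: int = 0

    def run(self, loaded_value: int) -> int:
        # "Execute" one instruction: add the loaded value to the accumulator.
        self.acc += loaded_value
        return self.acc


def redundant_step(core_a: Core, core_b: Core,
                   value_a: int, value_b: int, agreed_value: int):
    """Run one step redundantly with independently obtained load values.

    value_a and value_b are what each core loaded on its own; they are
    usually equal. agreed_value is the single value both cores receive
    if recovery is needed (an assumption of this sketch).
    Returns (result, recovered).
    """
    checkpoint = (core_a.acc, core_b.acc)  # safe state before the step
    out_a = core_a.run(value_a)
    out_b = core_b.run(value_b)
    if out_a == out_b:
        # Common case: both cores saw the same input and agree.
        return out_a, False
    # Mismatch (soft error or differing load values): roll back both
    # cores and re-execute with one value given to both.
    core_a.acc, core_b.acc = checkpoint
    out_a = core_a.run(agreed_value)
    out_b = core_b.run(agreed_value)
    return out_a, True


if __name__ == "__main__":
    a, b = Core(), Core()
    print(redundant_step(a, b, 5, 5, 5))  # both cores load the same value
    print(redundant_step(a, b, 7, 9, 9))  # racing store: cores see 7 and 9
```

In this model the common case costs only the comparison, and the rollback path runs only when the two cores disagree, which is the intuition behind accepting unreplicated loads.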