research-article

Skewed redundancy

Authors:

Gordon B. Bell,

Mikko H. Lipasti

PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques

Pages 62 - 71

https://doi.org/10.1145/1454115.1454126

Published: 25 October 2008

Abstract

Technology scaling in integrated circuits has consistently provided dramatic performance improvements in modern microprocessors. However, increasing device counts and decreasing on-chip voltage levels have made transient errors a first-order design constraint that can no longer be ignored. Several proposals provide fault detection and tolerance by redundantly executing a program on an additional hardware thread or core. While such techniques can provide high fault coverage, at best they match the performance of the original execution, and at worst they incur a slowdown due to error checking, contention for shared resources, and synchronization overheads. This work pursues the same goal of detecting transient errors by redundantly executing a program on an additional processor core; however, it speeds up (rather than slows down) program execution compared to the unprotected baseline. We observe that a small number of instructions are detrimental to overall performance, and that selectively skipping them enables one core to advance far ahead of the other and obtain the benefits of prefetching and a large instruction window. We highlight the modest incremental hardware required to support skewed redundancy and demonstrate speedups of 6% on a collection of integer benchmarks and 54% on floating-point benchmarks, while still providing 100% error detection coverage within our sphere of replication. Additionally, we show that a third core can further improve performance while adding error recovery capabilities.
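The general idea the abstract builds on, detecting a transient error by comparing two redundant executions, can be illustrated with a toy model. This is a minimal sketch and not the paper's mechanism: the names `run_program`, `detect`, and `vote` are hypothetical, a fault is modeled as a single flipped bit, the skipping of instructions by the leading core is not modeled at all, and majority voting across three copies is a generic recovery scheme assumed here only for illustration.

```python
from collections import Counter


def run_program(inputs, flip_bit_at=None):
    """Toy 'core': returns the trace of a running sum of squares.

    If flip_bit_at is given, one bit of the accumulator is flipped at that
    step to simulate a transient fault (hypothetical fault model).
    """
    trace, acc = [], 0
    for i, x in enumerate(inputs):
        acc = (acc + x * x) & 0xFFFFFFFF
        if i == flip_bit_at:
            acc ^= 1 << 7  # simulated single-bit upset
        trace.append(acc)
    return trace


def detect(trace_a, trace_b):
    """Return the index of the first mismatching result, or None if equal."""
    for i, (a, b) in enumerate(zip(trace_a, trace_b)):
        if a != b:
            return i
    return None


def vote(traces):
    """Majority vote per step across three redundant traces (assumed scheme)."""
    voted = []
    for values in zip(*traces):
        value, count = Counter(values).most_common(1)[0]
        if count < 2:
            raise RuntimeError("no majority; cannot recover")
        voted.append(value)
    return voted
```

With two copies, `detect` can only report that the executions diverged; with a third copy, `vote` can also pick the value on which two copies agree, which is one simple way to see why an additional core makes recovery possible.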

Cited By

R. Hyman Jr., K. Bhattacharya, and N. Ranganathan. Redundancy Mining for Soft Error Detection in Multicore Processors. IEEE Transactions on Computers, 60(8):1114--1125, 2011. Online publication date: 1-Aug-2011.
https://dl.acm.org/doi/10.1109/TC.2010.168
R. Hyman, K. Bhattacharya, and N. Ranganathan. A strategy for soft error reduction in multi core designs. In 2009 IEEE International Symposium on Circuits and Systems, pages 2217--2220, 2009. Online publication date: May-2009.
https://doi.org/10.1109/ISCAS.2009.5118238

Index Terms

Skewed redundancy
1. Hardware
  1. Hardware test
  2. Robustness

Published In

PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques

October 2008

328 pages

ISBN:9781605582825

DOI:10.1145/1454115

General Chair: Andreas Moshovos (University of Toronto, Canada)

Program Chairs: David Tarditi (Microsoft, USA), Kunle Olukotun (Stanford University, USA)

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

PACT '08: International Conference on Parallel Architectures and Compilation Techniques

October 25 - 29, 2008

Toronto, Ontario, Canada

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%
