DOI: 10.1145/1785481.1785516
poster

Energy-efficient redundant execution for chip multiprocessors

Published: 16 May 2010

Abstract

Relentless CMOS scaling, coupled with lower design tolerances, is making ICs increasingly susceptible to wear-out-related permanent faults and to transient faults, necessitating on-chip fault tolerance in future chip multiprocessors (CMPs). In this paper, we describe a power-efficient architecture for redundant execution on CMPs which, when coupled with our per-core dynamic voltage and frequency scaling (DVFS) algorithm, significantly reduces the energy overhead of redundant execution without sacrificing performance. Our evaluation shows that this architecture has a performance overhead of only 0.3% and consumes only 1.48 times the energy of a non-fault-tolerant baseline.
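
To put the two reported figures side by side, the following minimal sketch derives a few relative metrics from them. It assumes that "performance overhead" means a 0.3% increase in execution time and that the 1.48 factor is total energy for the same workload. The function name, the dictionary keys, and the derived energy-delay product and average-power ratios are illustrative assumptions; the abstract does not report them.

```python
def relative_overheads(perf_overhead=0.003, energy_ratio=1.48):
    """Derive relative metrics versus a non-fault-tolerant baseline.

    perf_overhead: fractional increase in execution time (0.003 = 0.3%),
                   assumed interpretation of the abstract's figure.
    energy_ratio:  total energy relative to the baseline (1.48 in the abstract).
    """
    # Execution time relative to the baseline (baseline = 1.0).
    time_ratio = 1.0 + perf_overhead
    return {
        "time_ratio": time_ratio,
        "energy_ratio": energy_ratio,
        # Extra energy spent on redundancy, in percent of baseline energy.
        "extra_energy_pct": (energy_ratio - 1.0) * 100.0,
        # Energy-delay product relative to baseline (derived, not reported).
        "edp_ratio": energy_ratio * time_ratio,
        # Average power relative to baseline: energy divided by time.
        "avg_power_ratio": energy_ratio / time_ratio,
    }


if __name__ == "__main__":
    for name, value in relative_overheads().items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions the energy-delay product is dominated by the energy term, since the execution-time ratio is so close to 1.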




    Published In

    GLSVLSI '10: Proceedings of the 20th symposium on Great lakes symposium on VLSI
    May 2010
    502 pages
    ISBN:9781450300124
    DOI:10.1145/1785481
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    In-Cooperation

    • IEEE CEDA
    • IEEE CASS

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. microarchitecture
    2. permanent faults
    3. redundant execution
    4. transient faults

    Qualifiers

    • Poster

    Conference

    GLSVLSI '10
    GLSVLSI '10: Great Lakes Symposium on VLSI 2010
    May 16 - 18, 2010
    Providence, Rhode Island, USA

    Acceptance Rates

    Overall Acceptance Rate 312 of 1,156 submissions, 27%

    Upcoming Conference

    GLSVLSI '25
    Great Lakes Symposium on VLSI 2025
    June 30 - July 2, 2025
    New Orleans, LA, USA

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 1
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 03 Feb 2025

    Citations

    Cited By

    • (2019) 32-Bit One Instruction Core: A Low-Cost, Reliable, and Fault-Tolerant Core for Multicore Systems. Journal of Testing and Evaluation 47:6 (20180492). DOI: 10.1520/JTE20180492. Online publication date: 31-Jan-2019
    • (2019) A Survey on Multithreading Alternatives for Soft Error Fault Tolerance. ACM Computing Surveys 52:2 (1-38). DOI: 10.1145/3302255. Online publication date: 27-Mar-2019
    • (2019) Soft-error reliable architecture for future microprocessors. IET Computers & Digital Techniques 13:3 (233-242). DOI: 10.1049/iet-cdt.2018.5015. Online publication date: 5-Mar-2019
    • (2011) Adaptive execution assistance for multiplexed fault-tolerant chip multiprocessors. Proceedings of the 2011 IEEE 29th International Conference on Computer Design (419-426). DOI: 10.1109/ICCD.2011.6081432. Online publication date: 9-Oct-2011
    • (2010) Energy-efficient fault tolerance in chip multiprocessors using Critical Value Forwarding. 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN) (121-130). DOI: 10.1109/DSN.2010.5544918. Online publication date: Jun-2010
