DOI: 10.1145/1454115.1454126
research-article

Skewed redundancy

Published: 25 October 2008

    Abstract

    Technology scaling in integrated circuits has consistently provided dramatic performance improvements in modern microprocessors. However, increasing device counts and decreasing on-chip voltage levels have made transient errors a first-order design constraint that can no longer be ignored. Several proposals have provided fault detection and tolerance by redundantly executing a program on an additional hardware thread or core. While such techniques can provide high fault coverage, they at best match the performance of the original execution and at worst incur a slowdown due to error checking, contention for shared resources, and synchronization overheads. This work achieves a similar goal of detecting transient errors by redundantly executing a program on an additional processor core; however, it speeds up (rather than slows down) program execution compared to the unprotected baseline case. It observes that a small number of instructions are detrimental to overall performance, and that selectively skipping them enables one core to advance far ahead of the other to obtain prefetching and large instruction window benefits. We highlight the modest incremental hardware required to support skewed redundancy and demonstrate speedups of 6% and 54% for a collection of integer and floating-point benchmarks, respectively, while still providing 100% error detection coverage within our sphere of replication. Additionally, we show that a third core can further improve performance while adding error recovery capabilities.
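
    The detection and recovery idea in the abstract can be illustrated with a toy software model. The sketch below is only an illustration under stated assumptions, not the paper's hardware mechanism: it does not model instruction skipping or the skew between cores, and the names `run`, `detect`, and `recover`, the single-bit-flip fault model, and the majority-vote recovery are all hypothetical choices made for this example. It shows why two redundant copies suffice to detect a mismatch, while a third copy makes it possible to choose a correct result.

```python
from collections import Counter


def run(program, x, flip_bit=None):
    """Execute `program` on x. If flip_bit is given, flip that bit of the
    integer result to model a transient error (assumed fault model)."""
    result = program(x)
    if flip_bit is not None:
        result ^= 1 << flip_bit
    return result


def detect(results):
    """Redundant copies disagree -> an error occurred in at least one copy."""
    return len(set(results)) > 1


def recover(results):
    """Majority vote over the copies (an assumption for this sketch).
    Returns the majority value, or None if no strict majority exists."""
    value, count = Counter(results).most_common(1)[0]
    return value if count * 2 > len(results) else None


def program(x):
    # Stand-in for the protected computation.
    return x * x + 1


clean = run(program, 7)               # fault-free copy
faulty = run(program, 7, flip_bit=3)  # copy hit by a single bit flip

print(detect([clean, faulty]))          # two copies: mismatch is detected
print(recover([clean, faulty, clean]))  # three copies: majority value wins
```

    With only two copies the mismatch reveals that an error happened but not which copy is wrong; the third copy is what lets the vote pick a value.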

    Cited By

    • (2011) Redundancy Mining for Soft Error Detection in Multicore Processors. IEEE Transactions on Computers 60:8 (1114-1125). DOI: 10.1109/TC.2010.168. Online publication date: 1-Aug-2011
    • (2009) A strategy for soft error reduction in multi core designs. 2009 IEEE International Symposium on Circuits and Systems (2217-2220). DOI: 10.1109/ISCAS.2009.5118238. Online publication date: May-2009

        Published In

        PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques
        October 2008
        328 pages
        ISBN:9781605582825
        DOI:10.1145/1454115

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. distributed processing
        2. error tolerance
        3. memory-level parallelism

        Qualifiers

        • Research-article

        Conference

        PACT '08

        Acceptance Rates

        Overall Acceptance Rate 121 of 471 submissions, 26%
