Prototyping a fault-tolerant multiprocessor SoC with run-time fault recovery

Published: 24 July 2006
DOI: 10.1145/1146909.1146926

    Abstract

    Modern integrated circuits (ICs) are becoming increasingly complex. This complexity makes it difficult to design, manufacture, and integrate high-performance ICs. The advent of multiprocessor Systems-on-Chip (SoCs) makes it even more challenging for programmers to utilize the full potential of the on-chip computation resources. Meanwhile, the complexity of chip design creates new reliability challenges. As a result, chip designers and users cannot fully exploit the tremendous silicon resources on the chip. This research proposes a prototype composed of a fault-tolerant multiprocessor SoC and a coupled single program, multiple data (SPMD) programming framework. We use a SystemC-based modeling and simulation environment to design and analyze this prototype. Our analysis shows that this prototype serves as a reliable computing platform constructed from potentially unreliable chip resources, thereby protecting previous investments in hardware and software design. Moreover, the promising application-driven simulation results shed light on the potential of a scalable and reliable multiprocessing platform for a wide range of mission-critical applications.
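    The abstract does not spell out how the SPMD framework recovers from faults at run time. As a rough illustration of the general idea only, and not the authors' design, the following Python sketch runs one kernel over per-core data slices and, when a core is flagged as faulty, re-executes its slice on a healthy core before the final reduction. The fault model (a set of faulty core IDs, with a faulty core returning `None`), the round-robin choice of backup core, and every function name (`spmd_kernel`, `partition`, `execute_on`, `run_spmd`) are hypothetical assumptions.

```python
"""Hypothetical sketch: SPMD execution with run-time remapping of work off faulty cores.

Illustrative only; this is not the fault-recovery mechanism described in the DAC '06 paper.
"""
from typing import FrozenSet, List, Optional, Sequence, Tuple


def spmd_kernel(chunk: Sequence[int]) -> int:
    """The single program every core runs on its own data slice (here: a sum of squares)."""
    return sum(x * x for x in chunk)


def partition(data: Sequence[int], n_parts: int) -> List[Sequence[int]]:
    """Split data into n_parts contiguous slices whose sizes differ by at most one."""
    q, r = divmod(len(data), n_parts)
    slices, start = [], 0
    for i in range(n_parts):
        end = start + q + (1 if i < r else 0)
        slices.append(data[start:end])
        start = end
    return slices


def execute_on(core: int, chunk: Sequence[int], faulty: FrozenSet[int]) -> Optional[int]:
    """Hypothetical fault model: a faulty core yields no result, which the runtime sees as None."""
    if core in faulty:
        return None
    return spmd_kernel(chunk)


def run_spmd(data: Sequence[int], n_cores: int,
             faulty: FrozenSet[int] = frozenset()) -> Tuple[int, List[Tuple[int, int]]]:
    """Run the kernel on every core's slice; re-run slices from faulty cores on healthy ones.

    Returns the reduced result and a list of (failed_core, backup_core) remappings.
    """
    chunks = partition(data, n_cores)
    healthy = [c for c in range(n_cores) if c not in faulty]
    if not healthy:
        raise RuntimeError("no healthy cores available for recovery")
    total, remapped = 0, []
    for core, chunk in enumerate(chunks):
        result = execute_on(core, chunk, faulty)
        if result is None:
            # Run-time recovery: pick a healthy backup core round-robin and re-execute the slice.
            backup = healthy[core % len(healthy)]
            result = execute_on(backup, chunk, faulty)
            remapped.append((core, backup))
        total += result  # reduction step combining the per-core partial results
    return total, remapped


if __name__ == "__main__":
    total, remapped = run_spmd(list(range(10)), n_cores=4, faulty=frozenset({1, 3}))
    print(total, remapped)  # 285 [(1, 2), (3, 2)]
```

    Round-robin backup selection keeps the sketch short; a practical runtime would more likely pick the least-loaded healthy core and would rely on an actual fault-detection mechanism rather than the assumed `None` result.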

      Published In

      DAC '06: Proceedings of the 43rd annual Design Automation Conference
      July 2006
      1166 pages
      ISBN: 1595933816
      DOI: 10.1145/1146909
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. fault-tolerance
      2. multiprocessor system
      3. network-on-chip
      4. retargetable simulation
      5. run-time verification
      6. system-on-chip

      Qualifiers

      • Article

      Conference

      DAC06: The 43rd Annual Design Automation Conference 2006
      July 24-28, 2006
      San Francisco, CA, USA

      Acceptance Rates

      Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

      Bibliometrics & Citations

      Article Metrics

      • Downloads (last 12 months): 1
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 28 Jul 2024

      Cited By

      • (2018) A Hierarchical and Distributed Fault Tolerant Proposal for NoC-Based MPSoCs. IEEE Transactions on Emerging Topics in Computing 6(4):524-537. DOI: 10.1109/TETC.2016.2593640. Online publication date: 1-Oct-2018.
      • (2016) Distributed Sensor Network-on-Chip for Performance Optimization of Soft-Error-Tolerant Multiprocessor System-on-Chip. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 24(4):1546-1559. DOI: 10.1109/TVLSI.2015.2452910. Online publication date: Apr-2016.
      • (2016) A layered approach for fault tolerant NoC-based MPSoCs — Special session: Dependable MPSoCs. 2016 17th Latin-American Test Symposium (LATS), pp. 189-194. DOI: 10.1109/LATW.2016.7483367. Online publication date: Apr-2016.
      • (2014) On-chip sensor networks for soft-error tolerant real-time multiprocessor systems-on-chip. ACM Journal on Emerging Technologies in Computing Systems 10(2):1-20. DOI: 10.1145/2564928. Online publication date: 6-Mar-2014.
      • (2014) Runtime fault recovery protocol for NoC-based MPSoCs. Fifteenth International Symposium on Quality Electronic Design, pp. 132-139. DOI: 10.1109/ISQED.2014.6783316. Online publication date: Mar-2014.
      • (2013) Framework for simulation of heterogeneous MpSoC for design space exploration. VLSI Design 2013:11-11. DOI: 10.1155/2013/936181. Online publication date: 1-Jan-2013.
      • (2012) An efficient soft error protection scheme for MPSoC and FPGA-based verification. Anti-counterfeiting, Security, and Identification, pp. 1-5. DOI: 10.1109/ICASID.2012.6325306. Online publication date: Aug-2012.
      • (2011) A Hardware-Software Collaborated Method for Soft-Error Tolerant MPSoC. Proceedings of the 2011 IEEE Computer Society Annual Symposium on VLSI, pp. 260-265. DOI: 10.1109/ISVLSI.2011.48. Online publication date: 4-Jul-2011.
      • (2011) Matrix control-flow algorithm-based fault tolerance. Proceedings of the 2011 IEEE 17th International On-Line Testing Symposium, pp. 37-42. DOI: 10.1109/IOLTS.2011.5993808. Online publication date: 13-Jul-2011.
      • (2010) Compiler directed network-on-chip reliability enhancement for chip multiprocessors. ACM SIGPLAN Notices 45(4):85-94. DOI: 10.1145/1755951.1755902. Online publication date: 13-Apr-2010.
