DOI: 10.5555/1870926.1871179
research-article

Cross-layer resilience challenges: metrics and optimization

Published: 08 March 2010
Abstract

    With increasing sources of disturbance in the underlying hardware, a key challenge in the design of robust systems is to meet user expectations at the required cost. Cross-layer resilience techniques, implemented across multiple layers of the system stack and designed to work together, can help system designers build effective robust systems at the desired cost point. This paper brings to the forefront two major cross-layer resilience challenges:
    1. Quantification and validation of how effectively a cross-layer resilience approach to robust system design overcomes hardware reliability challenges.
    2. Global optimization of a robust system design using cross-layer resilience techniques.

    Published In

    DATE '10: Proceedings of the Conference on Design, Automation and Test in Europe
    March 2010
    1868 pages
    ISBN: 9783981080162

    Sponsors

    • EDAA: European Design Automation Association
    • ECSI
    • EDAC: Electronic Design Automation Consortium
    • SIGDA: ACM Special Interest Group on Design Automation
    • The IEEE Computer Society TTTC
    • The IEEE Computer Society DATC
    • The Russian Academy of Sciences

    Publisher

    European Design and Automation Association

    Leuven, Belgium

    Conference

    DATE '10
    Sponsor:
    • EDAA
    • EDAC
    • SIGDA
    • The Russian Academy of Sciences
    DATE '10: Design, Automation and Test in Europe
    March 8 - 12, 2010
    Dresden, Germany

    Acceptance Rates

    Overall Acceptance Rate 518 of 1,794 submissions, 29%

    Cited By

    • (2017) Implications of accelerated self-healing as a key design knob for cross-layer resilience. Integration, the VLSI Journal, 56:C (167-180). DOI: 10.1016/j.vlsi.2016.10.008. Online publication date: 1-Jan-2017
    • (2016) Toward Smart Embedded Systems. ACM Transactions on Embedded Computing Systems, 15:2 (1-27). DOI: 10.1145/2872936. Online publication date: 17-Feb-2016
    • (2015) Understanding soft errors in uncore components. Proceedings of the 52nd Annual Design Automation Conference (1-6). DOI: 10.1145/2744769.2744923. Online publication date: 7-Jun-2015
    • (2014) Connecting different worlds. Proceedings of the conference on Design, Automation & Test in Europe (1-8). DOI: 10.5555/2616606.2616982. Online publication date: 24-Mar-2014
    • (2013) Improving the fault resilience of an H.264 decoder using static analysis methods. ACM Transactions on Embedded Computing Systems, 13:1s (1-27). DOI: 10.1145/2536747.2536753. Online publication date: 6-Dec-2013
    • (2013) Quantitative evaluation of soft error injection techniques for robust system design. Proceedings of the 50th Annual Design Automation Conference (1-10). DOI: 10.1145/2463209.2488859. Online publication date: 29-May-2013
    • (2011) Design and architectures for dependable embedded systems. Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis (69-78). DOI: 10.1145/2039370.2039384. Online publication date: 9-Oct-2011
    • (2010) Cross-layer error resilience for robust systems. Proceedings of the International Conference on Computer-Aided Design (177-180). DOI: 10.5555/2133429.2133465. Online publication date: 7-Nov-2010