DOI: 10.5555/1870926.1871179
research-article

Cross-layer resilience challenges: metrics and optimization

Published: 08 March 2010

Abstract

With increasing sources of disturbances in the underlying hardware, a key challenge in the design of robust systems is to meet user expectations at the required cost. Cross-layer resilience techniques, implemented across multiple layers of the system stack and designed to work together, can help system designers build effective robust systems at the desired cost point. This paper brings to the forefront two major cross-layer resilience challenges:
1. Quantification and validation of the effectiveness of a cross-layer resilience approach to robust system design in overcoming hardware reliability challenges.
2. Global optimization of a robust system design using cross-layer resilience techniques.

Published In

DATE '10: Proceedings of the Conference on Design, Automation and Test in Europe
March 2010
1868 pages
ISBN:9783981080162

Sponsors

  • EDAA: European Design Automation Association
  • ECSI
  • EDAC: Electronic Design Automation Consortium
  • SIGDA: ACM Special Interest Group on Design Automation
  • The IEEE Computer Society TTTC
  • The IEEE Computer Society DATC
  • The Russian Academy of Sciences

Publisher

European Design and Automation Association

Leuven, Belgium

Qualifiers

  • Research-article

Conference

DATE '10
Sponsor:
  • EDAA
  • EDAC
  • SIGDA
  • The Russian Academy of Sciences
DATE '10: Design, Automation and Test in Europe
March 8 - 12, 2010
Dresden, Germany

Acceptance Rates

Overall Acceptance Rate 518 of 1,794 submissions, 29%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 2
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 22 Dec 2024

Citations

Cited By

  • (2017) Implications of accelerated self-healing as a key design knob for cross-layer resilience. Integration, the VLSI Journal 56:C (167-180). DOI: 10.1016/j.vlsi.2016.10.008. Online publication date: 1-Jan-2017
  • (2016) Toward Smart Embedded Systems. ACM Transactions on Embedded Computing Systems 15:2 (1-27). DOI: 10.1145/2872936. Online publication date: 17-Feb-2016
  • (2015) Understanding soft errors in uncore components. Proceedings of the 52nd Annual Design Automation Conference (1-6). DOI: 10.1145/2744769.2744923. Online publication date: 7-Jun-2015
  • (2014) Connecting different worlds. Proceedings of the conference on Design, Automation & Test in Europe (1-8). DOI: 10.5555/2616606.2616982. Online publication date: 24-Mar-2014
  • (2013) Improving the fault resilience of an H.264 decoder using static analysis methods. ACM Transactions on Embedded Computing Systems 13:1s (1-27). DOI: 10.1145/2536747.2536753. Online publication date: 6-Dec-2013
  • (2013) Quantitative evaluation of soft error injection techniques for robust system design. Proceedings of the 50th Annual Design Automation Conference (1-10). DOI: 10.1145/2463209.2488859. Online publication date: 29-May-2013
  • (2011) Design and architectures for dependable embedded systems. Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis (69-78). DOI: 10.1145/2039370.2039384. Online publication date: 9-Oct-2011
  • (2010) Cross-layer error resilience for robust systems. Proceedings of the International Conference on Computer-Aided Design (177-180). DOI: 10.5555/2133429.2133465. Online publication date: 7-Nov-2010
