DOI:10.1145/2380445.2380534
tutorial

Soft errors: the hardware-software interface

Published: 07 October 2012

Abstract

A recent report from the ITRS identifies soft errors as one of the most important reliability challenges for the coming decades. Soft errors are transient errors caused by several effects, e.g., voltage fluctuations, wire crosstalk, and cosmic particle strikes, and they manifest as a temporary switch of the logic value of a transistor. While it is not possible to prove or disprove that a particular failure was caused by a soft error, several fiscal disasters, e.g., the Sun server crashes in 2000 and the HP server crashes in 2005, have been attributed to soft errors. Industry has moved from ignoring soft errors to investing design effort in protection against them. For instance, in NVIDIA's recently announced Fermi GPUs, the L1 cache, L2 cache, and register files are ECC protected. Although the soft error rate is about once per year today, it is expected to reach alarming levels of once per day in about a decade or two. Researchers are busy finding cost-effective solutions to protect computing devices from soft errors.
This tutorial will attempt to cover the entire gamut of soft error protection techniques, but will focus particularly on soft error mitigation techniques at the hardware/software interface. Much time will be spent on microarchitectural, compiler, and hybrid compiler-microarchitectural techniques for soft error mitigation. This tutorial will be particularly useful for budding researchers who are fascinated by soft errors and want to explore them as a research direction. For such researchers, this tutorial will be a one-stop shop for learning about and analyzing seminal research work in soft error mitigation at several design layers. For developers who have been working on soft errors at different levels, it will give a picture of what can be done at other levels, so that they can provide complementary cross-layer protection. Finally, researchers and developers working on other aspects of system design can learn how soft errors are going to affect them.

Cited By

  • (2013) A Reliable, Safe, and Secure Run-Time Platform for Cyber Physical Systems. Proceedings of the 2013 IEEE 6th International Conference on Service-Oriented Computing and Applications, pp. 268-274. 10.1109/SOCA.2013.65. Online publication date: 16-Dec-2013

Published In

CODES+ISSS '12: Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
October 2012
596 pages
ISBN:9781450314268
DOI:10.1145/2380445
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. compiler
  2. microarchitectural techniques
  3. soft error mitigation

Qualifiers

  • Tutorial

Conference

ESWEEK'12: Eighth Embedded System Week
October 7 - 12, 2012
Tampere, Finland

Acceptance Rates

CODES+ISSS '12 Paper Acceptance Rate 48 of 163 submissions, 29%;
Overall Acceptance Rate 280 of 864 submissions, 32%
