Error Detector Placement for Soft Computing Applications

Published: 13 January 2016 Publication History

Abstract

The scaling of silicon devices has exacerbated the unreliability of modern computer systems, and power constraints have necessitated the involvement of software in hardware error detection. At the same time, emerging workloads in the form of soft computing applications (e.g., multimedia applications) can tolerate most hardware errors as long as the erroneous outputs do not deviate significantly from error-free outcomes. We term outcomes that deviate significantly from the error-free outcomes Egregious Data Corruptions (EDCs).
In this study, we propose a technique to place detectors that selectively detect EDC-causing errors in an application. We performed an initial study to formulate heuristics that identify EDC-causing data. Based on these heuristics, we developed an algorithm that uses static analysis to identify program locations for placing high-coverage detectors for EDCs. Our technique achieves an average EDC coverage of 82%, under performance overheads of 10%, while detecting 10% of the non-EDC and benign faults. We also evaluate the error resilience of these applications under 14 compiler optimizations.
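
To make the terminology concrete, the following is a minimal sketch of how fault-injection outcomes might be sorted into benign, non-EDC, and EDC categories, and how EDC coverage could then be computed. It is not the paper's implementation: the `Run` record, the scalar `deviation` measure, and the `threshold` value are hypothetical names and assumptions introduced only for illustration, and crashes and hangs are ignored.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    BENIGN = "benign"    # output identical to the error-free run
    NON_EDC = "non-edc"  # output differs, but within the tolerated deviation
    EDC = "edc"          # output deviates beyond the tolerated deviation


@dataclass
class Run:
    deviation: float  # assumed fidelity loss relative to the error-free output
    detected: bool    # whether a placed detector fired during this run


def classify(deviation: float, threshold: float) -> Outcome:
    """Label one faulty run by how far its output strays from the golden output."""
    if deviation == 0.0:
        return Outcome.BENIGN
    return Outcome.EDC if deviation > threshold else Outcome.NON_EDC


def edc_coverage(runs: list[Run], threshold: float) -> float:
    """Fraction of EDC-causing runs in which a detector fired."""
    edc_runs = [r for r in runs if classify(r.deviation, threshold) is Outcome.EDC]
    if not edc_runs:
        return 0.0
    return sum(r.detected for r in edc_runs) / len(edc_runs)


if __name__ == "__main__":
    # Made-up runs purely to exercise the functions; not measured data.
    runs = [Run(0.0, False), Run(0.05, False), Run(0.5, True), Run(0.9, False)]
    print(edc_coverage(runs, threshold=0.1))
```

The threshold is application-specific in this sketch; a selective detector aims for high coverage on the EDC class while firing rarely on the other two classes.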

Published In

ACM Transactions on Embedded Computing Systems  Volume 15, Issue 1
February 2016
530 pages
ISSN:1539-9087
EISSN:1558-3465
DOI:10.1145/2872313
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 January 2016
Accepted: 01 June 2015
Revised: 01 April 2015
Received: 01 August 2014
Published in TECS Volume 15, Issue 1

Author Tags

  1. EDCs
  2. Hardware fault detection
  3. detector placement
  4. static analysis

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Discovery Grant and an Engage Grant from the Natural Sciences and Engineering Research Council (NSERC), Canada

Cited By

  • (2023) Reliability Analysis for Programs with Redundancy Computation for Soft Errors. Journal of Physics: Conference Series 2522:1 (012022). DOI: 10.1088/1742-6596/2522/1/012022. Online publication date: 1-Jun-2023
  • (2022) Silent Data Corruption Estimation and Mitigation Without Fault Injection. IEEE Canadian Journal of Electrical and Computer Engineering 45:3 (318-327). DOI: 10.1109/ICJECE.2022.3189043. Online publication date: Oct-2023
  • (2022) Software Application of Control and Compensation for EAST Articulated Maintenance Arm. 2022 China Automation Congress (CAC) (6925-6928). DOI: 10.1109/CAC57257.2022.10054851. Online publication date: 25-Nov-2022
  • (2020) Investigating the Inherent Soft Error Resilience of Embedded Applications by Full-System Simulation. Proceedings of the 25th Asia and South Pacific Design Automation Conference (80-84). DOI: 10.1109/ASP-DAC47756.2020.9045132. Online publication date: 17-Jan-2020
  • (2018) Leto: verifying application-specific hardware fault tolerance with programmable execution models. Proceedings of the ACM on Programming Languages 2:OOPSLA (1-30). DOI: 10.1145/3276533. Online publication date: 24-Oct-2018
