Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article
Open access

Verifying quantitative reliability for programs that execute on unreliable hardware

Published: 29 October 2013 Publication History

Abstract

Emerging high-performance architectures are anticipated to contain unreliable components that may exhibit soft errors, which silently corrupt the results of computations. Full detection and masking of soft errors is challenging, expensive, and, for some applications, unnecessary. For example, approximate computing applications (such as multimedia processing, machine learning, and big data analytics) can often naturally tolerate soft errors.
We present Rely a programming language that enables developers to reason about the quantitative reliability of an application -- namely, the probability that it produces the correct result when executed on unreliable hardware. Rely allows developers to specify the reliability requirements for each value that a function produces.
We present a static quantitative reliability analysis that verifies quantitative requirements on the reliability of an application, enabling a developer to perform sound and verified reliability engineering. The analysis takes a Rely program with a reliability specification and a hardware specification that characterizes the reliability of the underlying hardware components and verifies that the program satisfies its reliability specification when executed on the underlying unreliable hardware platform. We demonstrate the application of quantitative reliability analysis on six computations implemented in Rely.

References

[1]
J. Ansel, C. Chan, Y. L. Wong, M. Olszewski, Q. Zhao, A. Edelman, and Amarasinghe. PetaBricks: A language and compiler for algorithmic choice. PLDI, 2009.
[2]
W. Baek and T. M. Chilimbi. Green: A framework for supporting energy-conscious programming using controlled approximation. PLDI, 2010.
[3]
T. Bao, Y. Zheng, and X. Zhang. White box sampling in uncertain data processing enabled by program analysis. In OOPSLA, 2012.
[4]
G. Barthe, D. Demange, and D. Pichardie. A formally verified ssa-based middle-end: Static single assignment meets compcert. ESOP, 2012.
[5]
G. Barthe, B. Grégoire, and S. Zanella Béguelin. Formal certification of code-based cryptographic proofs. POPL, 2009.
[6]
G. Barthe, B. Köpf, F. Olmedo, and S. Zanella Béguelin. Probabilistic reasoning for differential privacy. POPL, 2012.
[7]
M. Blum and S. Kanna. Designing programs that check their work. STOC, 1989.
[8]
M. Blum, M. Luby, and R. Rubinfeld. Self-testing/correcting with applications to numerical problems. Journal of computer and system sciences, 1993.
[9]
F. Cappello, A. Geist, B. Gropp, L. Kale, B. Kramer, and M. Snir. Toward exascale resilience. International Journal of High Performance Computing Applications, 2009.
[10]
M. Carbin, D. Kim, S. Misailovic, and M. Rinard. Proving acceptability properties of relaxed nondeterministic approximate programs. PLDI, 2012.
[11]
M. Carbin, D. Kim, S. Misailovic, and M. Rinard. Verified integrity properties for safe approximate program transformations. PEPM, 2013.
[12]
M. Carbin and M. Rinard. Automatically identifying critical input regions and code in applications. ISSTA, 2010.
[13]
L. Chakrapani, B. Akgul, S. Cheemalavagu, P. Korkmaz, K. Palem, and B. Seshasayee. Ultra-efficient (embedded) soc architectures based on probabilistic cmos (pcmos) technology. DATE, 2006.
[14]
S. Chaudhuri, S. Gulwani, R. Lublinerman, and S. Navidpour. Proving programs robust. FSE, 2011.
[15]
P. Cousot and M. Monerau. Probabilistic abstract interpretation. ESOP, 2012.
[16]
M. de Kruijf, S. Nomura, and K. Sankaralingam. Relax: an architectural framework for software recovery of hardware faults. ISCA, 2010.
[17]
A. Di Pierro and H. Wiklicky. Concurrent constraint programming: Towards probabilistic abstract interpretation. PPDP'00.
[18]
E. W. Dijkstra. Guarded commands, nondeterminacy and formal derivation of programs. CACM, 18(8), August 1975.
[19]
D. Ernst, N. S. Kim, S. Das, S. Pant, R. Rao, T. Pham, C. Ziesler, D. Blaauw, T. Austin, K. Flautner, and T. Mudge. Razor: A low-power pipeline based on circuit-level timing speculation. MICRO, 2003.
[20]
H. Esmaeilzadeh, A. Sampson, L. Ceze, and D. Burger. Architecture support for disciplined approximate programming. ASPLOS, 2012.
[21]
H. Esmaeilzadeh, A. Sampson, L. Ceze, and D. Burger. Neural acceleration for general-purpose approximate programs. MICRO, 2012.
[22]
S. Feng, S. Gupta, A. Ansari, and S. Mahlke. Shoestring: probabilistic soft error reliability on the cheap. ASPLOS'10.
[23]
A. Filieri, C. P\uas\uareanu, and W. Visser. Reliability analysis in symbolic pathfinder. In ICSE, 2013.
[24]
M. Hiller, A. Jhumka, and N. Suri. On the placement of software mechanisms for detection of data errors. DSN, 2002.
[25]
H. Hoffmann, S. Sidiroglou, M. Carbin, S. Misailovic, A. Agarwal, and M. Rinard. Dynamic knobs for responsive power-aware computing. ASPLOS, 2011.
[26]
M. Kling, S. Misailovic, M. Carbin, and M. Rinard. Bolt: on-demand infinite loop escape in unmodified binaries. OOPSLA, 2012.
[27]
K. Knobe and V. Sarkar. Array ssa form and its use in parallelization. POPL, 1998.
[28]
D. Kozen. Semantics of probabilistic programs. Journal of Computer and System Sciences, 1981.
[29]
K. Lee, A. Shrivastava, I. Issenin, N. Dutt, and N. Venkatasubramanian. Mitigating soft error failures for multimedia applications by selective data protection. CASES, 2006.
[30]
L. Leem, H. Cho, J. Bau, Q. Jacobson, and S. Mitra. Ersa: error resilient system architecture for probabilistic applications. DATE, 2010.
[31]
N. Leveson, S. Cha, J. C. Knight, and T. Shimeall. The use of self checks and voting in software error detection: An empirical study. IEEE TSE, 1990.
[32]
N. Leveson and P. Harvey. Software fault tree analysis. Journal of Systems and Software, 3(2), 1983.
[33]
X. Li and D. Yeung. Application-level correctness and its impact on fault tolerance. HPCA, 2007.
[34]
S. Liu, K. Pattabiraman, T. Moscibroda, and B. Zorn. Flikker: saving dram refresh-power through critical data partitioning. ASPLOS, 2011.
[35]
M. Rinard M. Carbin, S. Misailovic. Verifying quantitative reliability of programs that execute on unreliable hardware (appendix). http://groups.csail.mit.edu/pac/rely.
[36]
M. Rinard M. Carbin, S. Misailovic. Verifying quantitative reliability of programs that execute on unreliable hardware. Technical Report MIT-CSAIL-TR-2013-014, MIT, 2013.
[37]
Xiph.org Video Test Media. http://media.xiph.org/video/derf.
[38]
J. Meng, A. Raghunathan, S. Chakradhar, and S. Byna. Exploiting the forgiving nature of applications for scalable parallel execution. In IPDPS, 2010.
[39]
S. Misailovic, D. Kim, and M. Rinard. Parallelizing sequential programs with statistical accuracy tests. ACM TECS Special Issue on Probabilistic Embedded Computing, 2013.
[40]
S. Misailovic, D. Roy, and M. Rinard. Probabilistically accurate program transformations. SAS, 2011.
[41]
S. Misailovic, S. Sidiroglou, H. Hoffmann, and M. Rinard. Quality of service profiling. ICSE, 2010.
[42]
D. Monniaux. Abstract interpretation of probabilistic semantics. SAS, 2000.
[43]
C. Morgan, A. McIver, and K. Seidel. Probabilistic predicate transformers. TOPLAS, 1996.
[44]
D. Murta and J. N. Oliveira. Calculating fault propagation in functional programs. Technical report, Univ. Minho, 2013.
[45]
S. Narayanan, J. Sartori, R. Kumar, and D. Jones. Scalable stochastic processors. DATE, 2010.
[46]
J. Nelson, A. Sampson, and L. Ceze. Dense approximate storage in phase-change memory. ASPLOS Ideas & Perspectives, 2011.
[47]
K. Pattabiraman, V. Grover, and B. Zorn. Samurai: protecting critical data in unsafe languages. EuroSys, 2008.
[48]
J. Perkins, S. Kim, S. Larsen, S. Amarasinghe, J. Bachrach, M. Carbin, C. Pacheco, F. Sherwood, S. Sidiroglou, G. Sullivan, W. Wong, Y. Zibin, M. Ernst, and M. Rinard. Automatically patching errors in deployed software. SOSP, 2009.
[49]
F. Perry, L. Mackey, G.A. Reis, J. Ligatti, D.I. August, and D. Walker. Fault-tolerant typed assembly language. PLDI'07.
[50]
F. Perry and D. Walker. Reasoning about control flow in the presence of transient faults. SAS, 2008.
[51]
P. Prata and J. Silva. Algorithm based fault tolerance versus result-checking for matrix computations. FTCS, 1999.
[52]
J. Reed and B. Pierce. Distance makes the types grow stronger: a calculus for differential privacy. ICFP, 2010.
[53]
G. Reis, J. Chang, N. Vachharajani, R. Rangan, and D. August. Swift: Software implemented fault tolerance. CGO'05.
[54]
M. Rinard. Probabilistic accuracy bounds for fault-tolerant computations that discard tasks. ICS, 2006.
[55]
M. Rinard. Using early phase termination to eliminate load imbalances at barrier synchronization points. OOPSLA, 2007.
[56]
M. Rinard, C. Cadar, D. Dumitran, D.M. Roy, T. Leu, and W.S. Beebee Jr. Enhancing server availability and security through failure-oblivious computing. OSDI, 2004.
[57]
M. Rinard, C. Cadar, and H. Nguyen. Exploring the acceptability envelope. OOPSLA, 2005.
[58]
M. Rinard, H. Hoffmann, S. Misailovic, and S. Sidiroglou. Patterns and statistical analysis for understanding reduced resource computing. 2010.
[59]
A. Sampson, W. Dietl, E. Fortuna, D. Gnanapragasam, L. Ceze, and D. Grossman. Enerj: Approximate data types for safe and general low-power computation. PLDI, 2011.
[60]
S. Sankaranarayanan, A. Chakarov, and S. Gulwani. Static analysis for probabilistic programs: inferring whole program properties from finitely many paths. In PLDI, 2013.
[61]
C. Schlesinger, K. Pattabiraman, N. Swamy, D. Walker, and B. Zorn. Yarra: An extension to c for data integrity and partial safety. CSF, 2011.
[62]
P. Shivakumar, M. Kistler, S.W. Keckler, D. Burger, and L. Alvisi. Modeling the effect of technology trends on the soft error rate of combinational logic. DSN, 2002.
[63]
S. Sidiroglou, S. Misailovic, H. Hoffmann, and M. Rinard. Managing performance vs.\ accuracy trade-offs with loop perforation. FSE, 2011.
[64]
M. Smith. Probabilistic abstract interpretation of imperative programs using truncated normal distributions. Electronic\,Notes\,in\,Theoretical\,Computer\,Science, 2008.
[65]
W. N. Sumner, T. Bao, X. Zhang, and S. Prabhakar. Coalescing executions for fast uncertainty analysis. In ICSE, 2011.
[66]
A. Thomas and K. Pattabiraman. Error detector placement for soft computation. DSN, 2013.
[67]
x264. http://www.videolan.org/x264.html.
[68]
Z. Zhu, S. Misailovic, J. Kelner, and M. Rinard. Randomized accuracy-aware program transformations for efficient approximate computations. POPL, 2012.

Cited By

View all
  • (2024)System-Scenario Methodology to Design a Highly Reliable Radiation-Hardened Memory for Space ApplicationsComputer Memory and Data Storage10.5772/intechopen.113327Online publication date: 10-Jan-2024
  • (2023)Turaco: Complexity-Guided Data Sampling for Training Neural Surrogates of ProgramsProceedings of the ACM on Programming Languages10.1145/36228567:OOPSLA2(1648-1676)Online publication date: 16-Oct-2023
  • (2023)Automating NISQ Application Design with Meta Quantum Circuits with Constraints (MQCC)ACM Transactions on Quantum Computing10.1145/35793694:3(1-29)Online publication date: 8-Apr-2023
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 48, Issue 10
OOPSLA '13
October 2013
867 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2544173
Issue’s Table of Contents
  • cover image ACM Conferences
    OOPSLA '13: Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
    October 2013
    904 pages
    ISBN:9781450323741
    DOI:10.1145/2509136
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 October 2013
Published in SIGPLAN Volume 48, Issue 10

Check for updates

Author Tag

  1. approximate computing

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)157
  • Downloads (Last 6 weeks)25
Reflects downloads up to 09 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)System-Scenario Methodology to Design a Highly Reliable Radiation-Hardened Memory for Space ApplicationsComputer Memory and Data Storage10.5772/intechopen.113327Online publication date: 10-Jan-2024
  • (2023)Turaco: Complexity-Guided Data Sampling for Training Neural Surrogates of ProgramsProceedings of the ACM on Programming Languages10.1145/36228567:OOPSLA2(1648-1676)Online publication date: 16-Oct-2023
  • (2023)Automating NISQ Application Design with Meta Quantum Circuits with Constraints (MQCC)ACM Transactions on Quantum Computing10.1145/35793694:3(1-29)Online publication date: 8-Apr-2023
  • (2022)DEVoT: Dynamic Delay Modeling of Functional Units Under Voltage and Temperature VariationsIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2021.307697041:4(827-839)Online publication date: Apr-2022
  • (2021)HPACProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1145/3458817.3476216(1-14)Online publication date: 14-Nov-2021
  • (2021)Fluid: a framework for approximate concurrency via controlled dependency relaxationProceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation10.1145/3453483.3454042(252-267)Online publication date: 19-Jun-2021
  • (2021)Fast and Accurate SER Estimation for Large Combinational Blocks in Early Stages of the DesignIEEE Transactions on Sustainable Computing10.1109/TSUSC.2018.28866406:3(427-440)Online publication date: 1-Jul-2021
  • (2020)A Survey of Testing Techniques for Approximate Integrated CircuitsProceedings of the IEEE10.1109/JPROC.2020.2999613108:12(2178-2194)Online publication date: Dec-2020
  • (2020)AxRAM: A lightweight implicit interface for approximate data accessFuture Generation Computer Systems10.1016/j.future.2020.07.029113(556-570)Online publication date: Dec-2020
  • (2020)ADeLeFuture Generation Computer Systems10.1016/j.future.2019.07.073102:C(245-258)Online publication date: 1-Jan-2020
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media