DOI: 10.1145/2833157.2833160

FITL: extending LLVM for the translation of fault-injection directives

Published: 15 November 2015

Abstract

The frequency of hardware errors in HPC systems continues to grow as system designs evolve toward exascale. Tolerating these errors efficiently and effectively will require software-based resilience solutions. With this requirement in mind, recent research has increasingly employed LLVM-based tools to simulate transient hardware faults in order to study the resilience characteristics of specific applications. However, such tools require researchers to configure their experiments at the level of the LLVM intermediate representation (LLVM IR) rather than at the source level of the applications under study. In this paper, we present FITL (Fault-Injection Toolkit for LLVM), a set of LLVM extensions to which it is straightforward to translate source-level pragmas that specify fault injection. While we have designed FITL not to be tied to any particular compiler front end or high-level language, we also describe how we have extended our OpenARC compiler to translate a novel set of fault-injection pragmas for C to FITL. Finally, we present several resilience studies we have conducted using FITL, including a comparison with a source-level fault injector we have built as part of OpenARC.
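
To make the idea of simulating a transient hardware fault concrete, the following is a minimal sketch of a single-bit-flip fault model applied to one value at the source level. It is not FITL's pragma syntax, its LLVM extensions, or OpenARC's injector; the function names `flip_bit_u64` and `inject_fault_double` are hypothetical, and the sketch assumes that `double` is a 64-bit IEEE 754 value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: flip bit `bit` (0 = least significant) of a 64-bit word. */
static uint64_t flip_bit_u64(uint64_t word, unsigned bit)
{
    assert(bit < 64u); /* shifting by 64 or more would be undefined behavior */
    return word ^ (UINT64_C(1) << bit);
}

/* Hypothetical helper: model a transient fault as one flipped bit in a double.
 * Assumes double is a 64-bit IEEE 754 value; memcpy avoids aliasing problems. */
static double inject_fault_double(double value, unsigned bit)
{
    uint64_t raw;
    _Static_assert(sizeof(double) == sizeof(uint64_t),
                   "sketch assumes a 64-bit double");
    memcpy(&raw, &value, sizeof raw);
    raw = flip_bit_u64(raw, bit);
    memcpy(&value, &raw, sizeof value);
    return value;
}
```

Under the stated assumption, calling `inject_fault_double(1.0, 63)` flips the sign bit and returns `-1.0`, and applying the same flip twice restores the original value. A resilience study would typically choose the target value and bit position at random and then compare the program's output against a fault-free run.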

Cited By

  • (2018) Programmer-guided reliability for extreme-scale applications. International Journal of High Performance Computing Applications 32:5 (598-612). DOI: 10.1177/1094342016667625. Online publication date: 1-Sep-2018
  • (2016) NVL-C. Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing (125-136). DOI: 10.1145/2907294.2907303. Online publication date: 31-May-2016

Published In

LLVM '15: Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC
November 2015
74 pages
ISBN:9781450340052
DOI:10.1145/2833157
Conference Chair: Hal Finkel
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. C pragma
  2. LLVM
  3. OpenARC
  4. compiler intermediate representation
  5. directive-based programming
  6. fault injection
  7. resiliency

Qualifiers

  • Research-article

Conference

SC15

Acceptance Rates

LLVM '15 Paper Acceptance Rate 7 of 12 submissions, 58%;
Overall Acceptance Rate 16 of 22 submissions, 73%
