research-article
Open access

An Accurate Cross-Layer Approach for Online Architectural Vulnerability Estimation

Published: 17 September 2016

Abstract

Processor soft-error rates are projected to increase as feature sizes scale down, necessitating the adoption of reliability-enhancing techniques, but the power and performance overheads of such techniques remain a concern. Dynamic cross-layer techniques are a promising way to improve the cost-effectiveness of resilient systems. As a foundation for building such a system, we propose a cross-layer approach for estimating the architectural vulnerability factor (AVF) of a processor core online by combining information from the software, compiler, and microarchitectural layers at runtime. The hardware layer combines the metadata from the software and compiler layers with microarchitectural measurements to produce the estimate. We describe our design and evaluate it in detail on a set of SPEC CPU 2006 applications. We find that our online AVF estimate is highly accurate with respect to a postmortem AVF analysis, with only 0.46% average absolute error. Our design also incurs negligible performance impact for the SPEC CPU 2006 applications and about 1.2% for a Monte Carlo application, requires approximately 1.4% area overhead, and costs about 3.3% more power on average. We compare our technique against two prior online AVF estimation techniques, one using linear regression to estimate AVF and another based on PVF-HVF; our evaluation finds that our approach is, on average, more accurate. Our case study of a Monte Carlo simulation shows that our AVF estimate can adapt to the inherent resiliency of the algorithm. Finally, we demonstrate the effectiveness of our approach using a dynamic protection scheme that limits vulnerability to soft errors while, at a target normalized soft-error rate (SER) of 10%, reducing energy consumption by an average of 4.8% compared to enabling simple parity+ECC protection at all times.
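
To make the quantities in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the standard definition of the architectural vulnerability factor (AVF) of a structure as the fraction of its bit-cycles that hold ACE (architecturally required) state, and it assumes that "average absolute error" means the mean of the absolute differences between online and postmortem AVF values. The function names, the sample numbers, and the threshold-based protection rule are all hypothetical illustrations.

```python
from typing import Sequence


def avf(ace_bit_cycles: int, total_bits: int, cycles: int) -> float:
    """AVF of one structure over a window: ACE bit-cycles / (bits * cycles)."""
    if total_bits <= 0 or cycles <= 0:
        raise ValueError("total_bits and cycles must be positive")
    return ace_bit_cycles / (total_bits * cycles)


def mean_absolute_error(online: Sequence[float],
                        postmortem: Sequence[float]) -> float:
    """Mean |online - postmortem| over matching windows (assumed error metric)."""
    if len(online) != len(postmortem) or not online:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(o - p) for o, p in zip(online, postmortem)) / len(online)


def protection_enabled(estimated_avf: float, threshold: float) -> bool:
    """Hypothetical policy: turn protection on only when the estimate is high."""
    return estimated_avf > threshold


if __name__ == "__main__":
    # Made-up numbers for illustration only; they are not results from the paper.
    window_avf = avf(ace_bit_cycles=250, total_bits=100, cycles=10)
    print(f"window AVF: {window_avf:.2f}")
    error = mean_absolute_error([0.10, 0.20], [0.12, 0.17])
    print(f"mean absolute error: {error:.3f}")
    print("protect:", protection_enabled(window_avf, threshold=0.2))
```

A real online estimator would obtain the ACE bit-cycle counts from the cross-layer metadata and microarchitectural measurements described in the abstract; here they are simply passed in as arguments.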

Published In

ACM Transactions on Architecture and Code Optimization, Volume 13, Issue 3
September 2016
207 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/2988523

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 September 2016
Accepted: 01 July 2016
Revised: 01 July 2016
Received: 01 December 2015
Published in TACO Volume 13, Issue 3

Author Tags

  1. Architectural vulnerability factor
  2. Cross-layer reliability

Qualifiers

  • Research-article
  • Research
  • Refereed
