Asymmetrically reliable caches for multicore architectures under performance and energy constraints

Published: 01 December 2016

Abstract

Cache structures in multicore systems are particularly vulnerable to soft errors because of their high transistor density. Protecting every cache indiscriminately imposes a notable overhead on performance and energy consumption. In this study, we propose asymmetrically reliable caches that meet the reliability needs of the system with sufficient additional hardware while staying within performance and energy constraints. In our framework, a chip multiprocessor is composed of a high-reliability core that has ECC protection and a set of low-reliability cores that have no protection on their data caches. Between these two types of cores, there is also a middle-reliability core that has only parity checking. Application threads are mapped to cores of different reliability levels based on their critical data usage. Experimental results for selected applications show that the proposed techniques improve reliability, with considerable performance and energy overhead on average, compared to traditional unsafe caches.
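
The thread-to-core mapping described above can be illustrated with a minimal sketch. The abstract does not state the mapping policy, so the greedy ranking below is an assumption, not the authors' method. The names `Core`, `Thread`, `critical_fraction`, and `map_threads` are hypothetical, as is the simplification of one thread per core.

```python
from dataclasses import dataclass

# Protection levels named in the abstract, ordered from strongest to weakest.
PROTECTION_ORDER = ["ecc", "parity", "none"]


@dataclass
class Core:
    core_id: int
    protection: str  # one of PROTECTION_ORDER


@dataclass
class Thread:
    name: str
    # Hypothetical metric: fraction of the thread's data accesses that touch
    # critical data. The abstract does not define how this is measured.
    critical_fraction: float


def map_threads(threads, cores):
    """Greedy sketch: the most critical thread gets the best-protected core.

    Assumes at most one thread per core (a simplification for illustration).
    Returns a dict from thread name to the Core it is assigned to.
    """
    if len(threads) > len(cores):
        raise ValueError("this sketch assumes at most one thread per core")
    ranked_threads = sorted(
        threads, key=lambda t: t.critical_fraction, reverse=True
    )
    ranked_cores = sorted(
        cores, key=lambda c: PROTECTION_ORDER.index(c.protection)
    )
    return {t.name: c for t, c in zip(ranked_threads, ranked_cores)}


if __name__ == "__main__":
    # One ECC core, one parity core, and two unprotected cores.
    cores = [Core(0, "ecc"), Core(1, "parity"), Core(2, "none"), Core(3, "none")]
    threads = [
        Thread("t0", 0.10),
        Thread("t1", 0.65),
        Thread("t2", 0.30),
        Thread("t3", 0.05),
    ]
    for name, core in map_threads(threads, cores).items():
        print(f"{name} -> core {core.core_id} ({core.protection})")
```

In this sketch, ranking both lists and pairing them keeps the policy easy to follow. A real scheduler would also need to weigh load balance and the performance and energy constraints that the abstract mentions.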

Cited By

  • (2019) Scheduling opportunities for asymmetrically reliable caches. Journal of Parallel and Distributed Computing 126:C, 134-151. DOI: 10.1016/j.jpdc.2019.01.005. Online publication date: 1-Apr-2019

      Published In

      Cluster Computing, Volume 19, Issue 4
      December 2016
      625 pages

      Publisher

      Kluwer Academic Publishers

      United States

      Author Tags

      1. Asymmetric Cores
      2. Fault Injection
      3. Reliability
      4. Selective Protection

      Qualifiers

      • Article
