Abstract
Non-volatile memory (NVM) technologies are being explored extensively as replacements for conventional SRAM-based memories. This paper explores an NVM-based instruction memory for low-power embedded systems targeting wireless and multimedia applications. A traditional SRAM-based instruction memory organization suited to these applications is taken as the baseline. Different Resistive RAM (ReRAM) based organizations are then designed as alternatives, taking into account ReRAM's write-related limitations and the energy and performance trade-offs. The NVM array design is explored and optimized with respect to the same trade-offs. Dynamic instruction mapping and architectural design changes are used to mitigate the ReRAM limitations and maximize its benefits. Energy and performance values are obtained from extended CACTI models and from SPICE and VHDL simulations. The best ReRAM-based hybrid instruction memory organization using the proposed methodology showed significantly lower energy consumption (up to 82.07 % read energy reduction), even with a 0 % performance penalty.
Additional information
This is a submission for the Special Issue on Memory Architecture and Organization for Embedded Systems. This project was partially funded by the Spanish government’s research contract: TIN 2012-32180.
About this article
Cite this article
Komalan, M.P., Pérez, J.I.G., Tenllado, C. et al. Design exploration of a NVM based hybrid instruction memory organization for embedded platforms. Des Autom Embed Syst 17, 459–483 (2013). https://doi.org/10.1007/s10617-014-9151-8
DOI: https://doi.org/10.1007/s10617-014-9151-8