research-article
Open access

Benzene: An Energy-Efficient Distributed Hybrid Cache Architecture for Manycore Systems

Published: 22 March 2018

Abstract

This article proposes Benzene, an energy-efficient distributed SRAM/STT-RAM hybrid cache for manycore systems running multiple applications. It is based on the observation that a naïve application of hybrid cache techniques to distributed caches in a manycore architecture suffers from limited energy reduction due to uneven utilization of scarce SRAM. We propose two-level optimization techniques: intra-bank and inter-bank. Intra-bank optimization leverages highly associative cache design, achieving more uniform distribution of writes within a bank. Inter-bank optimization evenly balances the amount of write-intensive data across the banks. Our evaluation results show that Benzene significantly reduces energy consumption of distributed hybrid caches.
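
The imbalance the abstract describes can be illustrated with a toy model. The sketch below is not Benzene's mechanism. It only counts how many write-intensive blocks spill from SRAM into STT-RAM when the blocks are unevenly spread across banks, and how many spill once the same blocks are spread evenly. The function names, the per-bank block counts, and the SRAM capacity are all hypothetical values chosen for illustration.

```python
def stt_ram_overflow(write_hot_per_bank, sram_slots_per_bank):
    """Count write-intensive blocks that do not fit in SRAM and spill to STT-RAM."""
    return sum(max(0, hot - sram_slots_per_bank) for hot in write_hot_per_bank)


def balance(write_hot_per_bank):
    """Spread the same total number of write-intensive blocks evenly across banks."""
    total = sum(write_hot_per_bank)
    n = len(write_hot_per_bank)
    base, extra = divmod(total, n)
    # The first `extra` banks take one additional block when the split is uneven.
    return [base + (1 if i < extra else 0) for i in range(n)]


# Hypothetical workload: four banks, most write-intensive blocks land in bank 0.
skewed = [12, 1, 0, 3]
sram_slots = 4  # hypothetical number of SRAM block slots per bank

naive_overflow = stt_ram_overflow(skewed, sram_slots)
balanced = balance(skewed)
balanced_overflow = stt_ram_overflow(balanced, sram_slots)

print(naive_overflow, balanced, balanced_overflow)
```

In this toy setting the total SRAM capacity (16 slots) matches the total number of write-intensive blocks, yet the skewed placement still pushes blocks into STT-RAM because three banks leave SRAM slots idle. That is the kind of uneven utilization the inter-bank optimization is said to target.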

Published In

ACM Transactions on Architecture and Code Optimization, Volume 15, Issue 1
March 2018
401 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/3199680
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 March 2018
Accepted: 01 December 2017
Revised: 01 November 2017
Received: 01 May 2017
Published in TACO Volume 15, Issue 1

Author Tags

  1. Manycore systems
  2. STT-RAM
  3. distributed
  4. energy-efficient
  5. hybrid cache

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Samsung Electronics Co., Ltd

Article Metrics

  • Downloads (Last 12 months): 67
  • Downloads (Last 6 weeks): 10
Reflects downloads up to 01 Sep 2024

Cited By

  • (2023) HyGain: High-performance, Energy-efficient Hybrid Gain Cell-based Cache Hierarchy. ACM Transactions on Architecture and Code Optimization 20:2 (1-20). DOI: 10.1145/3572839. Online publication date: 1-Mar-2023
  • (2023) Efficient placement and migration policies for an STT-RAM based hybrid L1 cache for intermittently powered systems. Design Automation for Embedded Systems 27:4 (303-331). DOI: 10.1007/s10617-023-09272-w. Online publication date: 1-Dec-2023
  • (2021) Exploiting Bit-Level Write Patterns to Reduce Energy Consumption in Hybrid Cache Architecture. IEICE Electronics Express. DOI: 10.1587/elex.18.20210327. Online publication date: 2021
  • (2021) DAM: Deadblock Aware Migration Techniques for STT-RAM-Based Hybrid Caches. IEEE Computer Architecture Letters 20:1 (62-4). DOI: 10.1109/LCA.2021.3071717. Online publication date: 1-Jan-2021
  • (2020) Energy-Efficient Runtime Adaptable L1 STT-RAM Cache Design. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 39:6 (1328-1339). DOI: 10.1109/TCAD.2019.2912920. Online publication date: Jun-2020
  • (2020) Statistical Behavior Guided Block Allocation in Hybrid Cache-Based Edge Computing for Cyber-Physical-Social Systems. IEEE Access 8 (29055-29063). DOI: 10.1109/ACCESS.2020.2972305. Online publication date: 2020
  • (2019) Router-integrated Cache Hierarchy Design for Highly Parallel Computing in Efficient CMP Systems. Electronics 8:11 (1363). DOI: 10.3390/electronics8111363. Online publication date: 17-Nov-2019
  • (2019) Filter router: An enhanced router design for efficient stacked shared cache network. IEICE Electronics Express 16:14 (20190358-20190358). DOI: 10.1587/elex.16.20190358. Online publication date: 2019
  • (2018) Exploring Hybrid Memory Caches in Chip Multiprocessors. 2018 13th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC) (1-8). DOI: 10.1109/ReCoSoC.2018.8449386. Online publication date: Jul-2018
