Open access

Benzene: An Energy-Efficient Distributed Hybrid Cache Architecture for Manycore Systems

Authors:

Daniel Sanchez,

Soojung Ryu

ACM Transactions on Architecture and Code Optimization (TACO), Volume 15, Issue 1

Article No.: 10, Pages 1 - 23

https://doi.org/10.1145/3177963

Published: 22 March 2018

Abstract

This article proposes Benzene, an energy-efficient distributed SRAM/STT-RAM hybrid cache for manycore systems running multiple applications. It is based on the observation that naïvely applying hybrid cache techniques to the distributed caches of a manycore architecture yields only limited energy reduction, because the scarce SRAM is utilized unevenly. We propose optimization techniques at two levels: intra-bank and inter-bank. Intra-bank optimization leverages a highly associative cache design to distribute writes more uniformly within a bank. Inter-bank optimization balances the amount of write-intensive data evenly across the banks. Our evaluation results show that Benzene significantly reduces the energy consumption of distributed hybrid caches.
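
The intuition behind balancing write-intensive data across banks can be illustrated with a toy model. The sketch below is not the article's mechanism: the per-bank SRAM capacity, the relative write energies, and the helper names `write_energy` and `balance` are all hypothetical values and names chosen for illustration. It only shows that, for the same total number of write-intensive blocks, a skewed placement pushes more writes into the costlier STT-RAM than an even one.

```python
# Toy model: each bank has a few SRAM slots; write-intensive blocks that do
# not fit spill into STT-RAM, where each write costs more energy.
# All constants below are illustrative assumptions, not measured values.

SRAM_SLOTS_PER_BANK = 4      # hypothetical SRAM capacity per bank, in blocks
SRAM_WRITE_ENERGY = 1.0      # hypothetical relative energy per SRAM write
STTRAM_WRITE_ENERGY = 5.0    # hypothetical relative energy per STT-RAM write


def write_energy(hot_blocks_per_bank, writes_per_block=100):
    """Total write energy when every bank keeps as many write-intensive
    blocks in SRAM as fit and spills the remainder to STT-RAM."""
    total = 0.0
    for hot in hot_blocks_per_bank:
        in_sram = min(hot, SRAM_SLOTS_PER_BANK)
        in_sttram = hot - in_sram
        total += writes_per_block * (
            in_sram * SRAM_WRITE_ENERGY + in_sttram * STTRAM_WRITE_ENERGY
        )
    return total


def balance(hot_blocks_per_bank):
    """Spread the same number of write-intensive blocks evenly over banks."""
    banks = len(hot_blocks_per_bank)
    base, extra = divmod(sum(hot_blocks_per_bank), banks)
    return [base + (1 if i < extra else 0) for i in range(banks)]


if __name__ == "__main__":
    uneven = [10, 4, 1, 1]   # 16 write-intensive blocks, skewed toward bank 0
    even = balance(uneven)   # the same 16 blocks, spread evenly
    print("uneven:", write_energy(uneven))
    print("even:  ", write_energy(even))
```

In this toy setting the skewed placement leaves SRAM slots idle in two banks while the first bank overflows into STT-RAM, which is the kind of uneven SRAM utilization the abstract describes.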

Index Terms

Benzene: An Energy-Efficient Distributed Hybrid Cache Architecture for Manycore Systems
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multicore architectures
2. Hardware

Published In

ACM Transactions on Architecture and Code Optimization Volume 15, Issue 1

March 2018

401 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/3199680

Editor:
Koen De Bosschere
Ghent University

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 March 2018

Accepted: 01 December 2017

Revised: 01 November 2017

Received: 01 May 2017

Published in TACO Volume 15, Issue 1

Qualifiers

Research-article
Research
Refereed

Funding Sources

Samsung Electronics Co., Ltd

Bibliometrics

Total Citations: 9
Total Downloads: 686
Downloads (Last 12 months): 67
Downloads (Last 6 weeks): 10

Reflects downloads up to 01 Sep 2024

Cited By

Singh S, Surana N, Prasad K, Jain P, Mekie J, Awasthi M (2023). HyGain: High-performance, Energy-efficient Hybrid Gain Cell-based Cache Hierarchy. ACM Transactions on Architecture and Code Optimization 20:2 (1-20). Online publication date: 1-Mar-2023.
https://dl.acm.org/doi/10.1145/3572839
Badri S, Saini M, Goel N (2023). Efficient placement and migration policies for an STT-RAM based hybrid L1 cache for intermittently powered systems. Design Automation for Embedded Systems 27:4 (303-331). Online publication date: 1-Dec-2023.
https://dl.acm.org/doi/10.1007/s10617-023-09272-w
Choi J, Park H (2021). Exploiting Bit-Level Write Patterns to Reduce Energy Consumption in Hybrid Cache Architecture. IEICE Electronics Express. Online publication date: 2021.
https://doi.org/10.1587/elex.18.20210327
Sarkar A, Singh N, Venkitaraman V, Singh V (2021). DAM: Deadblock Aware Migration Techniques for STT-RAM-Based Hybrid Caches. IEEE Computer Architecture Letters 20:1 (62-4). Online publication date: 1-Jan-2021.
https://doi.org/10.1109/LCA.2021.3071717
Kuan K, Adegbija T (2020). Energy-Efficient Runtime Adaptable L1 STT-RAM Cache Design. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 39:6 (1328-1339). Online publication date: Jun-2020.
https://doi.org/10.1109/TCAD.2019.2912920
Shen F, Xu C, Zhang J (2020). Statistical Behavior Guided Block Allocation in Hybrid Cache-Based Edge Computing for Cyber-Physical-Social Systems. IEEE Access 8 (29055-29063). Online publication date: 2020.
https://doi.org/10.1109/ACCESS.2020.2972305
Zhao, Jia, Watanabe (2019). Router-integrated Cache Hierarchy Design for Highly Parallel Computing in Efficient CMP Systems. Electronics 8:11 (1363). Online publication date: 17-Nov-2019.
https://doi.org/10.3390/electronics8111363
Zhao H, Jia X, Watanabe T (2019). Filter router: An enhanced router design for efficient stacked shared cache network. IEICE Electronics Express 16:14 (20190358-20190358). Online publication date: 2019.
https://doi.org/10.1587/elex.16.20190358
Donvanavard B, Monazzah A, Dutt N, Muck T (2018). Exploring Hybrid Memory Caches in Chip Multiprocessors. 2018 13th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC) (1-8). Online publication date: Jul-2018.
https://doi.org/10.1109/ReCoSoC.2018.8449386
