research-article

Exploring the Role of Large Centralised Caches in Thermal Efficient Chip Design

Published: 28 June 2019

Abstract

In the era of short channel lengths, Dynamic Thermal Management (DTM) has become a challenging task for the architects and designers of modern Chip Multi-Processors (CMPs). The ever-increasing demand for processing power, together with advances in integration technology, produces CMPs with high power density, which in turn raises the effective chip temperature. The higher temperature aggravates reliability problems in the chip circuitry and significantly increases leakage power consumption. Recent DTM techniques apply Dynamic Voltage and Frequency Scaling (DVFS) or task migration to reduce temperature at the cores, the hottest on-chip components, but often ignore the hot on-chip caches. To meet the high data demand of these cores, most modern CMPs are equipped with large multi-level on-chip caches, of which the Last Level Caches (LLCs) occupy the largest on-chip area. These LLCs have significantly high leakage power consumption, which can create on-chip hotspots at the LLCs similar to those at the cores. Because power consumption drives heat dissipation, this work dynamically shrinks the cache size, subject to a performance constraint, primarily to reduce LLC leakage. The turned-off cache portions then act as on-chip thermal buffers that reduce the average and peak temperature of the CMP without affecting the computation. Simulation results show that, at a minimal performance penalty, the proposed cache-based thermal management with an 8MB centralised multi-banked shared LLC reduces peak and average chip temperature by around 5°C, which is comparable with a Greedy DVFS policy.
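
The abstract does not spell out the resizing algorithm, so the following is only an illustrative sketch of the general idea (shrink the LLC while a performance constraint holds, grow it back when the constraint is violated), not the authors' method. The names `EpochStats` and `next_active_banks`, the 5% IPC-loss budget, and the per-epoch, one-bank-at-a-time policy are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class EpochStats:
    """Hypothetical per-epoch measurements (not taken from the paper)."""
    ipc: float           # IPC observed in the last epoch with the current LLC size
    baseline_ipc: float  # reference IPC with all LLC banks powered on


def next_active_banks(active: int, total: int, stats: EpochStats,
                      max_ipc_loss: float = 0.05, min_banks: int = 1) -> int:
    """Pick the number of powered-on LLC banks for the next epoch.

    Assumed policy: power one bank back on when the IPC loss exceeds the
    budget, power one bank off when the loss is at most half the budget
    (simple hysteresis to avoid oscillation), otherwise keep the size.
    """
    if stats.baseline_ipc <= 0:
        raise ValueError("baseline_ipc must be positive")

    loss = 1.0 - stats.ipc / stats.baseline_ipc

    if loss > max_ipc_loss and active < total:
        return active + 1   # constraint violated: grow the cache
    if loss <= max_ipc_loss / 2 and active > min_banks:
        return active - 1   # comfortable slack: gate off one more bank
    return active           # inside the hysteresis band: no change


if __name__ == "__main__":
    # Example with 8 banks, loosely mirroring a multi-banked shared LLC.
    banks = 8
    for ipc in (1.00, 0.99, 0.98, 0.93, 0.96):
        banks = next_active_banks(banks, 8, EpochStats(ipc, 1.00))
        print(f"measured IPC {ipc:.2f} -> {banks} active banks")
```

In a sketch like this, the gated-off banks are the ones that would play the role of thermal buffers; which banks to turn off, and how their contents are handled, is left out because the abstract gives no detail on it.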

Published In

ACM Transactions on Design Automation of Electronic Systems, Volume 24, Issue 5
September 2019
282 pages
ISSN: 1084-4309
EISSN: 1557-7309
DOI: 10.1145/3339837
Editor: Naehyuck Chang
Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 June 2019
Accepted: 01 May 2019
Revised: 01 May 2019
Received: 01 July 2017
Published in TODAES Volume 24, Issue 5

Author Tags

  1. Cache memory
  2. IPC
  3. Last Level Cache (LLC)
  4. chip multi-processors (CMPs)
  5. dynamic power
  6. hotspot
  7. leakage power
  8. reconfiguration time
  9. temperature
  10. thermal buffer

Qualifiers

  • Research-article
  • Research
  • Refereed

Article Metrics

  • Downloads (Last 12 months): 9
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 23 Dec 2024

Cited By

  • (2024) TREAFET: Temperature-Aware Real-Time Task Scheduling for FinFET based Multicores. ACM Transactions on Embedded Computing Systems. DOI: 10.1145/3665276. Online publication date: 16-May-2024
  • (2024) ARCTIC: Approximate Real-Time Computing in a Cache-Conscious Multicore Environment. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43:10 (2944-2957). DOI: 10.1109/TCAD.2024.3384442. Online publication date: 1-Oct-2024
  • (2023) DELICIOUS: Deadline-Aware Approximate Computing in Cache-Conscious Multicore. IEEE Transactions on Parallel and Distributed Systems 34:2 (718-733). DOI: 10.1109/TPDS.2022.3228751. Online publication date: 1-Feb-2023
  • (2022) ACCURATE: Accuracy Maximization for Real-Time Multicore Systems With Energy-Efficient Way-Sharing Caches. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41:12 (5246-5260). DOI: 10.1109/TCAD.2022.3161407. Online publication date: Dec-2022
  • (2021) Prepare: Power-Aware Approximate Real-time Task Scheduling for Energy-Adaptive QoS Maximization. ACM Transactions on Embedded Computing Systems 20:5s (1-25). DOI: 10.1145/3476993. Online publication date: 23-Sep-2021
  • (2021) WaFFLe. ACM Transactions on Architecture and Code Optimization 18:4 (1-25). DOI: 10.1145/3471908. Online publication date: 3-Sep-2021
  • (2021) Developing a Multicore Platform Utilizing Open RISC-V Cores. IEEE Access 9 (120010-120023). DOI: 10.1109/ACCESS.2021.3108475. Online publication date: 2021
  • (2020) RePAiR: A Strategy for Reducing Peak Temperature while Maximising Accuracy of Approximate Real-Time Computing: Work-in-Progress. 2020 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) (8-10). DOI: 10.1109/CODESISSS51650.2020.9244040. Online publication date: 20-Sep-2020
  • (2020) Efficient Cache Resizing policy for DRAM-based LLCs in ChipMultiprocessors. Journal of Systems Architecture (101886). DOI: 10.1016/j.sysarc.2020.101886. Online publication date: Sep-2020
