A flexible data to L2 cache mapping approach for future multicore processors

Authors:

Sangyeun Cho

MSPC '06: Proceedings of the 2006 workshop on Memory system performance and correctness

Pages 92 - 101

https://doi.org/10.1145/1178597.1178613

Published: 22 October 2006

Abstract

This paper proposes and studies a distributed L2 cache management approach based on page-level mapping of data to cache slices in a future processor chip comprising many cores. L2 cache management is a crucial aspect of multicore processor design: it must overcome non-uniform cache access latency to sustain high program performance, and it must reduce on-chip network traffic and the related power consumption. Unlike previously studied "pure" hardware-based private and shared cache designs, the proposed OS-microarchitecture approach can mimic a wide spectrum of L2 caching policies without complex hardware support. Moreover, processors and cache slices can be isolated from each other without hardware modifications, which improves chip reliability characteristics. We discuss the key design issues and implementation strategies of the proposed approach, and present an experimental result that shows its promise.
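To make the idea of page-level data to cache slice mapping more concrete, the following is a minimal sketch, not the paper's implementation. It assumes that hardware selects the slice from the physical page number with a simple modulo rule, and that the OS steers data by choosing which free page frame to hand out. The abstract confirms neither detail; the page size, the slice count, and every function name here are hypothetical.

```python
from typing import Iterable, Optional, Set

PAGE_SIZE = 4096   # assumed page size in bytes
NUM_SLICES = 16    # assumed number of L2 cache slices (one per core)


def slice_of(phys_addr: int) -> int:
    """Assumed hardware rule: the physical page number selects the slice."""
    return (phys_addr // PAGE_SIZE) % NUM_SLICES


def pick_frame(free_frames: Iterable[int], allowed_slices: Set[int]) -> Optional[int]:
    """Hypothetical OS policy hook: return the first free page frame whose
    slice is in allowed_slices, or None if no such frame is free."""
    for frame in free_frames:
        if frame % NUM_SLICES in allowed_slices:
            return frame
    return None


def private_like(core: int) -> Set[int]:
    """Mimic a private L2: a core's pages go only to its own slice."""
    return {core}


def shared_like(faulty: Iterable[int] = ()) -> Set[int]:
    """Mimic a shared L2: pages may go to any slice, minus isolated ones."""
    return set(range(NUM_SLICES)) - set(faulty)


if __name__ == "__main__":
    free = range(16, 64)
    # Core 3 asks for a page under the private-like policy.
    frame = pick_frame(free, private_like(3))
    print(frame, slice_of(frame * PAGE_SIZE))
    # Shared-like policy with slice 4 isolated, e.g. after a fault.
    print(pick_frame(range(4, 64), shared_like(faulty={4})))
```

Under these assumptions, the caching policy is just the set of slices passed to `pick_frame`: a single-element set behaves like a private cache, the full set behaves like a shared cache, and removing a slice from the set isolates it without any hardware change.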

Published In

MSPC '06: Proceedings of the 2006 workshop on Memory system performance and correctness

October 2006

114 pages

ISBN:1595935789

DOI:10.1145/1178597

General Chair: Antony Hosking (Purdue U)

Program Chair: Ali-Reza Adl-Tabatabai (Intel)

Copyright © 2006 ACM.

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MSPC '06

Sponsor:

SIGPLAN

MSPC '06: ACM SIGPLAN Workshop on Memory Systems Performance and Correctness 2006

October 22, 2006

San Jose, California

Acceptance Rates

Overall Acceptance Rate 6 of 20 submissions, 30%
