DOI: 10.5555/1874620.1874807

Group-caching for NoC based multicore cache coherent systems

Published: 20 April 2009

Abstract

Most CMPs use on-chip networks to connect cores and tend to integrate a larger number of simple cores on a single die. Low-radix networks, such as the 2D mesh, are widely used in tiled CMPs because they can be mapped to on-chip networks efficiently. However, low-radix networks have a large diameter, which leads to high network latency. In this paper, we propose a group-caching design for NoC-based multicore cache-coherent systems. In our design, the on-chip L2 banks are organized into multiple groups. Each cache group behaves like a shared L2 cache for the cores inside that group, while coherence between cache groups is maintained by coherence messages. In addition, group-caching adopts a new cache replacement policy to make better use of the aggregate L2 cache capacity. Because most L2 accesses are served by the local cache group, the hop count is significantly lower than in a banked, shared L2 design. Experimental results based on full-system simulation show that, for a 2D mesh, group-caching improves performance by 2% to 8% over the banked, shared L2 design and reduces network energy consumption by 11% to 13%. The results also show that the communication overhead inside a cache group plays an important role in the performance of group-caching.
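
The hop-count argument in the abstract can be illustrated with a small back-of-the-envelope calculation. The sketch below is not the paper's evaluation methodology: it assumes a hypothetical 4x4 mesh with one core and one L2 bank per tile, dimension-ordered routing (so the hop count is the Manhattan distance), and accesses spread uniformly over the banks a core is allowed to use. It ignores coherence traffic between groups and misses that leave the group, both of which the abstract says matter in practice. The function name and the mesh and group sizes are illustrative choices only.

```python
from itertools import product


def avg_hops(mesh_w, mesh_h, group_w, group_h):
    """Mean Manhattan distance from each core to the L2 banks it may access.

    Assumptions (not from the paper): one core and one bank per tile,
    uniform access to every bank in the core's own group, and no traffic
    between groups. Setting the group size equal to the mesh size models
    a single banked L2 shared by all cores.
    """
    tiles = list(product(range(mesh_w), range(mesh_h)))
    total_hops = 0
    pairs = 0
    for cx, cy in tiles:
        core_group = (cx // group_w, cy // group_h)
        for bx, by in tiles:
            # Skip banks that belong to a different cache group.
            if (bx // group_w, by // group_h) != core_group:
                continue
            total_hops += abs(cx - bx) + abs(cy - by)
            pairs += 1
    return total_hops / pairs


# One shared L2 spanning the whole 4x4 mesh versus four 2x2 cache groups.
shared = avg_hops(4, 4, 4, 4)
grouped = avg_hops(4, 4, 2, 2)
print(f"shared L2: {shared} hops, 2x2 groups: {grouped} hops")
```

Under these simplifying assumptions the average distance to an L2 bank drops from 2.5 hops to 1.0 hop. The real benefit depends on how often a request must leave its group and on the overhead inside the group, so this figure should be read only as an intuition for why local groups shorten the common path.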

Cited By

  • (2017) Exploring grouped coherence for clustered hierarchical cache. The Journal of Supercomputing 73:9 (4137-4157). DOI: 10.1007/s11227-017-2024-8. Online publication date: 1-Sep-2017
  • (2014) Dual partitioning multicasting for high-performance on-chip networks. Journal of Parallel and Distributed Computing 74:1 (1858-1871). DOI: 10.1016/j.jpdc.2013.07.002. Online publication date: 1-Jan-2014
  • (2012) An optimized multicore cache coherence design for exploiting communication locality. Proceedings of the great lakes symposium on VLSI (59-62). DOI: 10.1145/2206781.2206797. Online publication date: 3-May-2012

Published In

DATE '09: Proceedings of the Conference on Design, Automation and Test in Europe
April 2009
1776 pages
ISBN: 9783981080155

Sponsors

  • EDAA: European Design Automation Association
  • ECSI
  • EDAC: Electronic Design Automation Consortium
  • SIGDA: ACM Special Interest Group on Design Automation
  • The IEEE Computer Society TTTC
  • The IEEE Computer Society DATC
  • The Russian Academy of Sciences

Publisher

European Design and Automation Association

Leuven, Belgium

Author Tags

  1. CMP
  2. L2 banks
  3. NOC
  4. cache coherence
  5. group-caching
  6. network latency
  7. performance
  8. power

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 518 of 1,794 submissions, 29%
