DOI: 10.5555/2523721.2523760
research-article

The case for a scalable coherence protocol for complex on-chip cache hierarchies in many core systems

Published: 07 October 2013

Abstract

This paper introduces a new coherence protocol that addresses the challenges of complex multilevel cache hierarchies in future many-core systems. To keep coherence protocol complexity bounded, systems of this type require inclusiveness to track coherence information across levels, but this can introduce unsustainable costs for directory structures. Cost-reduction decisions may in turn introduce artificial inefficiencies in the on-chip cache hierarchy, especially when the number of cores and the size of the private caches are large. The coherence protocol presented in this work, denoted MOSAIC, introduces a new approach to tackle this problem. In energy terms, the protocol scales like a conventional directory coherence protocol, but it relaxes the inclusiveness of the shared information. This allows the performance implications of reducing directory size and associativity to be overcome. Contrary to the common belief that inclusiveness is inescapable when attempting to keep complexity constrained, MOSAIC is even simpler than a conventional directory. The results of our evaluation show that the approach is quite insensitive, in terms of performance and energy expenditure, to the size and associativity of the directory.
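
The directory cost argument in the abstract can be made concrete with a back-of-the-envelope estimate. The sketch below is not MOSAIC and is not taken from the paper: it assumes a conventional full-map directory (one presence bit per core plus a tag per entry) that is inclusive of all private cache blocks. The function names, the 64-byte block size, and the 40-bit tag width are illustrative assumptions.

```python
def full_map_directory_bits(cores, private_cache_bytes,
                            block_bytes=64, tag_bits=40, coverage=1.0):
    """Estimate the storage, in bits, of a full-map inclusive directory.

    Assumptions (illustrative, not from the paper):
      - every core has one private cache of `private_cache_bytes`
      - each directory entry holds a tag plus one presence bit per core
      - coverage=1.0 means one entry per private cache block, which is
        what strict inclusiveness needs in the worst case
    """
    blocks_per_cache = private_cache_bytes // block_bytes
    entries = int(cores * blocks_per_cache * coverage)
    bits_per_entry = tag_bits + cores
    return entries * bits_per_entry


def directory_overhead(cores, private_cache_bytes,
                       block_bytes=64, tag_bits=40, coverage=1.0):
    """Directory storage as a fraction of the private cache data it tracks."""
    dir_bits = full_map_directory_bits(cores, private_cache_bytes,
                                       block_bytes, tag_bits, coverage)
    tracked_bits = cores * private_cache_bytes * 8
    return dir_bits / tracked_bits


if __name__ == "__main__":
    # 256 KiB private caches; the overhead grows with the core count
    # because each entry carries one presence bit per core.
    for cores in (16, 64, 256):
        print(cores, directory_overhead(cores, 256 * 1024))
```

Under these assumptions the per-block overhead is (tag_bits + cores) / (8 × block_bytes), so it rises with the core count while the number of entries rises with the total private cache capacity. Lowering `coverage` shrinks the directory, which is the kind of cost-reduction decision the abstract describes as a source of inefficiency in an inclusive design.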

Cited By

  • (2018) Mosaic. International Journal of Parallel Programming, 46:6 (1110-1138). DOI: 10.1007/s10766-018-0557-y. Online publication date: 1-Dec-2018
  • (2017) An adaptive cache coherence protocol. Journal of Parallel and Distributed Computing, 102:C (163-174). DOI: 10.1016/j.jpdc.2016.12.020. Online publication date: 1-Apr-2017
  • (2016) DiSquawk. Proceedings of the 13th International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools (1-12). DOI: 10.1145/2972206.2972212. Online publication date: 29-Aug-2016

    Published In

    PACT '13: Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
    October 2013
    422 pages
    ISBN:9781479910212

    Publisher

    IEEE Press

    Author Tags

    1. CMPs
    2. coherence protocol
    3. multi-core

    Qualifiers

    • Research-article

    Acceptance Rates

    PACT '13 Paper Acceptance Rate 36 of 208 submissions, 17%;
    Overall Acceptance Rate 121 of 471 submissions, 26%
