DOI: 10.1109/ISCA.2005.42

RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence

Published: 01 May 2005

Abstract

It has been shown that many requests miss in all remote nodes in shared-memory multiprocessors. We are motivated by the observation that this behavior extends to much coarser-grain areas of memory. We define a region to be a contiguous, aligned memory area whose size is a power of two, and observe that many requests find that no other node caches a block in the same region, even for regions as large as 16K bytes. We propose RegionScout, a family of simple filter mechanisms that dynamically detect most non-shared regions. A node with a RegionScout filter can determine in advance that a request will miss in all remote nodes. RegionScout filters are implemented as a layered extension over existing snoop-based coherence systems. They require no changes to existing coherence protocols or caches and impose no constraints on what can be cached simultaneously. Their operation is completely transparent to software and the operating system. RegionScout filters require little additional storage and a single additional global signal. These characteristics are made possible by utilizing imprecise information about the regions cached in each node. Since they rely on dynamically collected information, RegionScout filters can adapt to changing sharing patterns. We present two applications of RegionScout. In the first, RegionScout is used to avoid broadcasts for non-shared regions, thus reducing bandwidth. In the second, RegionScout is used to avoid snoop-induced tag lookups, thus reducing energy.
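
The filtering idea in the abstract can be illustrated with a small software model. The following is a minimal sketch under stated assumptions, not the paper's hardware design: the class and function names (`Node`, `request`, `region_of`), the 64-byte block size, the 256-entry counter table, and the modulo hash are all hypothetical choices made for illustration. Each node keeps imprecise per-bucket counts of the regions it caches and a set of regions it has learned are cached nowhere else.

```python
from collections import Counter

REGION_BITS = 14  # 16K-byte regions, the largest size the abstract mentions
BLOCK_BITS = 6    # assumption: 64-byte cache blocks


def region_of(addr, region_bits=REGION_BITS):
    """Region number of a byte address: regions are aligned powers of two."""
    return addr >> region_bits


class Node:
    """One node with a hypothetical RegionScout-style filter."""

    def __init__(self, hash_entries=256):
        self.hash_entries = hash_entries
        self.cached_blocks = set()      # stands in for the real cache tags
        self.region_counts = Counter()  # imprecise: cached blocks per hash bucket
        self.non_shared = set()         # regions known to be cached nowhere else

    def _bucket(self, region):
        return region % self.hash_entries

    def fill(self, addr):
        block = addr >> BLOCK_BITS
        if block not in self.cached_blocks:
            self.cached_blocks.add(block)
            self.region_counts[self._bucket(region_of(addr))] += 1

    def evict(self, addr):
        block = addr >> BLOCK_BITS
        if block in self.cached_blocks:
            self.cached_blocks.remove(block)
            self.region_counts[self._bucket(region_of(addr))] -= 1

    def may_cache_region(self, region):
        """Never a false negative; bucket aliasing can give false positives."""
        return self.region_counts[self._bucket(region)] > 0

    def snoop(self, addr):
        """Observe a remote request: drop any non-shared claim for its region
        and report whether this node may cache a block in that region."""
        region = region_of(addr)
        self.non_shared.discard(region)
        return self.may_cache_region(region)


def request(nodes, requester, addr):
    """Issue a miss from nodes[requester]; return True if it had to broadcast."""
    node = nodes[requester]
    region = region_of(addr)
    if region in node.non_shared:
        node.fill(addr)  # known to miss in all remote nodes: skip the broadcast
        return False
    # Every remote node must see the broadcast, so do not short-circuit.
    replies = [other.snoop(addr) for i, other in enumerate(nodes) if i != requester]
    if not any(replies):  # models the single global "region not shared" signal
        node.non_shared.add(region)
    node.fill(addr)
    return True


nodes = [Node(), Node()]
first = request(nodes, 0, 0x10000)   # first touch of the region broadcasts
second = request(nodes, 0, 0x10040)  # same region, now filtered
third = request(nodes, 1, 0x10080)   # node 1 broadcasts; node 0 drops its claim
```

In this model, a `False` return from `request` corresponds to the first application (a broadcast avoided), while a `False` return from `may_cache_region` is the point at which a node could skip a snoop-induced tag lookup, the second application. Because the counts are kept per hash bucket rather than per region, the filter can only err toward reporting a region as possibly shared.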


Published In

ISCA '05: Proceedings of the 32nd annual international symposium on Computer Architecture
June 2005
541 pages
ISBN: 076952270X
  • ACM SIGARCH Computer Architecture News, Volume 33, Issue 2
    ISCA 2005
    May 2005
    531 pages
    ISSN: 0163-5964
    DOI: 10.1145/1080695

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Conference

ISCA05
Acceptance Rates

ISCA '05 Paper Acceptance Rate 45 of 194 submissions, 23%;
Overall Acceptance Rate 543 of 3,203 submissions, 17%
