Improving Multiprocessor Performance with Coarse-Grain Coherence Tracking

Authors:

Jason F. Cantin,

Mikko H. Lipasti,

James E. Smith

ACM SIGARCH Computer Architecture News, Volume 33, Issue 2

Pages 246 - 257

https://doi.org/10.1145/1080695.1069991

Published: 01 May 2005

Abstract

To maintain coherence in conventional shared-memory multiprocessor systems, processors first check other processors' caches before obtaining data from memory. This coherence checking adds latency to memory requests and leads to large amounts of interconnect traffic in broadcast-based systems. Our results for a set of commercial, scientific and multiprogrammed workloads show that on average 67% (and up to 94%) of broadcasts are unnecessary. Coarse-Grain Coherence Tracking is a new technique that supplements a conventional coherence mechanism and optimizes the performance of coherence enforcement. The Coarse-Grain Coherence mechanism monitors the coherence status of large regions of memory, and uses that information to avoid unnecessary broadcasts. Coarse-Grain Coherence Tracking is shown to eliminate 55-97% of the unnecessary broadcasts, and improve performance by 8.8% on average (and up to 21.7%).
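The idea of tracking coherence status per region and skipping broadcasts for regions known to be unshared can be illustrated with a small software model. The following is a minimal sketch under stated assumptions, not the hardware design described in the paper: the class name `RegionTracker`, the 4 KB region size, and the `others_cache_region` flag (a stand-in for the snoop response a real system would return) are all hypothetical.

```python
from dataclasses import dataclass, field

REGION_BYTES = 4096  # assumed region size, chosen only for illustration


@dataclass
class RegionTracker:
    """Toy per-processor table of regions believed to be unshared (hypothetical)."""
    region_bytes: int = REGION_BYTES
    exclusive_regions: set = field(default_factory=set)
    broadcasts: int = 0
    direct_requests: int = 0

    def region_of(self, address: int) -> int:
        # Map a byte address to the index of the region that contains it.
        return address // self.region_bytes

    def request(self, address: int, others_cache_region: bool) -> str:
        """Issue a memory request and report whether a broadcast was needed.

        others_cache_region models the snoop response: True if any other
        processor holds lines from this region. It is only consulted when
        a broadcast is actually sent.
        """
        region = self.region_of(address)
        if region in self.exclusive_regions:
            # Region is known to be unshared, so go straight to memory.
            self.direct_requests += 1
            return "direct"
        self.broadcasts += 1
        if not others_cache_region:
            # No other processor caches this region; remember that.
            self.exclusive_regions.add(region)
        return "broadcast"

    def external_request(self, address: int) -> None:
        # Another processor touched the region, so it is no longer unshared.
        self.exclusive_regions.discard(self.region_of(address))


tracker = RegionTracker()
print(tracker.request(0x1000, others_cache_region=False))  # first touch of the region
print(tracker.request(0x1040, others_cache_region=False))  # same region, tracked as unshared
tracker.external_request(0x1080)                           # another processor enters the region
print(tracker.request(0x10C0, others_cache_region=True))   # must broadcast again
```

In this model the first request to a region always broadcasts, and later requests to the same region avoid the broadcast until another processor is observed touching it. A larger region size would let one broadcast cover more lines, but would also make it more likely that some other processor shares the region.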

Published In

ACM SIGARCH Computer Architecture News Volume 33, Issue 2

ISCA 2005

May 2005

531 pages

ISSN:0163-5964

DOI:10.1145/1080695

ISCA '05: Proceedings of the 32nd annual international symposium on Computer Architecture
June 2005
541 pages
ISBN:076952270X

Copyright © 2005 Authors.

Publisher

Association for Computing Machinery

New York, NY, United States
