
Source-Code-Correlated Cache Coherence Characterization of OpenMP Benchmarks

Published: 01 June 2007

Abstract

Cache coherence in shared-memory multiprocessor systems has been studied mostly from an architecture viewpoint, often by means of aggregating metrics. In many cases, aggregate events provide insufficient information for programmers to understand and optimize the coherence behavior of their applications. A better understanding would be given by source code correlations of not only aggregate events, but also finer granularity metrics directly linked to high-level source code constructs, such as source lines and data structures. In this paper, we explore a novel application-centric approach to studying coherence traffic. We develop a coherence analysis framework based on incremental coherence simulation of actual reference traces. We provide tool support to extract these reference traces and synchronization information from OpenMP threads at runtime using dynamic binary rewriting of the application executable. These traces are fed to ccSIM, our cache-coherence simulator. The novelty of ccSIM lies in its ability to relate low-level cache coherence metrics (such as coherence misses and their causative invalidations) to high-level source code constructs including source code locations and data structures. We explore the degree of freedom in interleaving data traces from different processors and assess simulation accuracy in comparison to metrics obtained from hardware performance counters. Our quantitative results show that: 1) Cache coherence traffic can be simulated with a considerable degree of accuracy for SPMD programs, as the invalidation traffic closely matches the corresponding hardware performance counters. 2) Detailed, high-level coherence statistics are very useful in detecting, isolating, and understanding coherence bottlenecks. We use ccSIM with several well-known benchmarks and find coherence optimization opportunities leading to significant reductions in coherence traffic and savings in wall-clock execution time.
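The abstract's core idea, replaying per-thread reference traces through a coherence protocol and charging each coherence miss back to a source location, can be illustrated with a toy sketch. Everything below (the `MSISim` class, the trace records, the `a.c:NN` source lines, the cache-line size) is hypothetical and is not ccSIM's actual interface; it assumes a minimal invalidation-based MSI protocol and a trace that has already been interleaved across processors.

```python
# Toy sketch, NOT ccSIM: replay an interleaved per-thread reference trace
# through a minimal invalidation-based MSI protocol and charge each
# coherence miss (a miss caused by a remote write's invalidation) to the
# source line that issued the access. All names and trace records are
# hypothetical.
from collections import defaultdict

LINE = 64  # assumed cache-line size in bytes

class MSISim:
    def __init__(self, nprocs):
        self.nprocs = nprocs
        self.state = defaultdict(lambda: ["I"] * nprocs)   # per-line MSI states
        self.invalidator = {}           # (line, proc) -> src of causative write
        self.coherence_misses = defaultdict(int)  # reader src line -> count

    def access(self, proc, addr, is_write, src):
        line = addr // LINE
        st = self.state[line]
        if st[proc] == "I" and (line, proc) in self.invalidator:
            # The line was cached earlier but a remote write invalidated it:
            # this re-fetch is a coherence miss, attributed to 'src'.
            self.coherence_misses[src] += 1
            del self.invalidator[(line, proc)]
        if is_write:
            for p in range(self.nprocs):        # invalidate remote copies
                if p != proc and st[p] != "I":
                    st[p] = "I"
                    self.invalidator[(line, p)] = src
            st[proc] = "M"
        elif st[proc] == "I":
            for p in range(self.nprocs):        # remote read downgrades M -> S
                if st[p] == "M":
                    st[p] = "S"
            st[proc] = "S"

# Interleaved trace records: (proc, addr, is_write, source_location)
trace = [(0, 0x1000, True,  "a.c:10"),   # P0 writes the line
         (1, 0x1000, False, "a.c:20"),   # P1's cold miss (not coherence)
         (0, 0x1000, True,  "a.c:10"),   # P0's write invalidates P1's copy
         (1, 0x1000, False, "a.c:20")]   # P1 re-fetches: coherence miss

sim = MSISim(2)
for rec in trace:
    sim.access(*rec)
print(dict(sim.coherence_misses))  # {'a.c:20': 1}
```

In the paper's terms, `invalidator` retains the source location of the causative invalidation, so both ends of an invalidation/miss pair can be reported; a real interleaving would be derived from the per-thread traces together with the recorded synchronization order rather than hand-written as above.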


Cited By

  • "Initial Results on Computational Performance of Intel Many Integrated Core, Sandy Bridge, and Graphical Processing Unit Architectures," Concurrency and Computation: Practice & Experience, vol. 27, no. 3, pp. 581-593, Mar. 2015. doi:10.1002/cpe.3248
  • "Experimenting with Low-Overhead OpenMP Runtime on IBM Blue Gene/Q," IBM Journal of Research and Development, vol. 57, no. 1, pp. 91-98, Jan. 2013. doi:10.1147/JRD.2012.2228769
  • "Memory Trace Compression and Replay for SPMD Systems Using Extended PRSDs," ACM SIGMETRICS Performance Evaluation Review, vol. 38, no. 4, pp. 30-36, Mar. 2011. doi:10.1145/1964218.1964224
  • "A Methodology to Characterize Critical Section Bottlenecks in DSM Multiprocessors," Proc. 15th Int'l Euro-Par Conf. Parallel Processing, pp. 149-161, Aug. 2009. doi:10.1007/978-3-642-03869-3_17


Published In

IEEE Transactions on Parallel and Distributed Systems, Volume 18, Issue 6 (June 2007), 142 pages.
Publisher: IEEE Press


      Author Tags

      1. Cache memories
      2. SMPs
3. coherence protocols
      4. dynamic binary rewriting
      5. program instrumentation
      6. simulation


