Comparing cache architectures and coherency protocols on x86-64 multicore SMP systems
D Hackenberg, D Molka, WE Nagel - … of the 42nd Annual IEEE/ACM …, 2009 - dl.acm.org
Proceedings of the 42nd Annual IEEE/ACM International Symposium on …, 2009
Across a broad range of applications, multicore technology is the most important factor driving today's microprocessor performance improvements. Closely coupled with this is the growing complexity of memory subsystems, whose several cache levels must be exploited efficiently to achieve optimal application performance. Many important implementation details of these memory subsystems are undocumented. We therefore present a set of sophisticated benchmarks for latency and bandwidth measurements to arbitrary locations in the memory subsystem. We take the coherency state of cache lines into account to analyze the cache coherency protocols and their performance impact. The potential of our approach is demonstrated with an in-depth comparison of ccNUMA multiprocessor systems with AMD (Shanghai) and Intel (Nehalem-EP) quad-core x86-64 processors, both of which feature integrated memory controllers and coherent point-to-point interconnects. Using our benchmarks, we present fundamental memory performance data and architectural properties of both processors. Our comparison reveals in detail how strongly the microarchitectural differences affect the performance of the memory subsystem.
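The latency measurements mentioned in the abstract are commonly built on a pointer-chasing access pattern: each load depends on the result of the previous one, and the addresses follow a random single cycle so that a hardware prefetcher cannot predict them. The sketch below illustrates only that pattern and is not the authors' benchmark. The names `build_chain`, `chase`, and `time_per_access_ns` are hypothetical. Interpreter overhead dominates the absolute numbers in Python, so a real measurement would need a compiled language, thread pinning, and a step that places cache lines in a chosen coherency state, none of which is shown here.

```python
import random
import time


def build_chain(n, seed=0):
    """Return a list `nxt` where following nxt[i] visits all n slots in one cycle.

    Uses Sattolo's algorithm, which yields a uniformly random single-cycle
    permutation, so the access order is hard for a prefetcher to predict.
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i, never i itself: guarantees one cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, steps, start=0):
    """Follow the chain for `steps` dependent accesses; return the final index."""
    i = start
    for _ in range(steps):
        i = nxt[i]  # the next index depends on the value just read
    return i


def time_per_access_ns(n, steps=200_000):
    """Average wall-clock time per dependent access, in nanoseconds."""
    nxt = build_chain(n)
    chase(nxt, n)  # warm-up pass: touches every slot once
    t0 = time.perf_counter_ns()
    chase(nxt, steps)
    t1 = time.perf_counter_ns()
    return (t1 - t0) / steps


if __name__ == "__main__":
    # Assumed working-set sizes, chosen only for illustration.
    for n in (1 << 10, 1 << 14, 1 << 18):
        print(n, round(time_per_access_ns(n), 1))
```

Varying the working-set size `n` is what lets a benchmark of this shape target different levels of the memory hierarchy. In a compiled version, the same chain would typically be laid out with one element per cache line.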