TABARNAC: Visualizing and resolving memory access issues on NUMA architectures

D Beniamine, M Diener, G Huard, POA Navaux
Proceedings of the 2nd Workshop on Visual Performance Analysis, 2015 - dl.acm.org
In modern parallel architectures, memory accesses are a common bottleneck. Thus, optimizing the way applications access memory is an important way to improve performance and energy consumption. Memory accesses matter even more on NUMA machines, as the access time to data depends on its location in memory. Many efforts have been made to develop adaptive tools that improve memory accesses at runtime by optimizing the mapping of data and threads to NUMA nodes. However, these tools are not able to change the memory access pattern of the original application, so code written without considering memory performance might not benefit from them. Moreover, automatic mapping tools take time to converge towards the best mapping, losing optimization opportunities. A deeper understanding of the memory behavior can help optimize it, removing the need for runtime analysis.
In this paper, we present TABARNAC, a tool for analyzing the memory behavior of parallel applications with a focus on NUMA architectures. TABARNAC provides a new visualization of the memory access behavior, focusing on the distribution of accesses by thread and by structure. This visualization allows the developer to easily understand why performance issues occur and how to fix them. Using TABARNAC, we explain why some applications do not benefit from data and thread mapping. Moreover, we propose code modifications to improve the memory access behavior of several parallel applications.
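To make "distribution of accesses by thread and by structure" concrete, the following is a minimal sketch of that kind of aggregation. It is not TABARNAC's implementation: the trace format, the structure names, and both helper functions are hypothetical, chosen only to illustrate how per-thread counts and pages shared between threads could be derived from an access trace.

```python
from collections import Counter, defaultdict

# Hypothetical trace (an assumption, not TABARNAC's format):
# one (thread_id, structure_name, page_index) tuple per memory access.
trace = [
    (0, "A", 0), (0, "A", 1), (1, "A", 0), (1, "B", 3),
    (2, "B", 3), (2, "B", 4), (3, "B", 4), (0, "B", 3),
]

def accesses_by_structure_and_thread(trace):
    """Count accesses per structure, broken down by thread."""
    counts = defaultdict(Counter)
    for thread_id, structure, _page in trace:
        counts[structure][thread_id] += 1
    return counts

def shared_pages(trace):
    """Return (structure, page) pairs touched by more than one thread."""
    threads = defaultdict(set)
    for thread_id, structure, page in trace:
        threads[(structure, page)].add(thread_id)
    return sorted(key for key, tids in threads.items() if len(tids) > 1)

counts = accesses_by_structure_and_thread(trace)
for structure in sorted(counts):
    # Per-thread access counts for this structure
    print(structure, dict(sorted(counts[structure].items())))

# Pages accessed by several threads are candidates for remote accesses
# on a NUMA machine, whichever node ends up holding them.
print(shared_pages(trace))
```

In a summary like this, a structure whose pages are each touched by a single thread is a good candidate for data and thread mapping, while a structure whose pages are shared by many threads may need a change to the access pattern itself.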