DOI: 10.1109/IPDPS.2011.100
Article

Profiling Directed NUMA Optimization on Linux Systems: A Case Study of the Gaussian Computational Chemistry Code

Published: 16 May 2011
Abstract

    The parallel performance of applications running on Non-Uniform Memory Access (NUMA) platforms is strongly influenced by where memory pages are placed relative to the threads that access them. Linux therefore provides application programming interfaces (APIs) to control this placement. For large parallel codes, however, it can be difficult to determine how and when to use these APIs. In this paper we introduce the NUMAgrind profiling tool, which can be used to simplify this process. It extends the Valgrind binary translation framework with a model that incorporates cache coherency, memory locality domains and interconnect traffic for arbitrary NUMA topologies. Using NUMAgrind, cache misses can be mapped to memory locality domains, page access modes determined, and pages referenced by multiple threads quickly identified. We show how NUMAgrind can be used to guide the use of Linux memory and thread placement APIs in the Gaussian computational chemistry code. The performance of the code before and after applying these APIs is also presented for three different commodity NUMA platforms.
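To make the kind of analysis described in the abstract more concrete, here is a minimal, hypothetical Python sketch. It is not NUMAgrind's implementation and models no caches, coherency or interconnect: every trace record is assumed to already be a cache miss that reaches memory. It also assumes first-touch page placement (approximating Linux's default local-allocation policy) and 4 KiB pages. The names `analyze`, `PageStats` and `PAGE_SIZE` are illustrative only. Given a trace of `(thread, address, is_write)` records and a thread-to-node mapping, it assigns each page a home node, classifies each access as local or remote, records a coarse access mode per page, and lists pages referenced by more than one thread.

```python
"""Toy NUMA page-placement analysis over a memory-access trace.

Hypothetical sketch: each trace record is assumed to be an access that
already missed in the caches (no cache or coherency model is included).
"""
from collections import defaultdict
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumption: 4 KiB base pages


@dataclass
class PageStats:
    home_node: int                              # node the page was placed on
    threads: set = field(default_factory=set)   # threads that touched the page
    reads: int = 0
    writes: int = 0

    @property
    def mode(self) -> str:
        # Coarse access mode: never written vs. written at least once.
        return "read-only" if self.writes == 0 else "read-write"


def analyze(trace, thread_node, page_size=PAGE_SIZE):
    """Return (pages, local, remote, shared) for a trace.

    trace       -- iterable of (thread_id, address, is_write) tuples
    thread_node -- dict mapping thread_id to the NUMA node it runs on
    """
    pages = {}
    local = defaultdict(int)
    remote = defaultdict(int)
    for tid, addr, is_write in trace:
        page = addr // page_size
        node = thread_node[tid]
        stats = pages.get(page)
        if stats is None:
            # Assumed first-touch placement: the page lands on the node
            # of the thread that touches it first.
            stats = pages[page] = PageStats(home_node=node)
        stats.threads.add(tid)
        if is_write:
            stats.writes += 1
        else:
            stats.reads += 1
        # Classify the access as local or remote to the accessing thread.
        if node == stats.home_node:
            local[tid] += 1
        else:
            remote[tid] += 1
    # Pages referenced by more than one thread are the ones worth
    # inspecting before choosing a placement policy.
    shared = sorted(p for p, s in pages.items() if len(s.threads) > 1)
    return pages, dict(local), dict(remote), shared


if __name__ == "__main__":
    thread_node = {0: 0, 1: 1}  # thread 0 on node 0, thread 1 on node 1
    trace = [
        (0, 0x0000, True),   # thread 0 first-touches page 0 -> node 0
        (1, 0x0100, False),  # thread 1 reads page 0: a remote access
        (1, 0x1000, False),  # thread 1 first-touches page 1 -> node 1
        (1, 0x1008, False),  # local read of page 1
    ]
    pages, local, remote, shared = analyze(trace, thread_node)
    for p, s in sorted(pages.items()):
        print(f"page {p}: node {s.home_node}, {s.mode}, threads {sorted(s.threads)}")
    print("local:", local, "remote:", remote, "shared pages:", shared)
```

On the small example trace, page 0 is reported as shared, read-write and homed on node 0, and thread 1's read of it is counted as remote. A real profiler would first filter accesses through a cache model so that only misses are attributed to memory locality domains; feeding raw accesses into this sketch would overcount memory traffic.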




    Information

    Published In

    IPDPS '11: Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium
    May 2011
    1285 pages
    ISBN: 9780769543857

    Publisher

    IEEE Computer Society

    United States




