Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article

Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect

Published: 01 January 2020 Publication History

Abstract

High performance multi-GPU computing becomes an inevitable trend due to the ever-increasing demand on computation capability in emerging domains such as deep learning, big data and planet-scale simulations. However, the lack of deep understanding on how modern GPUs can be connected and the real impact of state-of-the-art interconnect technology on multi-GPU application performance become a hurdle. In this paper, we fill the gap by conducting a thorough evaluation on five latest types of modern GPU interconnects: PCIe, NVLink-V1, NVLink-V2, NVLink-SLI and NVSwitch, from six high-end servers and HPC platforms: NVIDIA P100-DGX-1, V100-DGX-1, DGX-2, OLCF's SummitDev and Summit supercomputers, as well as an SLI-linked system with two NVIDIA Turing RTX-2080 GPUs. Based on the empirical evaluation, we have observed four new types of GPU communication network NUMA effects: three are triggered by NVLink's topology, connectivity and routing, while one is caused by PCIe chipset design issue. These observations indicate that, for an application running in a multi-GPU node, choosing the right GPU combination can impose considerable impact on GPU communication efficiency, as well as the application's overall performance. Our evaluation can be leveraged in building practical multi-GPU performance models, which are vital for GPU task allocation, scheduling and migration in a shared environment (e.g., AI cloud and HPC centers), as well as communication-oriented performance tuning.

Cited By

View all
  • (2024)XGNN: Boosting Multi-GPU GNN Training via Global GNN Memory StoreProceedings of the VLDB Endowment10.14778/3641204.364121917:5(1105-1118)Online publication date: 1-Jan-2024
  • (2024)Holmes: Towards Distributed Training Across Clusters with Heterogeneous NIC EnvironmentProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673095(514-523)Online publication date: 12-Aug-2024
  • (2024)Heterogeneous Intra-Pipeline Device-Parallel AggregationsProceedings of the 20th International Workshop on Data Management on New Hardware10.1145/3662010.3663441(1-10)Online publication date: 10-Jun-2024
  • Show More Cited By

Index Terms

  1. Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect
        Index terms have been assigned to the content through auto-classification.

        Recommendations

        Comments

        Information & Contributors

        Information

        Published In

        Publisher

        IEEE Press

        Publication History

        Published: 01 January 2020

        Qualifiers

        • Research-article

        Contributors

        Other Metrics

        Bibliometrics & Citations

        Bibliometrics

        Article Metrics

        • Downloads (Last 12 months)0
        • Downloads (Last 6 weeks)0
        Reflects downloads up to 17 Oct 2024

        Other Metrics

        Citations

        Cited By

        View all
        • (2024)XGNN: Boosting Multi-GPU GNN Training via Global GNN Memory StoreProceedings of the VLDB Endowment10.14778/3641204.364121917:5(1105-1118)Online publication date: 1-Jan-2024
        • (2024)Holmes: Towards Distributed Training Across Clusters with Heterogeneous NIC EnvironmentProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673095(514-523)Online publication date: 12-Aug-2024
        • (2024)Heterogeneous Intra-Pipeline Device-Parallel AggregationsProceedings of the 20th International Workshop on Data Management on New Hardware10.1145/3662010.3663441(1-10)Online publication date: 10-Jun-2024
        • (2024)Enhancing Intra-Node GPU-to-GPU Performance in MPI+UCX through Multi-Path CommunicationProceedings of the 3rd International Workshop on Extreme Heterogeneity Solutions10.1145/3642961.3643800(9-14)Online publication date: 2-Mar-2024
        • (2024)GPU-Initiated Resource Allocation for Irregular WorkloadsProceedings of the 3rd International Workshop on Extreme Heterogeneity Solutions10.1145/3642961.3643799(1-8)Online publication date: 2-Mar-2024
        • (2024)Embedding Optimization for Training Large-scale Deep Learning Recommendation Systems with EMBarkProceedings of the 18th ACM Conference on Recommender Systems10.1145/3640457.3688111(622-632)Online publication date: 8-Oct-2024
        • (2024)Non-Blocking GPU-CPU Notifications to Enable More GPU-CPU ParallelismProceedings of the International Conference on High Performance Computing in Asia-Pacific Region10.1145/3635035.3635036(1-11)Online publication date: 18-Jan-2024
        • (2024)Optimal Model Partitioning with Low-Overhead Profiling on the PIM-based Platform for Deep Learning InferenceACM Transactions on Design Automation of Electronic Systems10.1145/362859929:2(1-22)Online publication date: 14-Feb-2024
        • (2024)TCCL: Discovering Better Communication Paths for PCIe GPU ClustersProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 310.1145/3620666.3651362(999-1015)Online publication date: 27-Apr-2024
        • (2024)Heet: Accelerating Elastic Training in Heterogeneous Deep Learning ClustersProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 210.1145/3620665.3640375(499-513)Online publication date: 27-Apr-2024
        • Show More Cited By

        View Options

        View options

        Get Access

        Login options

        Media

        Figures

        Other

        Tables

        Share

        Share

        Share this Publication link

        Share on social media