research-article

Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect

Authors:

Ang Li,

Xu Liu,

Kevin J. Barker

IEEE Transactions on Parallel and Distributed Systems, Volume 31, Issue 1

Pages 94 - 110

https://doi.org/10.1109/TPDS.2019.2928289

Published: 01 January 2020

Abstract

High-performance multi-GPU computing has become an inevitable trend due to the ever-increasing demand for computational capability in emerging domains such as deep learning, big data and planet-scale simulations. However, the lack of a deep understanding of how modern GPUs can be connected, and of the real impact of state-of-the-art interconnect technology on multi-GPU application performance, has become a hurdle. In this paper, we fill the gap by conducting a thorough evaluation of the five latest types of modern GPU interconnects: PCIe, NVLink-V1, NVLink-V2, NVLink-SLI and NVSwitch, on six high-end servers and HPC platforms: NVIDIA P100-DGX-1, V100-DGX-1, DGX-2, OLCF's SummitDev and Summit supercomputers, as well as an SLI-linked system with two NVIDIA Turing RTX-2080 GPUs. Based on the empirical evaluation, we have observed four new types of NUMA effects in GPU communication networks: three are triggered by NVLink's topology, connectivity and routing, while one is caused by a PCIe chipset design issue. These observations indicate that, for an application running in a multi-GPU node, choosing the right GPU combination can have a considerable impact on GPU communication efficiency, as well as on the application's overall performance. Our evaluation can be leveraged in building practical multi-GPU performance models, which are vital for GPU task allocation, scheduling and migration in a shared environment (e.g., AI clouds and HPC centers), as well as for communication-oriented performance tuning.
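The practical takeaway above, that the choice of GPU combination matters for communication efficiency, can be illustrated with a small selection routine of the kind a topology-aware scheduler might use. This is a minimal sketch, not the authors' methodology: it assumes the user has already measured a pairwise GPU-to-GPU bandwidth matrix with their own peer-to-peer copy microbenchmark, and both the function name `best_gpu_combination` and the max-min (bottleneck link) scoring rule are hypothetical choices made for illustration. It exhaustively scores every k-GPU subset by its slowest directed link and returns the best one.

```python
from __future__ import annotations

from itertools import combinations
from typing import Sequence


def best_gpu_combination(
    bw: Sequence[Sequence[float]], k: int
) -> tuple[tuple[int, ...], float]:
    """Pick the k-GPU subset whose slowest internal link is the fastest.

    Hypothetical helper for illustration. bw[i][j] is the measured
    bandwidth from GPU i to GPU j (any consistent unit); the matrix is
    assumed to come from the user's own P2P microbenchmark. Diagonal
    entries are ignored. On ties, the lexicographically first subset wins.
    """
    n = len(bw)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of GPUs")

    best_combo = None
    best_score = float("-inf")
    for combo in combinations(range(n), k):
        # Assumed scoring rule: a group's collective traffic is limited
        # by its slowest directed GPU-to-GPU link (the bottleneck).
        score = min(bw[i][j] for i in combo for j in combo if i != j)
        if score > best_score:
            best_combo, best_score = combo, score
    return best_combo, best_score
```

For example, with a hypothetical, unitless 4-GPU matrix in which GPUs 0-1 and 2-3 share fast links (4.0) and every other pair is slower (1.0), `best_gpu_combination(bw, 2)` returns `((0, 1), 4.0)`, the first of the two equally good pairs. Exhaustive search keeps the logic obvious and is adequate for single-node GPU counts, but it grows combinatorially with the number of GPUs; a real scheduler might also weigh latency or the application's actual communication pattern rather than a single bottleneck value.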

Published In

1045-9219 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Publisher

IEEE Press
