DOI: 10.1145/3569951.3593595
research-article
Public Access

DPU-Bench: A Micro-Benchmark Suite to Measure Offload Efficiency Of SmartNICs

Published: 10 September 2023

Abstract

Smart Network Interface Cards (SmartNICs), such as the NVIDIA BlueField-2 Data Processing Unit (DPU), have grown rapidly in popularity over the last few years. Because they are equipped with their own cores and memory, they can perform tasks beyond those of a regular NIC, and HPC researchers are designing new ways to use them. For example, offloading communication to a SmartNIC frees the host CPU to perform more computationally heavy tasks. However, one question remains: how much of that work can be distributed among processes placed on the SmartNIC before performance degrades? We present DPU-Bench, a low-level micro-benchmark suite built on IB-Verbs primitives that lets HPC users examine how many processes to place on one or more SmartNICs in order to offload a given communication pattern efficiently. In this paper, we examine direct algorithms at a medium scale with different work assignment mechanisms, and we give insights into the trends observed as the number of worker processes and the message size vary.
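
The abstract does not spell out the work assignment mechanisms, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes that a "direct" pattern means every host process sends one message straight to every other host process, and it compares two hypothetical ways of dividing those transfers among SmartNIC worker processes: contiguous blocks and round-robin. All function names are illustrative and are not taken from DPU-Bench, and no IB-Verbs calls are made.

```python
from itertools import permutations


def direct_alltoall_transfers(num_hosts):
    """Every ordered (source, destination) pair with source != destination.

    Assumption: this models a direct all-to-all, not necessarily the
    exact pattern measured by DPU-Bench.
    """
    return list(permutations(range(num_hosts), 2))


def assign_block(transfers, num_workers):
    """Hypothetical scheme 1: one contiguous chunk per worker.

    Chunk sizes differ by at most one transfer.
    """
    base, extra = divmod(len(transfers), num_workers)
    chunks, start = [], 0
    for worker in range(num_workers):
        size = base + (1 if worker < extra else 0)
        chunks.append(transfers[start:start + size])
        start += size
    return chunks


def assign_round_robin(transfers, num_workers):
    """Hypothetical scheme 2: transfer k goes to worker k % num_workers."""
    return [transfers[worker::num_workers] for worker in range(num_workers)]


if __name__ == "__main__":
    transfers = direct_alltoall_transfers(4)
    for name, scheme in (("block", assign_block),
                         ("round-robin", assign_round_robin)):
        per_worker = scheme(transfers, 5)
        # Print how many transfers each worker would have to drive.
        print(name, [len(chunk) for chunk in per_worker])
```

Both schemes give every worker nearly the same number of transfers; they differ in which sources and destinations each worker touches, which is the kind of choice a benchmark of this sort would let a user compare as the worker count and message size change.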


Cited By

  • (2024) Offloading NVMe over Fabrics (NVMe-oF) to SmartNICs on an at-scale Distributed Testbed. 2024 IEEE 10th International Conference on Network Softwarization (NetSoft), 316-318. DOI: 10.1109/NetSoft60951.2024.10588915. Online publication date: 24-Jun-2024
  • (2023) Battle of the BlueFields: An In-Depth Comparison of the BlueField-2 and BlueField-3 SmartNICs. 2023 IEEE Symposium on High-Performance Interconnects (HOTI), 41-48. DOI: 10.1109/HOTI59126.2023.00020. Online publication date: Aug-2023

Published In

PEARC '23: Practice and Experience in Advanced Research Computing 2023: Computing for the Common Good
July 2023
519 pages
ISBN: 9781450399852
DOI: 10.1145/3569951
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Benchmarks
  2. High-Performance Computing Interconnects
  3. InfiniBand
  4. Micro-Benchmarks
  5. Network-Based Computing
  6. SmartNICS

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PEARC '23

Acceptance Rates

Overall Acceptance Rate 133 of 202 submissions, 66%

Article Metrics

  • Downloads (last 12 months): 293
  • Downloads (last 6 weeks): 37
Download counts reflect activity up to 16 Oct 2024.
