DOI: 10.1145/3569951.3593595
Research Article — PEARC Conference Proceedings

DPU-Bench: A Micro-Benchmark Suite to Measure Offload Efficiency of SmartNICs

Published: 10 September 2023

Abstract

    Smart Network Interface Cards (SmartNICs), such as the NVIDIA BlueField-2 Data Processing Unit (DPU), have experienced massive growth in popularity over the last few years. Equipped with their own cores and memory, they can perform actions beyond those of a regular NIC, and HPC researchers are designing new ways to use them. For example, offloading communication to a SmartNIC frees the CPU "host" to perform more computationally heavy tasks. However, one question remains: how much of that work can be distributed among processes placed on the SmartNIC before performance degrades? We present DPU-Bench, a low-level micro-benchmark suite built on IB-Verbs primitives that lets HPC users determine how many processes to place on one or more SmartNICs in order to efficiently offload a given communication pattern. In this paper we examine direct algorithms at medium scale with different work-assignment mechanisms and give insights into the trends observed with varying numbers of worker processes and message sizes.
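    The abstract's notion of "work-assignment mechanisms" can be illustrated with a small sketch. The following is not code from DPU-Bench; it is a hypothetical example showing two common ways chunks of an offloaded message exchange might be divided among DPU worker processes — contiguous blocks versus round-robin dealing. All names and scheme details are assumptions for illustration.

    ```python
    # Hypothetical illustration (not from the paper): two simple schemes for
    # assigning message chunks of an offloaded communication pattern to
    # worker processes running on a SmartNIC/DPU.

    def block_assignment(num_chunks, num_workers):
        """Give each worker a contiguous block of chunks, spreading any
        remainder across the first workers."""
        base, extra = divmod(num_chunks, num_workers)
        assignment, start = [], 0
        for w in range(num_workers):
            count = base + (1 if w < extra else 0)
            assignment.append(list(range(start, start + count)))
            start += count
        return assignment

    def round_robin_assignment(num_chunks, num_workers):
        """Deal chunks out to workers cyclically (worker w gets chunks
        w, w + num_workers, w + 2*num_workers, ...)."""
        return [list(range(w, num_chunks, num_workers))
                for w in range(num_workers)]

    # Example: 10 chunks over 4 DPU worker processes.
    print(block_assignment(10, 4))        # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
    print(round_robin_assignment(10, 4))  # [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
    ```

    Block assignment keeps each worker's chunks contiguous (friendly to large sequential transfers), while round-robin balances load when per-chunk cost varies; which wins in practice is exactly the kind of question a benchmark sweep over worker counts and message sizes is meant to answer.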


Cited By

    • (2023) Battle of the BlueFields: An In-Depth Comparison of the BlueField-2 and BlueField-3 SmartNICs. In 2023 IEEE Symposium on High-Performance Interconnects (HOTI), 41–48. https://doi.org/10.1109/HOTI59126.2023.00020. Online publication date: August 2023.


    Published In

    PEARC '23: Practice and Experience in Advanced Research Computing
    July 2023
    519 pages
    ISBN:9781450399852
    DOI:10.1145/3569951
    Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. Benchmarks
    2. High-Performance Computing Interconnects
    3. InfiniBand
    4. Micro-Benchmarks
    5. Network-Based Computing
    6. SmartNICs

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    PEARC '23

    Acceptance Rates

    Overall Acceptance Rate 133 of 202 submissions, 66%

