- research-article, November 2023
Democratizing HPC Access and Use with Knowledge Graphs
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, November 2023, Pages 243–251, https://doi.org/10.1145/3624062.3624094
The field of High-Performance Computing (HPC) is undergoing rapid evolution, with an expanding and diverse user base harnessing its unparalleled computational capabilities. As the range of HPC applications grows, newcomers to the field are faced with the ...
- Article, May 2023
SAI: AI-Enabled Speech Assistant Interface for Science Gateways in HPC
- Pouya Kousha,
- Arpan Jain,
- Ayyappa Kolli,
- Matthew Lieber,
- Mingzhe Han,
- Nicholas Contini,
- Hari Subramoni,
- Dhabaleswar K. Panda
Abstract: High-Performance Computing (HPC) is increasingly being used in traditional scientific domains as well as emerging areas like Deep Learning (DL). This has led to a diverse set of professionals who interact with state-of-the-art HPC systems. The ...
- Article, May 2022
Accelerating MPI All-to-All Communication with Online Compression on Modern GPU Clusters
- Qinghua Zhou,
- Pouya Kousha,
- Quentin Anthony,
- Kawthar Shafie Khorassani,
- Aamir Shafi,
- Hari Subramoni,
- Dhabaleswar K. Panda
Abstract: As more High-Performance Computing (HPC) and Deep Learning (DL) applications are adapting to scale using GPUs, the communication of GPU-resident data is becoming vital to end-to-end application performance. Among the available MPI operations in ...
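The idea of compressing per-destination buffers on the fly before an all-to-all exchange can be sketched host-side in a few lines. The zlib codec, payloads, and helper names below are illustrative stand-ins, not the paper's GPU-side design:

```python
import zlib

def alltoall_compressed(send_bufs):
    """Simulate an all-to-all where each rank's per-destination buffer is
    compressed just before the (here: in-memory) exchange and decompressed
    on receipt. send_bufs[i][j] is the bytes rank i sends to rank j."""
    n = len(send_bufs)
    # "Online" compression: compress each outgoing buffer at send time.
    wire = [[zlib.compress(send_bufs[i][j]) for j in range(n)] for i in range(n)]
    # Exchange: rank j receives column j, decompressing each message.
    recv_bufs = [[zlib.decompress(wire[i][j]) for i in range(n)] for j in range(n)]
    return recv_bufs, wire

# Highly compressible payloads, as often seen in gradient-style DL traffic.
bufs = [[bytes([i * 4 + j]) * 4096 for j in range(4)] for i in range(4)]
recv, wire = alltoall_compressed(bufs)
assert recv[2][1] == bufs[1][2]           # rank 2 got what rank 1 sent it
assert len(wire[0][0]) < len(bufs[0][0])  # less data crossed the "wire"
```

Whether the compression pays off depends on payload entropy and codec speed, which is exactly the trade-off an online scheme must win on.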
- research-article, July 2020
Frontera: The Evolution of Leadership Computing at the National Science Foundation
PEARC '20: Practice and Experience in Advanced Research Computing 2020: Catch the Wave, July 2020, Pages 106–111, https://doi.org/10.1145/3311790.3396656
As part of the NSF's cyberinfrastructure vision for a robust mix of high capability and capacity HPC systems, Frontera represents the most recent evolution of trans-petascale resources available to all open science research projects in the U.S. Debuting ...
- research-article, June 2020
NV-group: link-efficient reduction for distributed deep learning on modern dense GPU systems
- Ching-Hsiang Chu,
- Pouya Kousha,
- Ammar Ahmad Awan,
- Kawthar Shafie Khorassani,
- Hari Subramoni,
- Dhabaleswar K. (D K) Panda
ICS '20: Proceedings of the 34th ACM International Conference on Supercomputing, June 2020, Article No.: 6, Pages 1–12, https://doi.org/10.1145/3392717.3392771
The advanced fabrics like NVIDIA NVLink are enabling the deployment of dense Graphics Processing Unit (GPU) systems such as DGX-2 and Summit. With the wide adoption of large-scale GPU-enabled systems for distributed deep learning (DL) training, it is ...
Cooperative rendezvous protocols for improved performance and overlap
SC '18: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, November 2018, Article No.: 28, Pages 1–13, https://doi.org/10.1109/SC.2018.00031
With the emergence of larger multi-/many-core clusters and new areas of HPC applications, performance of large message communication is becoming more important. MPI libraries use different rendezvous protocols to perform large message communication. ...
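For readers new to the term: a rendezvous protocol defers the payload transfer of a large message until the receiver has agreed to accept it. The sketch below shows the generic eager-vs-rendezvous split with an RTS/CTS handshake; the threshold and class names are illustrative, not the cooperative designs this paper proposes:

```python
EAGER_THRESHOLD = 1024  # bytes; illustrative cutoff between the two protocols

class Receiver:
    def __init__(self):
        self.buffers = []

    def on_rts(self, size):
        # On ready-to-send, post a buffer of the announced size, then grant.
        self.buffers.append(bytearray(size))
        return "cts"

    def on_data(self, payload):
        self.buffers[-1][:] = payload

def send(message, receiver):
    """Toy two-sided send: small messages go eagerly; large ones do an
    RTS/CTS handshake so the receiver can prepare before the transfer."""
    if len(message) <= EAGER_THRESHOLD:
        receiver.buffers.append(bytearray(message))
        return "eager"
    assert receiver.on_rts(len(message)) == "cts"  # handshake round trip
    receiver.on_data(message)
    return "rendezvous"

r = Receiver()
assert send(b"x" * 100, r) == "eager"
assert send(b"y" * 4096, r) == "rendezvous"
assert bytes(r.buffers[-1]) == b"y" * 4096
```

The handshake costs a round trip but avoids unexpected-message buffering, which is why it only kicks in above a threshold.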
- research-article, July 2019
Optimized large-message broadcast for deep learning workloads: MPI, MPI+NCCL, or NCCL2?
Parallel Computing (PACO), Volume 85, Issue C, July 2019, Pages 141–152, https://doi.org/10.1016/j.parco.2019.03.005
Highlights: Propose and design new MPI_Bcast algorithms and mechanisms that provide efficient GPU-based communication across all message sizes for emerging Deep Learning ...
Traditionally, MPI runtimes have been designed for clusters with a large number of nodes. However, with the advent of MPI+CUDA applications and GPU clusters with a relatively smaller number of nodes, efficient communication schemes ...
- research-article, April 2019
Characterizing CUDA Unified Memory (UM)-Aware MPI Designs on Modern GPU Architectures
GPGPU '19: Proceedings of the 12th Workshop on General Purpose Processing Using GPUs, April 2019, Pages 43–52, https://doi.org/10.1145/3300053.3319419
The CUDA Unified Memory (UM) interface enables a significantly simpler programming paradigm and has the potential to fundamentally change the way programmers write CUDA applications in the future. Although UM leads to high productivity in programming ...
- tutorial, February 2019
High performance distributed deep learning: a beginner's guide
PPoPP '19: Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming, February 2019, Pages 452–454, https://doi.org/10.1145/3293883.3302260
The current wave of advances in Deep Learning (DL) has led to many exciting challenges and opportunities for Computer Science and Artificial Intelligence researchers alike. Modern DL frameworks like Caffe2, TensorFlow, Cognitive Toolkit (CNTK), PyTorch, ...
- research-article, June 2013
A 1 PB/s file system to checkpoint three million MPI tasks
HPDC '13: Proceedings of the 22nd international symposium on High-performance parallel and distributed computing, June 2013, Pages 143–154, https://doi.org/10.1145/2462902.2462908
With the massive scale of high-performance computing systems, long-running scientific parallel applications periodically save the state of their execution to files called checkpoints to recover from system failures. Checkpoints are stored on external ...
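The checkpoint/restart pattern the abstract describes, stripped of any parallel file system, looks like this in miniature. The JSON state, interval, and atomic-rename trick are toy choices for illustration, not this paper's 1 PB/s design:

```python
import json
import os
import tempfile

def checkpoint(state, path):
    """Atomically persist application state: write to a temp file, then
    rename over the target, so a crash mid-write can never corrupt the
    last good checkpoint."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic on POSIX and Windows

def restore(path):
    with open(path) as f:
        return json.load(f)

ckpt = os.path.join(tempfile.mkdtemp(), "app.ckpt")
for step in range(5):
    state = {"step": step, "x": step * step}
    if step % 2 == 0:  # checkpoint every 2nd iteration
        checkpoint(state, ckpt)
# Simulated failure after step 4: recover from the last checkpoint taken.
assert restore(ckpt) == {"step": 4, "x": 16}
```

At supercomputer scale the hard part is not this logic but the aggregate write bandwidth it demands, which is the problem the paper attacks.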
- research-article, September 2018
Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL?
EuroMPI '18: Proceedings of the 25th European MPI Users' Group Meeting, September 2018, Article No.: 2, Pages 1–9, https://doi.org/10.1145/3236367.3236381
Traditionally, MPI runtimes have been designed for clusters with a large number of nodes. However, with the advent of MPI+CUDA applications and dense multi-GPU systems, it has become important to design efficient communication schemes. This coupled with ...
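For context, the classic software broadcast that such MPI designs build on is a tree algorithm; in a binomial tree, the set of ranks holding the data doubles each round. This schedule generator is a generic textbook sketch (root fixed at rank 0), not the GPU-aware designs compared in the paper:

```python
def binomial_bcast_schedule(nranks):
    """Return per-round (src, dst) send pairs for a binomial-tree broadcast
    rooted at rank 0. The number of ranks holding the message doubles every
    round, so the broadcast finishes in ceil(log2(nranks)) rounds."""
    have = {0}      # ranks that currently hold the data
    rounds = []
    dist = 1        # partner distance doubles each round
    while len(have) < nranks:
        sends = []
        for src in sorted(have):
            dst = src ^ dist  # XOR pairing, valid for a rank-0 root
            if dst < nranks and dst not in have:
                sends.append((src, dst))
        for _, dst in sends:
            have.add(dst)
        rounds.append(sends)
        dist *= 2
    return rounds

sched = binomial_bcast_schedule(8)
assert len(sched) == 3                             # log2(8) rounds
assert sched[0] == [(0, 1)]                        # round 1: root seeds rank 1
assert {d for r in sched for _, d in r} == {1, 2, 3, 4, 5, 6, 7}
```

Hierarchical and NCCL-style ring designs trade this logarithmic depth for better link utilization on dense GPU nodes, which is the comparison the paper makes.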
- research-article, September 2018
Efficient Asynchronous Communication Progress for MPI without Dedicated Resources
- Amit Ruhela,
- Hari Subramoni,
- Sourav Chakraborty,
- Mohammadreza Bayatpour,
- Pouya Kousha,
- Dhabaleswar K. Panda
EuroMPI '18: Proceedings of the 25th European MPI Users' Group Meeting, September 2018, Article No.: 14, Pages 1–11, https://doi.org/10.1145/3236367.3236376
The overlap of computation and communication is critical for good performance of many HPC applications. State-of-the-art designs for the asynchronous progress require specially designed hardware resources (advanced switches or network interface cards), ...
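The general idea of asynchronous progress, independent of this paper's contribution of achieving it without dedicated resources, can be mimicked with a helper thread that completes queued transfers while the main thread computes. The queue-based "network" here is purely illustrative:

```python
import queue
import threading

pending = queue.Queue()   # stands in for outstanding nonblocking operations
completed = []
stop = threading.Event()

def progress_loop():
    # Helper thread: completes queued "sends" even while the main
    # thread is busy computing, then drains the queue before exiting.
    while not stop.is_set() or not pending.empty():
        try:
            completed.append(pending.get(timeout=0.01))
        except queue.Empty:
            pass

t = threading.Thread(target=progress_loop)
t.start()
for i in range(10):
    pending.put(f"msg-{i}")              # "nonblocking send": hand off and return
    _ = sum(j * j for j in range(1000))  # overlapping "computation"
stop.set()
t.join()
assert completed == [f"msg-{i}" for i in range(10)]
```

Dedicating a core to such a thread is exactly the resource cost the paper's design avoids.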
- research-article, September 2018
Multi-Threading and Lock-Free MPI RMA Based Graph Processing on KNL and POWER Architectures
EuroMPI '18: Proceedings of the 25th European MPI Users' Group Meeting, September 2018, Article No.: 4, Pages 1–10, https://doi.org/10.1145/3236367.3236371
Intel Knights Landing (KNL) and IBM POWER architectures are becoming widely deployed on modern supercomputing systems due to their powerful components. The MPI Remote Memory Access (RMA) model, which provides one-sided communication semantics, has been seen as ...
- research-article, May 2013
SR-IOV support for virtualization on infiniband clusters: early experience
- Jithin Jose,
- Mingzhe Li,
- Xiaoyi Lu,
- Krishna Chaitanya Kandalla,
- Mark Daniel Arnold,
- Dhabaleswar K. (DK) Panda
CCGRID '13: Proceedings of the 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, May 2013, Pages 385–392, https://doi.org/10.1109/CCGrid.2013.76
High Performance Computing (HPC) systems are becoming increasingly complex and are also associated with very high operational costs. The cloud computing paradigm, coupled with modern Virtual Machine (VM) technology offers attractive techniques to easily ...