- research-article, February 2025
Integrating ORNL's HPC and Neutron Facilities with a Performance-Portable CPU/GPU Ecosystem
- Steven E. Hahn,
- Philip W. Fackler,
- William F. Godoy,
- Ketan Maheshwari,
- Zachary Morgan,
- Andrei T. Savici,
- Christina M. Hoffmann,
- Pedro Valero-Lara,
- Jeffrey S. Vetter,
- Rafael Ferreira da Silva
SC-W '24: Proceedings of the SC '24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis
Pages 2107–2117
https://doi.org/10.1109/SCW63240.2024.00264
We explore the development of a performance-portable CPU/GPU ecosystem to integrate two of the US Department of Energy's (DOE's) largest scientific instruments, the Oak Ridge Leadership Computing Facility and the Spallation Neutron Source (SNS), both of ...
- research-article, February 2025
Productive, Vendor-Neutral GPU Programming Using Chapel
SC-W '24: Proceedings of the SC '24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis
Pages 1914–1922
https://doi.org/10.1109/SCW63240.2024.00241
The HPC programming ecosystem is mostly based on the sequential C/C++/Fortran languages. These fundamental languages are then augmented with frameworks such as MPI and OpenMP to enable different types of parallelism. The increased prevalence of GPUs in HPC and ...
Portability of Fortran's 'do concurrent' on GPUs
- Ronald M. Caplan,
- Miko M. Stulajter,
- Jon A. Linker,
- Jeff Larkin,
- Henry A. Gabb,
- Shiquan Su,
- Ivan Rodriguez,
- Zachary Tschirhart,
- Nicholas Malaya
SC-W '24: Proceedings of the SC '24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis
Pages 1904–1913
https://doi.org/10.1109/SCW63240.2024.00240
There is a continuing interest in using standard language constructs for accelerated computing in order to avoid (sometimes vendor-specific) external APIs. For Fortran codes, the do concurrent (DC) loop has been successfully demonstrated on the NVIDIA ...
- research-article, February 2025
Copper: Cooperative Caching Layer for Scalable Data Loading in Exascale Supercomputers
SC-W '24: Proceedings of the SC '24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis
Pages 1320–1329
https://doi.org/10.1109/SCW63240.2024.00173
Job initialization time of dynamic executables increases as HPC jobs launch on a larger number of nodes and processes. This is due to the processes flooding the storage system with a tremendous number of I/O requests for the same files, leading to ...
Mitigating synchronization bottlenecks in high-performance actor-model-based software
SC-W '24: Proceedings of the SC '24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis
Pages 1274–1287
https://doi.org/10.1109/SCW63240.2024.00168
Bulk synchronous programming (in distributed-memory systems) and the fork-join pattern (in shared-memory systems) are often used for problems where independent processes must periodically synchronize. Frequent synchronization can greatly undermine the ...
- research-article, February 2025
Accelerating Multi-GPU Embedding Retrieval with PGAS-Style Communication for Deep Learning Recommendation Systems
SC-W '24: Proceedings of the SC '24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis
Pages 1262–1273
https://doi.org/10.1109/SCW63240.2024.00167
In this paper, we propose using Partitioned Global Address Space (PGAS) GPU one-sided asynchronous small messages to replace the widely used collective communication calls for sparse input multi-GPU embedding retrieval in deep learning recommendation ...
- research-article, February 2025
Speeding-Up LULESH on HPX: Useful Tricks and Lessons Learned using a Many-Task-Based Approach
SC-W '24: Proceedings of the SC '24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis
Pages 1223–1235
https://doi.org/10.1109/SCW63240.2024.00164
Current programming models face challenges in dealing with modern supercomputers' growing parallelism and heterogeneity. Emerging programming models, like the task-based programming model found in the asynchronous many-task HPX programming framework, ...
- research-article, February 2025
Performance Portable Optimizations of an Ice-sheet Modeling Code on GPU-supercomputers
SC-W '24: Proceedings of the SC '24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis
Pages 1141–1151
https://doi.org/10.1109/SCW63240.2024.00156
In this paper, we present GPU optimizations for an ice-sheet modeling code known as MPAS-Albany Land Ice (MALI). MALI is a C++ template code that leverages the Kokkos programming model for portability and the Trilinos library for data structures, ...
- research-article, February 2025
Optimizing MILC-Dslash Performance on NVIDIA A100 GPU: Parallel Strategies using SYCL
- Amanda S. Dufek,
- Steven A. Gottlieb,
- Muaaz Gul Awan,
- Douglas Adriano Augusto,
- Jack Deslippe,
- Brandon Cook
SC-W '24: Proceedings of the SC '24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis
Pages 1106–1116
https://doi.org/10.1109/SCW63240.2024.00151
MILC-Dslash is a benchmark derived from the MILC code, which simulates lattice gauge theory on a four-dimensional hypercube. This paper outlines a gradual progression in increasing the granularity of parallelism in the MILC-Dslash kernel using the ...
- research-article, February 2025
Sum Reduction with OpenMP Offload on NVIDIA Grace-Hopper System
SC-W '24: Proceedings of the SC '24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis
Pages 1006–1013
https://doi.org/10.1109/SCW63240.2024.00140
Sum reduction is a primitive operation in parallel computing. Using OpenMP directives that offload data and computation to a graphics processing unit (GPU), we annotate a serial sum reduction and evaluate the ...
- research-article, February 2025
ACID Support for Compute eXpress Link Memory Transactions
SC-W '24: Proceedings of the SC '24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis
Pages 982–995
https://doi.org/10.1109/SCW63240.2024.00138
With the recent explosive growth in worldwide data and data processing demands, the need to support a large volume of transactions on shared data is increasing in both high performance computing and datacenter processing. A recent innovation in server ...
- research-article, February 2025
Parallel Runtime Interface for Fortran (PRIF): A Multi-Image Solution for LLVM Flang
SC-W '24: Proceedings of the SC '24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis
Pages 950–960
https://doi.org/10.1109/SCW63240.2024.00134
Fortran compilers that provide support for Fortran's native parallel features often do so with a runtime library that depends on details of both the compiler implementation and the communication library, while others provide limited or no support at all. ...
- research-article, February 2025
Pragma driven shared memory parallelism in Zig by supporting OpenMP loop directives
SC-W '24: Proceedings of the SC '24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis
Pages 930–938
https://doi.org/10.1109/SCW63240.2024.00132
The Zig programming language, which is designed to provide performance and safety as first-class concerns, has become popular in recent years. Given that Zig is built upon LLVM, and so enjoys many of the benefits provided by the ecosystem, including ...
- research-article, February 2025
Shared Memory-Aware Latency-Sensitive Message Aggregation for Fine-Grained Communication
SC-W '24: Proceedings of the SC '24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis
Pages 682–687
https://doi.org/10.1109/SCW63240.2024.00095
Message aggregation is widely used to reduce communication cost in HPC applications. The order-of-magnitude gap between the fixed overhead of sending a message and the cost per byte transferred motivates message aggregation, for several ...
- research-article, February 2025
Offloaded MPI message matching: an optimistic approach
- Jerónimo S. García,
- Salvatore Di Girolamo,
- Sokol Kosta,
- J. J. Vegas Olmos,
- Rami Nudelman,
- Torsten Hoefler,
- Gil Bloch
SC-W '24: Proceedings of the SC '24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis
Pages 457–469
https://doi.org/10.1109/SCW63240.2024.00067
Message matching is a critical process ensuring the correct delivery of messages in distributed and HPC environments. The advent of SmartNICs presents an opportunity to develop offloaded message-matching approaches that leverage this on-NIC programmable ...
- research-article, February 2025
Modes, Persistence and Orthogonality: Blowing MPI Up
SC-W '24: Proceedings of the SC '24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis
Pages 404–413
https://doi.org/10.1109/SCW63240.2024.00061
The Message-Passing Interface (MPI) specification provides a restricted form of persistence in point-to-point and collective communication operations that purportedly enables libraries to amortize precomputation and setup costs over longer sequences of ...
- research-article, February 2025
Introduction to Parallel and Distributed Programming using N-Body Simulations
SC-W '24: Proceedings of the SC '24 Workshops of the International Conference on High Performance Computing, Network, Storage, and AnalysisPages 347–354https://doi.org/10.1109/SCW63240.2024.00052This paper describes how we use n-body simulations as an interesting and visually compelling way to teach efficient, parallel, and distributed programming. Our first course focuses on bachelor students introducing them to algorithmic complexities and ...