Search results for keyword: MPI

research-article

Enabling high-level parallel programming on multi-FPGA clusters

HEART '24: Proceedings of the 14th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies, June 2024, Pages 1–9, https://doi.org/10.1145/3665283.3665292

Field Programmable Gate Arrays (FPGA) are still relatively new in the High Performance Computing (HPC) field. Hence, they still lack a mature ecosystem that allows non-FPGA experts to scale an application with many devices operating in parallel. In this ...

research-article

An Illustration of Extending Hedgehog to Multi-Node GPU Architectures Using GEMM

SN Computer Science (SNCS), Volume 5, Issue 5, Jun 2024, https://doi.org/10.1007/s42979-024-02917-y

Abstract

Asynchronous task-based systems offer the possibility of making it easier to take advantage of scalable heterogeneous architectures. This paper extends the previous work, demonstrating how Hedgehog, a dataflow graph-based model developed at the ...

research-article

A Portable and Efficient Lagrangian Particle Capability for Idealized Atmospheric Phenomena

PASC '24: Proceedings of the Platform for Advanced Scientific Computing Conference, June 2024, Article No.: 26, Pages 1–11, https://doi.org/10.1145/3659914.3659940

The Cloud Model version 1 is an atmospheric model that allows for idealized studies of atmospheric phenomena. A new Lagrangian microphysics capability has been added, enabling a significantly more accurate representation than the traditional bulk or ...

research-article

Performance Analysis and Optimizations of ERO2.0 Fusion Code

PASC '24: Proceedings of the Platform for Advanced Scientific Computing Conference, June 2024, Article No.: 18, Pages 1–11, https://doi.org/10.1145/3659914.3659932

In this paper, we present a thorough performance analysis of a highly parallel Monte Carlo code for modeling global erosion and redeposition in fusion devices, ERO2.0. The study shows that the main bottleneck preventing the code from efficiently using ...

research-article

Efficient Parallel Strategies For Conjugate Heat Transfer Problems

PASC '24: Proceedings of the Platform for Advanced Scientific Computing Conference, June 2024, Article No.: 16, Pages 1–11, https://doi.org/10.1145/3659914.3659930

Historically, temperature boundary conditions in thermal fluids have conventionally been approached as Robin-type boundary conditions. However, with the emergence of supercomputing capabilities, there is now the opportunity to explore the solution of ...

research-article

Applying dynamic balancing to improve the performance of MPI parallel genomics applications

SAC '24: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing, April 2024, Pages 506–514, https://doi.org/10.1145/3605098.3635986

Genomics applications are becoming more and more important in the field of bioinformatics, as they allow researchers to extract meaningful information from the huge amount of data generated by the new sequencing technologies. The analysis of these data ...

research-article

Open Access

A shared compilation stack for distributed-memory parallelism in stencil DSLs

ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, April 2024, Pages 38–56, https://doi.org/10.1145/3620666.3651344

Domain Specific Languages (DSLs) increase programmer productivity and provide high performance. Their targeted abstractions allow scientists to express problems at a high level, providing rich details that optimizing compilers can exploit to target ...

Article

Algorithm Selection of MPI Collectives Considering System Utilization

Euro-Par 2023: Parallel Processing Workshops, Aug 2023, Pages 302–307, https://doi.org/10.1007/978-3-031-48803-0_37

Abstract

MPI collective communications play an important role in coordinating and exchanging data among parallel processes in high performance computing. Various algorithms exist for implementing MPI collectives, each of which exhibits different ...

research-article

Enhancing Intra-Node GPU-to-GPU Performance in MPI+UCX through Multi-Path Communication

ExHET '24: Proceedings of the 3rd International Workshop on Extreme Heterogeneity Solutions, March 2024, Pages 9–14, https://doi.org/10.1145/3642961.3643800

Efficient communication among GPUs is crucial for achieving high performance in modern GPU-accelerated applications. This paper introduces a multi-path communication framework within the MPI+UCX library to enhance P2P communication performance between ...

research-article

Analysis and prediction of performance variability in large-scale computing systems

The Journal of Supercomputing (JSCO), Volume 80, Issue 10, Jul 2024, Pages 14978–15005, https://doi.org/10.1007/s11227-024-06040-w

Abstract

The development of new exascale supercomputers has dramatically increased the need for fast, high-performance networking technology. Efficient network topologies, such as Dragonfly+, have been introduced to meet the demands of data-intensive ...

research-article

Automatic Discovery of Collective Communication Patterns in Parallelized Task Graphs

International Journal of Parallel Programming (IJPP), Volume 52, Issue 3, Jun 2024, Pages 171–186, https://doi.org/10.1007/s10766-024-00767-y

Abstract

Collective communication APIs equip MPI vendors with the necessary context to optimize cluster-wide operations on the basis of theoretical complexity models and characteristics of the involved interconnects. Modern HPC runtime systems with a ...

research-article

PARamrfinder: detecting allele-specific DNA methylation on multicore clusters

The Journal of Supercomputing (JSCO), Volume 80, Issue 10, Jul 2024, Pages 14573–14599, https://doi.org/10.1007/s11227-024-05939-8

Abstract

The discovery of Allele-Specific Methylation (ASM) is an important research field in biology as it regulates genomic imprinting, which has been identified as the cause of some genetic diseases. Nevertheless, the high computational cost of the ...

Article

Performance Evaluation of Spark, Ray and MPI: A Case Study on Long Read Alignment Algorithm

Algorithms and Architectures for Parallel Processing, Oct 2023, Pages 57–76, https://doi.org/10.1007/978-981-97-0798-0_4

Abstract

The utilization of large-scale datasets in various fields is increasing due to the advancement of big data technology. Due to limited computing resources, traditional serial frameworks are no longer efficient in processing such massive data. ...

research-article

Open Access

Parallelized Remapping Algorithms for km-scale Global Weather and Climate Simulations with Icosahedral Grid System

HPCAsia '24: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, January 2024, Pages 35–46, https://doi.org/10.1145/3635035.3635040

In weather and climate research, latitude–longitude grid data are typically used for analysis and visualization, and remapping from model native grids to latitude–longitude grids typically requires a significant amount of time. Here, we developed a ...

research-article

Open Access

Non-Blocking GPU-CPU Notifications to Enable More GPU-CPU Parallelism

HPCAsia '24: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, January 2024, Pages 1–11, https://doi.org/10.1145/3635035.3635036

GPUs are increasingly popular in HPC systems, and more applications are adopting GPUs each day. However, the control synchronization of GPUs with CPUs is suboptimal and only possible after GPU kernel termination points, resulting in serialized host and ...

research-article

An Overview on Mixing MPI and OpenMP Dependent Tasking on A64FX

HPCAsia '24 Workshops: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Workshops, January 2024, Pages 7–16, https://doi.org/10.1145/3636480.3637094

The adoption of ARM processor architectures is on the rise in the HPC ecosystem. The Fugaku supercomputer is a homogeneous ARM-based machine and one of the most powerful machines in the world. In the programming world, dependent task-based programming ...

Article

Parallel Algorithm for Source Type Recovering by the Time Reversal Mirror

Supercomputing, Sep 2023, Pages 267–281, https://doi.org/10.1007/978-3-031-49435-2_19

Abstract

The problem of reconstructing the location and type of a seismic source in a three-dimensional medium based on seismic observations is considered. The Time Reversal Mirror (TRM) method is used to solve this problem. Recently, the TRM method has ...

research-article

SUARA: A scalable universal allreduce communication algorithm for acceleration of parallel deep learning applications

Journal of Parallel and Distributed Computing (JPDC), Volume 183, Issue C, Jan 2024, https://doi.org/10.1016/j.jpdc.2023.104767

Abstract

Parallel and distributed deep learning (PDNN) has become an effective strategy to reduce the long training times of large-scale deep neural networks. Mainstream PDNN software packages based on the message-passing interface (MPI) and employing ...

Highlights

A novel scalable universal allreduce collective algorithm called SUARA.
An optimized Open MPI SUARA implementation, SUARA2, with speedup O(P).
2× practical speedup of SUARA2 over native Open MPI allreduce for P = 1024 processes.

short-paper

Open Access

Verifying Performance Guidelines for MPI Collectives at Scale

Sascha Hunold

SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, November 2023, Pages 1264–1268, https://doi.org/10.1145/3624062.3625532

MPI collective communication operations are crucial for high-performance computing, making the efficient implementation of collective algorithms essential for optimal application performance. While most MPI libraries provide several algorithms for a ...

research-article

Open Access

shmem4py: High-Performance One-Sided Communication for Python Applications

SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, November 2023, Pages 1185–1193, https://doi.org/10.1145/3624062.3624602

This paper describes shmem4py, a Python wrapper for the OpenSHMEM application programming interface (API) which follows a design similar to that of the well-known mpi4py package. OpenSHMEM is a descendant of the one-sided communication library for the ...
