- research-article, June 2024
Enabling high-level parallel programming on multi-FPGA clusters
HEART '24: Proceedings of the 14th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies, June 2024, Pages 1–9, https://doi.org/10.1145/3665283.3665292
Field Programmable Gate Arrays (FPGAs) are still relatively new in the High Performance Computing (HPC) field. Hence, they still lack a mature ecosystem that allows non-FPGA experts to scale an application across many devices operating in parallel. In this ...
- research-article, June 2024
An Illustration of Extending Hedgehog to Multi-Node GPU Architectures Using GEMM
Asynchronous task-based systems can make it easier to take advantage of scalable heterogeneous architectures. This paper extends previous work, demonstrating how Hedgehog, a dataflow graph-based model developed at the ...
- research-article, June 2024
A Portable and Efficient Lagrangian Particle Capability for Idealized Atmospheric Phenomena
PASC '24: Proceedings of the Platform for Advanced Scientific Computing Conference, June 2024, Article No.: 26, Pages 1–11, https://doi.org/10.1145/3659914.3659940
The Cloud Model version 1 is an atmospheric model that allows for idealized studies of atmospheric phenomena. A new Lagrangian microphysics capability has been added, enabling a significantly more accurate representation than the traditional bulk or ...
- research-article, June 2024
Performance Analysis and Optimizations of ERO2.0 Fusion Code
PASC '24: Proceedings of the Platform for Advanced Scientific Computing Conference, June 2024, Article No.: 18, Pages 1–11, https://doi.org/10.1145/3659914.3659932
In this paper, we present a thorough performance analysis of ERO2.0, a highly parallel Monte Carlo code for modeling global erosion and redeposition in fusion devices. The study shows that the main bottleneck preventing the code from efficiently using ...
- research-article, June 2024
Efficient Parallel Strategies For Conjugate Heat Transfer Problems
PASC '24: Proceedings of the Platform for Advanced Scientific Computing Conference, June 2024, Article No.: 16, Pages 1–11, https://doi.org/10.1145/3659914.3659930
Historically, temperature boundary conditions in thermal fluids have been treated as Robin-type boundary conditions. However, with the emergence of supercomputing capabilities, there is now the opportunity to explore the solution of ...
- research-article, May 2024
Applying dynamic balancing to improve the performance of MPI parallel genomics applications
SAC '24: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing, April 2024, Pages 506–514, https://doi.org/10.1145/3605098.3635986
Genomics applications are becoming increasingly important in bioinformatics, as they allow researchers to extract meaningful information from the huge amounts of data generated by new sequencing technologies. The analysis of these data ...
- research-article, April 2024
A shared compilation stack for distributed-memory parallelism in stencil DSLs
- George Bisbas,
- Anton Lydike,
- Emilien Bauer,
- Nick Brown,
- Mathieu Fehr,
- Lawrence Mitchell,
- Gabriel Rodriguez-Canal,
- Maurice Jamieson,
- Paul H. J. Kelly,
- Michel Steuwer,
- Tobias Grosser
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, April 2024, Pages 38–56, https://doi.org/10.1145/3620666.3651344
Domain Specific Languages (DSLs) increase programmer productivity and provide high performance. Their targeted abstractions allow scientists to express problems at a high level, providing rich details that optimizing compilers can exploit to target ...
- Article, April 2024
Algorithm Selection of MPI Collectives Considering System Utilization
Euro-Par 2023: Parallel Processing Workshops, Aug 2023, Pages 302–307, https://doi.org/10.1007/978-3-031-48803-0_37
MPI collective communications play an important role in coordinating and exchanging data among parallel processes in high performance computing. Various algorithms exist for implementing MPI collectives, each of which exhibits different ...
- research-article, April 2024
Enhancing Intra-Node GPU-to-GPU Performance in MPI+UCX through Multi-Path Communication
ExHET '24: Proceedings of the 3rd International Workshop on Extreme Heterogeneity Solutions, March 2024, Pages 9–14, https://doi.org/10.1145/3642961.3643800
Efficient communication among GPUs is crucial for achieving high performance in modern GPU-accelerated applications. This paper introduces a multi-path communication framework within the MPI+UCX library to enhance P2P communication performance between ...
- research-article, March 2024
Analysis and prediction of performance variability in large-scale computing systems
The Journal of Supercomputing (JSCO), Volume 80, Issue 10, Jul 2024, Pages 14978–15005, https://doi.org/10.1007/s11227-024-06040-w
The development of new exascale supercomputers has dramatically increased the need for fast, high-performance networking technology. Efficient network topologies, such as Dragonfly+, have been introduced to meet the demands of data-intensive ...
- research-article, March 2024
Automatic Discovery of Collective Communication Patterns in Parallelized Task Graphs
International Journal of Parallel Programming (IJPP), Volume 52, Issue 3, Jun 2024, Pages 171–186, https://doi.org/10.1007/s10766-024-00767-y
Collective communication APIs equip MPI vendors with the necessary context to optimize cluster-wide operations on the basis of theoretical complexity models and characteristics of the involved interconnects. Modern HPC runtime systems with a ...
- research-article, March 2024
PARamrfinder: detecting allele-specific DNA methylation on multicore clusters
The Journal of Supercomputing (JSCO), Volume 80, Issue 10, Jul 2024, Pages 14573–14599, https://doi.org/10.1007/s11227-024-05939-8
The discovery of Allele-Specific Methylation (ASM) is an important research field in biology, as ASM regulates genomic imprinting, which has been identified as the cause of some genetic diseases. Nevertheless, the high computational cost of the ...
- Article, March 2024
Performance Evaluation of Spark, Ray and MPI: A Case Study on Long Read Alignment Algorithm
Algorithms and Architectures for Parallel Processing, Oct 2023, Pages 57–76, https://doi.org/10.1007/978-981-97-0798-0_4
The utilization of large-scale datasets in various fields is increasing due to the advancement of big data technology. Due to limited computing resources, traditional serial frameworks are no longer efficient in processing such massive data. ...
- research-article, January 2024
Parallelized Remapping Algorithms for km-scale Global Weather and Climate Simulations with Icosahedral Grid System
HPCAsia '24: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, January 2024, Pages 35–46, https://doi.org/10.1145/3635035.3635040
In weather and climate research, latitude–longitude grid data are typically used for analysis and visualization, and remapping from model native grids to latitude–longitude grids typically requires a significant amount of time. Here, we developed a ...
- research-article, January 2024
Non-Blocking GPU-CPU Notifications to Enable More GPU-CPU Parallelism
HPCAsia '24: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, January 2024, Pages 1–11, https://doi.org/10.1145/3635035.3635036
GPUs are increasingly popular in HPC systems, and more applications are adopting GPUs each day. However, the control synchronization of GPUs with CPUs is suboptimal and only possible after GPU kernel termination points, resulting in serialized host and ...
- research-article, January 2024
An Overview on Mixing MPI and OpenMP Dependent Tasking on A64FX
- Romain Pereira,
- Adrien Roussel,
- Miwako Tsuji,
- Patrick Carribault,
- Mitsuhisa Sato,
- Hitoshi Murai,
- Thierry Gautier
HPCAsia '24 Workshops: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Workshops, January 2024, Pages 7–16, https://doi.org/10.1145/3636480.3637094
The adoption of ARM processor architectures is on the rise in the HPC ecosystem. The Fugaku supercomputer is a homogeneous ARM-based machine and one of the most powerful machines in the world. In the programming world, dependent task-based programming ...
- Article, January 2024
Parallel Algorithm for Source Type Recovering by the Time Reversal Mirror
The problem of reconstructing the location and type of a seismic source in a three-dimensional medium based on seismic observations is considered. The Time Reversal Mirror (TRM) method is used to solve this problem. Recently, the TRM method has ...
- research-article, January 2024
SUARA: A scalable universal allreduce communication algorithm for acceleration of parallel deep learning applications
Journal of Parallel and Distributed Computing (JPDC), Volume 183, Issue C, Jan 2024, https://doi.org/10.1016/j.jpdc.2023.104767
Parallel and distributed deep learning (PDNN) has become an effective strategy to reduce the long training times of large-scale deep neural networks. Mainstream PDNN software packages based on the message-passing interface (MPI) and employing ...
Highlights
- A novel scalable universal allreduce collective algorithm called SUARA.
- An optimized Open MPI SUARA implementation, SUARA2, with speedup O(P).
- 2x practical speedup of SUARA2 over native Open MPI allreduce for P = 1024 processes.
- short-paper, November 2023
Verifying Performance Guidelines for MPI Collectives at Scale
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, November 2023, Pages 1264–1268, https://doi.org/10.1145/3624062.3625532
MPI collective communication operations are crucial for high-performance computing, making the efficient implementation of collective algorithms essential for optimal application performance. While most MPI libraries provide several algorithms for a ...
- research-article, November 2023
shmem4py: High-Performance One-Sided Communication for Python Applications
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, November 2023, Pages 1185–1193, https://doi.org/10.1145/3624062.3624602
This paper describes shmem4py, a Python wrapper for the OpenSHMEM application programming interface (API) which follows a design similar to that of the well-known mpi4py package. OpenSHMEM is a descendant of the one-sided communication library for the ...