Massively parallel algorithms


Showing 1 - 20 of 33 Results

research-article
Open Access
June 2024
An Efficient and Scalable Approach to Build Co-occurrence Matrix for DNN's Embedding Layer
ICS '24: Proceedings of the 38th ACM International Conference on Supercomputing, Pages 286–297, https://doi.org/10.1145/3650200.3656629

Embedding is a crucial step for deep neural networks. Datasets, from different applications, with different structures, can all be processed through an embedding layer and transformed into a dense matrix. The transformation must minimize both the loss ...
Total Citations: 0; Total Downloads: 227; Last 12 Months: 227; Last 6 Weeks: 89
research-article
June 2024
FuseIM: Fusing Probabilistic Traversals for Influence Maximization on Exascale Systems
ICS '24: Proceedings of the 38th ACM International Conference on Supercomputing, Pages 38–49, https://doi.org/10.1145/3650200.3656621

Probabilistic breadth-first traversals (BPTs) are used in many network science and graph machine learning applications. In this paper, we are motivated by the application of BPTs in stochastic diffusion-based graph problems such as influence ...
Total Citations: 0; Total Downloads: 106; Last 12 Months: 106; Last 6 Weeks: 16
research-article
June 2024
DAWN: Matrix Operation-Optimized Algorithm for Shortest Paths Problem on Unweighted Graphs
ICS '24: Proceedings of the 38th ACM International Conference on Supercomputing, Pages 1–13, https://doi.org/10.1145/3650200.3656600

The shortest paths problem is a fundamental challenge in graph theory, with a broad range of potential applications. The algorithms based on matrix multiplication exhibits excellent parallelism and scalability, but is constrained by high memory ...
Total Citations: 0; Total Downloads: 111; Last 12 Months: 111; Last 6 Weeks: 22
research-article
June 2024
RadiK: Scalable and Optimized GPU-Parallel Radix Top-K Selection
ICS '24: Proceedings of the 38th ACM International Conference on Supercomputing, Pages 537–548, https://doi.org/10.1145/3650200.3656596

Top-k selection, which identifies the largest or smallest k elements from a data set, is a fundamental operation in data-intensive domains such as databases and deep learning, so its scalability and efficiency are critical for these high-performance ...
Total Citations: 0; Total Downloads: 116; Last 12 Months: 116; Last 6 Weeks: 20
research-article
Open Access
June 2024
FASTEN: Fast GPU-accelerated Segmented Matrix Multiplication for Heterogenous Graph Neural Networks
ICS '24: Proceedings of the 38th ACM International Conference on Supercomputing, Pages 511–524, https://doi.org/10.1145/3650200.3656593

This paper introduces FASTEN, a cutting-edge library developed to address the computational challenges inherent in Heterogeneous Graph Neural Networks (HGNNs). The key focus of FASTEN is the optimization of segmented matrix multiplication, a critical ...
Total Citations: 0; Total Downloads: 693; Last 12 Months: 693; Last 6 Weeks: 224
research-article
Open Access
June 2023
Fast All-Pairs Shortest Paths Algorithm in Large Sparse Graph
ICS '23: Proceedings of the 37th ACM International Conference on Supercomputing, Pages 277–288, https://doi.org/10.1145/3577193.3593728

Finding the All-Pairs Shortest Paths (APSP) in a graph is the key for various domains. Motivated by the graphs are sparse in most real-world applications, we store the whole graph as a compressed storage format in each process of the distributed ...
Total Citations: 2; Total Downloads: 813; Last 12 Months: 670; Last 6 Weeks: 64
research-article
June 2023
Optimizing Multi-grid Computation and Parallelization on Multi-cores
ICS '23: Proceedings of the 37th ACM International Conference on Supercomputing, Pages 227–239, https://doi.org/10.1145/3577193.3593726

Multigrid algorithms are widely used to solve large-scale sparse linear systems, which is essential for many high-performance workloads. The symmetric Gauss-Seidel (SYMGS) method is often responsible for the performance bottleneck of MG. This paper ...
Total Citations: 3; Total Downloads: 164; Last 12 Months: 106; Last 6 Weeks: 9
research-article
June 2023
BitGNN: Unleashing the Performance Potential of Binary Graph Neural Networks on GPUs
ICS '23: Proceedings of the 37th ACM International Conference on Supercomputing, Pages 264–276, https://doi.org/10.1145/3577193.3593725

Recent studies have shown that Binary Graph Neural Networks (GNNs) are promising for saving computations of GNNs through binarized tensors. Prior work, however, mainly focused on algorithm designs or training techniques, leaving it open to how to ...
Total Citations: 2; Total Downloads: 165; Last 12 Months: 101; Last 6 Weeks: 5
research-article
Open Access
June 2023
Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs
ICS '23: Proceedings of the 37th ACM International Conference on Supercomputing, Pages 360–372, https://doi.org/10.1145/3577193.3593715

General Matrix Multiplication (GEMM) is a crucial algorithm for various applications such as machine learning and scientific computing since an efficient GEMM implementation is essential for the performance of these calculations. While researchers often ...
Total Citations: 2; Total Downloads: 613; Last 12 Months: 502; Last 6 Weeks: 62
research-article
Open Access
June 2023
A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training
ICS '23: Proceedings of the 37th ACM International Conference on Supercomputing, Pages 203–214, https://doi.org/10.1145/3577193.3593704

Mixture-of-Experts (MoE) is a neural network architecture that adds sparsely activated expert blocks to a base model, increasing the number of parameters without impacting computational costs. However, current distributed deep learning frameworks are ...
Total Citations: 2; Total Downloads: 1,051; Last 12 Months: 832; Last 6 Weeks: 76
research-article
June 2023
Accelerating BWA-MEM Read Mapping on GPUs
ICS '23: Proceedings of the 37th ACM International Conference on Supercomputing, Pages 155–166, https://doi.org/10.1145/3577193.3593703

Advancements in Next-Generation Sequencing (NGS) have significantly reduced the cost of generating DNA sequence data and increased the speed of data production. However, such high-throughput data production has increased the need for efficient data ...
Total Citations: 1; Total Downloads: 372; Last 12 Months: 256; Last 6 Weeks: 28
research-article
June 2022
Seamless optimization of the GEMM kernel for task-based programming models
ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing, Article No.: 31, Pages 1–11, https://doi.org/10.1145/3524059.3532385

The general matrix-matrix multiplication (GEMM) kernel is a fundamental building block of many scientific applications. Many libraries such as Intel MKL and BLIS provide highly optimized sequential and parallel versions of this kernel. The parallel ...
Total Citations: 0; Total Downloads: 187; Last 12 Months: 47; Last 6 Weeks: 1
research-article
June 2022
Parallel K-clique counting on GPUs
ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing, Article No.: 21, Pages 1–14, https://doi.org/10.1145/3524059.3532382

Counting k-cliques in a graph is an important problem in graph analysis with many applications such as community detection and graph partitioning. Counting k-cliques is typically done by traversing search trees starting at each vertex in the graph. ...
Total Citations: 8; Total Downloads: 386; Last 12 Months: 137; Last 6 Weeks: 14
research-article
June 2022
AnySeq/GPU: a novel approach for faster sequence alignment on GPUs
ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing, Article No.: 20, Pages 1–11, https://doi.org/10.1145/3524059.3532376

In recent years, the rapidly increasing number of reads produced by next-generation sequencing (NGS) technologies has driven the demand for efficient implementations of sequence alignments in bioinformatics. However, current state-of-the-art approaches ...
Total Citations: 3; Total Downloads: 192; Last 12 Months: 71; Last 6 Weeks: 10
research-article
Open Access
June 2022
SnuQS: scaling quantum circuit simulation using storage devices
ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing, Article No.: 6, Pages 1–13, https://doi.org/10.1145/3524059.3532375

Since the state-of-the-art quantum computers are still noisy and error-prone, classical simulation of quantum circuits is essential in verifying/calibrating quantum computers and prototyping/debugging complex quantum algorithms. Classical simulation of ...
Total Citations: 1; Total Downloads: 940; Last 12 Months: 375; Last 6 Weeks: 53
research-article
June 2022
Handling heavy-tailed input of transformer inference on GPUs
ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing, Article No.: 38, Pages 1–11, https://doi.org/10.1145/3524059.3532372

Transformer-based models achieve superior accuracy in the field of natural language processing (NLP) and start to be widely deployed in production. As a popular deployment device, graphic processing units (GPUs) basically adopt the batch processing ...
Total Citations: 3; Total Downloads: 352; Last 12 Months: 106; Last 6 Weeks: 5
research-article
Open Access
June 2022
SnuHPL: high performance LINPACK for heterogeneous GPUs
ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing, Article No.: 18, Pages 1–12, https://doi.org/10.1145/3524059.3532370

These days, it is typical for a large-scale cluster system to have different kinds of GPUs. However, HPL (High-Performance LINPACK), the de-facto standard LINPACK implementation for evaluating the performance of a cluster system, is originally designed ...
Total Citations: 1; Total Downloads: 1,231; Last 12 Months: 555; Last 6 Weeks: 41
research-article
June 2022
Efficient exact K-nearest neighbor graph construction for billion-scale datasets using GPUs with tensor cores
Zhuoran Ji, Cho-Li Wang
ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing, Article No.: 10, Pages 1–12, https://doi.org/10.1145/3524059.3532368

Approximate nearest neighbor search plays a fundamental role in many areas, and the k-nearest neighbor graph (KNNG) becomes a promising solution, especially in high-dimensional space. The advantages of KNNG come at the expense of high construction time, ...
Total Citations: 4; Total Downloads: 440; Last 12 Months: 196; Last 6 Weeks: 7
research-article
June 2022
MASTIFF: structure-aware minimum spanning tree/forest
ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing, Article No.: 9, Pages 1–13, https://doi.org/10.1145/3524059.3532365

The Minimum Spanning Forest (MSF) problem finds usage in many different applications. While theoretical analysis shows that linear-time solutions exist, in practice, parallel MSF algorithms remain computationally demanding due to the continuously ...
Total Citations: 2; Total Downloads: 165; Last 12 Months: 18; Last 6 Weeks: 2
research-article
Open Access
June 2021
Best Student Paper
HyQuas: hybrid partitioner based quantum circuit simulation system on GPU
ICS '21: Proceedings of the 35th ACM International Conference on Supercomputing, Pages 443–454, https://doi.org/10.1145/3447818.3460357

Quantum computing has shown its strong potential in solving certain important problems. Due to the intrinsic limitations of current real quantum computers, quantum circuit simulation still plays an important role in both research and development of ...
Total Citations: 12; Total Downloads: 1,711; Last 12 Months: 430; Last 6 Weeks: 50
