- research-article, June 2024
An Efficient and Scalable Approach to Build Co-occurrence Matrix for DNN's Embedding Layer
ICS '24: Proceedings of the 38th ACM International Conference on Supercomputing, Pages 286–297, https://doi.org/10.1145/3650200.3656629
Embedding is a crucial step for deep neural networks. Datasets, from different applications, with different structures, can all be processed through an embedding layer and transformed into a dense matrix. The transformation must minimize both the loss ...
- research-article, June 2024
FuseIM: Fusing Probabilistic Traversals for Influence Maximization on Exascale Systems
Reece Neff, Mostafa Eghbali Zarch, Marco Minutoli, Mahantesh Halappanavar, Antonino Tumeo, Ananth Kalyanaraman, Michela Becchi
ICS '24: Proceedings of the 38th ACM International Conference on Supercomputing, Pages 38–49, https://doi.org/10.1145/3650200.3656621
Probabilistic breadth-first traversals (BPTs) are used in many network science and graph machine learning applications. In this paper, we are motivated by the application of BPTs in stochastic diffusion-based graph problems such as influence ...
- research-article, June 2024
DAWN: Matrix Operation-Optimized Algorithm for Shortest Paths Problem on Unweighted Graphs
ICS '24: Proceedings of the 38th ACM International Conference on Supercomputing, Pages 1–13, https://doi.org/10.1145/3650200.3656600
The shortest paths problem is a fundamental challenge in graph theory, with a broad range of potential applications. The algorithms based on matrix multiplication exhibits excellent parallelism and scalability, but is constrained by high memory ...
- research-article, June 2024
RadiK: Scalable and Optimized GPU-Parallel Radix Top-K Selection
ICS '24: Proceedings of the 38th ACM International Conference on Supercomputing, Pages 537–548, https://doi.org/10.1145/3650200.3656596
Top-k selection, which identifies the largest or smallest k elements from a data set, is a fundamental operation in data-intensive domains such as databases and deep learning, so its scalability and efficiency are critical for these high-performance ...
- research-article, June 2024
FASTEN: Fast GPU-accelerated Segmented Matrix Multiplication for Heterogenous Graph Neural Networks
ICS '24: Proceedings of the 38th ACM International Conference on Supercomputing, Pages 511–524, https://doi.org/10.1145/3650200.3656593
This paper introduces FASTEN, a cutting-edge library developed to address the computational challenges inherent in Heterogeneous Graph Neural Networks (HGNNs). The key focus of FASTEN is the optimization of segmented matrix multiplication, a critical ...
- research-article, June 2023
Fast All-Pairs Shortest Paths Algorithm in Large Sparse Graph
ICS '23: Proceedings of the 37th ACM International Conference on Supercomputing, Pages 277–288, https://doi.org/10.1145/3577193.3593728
Finding the All-Pairs Shortest Paths (APSP) in a graph is the key for various domains. Motivated by the graphs are sparse in most real-world applications, we store the whole graph as a compressed storage format in each process of the distributed ...
- research-article, June 2023
Optimizing Multi-grid Computation and Parallelization on Multi-cores
ICS '23: Proceedings of the 37th ACM International Conference on Supercomputing, Pages 227–239, https://doi.org/10.1145/3577193.3593726
Multigrid algorithms are widely used to solve large-scale sparse linear systems, which is essential for many high-performance workloads. The symmetric Gauss-Seidel (SYMGS) method is often responsible for the performance bottleneck of MG. This paper ...
- research-article, June 2023
BitGNN: Unleashing the Performance Potential of Binary Graph Neural Networks on GPUs
ICS '23: Proceedings of the 37th ACM International Conference on Supercomputing, Pages 264–276, https://doi.org/10.1145/3577193.3593725
Recent studies have shown that Binary Graph Neural Networks (GNNs) are promising for saving computations of GNNs through binarized tensors. Prior work, however, mainly focused on algorithm designs or training techniques, leaving it open to how to ...
- research-article, June 2023
Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs
ICS '23: Proceedings of the 37th ACM International Conference on Supercomputing, Pages 360–372, https://doi.org/10.1145/3577193.3593715
General Matrix Multiplication (GEMM) is a crucial algorithm for various applications such as machine learning and scientific computing since an efficient GEMM implementation is essential for the performance of these calculations. While researchers often ...
- research-article, June 2023
A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training
ICS '23: Proceedings of the 37th ACM International Conference on Supercomputing, Pages 203–214, https://doi.org/10.1145/3577193.3593704
Mixture-of-Experts (MoE) is a neural network architecture that adds sparsely activated expert blocks to a base model, increasing the number of parameters without impacting computational costs. However, current distributed deep learning frameworks are ...
- research-article, June 2023
Accelerating BWA-MEM Read Mapping on GPUs
ICS '23: Proceedings of the 37th ACM International Conference on Supercomputing, Pages 155–166, https://doi.org/10.1145/3577193.3593703
Advancements in Next-Generation Sequencing (NGS) have significantly reduced the cost of generating DNA sequence data and increased the speed of data production. However, such high-throughput data production has increased the need for efficient data ...
- research-article, June 2022
Seamless optimization of the GEMM kernel for task-based programming models
ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing, Article No.: 31, Pages 1–11, https://doi.org/10.1145/3524059.3532385
The general matrix-matrix multiplication (GEMM) kernel is a fundamental building block of many scientific applications. Many libraries such as Intel MKL and BLIS provide highly optimized sequential and parallel versions of this kernel. The parallel ...
- research-article, June 2022
Parallel K-clique counting on GPUs
ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing, Article No.: 21, Pages 1–14, https://doi.org/10.1145/3524059.3532382
Counting k-cliques in a graph is an important problem in graph analysis with many applications such as community detection and graph partitioning. Counting k-cliques is typically done by traversing search trees starting at each vertex in the graph. ...
- research-article, June 2022
AnySeq/GPU: a novel approach for faster sequence alignment on GPUs
ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing, Article No.: 20, Pages 1–11, https://doi.org/10.1145/3524059.3532376
In recent years, the rapidly increasing number of reads produced by next-generation sequencing (NGS) technologies has driven the demand for efficient implementations of sequence alignments in bioinformatics. However, current state-of-the-art approaches ...
- research-article, June 2022
SnuQS: scaling quantum circuit simulation using storage devices
ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing, Article No.: 6, Pages 1–13, https://doi.org/10.1145/3524059.3532375
Since the state-of-the-art quantum computers are still noisy and error-prone, classical simulation of quantum circuits is essential in verifying/calibrating quantum computers and prototyping/debugging complex quantum algorithms. Classical simulation of ...
- research-article, June 2022
Handling heavy-tailed input of transformer inference on GPUs
ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing, Article No.: 38, Pages 1–11, https://doi.org/10.1145/3524059.3532372
Transformer-based models achieve superior accuracy in the field of natural language processing (NLP) and start to be widely deployed in production. As a popular deployment device, graphic processing units (GPUs) basically adopt the batch processing ...
- research-article, June 2022
SnuHPL: high performance LINPACK for heterogeneous GPUs
ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing, Article No.: 18, Pages 1–12, https://doi.org/10.1145/3524059.3532370
These days, it is typical for a large-scale cluster system to have different kinds of GPUs. However, HPL (High-Performance LINPACK), the de-facto standard LINPACK implementation for evaluating the performance of a cluster system, is originally designed ...
- research-article, June 2022
Efficient exact K-nearest neighbor graph construction for billion-scale datasets using GPUs with tensor cores
ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing, Article No.: 10, Pages 1–12, https://doi.org/10.1145/3524059.3532368
Approximate nearest neighbor search plays a fundamental role in many areas, and the k-nearest neighbor graph (KNNG) becomes a promising solution, especially in high-dimensional space. The advantages of KNNG come at the expense of high construction time, ...
- research-article, June 2022
MASTIFF: structure-aware minimum spanning tree/forest
ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing, Article No.: 9, Pages 1–13, https://doi.org/10.1145/3524059.3532365
The Minimum Spanning Forest (MSF) problem finds usage in many different applications. While theoretical analysis shows that linear-time solutions exist, in practice, parallel MSF algorithms remain computationally demanding due to the continuously ...
- research-article, June 2021, Best Student Paper
HyQuas: hybrid partitioner based quantum circuit simulation system on GPU
ICS '21: Proceedings of the 35th ACM International Conference on Supercomputing, Pages 443–454, https://doi.org/10.1145/3447818.3460357
Quantum computing has shown its strong potential in solving certain important problems. Due to the intrinsic limitations of current real quantum computers, quantum circuit simulation still plays an important role in both research and development of ...