- poster, March 2022
Optimizing sparse computations jointly
PPoPP '22: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Pages 459–460, https://doi.org/10.1145/3503221.3508439
This work proposes a framework called FuSy that analyzes the data dependence graphs (DAGs) of two sparse kernels and creates an efficient schedule to execute the kernels in combination. Sparse kernels are frequently used in scientific codes and in ...
- poster, March 2022
Rethinking graph data placement for graph neural network training on multiple GPUs
PPoPP '22: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Pages 455–456, https://doi.org/10.1145/3503221.3508435
The existing Graph Neural Network (GNN) systems adopt graph partitioning to divide the graph data for multi-GPU training. Although they support large graphs, we find that the existing techniques lead to large data loading overhead. In this work, we for ...
- research-article, March 2022
Parallel block-delayed sequences
PPoPP '22: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Pages 61–75, https://doi.org/10.1145/3503221.3508434
Programming languages using functions on collections of values, such as map, reduce, scan and filter, have been used for over fifty years. Such collections have proven to be particularly useful in the context of parallelism because such functions are ...
- poster, March 2022
Parallel algorithms for masked sparse matrix-matrix products
PPoPP '22: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Pages 453–454, https://doi.org/10.1145/3503221.3508430
Computing the product of two sparse matrices (SpGEMM) is a fundamental operation in various combinatorial and graph algorithms as well as various bioinformatics and data analytics applications for computing inner-product similarities. For an important ...
- poster, March 2022
An LLVM-based open-source compiler for NVIDIA GPUs
PPoPP '22: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Pages 448–449, https://doi.org/10.1145/3503221.3508428
We present GASS, an LLVM-based open-source compiler for NVIDIA GPU's SASS machine assembly. GASS is the first open-source compiler targeting SASS, and it provides a unified toolchain for currently fragmented low-level performance research on NVIDIA GPUs. ...
- research-article, March 2022
Understanding and detecting deep memory persistency bugs in NVM programs with DeepMC
PPoPP '22: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Pages 322–336, https://doi.org/10.1145/3503221.3508427
To facilitate programming with non-volatile memory (NVM), a set of memory persistency models, such as strict and epoch persistency, have been proposed. Although these models provide high-level guidance for reasoning about the data persistence, ...
Interference relation-guided SMT solving for multi-threaded program verification
PPoPP '22: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Pages 163–176, https://doi.org/10.1145/3503221.3508424
Concurrent program verification is challenging due to a large number of thread interferences. A popular approach is to encode concurrent programs as SMT formulas and then rely on off-the-shelf SMT solvers to accomplish the verification. In most existing ...
- research-article, March 2022
CASE: a compiler-assisted SchEduling framework for multi-GPU systems
PPoPP '22: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Pages 17–31, https://doi.org/10.1145/3503221.3508423
Modern computing platforms tend to deploy multiple GPUs on a single node to boost performance. GPUs have large computing capacities and are an expensive resource. Increasing their utilization without causing performance degradation of individual ...
- research-article, March 2022
Dopia: online parallelism management for integrated CPU/GPU architectures
PPoPP '22: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Pages 32–45, https://doi.org/10.1145/3503221.3508421
Recent desktop and mobile processors often integrate CPU and GPU onto the same die. The limited memory bandwidth of these integrated architectures can negatively affect the performance of data-parallel workloads when all computational resources are ...
Asymmetry-aware scalable locking
PPoPP '22: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Pages 294–308, https://doi.org/10.1145/3503221.3508420
The pursuit of power-efficiency is popularizing asymmetric multicore processors (AMP) such as ARM big.LITTLE, Apple M1 and recent Intel Alder Lake with big and little cores. However, we find that existing scalable locks fail to scale on AMP and cause ...
- poster, March 2022
Hardening selective protection across multiple program inputs for HPC applications
PPoPP '22: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Pages 437–438, https://doi.org/10.1145/3503221.3508414
With the ever-shrinking size of transistors and increasing scale of applications, silent data corruptions (SDCs) have become a common yet serious issue in HPC applications. Selective instruction duplication (SID) is a popular fault-tolerance technique ...
Stream processing with dependency-guided synchronization
PPoPP '22: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Pages 1–16, https://doi.org/10.1145/3503221.3508413
Real-time data processing applications with low latency requirements have led to the increasing popularity of stream processing systems. While such systems offer convenient APIs that can be used to achieve data parallelism automatically, they offer ...
- research-article, March 2022
Vapro: performance variance detection and diagnosis for production-run parallel applications
- Liyan Zheng,
- Jidong Zhai,
- Xiongchao Tang,
- Haojie Wang,
- Teng Yu,
- Yuyang Jin,
- Shuaiwen Leon Song,
- Wenguang Chen
PPoPP '22: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Pages 150–162, https://doi.org/10.1145/3503221.3508411
Performance variance is a serious problem for parallel applications, which can cause performance degradation and make applications' behavior hard to understand. Therefore, detecting and diagnosing performance variance are of crucial importance for users ...
PerFlow: a domain specific framework for automatic performance analysis of parallel applications
PPoPP '22: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Pages 177–191, https://doi.org/10.1145/3503221.3508405
Performance analysis is widely used to identify performance issues of parallel applications. However, complex communications and data dependence, as well as the interactions between different kinds of performance issues make high-efficiency performance ...
- research-article, March 2022
Deadlock-free asynchronous message reordering in Rust with multiparty session types
PPoPP '22: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Pages 246–261, https://doi.org/10.1145/3503221.3508404
Rust is a modern systems language focused on performance and reliability. Complementing Rust's promise to provide "fearless concurrency", developers frequently exploit asynchronous message passing. Unfortunately, sending and receiving messages in an ...
- poster, March 2022
Automatic synthesis of parallel Unix commands and pipelines with KumQuat
PPoPP '22: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Pages 431–432, https://doi.org/10.1145/3503221.3508400
We present KumQuat, a system for automatically generating data-parallel implementations of Unix shell commands and pipelines. The generated parallel versions split input streams, execute multiple instantiations of the original pipeline commands to ...