It is a great pleasure to welcome you to the 27th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2018), and with it to Tempe, Arizona, United States of America. This year's HPDC continues its nearly three-decade tradition as the premier annual conference for presenting the latest research on the design, implementation, evaluation, and use of parallel and distributed systems for high-end computing.
Proceeding Downloads
Reproducibility in computational and data-enabled science
As computation becomes central to scientific research and discovery new questions arise regarding the implementation, dissemination, and evaluation of methods that underlie scientific claims. I present a framework for conceptualizing the affordances ...
PicoDriver: fast-path device drivers for multi-kernel operating systems
Lightweight kernel (LWK) operating systems (OS) in high-end supercomputing have a proven track record of excellent scalability. However, the lack of full Linux compatibility and limited availability of device drivers in LWKs have prohibited their wide-...
Hard real-time scheduling for parallel run-time systems
High performance parallel computing demands careful synchronization, timing, performance isolation and control, as well as the avoidance of OS and other types of noise. The employment of soft real-time systems toward these ends has already shown ...
ABFR: convenient management of latent error resilience using application knowledge
Exascale systems face high error rates due to increasing scale (10⁹ cores), software complexity and rising memory error rates. Increasingly, errors escape immediate hardware-level detection, silently corrupting application states. Such latent errors can ...
Desh: deep learning for system health prediction of lead times to failure in HPC
Today's large-scale supercomputers encounter faults on a daily basis. Exascale systems are likely to experience even higher fault rates due to increased component count and density. Triggering resilience-mitigating techniques remains a challenge due to ...
Improving performance of iterative methods by lossy checkpointing
Iterative methods are commonly used approaches to solve large, sparse linear systems, which are fundamental operations for many modern scientific simulations. When the large-scale iterative methods are running with a large number of ranks in parallel, ...
Efficient sparse-matrix multi-vector product on GPUs
- Changwan Hong,
- Aravind Sukumaran-Rajam,
- Bortik Bandyopadhyay,
- Jinsung Kim,
- Süreyya Emre Kurt,
- Israt Nisa,
- Shivani Sabhlok,
- Ümit V. Çatalyürek,
- Srinivasan Parthasarathy,
- P. Sadayappan
Sparse Matrix-Vector (SpMV) and Sparse Matrix-Multivector (SpMM) products are key kernels for computational science and data science. While GPUs offer significantly higher peak performance and memory bandwidth than multicore CPUs, achieving high ...
CommAnalyzer: automated estimation of communication cost and scalability on HPC clusters from sequential code
To deliver scalable performance to large-scale scientific and data analytic applications, HPC cluster architectures adopt the distributed-memory model. The performance and scalability of parallel applications on such systems are limited by the ...
A high-performance connected components implementation for GPUs
Computing connected components is an important graph algorithm that is used, for example, in medicine, image processing, and biochemistry. This paper presents a fast connected-components implementation for GPUs called ECL-CC. It builds upon the best ...
Cambrian explosion of computing and big data in the post-Moore era
The so-called "Moore's Law," by which processor performance increases exponentially by a factor of 4 every 3 years or so, is slated to end within a 10--15 year timeframe as VLSI lithography reaches its limits around that time, ...
PShifter: feedback-based dynamic power shifting within HPC jobs for performance
The US Department of Energy (DOE) has set a power target of 20-30MW on the first exascale machines. To achieve one exaFLOPS under this power constraint, it is necessary to manage power intelligently while maximizing performance. Most production-level ...
ADAPT: an event-based adaptive collective communication framework
The increase in scale and heterogeneity of high-performance computing (HPC) systems predispose the performance of Message Passing Interface (MPI) collective communications to be susceptible to noise, and to adapt to a complex mix of hardware ...
Process-in-process: techniques for practical address-space sharing
The two most common parallel execution models for many-core CPUs today are multiprocess (e.g., MPI) and multithread (e.g., OpenMP). The multiprocess model allows each process to own a private address space, although processes can explicitly allocate ...
Thread-local concurrency: a technique to handle data race detection at programming model abstraction
With greater adoption of various high-level parallel programming models to harness on-node parallelism, accurate data race detection has become more crucial than ever. However, existing tools have great difficulty spotting data races through these high-...
LADR: low-cost application-level detector for reducing silent output corruptions
Applications running on future high performance computing (HPC) systems are more likely to experience transient faults due to technology scaling trends with respect to higher circuit density, smaller transistor size and near-threshold voltage (NTV) ...
Profiling distributed systems in lightweight virtualized environments with logs and resource metrics
Understanding and troubleshooting distributed systems in the cloud is considered a very difficult problem because the execution of a single user request is distributed across multiple machines. Moreover, the multi-tenant nature of cloud environments further ...
Tuyere: enabling scalable memory workloads for system exploration
Memory technologies are under active development. Meanwhile, workloads on contemporary computing systems are increasing rapidly in size and diversity. Such dynamics in hardware and software further widen the gap between memory system design and ...
Performance analysis and optimization of in-situ integration of simulation with data analysis: zipping applications up
This paper targets an important class of applications that requires combining HPC simulations with data analysis for online or real-time scientific discovery. We use the state-of-the-art parallel-IO and data-staging libraries to build simulation-time ...
ForkTail: a black-box fork-join tail latency prediction model for user-facing datacenter workloads
The workflows of the predominant user-facing datacenter services, including web search and social networking, are underlaid by various Fork-Join structures. Due to a lack of understanding of the performance of Fork-Join structures in general, today's ...
The biology of software
Biological design principles can potentially change the way we study, engineer, maintain, and develop large dynamic software systems. For example, computer programmers like to think of software as the product of intelligent design, carefully crafted to ...
Hermes: a heterogeneous-aware multi-tiered distributed I/O buffering system
Modern High-Performance Computing (HPC) systems are adding extra layers to the memory and storage hierarchy named deep memory and storage hierarchy (DMSH), to increase I/O performance. New hardware technologies, such as NVMe and SSD, have been ...
NVStream: accelerating HPC workflows with NVRAM-based transport for streaming objects
Nonvolatile memory technologies (NVRAM) with larger capacity relative to DRAM and faster persistence relative to block-based storage technologies are expected to play a crucial role in accelerating I/O performance for HPC scientific workflows. Typically,...
Parallelizing garbage collection with I/O to improve flash resource utilization
Garbage Collection (GC) has been a critical optimization target for improving the performance of flash-based Solid State Drives (SSDs); the long-lasting GC process occupies the flash resources, thereby blocking normal I/O requests and increasing ...
Transparent speculation in geo-replicated transactional data stores
This work presents Speculative Transaction Replication (STR), a protocol that exploits transparent speculation techniques to enhance performance of geo-distributed, partially replicated transactional data stores. In addition, we define a new consistency ...
Cross-geography scientific data transferring trends and behavior
Wide area data transfers play an important role in many science applications but rely on expensive infrastructure that often delivers disappointing performance in practice. In response, we present a systematic examination of a large set of data transfer ...