The 9th ROSS workshop, a full-day meeting at the HPDC 2019 conference in Phoenix, Arizona, USA, focuses on issues related to runtime and operating systems in large-scale supercomputers.
This year, the workshop accepted 4 high-quality submissions by 17 authors.
This year's keynote will be presented by Dr. Jidong Zhai from Tsinghua University, China. In his talk "HPC System Software Enhanced by Source Code Analysis," he will discuss the use of source code analysis to address the challenge of ever-increasing problem size and system scale in large-scale performance analysis. He will outline the performance tools designed in his group and share the experience of building them by combining static analysis and runtime analysis.
Proceeding Downloads
HPC System Software Enhanced by Source Code Analysis
Building efficient and scalable system software, especially performance analysis and monitoring, for large-scale systems, is increasingly important both for the developers of parallel applications and the designers of next-generation HPC systems. ...
Towards a Practical Ecosystem of Specialized OS Kernels
Specialized operating systems have enjoyed a recent revival driven both by a pressing need to rethink the system software stack in several domains and by the convenience and flexibility that on-demand infrastructure and virtual execution environments ...
The Effect of System Utilization on Application Performance Variability
Application performance variability caused by network contention is a major issue on dragonfly-based systems. This work-in-progress study makes two contributions. First, we analyze real workload logs and conduct application experiments on the production ...
Asynchronous Abstract Machines: Anti-noise System Software for Many-core Processors
Today's systems offer an increasing number of processor cores; however, the chance to operate them efficiently by dedicating cores to specific tasks is often missed. Instead, mixed workloads are processed by each core, which leads to system noise (i.e., ...
MCEM: Multi-Level Cooperative Exception Model for HPC Workflows
As fault recovery mechanisms become increasingly important in HPC systems, the need for a new recovery model for workflows on these systems grows as well. While the traditional approach in which each system component attempts its own independent ...
Index Terms
- Proceedings of the 9th International Workshop on Runtime and Operating Systems for Supercomputers