- research-article, March 2020
SwapAdvisor: Pushing Deep Learning Beyond the GPU Memory Limit via Smart Swapping
ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, March 2020, Pages 1341–1355, https://doi.org/10.1145/3373376.3378530
It is known that deeper and wider neural networks can achieve better accuracy. But it is difficult to continue the trend to increase model size due to limited GPU memory. One promising solution is to support swapping between GPU and CPU memory. However, ...
- research-article, March 2020
Batch-Aware Unified Memory Management in GPUs for Irregular Workloads
ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, March 2020, Pages 1357–1370, https://doi.org/10.1145/3373376.3378529
While unified virtual memory and demand paging in modern GPUs provide convenient abstractions to programmers for working with large-scale applications, they come at a significant performance cost. We provide the first comprehensive analysis of major ...
- research-article, March 2020
Lynx: A SmartNIC-driven Accelerator-centric Architecture for Network Servers
ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, March 2020, Pages 117–131, https://doi.org/10.1145/3373376.3378528
This paper explores new opportunities afforded by the growing deployment of compute and I/O accelerators to improve the performance and efficiency of hardware-accelerated computing services in data centers.
We propose Lynx, an accelerator-centric ...
- research-article, March 2020
Vortex: Extreme-Performance Memory Abstractions for Data-Intensive Streaming Applications
ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, March 2020, Pages 623–638, https://doi.org/10.1145/3373376.3378527
Many applications in data analytics, information retrieval, and cluster computing process huge amounts of information. The complexity of involved algorithms and massive scale of data require a programming model that can not only offer a simple ...
- research-article, March 2020
Learning-based Memory Allocation for C++ Server Workloads
Martin Maas, David G. Andersen, Michael Isard, Mohammad Mahdi Javanmard, Kathryn S. McKinley, Colin Raffel
ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, March 2020, Pages 541–556, https://doi.org/10.1145/3373376.3378525
Modern C++ servers have memory footprints that vary widely over time, causing persistent heap fragmentation of up to 2x from long-lived objects allocated during peak memory usage. This fragmentation is exacerbated by the use of huge (2MB) pages, a ...
Reproducible Containers
Omar S. Navarro Leija, Kelly Shiptoski, Ryan G. Scott, Baojun Wang, Nicholas Renner, Ryan R. Newton, Joseph Devietti
ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, March 2020, Pages 167–182, https://doi.org/10.1145/3373376.3378519
We describe the design and implementation of DetTrace, a reproducible container abstraction for Linux implemented in user space. All computation that occurs inside a DetTrace container is a pure function of the initial filesystem state of the container. ...
- research-article, March 2020
Catalyzer: Sub-millisecond Startup for Serverless Computing with Initialization-less Booting
ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, March 2020, Pages 467–481, https://doi.org/10.1145/3373376.3378512
Serverless computing promises cost-efficiency and elasticity for high-productive software development. To achieve this, the serverless sandbox system must address two challenges: strong isolation between function instances, and low startup latency to ...
- research-article, March 2020, Best Paper
IOctopus: Outsmarting Nonuniform DMA
Igor Smolyar, Alex Markuze, Boris Pismenny, Haggai Eran, Gerd Zellweger, Austin Bolen, Liran Liss, Adam Morrison, Dan Tsafrir
ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, March 2020, Pages 101–115, https://doi.org/10.1145/3373376.3378509
In a multi-CPU server, memory modules are local to the CPU to which they are connected, forming a nonuniform memory access (NUMA) architecture. Because non-local accesses are slower than local accesses, the NUMA architecture might degrade application ...
- research-article, March 2020
Capuchin: Tensor-based GPU Memory Management for Deep Learning
ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, March 2020, Pages 891–905, https://doi.org/10.1145/3373376.3378505
In recent years, deep learning has gained unprecedented success in various domains, the key of the success is the larger and deeper deep neural networks (DNNs) that achieved very high accuracy. On the other side, since GPU global memory is a scarce ...
- research-article, March 2020
Prague: High-Performance Heterogeneity-Aware Asynchronous Decentralized Training
ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, March 2020, Pages 401–416, https://doi.org/10.1145/3373376.3378499
Distributed deep learning training usually adopts All-Reduce as the synchronization mechanism for data parallel algorithms due to its high performance in homogeneous environment. However, its performance is bounded by the slowest worker among all ...
- research-article, March 2020, Best Paper
Elastic Cuckoo Page Tables: Rethinking Virtual Memory Translation for Parallelism
ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, March 2020, Pages 1093–1108, https://doi.org/10.1145/3373376.3378493
The unprecedented growth in the memory needs of emerging memory-intensive workloads has made virtual memory translation a major performance bottleneck. To address this problem, this paper introduces Elastic Cuckoo Page Tables, a novel page table design ...
- research-article, March 2020
HaRMony: Heterogeneous-Reliability Memory and QoS-Aware Energy Management on Virtualized Servers
ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, March 2020, Pages 575–590, https://doi.org/10.1145/3373376.3378489
The explosive growth of data increases the storage needs, especially within servers, making DRAM responsible for more than 40% of the total system power. Such a reality has made researchers focus on energy saving schemes that relax the pessimistic DRAM ...
- research-article, March 2020
Durable Transactional Memory Can Scale with Timestone
ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, March 2020, Pages 335–349, https://doi.org/10.1145/3373376.3378483
Non-volatile main memory (NVMM) technologies promise byte addressability and near-DRAM access that allows developers to build persistent applications with common load and store instructions. However, it is difficult to realize these promises because ...
A Hypervisor for Shared-Memory FPGA Platforms
Jiacheng Ma, Gefei Zuo, Kevin Loughlin, Xiaohe Cheng, Yanqiang Liu, Abel Mulugeta Eneyew, Zhengwei Qi, Baris Kasikci
ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, March 2020, Pages 827–844, https://doi.org/10.1145/3373376.3378482
Cloud providers widely deploy FPGAs as application-specific accelerators for customer use. These providers seek to multiplex their FPGAs among customers via virtualization, thereby reducing running costs. Unfortunately, most virtualization support is ...
Occlum: Secure and Efficient Multitasking Inside a Single Enclave of Intel SGX
ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, March 2020, Pages 955–970, https://doi.org/10.1145/3373376.3378469
Intel Software Guard Extensions (SGX) enables user-level code to create private memory regions called enclaves, whose code and data are protected by the CPU from software and hardware attacks outside the enclaves. Recent work introduces library ...
Mitosis: Transparently Self-Replicating Page-Tables for Large-Memory Machines
ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, March 2020, Pages 283–300, https://doi.org/10.1145/3373376.3378468
Multi-socket machines with 1-100 TBs of physical memory are becoming prevalent. Applications running on such multi-socket machines suffer non-uniform bandwidth and latency when accessing physical memory. Decades of research have focused on data ...
Optimizing Nested Virtualization Performance Using Direct Virtual Hardware
ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, March 2020, Pages 557–574, https://doi.org/10.1145/3373376.3378467
Nested virtualization, running virtual machines and hypervisors on top of other virtual machines and hypervisors, is increasingly important because of the need to deploy virtual machines running software stacks on top of virtualized cloud ...
AvA: Accelerated Virtualization of Accelerators
ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, March 2020, Pages 807–825, https://doi.org/10.1145/3373376.3378466
Applications are migrating en masse to the cloud, while accelerators such as GPUs, TPUs, and FPGAs proliferate in the wake of Moore's Law. These trends are in conflict: cloud applications run on virtual platforms, but existing virtualization techniques ...
Game of Threads: Enabling Asynchronous Poisoning Attacks
ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, March 2020, Pages 35–52, https://doi.org/10.1145/3373376.3378462
As data sizes continue to grow at an unprecedented rate, machine learning training is being forced to adopt asynchronous algorithms to maintain performance and scalability. In asynchronous training, many threads share and update the model in a racy ...
Perspective: A Sensible Approach to Speculative Automatic Parallelization
ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, March 2020, Pages 351–367, https://doi.org/10.1145/3373376.3378458
The promise of automatic parallelization, freeing programmers from the error-prone and time-consuming process of making efficient use of parallel processing resources, remains unrealized. For decades, the imprecision of memory analysis limited the ...