Welcome to the first volume of ASPLOS'24, the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. For the second year, ASPLOS employs a model of three submission deadlines (spring, summer, and fall) along with a major-revision mechanism, which, as an alternative to rejection, gives the authors of some submissions the opportunity to address a list of identified problems and resubmit their work in the subsequent review cycle.
We introduced several notable changes to ASPLOS this year. Briefly, these include significantly increasing the program committee to over 220 members (more than twice last year's size), forgoing synchronous PC meetings in favor of making all decisions online, and overhauling the review-assignment process. The overhaul compares the textual content of each submission to the content of papers authored by the reviewers and uses a metric quantifying the quality of the match to guide the assignment of reviewers to submissions. It additionally asks reviewers to predict their expertise for a subset of the submissions and feeds this input, among other signals, into the assignment process.
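As an illustration of the similarity-driven step described above, the sketch below scores reviewer-submission pairs with TF-IDF cosine similarity. It is a minimal, hypothetical example only: the actual ASPLOS tooling, its features, and its weighting are not specified here, and all names in the snippet are invented for illustration.

```python
# Minimal, hypothetical sketch of text-similarity-based reviewer assignment.
# Assumes scikit-learn; the real conference tooling and scoring are not shown here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

submissions = {
    "sub1": "GPU automata processing throughput kernels ...",
    "sub2": "serverless cold start container caching ...",
}
reviewer_profiles = {
    "rev1": "papers on GPU architectures and parallel automata engines ...",
    "rev2": "papers on serverless platforms and cloud resource management ...",
}

# Build one TF-IDF space over both submissions and reviewer profiles.
vectorizer = TfidfVectorizer(stop_words="english")
corpus = list(submissions.values()) + list(reviewer_profiles.values())
matrix = vectorizer.fit_transform(corpus)

sub_vecs = matrix[: len(submissions)]
rev_vecs = matrix[len(submissions):]

# Score every (submission, reviewer) pair; higher cosine similarity = better match.
scores = cosine_similarity(sub_vecs, rev_vecs)
for i, sub in enumerate(submissions):
    ranked = sorted(zip(reviewer_profiles, scores[i]), key=lambda p: -p[1])
    print(sub, "->", [(rev, round(score, 2)) for rev, score in ranked])
```

In practice such a textual-similarity score would be only one input, combined with the reviewers' own expertise predictions and other signals, as noted above.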
Key statistics of the ASPLOS'24 spring cycle include:
- 173 submissions were finalized (nearly double last year's spring count), of which 47 (27%) relate to machine learning, 41 to storage/memory, 39 to accelerators/FPGAs/GPUs, and 27 to security;
- 87 (51%) submissions were promoted to the second review round;
- 28 (16.2%) papers were accepted, with 16, 13, and 9 awarded artifact-evaluation badges of "available," "functional," and "reproduced," respectively;
- 27 (15.6%) submissions were allowed to submit major revisions, of which 22 were subsequently accepted during the summer cycle;
- 762 reviews were uploaded; and
- 2,868 comments were generated during online discussions.
Another change we introduced this year is asking authors to specify, for each submission, the broader research areas most related to it. This revealed that 54%, 42%, and 25% of the submissions are associated with architecture, operating systems, and programming languages, respectively, with only 21% being interdisciplinary. The full details are available in the PDF of the front matter.
Proceeding Downloads
Amanda: Unified Instrumentation Framework for Deep Neural Networks
- Yue Guan,
- Yuxian Qiu,
- Jingwen Leng,
- Fan Yang,
- Shuo Yu,
- Yunxin Liu,
- Yu Feng,
- Yuhao Zhu,
- Lidong Zhou,
- Yun Liang,
- Chen Zhang,
- Chao Li,
- Minyi Guo
The success of deep neural networks (DNNs) has sparked efforts to analyze (e.g., tracing) and optimize (e.g., pruning) them. These tasks have specific requirements and ad-hoc implementations in current execution backends like TensorFlow/PyTorch, which ...
Automatic Generation of Vectorizing Compilers for Customizable Digital Signal Processors
Embedded applications extract the best power-performance trade-off from digital signal processors (DSPs) by making extensive use of vectorized execution. Rather than handwriting the many customized kernels these applications use, DSP engineers rely on ...
BypassD: Enabling fast userspace access to shared SSDs
Modern storage devices, such as Optane NVMe SSDs, offer ultra-low latency of a few microseconds and high bandwidth of multiple gigabytes per second. At these speeds, the kernel software I/O stack is a substantial source of overhead. Userspace approaches ...
CC-NIC: a Cache-Coherent Interface to the NIC
- Henry N. Schuh,
- Arvind Krishnamurthy,
- David Culler,
- Henry M. Levy,
- Luigi Rizzo,
- Samira Khan,
- Brent E. Stephens
Emerging interconnects make peripherals, such as the network interface controller (NIC), accessible through the processor's cache hierarchy, allowing these devices to participate in the CPU cache coherence protocol. This is a fundamental change from the ...
Cocco: Hardware-Mapping Co-Exploration towards Memory Capacity-Communication Optimization
Memory is a critical design consideration in current data-intensive DNN accelerators, as it profoundly determines energy consumption, bandwidth requirements, and area costs. As DNN structures become more complex, a larger on-chip memory capacity is ...
CodeCrunch: Improving Serverless Performance via Function Compression and Cost-Aware Warmup Location Optimization
Serverless computing has a critical problem of function cold starts. To minimize cold starts, state-of-the-art techniques predict function invocation times to warm them up. Warmed-up functions occupy space in memory and incur a keep-alive cost, which can ...
CrossPrefetch: Accelerating I/O Prefetching for Modern Storage
We introduce CrossPrefetch, a novel cross-layered I/O prefetching mechanism that operates across the OS and a user-level runtime to achieve optimal performance. Existing OS prefetching mechanisms suffer from rigid interfaces that do not provide ...
EagleEye: Nanosatellite constellation design for high-coverage, high-resolution sensing
Advances in nanosatellite technology and low launch costs have led to more Earth-observation satellites in low-Earth orbit. Prior work shows that satellite images are useful for geospatial analysis applications (e.g., ship detection, lake monitoring, and ...
Everywhere All at Once: Co-Location Attacks on Public Cloud FaaS
Microarchitectural side-channel attacks exploit shared hardware resources, posing significant threats to modern systems. A pivotal step in these attacks is achieving physical host co-location between attacker and victim. This step is especially ...
Expanding Datacenter Capacity with DVFS Boosting: A safe and scalable deployment experience
- Leonardo Piga,
- Iyswarya Narayanan,
- Aditya Sundarrajan,
- Matt Skach,
- Qingyuan Deng,
- Biswadip Maity,
- Manoj Chakkaravarthy,
- Alison Huang,
- Abhishek Dhanotia,
- Parth Malani
The COVID-19 pandemic created unexpected demand for our physical infrastructure. We increased our computing supply by growing our infrastructure footprint and expanded existing capacity using various techniques, among them DVFS boosting. This paper ...
Exploiting Human Color Discrimination for Memory- and Energy-Efficient Image Encoding in Virtual Reality
Virtual Reality (VR) has the potential of becoming the next ubiquitous computing platform. Continued progress in the burgeoning field of VR depends critically on an efficient computing substrate. In particular, DRAM access energy is known to contribute ...
Formal Mechanised Semantics of CHERI C: Capabilities, Undefined Behaviour, and Provenance
- Vadim Zaliva,
- Kayvan Memarian,
- Ricardo Almeida,
- Jessica Clarke,
- Brooks Davis,
- Alexander Richardson,
- David Chisnall,
- Brian Campbell,
- Ian Stark,
- Robert N. M. Watson,
- Peter Sewell
Memory safety issues are a persistent source of security vulnerabilities, with conventional architectures and the C codebase chronically prone to exploitable errors. The CHERI research project has shown how one can provide radically improved security for ...
GPU-based Private Information Retrieval for On-Device Machine Learning Inference
- Maximilian Lam,
- Jeff Johnson,
- Wenjie Xiong,
- Kiwan Maeng,
- Udit Gupta,
- Yang Li,
- Liangzhen Lai,
- Ilias Leontiadis,
- Minsoo Rhu,
- Hsien-Hsin S. Lee,
- Vijay Janapa Reddi,
- Gu-Yeon Wei,
- David Brooks,
- Edward Suh
On-device machine learning (ML) inference can enable the use of private user data on user devices without revealing them to remote servers. However, a pure on-device solution to private ML inference is impractical for many applications that rely on ...
HIDA: A Hierarchical Dataflow Compiler for High-Level Synthesis
Dataflow architectures are growing in popularity due to their potential to mitigate the challenges posed by the memory wall inherent to the Von Neumann architecture. At the same time, high-level synthesis (HLS) has demonstrated its efficacy as a design ...
Lightweight, Modular Verification for WebAssembly-to-Native Instruction Selection
Language-level guarantees---like module runtime isolation for WebAssembly (Wasm)---are only as strong as the compiler that produces a final, native-machine-specific executable. The process of lowering language-level constructions to ISA-specific ...
Loupe: Driving the Development of OS Compatibility Layers
- Hugo Lefeuvre,
- Gaulthier Gain,
- Vlad-Andrei Bădoiu,
- Daniel Dinca,
- Vlad-Radu Schiller,
- Costin Raiciu,
- Felipe Huici,
- Pierre Olivier
Supporting mainstream applications is fundamental for a new OS to have impact. It is generally achieved by developing a layer of compatibility allowing applications developed for a mainstream OS like Linux to run unmodified on the new OS. Building such a ...
ngAP: Non-blocking Large-scale Automata Processing on GPUs
Finite automata serve as compute kernels for various applications that require high throughput. However, despite the increasing compute power of GPUs, their potential in processing automata remains underutilized. In this work, we identify three major ...
Optimizing Deep Learning Inference via Global Analysis and Tensor Expressions
Optimizing deep neural network (DNN) execution is important but becomes increasingly difficult as DNN complexity grows. Existing DNN compilers cannot effectively exploit optimization opportunities across operator boundaries, leaving room for improvement. ...
Performance-aware Scale Analysis with Reserve for Homomorphic Encryption
Thanks to its ability to compute on encrypted data and its efficient fixed-point execution, the RNS-CKKS fully homomorphic encryption (FHE) scheme is a promising solution for privacy-preserving machine learning services. However, writing an efficient ...
Proteus: A High-Throughput Inference-Serving System with Accuracy Scaling
Existing machine learning inference-serving systems largely rely on hardware scaling by adding more devices or using more powerful accelerators to handle increasing query demands. However, hardware scaling might not be feasible for fixed-size edge ...
RainbowCake: Mitigating Cold-starts in Serverless with Layer-wise Container Caching and Sharing
- Hanfei Yu,
- Rohan Basu Roy,
- Christian Fontenot,
- Devesh Tiwari,
- Jian Li,
- Hong Zhang,
- Hao Wang,
- Seung-Jong Park
Serverless computing has grown rapidly as a new cloud computing paradigm that promises ease-of-management, cost-efficiency, and auto-scaling by shipping functions via self-contained virtualized containers. Unfortunately, serverless computing suffers from ...
Scaling Up Memory Disaggregated Applications with SMART
Recent developments in RDMA networks are leading to the trend of memory disaggregation. However, the performance of each compute node is still limited by the network, especially when it needs to perform a large number of concurrent fine-grained remote ...
SoCFlow: Efficient and Scalable DNN Training on SoC-Clustered Edge Servers
SoC-Cluster, a novel server architecture composed of massive mobile system-on-chips (SoCs), is gaining popularity in industrial edge computing due to its energy efficiency and compatibility with existing mobile applications. However, we observe that the ...
SoD2: Statically Optimizing Dynamic Deep Neural Network Execution
Though many compilation and runtime systems have been developed for DNNs in recent years, the focus has largely been on static DNNs. Dynamic DNNs, where tensor shapes and sizes and even the set of operators used are dependent upon the input and/or ...
TrackFM: Far-out Compiler Support for a Far Memory World
Large memory workloads with favorable locality of reference can benefit by extending the memory hierarchy across machines. Systems that enable such far memory configurations can improve application performance and overall memory utilization in a cluster. ...
Training Job Placement in Clusters with Statistical In-Network Aggregation
In-Network Aggregation (INA) offloads the gradient aggregation in distributed training (DT) onto programmable switches, where the switch memory could be allocated to jobs in either synchronous or statistical multiplexing mode. Statistical INA has ...
UBFuzz: Finding Bugs in Sanitizer Implementations
In this paper, we propose a testing framework for validating sanitizer implementations in compilers. Our core components are (1) a program generator specifically designed for producing programs containing undefined behavior (UB), and (2) a novel test ...
ZENO: A Type-based Optimization Framework for Zero Knowledge Neural Network Inference
Zero-knowledge neural networks draw increasing attention for guaranteeing the computation integrity and privacy of neural networks (NNs) based on the zero-knowledge Succinct Non-interactive ARgument of Knowledge (zkSNARK) security scheme. However, the ...
Acceptance Rates
| Year | Submitted | Accepted | Rate |
|---|---|---|---|
| ASPLOS '19 | 351 | 74 | 21% |
| ASPLOS '18 | 319 | 56 | 18% |
| ASPLOS '17 | 320 | 53 | 17% |
| ASPLOS '16 | 232 | 53 | 23% |
| ASPLOS '15 | 287 | 48 | 17% |
| ASPLOS '14 | 217 | 49 | 23% |
| ASPLOS XV | 181 | 32 | 18% |
| ASPLOS XIII | 127 | 31 | 24% |
| ASPLOS XII | 158 | 38 | 24% |
| ASPLOS X | 175 | 24 | 14% |
| ASPLOS IX | 114 | 24 | 21% |
| ASPLOS VIII | 123 | 28 | 23% |
| ASPLOS VII | 109 | 25 | 23% |
| Overall | 2,713 | 535 | 20% |