DOI:
10.1145/3617232
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1
ACM 2024 Proceeding
Publisher:
  • Association for Computing Machinery
  • New York
  • NY
  • United States
Conference:
ASPLOS '24: 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, La Jolla, CA, USA, 27 April 2024 - 1 May 2024
ISBN:
979-8-4007-0372-0
Published:
17 April 2024
Abstract

Welcome to the first volume of ASPLOS'24: the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. For the second year, ASPLOS employs a model of three submission deadlines - spring, summer and fall - along with a major revision mechanism, which, as an alternative to rejection, gives the authors of some submissions the opportunity to fix a list of problems and then resubmit their work to the subsequent review cycle.

We introduced several notable changes to ASPLOS this year. Briefly, these include significantly increasing the program committee size to over 220 members (more than twice the size of last year), forgoing synchronous PC meetings and instead making all decisions online, and overhauling the review assignment process. The overhaul includes comparing the textual contents of submissions to the contents of papers authored by the reviewers and using a metric that quantifies the goodness of the match to guide the assignment of reviewers to submissions. The overhaul additionally involves asking reviewers to predict the expertise of their future reviews for a subset of the submissions and using this input, among other inputs, in the assignment process.
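The text-matching step described above can be illustrated with a minimal sketch. The front matter does not say which metric ASPLOS used, so this example assumes a plain bag-of-words cosine similarity; the tokenizer, the function names, and the sample reviewer data are all hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase bag-of-words term counts (assumed tokenizer, not the ASPLOS one)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors, in [0, 1]."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_reviewers(submission_text, reviewer_papers):
    """Rank reviewers by how well their past papers match a submission."""
    submission = term_counts(submission_text)
    scores = {
        name: cosine_similarity(submission, term_counts(" ".join(papers)))
        for name, papers in reviewer_papers.items()
    }
    # Best match first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data for illustration only.
reviewer_papers = {
    "reviewer_a": ["GPU kernels for sparse tensor compilers"],
    "reviewer_b": ["File system crash consistency on persistent memory"],
}
ranking = rank_reviewers(
    "A compiler for sparse tensor kernels on GPU accelerators",
    reviewer_papers,
)
```

Here `ranking` lists `reviewer_a` first because that reviewer's paper shares more terms with the submission. A real assignment process would combine such a score with other inputs, such as the reviewers' self-predicted expertise mentioned above.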

Key statistics of the ASPLOS'24 spring cycle include: 173 submissions were finalized (nearly double last year's spring count), with 47 (27%) related to machine learning, 41 to storage/memory, 39 to accelerators/FPGAs/GPUs, and 27 to security; 87 (51%) submissions were promoted to the second review round; 28 (16.2%) papers were accepted, with 16, 13, and 9 awarded artifact evaluation badges of "available," "functional," and "reproduced," respectively; 27 (15.6%) submissions were allowed to submit major revisions, of which 22 were subsequently accepted during the summer cycle; 762 reviews were uploaded; and 2,868 comments were generated during online discussions.

Another change we introduced this year is asking authors to specify their per-submission most-related broader areas of research, which revealed that 54%, 42%, and 25% of the submissions are associated with architecture, operating systems, and programming languages, respectively, with only 21% being interdisciplinary. The full details are available in the PDF of the front matter.

research-article
Open Access
Amanda: Unified Instrumentation Framework for Deep Neural Networks

The success of deep neural networks (DNNs) has sparked efforts to analyze (e.g., tracing) and optimize (e.g., pruning) them. These tasks have specific requirements and ad-hoc implementations in current execution backends like TensorFlow/PyTorch, which ...

research-article
Open Access
Automatic Generation of Vectorizing Compilers for Customizable Digital Signal Processors

Embedded applications extract the best power-performance trade-off from digital signal processors (DSPs) by making extensive use of vectorized execution. Rather than handwriting the many customized kernels these applications use, DSP engineers rely on ...

BypassD: Enabling fast userspace access to shared SSDs

Modern storage devices, such as Optane NVMe SSDs, offer ultra-low latency of a few microseconds and high bandwidth of multiple gigabytes per second. At these speeds, the kernel software I/O stack is a substantial source of overhead. Userspace approaches ...

research-article
Open Access
CC-NIC: a Cache-Coherent Interface to the NIC

Emerging interconnects make peripherals, such as the network interface controller (NIC), accessible through the processor's cache hierarchy, allowing these devices to participate in the CPU cache coherence protocol. This is a fundamental change from the ...

research-article
Open Access
Cocco: Hardware-Mapping Co-Exploration towards Memory Capacity-Communication Optimization

Memory is a critical design consideration in current data-intensive DNN accelerators, as it profoundly determines energy consumption, bandwidth requirements, and area costs. As DNN structures become more complex, a larger on-chip memory capacity is ...

CodeCrunch: Improving Serverless Performance via Function Compression and Cost-Aware Warmup Location Optimization

Serverless computing has a critical problem of function cold starts. To minimize cold starts, state-of-the-art techniques predict function invocation times to warm them up. Warmed-up functions occupy space in memory and incur a keep-alive cost, which can ...

CrossPrefetch: Accelerating I/O Prefetching for Modern Storage

We introduce CrossPrefetch, a novel cross-layered I/O prefetching mechanism that operates across the OS and a user-level runtime to achieve optimal performance. Existing OS prefetching mechanisms suffer from rigid interfaces that do not provide ...

research-article
Open Access
EagleEye: Nanosatellite constellation design for high-coverage, high-resolution sensing

Advances in nanosatellite technology and low launch costs have led to more Earth-observation satellites in low-Earth orbit. Prior work shows that satellite images are useful for geospatial analysis applications (e.g., ship detection, lake monitoring, and ...

research-article
Everywhere All at Once: Co-Location Attacks on Public Cloud FaaS

Microarchitectural side-channel attacks exploit shared hardware resources, posing significant threats to modern systems. A pivotal step in these attacks is achieving physical host co-location between attacker and victim. This step is especially ...

research-article
Open Access
Expanding Datacenter Capacity with DVFS Boosting: A safe and scalable deployment experience

The COVID-19 pandemic created unexpected demand for our physical infrastructure. We increased our computing supply by growing our infrastructure footprint and expanded existing capacity using various techniques, among them DVFS boosting. This paper ...

research-article
Open Access
Exploiting Human Color Discrimination for Memory- and Energy-Efficient Image Encoding in Virtual Reality

Virtual Reality (VR) has the potential of becoming the next ubiquitous computing platform. Continued progress in the burgeoning field of VR depends critically on an efficient computing substrate. In particular, DRAM access energy is known to contribute ...

research-article
Open Access
Formal Mechanised Semantics of CHERI C: Capabilities, Undefined Behaviour, and Provenance

Memory safety issues are a persistent source of security vulnerabilities, with conventional architectures and the C codebase chronically prone to exploitable errors. The CHERI research project has shown how one can provide radically improved security for ...

GPU-based Private Information Retrieval for On-Device Machine Learning Inference

On-device machine learning (ML) inference can enable the use of private user data on user devices without revealing them to remote servers. However, a pure on-device solution to private ML inference is impractical for many applications that rely on ...

research-article
HIDA: A Hierarchical Dataflow Compiler for High-Level Synthesis

Dataflow architectures are growing in popularity due to their potential to mitigate the challenges posed by the memory wall inherent to the Von Neumann architecture. At the same time, high-level synthesis (HLS) has demonstrated its efficacy as a design ...

Lightweight, Modular Verification for WebAssembly-to-Native Instruction Selection

Language-level guarantees---like module runtime isolation for WebAssembly (Wasm)---are only as strong as the compiler that produces a final, native-machine-specific executable. The process of lowering language-level constructions to ISA-specific ...

Loupe: Driving the Development of OS Compatibility Layers

Supporting mainstream applications is fundamental for a new OS to have impact. It is generally achieved by developing a layer of compatibility allowing applications developed for a mainstream OS like Linux to run unmodified on the new OS. Building such a ...

ngAP: Non-blocking Large-scale Automata Processing on GPUs

Finite automata serve as compute kernels for various applications that require high throughput. However, despite the increasing compute power of GPUs, their potential in processing automata remains underutilized. In this work, we identify three major ...

Optimizing Deep Learning Inference via Global Analysis and Tensor Expressions

Optimizing deep neural network (DNN) execution is important but becomes increasingly difficult as DNN complexity grows. Existing DNN compilers cannot effectively exploit optimization opportunities across operator boundaries, leaving room for improvement. ...

research-article
Performance-aware Scale Analysis with Reserve for Homomorphic Encryption

Thanks to the computation ability on encrypted data and the efficient fixed-point execution, the RNS-CKKS fully homomorphic encryption (FHE) scheme is a promising solution for privacy-preserving machine learning services. However, writing an efficient ...

Proteus: A High-Throughput Inference-Serving System with Accuracy Scaling

Existing machine learning inference-serving systems largely rely on hardware scaling by adding more devices or using more powerful accelerators to handle increasing query demands. However, hardware scaling might not be feasible for fixed-size edge ...

RainbowCake: Mitigating Cold-starts in Serverless with Layer-wise Container Caching and Sharing

Serverless computing has grown rapidly as a new cloud computing paradigm that promises ease-of-management, cost-efficiency, and auto-scaling by shipping functions via self-contained virtualized containers. Unfortunately, serverless computing suffers from ...

Scaling Up Memory Disaggregated Applications with SMART

Recent developments in RDMA networks are leading to the trend of memory disaggregation. However, the performance of each compute node is still limited by the network, especially when it needs to perform a large number of concurrent fine-grained remote ...

SoCFlow: Efficient and Scalable DNN Training on SoC-Clustered Edge Servers

SoC-Cluster, a novel server architecture composed of massive mobile system-on-chips (SoCs), is gaining popularity in industrial edge computing due to its energy efficiency and compatibility with existing mobile applications. However, we observe that the ...

research-article
Open Access
SoD2: Statically Optimizing Dynamic Deep Neural Network Execution

Though many compilation and runtime systems have been developed for DNNs in recent years, the focus has largely been on static DNNs. Dynamic DNNs, where tensor shapes and sizes and even the set of operators used are dependent upon the input and/or ...

TrackFM: Far-out Compiler Support for a Far Memory World

Large memory workloads with favorable locality of reference can benefit by extending the memory hierarchy across machines. Systems that enable such far memory configurations can improve application performance and overall memory utilization in a cluster. ...

research-article
Open Access
Training Job Placement in Clusters with Statistical In-Network Aggregation

In-Network Aggregation (INA) offloads the gradient aggregation in distributed training (DT) onto programmable switches, where the switch memory could be allocated to jobs in either synchronous or statistical multiplexing mode. Statistical INA has ...

UBFuzz: Finding Bugs in Sanitizer Implementations

In this paper, we propose a testing framework for validating sanitizer implementations in compilers. Our core components are (1) a program generator specifically designed for producing programs containing undefined behavior (UB), and (2) a novel test ...

ZENO: A Type-based Optimization Framework for Zero Knowledge Neural Network Inference

Zero knowledge Neural Networks draw increasing attention for guaranteeing computation integrity and privacy of neural networks (NNs) based on zero-knowledge Succinct Non-interactive ARgument of Knowledge (zkSNARK) security scheme. However, the ...

Contributors
  • Technion - Israel Institute of Technology
  • Microsoft Research
  • University of California, Riverside
  • University of California, Riverside

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%
Year           Submitted   Accepted   Rate
ASPLOS '19     351         74         21%
ASPLOS '18     319         56         18%
ASPLOS '17     320         53         17%
ASPLOS '16     232         53         23%
ASPLOS '15     287         48         17%
ASPLOS '14     217         49         23%
ASPLOS XV      181         32         18%
ASPLOS XIII    127         31         24%
ASPLOS XII     158         38         24%
ASPLOS X       175         24         14%
ASPLOS IX      114         24         21%
ASPLOS VIII    123         28         23%
ASPLOS VII     109         25         23%
Overall        2,713       535        20%