- research-article, February 2024
Explainable-DSE: An Agile and Explainable Exploration of Efficient HW/SW Codesigns of Deep Learning Accelerators Using Bottleneck Analysis
ASPLOS '23: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, Pages 87–107, https://doi.org/10.1145/3623278.3624772
Effective design space exploration (DSE) is paramount for hardware/software codesigns of deep learning accelerators that must meet strict execution constraints. Because the search space is vast, existing DSE techniques can require excessive trials to obtain a ...
Flame: A Centralized Cache Controller for Serverless Computing
ASPLOS '23: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, Pages 153–168, https://doi.org/10.1145/3623278.3624769
Caching functions is a promising way to mitigate cold-start overhead in serverless computing. However, because caching also increases resource cost significantly, how to make caching decisions remains challenging. We find that the prior "local cache ...
- research-article, February 2024
HIR: An MLIR-based Intermediate Representation for Hardware Accelerator Description
ASPLOS '23: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, Pages 189–201, https://doi.org/10.1145/3623278.3624767
The emergence of machine learning, image, and audio processing on edge devices has motivated research towards power-efficient custom hardware accelerators. Though FPGAs are an ideal target for custom accelerators, the difficulty of hardware design and the ...
λFS: A Scalable and Elastic Distributed File System Metadata Service using Serverless Functions
ASPLOS '23: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, Pages 394–411, https://doi.org/10.1145/3623278.3624765
The metadata service (MDS) sits on the critical path for distributed file system (DFS) operations, and is therefore key to the overall performance of a large-scale DFS. Common "serverful" MDS architectures, such as a single server or cluster of ...
- research-article, February 2024
VarSaw: Application-tailored Measurement Error Mitigation for Variational Quantum Algorithms
- Siddharth Dangwal,
- Gokul Subramanian Ravi,
- Poulami Das,
- Kaitlin N. Smith,
- Jonathan Mark Baker,
- Frederic T. Chong
ASPLOS '23: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, Pages 362–377, https://doi.org/10.1145/3623278.3624764
For potential quantum advantage, Variational Quantum Algorithms (VQAs) need high accuracy beyond the capability of today's NISQ devices, and thus will benefit from error mitigation. In this work we are interested in mitigating measurement errors which ...
- research-article, February 2024
CPS: A Cooperative Para-virtualized Scheduling Framework for Manycore Machines
ASPLOS '23: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, Pages 43–56, https://doi.org/10.1145/3623278.3624762
Today's cloud platforms offer large virtual machine (VM) instances with multiple virtual CPUs (vCPUs) on manycore machines. These machines typically have a deep memory hierarchy to enhance communication between cores. Although previous research has ...
- research-article, February 2024
RECom: A Compiler Approach to Accelerating Recommendation Model Inference with Massive Embedding Columns
- Zaifeng Pan,
- Zhen Zheng,
- Feng Zhang,
- Ruofan Wu,
- Hao Liang,
- Dalin Wang,
- Xiafei Qiu,
- Junjie Bai,
- Wei Lin,
- Xiaoyong Du
ASPLOS '23: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, Pages 268–286, https://doi.org/10.1145/3623278.3624761
Embedding columns are important for deep recommendation models to achieve high accuracy, but they can be very time-consuming during inference. Machine learning (ML) compilers are used broadly in real businesses to optimize ML models automatically. ...
- research-article, February 2024
Sleuth: A Trace-Based Root Cause Analysis System for Large-Scale Microservices with Graph Neural Networks
ASPLOS '23: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, Pages 324–337, https://doi.org/10.1145/3623278.3624758
Cloud microservices are being scaled up due to the rising demand for new features and the convenience of cloud-native technologies. However, the growing scale of microservices complicates the remote procedure call (RPC) dependency graph, exacerbates the ...
- research-article, February 2024
LightRidge: An End-to-end Agile Design Framework for Diffractive Optical Neural Networks
ASPLOS '23: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, Pages 202–218, https://doi.org/10.1145/3623278.3624757
To lower the barrier to diffractive optical neural network (DONN) design, exploration, and deployment, we propose LightRidge, the first end-to-end optical ML compilation framework, which consists of (1) precise and differentiable optical physics ...
- research-article, February 2024
Predict; Don't React for Enabling Efficient Fine-Grain DVFS in GPUs
ASPLOS '23: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, Pages 253–267, https://doi.org/10.1145/3623278.3624756
With the continuous improvement of on-chip integrated voltage regulators (IVRs) and fast, adaptive frequency control, dynamic voltage-frequency scaling (DVFS) transition times have shrunk from the microsecond to the nanosecond regime, providing immense ...
DataFlower: Exploiting the Data-flow Paradigm for Serverless Workflow Orchestration
ASPLOS '23: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, Pages 57–72, https://doi.org/10.1145/3623278.3624755
Serverless computing that runs functions with auto-scaling is a popular task execution pattern in the cloud-native era. By connecting serverless functions into workflows, tenants can achieve complex functionality. Prior research adopts the control-flow ...
- research-article, February 2024
Supporting Descendants in SIMD-Accelerated JSONPath
ASPLOS '23: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, Pages 338–361, https://doi.org/10.1145/3623278.3624754
Harnessing the power of SIMD can bring tremendous performance gains in data processing. In querying streamed JSON data, the state of the art leverages SIMD to fast-forward significant portions of the document. However, it does not provide support for ...
- research-article, February 2024
DREAM: A Dynamic Scheduler for Dynamic Real-time Multi-model ML Workloads
ASPLOS '23: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, Pages 73–86, https://doi.org/10.1145/3623278.3624753
Emerging real-time multi-model ML (RTMM) workloads such as AR/VR and drone control involve dynamic behaviors at various granularities: task, model, and layers within a model. Such dynamic behaviors introduce new challenges to the system software in an ML ...
- research-article, February 2024
Exploiting the Regular Structure of Modern Quantum Architectures for Compiling and Optimizing Programs with Permutable Operators
ASPLOS '23: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, Pages 108–124, https://doi.org/10.1145/3623278.3624751
A critical feature of today's quantum circuits is that they have permutable two-qubit operators. The flexibility in ordering the permutable two-qubit gates leads to more compiler optimization opportunities. However, it also imposes significant challenges ...
- research-article, February 2024
Manticore: Hardware-Accelerated RTL Simulation with Static Bulk-Synchronous Parallelism
ASPLOS '23: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, Pages 219–237, https://doi.org/10.1145/3623278.3624750
The demise of Moore's Law and Dennard Scaling has revived interest in specialized computer architectures and accelerators. Verification and testing of this hardware depend heavily upon cycle-accurate simulation of register-transfer-level (RTL) designs. ...