Multi-Objective Hardware-Mapping Co-Optimisation for Multi-DNN Workloads on Chiplet-Based Accelerators
The need to efficiently execute different Deep Neural Networks (DNNs) on the same computing platform, coupled with the requirement for easy scalability, makes Multi-Chip Module (MCM)-based accelerators a preferred design choice. Such an accelerator brings ...
TensorMap: A Deep RL-Based Tensor Mapping Framework for Spatial Accelerators
The mapping of tensor computation is a complex and important process for spatial accelerators. Today's mapping approaches depend on hand-tuned kernel libraries or search-based heuristics from human experts. The former is time-intensive while the latter ...
Uniformity and Independence of H3 Hash Functions for Bloom Filters
In this paper, we investigate the effects of violating the conditions of hash function uniformity and/or independence on the false positive probability of Bloom Filters (BF). To this end, we focus on hash functions of the H3 family with a partitioned ...
CoDA: A Co-Design Framework for Versatile and Efficient Attention Accelerators
As a primary component of Transformers, the attention mechanism suffers from quadratic computational complexity. To achieve efficient implementations, its hardware accelerator designs have attracted great research interest. However, most existing accelerators ...
Relieving Write Disturbance for Phase Change Memory With RESET-Aware Data Encoding
The write disturbance (WD) problem is becoming increasingly severe in PCM due to the continuous scaling down of memory technology. Previous studies have attempted to transform WD-vulnerable data patterns of the new data to alleviate the WD problem. ...
Enhancing Neural Network Reliability: Insights From Hardware/Software Collaboration With Neuron Vulnerability Quantization
Ensuring the reliability of deep neural networks (DNNs) is paramount in safety-critical applications. Although introducing supplementary fault-tolerant mechanisms can augment the reliability of DNNs, an efficiency tradeoff may be introduced. This study ...
Ada-WL: An Adaptive Wear-Leveling Aware Data Migration Approach for Flexible SSD Array Scaling in Clusters
Recently, the flash-based Solid State Drive (SSD) array has been widely implemented in real-world large-scale clusters. With the increasing number of users in upper-tier applications and the burst of Input/Output requests in this era of data explosion, data ...
Effective Huge Page Strategies for TLB Miss Reduction in Nested Virtualization
Huge page strategies, such as Linux Transparent Huge Page (THP), have become a prevalent solution to mitigate the performance bottleneck caused by increasingly high memory address translation overhead. However, in cloud environments, virtualization ...
ReDas: A Lightweight Architecture for Supporting Fine-Grained Reshaping and Multiple Dataflows on Systolic Array
The systolic accelerator is one of the premier architectural choices for DNN acceleration. However, the conventional systolic architecture suffers from low PE utilization due to the mismatch between the fixed array and diverse DNN workloads. Recent ...
Revocable and Efficient Blockchain-Based Fine-Grained Access Control Against EDoS Attacks in Cloud Storage
Users have become accustomed to storing data on the cloud using ciphertext policy attribute-based encryption (CP-ABE) for fine-grained access control. However, this encryption method does not consider the ability of malicious users to launch thousands of ...
AQA: An Adaptive Post-Training Quantization Method for Activations of CNNs
Post-training quantization (PTQ) is a common technique to improve the efficiency of embedded neural network accelerators. Existing PTQ schemes for CNN activations usually rely on a calibration dataset with good data representation to reduce ...
Toward Finding S-Box Circuits With Optimal Multiplicative Complexity
In this paper, we present a new method to find S-box circuits with optimal multiplicative complexity (MC), i.e., MC-optimal S-box circuits. We provide new observations for efficiently constructing circuits and computing MC, combined with a popular ...
FutureDID: A Fully Decentralized Identity System With Multi-Party Verification
Decentralized identity (DID) systems conforming to the World Wide Web Consortium (W3C) Decentralized Identifiers (DIDs) and Verifiable Credentials Data Model recommendations have recently attracted attention due to their better autonomy, interoperability, ...
A Machine Learning-Empowered Cache Management Scheme for High-Performance SSDs
NAND Flash-based solid-state drives (SSDs) have gained widespread usage in data storage thanks to their exceptional performance and low power consumption. The computational capability of SSDs has been elevated to tackle complex algorithms. Inside an SSD, ...
DPU-Direct: Unleashing Remote Accelerators via Enhanced RDMA for Disaggregated Datacenters
This paper presents DPU-Direct, an accelerator disaggregation system that connects accelerator nodes (ANs) and CPU nodes (CNs) over a standard Remote Direct Memory Access (RDMA) network. DPU-Direct eliminates the latency introduced by the CPU-based ...
BSR-FL: An Efficient Byzantine-Robust Privacy-Preserving Federated Learning Framework
Federated learning (FL) is a technique that enables clients to collaboratively train a model by sharing local models instead of raw private data. However, existing reconstruction attacks can recover the sensitive training samples from the shared models. ...
BlockCompass: A Benchmarking Platform for Blockchain Performance
Blockchain technology has gained momentum due to its immutability and transparency. Several blockchain platforms, each with different consensus protocols, have been proposed. However, choosing and configuring such a platform is a non-trivial task. ...