- Sponsor: SIGDA
QuantumNAT: quantum noise-aware training with noise injection, quantization and normalization
Parameterized Quantum Circuits (PQC) are promising for achieving quantum advantage on near-term quantum hardware. However, due to large quantum noise (errors), the performance of PQC models degrades severely on real quantum devices. Take Quantum ...
Optimizing quantum circuit synthesis for permutations using recursion
We describe a family of recursive methods for the synthesis of qubit permutations on quantum computers with limited qubit connectivity. Two objectives are of importance: circuit size and depth. In each case we combine a scalable heuristic with a non-...
A fast and scalable qubit-mapping method for noisy intermediate-scale quantum computers
This paper presents an efficient qubit-mapping method that redesigns a quantum circuit to overcome the limitations of qubit connectivity. We propose a recursive graph-isomorphism search to generate the scalable initial mapping. In the main mapping, we ...
Optimizing quantum circuit placement via machine learning
Quantum circuit placement (QCP) is the process of mapping the synthesized logical quantum programs on physical quantum machines, which introduces additional SWAP gates and affects the performance of quantum circuits. Nevertheless, determining the minimal ...
HERO: hessian-enhanced robust optimization for unifying and improving generalization and quantization performance
With the recent demand for deploying neural network models on mobile and edge devices, it is desirable to improve the model's generalizability on unseen testing data, as well as enhance the model's robustness under fixed-point quantization for efficient ...
Neural computation for robust and holographic face detection
- Mohsen Imani,
- Ali Zakeri,
- Hanning Chen,
- TaeHyun Kim,
- Prathyush Poduval,
- Hyunsei Lee,
- Yeseong Kim,
- Elaheh Sadredini,
- Farhad Imani
Face detection is an essential component of many tasks in computer vision with several applications. However, existing deep learning solutions are too slow and inefficient to enable face detection on embedded platforms. In this paper, we ...
FHDnn: communication efficient and robust federated learning for AIoT networks
The advent of IoT and advances in edge computing inspired federated learning, a distributed algorithm to enable on device learning. Transmission costs, unreliable networks and limited compute power all of which are typical characteristics of IoT networks ...
ODHD: one-class brain-inspired hyperdimensional computing for outlier detection
Outlier detection is a classical and important technique that has been used in different application domains such as medical diagnosis and Internet-of-Things. Recently, machine learning-based outlier detection algorithms, such as one-class support vector ...
High-level synthesis performance prediction using GNNs: benchmarking, modeling, and advancing
Agile hardware development requires fast and accurate circuit quality evaluation from early design stages. Existing work on high-level synthesis (HLS) performance prediction usually requires extensive feature engineering after the synthesis process. To ...
Automated accelerator optimization aided by graph neural networks
Using High-Level Synthesis (HLS), hardware designers need only describe a high-level behavioral flow of the design. However, it can still take weeks to develop a high-performance architecture, mainly because there are many design choices at a higher ...
Functionality matters in netlist representation learning
Learning feasible representations from raw gate-level netlists is essential for incorporating machine learning techniques in logic synthesis, physical design, or verification. Existing message-passing-based graph learning methodologies focus merely on ...
EMS: efficient memory subsystem synthesis for spatial accelerators
Spatial accelerators provide massive parallelism with an array of homogeneous PEs, and enable efficient data reuse with PE array dataflow and on-chip memory. Many previous works have studied the dataflow architecture of spatial accelerators, including ...
DA PUF: dual-state analog PUF
Physical unclonable function (PUF) is a promising lightweight hardware security primitive that exploits process variations during chip fabrication for applications such as key generation and device authentication. Reliability of the PUF information plays ...
PathFinder: side channel protection through automatic leaky paths identification and obfuscation
Side-channel analysis (SCA) attacks pose an enormous threat to cryptographic integrated circuits (ICs). To address this threat, designers try to adopt various countermeasures during the IC development process. However, many existing solutions are costly ...
LOCK&ROLL: deep-learning power side-channel attack mitigation using emerging reconfigurable devices and logic locking
- Gaurav Kolhe,
- Tyler Sheaves,
- Kevin Immanuel Gubbi,
- Soheil Salehi,
- Setareh Rafatirad,
- Sai Manoj PD,
- Avesta Sasan,
- Houman Homayoun
Concerns about the security and trustworthiness of ICs are exacerbated by the modern globalized semiconductor business model. This model involves many steps performed at multiple locations by different providers and integrates various Intellectual Properties (IPs) from ...
Efficient access scheme for multi-bank based NTT architecture through conflict graph
Number Theoretic Transform (NTT) hardware accelerators have become a crucial building block in many cryptosystems, such as post-quantum cryptography. In this paper, we provide new insights into the construction of a conflict-free memory mapping scheme (CFMMS) for ...
InfoX: an energy-efficient ReRAM accelerator design with information-lossless low-bit ADCs
ReRAM-based accelerators have shown great potential in neural network acceleration via in-memory analog computing. However, high-precision analog-to-digital converters (ADCs), which are required by the ReRAM crossbars to achieve high-accuracy network ...
PHANES: ReRAM-based photonic accelerator for deep neural networks
Resistive random access memory (ReRAM) has demonstrated great promise for in-situ matrix-vector multiplication to accelerate deep neural networks. However, subject to the intrinsic properties of analog processing, most of the proposed ReRAM-based ...
CP-SRAM: charge-pulsation SRAM macro for ultra-high energy-efficiency computing-in-memory
SRAM-based computing-in-memory (SRAM-CIM) provides fast speed and good scalability with advanced process technology. However, the energy efficiency of the state-of-the-art current-domain SRAM-CIM bit-cell structure is limited and the peripheral circuitry ...
CREAM: computing in ReRAM-assisted energy and area-efficient SRAM for neural network acceleration
Computing-in-memory (CIM) has been widely explored to accelerate DNNs. However, most existing CIM designs cannot store all NN weights due to the limited SRAM capacity of edge AI devices, inducing a large amount of off-chip DRAM access. In this paper, a new computing in ReRAM-...
Chiplet actuary: a quantitative cost model and multi-chiplet architecture exploration
Multi-chip integration is widely recognized as the extension of Moore's Law. Cost-saving is a frequently mentioned advantage, but previous works rarely present quantitative demonstrations on the cost superiority of multi-chip integration over monolithic ...
PANORAMA: divide-and-conquer approach for mapping complex loop kernels on CGRA
CGRAs are well-suited as hardware accelerators due to power efficiency and reconfigurability. However, their potential is limited by the inability of the compiler to map complex loop kernels onto the architectures effectively. We propose PANORAMA, a fast ...
A fast parameter tuning framework via transfer learning and multi-objective bayesian optimization
Design space exploration (DSE) can automatically and effectively determine design parameters to achieve the optimal performance, power and area (PPA) in very large-scale integration (VLSI) design. The lack of prior knowledge causes low efficient ...
PriMax: maximizing DSL application performance with selective primitive acceleration
Domain-specific languages (DSLs) improve developer productivity by abstracting away low-level details of an algorithm's implementation within a specialized domain. These languages often provide powerful primitives to describe complex operations, ...
Accelerating and pruning CNNs for semantic segmentation on FPGA
- Pierpaolo Morì,
- Manoj-Rohit Vemparala,
- Nael Fasfous,
- Saptarshi Mitra,
- Sreetama Sarkar,
- Alexander Frickenstein,
- Lukas Frickenstein,
- Domenik Helms,
- Naveen Shankar Nagaraja,
- Walter Stechele,
- Claudio Passerone
Semantic segmentation is one of the popular tasks in computer vision, providing pixel-wise annotations for scene understanding. However, segmentation-based convolutional neural networks require tremendous computational power. In this work, a fully-...
SoftSNN: low-cost fault tolerance for spiking neural network accelerators under soft errors
Specialized hardware accelerators have been designed and employed to maximize the performance efficiency of Spiking Neural Networks (SNNs). However, such accelerators are vulnerable to transient faults (i.e., soft errors), which occur due to high-energy ...
A joint management middleware to improve training performance of deep recommendation systems with SSDs
As the sizes and variety of training data scale over time, data preprocessing is becoming an important performance bottleneck for training deep recommendation systems. This challenge becomes more serious when training data is stored in Solid-State Drives ...
The larger the fairer?: small neural networks can achieve fairness for edge devices
Along with the progress of AI democratization, neural networks are being deployed more frequently in edge devices for a wide range of applications. Fairness concerns gradually emerge in many applications, such as face recognition and mobile medical. One ...
SCAIE-V: an open-source SCAlable interface for ISA extensions for RISC-V processors
Custom instructions extending a base ISA are often used to increase performance. However, only a few cores provide open interfaces for integrating such ISA Extensions (ISAX). In addition, the degree to which a core's capabilities are exposed for extension ...
A scalable symbolic simulation tool for low power embedded systems
Recent work has demonstrated the effectiveness of using symbolic simulation to perform hardware software co-analysis on an application-processor pair and developed a variety of hardware and software design techniques and optimizations, ranging from ...