Search | arXiv e-print repository

Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization

Authors: Zhengang Li, Mengshu Sun, Alec Lu, Haoyu Ma, Geng Yuan, Yanyue Xie, Hao Tang, Yanyu Li, Miriam Leeser, Zhangyang Wang, Xue Lin, Zhenman Fang

Abstract: Vision transformers (ViTs) are emerging with significantly improved accuracy in computer vision tasks. However, their complex architecture and enormous computation/storage demand impose urgent needs for new hardware accelerator design methodology. This work proposes an FPGA-aware automatic ViT acceleration framework based on the proposed mixed-scheme quantization. To the best of our knowledge, thi… ▽ More Vision transformers (ViTs) are emerging with significantly improved accuracy in computer vision tasks. However, their complex architecture and enormous computation/storage demand impose urgent needs for new hardware accelerator design methodology. This work proposes an FPGA-aware automatic ViT acceleration framework based on the proposed mixed-scheme quantization. To the best of our knowledge, this is the first FPGA-based ViT acceleration framework exploring model quantization. Compared with state-of-the-art ViT quantization work (algorithmic approach only without hardware acceleration), our quantization achieves 0.47% to 1.36% higher Top-1 accuracy under the same bit-width. Compared with the 32-bit floating-point baseline FPGA accelerator, our accelerator achieves around 5.6x improvement on the frame rate (i.e., 56.8 FPS vs. 10.0 FPS) with 0.71% accuracy drop on ImageNet dataset for DeiT-base. △ Less

Submitted 10 August, 2022; originally announced August 2022.

Comments: Published in FPL2022

arXiv:2206.01244 [pdf, other]

Real-Time Portrait Stylization on the Edge

Authors: Yanyu Li, Xuan Shen, Geng Yuan, Jiexiong Guan, Wei Niu, Hao Tang, Bin Ren, Yanzhi Wang

Abstract: In this work we demonstrate real-time portrait stylization, specifically, translating self-portrait into cartoon or anime style on mobile devices. We propose a latency-driven differentiable architecture search method, maintaining realistic generative quality. With our framework, we obtain $10\times$ computation reduction on the generative model and achieve real-time video stylization on off-the-sh… ▽ More In this work we demonstrate real-time portrait stylization, specifically, translating self-portrait into cartoon or anime style on mobile devices. We propose a latency-driven differentiable architecture search method, maintaining realistic generative quality. With our framework, we obtain $10\times$ computation reduction on the generative model and achieve real-time video stylization on off-the-shelf smartphone using mobile GPUs. △ Less

Submitted 2 June, 2022; originally announced June 2022.

arXiv:2108.08910 [pdf, other]

Achieving on-Mobile Real-Time Super-Resolution with Neural Architecture and Pruning Search

Authors: Zheng Zhan, Yifan Gong, Pu Zhao, Geng Yuan, Wei Niu, Yushu Wu, Tianyun Zhang, Malith Jayaweera, David Kaeli, Bin Ren, Xue Lin, Yanzhi Wang

Abstract: Though recent years have witnessed remarkable progress in single image super-resolution (SISR) tasks with the prosperous development of deep neural networks (DNNs), the deep learning methods are confronted with the computation and memory consumption issues in practice, especially for resource-limited platforms such as mobile devices. To overcome the challenge and facilitate the real-time deploymen… ▽ More Though recent years have witnessed remarkable progress in single image super-resolution (SISR) tasks with the prosperous development of deep neural networks (DNNs), the deep learning methods are confronted with the computation and memory consumption issues in practice, especially for resource-limited platforms such as mobile devices. To overcome the challenge and facilitate the real-time deployment of SISR tasks on mobile, we combine neural architecture search with pruning search and propose an automatic search framework that derives sparse super-resolution (SR) models with high image quality while satisfying the real-time inference requirement. To decrease the search cost, we leverage the weight sharing strategy by introducing a supernet and decouple the search problem into three stages, including supernet construction, compiler-aware architecture and pruning search, and compiler-aware pruning ratio search. With the proposed framework, we are the first to achieve real-time SR inference (with only tens of milliseconds per frame) for implementing 720p resolution with competitive image quality (in terms of PSNR and SSIM) on mobile platforms (Samsung Galaxy S20). △ Less

Submitted 14 February, 2023; v1 submitted 18 August, 2021; originally announced August 2021.

arXiv:1912.05416 [pdf, other]

A SOT-MRAM-based Processing-In-Memory Engine for Highly Compressed DNN Implementation

Authors: Geng Yuan, Xiaolong Ma, Sheng Lin, Zhengang Li, Caiwen Ding

Abstract: The computing wall and data movement challenges of deep neural networks (DNNs) have exposed the limitations of conventional CMOS-based DNN accelerators. Furthermore, the deep structure and large model size will make DNNs prohibitive to embedded systems and IoT devices, where low power consumption are required. To address these challenges, spin orbit torque magnetic random-access memory (SOT-MRAM)… ▽ More The computing wall and data movement challenges of deep neural networks (DNNs) have exposed the limitations of conventional CMOS-based DNN accelerators. Furthermore, the deep structure and large model size will make DNNs prohibitive to embedded systems and IoT devices, where low power consumption are required. To address these challenges, spin orbit torque magnetic random-access memory (SOT-MRAM) and SOT-MRAM based Processing-In-Memory (PIM) engines have been used to reduce the power consumption of DNNs since SOT-MRAM has the characteristic of near-zero standby power, high density, none-volatile. However, the drawbacks of SOT-MRAM based PIM engines such as high writing latency and requiring low bit-width data decrease its popularity as a favorable energy efficient DNN accelerator. To mitigate these drawbacks, we propose an ultra energy efficient framework by using model compression techniques including weight pruning and quantization from the software level considering the architecture of SOT-MRAM PIM. And we incorporate the alternating direction method of multipliers (ADMM) into the training phase to further guarantee the solution feasibility and satisfy SOT-MRAM hardware constraints. Thus, the footprint and power consumption of SOT-MRAM PIM can be reduced, while increasing the overall system throughput at the meantime, making our proposed ADMM-based SOT-MRAM PIM more energy efficiency and suitable for embedded systems or IoT devices. Our experimental results show the accuracy and compression rate of our proposed framework is consistently outperforming the reference works, while the efficiency (area \& power) and throughput of SOT-MRAM PIM engine is significantly improved. △ Less

Submitted 24 November, 2019; originally announced December 2019.

arXiv:1908.10017 [pdf, other]

Tiny but Accurate: A Pruned, Quantized and Optimized Memristor Crossbar Framework for Ultra Efficient DNN Implementation

Authors: Xiaolong Ma, Geng Yuan, Sheng Lin, Caiwen Ding, Fuxun Yu, Tao Liu, Wujie Wen, Xiang Chen, Yanzhi Wang

Abstract: The state-of-art DNN structures involve intensive computation and high memory storage. To mitigate the challenges, the memristor crossbar array has emerged as an intrinsically suitable matrix computation and low-power acceleration framework for DNN applications. However, the high accuracy solution for extreme model compression on memristor crossbar array architecture is still waiting for unravelin… ▽ More The state-of-art DNN structures involve intensive computation and high memory storage. To mitigate the challenges, the memristor crossbar array has emerged as an intrinsically suitable matrix computation and low-power acceleration framework for DNN applications. However, the high accuracy solution for extreme model compression on memristor crossbar array architecture is still waiting for unraveling. In this paper, we propose a memristor-based DNN framework which combines both structured weight pruning and quantization by incorporating alternating direction method of multipliers (ADMM) algorithm for better pruning and quantization performance. We also discover the non-optimality of the ADMM solution in weight pruning and the unused data path in a structured pruned model. Motivated by these discoveries, we design a software-hardware co-optimization framework which contains the first proposed Network Purification and Unused Path Removal algorithms targeting on post-processing a structured pruned model after ADMM steps. By taking memristor hardware constraints into our whole framework, we achieve extreme high compression ratio on the state-of-art neural network structures with minimum accuracy loss. For quantizing structured pruned model, our framework achieves nearly no accuracy loss after quantizing weights to 8-bit memristor weight representation. We share our models at anonymous link https://bit.ly/2VnMUy0. △ Less

Submitted 27 August, 2019; originally announced August 2019.

arXiv:1806.04716 [pdf]

Application of FPGA Acceleration in ADC Performance Calibration

Authors: Guangyuan Yuan, Zhe cao, Shuwen Wang, Shubin Liu, Qi An

Abstract: In recent years, high speed and high resolution analog-to-digital converter (ADC) is widely employed in many physical experiments, especially in high precision time and charge measurement. The rapid increasing amount of digitized data demands faster computing. FPGA acceleration has an attracting prospect in data process for its stream process and parallel process feature. In this paper, an ADC per… ▽ More In recent years, high speed and high resolution analog-to-digital converter (ADC) is widely employed in many physical experiments, especially in high precision time and charge measurement. The rapid increasing amount of digitized data demands faster computing. FPGA acceleration has an attracting prospect in data process for its stream process and parallel process feature. In this paper, an ADC performance calibration application based on FPGA acceleration is described. FPGA reads the ADC digitized data stream from PC memory, processes and then writes processed result back to the PC memory. PCIE bus is applied to increase the data transfer speed, and floating point algorithm is applied to improve the accuracy. The test result shows that FPGA acceleration can reduce the processing time of the ADC performance calibration compared with traditional method of C-based CPU processing. This frame of PCIE-based FPGA acceleration method can be applied in analysis and simulation in the future physical experiment for large ADC array, such as CCD camera and waveform digitization readout electronics calibration. △ Less

Submitted 9 June, 2018; originally announced June 2018.

Comments: 2 pages, 4 figures, 21st IEEE Real Time Conference

arXiv:1802.09879 [pdf, other]

L0TV: A Sparse Optimization Method for Impulse Noise Image Restoration

Authors: Ganzhao Yuan, Bernard Ghanem

Abstract: Total Variation (TV) is an effective and popular prior model in the field of regularization-based image processing. This paper focuses on total variation for removing impulse noise in image restoration. This type of noise frequently arises in data acquisition and transmission due to many reasons, e.g. a faulty sensor or analog-to-digital converter errors. Removing this noise is an important task i… ▽ More Total Variation (TV) is an effective and popular prior model in the field of regularization-based image processing. This paper focuses on total variation for removing impulse noise in image restoration. This type of noise frequently arises in data acquisition and transmission due to many reasons, e.g. a faulty sensor or analog-to-digital converter errors. Removing this noise is an important task in image restoration. State-of-the-art methods such as Adaptive Outlier Pursuit(AOP) \cite{yan2013restoration}, which is based on TV with $\ell_{02}$-norm data fidelity, only give sub-optimal performance. In this paper, we propose a new sparse optimization method, called $\ell_0TV$-PADMM, which solves the TV-based restoration problem with $\ell_0$-norm data fidelity. To effectively deal with the resulting non-convex non-smooth optimization problem, we first reformulate it as an equivalent biconvex Mathematical Program with Equilibrium Constraints (MPEC), and then solve it using a proximal Alternating Direction Method of Multipliers (PADMM). Our $\ell_0TV$-PADMM method finds a desirable solution to the original $\ell_0$-norm optimization problem and is proven to be convergent under mild conditions. We apply $\ell_0TV$-PADMM to the problems of image denoising and deblurring in the presence of impulse noise. Our extensive experiments demonstrate that $\ell_0TV$-PADMM outperforms state-of-the-art image restoration methods. △ Less

Submitted 28 December, 2018; v1 submitted 27 February, 2018; originally announced February 2018.

Comments: to appear in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

Showing 1–7 of 7 results for author: Yuan, G