Accelerating Deformable Convolution Networks with Dynamic and Irregular Memory Accesses
Abstract
1 Introduction
2 Background and Related Work
2.1 Deformable Convolutional Networks
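Deformable convolution augments each kernel tap with a learned 2-D offset and reads the input feature map at the resulting fractional location via bilinear interpolation (BLI); these data-dependent fractional addresses are the source of the dynamic, irregular memory accesses this paper targets. The following is a minimal NumPy sketch of that sampling step, not the paper's accelerator implementation — function names, shapes, and the single-channel/stride-1 simplification are illustrative assumptions:

```python
import numpy as np

def bilinear_sample(feature, y, x):
    """Bilinear interpolation of a 2-D feature map at fractional (y, x).
    Out-of-range taps read as 0 (zero-padding convention)."""
    H, W = feature.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    dy, dx = y - y0, x - x0

    def tap(r, c):
        # Guarded read: irregular addresses may fall outside the map.
        return feature[r, c] if 0 <= r < H and 0 <= c < W else 0.0

    return ((1 - dy) * (1 - dx) * tap(y0, x0)
            + (1 - dy) * dx * tap(y0, x0 + 1)
            + dy * (1 - dx) * tap(y0 + 1, x0)
            + dy * dx * tap(y0 + 1, x0 + 1))

def deform_conv2d_single(feature, weight, offsets):
    """Naive single-channel deformable convolution, stride 1, no padding.
    offsets[oy, ox, ky, kx] is the learned (dy, dx) for each kernel tap."""
    H, W = feature.shape
    K = weight.shape[0]
    out = np.zeros((H - K + 1, W - K + 1))
    for oy in range(out.shape[0]):
        for ox in range(out.shape[1]):
            acc = 0.0
            for ky in range(K):
                for kx in range(K):
                    dy, dx = offsets[oy, ox, ky, kx]
                    # The read address depends on runtime offset values,
                    # so it cannot be statically tiled like a plain conv.
                    acc += weight[ky, kx] * bilinear_sample(
                        feature, oy + ky + dy, ox + kx + dx)
            out[oy, ox] = acc
    return out
```

With all offsets zero, the loop degenerates to an ordinary (regular-access) convolution, which is why DCN accelerators can reuse a conventional PE array for the multiply-accumulate part and isolate the irregularity in the sampling stage.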
2.2 Unified Deformable Convolution Model
2.3 Neural Network Accelerator Redesigning
Network | Dataset | [17] | [18] | Native DCNs [1] | Quantized DCNs
---|---|---|---|---|---
ShuffleNet V2 | COCO | 36.8 | — | 38.4 | 37.9
ShuffleNet V2 | VOC | 63.1 | — | 64.4 | 64.2
Faster R-CNN | COCO | — | 60.8 | 61.8 | 61.1
VGG16 | ImageNet | 82.6 | 83.2 | 84.3 | 84.1
VGG16 | CIFAR10 | 88.7 | 89.4 | 90.2 | 90.2
3 Observation
4 DCN Accelerator Architecture
4.1 Overall Accelerator Architecture
4.2 BLI Implementation
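A BLI unit need not evaluate the four-weight form directly: bilinear interpolation factors into three 1-D linear interpolations (two along x, then one along y), a standard decomposition that hardware implementations commonly use to cut multiplier count. A small sketch of this equivalence — the factoring is textbook math, not a claim about this paper's specific datapath:

```python
def lerp(a, b, t):
    """1-D linear interpolation between a and b with fraction t in [0, 1)."""
    return a + t * (b - a)

def bli(p00, p01, p10, p11, dy, dx):
    """Bilinear interpolation of four neighbor taps as three 1-D lerps:
    two along x, then one along y. Equivalent to the four-weight form
    (1-dy)(1-dx)*p00 + (1-dy)dx*p01 + dy(1-dx)*p10 + dy*dx*p11."""
    top = lerp(p00, p01, dx)  # interpolate along the upper row
    bot = lerp(p10, p11, dx)  # interpolate along the lower row
    return lerp(top, bot, dy)  # interpolate between the two rows
```

Written this way, each output needs three multipliers instead of the eight implied by computing all four products of the weight form, at the cost of a longer dependency chain — the kind of area/latency trade-off a dedicated BLI block must settle.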
4.3 Runtime Tile Scheduling
4.4 BLI and Convolution Fusion
5 Experiment
5.1 Experiment Setup
# of PEs | In Buf | Out Buf | Weight Buf | Index Buf | Inst Buf
---|---|---|---|---|---
16 × 32 | 128 KB | 256 KB | 256 KB | 32 KB | 64 KB
ACT | RD | WR | Read I/O | Write ODT | BG
---|---|---|---|---|---
63.7 mW | 52.1 mW | 52.1 mW | 32.7 mW | 136.1 mW | 67.7 mW
Network | # of Deformable Conv | # of Conv | Kernel Types
---|---|---|---
VGG19-3 | 3 | 13 | 3
VGG19-8 | 8 | 8 | 3
VGG19-F | 19 | 0 | 3
SegNet-3 | 3 | 13 | 3
SegNet-8 | 8 | 8 | 3
SegNet-F | 16 | 0 | 3
 | VGG19-3 | VGG19-8 | VGG19-F | SegNet-3 | SegNet-8 | SegNet-F
---|---|---|---|---|---|---
DCN-I | 10.2 | 11.6 | 70.6 | 18.7 | 25.7 | 123.2
DCN-II | 11.6 | 15.1 | 84.2 | 23.2 | 31.2 | 155.2
5.2 Performance Evaluation
 | VGG19-3 | VGG19-8 | VGG19-F | SegNet-3 | SegNet-8 | SegNet-F
---|---|---|---|---|---|---
DCN-I | 15.9 | 17.1 | 55.8 | 29.1 | 34.3 | 99.0
DCN-II | 17.3 | 20.1 | 65.6 | 34.5 | 40.8 | 122.8
5.3 Energy Consumption Evaluation
5.4 Chip Area Evaluation
5.5 Optimization Evaluation
6 Conclusion
Footnote
References
Index Terms
Published In
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- Research-article
Funding Sources
- National Key R&D Program of China
- National Natural Science Foundation of China
- Singapore Government’s Research, Innovation and Enterprise 2020 Plan (Advanced Manufacturing and Engineering domain)