Open access

PointAcc: Efficient Point Cloud Accelerator

Authors:

Hanrui Wang, and

Song Han

MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture

October 2021

Pages 449 - 461

https://doi.org/10.1145/3466752.3480084

Published: 17 October 2021

Abstract

Deep learning on point clouds plays a vital role in a wide range of applications such as autonomous driving and AR/VR. These applications interact with people in real time on edge devices and thus require low latency and low energy consumption. Compared to projecting the point cloud to 2D space, directly processing the 3D point cloud yields higher accuracy and fewer multiply-accumulate operations (#MACs). However, the extreme sparsity of point clouds poses challenges for hardware acceleration. For example, we need to explicitly determine the nonzero outputs and search for the nonzero neighbors (the mapping operation), which existing accelerators do not support. Furthermore, sparse features must be explicitly gathered and scattered, resulting in large data movement overhead.
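To make the mapping operation and the gather/scatter step concrete, here is a minimal software sketch. It is not the paper's implementation: it assumes a stride-1 sparse convolution over integer voxel coordinates with a single feature channel, and the names `build_kernel_map` and `sparse_conv` are hypothetical.

```python
from itertools import product


def build_kernel_map(coords, kernel_size=3):
    """Mapping operation (illustrative): for every kernel offset, list the
    (input_idx, output_idx) pairs whose coordinates differ by that offset.

    Assumption: stride 1, so the nonzero outputs sit at the input coordinates.
    """
    index_of = {c: i for i, c in enumerate(coords)}  # coordinate -> row index
    r = kernel_size // 2
    kmap = {}
    for offset in product(range(-r, r + 1), repeat=3):
        pairs = []
        for out_idx, (x, y, z) in enumerate(coords):
            neighbor = (x + offset[0], y + offset[1], z + offset[2])
            in_idx = index_of.get(neighbor)
            if in_idx is not None:  # only nonzero neighbors produce work
                pairs.append((in_idx, out_idx))
        if pairs:
            kmap[offset] = pairs
    return kmap


def sparse_conv(coords, feats, weights, kernel_size=3):
    """Gather each matched input feature, multiply by the weight of its
    offset, and scatter-add the product into the matched output row.

    Assumption: one scalar feature per point, one scalar weight per offset.
    """
    out = [0.0] * len(coords)
    for offset, pairs in build_kernel_map(coords, kernel_size).items():
        w = weights.get(offset, 0.0)
        for in_idx, out_idx in pairs:
            out[out_idx] += w * feats[in_idx]  # gather, MAC, scatter
    return out
```

Even in this toy form, the index pairs in the map drive every feature access, which is why the gather and scatter steps translate into irregular data movement.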

In this paper, we comprehensively analyze the performance bottlenecks of modern point cloud networks on CPU, GPU, and TPU. To address these challenges, we present PointAcc, a novel point cloud deep learning accelerator. PointAcc maps diverse mapping operations onto one versatile ranking-based kernel, streams the sparse computation with configurable caching, and temporally fuses consecutive dense layers to reduce the memory footprint. Evaluated on 8 point cloud models across 4 applications, PointAcc achieves a 3.7× speedup and 22× energy savings over an RTX 2080Ti GPU. Co-designed with lightweight neural networks, PointAcc outperforms the prior accelerator Mesorasi with a 100× speedup and 9.1% higher accuracy on segmentation on the S3DIS dataset. PointAcc paves the way for efficient point cloud recognition.
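As a rough illustration of how different mapping operations could share one ranking (sorting) primitive, the sketch below expresses two of them with Python's `sorted`. This is an assumption-laden software analogy, not PointAcc's hardware design: the function names are hypothetical, coordinates are assumed unique within each set, and the choice of these two operations is ours.

```python
def kernel_map_by_ranking(in_coords, out_coords, offset):
    """Find (input_idx, output_idx) pairs with input == output + offset.

    Shift the inputs by -offset, rank shifted inputs and outputs together,
    and report adjacent entries that share a coordinate.
    """
    shifted = [((x - offset[0], y - offset[1], z - offset[2]), 0, i)
               for i, (x, y, z) in enumerate(in_coords)]
    outputs = [(c, 1, j) for j, c in enumerate(out_coords)]
    # sorted() stands in for a sorting/merging unit; tag 0 (input) orders
    # before tag 1 (output) when the coordinates are equal.
    merged = sorted(shifted + outputs)
    pairs = []
    for a, b in zip(merged, merged[1:]):
        if a[0] == b[0] and a[1] == 0 and b[1] == 1:
            pairs.append((a[2], b[2]))
    return pairs


def knn_by_ranking(points, query, k):
    """k-nearest-neighbor search as a ranking by squared distance."""
    def sq_dist(i):
        return sum((p - q) ** 2 for p, q in zip(points[i], query))
    return sorted(range(len(points)), key=sq_dist)[:k]
```

Both functions reduce to ranking a list and reading off part of the result, which is the sense in which a single ranking-based kernel could serve several mapping operations.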

Published In

MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture

October 2021

1322 pages

ISBN:9781450385572

DOI:10.1145/3466752

Copyright © 2021 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

MICRO '21

Sponsor:

SIGMICRO

MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture

October 18 - 22, 2021

Virtual Event, Greece

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%
