DOI: 10.1145/3466752.3480084
research-article
Open access

PointAcc: Efficient Point Cloud Accelerator

Published: 17 October 2021

    Abstract

    Deep learning on point clouds plays a vital role in a wide range of applications such as autonomous driving and AR/VR. These applications interact with people in real time on edge devices and thus require low latency and low energy consumption. Compared to projecting the point cloud to 2D space, directly processing the 3D point cloud yields higher accuracy and fewer multiply-accumulate operations (#MACs). However, the extreme sparsity of point clouds poses challenges to hardware acceleration. For example, we need to explicitly determine the nonzero outputs and search for the nonzero neighbors (the mapping operation), which existing accelerators do not support. Furthermore, explicit gather and scatter of sparse features are required, resulting in large data movement overhead.
    In this paper, we comprehensively analyze the performance bottlenecks of modern point cloud networks on CPU/GPU/TPU. To address these challenges, we then present PointAcc, a novel point cloud deep learning accelerator. PointAcc maps diverse mapping operations onto one versatile ranking-based kernel, streams the sparse computation with configurable caching, and temporally fuses consecutive dense layers to reduce the memory footprint. Evaluated on 8 point cloud models across 4 applications, PointAcc achieves a 3.7× speedup and 22× energy savings over an RTX 2080Ti GPU. Co-designed with lightweight neural networks, PointAcc outperforms the prior accelerator Mesorasi with a 100× speedup and 9.1% higher accuracy when running segmentation on the S3DIS dataset. PointAcc paves the way for efficient point cloud recognition.
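The mapping operation and the gather/scatter steps that the abstract mentions can be illustrated with a small software sketch. This is only an illustration under stated assumptions: it uses a hash-table lookup to find nonzero neighbors, which is a common software approach and not the ranking-based kernel that PointAcc implements in hardware. It also assumes a sparse convolution whose outputs sit at the same coordinates as the inputs. The names `build_kernel_map` and `sparse_conv` are hypothetical and do not come from the paper.

```python
from itertools import product


def build_kernel_map(coords, kernel_size=3):
    """Mapping operation: for each kernel offset, list (input_idx, output_idx)
    pairs where a nonzero input lies at output_coord + offset.
    Assumption: outputs are placed at the same coordinates as the inputs."""
    index_of = {c: i for i, c in enumerate(coords)}  # hash table of nonzero points
    r = kernel_size // 2
    kmap = {}
    for offset in product(range(-r, r + 1), repeat=3):
        pairs = []
        for out_idx, (x, y, z) in enumerate(coords):
            neighbor = (x + offset[0], y + offset[1], z + offset[2])
            in_idx = index_of.get(neighbor)
            if in_idx is not None:  # keep nonzero neighbors only
                pairs.append((in_idx, out_idx))
        if pairs:
            kmap[offset] = pairs
    return kmap


def sparse_conv(coords, feats, weights, kernel_size=3):
    """Gather input features per offset, multiply by that offset's weight
    matrix (C_in x C_out), and scatter-accumulate into the outputs."""
    kmap = build_kernel_map(coords, kernel_size)
    c_out = len(next(iter(weights.values()))[0])
    out = [[0.0] * c_out for _ in coords]
    for offset, pairs in kmap.items():
        w = weights[offset]
        for in_idx, out_idx in pairs:
            x = feats[in_idx]  # gather
            for j in range(c_out):
                # matrix-vector product, then scatter-accumulate
                out[out_idx][j] += sum(x[i] * w[i][j] for i in range(len(x)))
    return out


# Toy input: two adjacent points and one isolated point, one channel each.
coords = [(0, 0, 0), (1, 0, 0), (5, 5, 5)]
feats = [[1.0], [2.0], [3.0]]
weights = {off: [[1.0]] for off in product((-1, 0, 1), repeat=3)}
out = sparse_conv(coords, feats, weights)
print(out)  # [[3.0], [3.0], [3.0]]
```

Only 3 of the 27 kernel offsets produce any input-output pair for this toy input, which hints at why the neighbor search and the irregular gather/scatter, rather than the arithmetic, tend to dominate the cost on sparse data.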

    Published In

    MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture
    October 2021
    1322 pages
    ISBN: 9781450385572
    DOI: 10.1145/3466752
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. neural network accelerator
    2. point cloud
    3. sparse convolution

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    MICRO '21
    Acceptance Rates

    Overall Acceptance Rate 484 of 2,242 submissions, 22%

    Article Metrics

    • Downloads (Last 12 months): 850
    • Downloads (Last 6 weeks): 85

    Cited By

    • (2024) FLNA: Flexibly Accelerating Feature Learning Networks for Large-Scale Point Clouds With Efficient Dataflow Decoupling. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 32:4 (739-751). DOI: 10.1109/TVLSI.2024.3355126. Online publication date: Apr-2024
    • (2024) Hardware Accelerator Design for Sparse DNN Inference and Training: A Tutorial. IEEE Transactions on Circuits and Systems II: Express Briefs 71:3 (1708-1714). DOI: 10.1109/TCSII.2023.3344681. Online publication date: Mar-2024
    • (2024) HISPOC: A High-Performance Irregular Activation Sparsity-Aware Point Cloud Network Accelerator. IEEE Transactions on Circuits and Systems II: Express Briefs 71:4 (2294-2298). DOI: 10.1109/TCSII.2023.3330377. Online publication date: Apr-2024
    • (2024) Constrained Sorter Design using Zero-One Principle. 2024 IEEE International Symposium on Circuits and Systems (ISCAS) (1-5). DOI: 10.1109/ISCAS58744.2024.10557942. Online publication date: 19-May-2024
    • (2024) Data Motion Acceleration: Chaining Cross-Domain Multi Accelerators. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (1043-1062). DOI: 10.1109/HPCA57654.2024.00083. Online publication date: 2-Mar-2024
    • (2024) SPADE: Sparse Pillar-based 3D Object Detection Accelerator for Autonomous Driving. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (454-467). DOI: 10.1109/HPCA57654.2024.00041. Online publication date: 2-Mar-2024
    • (2024) Edge-Oriented Point Cloud Compression by Moving Object Detection for Realtime Smart Monitoring. 2024 IEEE 21st Consumer Communications & Networking Conference (CCNC) (400-405). DOI: 10.1109/CCNC51664.2024.10454895. Online publication date: 6-Jan-2024
    • (2024) Exploring Edge AI Inference in Heterogeneous Environments: Requirements, Challenges, and Solutions. IoT Edge Intelligence (37-66). DOI: 10.1007/978-3-031-58388-9_2. Online publication date: 4-Jun-2024
    • (2023) PRADA: Point Cloud Recognition Acceleration via Dynamic Approximation. 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE) (1-6). DOI: 10.23919/DATE56975.2023.10137301. Online publication date: Apr-2023
    • (2023) K-D Bonsai: ISA-Extensions to Compress K-D Trees for Autonomous Driving Tasks. Proceedings of the 50th Annual International Symposium on Computer Architecture (1-13). DOI: 10.1145/3579371.3589055. Online publication date: 17-Jun-2023
