DOI: 10.1145/3589610.3596284
Research Article

Synchronization-Aware NAS for an Efficient Collaborative Inference on Mobile Platforms

Published: 13 June 2023
Abstract

    Previous neural architecture search (NAS) approaches for mobile platforms have achieved great success in designing slim-but-accurate neural networks that are generally well matched to a single computing unit such as a CPU or GPU. However, as recent mobile devices contain multiple heterogeneous computing units, the next major challenge is to maximize both accuracy and efficiency by fully utilizing all available resources. We propose an ensemble-like approach in which individual models on a mobile device actively collaborate through intermediate feature aggregations, which we call synchronizations. A key challenge is determining the optimal synchronization strategy that achieves both performance and efficiency. To this end, we propose SyncNAS, which automates the exploration of synchronization strategies for collaborative neural architectures that maximize the utilization of heterogeneous computing units on a target device. We introduce a novel search space for synchronization strategies and apply the Monte Carlo tree search (MCTS) algorithm to improve sampling efficiency and reduce the search cost. On ImageNet, our collaborative model based on MobileNetV2 achieves a 2.7% top-1 accuracy improvement within the baseline latency budget. Even when the target latency is reduced to half, our model maintains higher accuracy than its baseline, owing to the improved utilization and collaboration. Thanks to MCTS, SyncNAS reduces its search cost by up to 21x when searching for the optimal strategy.
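
    To make the synchronization idea concrete, the sketch below (a minimal illustration, not the authors' implementation) shows two branch models that could run concurrently on different computing units and exchange intermediate features at a single synchronization point, where the features are aggregated before both branches continue. All module names, channel sizes, the aggregation operator, and the placement of the synchronization are hypothetical assumptions; the paper's search space covers exactly these kinds of choices.

    import torch
    import torch.nn as nn


    class Branch(nn.Module):
        """One lightweight branch, conceptually mapped to one computing unit (e.g., CPU or GPU)."""

        def __init__(self, channels: int = 32):
            super().__init__()
            self.stage1 = nn.Sequential(
                nn.Conv2d(3, channels, 3, stride=2, padding=1),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            self.stage2 = nn.Sequential(
                nn.Conv2d(channels, channels, 3, stride=2, padding=1),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )


    class CollaborativePair(nn.Module):
        """Two branches with one intermediate feature aggregation (synchronization)."""

        def __init__(self, channels: int = 32, num_classes: int = 1000):
            super().__init__()
            self.branch_a = Branch(channels)  # hypothetically assigned to the CPU
            self.branch_b = Branch(channels)  # hypothetically assigned to the GPU
            self.classifier = nn.Linear(2 * channels, num_classes)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Stage 1 runs independently on each branch; in an actual deployment these
            # stages would execute in parallel on separate computing units.
            fa = self.branch_a.stage1(x)
            fb = self.branch_b.stage1(x)

            # Synchronization: aggregate the intermediate features (element-wise sum here)
            # so that each branch continues from the combined representation. The choice of
            # operator and the placement of this point are what a synchronization strategy
            # has to decide.
            synced = fa + fb

            fa = self.branch_a.stage2(synced)
            fb = self.branch_b.stage2(synced)

            # Final ensemble-style fusion of the two branch outputs before classification.
            pooled = torch.cat([fa.mean(dim=(2, 3)), fb.mean(dim=(2, 3))], dim=1)
            return self.classifier(pooled)


    if __name__ == "__main__":
        model = CollaborativePair()
        logits = model(torch.randn(1, 3, 224, 224))
        print(logits.shape)  # torch.Size([1, 1000])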



    Published In

    LCTES 2023: Proceedings of the 24th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems
    June 2023
    147 pages
    ISBN: 9798400701740
    DOI: 10.1145/3589610

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 13 June 2023


    Author Tags

    1. Model Parallelism
    2. Neural Architecture Search
    3. On-Device ML

    Qualifiers

    • Research-article

    Conference

    LCTES '23

    Acceptance Rates

    Overall Acceptance Rate 116 of 438 submissions, 26%

