DOI: 10.1145/3489517.3530589
research-article
Open access

QuiltNet: efficient deep learning inference on multi-chip accelerators using model partitioning

Published: 23 August 2022
  • Abstract

    We have seen many successful deployments of deep learning accelerator designs across platforms and technologies, e.g., FPGA, ASIC, and processing-in-memory platforms. However, the size of deep learning models keeps increasing, making their computation a growing burden on accelerators. A naive way to resolve this issue is to design larger accelerators, but this does not scale because of high resource requirements, e.g., power consumption and off-chip memory size. A promising alternative is to utilize multiple accelerators and use them as needed, similar to conventional multiprocessing: a single accelerator may suffice for smaller networks, while larger networks can be partitioned across multiple accelerators. However, partitioning DNN models into multiple parts incurs large inter-layer communication overheads. In this paper, we propose a scalable solution to accelerate DNN models on multiple devices by devising a new model partitioning technique. Our technique transforms a DNN model into layer-wise partitioned models using an autoencoder. Since the autoencoder encodes a tensor output into a smaller dimension, we can split the neural network model into multiple pieces while significantly reducing the communication overhead needed to pipeline them. Our evaluation on state-of-the-art deep learning models shows that the proposed technique significantly improves performance and energy efficiency: up to 30.5% and 28.4%, respectively, with minimal accuracy loss compared to running the same model on pipelined multi-block accelerators without the autoencoder.
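
    The core mechanism described above, inserting a small autoencoder at a partition boundary so that only the compressed latent tensor crosses the chip-to-chip link, can be illustrated with a short PyTorch-style sketch. The sketch below is an assumption-laden illustration, not the authors' QuiltNet implementation: the names BottleneckLink and PartitionedModel, the 1x1-convolution encoder/decoder, and the compression ratio of 4 are hypothetical choices made for the example.

    # Hypothetical sketch of autoencoder-based model partitioning; not the
    # authors' QuiltNet code. A CNN is split into two pipeline stages, and a
    # small encoder shrinks the boundary activation before it would cross the
    # chip-to-chip link; a decoder restores it on the second stage.
    import torch
    import torch.nn as nn

    class BottleneckLink(nn.Module):
        """Encoder/decoder pair inserted at the partition point (assumed design)."""
        def __init__(self, channels: int, ratio: int = 4):
            super().__init__()
            self.encoder = nn.Conv2d(channels, channels // ratio, kernel_size=1)
            self.decoder = nn.Conv2d(channels // ratio, channels, kernel_size=1)

        def encode(self, x):   # would run on the chip holding stage 1
            return self.encoder(x)

        def decode(self, z):   # would run on the chip holding stage 2
            return self.decoder(z)

    class PartitionedModel(nn.Module):
        def __init__(self, stage1: nn.Module, stage2: nn.Module, link: BottleneckLink):
            super().__init__()
            self.stage1, self.stage2, self.link = stage1, stage2, link

        def forward(self, x):
            h = self.stage1(x)           # first accelerator
            z = self.link.encode(h)      # only the compressed tensor z is transferred
            h_hat = self.link.decode(z)  # second accelerator
            return self.stage2(h_hat)

    # Example: split a toy CNN after its first block. The channels // ratio
    # reduction at the boundary is what cuts inter-chip communication volume.
    stage1 = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
    stage2 = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
                           nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10))
    model = PartitionedModel(stage1, stage2, BottleneckLink(channels=64, ratio=4))
    print(model(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 10])

    In an actual multi-chip deployment, encode would execute on the first accelerator and decode on the second, with only z serialized over the interconnect; per the abstract, the autoencoder is trained so that the compressed representation causes minimal end-to-end accuracy loss while cutting the communication that would otherwise dominate a pipelined partition.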

    Published In

    DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference
    July 2022
    1462 pages
    ISBN: 9781450391429
    DOI: 10.1145/3489517

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. DNN partitioning
    2. deep learning acceleration
    3. scalable DNN acceleration

    Conference

    DAC '22: 59th ACM/IEEE Design Automation Conference
    July 10 - 14, 2022
    San Francisco, California, USA

    Acceptance Rates

    Overall Acceptance Rate 1,770 of 5,499 submissions, 32%
