DOI: 10.1145/3489517.3530589
research-article
Open access

QuiltNet: efficient deep learning inference on multi-chip accelerators using model partitioning

Published: 23 August 2022
  • Abstract

    We have seen many successful deployments of deep learning accelerator designs across platforms and technologies, e.g., FPGA, ASIC, and processing-in-memory platforms. However, the size of deep learning models keeps increasing, making their computation a growing burden on accelerators. A naive way to resolve this issue is to design larger accelerators, but this does not scale because of high resource requirements, e.g., power consumption and off-chip memory size. A promising alternative is to utilize multiple accelerators and use them as needed, similar to conventional multiprocessing: a single accelerator may suffice for smaller networks, while larger networks can be partitioned across multiple accelerators. However, partitioning DNN models into multiple parts incurs large inter-layer communication overheads. In this paper, we propose a scalable solution to accelerate DNN models on multiple devices by devising a new model partitioning technique. Our technique transforms a DNN model into layer-wise partitioned models using an autoencoder. Since the autoencoder encodes a tensor output into a smaller dimension, we can split the neural network model into multiple pieces while significantly reducing the communication overhead needed to pipeline them. Our evaluation on state-of-the-art deep learning models shows that the proposed technique significantly improves performance and energy efficiency: up to 30.5% and 28.4%, respectively, with minimal accuracy loss compared to running the same model on pipelined multi-block accelerators without the autoencoder.
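
    The core mechanism described above, inserting a small autoencoder at a partition boundary so that only the compressed latent tensor crosses the chip-to-chip link, can be illustrated with a short PyTorch-style sketch. The sketch below is an assumption-laden illustration, not the authors' QuiltNet implementation: the names BottleneckLink and PartitionedModel, the 1x1-convolution encoder/decoder, and the compression ratio of 4 are hypothetical choices made for the example.

    # Hypothetical sketch of autoencoder-based model partitioning; not the
    # authors' QuiltNet code. A CNN is split into two pipeline stages, and a
    # small encoder shrinks the boundary activation before it would cross the
    # chip-to-chip link; a decoder restores it on the second stage.
    import torch
    import torch.nn as nn

    class BottleneckLink(nn.Module):
        """Encoder/decoder pair inserted at the partition point (assumed design)."""
        def __init__(self, channels: int, ratio: int = 4):
            super().__init__()
            self.encoder = nn.Conv2d(channels, channels // ratio, kernel_size=1)
            self.decoder = nn.Conv2d(channels // ratio, channels, kernel_size=1)

        def encode(self, x):   # would run on the chip holding stage 1
            return self.encoder(x)

        def decode(self, z):   # would run on the chip holding stage 2
            return self.decoder(z)

    class PartitionedModel(nn.Module):
        def __init__(self, stage1: nn.Module, stage2: nn.Module, link: BottleneckLink):
            super().__init__()
            self.stage1, self.stage2, self.link = stage1, stage2, link

        def forward(self, x):
            h = self.stage1(x)           # first accelerator
            z = self.link.encode(h)      # only the compressed tensor z is transferred
            h_hat = self.link.decode(z)  # second accelerator
            return self.stage2(h_hat)

    # Example: split a toy CNN after its first block. The channels // ratio
    # reduction at the boundary is what cuts inter-chip communication volume.
    stage1 = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
    stage2 = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
                           nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10))
    model = PartitionedModel(stage1, stage2, BottleneckLink(channels=64, ratio=4))
    print(model(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 10])

    In an actual multi-chip deployment, encode would execute on the first accelerator and decode on the second, with only z serialized over the interconnect; per the abstract, the autoencoder is trained so that the compressed representation causes minimal end-to-end accuracy loss while cutting the communication that would otherwise dominate a pipelined partition.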

    Published In

    DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference
    July 2022
    1462 pages
    ISBN: 9781450391429
    DOI: 10.1145/3489517

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. DNN partitioning
    2. deep learning acceleration
    3. scalable DNN acceleration

    Conference

    DAC '22: 59th ACM/IEEE Design Automation Conference
    July 10 - 14, 2022
    San Francisco, California, USA

    Acceptance Rates

    Overall Acceptance Rate 1,770 of 5,499 submissions, 32%
