research-article

H2H: heterogeneous model to heterogeneous system mapping with computation and communication awareness

Authors:

Jingtong Hu

DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference

Pages 601 - 606

https://doi.org/10.1145/3489517.3530509

Published: 23 August 2022

Abstract

The complex nature of real-world problems calls for heterogeneity in both machine learning (ML) models and hardware systems. Heterogeneity in ML models comes from multi-sensor perception and multi-task learning, i.e., multi-modality multi-task (MMMT) models, resulting in diverse deep neural network (DNN) layers and computation patterns. Heterogeneity in systems comes from diverse processing components, as integrating multiple dedicated accelerators into one system becomes the prevailing approach. Therefore, a new problem emerges: heterogeneous model to heterogeneous system mapping (H2H). While previous mapping algorithms mostly focus on efficient computation, in this work we argue that it is indispensable to consider computation and communication simultaneously for better system efficiency. We propose a novel H2H mapping algorithm with both computation and communication awareness; by slightly trading computation for communication, overall system latency and energy consumption can be substantially reduced. The performance of our work is evaluated based on MAESTRO modeling, demonstrating 15%-74% latency reduction and 23%-64% energy reduction compared with existing computation-prioritized mapping algorithms. Code is publicly available at https://github.com/xyzxinyizhang/H2H.
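
To make the trade-off in the abstract concrete, the following minimal sketch compares a computation-prioritized mapping with a communication-aware one on a toy three-layer chain. It is an illustration only and is not the authors' H2H algorithm: the accelerator names, the latency numbers, the single fixed transfer cost, and all function names are assumptions made for this example, and the communication-aware mapping is found with a simple dynamic program over a chain of layers.

```python
from typing import Dict, List

# Hypothetical compute latency (ms) of each layer on each accelerator.
# These numbers are invented for illustration, not taken from the paper.
COMPUTE: List[Dict[str, float]] = [
    {"A": 4.0, "B": 5.0},
    {"A": 6.0, "B": 5.0},
    {"A": 3.0, "B": 4.0},
]
# Hypothetical cost (ms) of moving activations between two accelerators.
TRANSFER = 3.0


def total_latency(mapping: List[str], compute: List[Dict[str, float]],
                  transfer: float) -> float:
    """Compute latency plus one transfer per accelerator change."""
    compute_cost = sum(compute[i][acc] for i, acc in enumerate(mapping))
    hops = sum(1 for a, b in zip(mapping, mapping[1:]) if a != b)
    return compute_cost + hops * transfer


def computation_prioritized(compute: List[Dict[str, float]]) -> List[str]:
    """Place every layer on its fastest accelerator, ignoring transfers."""
    return [min(layer, key=layer.get) for layer in compute]


def communication_aware(compute: List[Dict[str, float]],
                        transfer: float) -> List[str]:
    """Dynamic program over the chain that also charges for transfers."""
    # best[acc] = (cheapest latency so far, mapping) with the last layer on acc
    best = {acc: (cost, [acc]) for acc, cost in compute[0].items()}
    for layer in compute[1:]:
        nxt = {}
        for acc, cost in layer.items():
            candidates = [
                (prev_cost + (0.0 if prev == acc else transfer) + cost,
                 path + [acc])
                for prev, (prev_cost, path) in best.items()
            ]
            nxt[acc] = min(candidates, key=lambda c: c[0])
        best = nxt
    return min(best.values(), key=lambda c: c[0])[1]


if __name__ == "__main__":
    greedy = computation_prioritized(COMPUTE)
    aware = communication_aware(COMPUTE, TRANSFER)
    print(greedy, total_latency(greedy, COMPUTE, TRANSFER))  # ['A', 'B', 'A'] 18.0
    print(aware, total_latency(aware, COMPUTE, TRANSFER))    # ['A', 'A', 'A'] 13.0
```

In this toy setting the communication-aware mapping accepts 1 ms of extra compute (13 ms instead of 12 ms) to avoid two transfers, which lowers end-to-end latency from 18 ms to 13 ms. Real MMMT models are graphs rather than chains, and transfer costs depend on tensor sizes and link bandwidth, so this sketch only conveys the intuition.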

Published In

DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference

July 2022

1462 pages

ISBN:9781450391429

DOI:10.1145/3489517

General Chair:
Rob Oshana
NXP

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGDA: ACM Special Interest Group on Design Automation
IEEE CEDA

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Laboratory of Physical Sciences
NSF (National Science Foundation)

Conference

DAC '22

Sponsor:

SIGDA

DAC '22: 59th ACM/IEEE Design Automation Conference

July 10 - 14, 2022

San Francisco, California

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

Bibliometrics

Article Metrics

Total Citations: 9
Total Downloads: 494
Downloads (Last 12 months): 349
Downloads (Last 6 weeks): 29

Reflects downloads up to 26 Aug 2024

Cited By

Dagli I, Belviranli M, Lee I, Chabbi M, Steuwer M (2024). Shared Memory-contention-aware Concurrent DNN Execution for Diversely Heterogeneous System-on-Chips. Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, pp. 243-256. Online publication date: 2-Mar-2024.
https://dl.acm.org/doi/10.1145/3627535.3638502
Yang Z, Ji S, Chen X, Zhuang J, Zhang W, Jani D, Zhou P, Kim T (2024). Challenges and Opportunities to Enable Large-Scale Computing via Heterogeneous Chiplets. Proceedings of the 29th Asia and South Pacific Design Automation Conference, pp. 765-770. Online publication date: 22-Jan-2024.
https://dl.acm.org/doi/10.1109/ASP-DAC58780.2024.10473961
Kamath A, Abi-Karam S, Bhat A, Hao C (2023). M5: Multi-modal Multi-task Model Mapping on Multi-FPGA with Accelerator Configuration Search. 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1-6. Online publication date: Apr-2023.
https://doi.org/10.23919/DATE56975.2023.10136962
Zhou P, Zhuang J, Cahoon S, Tang Y, Yang Z, Chen X, Shi Y, Hu J, Jones A (2023). REFRESH FPGAs: Sustainable FPGA Chiplet Architectures. Proceedings of the 14th International Green and Sustainable Computing Conference, pp. 1-3. Online publication date: 28-Oct-2023.
https://dl.acm.org/doi/10.1145/3634769.3634798
Zheng S, Chen S, Gao S, Jia L, Sun G, Wang R, Liang Y (2023). TileFlow: A Framework for Modeling Fusion Dataflow via Tree-based Analysis. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 1271-1288. Online publication date: 28-Oct-2023.
https://dl.acm.org/doi/10.1145/3613424.3623792
Hao X, Ding Z, Yin J, Wang Y, Liang Y (2023). Monad: Towards Cost-Effective Specialization for Chiplet-Based Spatial Accelerators. 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), pp. 1-9. Online publication date: 28-Oct-2023.
https://doi.org/10.1109/ICCAD57390.2023.10323880
Shen G, Zhao J, Wang Z, Lin Z, Ding W, Wu C, Chen Q, Guo M (2023). MARS: Exploiting Multi-Level Parallelism for DNN Workloads on Adaptive Multi-Accelerator Systems. 2023 60th ACM/IEEE Design Automation Conference (DAC), pp. 1-6. Online publication date: 9-Jul-2023.
https://doi.org/10.1109/DAC56929.2023.10247992
Zheng S, Chen S, Liang Y (2023). Memory and Computation Coordinated Mapping of DNNs onto Complex Heterogeneous SoC. 2023 60th ACM/IEEE Design Automation Conference (DAC), pp. 1-6. Online publication date: 9-Jul-2023.
https://doi.org/10.1109/DAC56929.2023.10247951
Velasco-Montero D, Goossens B, Fernández-Berni J, Rodríguez-Vázquez Á, Philips W (2023). A Pipelining-Based Heterogeneous Scheduling and Energy-Throughput Optimization Scheme for CNNs Leveraging Apache TVM. IEEE Access 11, pp. 35007-35021. Online publication date: 2023.
https://doi.org/10.1109/ACCESS.2023.3264828
