DOI: 10.1145/3489517.3530509
Research article

H2H: heterogeneous model to heterogeneous system mapping with computation and communication awareness

Published: 23 August 2022

Abstract

The complex nature of real-world problems calls for heterogeneity in both machine learning (ML) models and hardware systems. Heterogeneity in ML models comes from multi-sensor perception and multi-task learning, i.e., multi-modality multi-task (MMMT) models, which results in diverse deep neural network (DNN) layers and computation patterns. Heterogeneity in systems comes from diverse processing components, as integrating multiple dedicated accelerators into one system has become the prevailing approach. A new problem therefore emerges: heterogeneous model to heterogeneous system mapping (H2H). Previous mapping algorithms mostly focus on efficient computation; in this work, we argue that computation and communication must be considered simultaneously for better system efficiency. We propose a novel H2H mapping algorithm with both computation and communication awareness: by slightly trading computation for communication, overall system latency and energy consumption can be substantially reduced. We evaluate our approach using MAESTRO modeling and demonstrate 15%-74% latency reduction and 23%-64% energy reduction compared with existing computation-prioritized mapping algorithms. Code is publicly available at https://github.com/xyzxinyizhang/H2H.
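
To make the idea of "slightly trading computation for communication" concrete, the following is a minimal toy sketch, not the H2H algorithm itself. It assumes a simple chain of four layers, two hypothetical accelerators (`acc0`, `acc1`), and made-up latency numbers in `COMPUTE` and `TRANSFER`; none of these names or values come from the paper. A computation-prioritized mapper picks the fastest accelerator for each layer in isolation, while a communication-aware mapper (here a small dynamic program over the chain) also charges for moving data between accelerators.

```python
from typing import Dict, List, Tuple

# Hypothetical compute latency (ms) of each layer on each accelerator.
COMPUTE: List[Dict[str, float]] = [
    {"acc0": 4.0, "acc1": 5.0},
    {"acc0": 6.0, "acc1": 5.0},
    {"acc0": 3.0, "acc1": 4.0},
    {"acc0": 5.0, "acc1": 4.5},
]
# Hypothetical latency (ms) of sending layer i's output to a different accelerator.
TRANSFER: List[float] = [3.0, 3.0, 3.0]


def total_latency(mapping: List[str]) -> float:
    """Sum compute latency plus a transfer cost at every accelerator switch."""
    latency = 0.0
    for i, acc in enumerate(mapping):
        latency += COMPUTE[i][acc]
        if i > 0 and mapping[i - 1] != acc:
            latency += TRANSFER[i - 1]
    return latency


def computation_prioritized() -> List[str]:
    """Pick the fastest accelerator per layer, ignoring data movement."""
    return [min(costs, key=costs.get) for costs in COMPUTE]


def communication_aware() -> List[str]:
    """Dynamic program over the layer chain minimizing compute + transfer."""
    accs = list(COMPUTE[0])
    # best[a] = (lowest latency so far, mapping) with the current layer on a.
    best: Dict[str, Tuple[float, List[str]]] = {
        a: (COMPUTE[0][a], [a]) for a in accs
    }
    for i in range(1, len(COMPUTE)):
        nxt: Dict[str, Tuple[float, List[str]]] = {}
        for a in accs:
            candidates = []
            for prev in accs:
                cost, path = best[prev]
                hop = 0.0 if prev == a else TRANSFER[i - 1]
                candidates.append((cost + hop + COMPUTE[i][a], path + [a]))
            nxt[a] = min(candidates, key=lambda t: t[0])
        best = nxt
    return min(best.values(), key=lambda t: t[0])[1]


if __name__ == "__main__":
    for name, mapper in [("computation-prioritized", computation_prioritized),
                         ("communication-aware", communication_aware)]:
        mapping = mapper()
        print(name, mapping, total_latency(mapping))
```

With these toy numbers, the computation-prioritized mapping alternates between accelerators and pays a transfer at every layer boundary, whereas the communication-aware mapping accepts somewhat slower compute on some layers to keep all data on one device. Real MMMT models are graphs rather than chains, and real cost estimates would come from a model such as MAESTRO, so this sketch only illustrates the trade-off.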



Published In

DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference
July 2022
1462 pages
ISBN: 9781450391429
DOI: 10.1145/3489517

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

DAC '22: 59th ACM/IEEE Design Automation Conference
July 10-14, 2022
San Francisco, California

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

Cited By

  • (2024) Shared Memory-contention-aware Concurrent DNN Execution for Diversely Heterogeneous System-on-Chips. Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, pages 243-256. DOI: 10.1145/3627535.3638502. Online publication date: 2-Mar-2024.
  • (2024) Challenges and Opportunities to Enable Large-Scale Computing via Heterogeneous Chiplets. Proceedings of the 29th Asia and South Pacific Design Automation Conference, pages 765-770. DOI: 10.1109/ASP-DAC58780.2024.10473961. Online publication date: 22-Jan-2024.
  • (2023) M5: Multi-modal Multi-task Model Mapping on Multi-FPGA with Accelerator Configuration Search. 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1-6. DOI: 10.23919/DATE56975.2023.10136962. Online publication date: Apr-2023.
  • (2023) REFRESH FPGAs: Sustainable FPGA Chiplet Architectures. Proceedings of the 14th International Green and Sustainable Computing Conference, pages 1-3. DOI: 10.1145/3634769.3634798. Online publication date: 28-Oct-2023.
  • (2023) TileFlow: A Framework for Modeling Fusion Dataflow via Tree-based Analysis. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, pages 1271-1288. DOI: 10.1145/3613424.3623792. Online publication date: 28-Oct-2023.
  • (2023) Monad: Towards Cost-Effective Specialization for Chiplet-Based Spatial Accelerators. 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), pages 1-9. DOI: 10.1109/ICCAD57390.2023.10323880. Online publication date: 28-Oct-2023.
  • (2023) MARS: Exploiting Multi-Level Parallelism for DNN Workloads on Adaptive Multi-Accelerator Systems. 2023 60th ACM/IEEE Design Automation Conference (DAC), pages 1-6. DOI: 10.1109/DAC56929.2023.10247992. Online publication date: 9-Jul-2023.
  • (2023) Memory and Computation Coordinated Mapping of DNNs onto Complex Heterogeneous SoC. 2023 60th ACM/IEEE Design Automation Conference (DAC), pages 1-6. DOI: 10.1109/DAC56929.2023.10247951. Online publication date: 9-Jul-2023.
  • (2023) A Pipelining-Based Heterogeneous Scheduling and Energy-Throughput Optimization Scheme for CNNs Leveraging Apache TVM. IEEE Access, vol. 11, pages 35007-35021. DOI: 10.1109/ACCESS.2023.3264828. Online publication date: 2023.
