
SIAM: Chiplet-based Scalable In-Memory Acceleration with Mesh for Deep Neural Networks

Published: 17 September 2021

Abstract

In-memory computing (IMC) on a monolithic chip for deep learning faces severe challenges in area, yield, and on-chip interconnection cost due to ever-increasing model sizes. 2.5D integration, or chiplet-based architecture, interconnects multiple small chips (i.e., chiplets) to form a large computing system, presenting a feasible path beyond monolithic IMC architectures for accelerating large deep learning models. This paper presents a new benchmarking simulator, SIAM, to evaluate the performance of chiplet-based IMC architectures and explore the potential of such a paradigm shift in IMC architecture design. SIAM integrates device, circuit, architecture, network-on-chip (NoC), network-on-package (NoP), and DRAM access models to realize an end-to-end system. SIAM is scalable in its support of a wide range of deep neural networks (DNNs), customizable to various network structures and configurations, and capable of efficient design space exploration. We demonstrate the flexibility, scalability, and simulation speed of SIAM by benchmarking state-of-the-art DNNs on the CIFAR-10, CIFAR-100, and ImageNet datasets. We further calibrate the simulation results against a published silicon result, SIMBA. The chiplet-based IMC architecture obtained through SIAM shows 130× and 72× improvements in energy efficiency for ResNet-50 on the ImageNet dataset compared to Nvidia V100 and T4 GPUs, respectively.
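The yield argument in the abstract can be illustrated with the standard Poisson die-yield model, Y = exp(-A·D0): a defect kills only one small chiplet rather than an entire monolithic die. This is a minimal sketch; the defect density and die areas below are illustrative assumptions, not figures from SIAM.

```python
import math

def die_yield(area_mm2: float, defect_density_per_mm2: float) -> float:
    """Poisson yield model: probability that a die of the given area is defect-free."""
    return math.exp(-area_mm2 * defect_density_per_mm2)

D0 = 0.001  # defects per mm^2 (assumed, for illustration only)

# Compare one monolithic 800 mm^2 IMC die with sixteen 50 mm^2 chiplets.
monolithic = die_yield(800, D0)
chiplet = die_yield(50, D0)

print(f"monolithic die yield: {monolithic:.3f}")  # ~0.449
print(f"per-chiplet yield:    {chiplet:.3f}")     # ~0.951
```

Under this model, good silicon per wafer improves sharply as the die is disaggregated, which is one reason chiplet-based integration scales better than a monolithic IMC chip as model sizes grow.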




Published In

ACM Transactions on Embedded Computing Systems, Volume 20, Issue 5s
Special Issue ESWEEK 2021, CASES 2021, CODES+ISSS 2021 and EMSOFT 2021
October 2021, 1367 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3481713
Editor: Tulika Mitra

Publisher

Association for Computing Machinery

New York, NY, United States


Publication History

Published: 17 September 2021
Accepted: 01 July 2021
Revised: 01 June 2021
Received: 01 April 2021
Published in TECS Volume 20, Issue 5s


Author Tags

  1. Chiplet architecture
  2. in-memory compute
  3. DNN acceleration
  4. IMC benchmarking
  5. network-on-chip
  6. network-on-package

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • Semiconductor Research Corporation (SRC)


Cited By

  • (2024) Dataflow-Aware PIM-Enabled Manycore Architecture for Deep Learning Workloads. 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1–6. DOI: 10.23919/DATE58400.2024.10546730
  • (2024) TEFLON: Thermally Efficient Dataflow-aware 3D NoC for Accelerating CNN Inferencing on Manycore PIM Architectures. ACM Transactions on Embedded Computing Systems 23, 5, 1–23. DOI: 10.1145/3665279
  • (2024) Thermal Modeling and Management Challenges in Heterogenous Integration: 2.5D Chiplet Platforms and Beyond. 2024 IEEE 42nd VLSI Test Symposium (VTS), 1–4. DOI: 10.1109/VTS60656.2024.10538578
  • (2024) INDM: Chiplet-Based Interconnect Network and Dataflow Mapping for DNN Accelerators. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43, 4, 1107–1120. DOI: 10.1109/TCAD.2023.3332832
  • (2024) SHIFFT: A Scalable Hybrid In-Memory Computing FFT Accelerator. 2024 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 130–135. DOI: 10.1109/ISVLSI61997.2024.00034
  • (2024) Challenges and Opportunities to Enable Large-Scale Computing via Heterogeneous Chiplets. Proceedings of the 29th Asia and South Pacific Design Automation Conference, 765–770. DOI: 10.1109/ASP-DAC58780.2024.10473961
  • (2024) Neural architecture search for in-memory computing-based deep learning accelerators. Nature Reviews Electrical Engineering 1, 6, 374–390. DOI: 10.1038/s44287-024-00052-7
  • (2024) Review of chiplet-based design: system architecture and interconnection. Science China Information Sciences 67, 10. DOI: 10.1007/s11432-023-3926-8
  • (2023) End-to-End Benchmarking of Chiplet-Based In-Memory Computing. Neuromorphic Computing. DOI: 10.5772/intechopen.111926
  • (2023) Investigation of the Temperature Dependence of Volt-Ampere Characteristics of a Thin-Film Si3N4 Memristor. Crystals 13, 2, 323. DOI: 10.3390/cryst13020323
