Research article. DOI: 10.5555/3539845.3539940

MemPool-3D: boosting performance and efficiency of shared-L1 memory many-core clusters with 3D integration

Published: 31 May 2022

Abstract

Three-dimensional integrated circuits promise power, performance, and footprint gains compared to their 2D counterparts, thanks to drastic reductions in interconnect length enabled by their smaller form factor. We leverage the potential of 3D integration by enhancing MemPool, an open-source many-core design with 256 cores and a shared pool of L1 scratchpad memory connected with a low-latency interconnect. MemPool's baseline 2D design is severely limited by routing congestion and wire propagation delay, making the design ideal for 3D integration. In architectural terms, we increase MemPool's scratchpad memory capacity beyond the sweet spot for 2D designs, improving performance in a common digital signal processing kernel. We propose a 3D MemPool design that leverages a smart partitioning of the memory resources across two layers to balance the size and utilization of the stacked dies. In this paper, we explore the architectural and the technology parameter spaces by analyzing the power, performance, area, and energy efficiency of MemPool instances in 2D and 3D with 1 MiB, 2 MiB, 4 MiB, and 8 MiB of scratchpad memory in a commercial 28 nm technology node. We observe a performance gain of 9.1% when running a matrix multiplication on MemPool-3D with 4 MiB of scratchpad memory compared to its MemPool-2D counterpart. In terms of energy efficiency, we can implement the MemPool-3D instance with 4 MiB of L1 memory on an energy budget 15% smaller than its 2D counterpart, and 3.7% smaller than the MemPool-2D instance with a quarter of the L1 scratchpad memory capacity (1 MiB).
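
To give a feel for the capacities explored in the abstract, the following is a minimal back-of-the-envelope sketch and not the authors' methodology. The core count (256) and the four scratchpad sizes come from the abstract. Everything else is an assumption made for illustration: the helper names are hypothetical, elements are assumed to be 32 bits wide, and the matrix multiplication is assumed to keep all three square matrices (A, B, and C) resident in L1 with no space reserved for stacks or other data.

```python
import math

MIB = 1024 * 1024
NUM_CORES = 256  # core count stated in the abstract


def l1_bytes_per_core(capacity_mib: int, cores: int = NUM_CORES) -> int:
    """Average share of the L1 scratchpad per core, in bytes."""
    return capacity_mib * MIB // cores


def max_square_matmul_dim(capacity_mib: int, elem_bytes: int = 4) -> int:
    """Largest n such that three n x n matrices fit in the scratchpad.

    Assumption (not from the paper): A, B, and C all live in L1,
    each element takes elem_bytes, and nothing else uses the memory.
    """
    elems_per_matrix = capacity_mib * MIB // (3 * elem_bytes)
    return math.isqrt(elems_per_matrix)


# Scratchpad sizes evaluated in the abstract.
for cap in (1, 2, 4, 8):
    per_core_kib = l1_bytes_per_core(cap) // 1024
    n = max_square_matmul_dim(cap)
    print(f"{cap} MiB: {per_core_kib} KiB per core, max n = {n}")
```

Under these assumptions, each doubling of the scratchpad capacity raises the largest resident problem dimension by a factor of roughly the square root of two, which is one way to see why a larger L1 can help a matrix multiplication kernel.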

Published In

DATE '22: Proceedings of the 2022 Conference & Exhibition on Design, Automation & Test in Europe
March 2022
1637 pages
ISBN: 9783981926361

Sponsors

In-Cooperation

  • EDAA: European Design Automation Association
  • IEEE SSCS Shanghai Chapter
  • ESDA: Electronic System Design Alliance
  • IEEE CEDA
  • IEEE CS
  • IEEE-RAS: Robotics and Automation

Publisher

European Design and Automation Association

Leuven, Belgium

Author Tags

  1. 3D integration
  2. 3D-ICs
  3. many-core

Qualifiers

  • Research-article

Conference

DATE '22: Design, Automation and Test in Europe
March 14 - 23, 2022
Antwerp, Belgium

Acceptance Rates

Overall Acceptance Rate 518 of 1,794 submissions, 29%
