DOI: 10.1145/2897937.2897978

Achieving lightweight multicast in asynchronous networks-on-chip using local speculation

Published: 05 June 2016

Abstract

We propose a lightweight parallel multicast targeting an asynchronous NoC with a variant Mesh-of-Trees topology. A novel strategy, local speculation, is introduced, where a subset of switches are speculative and always broadcast. These switches are surrounded by non-speculative switches, which throttle any redundant packets, restricting these packets to small regions. Speculative switches have simplified designs, thereby improving network performance. A hybrid network architecture is proposed to mix the speculative and non-speculative switches. For multicast benchmarks, significant performance improvements with small power savings are obtained by the new approach over a tree-based non-speculative approach. Interestingly, similar improvements are also shown for unicast. Finally, another benefit is to reduce the address field size in multicast packets.
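The division of labor between the two switch types can be illustrated with a toy model. The sketch below is not the authors' switch design or their Mesh-of-Trees variant: it assumes a single binary fanout tree, and the function name `route_multicast`, the per-level choice of speculative switches, and the redundancy count are all hypothetical simplifications. It only shows that a speculative switch forwards on both outputs without consulting the destination set, while a non-speculative switch forwards only toward real destinations and so drops any redundant copy it receives.

```python
from typing import FrozenSet, Set, Tuple


def route_multicast(
    depth: int, speculative_levels: Set[int], dests: FrozenSet[int]
) -> Tuple[Set[int], int]:
    """Send one packet from the root of a binary fanout tree with 2**depth leaves.

    Hypothetical toy model. Returns (leaves reached, redundant link traversals).
    """
    delivered: Set[int] = set()
    redundant = 0
    # Each work item is (level, lo, hi): a switch at `level` covering leaves [lo, hi).
    stack = [(0, 0, 2 ** depth)]
    while stack:
        level, lo, hi = stack.pop()
        if level == depth:
            delivered.add(lo)  # the packet reached a leaf (destination port)
            continue
        mid = (lo + hi) // 2
        for c_lo, c_hi in ((lo, mid), (mid, hi)):
            # `needed` drives routing only at non-speculative switches;
            # at speculative switches it is used purely for accounting.
            needed = any(c_lo <= d < c_hi for d in dests)
            if level in speculative_levels:
                # Speculative switch: always broadcast to both children.
                stack.append((level + 1, c_lo, c_hi))
                if not needed:
                    redundant += 1
            elif needed:
                # Non-speculative switch: forward only toward real destinations.
                stack.append((level + 1, c_lo, c_hi))
            # Otherwise the copy is throttled (dropped) here.
    return delivered, redundant


if __name__ == "__main__":
    # Assumed configuration: levels 0 and 2 speculative, levels 1 and 3 not.
    print(route_multicast(4, {0, 2}, frozenset({5})))
```

In this model, with depth 4 and speculative levels {0, 2}, a unicast to leaf 5 reaches only leaf 5 and generates two redundant copies, each dropped by the next non-speculative switch after a single hop. The last switch level must be non-speculative in this sketch, otherwise redundant copies would reach the leaves.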

Cited By

  • (2019) A Continuous-Time Replication Strategy for Efficient Multicast in Asynchronous NoCs. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 27:2 (350-363). DOI: 10.1109/TVLSI.2018.2876856. Online publication date: Feb-2019
  • (2017) Achieving Lightweight Multicast in Asynchronous NoCs Using a Continuous-Time Multi-Way Read Buffer. Proceedings of the Eleventh IEEE/ACM International Symposium on Networks-on-Chip (1-8). DOI: 10.1145/3130218.3130221. Online publication date: 19-Oct-2017

Published In

DAC '16: Proceedings of the 53rd Annual Design Automation Conference
June 2016
1048 pages
ISBN: 9781450342360
DOI: 10.1145/2897937
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Funding Sources

  • National Science Foundation

Conference

DAC '16

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 50
  • Downloads (Last 6 weeks): 10
Reflects downloads up to 04 Oct 2024
