Towards the ideal on-chip fabric for 1-to-many and many-to-1 communication

Authors: Tushar Krishna, Bradford M. Beckmann, Steven K. Reinhardt

MICRO-44: Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture

Pages 71 - 82

https://doi.org/10.1145/2155620.2155630

Published: 03 December 2011

Abstract

The prevalence of multicore architectures has accentuated the need for scalable cache coherence solutions. Many of the proposed designs use a mix of 1-to-1, 1-to-many (1-to-M), and many-to-1 (M-to-1) communication to maintain data coherence and consistency. The on-chip network is the communication backbone and must handle all of these flows efficiently for these protocols to scale. However, most on-chip network research has focused on optimizing only 1-to-1 traffic. Some recent work addresses 1-to-M traffic by forking multicast packets at routers within the network, but these techniques incur high packet delays and power penalties. M-to-1 traffic has received little research attention.

We propose two in-network techniques, Flow Across Network Over Uncongested Trees (FANOUT) and Flow AggregatioN In-Network (FANIN), which perform efficient 1-to-M forking and M-to-1 aggregation, respectively, such that packets incur only single-cycle delays at most routers along their path, thus approaching an ideal network (one that incurs only wire delay and energy). Full-system simulations on a 64-core CMP with SPLASH-2 and PARSEC benchmarks show that FANOUT and FANIN together reduce runtime by 14.9% and network energy by 40.2% on average compared to state-of-the-art networks, operating within just 1% of the runtime and 9.6% of the energy of an ideal network.
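To make the benefit of in-network forking and aggregation concrete, the following is a minimal link-counting sketch, not the FANOUT or FANIN router design itself. It assumes a 2D mesh with dimension-ordered (X then Y) routing, which the abstract does not specify, and all function names are hypothetical. It compares the number of link traversals when a 1-to-M message is sent as separate unicasts with the number when one packet forks at routers, and does the same for M-to-1 messages merged on the way to a common destination.

```python
from typing import Iterable, List, Set, Tuple

Node = Tuple[int, int]   # (x, y) coordinate of a router in the mesh
Link = Tuple[Node, Node]  # directed link between adjacent routers


def xy_path(src: Node, dst: Node) -> List[Link]:
    """Directed links crossed by X-then-Y routing (an assumed routing policy)."""
    links: List[Link] = []
    x, y = src
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def unicast_traversals(src: Node, dsts: Iterable[Node]) -> int:
    """1-to-M sent as M separate packets: every hop of every packet counts."""
    return sum(len(xy_path(src, d)) for d in dsts)


def forked_traversals(src: Node, dsts: Iterable[Node]) -> int:
    """1-to-M with forking at routers: each shared link is crossed once."""
    used: Set[Link] = set()
    for d in dsts:
        used.update(xy_path(src, d))
    return len(used)


def aggregated_traversals(srcs: Iterable[Node], dst: Node) -> int:
    """M-to-1 with merging at routers: each shared link carries one packet."""
    used: Set[Link] = set()
    for s in srcs:
        used.update(xy_path(s, dst))
    return len(used)


# Example: a 4x4 mesh where node (0, 0) talks to all 15 other nodes.
root = (0, 0)
others = [(x, y) for x in range(4) for y in range(4) if (x, y) != root]
print(unicast_traversals(root, others))     # separate unicasts
print(forked_traversals(root, others))      # one forking multicast
print(aggregated_traversals(others, root))  # merged responses
```

In this 4x4 example the separate unicasts cross 48 links in total, while the forked multicast and the aggregated responses each cross 15, one per edge of the routing tree. The model counts link traversals only; it ignores router pipeline delay, contention, and buffering, which are what the single-cycle router claim in the abstract addresses.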

Index Terms

Towards the ideal on-chip fabric for 1-to-many and many-to-1 communication
1. Computer systems organization
  1. Architectures
    1. Distributed architectures
    2. Parallel architectures
      1. Interconnection architectures

Published In

MICRO-44: Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture

December 2011

519 pages

ISBN:9781450310536

DOI:10.1145/2155620

Conference Chair: Carlo Galuzzi (Technische Universiteit Delft, The Netherlands)

General Chair: Luigi Carro (Universidade Federal do Rio Grande do Sul, Brasil)

Program Chairs: Andreas Moshovos (University of Toronto, Canada), Milos Prvulovic (Georgia Institute of Technology, United States)

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

IEEE
ACM: Association for Computing Machinery
UFRGS: Universidade Federal do Rio Grande do Sul
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Division of Computing and Communication Foundations

Conference

MICRO-44

MICRO-44: The 44th Annual IEEE/ACM International Symposium on Microarchitecture

December 3 - 7, 2011

Porto Alegre, Brazil

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%
