Express Link Placement for NoC-Based Many-Core Platforms

Authors:

Lizhong Chen

ICPP '19: Proceedings of the 48th International Conference on Parallel Processing

Article No.: 27, Pages 1 - 10

https://doi.org/10.1145/3337821.3337877

Published: 05 August 2019

Abstract

With the integration of up to hundreds of cores in recent general-purpose processors that can be used in parallel processing systems, it is critical to design scalable and low-latency networks-on-chip (NoCs) to support various on-chip communications. An effective way to reduce on-chip latency and improve network scalability is to add express links between pairs of non-adjacent routers. However, increasing the number of express links may result in smaller bandwidth per link due to the limited total bisection bandwidth on chip, thus leading to higher serialization latency of packets in the network. Unlike previous works on application-specific designs or on fixed placement of express links, this paper aims at finding effective placement of express links for general-purpose processors considering all the possible placement options. We formulate the problem mathematically and propose an efficient algorithm that utilizes an initial solution generation heuristic and enhanced candidate generator in simulated annealing. Evaluation on 4x4, 8x8 and 16x16 networks using multi-threaded PARSEC benchmarks and various synthetic traffic patterns shows significant reduction of average packet latency over previous works.
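The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify the initial solution heuristic or the enhanced candidate generator, so this uses plain simulated annealing with a random link toggle as the move. The single row of routers, the function names, and the parameters `hop_cycles`, `packet_bits`, and `cut_bits` are all assumptions chosen for illustration. Each extra express link crossing a cut narrows every link (fixed bandwidth per cut), which raises serialization latency while lowering hop count.

```python
import math
import random
from collections import deque
from itertools import combinations


def avg_hops(n, express):
    """Mean shortest-path hop count over all ordered router pairs in a row of n routers."""
    adj = {i: set() for i in range(n)}
    for i in range(n - 1):          # baseline links between adjacent routers
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    for a, b in express:            # express links between non-adjacent routers
        adj[a].add(b)
        adj[b].add(a)
    total = 0
    for s in range(n):              # breadth-first search from every source
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total / (n * (n - 1))


def max_cut_links(n, express):
    """Largest number of links crossing any cut between routers c and c+1."""
    return max(1 + sum(1 for a, b in express if a <= c < b) for c in range(n - 1))


def latency(n, express, hop_cycles=3, packet_bits=512, cut_bits=256):
    """Assumed cost model: hop latency plus serialization latency, in cycles."""
    width = cut_bits // max_cut_links(n, express)   # bits per link per cycle
    serialization = math.ceil(packet_bits / width)
    return avg_hops(n, express) * hop_cycles + serialization


def anneal(n, steps=2000, t0=2.0, cooling=0.995, seed=0):
    """Plain simulated annealing; each move toggles one candidate express link."""
    rng = random.Random(seed)
    candidates = [(a, b) for a, b in combinations(range(n), 2) if b - a >= 2]
    current = frozenset()
    cur_cost = latency(n, current)
    best, best_cost = current, cur_cost
    t = t0
    for _ in range(steps):
        neighbor = current ^ {rng.choice(candidates)}
        cost = latency(n, neighbor)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if cost <= cur_cost or rng.random() < math.exp((cur_cost - cost) / t):
            current, cur_cost = neighbor, cost
            if cost < best_cost:
                best, best_cost = current, cost
        t *= cooling
    return best, best_cost


if __name__ == "__main__":
    links, cost = anneal(8)
    print("baseline:", latency(8, frozenset()))
    print("annealed:", cost, sorted(links))
```

Under this assumed model, a row of 8 routers with no express links has an average of 3 hops and a 2-cycle serialization delay. Adding the single link (0, 7) shortens paths but halves the link width, which is the tension the placement search has to resolve.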

Published In

ICPP '19: Proceedings of the 48th International Conference on Parallel Processing

August 2019

1107 pages

ISBN:9781450362955

DOI:10.1145/3337821

Copyright © 2019 ACM.

In-Cooperation

University of Tsukuba

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICPP 2019: 48th International Conference on Parallel Processing

August 5 - 8, 2019

Kyoto, Japan

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%
