research-article

Argo: A Real-Time Network-on-Chip Architecture With an Efficient GALS Implementation

Authors:

Evangelia Kasapaki,

Martin Schoeberl,

Rasmus Bo Sorensen,

Christoph Muller,

Jens Sparso

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Volume 24, Issue 2

Pages 479 - 492

https://doi.org/10.1109/TVLSI.2015.2405614

Published: 01 February 2016

Abstract

In this paper, we present an area-efficient, globally asynchronous, locally synchronous network-on-chip (NoC) architecture for a hard real-time multiprocessor platform. The NoC implements message-passing communication between processor cores. It uses statically scheduled time-division multiplexing (TDM) to control the communication over a structure of routers, links, and network interfaces (NIs) to offer real-time guarantees. The area-efficient design is a result of two contributions: 1) asynchronous routers combined with TDM scheduling and 2) a novel NI microarchitecture. Together they result in a design in which data are transferred in a pipelined fashion, from the local memory of the sending core to the local memory of the receiving core, without any dynamic arbitration, buffering, and clock synchronization. The routers use two-phase bundled-data handshake latches based on the Mousetrap latch controller and are extended with a clock gating mechanism to reduce the energy consumption. The NIs integrate the direct memory access functionality and the TDM schedule, and use dual-ported local memories to avoid buffering, flow-control, and synchronization. To verify the design, we have implemented a $4 \times 4$ bitorus NoC in 65-nm CMOS technology and we present results on area, speed, and energy consumption for the router, NI, NoC, and postlayout.
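
The idea of a statically scheduled TDM network can be illustrated with a minimal sketch. It is not the authors' scheduler or hardware: the function name `check_schedule`, the node names, and the one-hop-per-slot timing model are all assumptions made for illustration. The sketch checks that no link is claimed by two channels in the same time slot, which is the property that lets a TDM NoC operate without dynamic arbitration.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]   # directed link (from_node, to_node); node names are hypothetical
Route = List[Link]       # links traversed in order, source to destination


def check_schedule(entries: List[Tuple[int, Route]], period: int) -> Dict[Tuple[Link, int], int]:
    """Return a (link, slot) -> entry index table, or raise ValueError on a conflict.

    Assumption: a packet injected in slot s occupies the h-th link of its
    route in slot (s + h) % period, i.e. one hop per slot.
    """
    usage: Dict[Tuple[Link, int], int] = {}
    for index, (inject_slot, route) in enumerate(entries):
        for hop, link in enumerate(route):
            key = (link, (inject_slot + hop) % period)
            if key in usage:
                raise ValueError(f"link {link} is used twice in slot {key[1]}")
            usage[key] = index
    return usage


# Two channels share link B->C but in different slots, so no arbitration is needed.
entries = [
    (0, [("A", "B"), ("B", "C")]),  # A->C: uses A->B in slot 0, B->C in slot 1
    (0, [("B", "C")]),              # B->C: uses B->C in slot 0
]
usage = check_schedule(entries, period=3)
```

Adding a third entry such as `(1, [("B", "C")])` would claim link B->C in slot 1, which the first channel already occupies, so `check_schedule` raises `ValueError`. In this simplified model a schedule is accepted only if every (link, slot) pair has at most one owner.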

Index Terms

Argo: A Real-Time Network-on-Chip Architecture With an Efficient GALS Implementation
1. Computer systems organization
  1. Embedded and cyber-physical systems
    1. Embedded systems
  2. Real-time systems
2. Hardware

Index terms have been assigned to the content through auto-classification.

Published In

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Volume 24, Issue 2

Feb. 2016

400 pages

ISSN:1063-8210

Copyright © 2015.

Publisher

IEEE Educational Activities Department

United States

Qualifiers

Research-article

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 11
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 27 Jul 2024

Citations

Cited By

Ahuja H, Kubendran R, Payvand M, Parsa M, Schuman C (2023). High-resolution Extreme-throughput Event-based Cameras using GALS Data-scanning Architecture. Proceedings of the 2023 International Conference on Neuromorphic Systems, pp. 1-6. Online publication date: 1-Aug-2023.
https://dl.acm.org/doi/10.1145/3589737.3605981

Jellum E, Lin S, Donovan P, Jerad C, Wang E, Lohstroh M, Lee E, Schoeberl M (2023). InterPRET: a Time-predictable Multicore Processor. Proceedings of Cyber-Physical Systems and Internet of Things Week 2023, pp. 331-336. Online publication date: 9-May-2023.
https://dl.acm.org/doi/10.1145/3576914.3587497

Malik G, Lang I, Pellizzoni R, Kapre N (2022). HopliteML: Evolving Application Customized FPGA NoCs with Adaptable Routers and Regulators. ACM Transactions on Reconfigurable Technology and Systems, 15:4, pp. 1-33. Online publication date: 8-Aug-2022.
https://dl.acm.org/doi/10.1145/3507699

Tadros R, Beerel P (2020). A Theoretical Foundation for Timing Synchronous Systems Using Asynchronous Structures. ACM Transactions on Design Automation of Electronic Systems, 25:2, pp. 1-28. Online publication date: 3-Feb-2020.
https://dl.acm.org/doi/10.1145/3373355

Schoeberl M, Pezzarossa L, Sparsø J, Chen K, Abadal S, Monteleone S (2019). S4NOC. Proceedings of the 12th International Workshop on Network on Chip Architectures, pp. 1-6. Online publication date: 13-Oct-2019.
https://dl.acm.org/doi/10.1145/3356045.3360714

Strøm T, Sparsø J, Schoeberl M (2019). Hardlock. Journal of Systems Architecture: the EUROMICRO Journal, 97:C, pp. 467-476. Online publication date: 1-Aug-2019.
https://dl.acm.org/doi/10.1016/j.sysarc.2019.02.003

Petersen M, Riber A, Andersen S, Schoeberl M (2019). Time-predictable distributed shared on-chip memory. Microprocessors & Microsystems, 71:C. Online publication date: 1-Nov-2019.
https://dl.acm.org/doi/10.1016/j.micpro.2019.102896

Schoeberl M (2019). Multicore Models of Communication for Cyber-Physical Systems. Cyber Physical Systems. Model-Based Design, pp. 28-43. Online publication date: 17-Oct-2019.
https://dl.acm.org/doi/10.1007/978-3-030-41131-2_2

Zhou Q, Yang L, Yan X (2018). Reconfigurable Instruction-Based Multicore Parallel Convolution and Its Application in Real-Time Template Matching. IEEE Transactions on Computers, 67:12, pp. 1780-1793. Online publication date: 1-Dec-2018.
https://dl.acm.org/doi/10.1109/TC.2018.2844351

Salcic Z, Park H, Teich J, Malik A, Nadeem M (2017). Noc-HMP. ACM Transactions on Design Automation of Electronic Systems, 22:4, pp. 1-25. Online publication date: 13-Jun-2017.
https://dl.acm.org/doi/10.1145/3073416
