research-article

Argo: A Real-Time Network-on-Chip Architecture With an Efficient GALS Implementation

Published: 01 February 2016
  • Abstract

    In this paper, we present an area-efficient, globally asynchronous, locally synchronous network-on-chip (NoC) architecture for a hard real-time multiprocessor platform. The NoC implements message-passing communication between processor cores. It uses statically scheduled time-division multiplexing (TDM) to control the communication over a structure of routers, links, and network interfaces (NIs) to offer real-time guarantees. The area-efficient design is a result of two contributions: 1) asynchronous routers combined with TDM scheduling and 2) a novel NI microarchitecture. Together they result in a design in which data are transferred in a pipelined fashion, from the local memory of the sending core to the local memory of the receiving core, without any dynamic arbitration, buffering, and clock synchronization. The routers use two-phase bundled-data handshake latches based on the Mousetrap latch controller and are extended with a clock gating mechanism to reduce the energy consumption. The NIs integrate the direct memory access functionality and the TDM schedule, and use dual-ported local memories to avoid buffering, flow-control, and synchronization. To verify the design, we have implemented a 4 × 4 bitorus NoC in 65-nm CMOS technology and we present results on area, speed, and energy consumption for the router, NI, NoC, and postlayout.
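
    The idea that a static TDM schedule removes the need for dynamic arbitration can be illustrated with a small check. The sketch below is not taken from the paper and is not Argo's scheduler; it only assumes, for illustration, that a packet advances one link per time slot and that the schedule repeats with a fixed period. The function name, the connection names, and the link names are all hypothetical.

```python
from collections import defaultdict

def link_conflicts(schedule, period):
    """Return the (slot, link) pairs claimed by more than one connection.

    schedule maps a connection name to (start_slot, [link, link, ...]).
    Assumption for this sketch: a packet injected in start_slot occupies the
    k-th link of its route in slot (start_slot + k) mod period.
    """
    usage = defaultdict(list)
    for conn, (start, route) in schedule.items():
        for hop, link in enumerate(route):
            usage[((start + hop) % period, link)].append(conn)
    # A schedule is contention-free when no (slot, link) pair is shared.
    return {key: conns for key, conns in usage.items() if len(conns) > 1}

# Hypothetical 3-slot schedule on a tiny three-node network.
schedule = {
    "A->C": (0, ["A-B", "B-C"]),
    "B->C": (0, ["B-C"]),
    "A->B": (1, ["A-B"]),
}
conflicts = link_conflicts(schedule, period=3)
```

    When the returned dictionary is empty, every link carries at most one connection in any slot, so under these assumptions a router would never have to choose between competing packets at run time.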


          Published In

          IEEE Transactions on Very Large Scale Integration (VLSI) Systems  Volume 24, Issue 2
          Feb. 2016
          400 pages

          Publisher

          IEEE Educational Activities Department

          United States


          Cited By

          • (2023) High-resolution Extreme-throughput Event-based Cameras using GALS Data-scanning Architecture. Proceedings of the 2023 International Conference on Neuromorphic Systems, pp. 1-6. DOI: 10.1145/3589737.3605981. Online publication date: 1-Aug-2023
          • (2023) InterPRET: a Time-predictable Multicore Processor. Proceedings of Cyber-Physical Systems and Internet of Things Week 2023, pp. 331-336. DOI: 10.1145/3576914.3587497. Online publication date: 9-May-2023
          • (2022) HopliteML: Evolving Application Customized FPGA NoCs with Adaptable Routers and Regulators. ACM Transactions on Reconfigurable Technology and Systems, 15:4, pp. 1-33. DOI: 10.1145/3507699. Online publication date: 8-Aug-2022
          • (2020) A Theoretical Foundation for Timing Synchronous Systems Using Asynchronous Structures. ACM Transactions on Design Automation of Electronic Systems, 25:2, pp. 1-28. DOI: 10.1145/3373355. Online publication date: 3-Feb-2020
          • (2019) S4NOC. Proceedings of the 12th International Workshop on Network on Chip Architectures, pp. 1-6. DOI: 10.1145/3356045.3360714. Online publication date: 13-Oct-2019
          • (2019) Hardlock. Journal of Systems Architecture: the EUROMICRO Journal, 97:C, pp. 467-476. DOI: 10.1016/j.sysarc.2019.02.003. Online publication date: 1-Aug-2019
          • (2019) Time-predictable distributed shared on-chip memory. Microprocessors & Microsystems, 71:C. DOI: 10.1016/j.micpro.2019.102896. Online publication date: 1-Nov-2019
          • (2019) Multicore Models of Communication for Cyber-Physical Systems. Cyber Physical Systems. Model-Based Design, pp. 28-43. DOI: 10.1007/978-3-030-41131-2_2. Online publication date: 17-Oct-2019
          • (2018) Reconfigurable Instruction-Based Multicore Parallel Convolution and Its Application in Real-Time Template Matching. IEEE Transactions on Computers, 67:12, pp. 1780-1793. DOI: 10.1109/TC.2018.2844351. Online publication date: 1-Dec-2018
          • (2017) Noc-HMP. ACM Transactions on Design Automation of Electronic Systems, 22:4, pp. 1-25. DOI: 10.1145/3073416. Online publication date: 13-Jun-2017
