Energy Efficient Chip-to-Chip Wireless Interconnection for Heterogeneous Architectures

Published: 26 July 2019

Abstract

    Heterogeneous multichip architectures have gained significant interest in high-performance computing clusters because they cater to a wide range of applications. In particular, heterogeneous systems with multiple multicore CPUs, GPUs, and memory have become common to meet application requirements. Shared resources such as the interconnection network pose significant challenges in these systems due to the diverse traffic requirements of CPUs and GPUs. Most notably, the performance and energy consumption of inter-chip communication remain a major bottleneck due to limitations imposed by off-chip wired links. To overcome these challenges, we propose a wireless interconnection network that provides energy-efficient, high-performance communication in heterogeneous multichip systems. Interference-free communication between GPUs and memory modules is achieved through directional wireless links, while omnidirectional wireless interfaces connect the CPU cores with other components in the system. Besides providing low-energy, high-bandwidth inter-chip communication, the wireless interconnection scales efficiently with system size to provide high performance across multiple chips. The proposed inter-chip wireless interconnection is evaluated on two system sizes with multiple CPU and GPU chips, along with main memory modules. On a system with 4 CPU and 4 GPU chips, application runtime is sped up by 3.94×, packet energy is reduced by 94.4%, and packet latency is reduced by 58.34% compared to a baseline system with wired inter-chip interconnection.
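The three headline figures in the abstract use different conventions (a speedup factor and two percentage reductions). The short sketch below puts them on a common footing by converting each into a fraction of the baseline and an improvement factor. It is plain arithmetic on the numbers quoted above, not part of the authors' evaluation; the function and variable names are illustrative assumptions.

```python
# Arithmetic on the figures quoted in the abstract (4 CPU + 4 GPU system).
# Names below are hypothetical helpers, not taken from the paper.

def remaining_fraction(reduction_percent: float) -> float:
    """Fraction of the baseline value left after a percentage reduction."""
    return 1.0 - reduction_percent / 100.0


def improvement_factor(reduction_percent: float) -> float:
    """How many times smaller the new value is than the baseline."""
    return 1.0 / remaining_fraction(reduction_percent)


SPEEDUP = 3.94                 # application runtime speedup (x)
ENERGY_REDUCTION_PCT = 94.4    # packet energy reduction (%)
LATENCY_REDUCTION_PCT = 58.34  # packet latency reduction (%)

# A speedup of S means the new runtime is 1/S of the baseline runtime.
runtime_fraction = 1.0 / SPEEDUP
energy_factor = improvement_factor(ENERGY_REDUCTION_PCT)
latency_factor = improvement_factor(LATENCY_REDUCTION_PCT)

print(f"Runtime: {runtime_fraction:.1%} of the wired baseline")
print(f"Packet energy: about {energy_factor:.1f}x lower")
print(f"Packet latency: about {latency_factor:.2f}x lower")
```

Read this way, the wireless system runs applications in roughly a quarter of the baseline time, spends roughly one eighteenth of the energy per packet, and delivers packets in well under half the baseline latency.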

    Cited By

    • (2024) Electromagnetic Nanonetworks Beyond 6G: From Wearable and Implantable Networks to On-Chip and Quantum Communication. IEEE Journal on Selected Areas in Communications 42:8 (2122-2142). DOI: 10.1109/JSAC.2024.3399253. Online publication date: Aug-2024
    • (2023) Design Of Low Area Interconnect Architecture for CPU-GPU Network-On-Chips (NoCs). 2023 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT) (1-5). DOI: 10.1109/CONECCT57959.2023.10234778. Online publication date: 14-Jul-2023
    • (2023) A low-power WNoC transceiver with a novel energy consumption management scheme for dependable IoT systems. Journal of Parallel and Distributed Computing 172:C (144-158). DOI: 10.1016/j.jpdc.2022.10.010. Online publication date: 1-Feb-2023

    Published In

    ACM Transactions on Design Automation of Electronic Systems, Volume 24, Issue 5
    September 2019
    282 pages
    ISSN: 1084-4309
    EISSN: 1557-7309
    DOI: 10.1145/3339837
    Editor: Naehyuck Chang
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 26 July 2019
    Accepted: 01 June 2019
    Revised: 01 April 2019
    Received: 01 December 2018
    Published in TODAES Volume 24, Issue 5

    Author Tags

    1. Inter-chip wireless
    2. heterogeneous architectures
    3. multi-chip system

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 508
    • Downloads (Last 6 weeks): 58
    Reflects downloads up to 26 Jul 2024

    Citations

    Cited By

    • (2022) Multi-Chiplet Placement Design for 3D Integration. 2022 17th International Microsystems, Packaging, Assembly and Circuits Technology Conference (IMPACT) (1-4). DOI: 10.1109/IMPACT56280.2022.9966719. Online publication date: 26-Oct-2022
    • (2022) A systematic analysis of power saving techniques for wireless network-on-chip architectures. Journal of Systems Architecture 126 (102485). DOI: 10.1016/j.sysarc.2022.102485. Online publication date: May-2022
    • (2021) Die-stacking Placement for Heterogeneous integration Architecture. 2021 16th International Microsystems, Packaging, Assembly and Circuits Technology Conference (IMPACT) (211-214). DOI: 10.1109/IMPACT53160.2021.9696848. Online publication date: 21-Dec-2021
