GPUOPT: Power-efficient Photonic Network-on-Chip for a Scalable GPU

Published: 22 September 2020

    Abstract

    On-chip photonics is a disruptive technology: photonic networks-on-chip (NoCs) outperform traditional electrical NoCs in latency, power, and bandwidth. Researchers have therefore proposed a wide variety of optical networks for multicore processors, where the high bandwidth and low latency of photonic NoCs have improved overall system performance. However, very few proposals discuss the use of optical interconnects in Graphics Processing Units (GPUs). GPUs can also gain substantially from such technologies, because they need to provide significant computational throughput without further stressing their power budgets.
    The main shortcoming of optical networks is their high static power consumption: by default the lasers stay on all the time, even when there is no traffic inside the chip. Sophisticated laser modulation schemes are therefore required, and such schemes base their decisions on an accurate prediction of future network traffic. In this article, we propose GPUOPT, an energy-efficient and scalable optical interconnect for modern GPUs that creates an overlay network by dividing the streaming multiprocessors (SMs) into clusters. It also has separate sub-networks for coherence and non-coherence traffic. To further increase throughput, we connect the off-chip memory with optical links as well.
    Subsequently, we show that traditional laser modulation schemes for reducing static power consumption, which were designed for multicore processors, are not that effective for GPUs. Hence, a bespoke scheme for predicting laser power usage in GPUs was needed.
    Using this set of techniques, we improve the performance of a modern GPU by 45% compared to a state-of-the-art electrical NoC. Moreover, compared to competing optical NoCs for GPUs, our scheme reduces laser power consumption by 67%, resulting in a net 65% reduction in ED² (the energy-delay-squared product) for a suite of Rodinia benchmarks.
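The ED2 figure quoted above refers to the energy-delay-squared product, E × D², where lower is better. As a minimal sketch of how a relative reduction in that metric is computed, the snippet below uses made-up energy and delay values; the function names and all numbers are illustrative assumptions and are not taken from the article's evaluation.

```python
def ed2(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay-squared product: E * D^2 (lower is better)."""
    return energy_joules * delay_seconds ** 2


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


# Hypothetical values, chosen only to illustrate the arithmetic.
baseline = ed2(energy_joules=10.0, delay_seconds=2.0)  # 10 * 2^2   = 40
improved = ed2(energy_joules=8.0, delay_seconds=1.5)   # 8  * 1.5^2 = 18

print(f"ED2 reduction: {relative_reduction(baseline, improved):.0%}")
```

Because delay enters quadratically, the metric weights performance more heavily than energy: in this toy case a 20% energy saving combined with a 25% shorter delay gives a 55% lower ED2.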

    Cited By

    • (2023) Fast Performance Analysis for NoCs With Weighted Round-Robin Arbitration and Finite Buffers. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 31:5 (670-683). DOI: 10.1109/TVLSI.2023.3250662. Online publication date: 16-Mar-2023
    • (2023) RONet: Scaling GPU System with Silicon Photonic Chiplet. 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD) (1-9). DOI: 10.1109/ICCAD57390.2023.10323762. Online publication date: 28-Oct-2023
    • (2021) Energy-Efficient Networks-on-Chip Architectures: Design and Run-Time Optimization. Network-on-Chip Security and Privacy (55-75). DOI: 10.1007/978-3-030-69131-8_3. Online publication date: 22-Jan-2021

    Published In

    ACM Journal on Emerging Technologies in Computing Systems  Volume 17, Issue 1
    January 2021
    232 pages
    ISSN:1550-4832
    EISSN:1550-4840
    DOI:10.1145/3425108
    Editor: Ramesh Karri
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 September 2020
    Accepted: 01 August 2020
    Revised: 01 July 2020
    Received: 01 January 2020
    Published in JETC Volume 17, Issue 1

    Author Tags

    1. GPUs
    2. On-chip networks
    3. photonics
    4. static power consumption

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Article Metrics

    • Downloads (Last 12 months): 59
    • Downloads (Last 6 weeks): 2
    Reflects downloads up to 12 Aug 2024
