research-article

Hybrid network-on-chip architectures for accelerating deep learning kernels on heterogeneous manycore platforms

Authors: Karthi Duraisamy, Janardhan Rao Doppa, Partha Pratim Pande, Radu Marculescu, Diana Marculescu

CASES '16: Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems

Article No.: 13, Pages 1 - 10

https://doi.org/10.1145/2968455.2968510

Published: 01 October 2016

Abstract

In recent years, designing specialized heterogeneous manycore architectures for deep learning kernels has become an area of great interest. However, the on-chip communication infrastructures typically employed on conventional manycore platforms cannot handle both CPU and GPU communication requirements efficiently. In this paper, we aim to enhance the performance of heterogeneous manycore architectures by designing a hybrid network-on-chip (NoC) that combines wireline and wireless links. To this end, we specifically target the resource-intensive backpropagation algorithm commonly used to train deep learning models. For backpropagation, the proposed hybrid NoC achieves a 1.9X reduction in network latency and improves network throughput by a factor of 2 relative to a highly optimized mesh NoC. These network-level improvements translate into 25% savings in full-system energy-delay product (EDP). This demonstrates that the proposed hybrid, heterogeneous manycore architecture can accelerate deep learning kernels in an energy-efficient manner.
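For readers unfamiliar with the EDP metric quoted in the abstract, the following is a minimal sketch of how an energy-delay product and a relative saving are commonly computed. The function names and all numeric inputs are hypothetical and chosen only for illustration; they are not measurements from the paper, and the authors' exact evaluation methodology is not described here.

```python
def edp(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product: energy consumed multiplied by execution time."""
    return energy_joules * delay_seconds


def edp_savings(baseline_edp: float, improved_edp: float) -> float:
    """Fractional EDP reduction relative to the baseline (0.25 means 25%)."""
    return 1.0 - improved_edp / baseline_edp


# Hypothetical inputs, NOT taken from the paper: same energy, shorter runtime.
mesh_edp = edp(energy_joules=10.0, delay_seconds=2.0)
hybrid_edp = edp(energy_joules=10.0, delay_seconds=1.5)
savings = edp_savings(mesh_edp, hybrid_edp)
print(f"EDP savings: {savings:.0%}")
```

Because EDP multiplies energy by delay, a design can lower it by reducing either factor, which is why a network-level latency gain can show up as a full-system EDP saving even when energy is unchanged.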

Published In

CASES '16: Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems

October 2016

187 pages

ISBN:9781450344821

DOI:10.1145/2968455

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

NSF Office of the Director

Conference

ESWEEK'16: Twelfth Embedded System Week

October 1 - 7, 2016

Pittsburgh, Pennsylvania

Acceptance Rates

Overall Acceptance Rate 52 of 230 submissions, 23%
