DOI: 10.1145/2968455.2968510
research-article
Public Access

Hybrid network-on-chip architectures for accelerating deep learning kernels on heterogeneous manycore platforms

Published: 01 October 2016 Publication History

Abstract

In recent years, designing specialized manycore heterogeneous architectures for deep learning kernels has become an area of great interest. However, the typical on-chip communication infrastructures employed on conventional manycore platforms are unable to handle both CPU and GPU communication requirements efficiently. Hence, in this paper, our aim is to enhance the performance of heterogeneous manycore architectures through the design of a hybrid NoC consisting of both wireline and wireless links. To this end, we specifically target the resource-intensive backpropagation algorithm commonly used as the training method in deep learning. For backpropagation, the proposed hybrid NoC achieves 1.9X reduction in network latency and improves the network throughput by a factor of 2 with respect to a highly optimized mesh NoC. These network level improvements translate into 25% savings in full system energy-delay-product (EDP). This demonstrates the capability of the proposed hybrid and heterogeneous manycore architecture in accelerating deep learning kernels in an energy-efficient manner.
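The energy-delay-product (EDP) figure quoted in the abstract can be read as a simple ratio. The snippet below is a minimal sketch of that arithmetic, assuming EDP is the product of full-system energy and execution time. The function names and all numeric inputs are hypothetical placeholders chosen for illustration; they are not measurements from the paper.

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """EDP as the product of energy consumed and execution time."""
    return energy_joules * delay_seconds


def edp_savings(baseline_edp: float, improved_edp: float) -> float:
    """Fractional EDP saving of an improved design relative to a baseline."""
    return 1.0 - improved_edp / baseline_edp


# Hypothetical inputs (not from the paper): same energy, shorter run time.
baseline = energy_delay_product(energy_joules=10.0, delay_seconds=2.0)
hybrid = energy_delay_product(energy_joules=10.0, delay_seconds=1.5)

savings = edp_savings(baseline, hybrid)
print(f"EDP savings: {savings:.0%}")
```

With these placeholder values the improved design's EDP is 0.75 of the baseline, which is what a 25% saving means. In a real evaluation both energy and delay would change, and the saving would come from their combined effect.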

Published In

CASES '16: Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems
October 2016
187 pages
ISBN: 9781450344821
DOI: 10.1145/2968455

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. NoC
  2. backpropagation
  3. deep learning
  4. heterogeneous
  5. manycore

Qualifiers

  • Research-article

Conference

ESWEEK'16: Twelfth Embedded System Week
October 1-7, 2016
Pittsburgh, Pennsylvania

Acceptance Rates

Overall Acceptance Rate 52 of 230 submissions, 23%

Article Metrics

  • Downloads (Last 12 months): 156
  • Downloads (Last 6 weeks): 21
Reflects downloads up to 03 Feb 2025

Cited By

  • (2024) Survey of Network-on-Chip (NoC) for Heterogeneous Multicore Systems. 2024 IEEE 17th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), pp. 155-162. DOI: 10.1109/MCSoC64144.2024.00036. Online publication date: 16-Dec-2024
  • (2023) Thermal Management for S-NUCA Many-Cores via Synchronous Thread Rotations. 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1-6. DOI: 10.23919/DATE56975.2023.10136895. Online publication date: Apr-2023
  • (2023) On the Analyses of Medical Images Using Traditional Machine Learning Techniques and Convolutional Neural Networks. Archives of Computational Methods in Engineering, 30(5), pp. 3173-3233. DOI: 10.1007/s11831-023-09899-9. Online publication date: 4-Apr-2023
  • (2023) A deep learning based latency aware predictive routing model for network‐on‐chip architectures. International Journal of Communication Systems, 36(17). DOI: 10.1002/dac.5602. Online publication date: 7-Sep-2023
  • (2022) Non-intrusive Monitoring Framework for NoC-based Many-Cores. 2022 XII Brazilian Symposium on Computing Systems Engineering (SBESC), pp. 1-7. DOI: 10.1109/SBESC56799.2022.9965177. Online publication date: 21-Nov-2022
  • (2022) Literature review: efficient deep neural networks techniques for medical image analysis. Neural Computing and Applications, 34(8), pp. 5791-5812. DOI: 10.1007/s00521-022-06960-9. Online publication date: 1-Feb-2022
  • (2021) Per-Core Power Modeling for Heterogenous SoCs. Electronics, 10(19), article 2428. DOI: 10.3390/electronics10192428. Online publication date: 7-Oct-2021
  • (2021) Rapid Topology Generation and Core Mapping of Optical Network-on-Chip for Heterogeneous Computing Platform. IEEE Access, 9, pp. 110359-110370. DOI: 10.1109/ACCESS.2021.3102270. Online publication date: 2021
  • (2020) Discriminative Power of Typing Features on Desktops, Tablets, and Phones for User Identification. ACM Transactions on Privacy and Security, 23(1), pp. 1-36. DOI: 10.1145/3377404. Online publication date: 5-Feb-2020
  • (2020) An Autonomous Error-Tolerant Architecture Featuring Self-reparation for Convolutional Neural Networks. 2020 IEEE 91st Vehicular Technology Conference (VTC2020-Spring), pp. 1-6. DOI: 10.1109/VTC2020-Spring48590.2020.9128611. Online publication date: May-2020
