
High-Performance and Energy-Efficient Network-on-Chip Architectures for Graph Analytics

Published: 01 September 2016

Abstract

With its applicability spanning numerous data-driven fields, the implementation of graph analytics on multicore platforms is gaining momentum. One of the most important components of a multicore chip is its communication backbone. Because graph-based applications exhibit inherently irregular data movement, it is essential to design efficient on-chip interconnection architectures for multicore chips performing graph analytics. In this article, we present a detailed analysis of the traffic patterns generated by graph-based applications when mapped to multicore chips. Based on this analysis, we explore the design space for the Network-on-Chip (NoC) architecture to enable an efficient implementation of graph analytics. We principally consider three types of NoC architectures, viz., traditional mesh, small-world, and high-radix networks. We demonstrate that the small-world-network-enabled wireless NoC (WiNoC) is the most suitable platform for executing the considered graph applications. The WiNoC achieves average full-system energy-delay product (EDP) savings of 38% and 18% compared to wireline-mesh and high-radix NoCs, respectively.
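
The savings figures above are stated in terms of the energy-delay product, which is conventionally the energy consumed multiplied by the execution time. The sketch below shows how a relative saving of this kind could be computed. It is only an illustration: the function names and all numeric inputs are hypothetical and are not taken from the article, whose own measurement methodology is not described in the abstract.

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """Return the energy-delay product (EDP): energy multiplied by execution time."""
    return energy_joules * delay_seconds


def edp_savings(baseline_edp: float, candidate_edp: float) -> float:
    """Return the fractional EDP reduction of a candidate relative to a baseline."""
    if baseline_edp <= 0:
        raise ValueError("baseline EDP must be positive")
    return 1.0 - candidate_edp / baseline_edp


# Hypothetical inputs for illustration only (NOT results from the article).
baseline = energy_delay_product(energy_joules=4.0, delay_seconds=4.0)   # 16.0 J*s
candidate = energy_delay_product(energy_joules=3.0, delay_seconds=3.0)  # 9.0 J*s
savings = edp_savings(baseline, candidate)

print(f"EDP savings: {savings:.2%}")
```

Because EDP multiplies the two quantities, a design that lowers both energy and delay is rewarded more than one that improves only a single metric, which is why it is a common figure of merit when comparing interconnect architectures.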

Published In

ACM Transactions on Embedded Computing Systems, Volume 15, Issue 4
Special Issue on ESWEEK2015 and Regular Papers
August 2016
411 pages
ISSN:1539-9087
EISSN:1558-3465
DOI:10.1145/2982215
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 September 2016
Accepted: 01 May 2016
Revised: 01 May 2016
Received: 01 December 2015
Published in TECS Volume 15, Issue 4

Author Tags

  1. Graph analytics
  2. community detection
  3. graph coloring
  4. wireless NoCs

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Army Research Office
  • US DOE
  • US National Science Foundation (NSF)
