DOI: 10.5555/3291656.3291675
research-article

TriCore: parallel triangle counting on GPUs

Published: 11 November 2018

Abstract

An exact triangle counting algorithm enumerates the triangles in a graph by identifying the common neighbors of the two endpoint vertices of each edge. In this work, we present TriCore, a scalable GPU-based triangle counting system that consists of three major techniques. First, we design a binary-search-based algorithm that increases both thread parallelism and memory performance on Graphics Processing Units (GPUs), both of which are absent from prior work. Second, in contrast to prior attempts that require multiple graph representations, i.e., compressed sparse row (CSR), edge list, and bitmap, to be present in GPU memory, TriCore evenly partitions the CSR data and distributes the partitions across all the GPUs, and uses a streaming buffer to load the edge list from CPU memory on the fly. This design enables TriCore to process graphs that are orders of magnitude larger than the GPU memory. Third, we further develop a dynamic workload management technique to balance the workload across GPUs. Our evaluation demonstrates that TriCore on a single GPU can count the triangles in the billion-edge Twitter graph within 24 seconds, that is, 22X faster than the state-of-the-art CPU project, which uses CPUs that are 8X more expensive. When processing big graphs (up to 33.4 billion edges) that are ~22X larger than the memory size of a single GPU, it achieves a 24X speedup when scaling from 1 to 32 GPUs.
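
To make the first technique concrete, here is a minimal sequential sketch of binary-search-based neighbor-list intersection over a CSR graph. It is not TriCore's GPU kernel: the function names (build_oriented_csr, contains, count_triangles) are hypothetical, and orienting each edge from the lower to the higher vertex ID so that every triangle is counted once is an assumption made for this illustration, not something the abstract specifies.

```python
from bisect import bisect_left


def build_oriented_csr(num_vertices, edges):
    """Build CSR arrays keeping each undirected edge once, as low ID -> high ID.

    Assumption for this sketch: orienting by vertex ID is just one simple way
    to make sure every triangle is counted exactly once.
    """
    adj = [set() for _ in range(num_vertices)]
    for u, v in edges:
        if u != v:  # ignore self-loops
            lo, hi = (u, v) if u < v else (v, u)
            adj[lo].add(hi)  # the set removes duplicate edges
    row_ptr, col_idx = [0], []
    for nbrs in adj:
        col_idx.extend(sorted(nbrs))  # sorted lists are required for binary search
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx


def contains(col_idx, lo, hi, key):
    """Binary-search the sorted slice col_idx[lo:hi] for key."""
    pos = bisect_left(col_idx, key, lo, hi)
    return pos < hi and col_idx[pos] == key


def count_triangles(row_ptr, col_idx):
    """Count triangles by intersecting the neighbor lists of each edge's endpoints."""
    total = 0
    for u in range(len(row_ptr) - 1):
        u_lo, u_hi = row_ptr[u], row_ptr[u + 1]
        for e in range(u_lo, u_hi):
            v = col_idx[e]
            v_lo, v_hi = row_ptr[v], row_ptr[v + 1]
            # Use the shorter list as search keys and the longer list as the
            # binary-search table. Each key lookup is independent, which is
            # the property a GPU version could map onto separate threads.
            if u_hi - u_lo <= v_hi - v_lo:
                k_lo, k_hi, t_lo, t_hi = u_lo, u_hi, v_lo, v_hi
            else:
                k_lo, k_hi, t_lo, t_hi = v_lo, v_hi, u_lo, u_hi
            for k in range(k_lo, k_hi):
                if contains(col_idx, t_lo, t_hi, col_idx[k]):
                    total += 1
    return total


if __name__ == "__main__":
    # Complete graph on 4 vertices (K4).
    k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    row_ptr, col_idx = build_oriented_csr(4, k4)
    print(count_triangles(row_ptr, col_idx))
```

In this sketch every search key is looked up independently of the others, so the inner loop has no ordering dependency between iterations. A merge-style intersection, by contrast, advances two pointers step by step along both lists.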

Cited By

  • (2019) GRAPHONE. Proceedings of the 17th USENIX Conference on File and Storage Technologies, pages 249-263. DOI: 10.5555/3323298.3323322. Online publication date: 25-Feb-2019

Published In

SC '18: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis
November 2018
932 pages

Sponsors

In-Cooperation

  • IEEE CS

Publisher

IEEE Press

Qualifiers

  • Research-article

Conference

SC18

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
