A Simple Yet Effective Balanced Edge Partition Model for Parallel Computing

Published: 05 June 2017

Abstract

Graph edge partition models have recently become an appealing alternative to graph vertex partition models for distributed computing due to both their flexibility in balancing loads and their performance in reducing communication cost.
In this paper, we propose a simple yet effective graph edge partitioning algorithm. In practice, our algorithm provides good partition quality while maintaining low partition overhead. It also outperforms similar state-of-the-art edge partition approaches, especially for power-law graphs. In theory, previous work showed that an approximation guarantee of $O(d_{\max}\sqrt{\log n \log k})$ applies to graphs with $m = \Omega(k^2)$ edges, where $n$ is the number of vertices and $k$ is the number of partitions. We further rigorously prove that this approximation guarantee holds for all graphs.
We also demonstrate the applicability of the proposed edge partition algorithm in real parallel computing systems. We draw our example from GPU program locality enhancement and demonstrate that the graph edge partition model applies not only to distributed computing with many computer nodes, but also to parallel computing in a single computer node with a many-core processor.
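
To make the edge partition model concrete, the following minimal sketch evaluates a given assignment of edges to k partitions. It is not the algorithm proposed in the paper, which the abstract does not describe. It assumes a commonly used formulation in which load is the number of edges per partition and communication cost is the number of extra vertex copies: a vertex whose edges fall into c partitions needs c - 1 copies. The function name `evaluate_edge_partition` and the toy graph are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def evaluate_edge_partition(edges, assignment, k):
    """Return (loads, replication_cost) for an edge partition.

    edges:      list of (u, v) vertex pairs
    assignment: list of partition ids in range(k), one per edge
    Assumed cost model: a vertex appearing in c partitions costs c - 1.
    """
    loads = [0] * k
    parts_of_vertex = defaultdict(set)
    for (u, v), p in zip(edges, assignment):
        loads[p] += 1                 # every edge lives in exactly one partition
        parts_of_vertex[u].add(p)     # both endpoints must be present there
        parts_of_vertex[v].add(p)
    replication_cost = sum(len(ps) - 1 for ps in parts_of_vertex.values())
    return loads, replication_cost


# Toy graph: a 4-cycle (0-1-2-3-0) plus the chord 0-2.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]

# Two assignments with identical loads but different vertex replication.
clustered = [0, 0, 1, 1, 0]
scattered = [0, 1, 0, 1, 0]

loads_a, cost_a = evaluate_edge_partition(edges, clustered, k=2)
loads_b, cost_b = evaluate_edge_partition(edges, scattered, k=2)
print(loads_a, cost_a)
print(loads_b, cost_b)
```

Both assignments are equally balanced, yet the first keeps adjacent edges together and replicates fewer vertices. Under the assumed cost model, a balanced edge partitioner tries to minimize this replication while keeping the loads nearly equal.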


Published In

ACM SIGMETRICS Performance Evaluation Review, Volume 45, Issue 1
June 2017
70 pages
ISSN: 0163-5999
DOI: 10.1145/3143314

SIGMETRICS '17 Abstracts: Proceedings of the 2017 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems
June 2017
84 pages
ISBN: 9781450350327
DOI: 10.1145/3078505

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 June 2017
Published in SIGMETRICS Volume 45, Issue 1

Author Tags

  1. GPU
  2. data sharing
  3. edge partition
  4. graph model
  5. program locality

Qualifiers

  • Abstract
