research-article

A yoke of oxen and a thousand chickens for heavy lifting graph processing

Authors:

Abdullah Gharaibeh,

Lauro Beltrão Costa,

Elizeu Santos-Neto,

Matei Ripeanu

PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques

Pages 345 - 354

https://doi.org/10.1145/2370816.2370866

Published: 19 September 2012

Abstract

Large, real-world graphs are famously difficult to process efficiently. Not only do they have a large memory footprint, but most graph processing algorithms also entail memory access patterns with poor locality, data-dependent parallelism, and a low compute-to-memory access ratio. Additionally, most real-world graphs have a low diameter and a highly heterogeneous node degree distribution. Partitioning these graphs while simultaneously achieving access locality and load balancing is difficult, if not impossible.

This paper demonstrates the feasibility of graph processing on heterogeneous (i.e., including both CPUs and GPUs) platforms as a cost-effective approach towards addressing the graph processing challenges above. To this end, this work (i) presents and evaluates a performance model that estimates the achievable performance on heterogeneous platforms; (ii) introduces TOTEM, a processing engine based on the Bulk Synchronous Parallel (BSP) model that offers a convenient environment to simplify the implementation of graph algorithms on heterogeneous platforms; and (iii) demonstrates TOTEM's efficiency by implementing and evaluating two graph algorithms (PageRank and breadth-first search). TOTEM achieves speedups close to the model's prediction, and applies a number of optimizations that enable linear speedups with respect to the share of the graph offloaded for processing to accelerators.
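To make the BSP structure mentioned in the abstract concrete, the following is a minimal sketch of PageRank written as BSP supersteps over a partitioned graph. It is an illustration only, not TOTEM's API or implementation: the function name `bsp_pagerank`, the `owner` mapping, the damping factor, and the superstep count are all assumptions, and both partitions run sequentially on the CPU here, whereas a heterogeneous engine would place one of them on a GPU.

```python
from collections import defaultdict


def bsp_pagerank(edges, num_vertices, owner, num_partitions=2,
                 damping=0.85, supersteps=20):
    """PageRank as BSP supersteps (hypothetical sketch, not TOTEM's API).

    edges: list of (src, dst) pairs.
    owner: list mapping each vertex id to the partition that owns it.
    Assumes every vertex has at least one outgoing edge (no dangling vertices).
    """
    out_degree = [0] * num_vertices
    for src, _ in edges:
        out_degree[src] += 1

    # Each partition stores only the edges whose source vertex it owns.
    local_edges = [[] for _ in range(num_partitions)]
    for src, dst in edges:
        local_edges[owner[src]].append((src, dst))

    rank = [1.0 / num_vertices] * num_vertices
    for _ in range(supersteps):
        # Computation phase: partitions work independently on their own edges
        # and buffer the contributions they produce in a private outbox.
        outboxes = [defaultdict(float) for _ in range(num_partitions)]
        for p in range(num_partitions):
            for src, dst in local_edges[p]:
                outboxes[p][dst] += rank[src] / out_degree[src]

        # Communication phase: outboxes are delivered to destination vertices.
        incoming = [0.0] * num_vertices
        for box in outboxes:
            for dst, contribution in box.items():
                incoming[dst] += contribution

        # Synchronization barrier: new ranks become visible only now.
        rank = [(1.0 - damping) / num_vertices + damping * incoming[v]
                for v in range(num_vertices)]
    return rank


# Tiny example graph: vertices 0 and 1 live in partition 0, vertex 2 in partition 1.
edges = [(0, 1), (1, 2), (2, 0), (2, 1)]
ranks = bsp_pagerank(edges, num_vertices=3, owner=[0, 0, 1])
print(ranks)
```

Because each partition only reads the ranks from the previous superstep and writes to its own outbox, the result does not depend on how vertices are assigned to partitions. That independence is what lets the share of the graph given to each processor be chosen for load balance.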

Index Terms

A yoke of oxen and a thousand chickens for heavy lifting graph processing
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Runtime environments
    2. General programming languages
      1. Language types
        Parallel programming languages

Published In

PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques

September 2012

512 pages

ISBN:9781450311823

DOI:10.1145/2370816

General Chairs: Pen-Chung Yew (University of Minnesota), Sangyeun Cho (University of Pittsburgh)

Program Chairs: Luiz DeRose (Cray, Inc.), David J. Lilja (University of Minnesota)

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

IFIP WG 10.3: IFIP WG 10.3
SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE CS TCPP: IEEE Computer Society Technical Committee on Parallel Processing
IEEE CS TCAA: IEEE CS technical committee on architectural acoustics

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

PACT '12

PACT '12: International Conference on Parallel Architectures and Compilation Techniques

September 19 - 23, 2012

Minneapolis, Minnesota, USA

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%
