DOI: 10.1145/2370816.2370866
research-article

A yoke of oxen and a thousand chickens for heavy lifting graph processing

Published: 19 September 2012

Abstract

Large, real-world graphs are famously difficult to process efficiently. Not only do they have a large memory footprint, but most graph processing algorithms also entail memory access patterns with poor locality, data-dependent parallelism, and a low compute-to-memory access ratio. Additionally, most real-world graphs have a low diameter and a highly heterogeneous node degree distribution. Partitioning these graphs while simultaneously achieving access locality and load balancing is difficult, if not impossible.
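The tension between locality and load balance can be illustrated with a small sketch. The eight-vertex graph, the function names, and the use of summed degree as a stand-in for per-partition work are all assumptions made for illustration; none of them come from the paper.

```python
from collections import defaultdict

# Hypothetical toy graph: vertex 0 is a hub connected to every other vertex,
# plus one extra edge between two low-degree vertices.
edges = [(0, v) for v in range(1, 8)] + [(6, 7)]


def degrees(edge_list):
    """Return the degree of every vertex in an undirected edge list."""
    deg = defaultdict(int)
    for u, v in edge_list:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def partition_load(edge_list, parts):
    """Sum of vertex degrees per partition (a rough proxy for work)."""
    deg = degrees(edge_list)
    return [sum(deg[v] for v in part) for part in parts]


def cut_edges(edge_list, parts):
    """Count edges whose endpoints live in different partitions."""
    owner = {v: i for i, part in enumerate(parts) for v in part}
    return sum(1 for u, v in edge_list if owner[u] != owner[v])


# Split the vertices evenly by count: four vertices per partition.
parts = [[0, 1, 2, 3], [4, 5, 6, 7]]
load = partition_load(edges, parts)
cut = cut_edges(edges, parts)
print("work per partition:", load)
print("edges crossing partitions:", cut)
```

Although both partitions hold the same number of vertices, the one that owns the hub carries more work, and half of the edges cross the partition boundary, so each such edge implies a remote access.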
This paper demonstrates the feasibility of graph processing on heterogeneous platforms (i.e., platforms that include both CPUs and GPUs) as a cost-effective approach to addressing the graph processing challenges above. To this end, this work (i) presents and evaluates a performance model that estimates the achievable performance on heterogeneous platforms; (ii) introduces TOTEM, a processing engine based on the Bulk Synchronous Parallel (BSP) model that offers a convenient environment to simplify the implementation of graph algorithms on heterogeneous platforms; and (iii) demonstrates TOTEM's efficiency by implementing and evaluating two graph algorithms (PageRank and breadth-first search). TOTEM achieves speedups close to the model's prediction and applies a number of optimizations that enable linear speedups with respect to the share of the graph offloaded for processing to accelerators.
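To make the BSP idea concrete, here is a minimal single-process sketch of PageRank run as a sequence of supersteps over vertex partitions. It is not TOTEM's API or implementation: the function name `bsp_pagerank`, the outbox layout, and the toy graph are assumptions, and the two partitions merely stand in for a CPU and an accelerator.

```python
def bsp_pagerank(out_edges, parts, damping=0.85, supersteps=100):
    """PageRank in BSP style over vertex partitions (illustrative only).

    out_edges: dict mapping each vertex to its list of destination vertices.
    parts: list of vertex lists, one per (hypothetical) processing element.
    Assumes every vertex has at least one outgoing edge.
    """
    n = len(out_edges)
    owner = {v: p for p, part in enumerate(parts) for v in part}
    rank = {v: 1.0 / n for v in out_edges}
    for _ in range(supersteps):
        # Computation phase: each partition visits only its own vertices and
        # buffers contributions in one outbox per destination partition.
        outbox = [[{} for _ in parts] for _ in parts]
        for p, part in enumerate(parts):
            for u in part:
                share = rank[u] / len(out_edges[u])
                for v in out_edges[u]:
                    box = outbox[p][owner[v]]
                    box[v] = box.get(v, 0.0) + share
        # Communication phase: deliver every outbox to its destination.
        incoming = {v: 0.0 for v in out_edges}
        for p in range(len(parts)):
            for q in range(len(parts)):
                for v, contribution in outbox[p][q].items():
                    incoming[v] += contribution
        # Barrier: ranks are updated only after all messages have arrived.
        rank = {v: (1 - damping) / n + damping * incoming[v] for v in out_edges}
    return rank


# Hypothetical four-vertex directed graph with no dangling vertices.
graph = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
ranks_split = bsp_pagerank(graph, parts=[[0, 1], [2, 3]])
ranks_single = bsp_pagerank(graph, parts=[[0, 1, 2, 3]])
print(ranks_split)
```

Because updates happen only at the barrier, the result does not depend on how vertices are assigned to partitions; the partitioning affects only how much work each side does and how many contributions cross the boundary.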


Published In

PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques
September 2012
512 pages
ISBN:9781450311823
DOI:10.1145/2370816

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. breadth-first search
  2. gpu
  3. graph algorithms
  4. heterogeneous systems
  5. pagerank
  6. totem

Qualifiers

  • Research-article

Conference

PACT '12
Sponsor:
  • IFIP WG 10.3
  • SIGARCH
  • IEEE CS TCPP
  • IEEE CS TCAA

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%

Cited By

  • (2024) CR2: Community-aware Compressed Regular Representation for Graph Processing on a GPU. Proceedings of the 53rd International Conference on Parallel Processing, pp. 544-554. DOI: 10.1145/3673038.3673056. Online publication date: 12-Aug-2024
  • (2024) Optimal Model Partitioning with Low-Overhead Profiling on the PIM-based Platform for Deep Learning Inference. ACM Transactions on Design Automation of Electronic Systems 29:2, pp. 1-22. DOI: 10.1145/3628599. Online publication date: 14-Feb-2024
  • (2023) Real-Time PageRank on Dynamic Graphs. Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, pp. 239-251. DOI: 10.1145/3588195.3593004. Online publication date: 7-Aug-2023
  • (2023) Liberator: A Data Reuse Framework for Out-of-Memory Graph Computing on GPUs. IEEE Transactions on Parallel and Distributed Systems, pp. 1-14. DOI: 10.1109/TPDS.2023.3268662. Online publication date: 2023
  • (2023) LightTraffic: On Optimizing CPU-GPU Data Traffic for Efficient Large-scale Random Walks. 2023 IEEE 39th International Conference on Data Engineering (ICDE), pp. 882-895. DOI: 10.1109/ICDE55515.2023.00073. Online publication date: Apr-2023
  • (2023) HyTGraph: GPU-Accelerated Graph Processing with Hybrid Transfer Management. 2023 IEEE 39th International Conference on Data Engineering (ICDE), pp. 558-571. DOI: 10.1109/ICDE55515.2023.00049. Online publication date: Apr-2023
  • (2022) EGraph: Efficient Concurrent GPU-based Dynamic Graph Processing. IEEE Transactions on Knowledge and Data Engineering, pp. 1-1. DOI: 10.1109/TKDE.2022.3171588. Online publication date: 2022
  • (2022) Graph-Enabled Intelligent Vehicular Network Data Processing. IEEE Transactions on Intelligent Transportation Systems 23:5, pp. 4726-4735. DOI: 10.1109/TITS.2022.3158045. Online publication date: 1-May-2022
  • (2021) LargeGraph. ACM Transactions on Architecture and Code Optimization 18:4, pp. 1-24. DOI: 10.1145/3477603. Online publication date: 29-Sep-2021
  • (2021) Ascetic: Enhancing Cross-Iterations Data Efficiency in Out-of-Memory Graph Processing on GPUs. Proceedings of the 50th International Conference on Parallel Processing, pp. 1-10. DOI: 10.1145/3472456.3472457. Online publication date: 9-Aug-2021
