Designing Scalable Graph500 Benchmark with Hybrid MPI+OpenSHMEM Programming Models

Jose, Jithin; Potluri, Sreeram; Tomko, Karen; Panda, Dhabaleswar K.

doi:10.1007/978-3-642-38750-0_9

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7905)

Included in the following conference series:

International Supercomputing Conference

Abstract

MPI has been the de facto programming model for scientific parallel applications. However, it is hard to extract maximum performance from irregular, data-driven applications using MPI. Partitioned Global Address Space (PGAS) programming models present an alternative approach that improves programmability. The lower overhead of one-sided communication and the global view of data in PGAS models have the potential to increase performance at scale. In this study, we take up the ‘Concurrent Search’ kernel of Graph500, a highly data-driven irregular benchmark, and redesign it using both MPI and OpenSHMEM constructs. We also implement load balancing in Graph500. Our performance evaluations using MVAPICH2-X (Unified MPI+PGAS Communication Runtime over InfiniBand) indicate a 59% reduction in execution time for the hybrid design compared with the best-performing MPI-based design at 8,192 cores.
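The abstract does not describe the hybrid design itself, so the following is only a minimal single-process sketch of the general idea behind a level-synchronous breadth-first search over a partitioned graph, the kind of traversal the Graph500 search kernel performs. Everything in it is an assumption made for illustration: the cyclic `owner` mapping, the per-rank `inbox` queues that stand in for one-sided puts into a remote queue, and the function names. It is not the authors' implementation and makes no MPI or OpenSHMEM calls.

```python
from collections import deque


def owner(vertex, nprocs):
    """Hypothetical 1D cyclic partitioning: vertex v is owned by rank v % nprocs."""
    return vertex % nprocs


def simulated_bfs(adjacency, root, nprocs):
    """Level-synchronous BFS over simulated ranks; returns a vertex -> parent map.

    Each simulated rank keeps the parents of the vertices it owns. A newly
    discovered edge (vertex, predecessor) is appended to the owner's inbox,
    which stands in for a one-sided put into a remote queue.
    """
    parents = [dict() for _ in range(nprocs)]
    inbox = [deque() for _ in range(nprocs)]
    inbox[owner(root, nprocs)].append((root, root))  # root is its own parent

    while any(inbox):  # stop when no rank has pending work
        outbox = [deque() for _ in range(nprocs)]
        for rank in range(nprocs):
            for vertex, pred in inbox[rank]:
                if vertex in parents[rank]:
                    continue  # already visited in this or an earlier level
                parents[rank][vertex] = pred
                for nbr in adjacency.get(vertex, ()):
                    outbox[owner(nbr, nprocs)].append((nbr, vertex))
        inbox = outbox  # the end of a level acts like a barrier

    merged = {}
    for local in parents:
        merged.update(local)
    return merged


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
    print(simulated_bfs(graph, root=0, nprocs=2))
```

In this sketch the visited check happens on the owning rank, so a sender never needs to read remote state before enqueuing; in a real one-sided design that property is what lets the sender proceed without a matching receive.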

Author information

Authors and Affiliations

Department of Computer Science and Engineering, The Ohio State University, USA
Jithin Jose, Sreeram Potluri & Dhabaleswar K. Panda
Ohio Supercomputer Center, Columbus, OH, USA
Karen Tomko

Editor information

Editors and Affiliations

University of Hamburg, Department of Informatics, Bundesstraße 45a, 20146, Hamburg, Germany
Julian Martin Kunkel
Deutsches Klimarechenzentrum, Bundesstraße 45a, 20146, Hamburg, Germany
Thomas Ludwig
University of Mannheim, Germany and Prometeus GmbH, Fliederstraße 2, 74915, Waibstadt, Germany
Hans Werner Meuer

About this paper

Cite this paper

Jose, J., Potluri, S., Tomko, K., Panda, D.K. (2013). Designing Scalable Graph500 Benchmark with Hybrid MPI+OpenSHMEM Programming Models. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds) Supercomputing. ISC 2013. Lecture Notes in Computer Science, vol 7905. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38750-0_9

DOI: https://doi.org/10.1007/978-3-642-38750-0_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38749-4
Online ISBN: 978-3-642-38750-0
eBook Packages: Computer Science, Computer Science (R0)