Abstract
MPI has been the de facto programming model for scientific parallel applications. However, it is hard to extract maximum performance from irregular, data-driven applications using MPI. Partitioned Global Address Space (PGAS) programming models offer an alternative approach that improves programmability. The lower overhead of one-sided communication and the global view of data in PGAS models have the potential to increase performance at scale. In this study, we take up the ‘Concurrent Search’ kernel of Graph500, a highly data-driven, irregular benchmark, and redesign it using both MPI and OpenSHMEM constructs. We also implement load balancing in Graph500. Our performance evaluations using MVAPICH2-X (Unified MPI+PGAS Communication Runtime over InfiniBand) show a 59% reduction in execution time for the hybrid design compared to the best-performing MPI-based design at 8,192 cores.
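Graph500's search kernel is a breadth-first search (BFS). The abstract does not describe the redesign itself, so the following is only a minimal serial sketch of why one-sided communication suits this kernel, not the authors' design. When a processing element (PE) discovers a neighbor owned by another PE, it can reserve a slot in the owner's receive queue with an atomic fetch-and-increment and write the (vertex, parent) pair with a put. The owner never has to post a matching receive. In OpenSHMEM these steps would map onto routines such as `shmem_int_atomic_fetch_inc`, `shmem_int_put` and `shmem_barrier_all`. Here they are simulated with ordinary memory accesses in one process. The 1D cyclic partition, `NPES = 4`, `MAXV`, CSR input, and the names `RecvQueue`, `remote_enqueue` and `bfs_simulated` are all hypothetical choices for illustration.

```c
#include <assert.h>

#define NPES 4   /* hypothetical number of processing elements */
#define MAXV 64  /* hypothetical upper bound on vertex count */

/* Owner of a vertex under an assumed 1D cyclic partition. */
static int owner(int v) { return v % NPES; }

/* Per-PE receive queue. In OpenSHMEM this would live in the symmetric heap
   so that remote PEs can write into it directly. */
typedef struct {
    int vertex[MAXV * MAXV];
    int parent[MAXV * MAXV];
    int count;
} RecvQueue;

static RecvQueue queues[NPES];

/* Simulated one-sided enqueue: the sender writes straight into the owner's
   queue. `count++` stands in for an atomic fetch-and-increment on the
   remote counter; the two stores stand in for puts. */
static void remote_enqueue(int dest_pe, int v, int parent) {
    int slot = queues[dest_pe].count++;
    queues[dest_pe].vertex[slot] = v;
    queues[dest_pe].parent[slot] = parent;
}

/* Level-synchronous BFS from `root` over a CSR graph with n vertices.
   Fills parent[] (-1 = unreached, parent[root] == root) and returns the
   number of non-empty frontiers processed, or -1 if capacities are exceeded. */
int bfs_simulated(int n, const int *row_ptr, const int *col_idx,
                  int root, int *parent) {
    int frontier[NPES][MAXV];
    int fsize[NPES] = {0};
    int levels = 0;

    assert(root >= 0 && root < n);
    if (n > MAXV || row_ptr[n] > MAXV * MAXV) return -1;

    for (int v = 0; v < n; v++) parent[v] = -1;
    parent[root] = root;
    frontier[owner(root)][fsize[owner(root)]++] = root;

    for (;;) {
        int total = 0;
        for (int pe = 0; pe < NPES; pe++) total += fsize[pe];
        if (total == 0) break;
        levels++;

        /* Expansion phase: every PE pushes each neighbor of its local
           frontier to that neighbor's owner, without a matching receive. */
        for (int pe = 0; pe < NPES; pe++) queues[pe].count = 0;
        for (int pe = 0; pe < NPES; pe++)
            for (int i = 0; i < fsize[pe]; i++) {
                int u = frontier[pe][i];
                for (int e = row_ptr[u]; e < row_ptr[u + 1]; e++)
                    remote_enqueue(owner(col_idx[e]), col_idx[e], u);
            }
        /* A real run would place a global barrier here between phases. */

        /* Processing phase: each owner claims its unvisited vertices locally
           and builds its part of the next frontier. */
        for (int pe = 0; pe < NPES; pe++) {
            fsize[pe] = 0;
            for (int i = 0; i < queues[pe].count; i++) {
                int v = queues[pe].vertex[i];
                if (parent[v] == -1) {
                    parent[v] = queues[pe].parent[i];
                    frontier[pe][fsize[pe]++] = v;
                }
            }
        }
    }
    return levels;
}
```

Because only the owner updates `parent[v]`, the visited check needs no remote atomics. The trade-off is that every discovered edge is shipped to its owner, duplicates included. A production design would typically add local filtering or coalesce puts into larger messages.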
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jose, J., Potluri, S., Tomko, K., Panda, D.K. (2013). Designing Scalable Graph500 Benchmark with Hybrid MPI+OpenSHMEM Programming Models. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds) Supercomputing. ISC 2013. Lecture Notes in Computer Science, vol 7905. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38750-0_9
DOI: https://doi.org/10.1007/978-3-642-38750-0_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38749-4
Online ISBN: 978-3-642-38750-0
eBook Packages: Computer Science, Computer Science (R0)