DOI: 10.1145/2792745.2792785
XSEDE Conference Proceedings
Research article

Porting scientific libraries to PGAS in XSEDE resources: practice and experience

Published: 26 July 2015

Abstract

The next generation of supercomputers presents new and complex challenges that may require a change in the current paradigm of parallel application development. Hybrid programming is usually described as the best approach for exascale machines, and PGAS programming models are considered an attractive complement to MPI in this hybrid model for achieving good performance on them. The approach is especially promising for one-sided and irregular communication patterns. However, PGAS is still an emerging technology, and there is little prior experience with porting existing MPI applications to this model. Given its relevance for the next generation of systems, early experience in porting applications, and knowledge of the issues likely to arise in this new paradigm, is valuable. In this paper we present two scientific applications that are currently implemented in MPI and are promising candidates for the PGAS paradigm. We describe how these applications were ported, the challenges faced, and some of the solutions we found. We also show that PGAS models can achieve strong performance compared to MPI.
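The porting details are in the paper itself; as a rough illustration of the one-sided communication style the abstract refers to, the sketch below (not taken from the paper) shows a minimal OpenSHMEM exchange in which each PE writes its rank directly into a neighbour's symmetric buffer, with no matching receive posted on the target side. It assumes an OpenSHMEM 1.2-style API, such as the one provided by MVAPICH2-X.

#include <shmem.h>
#include <stdio.h>

int main(void)
{
    shmem_init();
    int me   = shmem_my_pe();
    int npes = shmem_n_pes();

    /* Symmetric heap allocation: every PE owns a remotely accessible copy. */
    long *dest = (long *) shmem_malloc(sizeof(long));
    *dest = -1;
    long src = (long) me;

    shmem_barrier_all();                 /* ensure all symmetric buffers exist */

    /* One-sided put: deposit our PE number into the next PE's buffer.
       Unlike MPI_Send/MPI_Recv, the target posts no matching receive. */
    shmem_long_put(dest, &src, 1, (me + 1) % npes);

    shmem_barrier_all();                 /* complete all puts before reading   */

    printf("PE %d received %ld\n", me, *dest);

    shmem_free(dest);
    shmem_finalize();
    return 0;
}

Built with an oshcc-style wrapper and run on several PEs, each PE should print the rank of its left neighbour; the point is only that data movement is expressed as a put into remote memory plus a synchronization, which is the pattern that replaces MPI send/receive pairs when code is moved to the PGAS model.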


Cited By

  • (2019) Global Task Data-Dependencies in PGAS Applications. High Performance Computing, pp. 312-329. https://doi.org/10.1007/978-3-030-20656-7_16. Online publication date: 17-May-2019.
  • (2018) Recent experiences in using MPI-3 RMA in the DASH PGAS runtime. Proceedings of Workshops of HPC Asia, pp. 21-30. https://doi.org/10.1145/3176364.3176367. Online publication date: 31-Jan-2018.
  • (2016) DASH: A C++ PGAS Library for Distributed Data Structures and Parallel Algorithms. 2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pp. 983-990. https://doi.org/10.1109/HPCC-SmartCity-DSS.2016.0140. Online publication date: Dec-2016.

Published In

XSEDE '15: Proceedings of the 2015 XSEDE Conference: Scientific Advancements Enabled by Enhanced Cyberinfrastructure
July 2015
296 pages
ISBN:9781450337205
DOI:10.1145/2792745
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

  • San Diego Super Computing Ctr
  • HPCWire
  • Omnibond Systems, LLC
  • SGI
  • Internet2
  • Indiana University
  • CASC: The Coalition for Academic Scientific Computation
  • NICS: National Institute for Computational Sciences
  • Intel
  • DDN: DataDirect Networks, Inc
  • DELL
  • CORSA Technology
  • Allinea Software
  • Cray
  • RENCI: Renaissance Computing Institute

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. MPI
  2. MVAPICH2-X
  3. OpenSHMEM
  4. PGAS
  5. XSEDE
  6. porting
  7. stampede

Qualifiers

  • Research-article

Conference

XSEDE '15

Acceptance Rates

XSEDE '15 Paper Acceptance Rate: 49 of 70 submissions (70%)
Overall Acceptance Rate: 129 of 190 submissions (68%)

