DOI: 10.1145/3149457.3154482
Research article, HPCAsia '18 Conference Proceedings

Multi-tasking Execution in PGAS Language XcalableMP and Communication Optimization on Many-core Clusters

Published: 28 January 2018

Abstract

Large-scale clusters based on many-core processors such as Intel Xeon Phi have recently been deployed. Multi-tasking execution with the task dependencies introduced in OpenMP 4.0 is a promising way to parallelize applications for such processors, because fine-grained task-to-task synchronization based on user-specified data dependencies lets users avoid global synchronization. Meanwhile, the partitioned global address space (PGAS) model has emerged as a practical programming model for distributed memory. In this paper, we propose a multi-tasking execution model for many-core clusters in the PGAS language XcalableMP (XMP). The model describes interactions between tasks as point-to-point communications on the global address space, so each communication involves only the nodes concerned rather than being executed collectively. We implemented the proposed model in XMP and designed a simple algorithm that translates it into MPI and OpenMP code. For a preliminary evaluation, we implemented two benchmarks with our model: blocked Cholesky factorization and a Laplace equation solver. Most of these implementations outperform the conventional barrier-based data-parallel model. To further improve performance on many-core clusters, we propose a communication optimization that dedicates a single thread to communication, avoiding the performance problems of current multi-threaded MPI execution. With this optimization, blocked Cholesky factorization and the Laplace equation solver reach 138% and 119%, respectively, of the performance of the barrier-based implementations on Intel Xeon Phi KNL clusters. In terms of productivity, a program written with our model in XMP is almost identical to one written with the OpenMP task depend clause, because XMP, like OpenMP, parallelizes serial source code through added directives and small changes.
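The dependency-driven execution described in the abstract can be illustrated with a small Python sketch. It is not the XMP or OpenMP runtime, and every name in it is hypothetical: `build_dag` derives each task's predecessors from `in`/`out` key sets in program order, roughly the way OpenMP's `depend(in: x)` and `depend(inout: x)` clauses order tasks, and `run_dag` starts a task as soon as its predecessors finish instead of waiting at a barrier after each step. The workload is a right-looking blocked Cholesky factorization in which, to keep the kernels trivial, every "tile" is a single matrix entry; with real tiles the four task bodies would become POTRF, TRSM, SYRK and GEMM calls. Because of Python's GIL the sketch shows task ordering, not speedup.

```python
import math
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def build_dag(tasks):
    """Predecessor sets from in/out keys in program order (OpenMP-depend-like).

    tasks: list of (name, fn, ins, outs). A task waits for the last writer of every
    key it reads or writes (RAW, WAW) and, for keys it writes, for every reader
    since that key's last write (WAR).
    """
    last_writer, readers, preds = {}, {}, []
    for idx, (_, _, ins, outs) in enumerate(tasks):
        p = {last_writer[k] for k in ins + outs if k in last_writer}
        for k in outs:
            p |= readers.get(k, set())
        for k in ins:
            readers.setdefault(k, set()).add(idx)
        for k in outs:
            last_writer[k] = idx
            readers[k] = set()
        preds.append(p)
    return preds


def run_dag(tasks, preds, workers=4):
    """Start each task as soon as its predecessors finish; no global barrier."""
    remaining = [set(p) for p in preds]
    succs = [[] for _ in tasks]
    for i, p in enumerate(preds):
        for j in p:
            succs[j].append(i)
    order = []  # completion order as observed by the scheduler
    with ThreadPoolExecutor(max_workers=workers) as pool:
        running = {pool.submit(tasks[i][1]): i for i in range(len(tasks)) if not remaining[i]}
        while running:
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                i = running.pop(fut)
                fut.result()  # re-raise an exception from the task, if any
                order.append(tasks[i][0])
                for s in succs[i]:
                    remaining[s].discard(i)
                    if not remaining[s]:
                        running[pool.submit(tasks[s][1])] = s
    return order


def cholesky_tasks(A):
    """Right-looking blocked Cholesky; here every 'tile' is one entry (i, j)."""
    n = len(A)

    def potrf(k):  # factor the diagonal tile
        A[k][k] = math.sqrt(A[k][k])

    def trsm(i, k):  # solve for a tile below the diagonal
        A[i][k] /= A[k][k]

    def syrk(i, k):  # update a diagonal tile of the trailing matrix
        A[i][i] -= A[i][k] * A[i][k]

    def gemm(i, j, k):  # update an off-diagonal tile of the trailing matrix
        A[i][j] -= A[i][k] * A[j][k]

    tasks = []  # (name, fn, ins, outs); an inout key appears in both lists
    for k in range(n):
        tasks.append((f"potrf{k}", lambda k=k: potrf(k), [(k, k)], [(k, k)]))
        for i in range(k + 1, n):
            tasks.append((f"trsm{i}{k}", lambda i=i, k=k: trsm(i, k), [(k, k), (i, k)], [(i, k)]))
        for i in range(k + 1, n):
            tasks.append((f"syrk{i}{k}", lambda i=i, k=k: syrk(i, k), [(i, k), (i, i)], [(i, i)]))
            for j in range(k + 1, i):
                tasks.append((f"gemm{i}{j}{k}", lambda i=i, j=j, k=k: gemm(i, j, k),
                              [(i, k), (j, k), (i, j)], [(i, j)]))
    return tasks


A = [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]
tasks = cholesky_tasks(A)
order = run_dag(tasks, build_dag(tasks))
L = [row[: i + 1] for i, row in enumerate(A)]  # lower triangle now holds the factor
print(L)  # [[2.0], [6.0, 1.0], [-8.0, 5.0, 3.0]]
```

In this graph `potrf1` depends only on `syrk10`, so it may start while `syrk20` and `gemm210` are still running; a barrier-based loop would instead hold it until every task of step 0 had finished.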

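The abstract's inter-node side (tasks that interact only through point-to-point communication, with one dedicated communication thread per node) can be sketched with a 1-D Laplace equation solved by Jacobi iteration on two simulated nodes. This is a minimal Python illustration under stated assumptions, not the paper's implementation: `Network`, `CommThread`, `node_main` and the one-mailbox-per-(dst, src, tag) matching are hypothetical stand-ins for MPI point-to-point calls. Compute threads never touch the "network" themselves; they post send and receive requests to their node's communication thread and exchange halo values only with neighbouring nodes, with no barrier or collective anywhere.

```python
import queue
import threading


class Network:
    """Hypothetical interconnect: one FIFO mailbox per (dst, src, tag)."""

    def __init__(self):
        self._boxes = {}
        self._lock = threading.Lock()

    def box(self, dst, src, tag):
        with self._lock:
            return self._boxes.setdefault((dst, src, tag), queue.Queue())


class CommThread(threading.Thread):
    """The only thread on a node that touches the network; compute threads post requests."""

    def __init__(self, rank, net):
        super().__init__(daemon=True)
        self.rank, self.net, self.requests = rank, net, queue.Queue()

    def send(self, dst, tag, value):  # returns immediately for the caller
        self.requests.put(("send", dst, tag, value, None))

    def recv(self, src, tag):  # returns a handle; call .get() when the data is needed
        reply = queue.Queue(maxsize=1)
        self.requests.put(("recv", src, tag, None, reply))
        return reply

    def stop(self):
        self.requests.put(("stop", None, None, None, None))

    def run(self):
        while True:
            op, peer, tag, value, reply = self.requests.get()
            if op == "stop":
                return
            if op == "send":
                self.net.box(peer, self.rank, tag).put(value)
            else:
                # Blocking here is safe only because every node posts its sends before
                # its receives; a real runtime would poll nonblocking receives instead.
                reply.put(self.net.box(self.rank, peer, tag).get())


def jacobi_step(u):
    """One Jacobi sweep for u'' = 0; the end cells are boundaries or halo cells."""
    return [u[0]] + [0.5 * (u[i - 1] + u[i + 1]) for i in range(1, len(u) - 1)] + [u[-1]]


def node_main(rank, nranks, local, iters, comm, result):
    left = rank - 1 if rank > 0 else None
    right = rank + 1 if rank < nranks - 1 else None
    for it in range(iters):
        # Point-to-point halo exchange with neighbours only (tag = iteration number).
        for peer, value in ((left, local[1]), (right, local[-2])):
            if peer is not None:
                comm.send(peer, it, value)
        reqs = [(pos, comm.recv(peer, it)) for pos, peer in ((0, left), (-1, right)) if peer is not None]
        for pos, req in reqs:
            local[pos] = req.get()
        local = jacobi_step(local)
    result[rank] = local[1:-1]


N, ITERS, NRANKS = 10, 50, 2  # two ranks, hard-coded below for brevity
u0 = [0.0] * (N - 1) + [1.0]  # Dirichlet boundaries u[0] = 0, u[N-1] = 1
m = (N - 2) // 2  # interior points owned by rank 0
parts = [u0[0:m + 2], u0[m:N]]  # each slice has one boundary or halo cell per side
net = Network()
comms = [CommThread(r, net) for r in range(NRANKS)]
for c in comms:
    c.start()
result = {}
workers = [threading.Thread(target=node_main, args=(r, NRANKS, parts[r], ITERS, comms[r], result))
           for r in range(NRANKS)]
for w in workers:
    w.start()
for w in workers:
    w.join()
for c in comms:
    c.stop()

u_serial = u0
for _ in range(ITERS):
    u_serial = jacobi_step(u_serial)
u_dist = [u0[0]] + result[0] + result[1] + [u0[-1]]
print(u_dist == u_serial)
```

The distributed result matches the serial sweep exactly because each halo value is the neighbour's value from the same iteration. Funnelling all traffic through one thread per node means the communication layer never sees concurrent calls from several compute threads, which is the kind of multi-threaded MPI overhead the abstract's optimization is meant to avoid.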
      Published In

      ACM Other conferences
      HPCAsia '18: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region
      January 2018
      322 pages
      ISBN:9781450353724
      DOI:10.1145/3149457
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Many-core cluster
      2. PGAS
      3. Task Parallelism
      4. XcalableMP

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      HPC Asia 2018

      Acceptance Rates

      HPCAsia '18 Paper Acceptance Rate 30 of 67 submissions, 45%;
      Overall Acceptance Rate 69 of 143 submissions, 48%

      Article Metrics

      • Downloads (last 12 months): 4
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 04 Sep 2024.

      Cited By

      • (2020) Task Priority Control for the HPX Runtime System. 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 806-813. DOI: 10.1109/IPDPSW50202.2020.00137. Online publication date: May 2020.
      • (2020) DASH: Distributed Data Structures and Parallel Algorithms in a Global Address Space. Software for Exascale Computing - SPPEXA 2016-2019, pp. 103-142. DOI: 10.1007/978-3-030-47956-5_6. Online publication date: 31 Jul 2020.
      • (2019) Global Task Data-Dependencies in PGAS Applications. High Performance Computing, pp. 312-329. DOI: 10.1007/978-3-030-20656-7_16. Online publication date: 17 May 2019.
      • (2018) Mapping OpenMP to a Distributed Tasking Runtime. Evolving OpenMP for Evolving Architectures, pp. 222-235. DOI: 10.1007/978-3-319-98521-3_15. Online publication date: 29 Aug 2018.
      • (2018) The Impact of Taskyield on the Design of Tasks Communicating Through MPI. Evolving OpenMP for Evolving Architectures, pp. 3-17. DOI: 10.1007/978-3-319-98521-3_1. Online publication date: 29 Aug 2018.