DOI: 10.1145/3448290.3448559
research-article

Porting and Evaluation of a Distributed Task-driven Stencil-based Application

Published: 24 July 2021 Publication History

Abstract

Alternative programming models and runtimes are growing in popularity and maturity, which makes it possible to port applications to emerging parallel approaches and compare them, on competitive grounds, against the traditional MPI+X paradigm. In this work, a distributed task-based implementation of a stencil computation is compared with a traditional MPI+X implementation of the same application. The Legion task-based parallel programming system is used as an alternative to MPI, while the underlying OpenMP approach is kept at the subdomain level. Overall, the results are promising and suggest that this alternative method can be made competitive with the traditional MPI approach. In future work, extensions to other applications will be explored, as well as the use of GPUs.


Cited By

  • (2021) Ookami: Deployment and Initial Experiences. Practice and Experience in Advanced Research Computing 2021: Evolution Across All Dimensions, 1-8. DOI: 10.1145/3437359.3465578. Online publication date: 17-Jul-2021
  • (2021) Evaluation of Distributed Tasks in Stencil-based Application on GPUs. 2021 IEEE/ACM 6th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2), 45-52. DOI: 10.1109/ESPM254806.2021.00011. Online publication date: Nov-2021

    Published In

    PMAM'21: Proceedings of the 12th International Workshop on Programming Models and Applications for Multicores and Manycores
    February 2021
    34 pages
    ISBN:9781450383486
    DOI:10.1145/3448290
    Editors: Quan Chen, Zhiyi Huang, Min Si
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. finite difference
    2. stencil
    3. tasks
    4. wave equation

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    PPoPP '21

    Acceptance Rates

    Overall Acceptance Rate 53 of 97 submissions, 55%
