DOI: 10.1145/3416315.3416320

Fibers are not (P)Threads: The Case for Loose Coupling of Asynchronous Programming Models and MPI Through Continuations

Published: 07 October 2020
  Abstract

    Asynchronous programming models (APMs) are gaining increasing traction, allowing applications to expose the available concurrency to a runtime system tasked with coordinating the execution. While MPI has long provided support for multi-threaded communication and non-blocking operations, it falls short of adequately supporting APMs, because correctly and efficiently handling MPI communication in these models remains a challenge. Meanwhile, new low-level implementations of lightweight, cooperatively scheduled execution contexts (fibers, also known as user-level threads, ULTs) are meant to serve as a basis for higher-level APMs, and their integration into MPI implementations has been proposed as a replacement for traditional POSIX thread support to alleviate these challenges.
    In this paper, we first establish a taxonomy to clearly distinguish different concepts in the parallel software stack. We argue that the proposed tight integration of fiber implementations with MPI is neither warranted nor beneficial and instead is detrimental to the goal of MPI being a portable communication abstraction. We propose MPI Continuations as an extension to the MPI standard to provide callback-based notifications on completed operations, leading to a clear separation of concerns by providing a loose coupling mechanism between MPI and APMs. We show that this interface is flexible and interacts well with different APMs, namely OpenMP detached tasks, OmpSs-2, and Argobots.
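
    As a rough illustration of the callback-based notification idea, the sketch below builds a tiny continuation layer on top of standard non-blocking MPI. It is an assumption-laden example, not the MPI Continuations interface proposed in the paper: the names (cont_cb, cont_slot, cont_register, cont_progress), the fixed-size table, and the polling-based progress are illustrative choices, and the paper's proposal places this mechanism inside MPI rather than on top of it.

        /* Illustrative sketch only (not the paper's proposed API): a minimal
         * user-level continuation layer over standard MPI.  A callback is
         * registered for a pending non-blocking operation and invoked from a
         * progress call once MPI_Test reports completion, so no thread or
         * fiber has to block in MPI_Wait.  Single-threaded for brevity. */
        #include <mpi.h>

        #define MAX_CONTS 64

        typedef void (*cont_cb)(void *user_data);

        typedef struct {
            MPI_Request req;       /* the layer takes ownership of the request */
            cont_cb     cb;        /* continuation to run on completion        */
            void       *user_data; /* argument passed to the continuation      */
            int         active;
        } cont_slot;

        static cont_slot conts[MAX_CONTS];

        /* Attach a callback to a pending request; returns 0 on success. */
        int cont_register(MPI_Request req, cont_cb cb, void *user_data)
        {
            for (int i = 0; i < MAX_CONTS; ++i) {
                if (!conts[i].active) {
                    conts[i] = (cont_slot){ req, cb, user_data, 1 };
                    return 0;
                }
            }
            return -1; /* table full */
        }

        /* Called from the application's or tasking runtime's scheduling loop:
         * test pending requests and run the continuations of those that completed. */
        void cont_progress(void)
        {
            for (int i = 0; i < MAX_CONTS; ++i) {
                if (conts[i].active) {
                    int done = 0;
                    MPI_Test(&conts[i].req, &done, MPI_STATUS_IGNORE);
                    if (done) {
                        conts[i].active = 0;
                        conts[i].cb(conts[i].user_data);
                    }
                }
            }
        }

    A tasking runtime (for example one built on OpenMP detached tasks or Argobots) could call cont_progress from its scheduler and use the callback to release a dependent task instead of parking a thread or fiber in MPI_Wait; the paper's argument is that such a notification hook belongs in MPI itself rather than being re-implemented in each runtime.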


    Cited By

    • OpenMP application experiences. Parallel Computing 109, C (online 30 December 2021). https://doi.org/10.1016/j.parco.2021.102856


    Published In

    EuroMPI/USA '20: Proceedings of the 27th European MPI Users' Group Meeting
    September 2020
    88 pages
    ISBN:9781450388801
    DOI:10.1145/3416315

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Continuations
    2. Fiber
    3. MPI+X
    4. OmpSs
    5. OpenMP
    6. TAMPI
    7. Tasks
    8. ULT

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    EuroMPI/USA '20
    EuroMPI/USA '20: 27th European MPI Users' Group Meeting
    September 21 - 24, 2020
    Austin, TX, USA

    Acceptance Rates

    Overall Acceptance Rate 66 of 139 submissions, 47%
