DOI: 10.1145/2966884.2966914

Towards millions of communicating threads

Published: 25 September 2016

Abstract

In this paper we explore the advantages that accrue from avoiding the use of wildcards in MPI. We show that, with this change, one can efficiently support millions of concurrently communicating lightweight threads using send-receive communication.
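
To make the idea concrete, the sketch below (not taken from the paper) illustrates the communication pattern the abstract alludes to: many threads per rank, each receiving only from an explicitly named source rank with a deterministic per-thread tag, so MPI_ANY_SOURCE and MPI_ANY_TAG never enter the matching path. It assumes exactly two ranks, an MPI library providing MPI_THREAD_MULTIPLE, and POSIX threads standing in for the paper's lightweight threads; NTHREADS is an illustrative constant, not a parameter from the paper.

    #include <mpi.h>
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 8   /* illustrative thread count; the paper targets millions */

    static int rank, peer;

    /* Each thread exchanges one message with the thread of the same id on the
     * peer rank, using an explicit source rank and a per-thread tag: no wildcards. */
    static void *worker(void *arg)
    {
        int tid = (int)(long)arg;
        int sendbuf = rank * NTHREADS + tid, recvbuf = -1;

        MPI_Sendrecv(&sendbuf, 1, MPI_INT, peer, /* send tag */ tid,
                     &recvbuf, 1, MPI_INT, peer, /* recv tag */ tid,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        printf("rank %d thread %d received %d\n", rank, tid, recvbuf);
        return NULL;
    }

    int main(int argc, char **argv)
    {
        int provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE) {
            fprintf(stderr, "MPI_THREAD_MULTIPLE not available\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        peer = 1 - rank;   /* assumes exactly two ranks (mpirun -np 2) */

        pthread_t th[NTHREADS];
        for (long t = 0; t < NTHREADS; t++)
            pthread_create(&th[t], NULL, worker, (void *)t);
        for (int t = 0; t < NTHREADS; t++)
            pthread_join(th[t], NULL);

        MPI_Finalize();
        return 0;
    }

Because every receive names its source and tag, an implementation is free to match it without consulting the shared wildcard queues (for example, via a hash lookup keyed on source and tag); this is the kind of optimization that the avoidance of wildcards enables.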


Published In

EuroMPI '16: Proceedings of the 23rd European MPI Users' Group Meeting
September 2016
225 pages
ISBN: 9781450342346
DOI: 10.1145/2966884
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 September 2016

Author Tags

  1. MPI
  2. Message Passing Interface
  3. communication
  4. concurrent execution
  5. multi-threading
  6. runtime system

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EuroMPI 2016
EuroMPI 2016: The 23rd European MPI Users' Group Meeting
September 25 - 28, 2016
Edinburgh, United Kingdom

Acceptance Rates

Overall Acceptance Rate 66 of 139 submissions, 47%

Article Metrics

  • Downloads (last 12 months): 9
  • Downloads (last 6 weeks): 2
Reflects downloads up to 01 Sep 2024

Cited By

  • (2023) Improving the Scaling of an Asynchronous Many-Task Runtime with a Lightweight Communication Engine. Proceedings of the 52nd International Conference on Parallel Processing, pp. 153-162. DOI: 10.1145/3605573.3605642. Online publication date: 7-Aug-2023.
  • (2022) On-the-Fly Repairing of Atomicity Violations in ARINC 653 Software. Applied Sciences 12(4):2014. DOI: 10.3390/app12042014. Online publication date: 15-Feb-2022.
  • (2022) Improving Communication Asynchrony and Concurrency for Adaptive MPI Endpoints. 2022 IEEE/ACM International Workshop on Exascale MPI (ExaMPI), pp. 11-21. DOI: 10.1109/ExaMPI56604.2022.00007. Online publication date: Nov-2022.
  • (2021) CSPACER: A Reduced API Set Runtime for the Space Consistency Model. The International Conference on High Performance Computing in Asia-Pacific Region, pp. 58-68. DOI: 10.1145/3432261.3432272. Online publication date: 20-Jan-2021.
  • (2021) Plasticity-on-Chip Design: Exploiting Self-Similarity for Data Communications. IEEE Transactions on Computers 70(6):950-962. DOI: 10.1109/TC.2021.3071507. Online publication date: 1-Jun-2021.
  • (2021) MiniMod: A Modular Miniapplication Benchmarking Framework for HPC. 2021 IEEE International Conference on Cluster Computing (CLUSTER), pp. 12-22. DOI: 10.1109/Cluster48925.2021.00028. Online publication date: Sep-2021.
  • (2020) Communication and Timing Issues with MPI Virtualization. Proceedings of the 27th European MPI Users' Group Meeting, pp. 11-20. DOI: 10.1145/3416315.3416317. Online publication date: 21-Sep-2020.
  • (2020) Performance drop at executing communication-intensive parallel algorithms. The Journal of Supercomputing 76(9):6834-6859. DOI: 10.1007/s11227-019-03142-8. Online publication date: 1-Sep-2020.
  • (2020) Automatic Detection of MPI Assertions. High Performance Computing, pp. 34-42. DOI: 10.1007/978-3-030-59851-8_3. Online publication date: 22-Jun-2020.
  • (2019) MPI tag matching performance on ConnectX and ARM. Proceedings of the 26th European MPI Users' Group Meeting, pp. 1-10. DOI: 10.1145/3343211.3343224. Online publication date: 11-Sep-2019.
