DOI: 10.1145/2966884.2966914

Towards millions of communicating threads

Published: 25 September 2016

Abstract

In this paper we explore the advantages that accrue from avoiding the use of wildcards in MPI. We show that, with this change, one can efficiently support millions of concurrently communicating lightweight threads using send-receive communication.
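
To make the idea concrete, the sketch below (not taken from the paper) illustrates the communication pattern the abstract alludes to: many threads per rank, each receiving only from an explicitly named source rank with a deterministic per-thread tag, so MPI_ANY_SOURCE and MPI_ANY_TAG never enter the matching path. It assumes exactly two ranks, an MPI library providing MPI_THREAD_MULTIPLE, and POSIX threads standing in for the paper's lightweight threads; NTHREADS is an illustrative constant, not a parameter from the paper.

    #include <mpi.h>
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 8   /* illustrative thread count; the paper targets millions */

    static int rank, peer;

    /* Each thread exchanges one message with the thread of the same id on the
     * peer rank, using an explicit source rank and a per-thread tag: no wildcards. */
    static void *worker(void *arg)
    {
        int tid = (int)(long)arg;
        int sendbuf = rank * NTHREADS + tid, recvbuf = -1;

        MPI_Sendrecv(&sendbuf, 1, MPI_INT, peer, /* send tag */ tid,
                     &recvbuf, 1, MPI_INT, peer, /* recv tag */ tid,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        printf("rank %d thread %d received %d\n", rank, tid, recvbuf);
        return NULL;
    }

    int main(int argc, char **argv)
    {
        int provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE) {
            fprintf(stderr, "MPI_THREAD_MULTIPLE not available\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        peer = 1 - rank;   /* assumes exactly two ranks (mpirun -np 2) */

        pthread_t th[NTHREADS];
        for (long t = 0; t < NTHREADS; t++)
            pthread_create(&th[t], NULL, worker, (void *)t);
        for (int t = 0; t < NTHREADS; t++)
            pthread_join(th[t], NULL);

        MPI_Finalize();
        return 0;
    }

Because every receive names its source and tag, an implementation is free to match it without consulting the shared wildcard queues (for example, via a hash lookup keyed on source and tag); this is the kind of optimization that the avoidance of wildcards enables.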


Published In

EuroMPI '16: Proceedings of the 23rd European MPI Users' Group Meeting
September 2016
225 pages
ISBN: 9781450342346
DOI: 10.1145/2966884
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 September 2016

Author Tags

  1. MPI
  2. Message Passing Interface
  3. communication
  4. concurrent execution
  5. multi-threading
  6. runtime system

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EuroMPI 2016
EuroMPI 2016: The 23rd European MPI Users' Group Meeting
September 25 - 28, 2016
Edinburgh, United Kingdom

Acceptance Rates

Overall Acceptance Rate 66 of 139 submissions, 47%

Article Metrics

  • Downloads (last 12 months): 9
  • Downloads (last 6 weeks): 2
Reflects downloads up to 01 Sep 2024

Cited By

  • (2023) Improving the Scaling of an Asynchronous Many-Task Runtime with a Lightweight Communication Engine. Proceedings of the 52nd International Conference on Parallel Processing, pp. 153-162. DOI: 10.1145/3605573.3605642. Online publication date: 7-Aug-2023.
  • (2022) On-the-Fly Repairing of Atomicity Violations in ARINC 653 Software. Applied Sciences 12(4):2014. DOI: 10.3390/app12042014. Online publication date: 15-Feb-2022.
  • (2022) Improving Communication Asynchrony and Concurrency for Adaptive MPI Endpoints. 2022 IEEE/ACM International Workshop on Exascale MPI (ExaMPI), pp. 11-21. DOI: 10.1109/ExaMPI56604.2022.00007. Online publication date: Nov-2022.
  • (2021) CSPACER: A Reduced API Set Runtime for the Space Consistency Model. The International Conference on High Performance Computing in Asia-Pacific Region, pp. 58-68. DOI: 10.1145/3432261.3432272. Online publication date: 20-Jan-2021.
  • (2021) Plasticity-on-Chip Design: Exploiting Self-Similarity for Data Communications. IEEE Transactions on Computers 70(6):950-962. DOI: 10.1109/TC.2021.3071507. Online publication date: 1-Jun-2021.
  • (2021) MiniMod: A Modular Miniapplication Benchmarking Framework for HPC. 2021 IEEE International Conference on Cluster Computing (CLUSTER), pp. 12-22. DOI: 10.1109/Cluster48925.2021.00028. Online publication date: Sep-2021.
  • (2020) Communication and Timing Issues with MPI Virtualization. Proceedings of the 27th European MPI Users' Group Meeting, pp. 11-20. DOI: 10.1145/3416315.3416317. Online publication date: 21-Sep-2020.
  • (2020) Performance drop at executing communication-intensive parallel algorithms. The Journal of Supercomputing 76(9):6834-6859. DOI: 10.1007/s11227-019-03142-8. Online publication date: 1-Sep-2020.
  • (2020) Automatic Detection of MPI Assertions. High Performance Computing, pp. 34-42. DOI: 10.1007/978-3-030-59851-8_3. Online publication date: 22-Jun-2020.
  • (2019) MPI tag matching performance on ConnectX and ARM. Proceedings of the 26th European MPI Users' Group Meeting, pp. 1-10. DOI: 10.1145/3343211.3343224. Online publication date: 11-Sep-2019.
