A general and efficient divide-and-conquer algorithm framework for multi-core clusters

Authors:

Carlos H. González,

Basilio B. Fraguela

Cluster Computing, Volume 20, Issue 3

Pages 2605 - 2626

https://doi.org/10.1007/s10586-017-0766-y

Published: 01 September 2017

Abstract

Divide-and-conquer is one of the most important patterns of parallelism and applies to a wide variety of problems. The most powerful parallel systems available today are clusters of distributed-memory nodes, each containing a growing number of cores that share a common memory. Exploiting these systems optimally often requires a hybrid model that mirrors the underlying hardware by combining a distributed-memory and a shared-memory parallel programming model, which lengthens development times and raises maintenance costs. In this paper we present a very general skeleton library that makes it possible to parallelize any divide-and-conquer problem on hybrid distributed-shared memory systems with little effort, while providing considerable flexibility and good performance. Our proposal combines a message-passing paradigm at the process level with a threaded model inside each process, hiding the related complexity from the user. The evaluation shows that the skeleton delivers performance comparable to, and often better than, that of manually optimized codes, while requiring considerably less effort when parallelizing applications on multi-core clusters.
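
To make the skeleton idea concrete, the following is a minimal sketch of a divide-and-conquer skeleton driven by four user-supplied functions. It is not the library described in the paper: the names `dac`, `parallel_dac`, `is_base`, `base`, `divide`, and `combine` are hypothetical, the sketch uses threads only (no message passing between processes), and Python threads are used here only to illustrate the structure, since they do not speed up CPU-bound work in the standard interpreter.

```python
from concurrent.futures import ThreadPoolExecutor


def dac(problem, is_base, base, divide, combine):
    """Sequential divide-and-conquer driven by four user-supplied functions."""
    if is_base(problem):
        return base(problem)
    parts = divide(problem)
    return combine([dac(p, is_base, base, divide, combine) for p in parts])


def parallel_dac(problem, is_base, base, divide, combine, workers=4):
    """Divide once, then solve each top-level subproblem in a worker thread.

    Each subproblem is solved sequentially inside its thread, so no task is
    ever submitted from within the pool (this avoids nested-pool deadlocks).
    """
    if is_base(problem):
        return base(problem)
    parts = divide(problem)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(
            pool.map(lambda p: dac(p, is_base, base, divide, combine), parts)
        )
    return combine(results)


def split(xs):
    """Divide step for merge sort: cut the list into two halves."""
    mid = len(xs) // 2
    return [xs[:mid], xs[mid:]]


def merge(halves):
    """Combine step for merge sort: merge two sorted lists."""
    left, right = halves
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out + left[i:] + right[j:]


def merge_sort(xs):
    """Merge sort expressed as an instance of the hypothetical skeleton."""
    return parallel_dac(
        xs,
        is_base=lambda v: len(v) <= 1,
        base=lambda v: v,
        divide=split,
        combine=merge,
    )


if __name__ == "__main__":
    print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

The user writes only the problem-specific pieces (`split`, `merge`, and the base-case test), while the skeleton owns the recursion and the parallel execution. A hybrid version in the spirit of the abstract would additionally distribute top-level subproblems across processes by message passing.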

Index Terms

A general and efficient divide-and-conquer algorithm framework for multi-core clusters
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Parallel programming languages

Index terms have been assigned to the content through auto-classification.

Published In

Cluster Computing Volume 20, Issue 3

September 2017

926 pages

ISSN:1386-7857

Copyright © 2017 Springer Science+Business Media, LLC.

Publisher

Kluwer Academic Publishers

United States
