Towards a compiler analysis for parallel algorithmic skeletons

Authors: Tobias J. K. Edler von Koch, Stanislav Manilov, Christos Vasiladiotis, Björn Franke

CC '18: Proceedings of the 27th International Conference on Compiler Construction

Pages 174 - 184

https://doi.org/10.1145/3178372.3179513

Published: 24 February 2018

Abstract

Parallelizing compilers aim to detect data-parallel loops in sequential programs, which, after suitable transformation, can be safely and profitably executed in parallel. However, in the traditional model, safe parallelization requires provable absence of dependences. At the same time, several well-known parallel algorithmic skeletons cannot be easily expressed in a data dependence framework due to spurious dependences, which prevent parallel execution. In this paper we argue that commutativity is a more suitable concept supporting formal characterization of parallel algorithmic skeletons. We show that existing commutativity definitions cannot be easily adapted for practical use, and develop a new concept of commutativity based on liveness, which readily integrates with existing compiler analyses. This enables us to develop formal definitions of parallel algorithmic skeletons such as task farms, MapReduce and Divide&Conquer. We show that existing informal characterizations of various parallel algorithmic skeletons are captured by our abstract formalizations. In this way we provide the urgently needed formal characterization of widely used parallel constructs, allowing their immediate use in novel parallelizing compilers.
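
The idea of judging commutativity only on values that are live after a loop can be illustrated with a small brute-force sketch. This is not the paper's formal definition or analysis: the function names `run_loop` and `commutes_on`, the toy loop body, and the exhaustive check over all iteration orders are assumptions made purely for illustration. The loop below carries dependences through `hist` and `last`, so a dependence-based test would reject it, yet every iteration order produces the same `hist`.

```python
from itertools import permutations


def run_loop(order, data):
    """Execute the toy loop body once per index in `order`; return the final state."""
    hist = {}    # shared accumulator: read and written by every iteration
    last = None  # overwritten by every iteration
    for i in order:
        key = data[i] % 3
        last = key                        # output dependence between iterations
        hist[key] = hist.get(key, 0) + 1  # loop-carried dependence through hist
    return {"hist": hist, "last": last}


def commutes_on(data, live_out):
    """Hypothetical check: do all iteration orders agree on the live-out variables?"""
    outcomes = []
    for order in permutations(range(len(data))):
        state = run_loop(order, data)
        # Compare only the variables assumed to be live after the loop.
        outcomes.append({name: state[name] for name in live_out})
    return all(outcome == outcomes[0] for outcome in outcomes)


if __name__ == "__main__":
    data = [4, 7, 1, 9, 3]
    print(commutes_on(data, ["hist"]))          # only hist is read after the loop
    print(commutes_on(data, ["hist", "last"]))  # last is also read after the loop
```

If only `hist` is assumed live after the loop, the iterations commute despite the dependences; once `last` is also treated as live, the result depends on which iteration runs last and the check fails. Enumerating permutations is only feasible for tiny inputs and stands in for what a static compiler analysis would have to establish symbolically.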

Index Terms

Towards a compiler analysis for parallel algorithmic skeletons
1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published In

CC '18: Proceedings of the 27th International Conference on Compiler Construction

February 2018

206 pages

ISBN:9781450356442

DOI:10.1145/3178372

General Chair: Christophe Dubach, University of Edinburgh, UK

Program Chair: Jingling Xue, University of New South Wales, Australia

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

In-Cooperation

IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

EPSRC

Conference

CGO '18: 16th Annual IEEE/ACM International Symposium on Code Generation and Optimization

February 24 - 25, 2018

Vienna, Austria

Cited By

Franke B, Li Z, Morton M, Steuwer M (2024). Collection skeletons. Journal of Systems and Software 213:C. Online publication date: 17-Jul-2024.
https://dl.acm.org/doi/10.1016/j.jss.2024.112042
Martínez M, Fraguela B, Cabaleiro J (2022). A highly optimized skeleton for unbalanced and deep divide-and-conquer algorithms on multi-core clusters. The Journal of Supercomputing 78:8, pages 10434-10454. Online publication date: 1-May-2022.
https://dl.acm.org/doi/10.1007/s11227-021-04259-5
Ranjan N, Shang Z, Krishnan S, Elmore A (2021). Version Reconciliation for Collaborative Databases. Proceedings of the ACM Symposium on Cloud Computing, pages 473-488. Online publication date: 1-Nov-2021.
https://dl.acm.org/doi/10.1145/3472883.3486980
Vasiladiotis C, Lozano R, Cole M, Franke B, Lee J (2021). Loop parallelization using dynamic commutativity analysis. Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization, pages 150-161. Online publication date: 27-Feb-2021.
https://dl.acm.org/doi/10.1109/CGO51591.2021.9370319
Martínez M, Fraguela B, Cabaleiro J (2021). A Parallel Skeleton for Divide-and-conquer Unbalanced and Deep Problems. International Journal of Parallel Programming 49:6, pages 820-845. Online publication date: 1-Dec-2021.
https://dl.acm.org/doi/10.1007/s10766-021-00709-y
