Research article
DOI: 10.1145/3178372.3179513

Towards a compiler analysis for parallel algorithmic skeletons

Published: 24 February 2018

Abstract

Parallelizing compilers aim to detect data-parallel loops in sequential programs which, after suitable transformation, can be safely and profitably executed in parallel. However, in the traditional model, safe parallelization requires provable absence of dependences. At the same time, several well-known parallel algorithmic skeletons cannot be easily expressed in a data dependence framework due to spurious dependences, which prevent parallel execution. In this paper we argue that commutativity is a more suitable concept supporting formal characterization of parallel algorithmic skeletons. We show that existing commutativity definitions cannot be easily adapted for practical use, and develop a new concept of commutativity based on liveness, which readily integrates with existing compiler analyses. This enables us to develop formal definitions of parallel algorithmic skeletons such as task farms, MapReduce and Divide&Conquer. We show that existing informal characterizations of various parallel algorithmic skeletons are captured by our abstract formalizations. In this way we provide the urgently needed formal characterization of widely used parallel constructs, allowing their immediate use in novel parallelizing compilers.
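
As a rough illustration of the idea, and not the paper's formal definition, the following Python sketch treats loop iterations as commuting when every execution order leaves the same values in the variables that are live after the loop. All names here (`body`, `run`, `commutes_on_live_out`, the `scratch` and `hist` variables) are hypothetical and chosen only for this example. A dependence-based analysis would see cross-iteration dependences through both `scratch` and `hist`, yet the order of iterations is irrelevant as long as only `hist` is live afterwards.

```python
from itertools import permutations


def body(state, x):
    """One loop iteration: update a histogram through a shared scratch variable."""
    state["scratch"] = x % 3  # written by every iteration: a cross-iteration dependence
    key = state["scratch"]
    state["hist"][key] = state["hist"].get(key, 0) + 1


def run(order):
    """Execute the loop body sequentially in the given iteration order."""
    state = {"scratch": None, "hist": {}}
    for x in order:
        body(state, x)
    return state


def commutes_on_live_out(items, live_out):
    """Brute-force check (assumption: small inputs only): every iteration
    order must produce the same value for each live-out variable."""
    reference = run(items)
    return all(
        all(run(p)[v] == reference[v] for v in live_out)
        for p in permutations(items)
    )


items = [4, 7, 9, 10]
# Only the histogram is live after the loop: iteration order does not matter.
print(commutes_on_live_out(items, ["hist"]))             # True
# If the scratch variable were also live, the final value depends on the order.
print(commutes_on_live_out(items, ["hist", "scratch"]))  # False
```

The brute-force enumeration of permutations is only a testing device for this sketch; a compiler analysis would have to establish the same property statically.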


    Published In

    CC '18: Proceedings of the 27th International Conference on Compiler Construction
    February 2018
    206 pages
    ISBN: 9781450356442
    DOI: 10.1145/3178372

    In-Cooperation

    • IEEE-CS: Computer Society

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Commutativity
    2. algorithmic skeletons
    3. commutativity analysis
    4. parallelism

    Funding Sources

    • EPSRC

    Conference

    CGO '18

    Cited By

    • (2024) Collection skeletons. Journal of Systems and Software, 213:C. DOI: 10.1016/j.jss.2024.112042. Online publication date: 17-Jul-2024.
    • (2022) A highly optimized skeleton for unbalanced and deep divide-and-conquer algorithms on multi-core clusters. The Journal of Supercomputing, 78:8 (10434-10454). DOI: 10.1007/s11227-021-04259-5. Online publication date: 1-May-2022.
    • (2021) Version Reconciliation for Collaborative Databases. Proceedings of the ACM Symposium on Cloud Computing (473-488). DOI: 10.1145/3472883.3486980. Online publication date: 1-Nov-2021.
    • (2021) Loop parallelization using dynamic commutativity analysis. Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization (150-161). DOI: 10.1109/CGO51591.2021.9370319. Online publication date: 27-Feb-2021.
    • (2021) A Parallel Skeleton for Divide-and-conquer Unbalanced and Deep Problems. International Journal of Parallel Programming, 49:6 (820-845). DOI: 10.1007/s10766-021-00709-y. Online publication date: 1-Dec-2021.
