Finding Synchronization-Free Slices of Operations in Arbitrarily Nested Loops

Authors:

Wlodzimierz Bielecki,

Krzysztof Siedlecki,

Pierluigi San Pietro

ICCSA '08: Proceedings of the international conference on Computational Science and Its Applications, Part II

Pages 871 - 886

https://doi.org/10.1007/978-3-540-69848-7_69

Published: 30 June 2008

Abstract

This paper presents a new approach for extracting synchronization-free parallelism, represented by dependent statement instances of an arbitrarily nested loop. The presented algorithms can be applied to both uniform and non-uniform loops. The main advantage is that more synchronization-free parallelism may be extracted than with existing techniques. Our approach, based on operations on relations and sets, requires exact dependence analysis, such as that of Pugh and Wonnacott, where dependences are found in the form of tuple relations. Results of experiments with the NAS benchmark are presented.
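To make the idea of a synchronization-free slice concrete, the following is a minimal sketch and not the paper's algorithm, which operates symbolically on tuple relations. It assumes the dependence relation of a small loop has already been enumerated as explicit (source, destination) pairs of statement instances, and it groups instances into slices by finding the weakly connected components of the dependence graph. The function name `sync_free_slices` and both example loops are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def sync_free_slices(instances, dependences):
    """Partition statement instances so that no dependence crosses slices.

    Assumption: `dependences` is an explicit list of (source, destination)
    pairs. Each slice is a weakly connected component of the dependence graph.
    """
    parent = {inst: inst for inst in instances}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for src, dst in dependences:
        # Ignore pairs that leave the iteration space.
        if src in parent and dst in parent:
            parent[find(src)] = find(dst)

    groups = defaultdict(list)
    for inst in instances:
        groups[find(inst)].append(inst)
    # Sorting each slice gives lexicographic (original) execution order.
    return sorted(sorted(group) for group in groups.values())


# Uniform example (hypothetical loop): a[i][j] = a[i][j-1] + 1,
# dependence relation {[i,j] -> [i,j+1] : 1 <= i <= 3, 1 <= j < 3}.
n = 3
uniform_instances = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1)]
uniform_deps = [((i, j), (i, j + 1)) for i in range(1, n + 1) for j in range(1, n)]
uniform_slices = sync_free_slices(uniform_instances, uniform_deps)

# Non-uniform example (hypothetical loop): a[2*i] = a[i] for 1 <= i <= 8,
# dependence relation {[i] -> [2i] : 1 <= i and 2i <= 8}.
nonuniform_instances = list(range(1, 9))
nonuniform_deps = [(i, 2 * i) for i in range(1, 9) if 2 * i <= 8]
nonuniform_slices = sync_free_slices(nonuniform_instances, nonuniform_deps)

print(uniform_slices)
print(nonuniform_slices)
```

In the uniform example each row of the iteration space forms its own slice, so the rows can run on separate processors without synchronization as long as each slice executes its instances in the original order. The non-uniform example yields slices of different sizes, which illustrates why an exact dependence relation, rather than a distance vector, is needed in that case.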

References

[1]

Allen, R., Kennedy, K.: Optimizing Compilers for Modern Architectures, p. 790. Morgan Kaufmann, San Francisco (2001)

[2]

Amarasinghe, S.P., Lam, M.S.: Communication optimization and code generation for distributed memory machines. In: Proceedings of the SIGPLAN 1993, pp. 126-138 (1993)

[3]

Ancourt, C., Irigoin, F.: Scanning polyhedra with do loops. In: Proc. of the Third ACM/SIGPLAN Symp. on Principles and Practice of Parallel Programming, pp. 39-50. ACM Press, New York (1991)

[4]

Banerjee, U.: Unimodular transformations of double loops. In: Proceedings of the Third Workshop on Languages and Compilers for Parallel Computing, pp. 192-219 (1990)

[5]

Bastoul, C., Cohen, A., Girbal, S., Sharma, S., Temam, O.: Putting polyhedral loop transformations to work. In: LCPC 16 International Workshop on Languages and Compilers for Parallel Computing. LNCS, vol. 2958, pp. 209-225. College Station (September 2003)

[6]

Bastoul, C.: Code Generation in the Polyhedral Model Is Easier Than You Think. In: Proceedings of the PACT 13 IEEE International Conference on Parallel Architecture and Compilation Techniques, Juan-les-Pins, pp. 7-16 (2004)

[7]

Beletska, A., Bielecki, W., San Pietro, P.: Extracting Synchronization-Free Slices of Operations in Perfectly-Nested Loops. In: Proceedings of PDCS 2007 (2007)

[8]

Boulet, P., Darte, A., Silber, G.A., Vivien, F.: Loop parallelization algorithms: from parallelism extraction to code generation. Parallel Computing 24, 421-444 (1998)

[9]

Cohen, A., Girbal, S., Temam, O.: A polyhedral approach to ease the composition of program transformations. In: Danelutto, M., Vanneschi, M., Laforenza, D. (eds.) Euro-Par 2004. LNCS, vol. 3149, pp. 292-303. Springer, Heidelberg (2004)

[10]

Darte, A., Robert, Y., Vivien, F.: Scheduling and Automatic Parallelization. Birkhäuser Boston (2000)

[11]

Feautrier, P.: Some efficient solutions to the affine scheduling problem, part i, one dimensional time. International Journal of Parallel Programming 21, 313-348 (1992)

[12]

Feautrier, P.: Some efficient solutions to the affine scheduling problem, part ii, multidimensional time. International Journal of Parallel Programming 21, 389-420 (1992)

[13]

Feautrier, P.: Toward automatic distribution. Journal of Parallel Processing Letters 4, 233-244 (1994)

[14]

Gavaldà, R., Ayguadé, E., Torres, J.: Obtaining Synchronization-Free Code with Maximum Parallelism, Technical Report LSI-96-23-R, Universitat Politècnica de Catalunya (1996)

[15]

Griebl, M., Lengauer, C.: Classifying Loops for Space-Time Mapping. In: Proceedings of the Euro-Par. LNCS, pp. 467-474. Springer, Heidelberg (1996)

[16]

Huang, C., Sadayappan, P.: Communication-free hyperplane partitioning of nested loops. Journal of Parallel and Distributed Computing 19, 90-102 (1993)

[17]

Kelly, W., Pugh, W., Rosser, E., Shpeisman, T.: Transitive Closure of Infinite Graphs and its Applications. International Journal of Parallel Programming 24(6), 579-598 (1996)

[18]

Kelly, W., Pugh, W.: Minimizing communication while preserving parallelism. In: Proc. of the 1996 ACM International Conference on Supercomputing, pp. 52-60 (1996)

[19]

Kelly, W., Maslov, V., Pugh, W., Rosser, E., Shpeisman, T., Wonnacott, D.: The omega library interface guide, Technical Report CS-TR-3445, University of Maryland (1995)

[20]

Lim, W., Lam, M.S.: Communication-free parallelization via affine transformations. In: Proc. of the 7th workshop on languages and compilers for parallel computing, pp. 92-106 (1994)

[21]

Lim, W., Cheong, G.I., Lam, M.S.: An affine partitioning algorithm to maximize parallelism and minimize communication. In: Proceedings of the 13th ACM SIGARCH International Conference on Supercomputing (1999)

[22]

Lim, W., Liao, S.W., Lam, M.: Blocking and Array Contraction Across Arbitrarily Nested Loops Using Affine Partitioning. In: Proceedings of ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (2001)

[23]

Pugh, W., Wonnacott, D.: Constraint-based array dependence analysis. ACM Trans. on Programming Languages and Systems (1998)

[24]

Pugh, W., Rosser, E.: Iteration Space Slicing and Its Application to Communication Optimization. In: Proc. of the International Conf. on Supercomputing, pp. 221-228 (1997)

[25]

Quillere, F., Rajopadhye, S., Wilde, D.: Generation of efficient nested loops from polyhedra. International Journal of Parallel Programming 28 (2000)

[26]

Weiser, M.: Program slices: formal, psychological, and practical investigations of an automatic program abstraction method, PhD thesis, University of Michigan, Ann Arbor, MI (1979)

[27]

Weiser, M.: Program Slicing. IEEE Transactions on Software Engineering SE-10(7), 352-357 (1984)

[28]

Wolf, M.E.: Improving locality and parallelism in nested loops, Ph.D. Dissertation CSLTR-92-538, Stanford University, Dept. Computer Science (1992)

[29]

Vasilache, N., Bastoul, C., Cohen, A.: Polyhedral code generation in the real world. In: Proceedings of the International Conference on Compiler Construction (ETAPS CC 2006). LNCS, pp. 185-201. Springer, Vienna (2006)

[30]

Netlib Repository at UTK and ORNL, http://www.netlib.org/benchmark/livermorec

[31]

http://www.nas.nasa.gov

Cited By

Wlodzimierz B, Tomasz K, Marek P, Beletska A (2010). An iterative algorithm of computing the transitive closure of a union of parameterized affine integer tuple relations. Proceedings of the 4th international conference on Combinatorial optimization and applications - Volume Part I, pp. 104-113. DOI: 10.5555/1940390.1940400. Online publication date: 18-Dec-2010
https://dl.acm.org/doi/10.5555/1940390.1940400

Beletska A, Bielecki W, Cohen A, Palkowski M (2009). Synchronization-Free automatic parallelization. Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing, pp. 233-246. DOI: 10.1007/978-3-642-13374-9_16. Online publication date: 8-Oct-2009
https://dl.acm.org/doi/10.1007/978-3-642-13374-9_16

Published In

ICCSA '08: Proceedings of the international conference on Computational Science and Its Applications, Part II

June 2008

1273 pages

ISBN:9783540698401

Publisher

Springer-Verlag

Berlin, Heidelberg
