Open access

Static analysis of upper and lower bounds on dependences and parallelism

Published: 01 July 1994

Abstract

Existing compilers often fail to parallelize sequential code, even when a program can be manually transformed into parallel form by a sequence of well-understood transformations (as is the case for many of the Perfect Club Benchmark programs). These failures can occur for several reasons: the code transformations implemented in the compiler may not be sufficient to produce parallel code, the compiler may not find the proper sequence of transformations, or the compiler may not be able to prove that one of the necessary transformations is legal.
When a compiler fails to extract sufficient parallelism from a program, the programmer may try to extract additional parallelism. Unfortunately, the programmer is typically left to search for parallelism without significant assistance. The compiler generally does not give feedback about which parts of the program might contain additional parallelism, or about the types of transformations that might be needed to realize this parallelism. Standard program transformations and dependence abstractions cannot be used to provide this feedback.
In this paper, we propose a two-step approach to the search for parallelism in sequential programs. In the first step, we construct several sets of constraints that describe, for each statement, which iterations of that statement can be executed concurrently. By constructing constraints that correspond to different assumptions about which dependences might be eliminated through additional analysis, transformations, and user assertions, we can determine whether we can expose parallelism by eliminating dependences. In the second step of our search for parallelism, we examine these constraint sets to identify the kinds of transformations needed to exploit scalable parallelism. Our tests will identify conditional parallelism and parallelism that can be exposed by combinations of transformations that reorder the iteration space (such as loop interchange and loop peeling).
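To make the first step concrete, here is a minimal sketch (in plain Python, not the Presburger-arithmetic/Omega-test machinery the paper actually uses) for a hypothetical loop "for i = 1 to N: a(i) = a(i-1) + b(i)": it builds one dependence set assuming the flow dependence on a holds and one assuming further analysis or a user assertion eliminates it, then reads off which iteration pairs may execute concurrently under each assumption.

    # Illustrative sketch only; the loop and the brute-force enumeration are
    # hypothetical stand-ins for the paper's symbolic constraint sets.
    N = 6

    def conservative_dep(i, j):
        # the write a(i) in iteration i reaches the read a(j-1) in iteration j
        return i < j and i == j - 1

    def optimistic_dep(i, j):
        # assume additional analysis or a user assertion removed the dependence
        return False

    def ordered_pairs(dep):
        return {(i, j) for i in range(1, N + 1)
                       for j in range(1, N + 1) if dep(i, j)}

    def transitive_closure(pairs):
        closed = set(pairs)
        changed = True
        while changed:
            changed = False
            for (a, b) in list(closed):
                for (c, d) in list(closed):
                    if b == c and (a, d) not in closed:
                        closed.add((a, d))
                        changed = True
        return closed

    def concurrent_pairs(dep):
        # two iterations may run concurrently only if neither must precede
        # the other, even transitively
        order = transitive_closure(ordered_pairs(dep))
        return {(i, j) for i in range(1, N + 1) for j in range(i + 1, N + 1)
                if (i, j) not in order and (j, i) not in order}

    print("with the dependence:  ", sorted(concurrent_pairs(conservative_dep)))
    print("dependence eliminated:", sorted(concurrent_pairs(optimistic_dep)))

Under the conservative assumption no two iterations can run concurrently; under the optimistic one every pair can, which is exactly the gap the constraint sets are meant to expose.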
This approach lets us distinguish inherently sequential code from code that contains unexploited parallelism. It also produces information about the kinds of transformations needed to parallelize the code, without worrying about the order of application of the transformations. Furthermore, when our dependence test is inexact we can identify which unresolved dependences inhibit parallelism by comparing the effects of assuming dependence or independence. We are currently exploring the use of this information in programmer-assisted parallelization.
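As a rough illustration of the last point (a hypothetical sketch, not the paper's algorithm): suppose the dependence test cannot decide whether references a(2*i) and a(2*j+1) ever touch the same element. Comparing the parallelism obtained when the dependence is assumed to exist with the parallelism obtained when it is assumed away bounds the loop's parallelism from below and above, and a gap between the two bounds flags that unresolved dependence as the one inhibiting parallelism.

    # Hypothetical example, not taken from the paper.
    N = 100

    def collides(i, j):
        # the question an inexact dependence test might fail to answer
        # symbolically: can a(2*i) and a(2*j + 1) name the same element?
        # (Never: one subscript is even, the other odd.)
        return 2 * i == 2 * j + 1

    maybe_dependent = any(collides(i, j)
                          for i in range(1, N + 1) for j in range(1, N + 1))

    def parallelism(assume_dependence):
        # one statement, N iterations: serial if the dependence is assumed,
        # fully parallel if it is assumed away
        return 1 if assume_dependence else N

    lower_bound = parallelism(True)             # all unresolved dependences kept
    upper_bound = parallelism(maybe_dependent)  # after resolving this one exactly

    if upper_bound > lower_bound:
        print("this unresolved dependence inhibits parallelism:",
              lower_bound, "vs", upper_bound, "concurrent iterations")
    else:
        print("the loop is inherently sequential regardless of this dependence")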


    Reviews

    Benjamin Rayborn Seyfarth

    The authors present a two-step approach to searching source code for possible parallelism. The basic problem is to accurately compute array data dependences. Existing compilers employ conservative algorithms and sometimes fail to detect parallelism, leaving it to the programmer to study the source code, typically without assistance from the compiler. The first step of the authors' algorithm is to construct constraints that describe which iterations of a statement may be executed concurrently. Following this, their algorithm searches for program transformations that provide parallelism. In some cases where the algorithm cannot find a parallelizing transformation, it suggests to the programmer what might be possible if certain apparent dependences could be ignored. The authors discuss the effectiveness of their algorithm on several short code segments and on one routine from the Perfect Club benchmarks. In addition, they present timings that illustrate that the computation of memory- and value-based dependences is both valuable and practical. The authors present a useful discussion of the fundamentals of dependence analysis along with the discussion of their algorithm. They also provide an extensive bibliography of related work. The paper is moderately long and discusses difficult material. It is directed toward researchers and compiler writers interested in automatic parallelization.
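
    A hedged aside on the memory- versus value-based distinction mentioned above, using a hypothetical loop rather than one from the paper: in "for i = 1 to N: t = a(i); b(i) = t*t", every iteration writes and then reads the scalar t, so memory-based analysis reports cross-iteration dependences on t, while value-based analysis observes that each read of t receives the value written in the same iteration; no value crosses iterations, and privatizing t exposes the parallelism.

        # Hypothetical illustration; the dependence sets are written out by
        # hand rather than computed by the paper's analysis.
        N = 4

        def memory_based_deps():
            # every pair of iterations touches the same memory cell t
            return {(i, j) for i in range(1, N + 1) for j in range(i + 1, N + 1)}

        def value_based_deps():
            # the value each iteration reads from t was written in that same
            # iteration, so no dependence crosses iterations
            return set()

        print("memory-based:", sorted(memory_based_deps()))
        print("value-based: ", sorted(value_based_deps()))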

    Published In

    ACM Transactions on Programming Languages and Systems, Volume 16, Issue 4
    July 1994
    318 pages
    ISSN: 0164-0925
    EISSN: 1558-4593
    DOI: 10.1145/183432

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Omega test
    2. Presburger arithmetic
    3. array data-dependence analysis
    4. automatic parallelization
    5. compilation
    6. dependence relation
    7. optimization
