Optimizing Supercompilers for Supercomputers
November 1990
Publisher: MIT Press, 55 Hayward St., Cambridge, MA, United States
ISBN: 978-0-262-73082-2
Published: 01 November 1990
Pages: 176
Abstract

From the Publisher:

Effective use of a supercomputer requires users to have a good algorithm and to express this algorithm in an appropriate language, and requires compilers to generate efficient code.

This book investigates several problems facing compiler design for supercomputers, including building efficient and comprehensive data dependence graphs, handling recurrence relations, managing compiler temporary variables, and executing WHILE loops.

The book first proposes an efficient means of representing the flow of data in a program by labeling the arcs in a data dependence graph with "direction vectors" to show how the flow of data corresponds to the loop structure of the program. These data dependence direction vectors are then used in several high level compiler loop optimizations: loop vectorization, loop concurrentization, loop fusion, and loop interchanging. The book shows how to perform these transformations and how to use them to optimize programs for a wide range of supercomputers.
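As a rough illustration of the direction-vector idea described above, the following Python sketch finds dependences in a small two-deep loop nest by brute-force enumeration and labels each one with a direction vector. It is not the book's algorithm: the function names (`direction_vectors`, `interchange_is_legal`), the representation of array subscripts as Python lambdas, and the exhaustive search are all assumptions made for this example. It relies on the standard convention that `<` means the source iteration has a smaller loop index than the sink, and on the standard rule that a loop interchange is illegal if it turns the leftmost non-`=` entry of any direction vector into `>`.

```python
from itertools import product


def sign(delta):
    """Map an index difference (sink minus source) to a direction symbol."""
    return "<" if delta > 0 else (">" if delta < 0 else "=")


def direction_vectors(bounds, write, read):
    """Brute-force the loop-carried direction vectors of one statement.

    bounds: list of (lo, hi) half-open ranges, outermost loop first.
    write, read: hypothetical helpers mapping loop indices to the array
    element written and read by the statement in that iteration.
    """
    iterations = list(product(*(range(lo, hi) for lo, hi in bounds)))
    vectors = set()
    for src, dst in product(iterations, repeat=2):
        if src >= dst:  # tuple comparison is lexicographic: keep src before dst
            continue
        flow = write(*src) == read(*dst)     # write, then later read
        anti = read(*src) == write(*dst)     # read, then later write
        output = write(*src) == write(*dst)  # two writes to one element
        if flow or anti or output:
            vectors.add(tuple(sign(b - a) for a, b in zip(src, dst)))
    return vectors


def interchange_is_legal(vectors, p, q):
    """Check whether swapping loops p and q keeps every dependence forward."""
    for vec in vectors:
        swapped = list(vec)
        swapped[p], swapped[q] = swapped[q], swapped[p]
        first = next((c for c in swapped if c != "="), "=")
        if first == ">":
            return False
    return True


# A[i][j] = A[i-1][j]: dependence carried by the outer loop only.
down = direction_vectors([(0, 4), (0, 4)],
                         lambda i, j: (i, j), lambda i, j: (i - 1, j))
# A[i][j] = A[i-1][j+1]: dependence runs diagonally across both loops.
diag = direction_vectors([(0, 4), (0, 4)],
                         lambda i, j: (i, j), lambda i, j: (i - 1, j + 1))

print(down, interchange_is_legal(down, 0, 1))
print(diag, interchange_is_legal(diag, 0, 1))
```

In the first nest the only vector is `(<, =)`, so the loops can be interchanged; in the second the vector is `(<, >)`, and interchanging would reverse the dependence. A production compiler would derive these vectors symbolically from the subscript expressions rather than by enumerating iterations.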

The problems of recurrence relations studied include arithmetic recurrences with IF statements and recurrences involving both data and control dependence relations in a cycle. The wavefront method of solving recurrences is also treated. The book discusses ways to make the problem of managing temporary arrays more tractable. It concludes by offering several methods for executing WHILE loops and describes a general structure of an optimizing compiler for supercomputers developed from the author's experience with a test bed compiler.
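The wavefront method mentioned above can be sketched on a small two-dimensional recurrence. The example below is a generic illustration, not the book's formulation: the recurrence `a[i][j] = a[i-1][j] + a[i][j-1]`, the boundary values of 1, and the function names are assumptions chosen for this sketch. Each element depends on its upper and left neighbours, so neither loop can run in parallel as written, but all elements on one anti-diagonal `i + j = k` are mutually independent and could be computed together.

```python
def sequential(n, m):
    """Reference version: plain row-by-row evaluation of the recurrence."""
    a = [[1] * m for _ in range(n)]
    for i in range(1, n):
        for j in range(1, m):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def wavefront(n, m):
    """Evaluate the same recurrence one anti-diagonal (wavefront) at a time."""
    a = [[1] * m for _ in range(n)]
    for k in range(2, n + m - 1):      # k = i + j for interior points
        lo = max(1, k - (m - 1))       # keep j = k - i within 1..m-1
        hi = min(n - 1, k - 1)
        # Every iteration of this inner loop reads only values from
        # wavefront k - 1, so the iterations are independent of each other.
        for i in range(lo, hi + 1):
            j = k - i
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


print(wavefront(4, 5) == sequential(4, 5))
```

The inner loop here still runs serially in Python; the point is only that its iterations carry no dependences, which is what a vectorizing or concurrentizing compiler needs.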

Michael Wolfe is Associate Professor in the Computer Science and Engineering Department at the Oregon Graduate Center. Optimizing Supercompilers for Supercomputers is included in the series Research Monographs in Parallel Computing. Copublished with Pitman Publishing.

Cited By

  1. Zheng B, Vijaykumar N and Pekhimenko G Echo Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture, (1089-1102)
  2. ACM
    Arabnejad H, Bispo J, Barbosa J and Cardoso J AutoPar-Clava Proceedings of the 9th Workshop and 7th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms, (13-19)
  3. ACM
    Kjolstad F, Kamil S, Chou S, Lugato D and Amarasinghe S (2017). The tensor algebra compiler, Proceedings of the ACM on Programming Languages, 1:OOPSLA, (1-29), Online publication date: 12-Oct-2017.
  4. Yin S, Lin X, Liu L and Wei S (2016). Exploiting Parallelism of Imperfect Nested Loops on Coarse-Grained Reconfigurable Architectures, IEEE Transactions on Parallel and Distributed Systems, 27:11, (3199-3213), Online publication date: 1-Nov-2016.
  5. ACM
    Kumar S, Srinivasan V, Sharifian A, Sumner N and Shriraman A Peruse and Profit Proceedings of the 2016 International Conference on Supercomputing, (1-13)
  6. ACM
    Anand K, Elwazeer K, Kotha A, Smithson M, Barua R and Keromytis A (2016). A Stack Memory Abstraction and Symbolic Analysis Framework for Executables, ACM Transactions on Software Engineering and Methodology, 25:2, (1-38), Online publication date: 16-May-2016.
  7. ACM
    Overbey J, Behrang F and Hafiz M A foundation for refactoring C with macros Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, (75-85)
  8. ACM
    Kennedy K and McKinley K Optimizing for parallelism and data locality ACM International Conference on Supercomputing 25th Anniversary Volume, (151-162)
  9. ACM
    Irigoin F, Jouvelot P and Triolet R Semantical interprocedural parallelization ACM International Conference on Supercomputing 25th Anniversary Volume, (143-150)
  10. ACM
    Venkat A, Shantharam M, Hall M and Strout M Non-affine Extensions to Polyhedral Code Generation Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, (185-194)
  11. ACM
    Venkat A, Shantharam M, Hall M and Strout M Non-affine Extensions to Polyhedral Code Generation Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, (185-194)
  12. ACM
    Benoit A, Çatalyürek Ü, Robert Y and Saule E (2013). A survey of pipelined workflow scheduling, ACM Computing Surveys, 45:4, (1-36), Online publication date: 1-Aug-2013.
  13. ACM
    Cui H, Yi Q, Xue J and Feng X (2013). Layout-oblivious compiler optimization for matrix computations, ACM Transactions on Architecture and Code Optimization, 9:4, (1-20), Online publication date: 1-Jan-2013.
  14. ACM
    Solano-Quinde L, Bode B and Somani A Techniques for the parallelization of unstructured grid applications on multi-GPU systems Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores, (140-147)
  15. Nicolau A and Kejariwal A How many threads to spawn during program multithreading? Proceedings of the 23rd international conference on Languages and compilers for parallel computing, (166-183)
  16. Aslam A and Hendren L McFLAT Proceedings of the 23rd international conference on Languages and compilers for parallel computing, (1-15)
  17. ACM
    Kejariwal A, Nicolau A, Banerjee U, Veidenbaum A and Polychronopoulos C Cache-aware partitioning of multi-dimensional iteration spaces Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference, (1-12)
  18. ACM
    Ryoo S, Rodrigues C, Baghsorkhi S, Stone S, Kirk D and Hwu W Optimization principles and application performance evaluation of a multithreaded GPU using CUDA Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, (73-82)
  19. Chen G and Kandemir M An Approach for Enhancing Inter-processor Data Locality on Chip Multiprocessors Transactions on High-Performance Embedded Architectures and Compilers I, (214-233)
  20. ACM
    Kennedy K, Koelbel C and Zima H The rise and fall of High Performance Fortran Proceedings of the third ACM SIGPLAN conference on History of programming languages, (7-1-7-22)
  21. ACM
    Agarwal S, Barik R, Sarkar V and Shyamasundar R May-happen-in-parallel analysis of X10 programs Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming, (183-193)
  22. Yang X, Du J, Yan X and Deng Y Matrix-Based programming optimization for improving memory hierarchy performance on imagine Proceedings of the 4th international conference on Parallel and Distributed Processing and Applications, (782-793)
  23. ACM
    Kejariwal A, Nicolau A, Saito H, Tian X, Girkar M, Banerjee U and Polychronopoulos C A general approach for partitioning N-dimensional parallel nested loops with conditionals Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures, (49-58)
  24. ACM
    Garay J and Huelsbergen L Software integrity protection using timed executable agents Proceedings of the 2006 ACM Symposium on Information, computer and communications security, (189-200)
  25. Shen X and Ding C Parallelization of utility programs based on behavior phase analysis Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing, (425-432)
  26. ACM
    Nikolik B Convergence debugging Proceedings of the sixth international symposium on Automated analysis-driven debugging, (89-98)
  27. ACM
    Chen G and Kandemir M Optimizing inter-processor data locality on embedded chip multiprocessors Proceedings of the 5th ACM international conference on Embedded software, (227-236)
  28. ACM
    Kejariwal A, Nicolau A, Banerjee U and Polychronopoulos C A novel approach for partitioning iteration spaces with variable densities Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming, (120-131)
  29. ACM
    Ziegler H and Hall M Evaluating heuristics in automatically mapping multi-loop applications to FPGAs Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays, (184-195)
  30. Arenaz M, Touriño J and Doallo R An inspector-executor algorithm for irregular assignment parallelization Proceedings of the Second international conference on Parallel and Distributed Processing and Applications, (4-15)
  31. Kejariwal A, D'Alberto P, Nicolau A and Polychronopoulos C A geometric approach for partitioning n-dimensional non-rectangular iteration spaces Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing, (102-116)
  32. Yi Q and Quinlan D Applying loop optimizations to object-oriented abstractions through general classification of array semantics Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing, (253-267)
  33. ACM
    Yi Q, Kennedy K, You H, Seymour K and Dongarra J Automatic blocking of QR and LU factorizations for locality Proceedings of the 2004 workshop on Memory system performance, (12-22)
  34. Martín M, Parada M and Doallo R (2004). High Performance Air Pollution Simulation Using OpenMP, The Journal of Supercomputing, 28:3, (311-321), Online publication date: 1-Jun-2004.
  35. Yi Q and Kennedy K (2004). Improving Memory Hierarchy Performance through Combined Loop Interchange and Multi-Level Fusion, International Journal of High Performance Computing Applications, 18:2, (237-253), Online publication date: 1-May-2004.
  36. ACM
    Burke M and Cytron R (2004). Interprocedural dependence analysis and parallelization, ACM SIGPLAN Notices, 39:4, (139-154), Online publication date: 1-Apr-2004.
  37. Zhai A, Colohan C, Steffan J and Mowry T Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
  38. Yi Q, Kennedy K and Adve V (2004). Transforming Complex Loop Nests for Locality, The Journal of Supercomputing, 27:3, (219-264), Online publication date: 1-Mar-2004.
  39. ACM
    Song L and Kavi K (2004). What can we gain by unfolding loops?, ACM SIGPLAN Notices, 39:2, (26-33), Online publication date: 1-Feb-2004.
  40. Martín M, Singh D, Mouriño J, Rivera F, Doallo R and Bruguera J (2003). High performance air pollution modeling for a power plant environment, Parallel Computing, 29:11-12, (1763-1790), Online publication date: 1-Nov-2003.
  41. ACM
    Goodwin D and Petkov D Automatic generation of application specific processors Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems, (137-147)
  42. Bukatov A Building the program parallelization system based on a very wide spectrum program transformation system Proceedings of the 2003 international conference on Computational science: PartII, (945-954)
  43. Dongarra J, Foster I, Fox G, Gropp W, Kennedy K, Torczon L and White A References Sourcebook of parallel computing, (729-789)
  44. Jost G and Hood R (2003). Relative Debugging of Automatically Parallelized Programs, Automated Software Engineering, 10:1, (75-101), Online publication date: 1-Jan-2003.
  45. ACM
    Pechtchanski I and Sarkar V Immutability specification and its applications Proceedings of the 2002 joint ACM-ISCOPE conference on Java Grande, (202-211)
  46. ACM
    So B, Hall M and Diniz P A compiler approach to fast hardware design space exploration in FPGA-based systems Proceedings of the ACM SIGPLAN 2002 conference on Programming language design and implementation, (165-176)
  47. Wonnacott D (2002). Achieving Scalable Locality with Time Skewing, International Journal of Parallel Programming, 30:3, (181-221), Online publication date: 1-Jun-2002.
  48. ACM
    So B, Hall M and Diniz P (2002). A compiler approach to fast hardware design space exploration in FPGA-based systems, ACM SIGPLAN Notices, 37:5, (165-176), Online publication date: 17-May-2002.
  49. Dang F, Yu H and Rauchwerger L The R-LRPD Test Proceedings of the 16th International Parallel and Distributed Processing Symposium
  50. Petkov D, Harr R and Amarasinghe S Efficient Pipelining of Nested Loops Proceedings of the 16th International Parallel and Distributed Processing Symposium
  51. Prechelt L and Hánßgen S (2002). Efficient Parallel Execution of Irregular Recursive Programs, IEEE Transactions on Parallel and Distributed Systems, 13:2, (167-178), Online publication date: 1-Feb-2002.
  52. Barua R, Lee W, Amarasinghe S and Agarawal A (2001). Compiler Support for Scalable and Efficient Memory Systems, IEEE Transactions on Computers, 50:11, (1234-1247), Online publication date: 1-Nov-2001.
  53. Sarkar V (2001). Optimized Unrolling of Nested Loops, International Journal of Parallel Programming, 29:5, (545-581), Online publication date: 1-Oct-2001.
  54. Seater R and Wonnacott D Polynomial time array dataflow analysis Proceedings of the 14th international conference on Languages and compilers for parallel computing, (411-426)
  55. ACM
    Sarkar V, Serrano M and Simons B Register-sensitive selection, duplication, and sequencing of instructions Proceedings of the 15th international conference on Supercomputing, (277-288)
  56. ACM
    Ryder B, Landi W, Stocks P, Zhang S and Altucher R (2001). A schema for interprocedural modification side-effect analysis with pointer aliasing, ACM Transactions on Programming Languages and Systems, 23:2, (105-186), Online publication date: 1-Mar-2001.
  57. Ben-Asher Y and Haber G (2001). Parallel Solutions of Simple Indexed Recurrence Equations, IEEE Transactions on Parallel and Distributed Systems, 12:1, (22-37), Online publication date: 1-Jan-2001.
  58. ACM
    Sarkar V Optimized unrolling of nested loops Proceedings of the 14th international conference on Supercomputing, (153-166)
  59. White R, Mueller F, Healy C, Whalley D and Harmon M (1999). Timing Analysis for Data and Wrap-Around Fill Caches, Real-Time Systems, 17:2-3, (209-233), Online publication date: 14-Dec-1999.
  60. Younis M, Marlowe T, Stoyen A and Tsai G (1999). Statically Safe Speculative Execution for Real-Time Systems, IEEE Transactions on Software Engineering, 25:5, (701-721), Online publication date: 1-Sep-1999.
  61. ACM
    Lim A, Cheong G and Lam M An affine partitioning algorithm to maximize parallelism and minimize communication Proceedings of the 13th international conference on Supercomputing, (228-237)
  62. Rauchwerger L and Padua D (1999). The LRPD Test, IEEE Transactions on Parallel and Distributed Systems, 10:2, (160-180), Online publication date: 1-Feb-1999.
  63. ACM
    Roth G and Kennedy K Loop fusion in high performance Fortran Proceedings of the 12th international conference on Supercomputing, (125-132)
  64. ACM
    Knobe K and Sarkar V Array SSA form and its use in parallelization Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, (107-120)
  65. ACM
    Debray S, Muth R and Weippert M Alias analysis of executable code Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, (12-24)
  66. ACM
    Roth G, Mellor-Crummey J, Kennedy K and Brickner R Compiling stencils in high performance Fortran Proceedings of the 1997 ACM/IEEE conference on Supercomputing, (1-20)
  67. Bhattacharyya S, Murthy P and Lee E Optimized software synthesis for synchronous dataflow Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors
  68. Murthy P, Bhattacharyya S and Lee E (1997). Joint Minimization of Code and Data for Synchronous DataflowPrograms, Formal Methods in System Design, 11:1, (41-70), Online publication date: 1-Jul-1997.
  69. ACM
    Megiddo N and Sarkar V Optimal weighted loop fusion for parallel programs Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures, (282-291)
  70. Park S and Koo M Detection of Implicit Parallelisms in the Task Parallel Language Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
  71. Chongwe S Simulation of aerodynamics problem on a distributed shared-memory machine Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
  72. Koo M, Park S, Yook H and Park M A transformation method to reduce loop overhead in HPF compiler Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
  73. Cho C, Shim J and Lee M A Loop Transformation for Maximizing Parallelism from Single Loops with Nonuniform Dependencies Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
  74. Koo M, Park S, Yook H and Park M A New Transformation Method to Generate Optimized DO Loop from FORALL Construct Proceedings of the 2nd AIZU International Symposium on Parallel Algorithms / Architecture Synthesis
  75. Manjikian N and Abdelrahman T (1997). Fusion of Loops for Parallelism and Locality, IEEE Transactions on Parallel and Distributed Systems, 8:2, (193-209), Online publication date: 1-Feb-1997.
  76. ACM
    Lim A and Lam M Maximizing parallelism and minimizing synchronization with affine transforms Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, (201-214)
  77. Beckmann C (1996). Generating Self-Scheduling Code for Nonloop Parallelism, Journal of Parallel and Distributed Computing, 39:2, (126-145), Online publication date: 15-Dec-1996.
  78. Gong C, Melhem R and Gupta R (1996). Loop Transformations for Fault Detection in Regular Loops on Massively Parallel Systems, IEEE Transactions on Parallel and Distributed Systems, 7:12, (1238-1249), Online publication date: 1-Dec-1996.
  79. Passos N and Sha E (1996). Achieving Full Parallelism Using Multidimensional Retiming, IEEE Transactions on Parallel and Distributed Systems, 7:11, (1150-1163), Online publication date: 1-Nov-1996.
  80. ACM
    Sarkar V and Simons B Anticipatory instruction scheduling Proceedings of the eighth annual ACM symposium on Parallel Algorithms and Architectures, (119-130)
  81. Chen Y, Wang S and Wang C (1996). Tiling Nested Loops into Maximal Rectangular Blocks, Journal of Parallel and Distributed Computing, 35:2, (123-132), Online publication date: 15-Jun-1996.
  82. Yang T and Ibarra O (1996). Performance Prediction in Symbolic Scheduling of Partitioned Programs with Weight Variation, Journal of Parallel and Distributed Computing, 34:2, (171-182), Online publication date: 1-May-1996.
  83. Bik A and Wijshoff H (1996). The Use of Iteration Space Partitioning to Construct Representative Simple Sections, Journal of Parallel and Distributed Computing, 34:1, (95-110), Online publication date: 10-Apr-1996.
  84. Rim M and Jain R (1996). Valid Transformations, IEEE Transactions on Parallel and Distributed Systems, 7:4, (399-410), Online publication date: 1-Apr-1996.
  85. Bik A and Wijshoff H (1996). Automatic Data Structure Selection and Transformation for Sparse Matrix Computations, IEEE Transactions on Parallel and Distributed Systems, 7:2, (109-126), Online publication date: 1-Feb-1996.
  86. ACM
    Yoshida A, Koshizuka K and Kasahara H Data-localization for Fortran macro-dataflow computation using partial static task assignment Proceedings of the 10th international conference on Supercomputing, (61-68)
  87. ACM
    Suganuma T, Komatsu H and Nakatani T Detection and global optimization of reduction operations for distributed parallel machines Proceedings of the 10th international conference on Supercomputing, (18-25)
  88. Chen X and Megson G (1995). A General Methodology of Partitioning and Mapping for Given Regular Arrays, IEEE Transactions on Parallel and Distributed Systems, 6:10, (1100-1107), Online publication date: 1-Oct-1995.
  89. ACM
    Ghuloum A and Fisher A (1995). Flattening and parallelizing irregular, recurrent loop nests, ACM SIGPLAN Notices, 30:8, (58-67), Online publication date: 1-Aug-1995.
  90. ACM
    Ghuloum A and Fisher A Flattening and parallelizing irregular, recurrent loop nests Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming, (58-67)
  91. Fernández A, Llabería J and Valero-García M (1995). Loop Transformation Using Nonunimodular Matrices, IEEE Transactions on Parallel and Distributed Systems, 6:8, (832-840), Online publication date: 1-Aug-1995.
  92. ACM
    Tang P and Gao N Vectorization beyond data dependences Proceedings of the 9th international conference on Supercomputing, (434-443)
  93. ACM
    Ohta H, Saito Y, Kainaga M and Ono H Optimal tile size adjustment in compiling general DOACROSS loop nests Proceedings of the 9th international conference on Supercomputing, (270-279)
  94. ACM
    Rauchwerger L, Amato N and Padua D Run-time methods for parallelizing partially parallel loops Proceedings of the 9th international conference on Supercomputing, (137-146)
  95. ACM
    Tseng C, Anderson J, Amarasinghe S and Lam M Unified compilation techniques for shared and distributed address space machines Proceedings of the 9th international conference on Supercomputing, (67-76)
  96. ACM
    Li W Compiler cache optimizations for banded matrix problems Proceedings of the 9th international conference on Supercomputing, (21-30)
  97. Youssef A Translation of serial recursive codes to parallel SIMD codes Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques, (254-263)
  98. Pineo P and Soffa M Practical approach to single assignment code Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques, (149-158)
  99. ACM
    Rauchwerger L and Padua D The LRPD test Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation, (218-232)
  100. ACM
    Rauchwerger L and Padua D (1995). The LRPD test, ACM SIGPLAN Notices, 30:6, (218-232), Online publication date: 1-Jun-1995.
  101. Shang S and Hwang K (1995). Distributed Hardwired Barrier Synchronization for Scalable Multiprocessor Clusters, IEEE Transactions on Parallel and Distributed Systems, 6:6, (591-605), Online publication date: 1-Jun-1995.
  102. Cann D and Evripidou P (1995). Advanced Array Optimizations for High Performance Functional Languages, IEEE Transactions on Parallel and Distributed Systems, 6:3, (229-239), Online publication date: 1-Mar-1995.
  103. ACM
    Altucher R and Landi W An extended form of must alias analysis for dynamic allocation Proceedings of the 22nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages, (74-84)
  104. ACM
    Ganesh A (1994). Fusing loops with backward inter loop data dependence, ACM SIGPLAN Notices, 29:12, (25-30), Online publication date: 1-Dec-1994.
  105. ACM
    Bernstein D, Cohen D and Maydan D Dynamic memory disambiguation for array references Proceedings of the 27th annual international symposium on Microarchitecture, (105-111)
  106. Sass R and Mutka M Enabling unimodular transformations Proceedings of the 1994 ACM/IEEE conference on Supercomputing, (753-762)
  107. Zaafrani A and Ito M Expressing cross-loop dependencies through hyperplane data dependence analysis Proceedings of the 1994 ACM/IEEE conference on Supercomputing, (508-517)
  108. Chiueh T Sunder Proceedings of the 1994 ACM/IEEE conference on Supercomputing, (488-496)
  109. ACM
    Thakur R, Bordawekar R and Choudhary A (1994). Compilation of out-of-core data parallel programs for distributed memory machines, ACM SIGARCH Computer Architecture News, 22:4, (23-28), Online publication date: 1-Sep-1994.
  110. Sass R and Mutka M Transformations on Doubly Nested Loops Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques, (343-346)
  111. ACM
    Fisher A and Ghuloum A Parallelizing complex scans and reductions Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation, (135-146)
  112. ACM
    Thakur R, Bordawekar R and Choudhary A Compiler and runtime support for out-of-core HPF programs Proceedings of the 8th international conference on Supercomputing, (382-391)
  113. ACM
    Li H and Sevcik K Exploiting cache affinity in software cache coherence Proceedings of the 8th international conference on Supercomputing, (264-273)
  114. ACM
    Chen Y, Wang S and Wang C Compiler techniques for maximizing fine-grain and coarse-grain parallelism in loops with uniform dependences Proceedings of the 8th international conference on Supercomputing, (204-213)
  115. ACM
    Rauchwerger L and Padua D The privatizing DOALL test Proceedings of the 8th international conference on Supercomputing, (33-43)
  116. ACM
    Fisher A and Ghuloum A (1994). Parallelizing complex scans and reductions, ACM SIGPLAN Notices, 29:6, (135-146), Online publication date: 1-Jun-1994.
  117. ACM
    Bik A and Wijshoff H Advanced compiler optimizations for sparse computations Proceedings of the 1993 ACM/IEEE conference on Supercomputing, (430-439)
  118. ACM
    Zaafrani A and Ito M Partitioning the global space for distributed memory systems Proceedings of the 1993 ACM/IEEE conference on Supercomputing, (327-336)
  119. Li H and Sevcik K NUMACROS USENIX Systems on USENIX Experiences with Distributed and Multiprocessor Systems - Volume 4, (13-13)
  120. O'Keefe M and Dietz H (1993). Loop Coalescing and Scheduling for Barrier MIMD Architectures, IEEE Transactions on Parallel and Distributed Systems, 4:9, (1060-1064), Online publication date: 1-Sep-1993.
  121. Alverson G and Notkin D (1993). Program Structuring for Effective Parallel Portability, IEEE Transactions on Parallel and Distributed Systems, 4:9, (1041-1059), Online publication date: 1-Sep-1993.
  122. ACM
    Bik A and Wijshoff H Compilation techniques for sparse matrix computations Proceedings of the 7th international conference on Supercomputing, (416-424)
  123. ACM
    Ayguadé E and Torres J Partitioning the statement per iteration space using non-singular matrices Proceedings of the 7th international conference on Supercomputing, (407-415)
  124. ACM
    Sabot G and Wholey S CMAX Proceedings of the 7th international conference on Supercomputing, (147-156)
  125. ACM
    Philippsen M, Heinz E and Lukowicz P (1993). Compiling machine-independent parallel programs, ACM SIGPLAN Notices, 28:8, (99-108), Online publication date: 1-Aug-1993.
  126. ACM
    Anderson J and Lam M Global optimizations for parallelism and locality on scalable parallel machines Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation, (112-125)
  127. ACM
    Duesterwald E, Gupta R and Soffa M A practical data flow framework for array reference analysis and its use in optimizations Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation, (68-77)
  128. ACM
    Landi W, Ryder B and Zhang S Interprocedural modification side effect analysis with pointer aliasing Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation, (56-67)
  129. ACM
    Anderson J and Lam M (1993). Global optimizations for parallelism and locality on scalable parallel machines, ACM SIGPLAN Notices, 28:6, (112-125), Online publication date: 1-Jun-1993.
  130. ACM
    Duesterwald E, Gupta R and Soffa M (1993). A practical data flow framework for array reference analysis and its use in optimizations, ACM SIGPLAN Notices, 28:6, (68-77), Online publication date: 1-Jun-1993.
  131. ACM
    Landi W, Ryder B and Zhang S (1993). Interprocedural modification side effect analysis with pointer aliasing, ACM SIGPLAN Notices, 28:6, (56-67), Online publication date: 1-Jun-1993.
  132. Tzen T and Ni L (1993). Dependence Uniformization, IEEE Transactions on Parallel and Distributed Systems, 4:5, (547-558), Online publication date: 1-May-1993.
  133. ACM
    Ito M and Zaafrani A Data flow analysis for parallel programs Proceedings of the 1993 ACM conference on Computer science, (318-325)
  134. Allan V, Rajagopalan M and Lee R Software Pipelining Proceedings of the IFIP WG10.3. Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism, (15-26)
  135. Zaafrani A and Ito M Efficient Execution of Doacross Loops on Distributed Memory Systems Proceedings of the IFIP WG10.3. Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism, (27-38)
  136. Burns C, Kuhn R and Werme E Low copy message passing on the Alliant CAMPUS/800 Proceedings of the 1992 ACM/IEEE conference on Supercomputing, (760-769)
  137. Ramanujam J Non-unimodular transformations of nested loops Proceedings of the 1992 ACM/IEEE conference on Supercomputing, (214-223)
  138. ACM
    Li W and Pingali K (1992). Access normalization, ACM SIGPLAN Notices, 27:9, (285-295), Online publication date: 1-Sep-1992.
  139. ACM
    Li W and Pingali K Access normalization Proceedings of the fifth international conference on Architectural support for programming languages and operating systems, (285-295)
  140. Wolfe M and Tseng C (1992). The Power Test for Data Dependence, IEEE Transactions on Parallel and Distributed Systems, 3:5, (591-601), Online publication date: 1-Sep-1992.
  141. ACM
    O'Boyle M and Hedayat G A transformational approach to compiling Sisal for distributed memory architectures Proceedings of the 6th international conference on Supercomputing, (335-346)
  142. ACM
    Kennedy K and McKinley K Optimizing for parallelism and data locality Proceedings of the 6th international conference on Supercomputing, (323-334)
  143. ACM
    Psarris K On exact data dependence analysis Proceedings of the 6th international conference on Supercomputing, (303-312)
  144. ACM
    Lee Y and Ryder B A comprehensive approach to parallel data flow analysis Proceedings of the 6th international conference on Supercomputing, (236-247)
  145. ACM
    Sarkar V and Thekkath R (1992). A general framework for iteration-reordering loop transformations, ACM SIGPLAN Notices, 27:7, (175-187), Online publication date: 1-Jul-1992.
  146. ACM
    Pugh W and Wonnacott D (1992). Eliminating false data dependences using the Omega test, ACM SIGPLAN Notices, 27:7, (140-151), Online publication date: 1-Jul-1992.
  147. ACM
    Sarkar V and Thekkath R A general framework for iteration-reordering loop transformations Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation, (175-187)
  148. ACM
    Pugh W and Wonnacott D Eliminating false data dependences using the Omega test Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation, (140-151)
  149. ACM
    De Gloria A and Faraboschi P Instruction-level parallelism in Prolog Proceedings of the 19th annual international symposium on Computer architecture, (224-233)
  150. ACM
    De Gloria A and Faraboschi P (1992). Instruction-level parallelism in Prolog, ACM SIGARCH Computer Architecture News, 20:2, (224-233), Online publication date: 1-May-1992.
  151. ACM
    McCreary C and Gill D Automatic partitioning and virtual scheduling for efficient parallel execution Proceedings of the 30th annual ACM Southeast Regional Conference, (29-36)
  152. ACM
    Pineo P and Soffa M (1991). Debugging parallelized code using code liberation techniques, ACM SIGPLAN Notices, 26:12, (108-119), Online publication date: 1-Dec-1991.
  153. ACM
    Pineo P and Soffa M Debugging parallelized code using code liberation techniques Proceedings of the 1991 ACM/ONR workshop on Parallel and distributed debugging, (108-119)
  154. Ramanujam J and Sadayappan P (1991). Compile-Time Techniques for Data Distribution in Distributed Memory Machines, IEEE Transactions on Parallel and Distributed Systems, 2:4, (472-482), Online publication date: 1-Oct-1991.
  155. Wolf M and Lam M (1991). A Loop Transformation Theory and an Algorithm to Maximize Parallelism, IEEE Transactions on Parallel and Distributed Systems, 2:4, (452-471), Online publication date: 1-Oct-1991.
  156. ACM
    Hall M, Kennedy K and McKinley K Interprocedural transformations for parallel code generation Proceedings of the 1991 ACM/IEEE conference on Supercomputing, (424-434)
  157. ACM
    Myczkowski J and Steele G Seismic modeling at 14 gigaflops on the Connection Machine Proceedings of the 1991 ACM/IEEE conference on Supercomputing, (316-326)
  158. ACM
    Cann D Retire Fortran? A debate rekindled Proceedings of the 1991 ACM/IEEE conference on Supercomputing, (264-272)
  159. ACM
    Ramanujam J and Sadayappan P Tiling multidimensional iteration spaces for nonshared memory machines Proceedings of the 1991 ACM/IEEE conference on Supercomputing, (111-120)
  160. ACM
    Hiranandani S, Kennedy K and Tseng C Compiler optimizations for Fortran D on MIMD distributed-memory machines Proceedings of the 1991 ACM/IEEE conference on Supercomputing, (86-100)
  161. ACM
    Pugh W The Omega test: a fast and practical integer programming algorithm for dependence analysis Proceedings of the 1991 ACM/IEEE conference on Supercomputing, (4-13)
  162. ACM
    Krothapalli V and Sadayappan P (1991). Removal of redundant dependences in DOACROSS loops with constant dependences, ACM SIGPLAN Notices, 26:7, (51-60), Online publication date: 1-Jul-1991.
  163. ACM
    Ancourt C and Irigoin F (1991). Scanning polyhedra with DO loops, ACM SIGPLAN Notices, 26:7, (39-50), Online publication date: 1-Jul-1991.
  164. ACM
    Lu L (1991). A unified framework for systematic loop transformations, ACM SIGPLAN Notices, 26:7, (28-38), Online publication date: 1-Jul-1991.
  165. ACM
    Bromley M, Heller S, McNerney T and Steele G (1991). Fortran at ten gigaflops, ACM SIGPLAN Notices, 26:6, (145-156), Online publication date: 1-Jun-1991.
  166. ACM
    Chatterjee S, Blelloch G and Fisher A (1991). Size and access inference for data-parallel programs, ACM SIGPLAN Notices, 26:6, (130-144), Online publication date: 1-Jun-1991.
  167. ACM
    Goff G, Kennedy K and Tseng C (1991). Practical dependence testing, ACM SIGPLAN Notices, 26:6, (15-29), Online publication date: 1-Jun-1991.
  168. ACM
    Maydan D, Hennessy J and Lam M (1991). Efficient and exact data dependence analysis, ACM SIGPLAN Notices, 26:6, (1-14), Online publication date: 1-Jun-1991.
  169. ACM
    Appelbe B, Smith K and Stirewalt K PATCH—a new algorithm for rapid incremental dependence analysis Proceedings of the 5th international conference on Supercomputing, (424-432)
  170. ACM
    Pugh W Uniform techniques for loop optimization Proceedings of the 5th international conference on Supercomputing, (341-352)
  171. ACM
    Psarris K, Kong X and Klappholz D Extending the I test to direction vectors Proceedings of the 5th international conference on Supercomputing, (330-340)
  172. ACM
    Wolfe M Experiences with data dependence abstractions Proceedings of the 5th international conference on Supercomputing, (321-329)
  173. ACM
    Irigoin F, Jouvelot P and Triolet R Semantical interprocedural parallelization Proceedings of the 5th international conference on Supercomputing, (244-251)
  174. ACM
    Sarkar V and Gao G Optimization of array accesses by collective loop transformations Proceedings of the 5th international conference on Supercomputing, (194-205)
  175. ACM
    Venugopal S and Eventoff W Automatic transformation of FORTRAN loops to reduce cache conflicts Proceedings of the 5th international conference on Supercomputing, (183-193)
  176. ACM
    Hege H and Stüben H Vectorization and parallelization of irregular problems via graph coloring Proceedings of the 5th international conference on Supercomputing, (47-56)
  177. ACM
    Bromley M, Heller S, McNerney T and Steele G Fortran at ten gigaflops Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation, (145-156)
  178. ACM
    Chatterjee S, Blelloch G and Fisher A Size and access inference for data-parallel programs Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation, (130-144)
  179. ACM
    Goff G, Kennedy K and Tseng C Practical dependence testing Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation, (15-29)
  180. ACM
    Maydan D, Hennessy J and Lam M Efficient and exact data dependence analysis Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation, (1-14)
  181. Saltz J, Mirchandaney R and Crowley K (1991). Run-Time Parallelization and Scheduling of Loops, IEEE Transactions on Computers, 40:5, (603-612), Online publication date: 1-May-1991.
  182. ACM
    Krothapalli V and Sadayappan P Removal of redundant dependences in DOACROSS loops with constant dependences Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming, (51-60)
  183. ACM
    Ancourt C and Irigoin F Scanning polyhedra with DO loops Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming, (39-50)
  184. ACM
    Lu L A unified framework for systematic loop transformations Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming, (28-38)
  185. Lu L and Chen M Subdomain dependence test for massive parallelism Proceedings of the 1990 ACM/IEEE conference on Supercomputing, (962-972)
  186. Lee Y, Marlowe T and Ryder B Performing data flow analysis in parallel Proceedings of the 1990 ACM/IEEE conference on Supercomputing, (942-951)
  187. Li J and Chen M Generating explicit communication from shared-memory program references Proceedings of the 1990 ACM/IEEE conference on Supercomputing, (865-876)
  188. ACM
    Klappholz D, Psarris K and Kong X (1990). On the perfect accuracy of an approximate subscript analysis test, ACM SIGARCH Computer Architecture News, 18:3b, (201-212), Online publication date: 1-Sep-1990.
  189. ACM
    Klappholz D, Psarris K and Kong X On the perfect accuracy of an approximate subscript analysis test Proceedings of the 4th international conference on Supercomputing, (201-212)
  190. ACM
    Whitfield D and Soffa M (1990). An approach to ordering optimizing transformations, ACM SIGPLAN Notices, 25:3, (137-146), Online publication date: 1-Mar-1990.
  191. ACM
    Whitfield D and Soffa M An approach to ordering optimizing transformations Proceedings of the second ACM SIGPLAN symposium on Principles & practice of parallel programming, (137-146)
  192. ACM
    Wolfe M More iteration space tiling Proceedings of the 1989 ACM/IEEE conference on Supercomputing, (655-664)
  193. ACM
    Ramanujam J and Sadayappan P A methodology for parallelizing programs for multicomputers and complex memory multiprocessors Proceedings of the 1989 ACM/IEEE conference on Supercomputing, (637-646)
  194. ACM
    Klappholz D, Kong X and Kalis A Refined Fortran: an update Proceedings of the 1989 ACM/IEEE conference on Supercomputing, (607-615)
  195. ACM
    Ayguadé E, Labarta J, Torres J and Borensztejn P GTS: parallelization and vectorization of tight recurrences Proceedings of the 1989 ACM/IEEE conference on Supercomputing, (531-539)
  196. ACM
    Su H and Yew P (1989). On data synchronization for multiprocessors, ACM SIGARCH Computer Architecture News, 17:3, (416-423), Online publication date: 1-Jun-1989.
  197. ACM
    Rudolph D and Polychronopoulos C An efficient message-passing scheduler based on guided self scheduling Proceedings of the 3rd international conference on Supercomputing, (50-61)
  198. ACM
    Su H and Yew P On data synchronization for multiprocessors Proceedings of the 16th annual international symposium on Computer architecture, (416-423)
  199. ACM
    Cytron R, Lowry A and Zadeck F Code motion of control structures in high-level languages Proceedings of the 13th ACM SIGACT-SIGPLAN symposium on Principles of programming languages, (70-85)
Contributors
  • STMicroelectronics

Recommendations