Article


Evaluation of compiler optimizations for Fortran D on MIMD distributed memory machines

Authors:

Seema Hiranandani,

Chau-Wen Tseng

ICS '92: Proceedings of the 6th international conference on Supercomputing

Pages 1 - 14

https://doi.org/10.1145/143369.143372

Published: 01 August 1992

Abstract

The Fortran D compiler uses data decomposition specifications to automatically translate Fortran programs for execution on MIMD distributed-memory machines. This paper introduces and classifies a number of advanced optimizations needed to achieve acceptable performance; they are analyzed and empirically evaluated for stencil computations. Profitability formulas are derived for each optimization. Results show that exploiting parallelism for pipelined computations, reductions, and scans is vital. Message vectorization, collective communication, and efficient coarse-grain pipelining also significantly affect performance.
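The abstract names message vectorization and coarse-grain pipelining but does not reproduce the paper's profitability formulas. As a hedged illustration only, the sketch below uses a generic latency/bandwidth model in which a message of m words costs ALPHA + BETA * m. This model is an assumption and is not the paper's own formulas. The constants ALPHA, BETA, and C (compute cost per loop iteration) and the helper names are hypothetical.

The sketch does two things. First, it compares sending n boundary elements one per loop iteration with hoisting them into a single vectorized message. Second, it searches for a pipeline block size that balances per-message startup cost against pipeline fill time.

```python
import math

# Hypothetical machine parameters for a simple latency/bandwidth model.
# Illustrative values only; they are not measurements from the paper.
ALPHA = 100.0  # per-message startup cost (assumed, microseconds)
BETA = 0.5     # per-word transfer cost (assumed, microseconds)
C = 1.0        # compute cost per loop iteration (assumed, microseconds)


def message_cost(words: int) -> float:
    """Cost of one message carrying `words` data words: ALPHA + BETA * words."""
    return ALPHA + BETA * words


def unvectorized_cost(n: int) -> float:
    """Send n boundary elements inside the loop, one message per element."""
    return n * message_cost(1)


def vectorized_cost(n: int) -> float:
    """Message vectorization: hoist the sends out of the loop into one n-word message."""
    return message_cost(n)


def pipeline_time(n: int, p: int, b: int) -> float:
    """Simplified estimate for a loop pipelined across p processors.

    Each processor handles n iterations of the non-distributed dimension in
    chunks of b (coarse-grain pipelining). After each chunk it forwards b
    boundary words to its neighbor. The last processor finishes after
    ceil(n / b) + p - 1 stages.
    """
    stages = math.ceil(n / b) + p - 1
    stage_cost = b * C + message_cost(b)
    return stages * stage_cost


def best_block_size(n: int, p: int) -> int:
    """Exhaustively pick the block size in [1, n] that minimizes pipeline_time."""
    return min(range(1, n + 1), key=lambda b: pipeline_time(n, p, b))


if __name__ == "__main__":
    print(unvectorized_cost(256), vectorized_cost(256))
    n, p = 1024, 16
    b = best_block_size(n, p)
    print(b, pipeline_time(n, p, 1), pipeline_time(n, p, b), pipeline_time(n, p, n))
```

Under this assumed model, one vectorized message replaces 256 startup costs with a single one. The best block size falls strictly between fine-grain pipelining (b = 1) and no pipelining (b = n).

Treating b as continuous, T(b) = (n/b + p - 1)(b(C + BETA) + ALPHA) is minimized at b = sqrt(n * ALPHA / ((p - 1)(C + BETA))). With the constants above this is about 67.5. The paper's own formulas may weigh these costs differently.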




Information

Published In


ICS '92: Proceedings of the 6th international conference on Supercomputing

August 1992

495 pages

ISBN:0897914856

DOI:10.1145/143369

Chairmen:
Ken Kennedy, Rice University, Houston, TX
Constantine D. Polychronopoulos, Kubota Pacific Computers and CSRD

Copyright © 1992 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Article

Conference

ICS92

Sponsor:

SIGARCH

ICS92: ACM SIGARCH International Conference on Supercomputing

July 19 - 24, 1992

Washington, D.C., USA

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%


Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 63
Total Downloads: 409
Downloads (last 12 months): 36
Downloads (last 6 weeks): 15

Reflects downloads up to 22 Sep 2024


Citations

Cited By

E. Kayraklioglu, E. Favry, and T. El-Ghazawi. A Machine-Learning-Based Framework for Productive Locality Exploitation. IEEE Transactions on Parallel and Distributed Systems, 32(6):1409-1424, 2021. Online publication date: 1 June 2021.
https://doi.org/10.1109/TPDS.2021.3051348

A. Hayashi, J. Zhao, M. Ferguson, V. Sarkar, and H. Finkel. LLVM-based communication optimizations for PGAS programs. In Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, pages 1-11, 2015. Online publication date: 15 November 2015.
https://dl.acm.org/doi/10.1145/2833157.2833164

R. Barik, J. Zhao, D. Grove, I. Peshansky, Z. Budimlic, and V. Sarkar. Communication Optimizations for Distributed-Memory X10 Programs. In Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium, pages 1101-1113, 2011. Online publication date: 16 May 2011.
https://dl.acm.org/doi/10.1109/IPDPS.2011.105

A. Choudhary and M. Kandemir. Compilation for Distributed Memory Architectures. In The Compiler Design Handbook, 2010. Online publication date: 7 March 2010.
https://doi.org/10.1201/9781420040579.ch11

T. Hoefler and A. Lumsdaine. Overlapping Communication and Computation with High Level Communication Routines. In Proceedings of the 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid, pages 572-577, 2008. Online publication date: 19 May 2008.
https://dl.acm.org/doi/10.1109/CCGRID.2008.15

K. Kennedy, C. Koelbel, and H. Zima. The rise and fall of High Performance Fortran. In B. Ryder and B. Hailpern, editors, Proceedings of the third ACM SIGPLAN conference on History of programming languages, pages 7-1 to 7-22, 2007. Online publication date: 9 June 2007.
https://dl.acm.org/doi/10.1145/1238844.1238851

C. Zhang and F. Kurdahi. Reducing Off-Chip Memory Access via Stream-Conscious Tiling on Multimedia Applications. International Journal of Parallel Programming, 35(1):63-98, 2007. Online publication date: 10 February 2007.
https://doi.org/10.1007/s10766-006-0027-9

T. Chen, Z. Sura, K. O'Brien, and J. O'Brien. Optimizing the Use of Static Buffers for DMA on a CELL Chip. In Languages and Compilers for Parallel Computing, pages 314-329, 2007.
https://doi.org/10.1007/978-3-540-72521-3_23

T. Chen, Z. Sura, K. O'Brien, and J. O'Brien. Optimizing the use of static buffers for DMA on a CELL chip. In Proceedings of the 19th international conference on Languages and compilers for parallel computing, pages 314-329, 2006. Online publication date: 2 November 2006.
https://dl.acm.org/doi/10.5555/1757112.1757144

M. Satoh, K. Negishi, and A. Kobayashi. Analysis of two-level data mapping in an HPF compiler for distributed-memory machines. Parallel Computing, 32(4):280-300, 2006. Online publication date: 1 April 2006.
https://dl.acm.org/doi/10.1016/j.parco.2005.11.003
