DOI: 10.1145/143369.143372

Evaluation of compiler optimizations for Fortran D on MIMD distributed memory machines

Published: 01 August 1992

Abstract

The Fortran D compiler uses data decomposition specifications to automatically translate Fortran programs for execution on MIMD distributed-memory machines. This paper introduces and classifies a number of advanced optimizations needed to achieve acceptable performance; they are analyzed and empirically evaluated for stencil computations. Profitability formulas are derived for each optimization. Results show that exploiting parallelism for pipelined computations, reductions, and scans is vital. Message vectorization, collective communication, and efficient coarse-grain pipelining also significantly affect performance.
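
As a rough illustration of why message vectorization matters, the sketch below compares sending a stencil boundary one element at a time against sending it as a single combined message. The linear cost model (a fixed startup cost `alpha` plus a per-element cost `beta`) and the numeric values are assumptions chosen for illustration; they are not the profitability formulas or measurements from the paper, and the function names are hypothetical.

```python
# Minimal sketch, assuming a linear message cost model: startup + per-element.
# ALPHA and BETA are made-up illustrative values, not measured machine parameters.

def message_cost(n_elems, alpha, beta):
    """Cost of one message carrying n_elems values under the assumed model."""
    return alpha + beta * n_elems

def exchange_elementwise(boundary, alpha, beta):
    """Send each boundary value in its own message (no vectorization)."""
    received, messages, cost = [], 0, 0.0
    for value in boundary:
        received.append(value)
        messages += 1
        cost += message_cost(1, alpha, beta)
    return received, messages, cost

def exchange_vectorized(boundary, alpha, beta):
    """Send the whole boundary as one message, hoisted out of the loop."""
    received = list(boundary)
    return received, 1, message_cost(len(boundary), alpha, beta)

ALPHA, BETA = 100.0, 1.0                      # hypothetical startup and per-element costs
boundary = [float(i) for i in range(1000)]    # boundary values a neighbor needs

data_e, msgs_e, cost_e = exchange_elementwise(boundary, ALPHA, BETA)
data_v, msgs_v, cost_v = exchange_vectorized(boundary, ALPHA, BETA)

print(msgs_e, cost_e)   # number of messages and modeled cost, element by element
print(msgs_v, cost_v)   # number of messages and modeled cost, vectorized
```

Both variants deliver the same data; under this model the vectorized version saves `(n - 1) * alpha`, which is why the benefit grows with the startup cost of the machine.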


Published In

ICS '92: Proceedings of the 6th international conference on Supercomputing
August 1992
495 pages
ISBN: 0897914856
DOI: 10.1145/143369

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Cited By

  • (2021) A Machine-Learning-Based Framework for Productive Locality Exploitation. IEEE Transactions on Parallel and Distributed Systems 32:6 (1409-1424). DOI: 10.1109/TPDS.2021.3051348. Online publication date: 1-Jun-2021
  • (2015) LLVM-based communication optimizations for PGAS programs. Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC (1-11). DOI: 10.1145/2833157.2833164. Online publication date: 15-Nov-2015
  • (2011) Communication Optimizations for Distributed-Memory X10 Programs. Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium (1101-1113). DOI: 10.1109/IPDPS.2011.105. Online publication date: 16-May-2011
  • (2010) Compilation for Distributed Memory Architectures. The Compiler Design Handbook. DOI: 10.1201/9781420040579.ch11. Online publication date: 7-Mar-2010
  • (2008) Overlapping Communication and Computation with High Level Communication Routines. Proceedings of the 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (572-577). DOI: 10.1109/CCGRID.2008.15. Online publication date: 19-May-2008
  • (2007) The rise and fall of High Performance Fortran. Proceedings of the third ACM SIGPLAN conference on History of programming languages (7-1-7-22). DOI: 10.1145/1238844.1238851. Online publication date: 9-Jun-2007
  • (2007) Reducing Off-Chip Memory Access via Stream-Conscious Tiling on Multimedia Applications. International Journal of Parallel Programming 35:1 (63-98). DOI: 10.1007/s10766-006-0027-9. Online publication date: 10-Feb-2007
  • (2007) Optimizing the Use of Static Buffers for DMA on a CELL Chip. Languages and Compilers for Parallel Computing (314-329). DOI: 10.1007/978-3-540-72521-3_23. Online publication date: 2007
  • (2006) Optimizing the use of static buffers for DMA on a CELL chip. Proceedings of the 19th international conference on Languages and compilers for parallel computing (314-329). DOI: 10.5555/1757112.1757144. Online publication date: 2-Nov-2006
  • (2006) Analysis of two-level data mapping in an HPF compiler for distributed-memory machines. Parallel Computing 32:4 (280-300). DOI: 10.1016/j.parco.2005.11.003. Online publication date: 1-Apr-2006
