Memory-Constrained Communication Minimization for a Class of Array Computations

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2002)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 2481)

Abstract

Accurate modeling of the electronic structure of atoms and molecules requires computationally intensive tensor contractions over large multidimensional arrays. Computing complex tensor contractions efficiently usually requires generating temporary intermediate arrays. These intermediates can be extremely large, but they can often be generated and consumed in batches through appropriate loop fusion transformations. To optimize the performance of such computations on parallel computers, the total amount of inter-processor communication must be minimized, subject to the memory available on each processor. In this paper, we address this memory-constrained communication minimization problem for this class of computations. Building on a framework that models the relationship between loop fusion and memory usage, we develop an approach that identifies the combination of loop fusion and data partitioning that minimizes inter-processor communication cost without exceeding the per-processor memory limit. We demonstrate the effectiveness of the approach on a computation representative of a component used in quantum chemistry suites.
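The memory effect of loop fusion mentioned in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: it does not model inter-processor communication or data partitioning, and the function names, the two-step contraction, and the element-count memory measure are all assumptions chosen for illustration. It evaluates S[i][l] = sum_j (sum_k A[i][k] * B[k][j]) * C[j][l] twice, once storing the full intermediate T[i][j] and once with the i loop fused so that only one row of T is live at a time.

```python
def contract_unfused(A, B, C):
    """Two-step contraction that stores the whole intermediate T (hypothetical helper)."""
    ni, nk, nj, nl = len(A), len(B), len(B[0]), len(C[0])
    # Step 1: materialise T[i][j] = sum_k A[i][k] * B[k][j]  (ni * nj elements).
    T = [[sum(A[i][k] * B[k][j] for k in range(nk)) for j in range(nj)]
         for i in range(ni)]
    # Step 2: consume T to form S[i][l] = sum_j T[i][j] * C[j][l].
    S = [[sum(T[i][j] * C[j][l] for j in range(nj)) for l in range(nl)]
         for i in range(ni)]
    return S, ni * nj  # result and number of intermediate elements stored


def contract_fused(A, B, C):
    """Same contraction with the i loop fused across both steps (hypothetical helper)."""
    ni, nk, nj, nl = len(A), len(B), len(B[0]), len(C[0])
    S = []
    for i in range(ni):
        # Only one row of the intermediate exists at any time (nj elements).
        t_row = [sum(A[i][k] * B[k][j] for k in range(nk)) for j in range(nj)]
        S.append([sum(t_row[j] * C[j][l] for j in range(nj)) for l in range(nl)])
    return S, nj  # result and number of intermediate elements stored


if __name__ == "__main__":
    A = [[1, 2], [3, 4], [5, 6]]      # 3 x 2
    B = [[1, 0, 2], [0, 1, 1]]        # 2 x 3
    C = [[1], [2], [3]]               # 3 x 1
    print(contract_unfused(A, B, C))
    print(contract_fused(A, B, C))
```

Both functions return the same result, but the fused version holds a number of intermediate elements that is independent of the extent of i. In a parallel setting, the choice of which loops to fuse interacts with how the arrays are partitioned across processors, which is the trade-off the paper optimizes.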

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cociorva, D., Baumgartner, G., Lam, CC., Sadayappan, P., Ramanujam, J. (2005). Memory-Constrained Communication Minimization for a Class of Array Computations. In: Pugh, B., Tseng, CW. (eds) Languages and Compilers for Parallel Computing. LCPC 2002. Lecture Notes in Computer Science, vol 2481. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11596110_1

  • DOI: https://doi.org/10.1007/11596110_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-30781-5

  • Online ISBN: 978-3-540-31612-1

  • eBook Packages: Computer Science, Computer Science (R0)
