
Integrating Parallelizing Compilation Technologies for SMP Clusters

Published in: Journal of Computer Science and Technology

Abstract

In this paper, a source-to-source parallelizing compiler system, AutoPar, is presented. The system transforms FORTRAN programs into multi-level hybrid MPI/OpenMP parallel programs. Integrated parallel optimizing technologies are used extensively to derive an effective program decomposition over the whole program scope. Other features, such as synchronization optimization and communication optimization, improve the performance scalability of the generated parallel programs, both intra-node and inter-node. The system makes a great effort to automate parallelization: profiling feedback is used in performance estimation, which is the basis of automatic program decomposition. Performance results for eight benchmarks in NPB1.0 from NAS on an SMP cluster are given, and the speedups are desirable. Notably, in the experiments at most one data distribution directive and one reduction directive were inserted by the user in BT/SP/LU. The compiler is based on ORC (Open Research Compiler), a powerful compiler infrastructure with such features as robustness, flexibility and efficiency. The strong analysis capability and well-defined infrastructure of ORC made the system implementation quite fast.
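The multi-level decomposition described in the abstract splits loop iterations first across MPI processes (one per SMP node) and then across OpenMP threads within each node. As a rough illustration only, and not AutoPar's actual decomposition algorithm, a two-level block partition of an iteration space could be sketched as follows (the function names `block_range` and `two_level_partition` are hypothetical, introduced here purely for exposition):

```python
def block_range(n, parts, idx):
    """Split n iterations into `parts` near-equal contiguous blocks;
    return the half-open [lo, hi) range owned by block `idx`."""
    base, rem = divmod(n, parts)
    lo = idx * base + min(idx, rem)
    hi = lo + base + (1 if idx < rem else 0)
    return lo, hi

def two_level_partition(n, num_nodes, threads_per_node):
    """Two-level decomposition: iterations -> MPI ranks -> OpenMP threads.
    Returns {(rank, thread): (lo, hi)} covering 0..n exactly once."""
    owner = {}
    for rank in range(num_nodes):
        # Level 1: block-distribute iterations across MPI processes.
        nlo, nhi = block_range(n, num_nodes, rank)
        for t in range(threads_per_node):
            # Level 2: block-distribute each node's slice across threads.
            tlo, thi = block_range(nhi - nlo, threads_per_node, t)
            owner[(rank, t)] = (nlo + tlo, nlo + thi)
    return owner
```

In a generated hybrid program, the outer level would correspond to MPI rank-local loop bounds and the inner level to a static OpenMP worksharing schedule; a real compiler would additionally weigh profiling-based cost estimates when choosing the decomposition.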




Author information


Corresponding author

Correspondence to Xiao-Bing Feng.

Additional information

Supported by the National Natural Science Foundation of China under Grant No. 60103006 and the National High Technology Development 863 Program of China under Grant No. 2002AA1Z2104.

Xiao-Bing Feng was born in 1969. He received his B.E. degree from Tianjin University in 1992, M.S. degree from Peking University in 1996 and Ph.D. degree from ICT, CAS, where he is currently an associate professor. His research interests include compiling technology and binary translation.

Li Chen was born in 1970. She received her B.E. and M.E. degrees from Shandong University of Science and Technology in 1992 and 1995 respectively, and her Ph.D. degree from ICT, CAS, where she is currently an assistant professor. Her research interests include parallel optimization and parallel programming environments.

Zhao-Qing Zhang is a professor in the Advanced Compiler Technology Laboratory, Division of Computer Systems, ICT, CAS. She graduated from Peking University in 1960. Her research interests include advanced compiler technology and parallel programming environments.


About this article

Cite this article

Feng, XB., Chen, L., Wang, YR. et al. Integrating Parallelizing Compilation Technologies for SMP Clusters. J Comput Sci Technol 20, 125–133 (2005). https://doi.org/10.1007/s11390-005-0014-4
