Experimenting with low-overhead OpenMP runtime on IBM Blue Gene/Q

Published: 01 January 2013

Abstract

As newer supercomputers continue to increase the number of threads, there is growing pressure on applications to exploit more of the available parallelism in their codes, including coarse-, medium-, and fine-grain parallelism. OpenMP is one of the dominant shared-memory programming models and is well suited for exploiting medium- and fine-grain parallelism. OpenMP research has focused on application tuning, compiler optimizations, programming-model extensions, and porting to distributed-memory platforms; however, we have found that current algorithms used to implement basic OpenMP constructs have significant overheads and scale poorly. In this paper, we explore low-overhead, scalable algorithms for creating parallel regions and demonstrate reductions in overhead of up to a factor of 5 on an IBM Blue Gene®/Q node.

Cited By

  • (2014) On the Relevance of Architectural Awareness for Efficient Fork/Join Support on Cluster-Based Manycores. Proceedings of International Workshop on Manycore Embedded Systems, pp. 9-16. DOI: 10.1145/2613908.2613911. Online publication date: 15-Jun-2014
  • (2013) IBM Blue Gene/Q system software stack. IBM Journal of Research and Development, 57:1, pp. 55-65. DOI: 10.1147/JRD.2012.2227557. Online publication date: 1-Jan-2013

Published In

IBM Journal of Research and Development, Volume 57, Issue 1
January 2013
180 pages

Publisher

IBM Corp.

United States

Publication History

Published: 01 January 2013
Accepted: 26 June 2012
Received: 16 March 2012

Qualifiers

  • Article
