BlueGene/L applications: Parallelism On a Massive Scale

Published: 01 February 2008

Abstract

BlueGene/L (BG/L), developed through a partnership between IBM and Lawrence Livermore National Laboratory (LLNL), is currently the world's largest system both in terms of scale, with 131,072 processors, and absolute performance, with a peak rate of 367 Tflop/s. BG/L has led the last four Top500 lists with a Linpack rate of 280.6 Tflop/s for the full machine installed at LLNL and is expected to remain the fastest computer in the next few editions. However, the real value of a machine such as BG/L derives from the scientific breakthroughs that real applications can produce by successfully using its unprecedented scale and computational power. In this paper, we describe our experiences with eight large-scale applications on BG/L from several application domains, ranging from molecular dynamics to dislocation dynamics and from turbulence simulations to searches in semantic graphs. We also discuss the challenges we faced when scaling these codes and present several successful optimization techniques. All applications show excellent scaling behavior, even at very large processor counts, and one code achieves a sustained performance of more than 100 Tflop/s, clearly demonstrating the success of the BG/L design.

Cited By

  • (2017) Application Modernization at LLNL and the Sierra Center of Excellence. Computing in Science and Engineering 19:5 (9-18). DOI: 10.1109/MCSE.2017.3421556. Online publication date: 1-Sep-2017
  • (2012) Quantifying the effectiveness of load balance algorithms. Proceedings of the 26th ACM international conference on Supercomputing (185-194). DOI: 10.1145/2304576.2304601. Online publication date: 25-Jun-2012

Published In

International Journal of High Performance Computing Applications, Volume 22, Issue 1
February 2008
126 pages

Publisher

Sage Publications, Inc.

United States

Author Tags

  1. BlueGene/L
  2. application scalability
  3. massively parallel architectures
  4. performance study and optimization

Qualifiers

  • Article
