Performance Evaluation of a Two-Dimensional Lattice Boltzmann Solver Using CUDA and PGAS UPC Based Parallelisation

Published: 14 July 2017

Abstract

The Unified Parallel C (UPC) language, a member of the Partitioned Global Address Space (PGAS) family, combines the advantages of shared and local memory spaces and offers relatively straightforward code parallelisation on the Central Processing Unit (CPU). In contrast, the Compute Unified Device Architecture (CUDA) development kit provides the tools to exploit the Graphics Processing Unit (GPU). We provide a detailed comparison of these novel techniques through the parallelisation of a two-dimensional lattice Boltzmann method based fluid flow solver. Our comparison of the CUDA and UPC parallelisations takes into account the required conceptual effort, the performance gain, and the limitations of each approach from the point of view of application-oriented developers. We demonstrated that UPC achieved competitive efficiency with the local memory implementation. However, the performance of the shared memory code fell short of our expectations, and we concluded that the investigated UPC compilers could not treat the shared memory space efficiently. The CUDA implementation proved more complex than the UPC approach, mainly because of the complicated memory structure of the graphics card, which is also what makes GPUs suitable for the parallelisation of the lattice Boltzmann method.
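
To make the kind of kernel being parallelised concrete, the following is a minimal serial sketch of a lattice Boltzmann collision step. It rests on assumptions that the abstract does not confirm: a D2Q9 lattice, the single-relaxation-time BGK collision operator, and a node-major array layout. The names `feq`, `collide_node`, `collide_range`, and the relaxation time `tau` are hypothetical and chosen for illustration only. The collision touches only data stored at the node itself, so a contiguous range of nodes can be handed to each worker; this is the general idea behind a local memory decomposition, not the authors' actual implementation.

```c
#include <assert.h>
#include <stddef.h>

#define Q 9 /* assumed D2Q9 lattice: nine discrete velocities per node */

/* Standard D2Q9 velocity set and weights (rest, four axis, four diagonal). */
static const int CX[Q] = {0, 1, 0, -1, 0, 1, -1, -1, 1};
static const int CY[Q] = {0, 0, 1, 0, -1, 1, 1, -1, -1};
static const double W[Q] = {4.0 / 9.0,  1.0 / 9.0,  1.0 / 9.0,
                            1.0 / 9.0,  1.0 / 9.0,  1.0 / 36.0,
                            1.0 / 36.0, 1.0 / 36.0, 1.0 / 36.0};

/* Second-order equilibrium distribution for direction k (lattice units). */
double feq(int k, double rho, double ux, double uy)
{
    double cu = CX[k] * ux + CY[k] * uy;
    double uu = ux * ux + uy * uy;
    return W[k] * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * uu);
}

/* BGK collision on one node: relax the Q populations in f towards equilibrium. */
void collide_node(double *f, double tau)
{
    assert(tau > 0.5); /* BGK needs tau > 0.5 for a positive viscosity */

    double rho = 0.0, mx = 0.0, my = 0.0;
    for (int k = 0; k < Q; ++k) {
        rho += f[k];         /* density: zeroth moment */
        mx += CX[k] * f[k];  /* momentum: first moment  */
        my += CY[k] * f[k];
    }
    double ux = mx / rho, uy = my / rho;

    for (int k = 0; k < Q; ++k)
        f[k] -= (f[k] - feq(k, rho, ux, uy)) / tau;
}

/* Collide nodes [begin, end); f stores Q populations per node, node-major.
 * In a parallel version each worker would call this on its own range. */
void collide_range(double *f, size_t begin, size_t end, double tau)
{
    for (size_t n = begin; n < end; ++n)
        collide_node(&f[n * Q], tau);
}
```

A caller would allocate `Q * n_nodes` doubles, initialise them (for example with `feq` at a chosen density and velocity), and call `collide_range(f, 0, n_nodes, tau)` once per time step before streaming. Under the same assumptions, a UPC thread would call `collide_range` on the block of nodes it owns, and a CUDA kernel would typically map one GPU thread to one call of `collide_node`.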

Information

Published In

ACM Transactions on Mathematical Software  Volume 44, Issue 1
March 2018
308 pages
ISSN:0098-3500
EISSN:1557-7295
DOI:10.1145/3071076
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 July 2017
Accepted: 01 April 2017
Revised: 01 September 2016
Received: 01 January 2016
Published in TOMS Volume 44, Issue 1

Author Tags

  1. CFD
  2. CUDA
  3. LBM
  4. PGAS
  5. Partitioned global address space
  6. UPC
  7. computational fluid dynamics
  8. compute unified device architecture
  9. lattice Boltzmann method
  10. nvidia
  11. unified parallel C

Qualifiers

  • Research-article
  • Research
  • Refereed

Article Metrics

  • Downloads (Last 12 months): 15
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 13 Nov 2024

Cited By

  • (2024) Another look at residual dynamic mode decomposition in the regime of fewer snapshots than dictionary size. Physica D: Nonlinear Phenomena (134341). DOI: 10.1016/j.physd.2024.134341. Online publication date: Aug-2024
  • (2024) The multiverse of dynamic mode decomposition algorithms. Numerical Analysis Meets Machine Learning (127-230). DOI: 10.1016/bs.hna.2024.05.004. Online publication date: 2024
  • (2023) Residual dynamic mode decomposition: robust and verified Koopmanism. Journal of Fluid Mechanics 955. DOI: 10.1017/jfm.2022.1052. Online publication date: 17-Jan-2023
  • (2021) Cross-Platform GPU-Based Implementation of Lattice Boltzmann Method Solver Using ArrayFire Library. Mathematics 9:15 (1793). DOI: 10.3390/math9151793. Online publication date: 28-Jul-2021
  • (2019) Analytical solutions of incompressible laminar channel and pipe flows driven by in-plane wall oscillations. Physics of Fluids 31:8. DOI: 10.1063/1.5104356. Online publication date: 12-Aug-2019
  • (2018) Parallel realization of the computational algorithm based on the implicit lattice Boltzmann equations. Journal of Physics: Conference Series 1038 (012041). DOI: 10.1088/1742-6596/1038/1/012041. Online publication date: 14-Jun-2018
