Performance Analysis of a High-Level Abstractions-Based Hydrocode on Future Computing Systems

Mudalige, G. R.; Reguly, I. Z.; Giles, M. B.; Mallinson, A. C.; Gaudin, W. P.; Herdman, J. A.

doi:10.1007/978-3-319-17248-4_5

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8966)

Included in the following conference series:

International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems

Abstract

In this paper we present research on applying a domain-specific high-level abstractions (HLA) development strategy with the aim of “future-proofing” a key class of high performance computing (HPC) applications that simulate hydrodynamics computations at AWE plc. We build on an existing high-level abstraction framework, OPS, that is being developed at the University of Oxford for the solution of multi-block structured mesh-based applications. OPS uses an “active library” approach in which a single application code written using the OPS API can be transformed into different highly optimized parallel implementations, which can then be linked against the appropriate parallel library, enabling execution on different back-end hardware platforms. The target application in this work is the CloverLeaf mini-app from Sandia National Laboratories’ Mantevo suite of codes that consists of algorithms of interest from hydrodynamics workloads. Specifically, we present (1) the lessons learnt in re-engineering an industrially representative hydrodynamics application to utilize the OPS high-level framework and subsequent code generation to obtain a range of parallel implementations, and (2) the performance of the auto-generated OPS versions of CloverLeaf compared with that of the hand-coded original CloverLeaf implementations on a range of platforms. Benchmarked systems include Intel multi-core CPUs and NVIDIA GPUs, the Archer (Cray XC30) CPU cluster and the Titan (Cray XK7) GPU cluster, with different parallelizations (OpenMP, OpenACC, CUDA, OpenCL and MPI). Our results show that developing parallel HPC applications using a high-level framework such as OPS is no more time-consuming or difficult than writing a one-off parallel program targeting only a single parallel implementation. However, the OPS strategy pays off with a highly maintainable single application source, through which multiple parallelizations can be realized, without compromising performance portability on a range of parallel systems.
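
The idea of one application source served by several parallel back-ends can be illustrated with a toy sketch. It is not the OPS API and not generated OPS code: `par_loop`, `apply_serial`, `apply_threaded` and `ideal_gas_kernel` are hypothetical names, and the back-end is chosen at run time here, whereas OPS generates back-end specific code ahead of time. The point-wise pressure update is an assumed example kernel in the spirit of a hydrodynamics mini-app.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_serial(kernel, rows, cols, *fields):
    """Back-end 1 (hypothetical): plain nested loops over the 2D range."""
    for i in range(rows):
        for j in range(cols):
            kernel(i, j, *fields)


def apply_threaded(kernel, rows, cols, *fields, workers=4):
    """Back-end 2 (hypothetical): rows are distributed over a thread pool.

    Only valid when the kernel writes to point (i, j) and nowhere else.
    """
    def run_row(i):
        for j in range(cols):
            kernel(i, j, *fields)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(run_row, range(rows)))


BACKENDS = {"serial": apply_serial, "threaded": apply_threaded}


def par_loop(kernel, rows, cols, *fields, backend="serial"):
    """Single entry point: the application never names a back-end loop."""
    BACKENDS[backend](kernel, rows, cols, *fields)


def ideal_gas_kernel(i, j, density, energy, pressure):
    """User kernel: describes the work at one mesh point only."""
    gamma = 1.4  # assumed ratio of specific heats
    pressure[i][j] = (gamma - 1.0) * density[i][j] * energy[i][j]


def make_field(rows, cols, value):
    """Allocate a rows x cols field initialised to a constant."""
    return [[value] * cols for _ in range(rows)]


def run(backend):
    """Run the same application source on the chosen back-end."""
    rows, cols = 4, 6
    density = make_field(rows, cols, 1.0)
    energy = make_field(rows, cols, 2.5)
    pressure = make_field(rows, cols, 0.0)
    par_loop(ideal_gas_kernel, rows, cols, density, energy, pressure,
             backend=backend)
    return pressure
```

Calling `run("serial")` and `run("threaded")` executes the same kernel and the same loop declaration on two different back-ends. The application source changes only in the `backend` argument, which is the property the abstract attributes to the single-source approach.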

Notes

1. A similar approach is used in the C kernel implementations of the original CloverLeaf application.
2. On Intel compilers, IEEE_FLAGS=-ipo -fp-model strict -fp-model source -prec-div -prec-sqrt.

Acknowledgements

This research is funded by the UK AWE plc. under project “High-level Abstractions for Performance, Portability and Continuity of Scientific Software on Future Computing Systems”.

The OPS project is funded by the UK Engineering and Physical Sciences Research Council projects EP/K038494/1, EP/K038486/1, EP/K038451/1 and EP/K038567/1 on “Future-proof massively-parallel execution of multi-block applications” and the EP/J010553/1 “Software for Emerging Architectures” (ASEArch) project. This paper used the Archer UK National Supercomputing Service with time allocated through UK Engineering and Physical Sciences Research Council projects EP/I006079/1 and EP/I00677X/1 on “Multi-layered Abstractions for PDEs”.

This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

CloverLeaf development is supported by the UK Atomic Weapons Establishment under grants CDK0660 (The Production of Predictive Models for Future Computing Requirements) and CDK0724 (AWE Technical Outreach Programme), and also by the Royal Society through its Industry Fellowship Scheme (IF090020/AM).

We are thankful to Endre László at PPKE Hungary for his contributions to OPS, and to David Beckingsale at the University of Warwick and Michael Boulton at the University of Bristol for their insights into the original CloverLeaf application and its implementation.

Author information

Authors and Affiliations

Oxford e-Research Centre, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
G. R. Mudalige, I. Z. Reguly & M. B. Giles
High Performance Computing, UK AWE plc., Aldermaston, UK
W. P. Gaudin & J. A. Herdman
Department of Computer Science, University of Warwick, Coventry, UK
A. C. Mallinson

Corresponding author

Correspondence to G. R. Mudalige.

Editor information

Editors and Affiliations

University of Warwick, Coventry, United Kingdom
Stephen A. Jarvis
University of Warwick, Coventry, United Kingdom
Steven A. Wright
Sandia National Laboratories CSRI, Albuquerque, New Mexico, USA
Simon D. Hammond

Cite this paper

Mudalige, G.R., Reguly, I.Z., Giles, M.B., Mallinson, A.C., Gaudin, W.P., Herdman, J.A. (2015). Performance Analysis of a High-Level Abstractions-Based Hydrocode on Future Computing Systems. In: Jarvis, S., Wright, S., Hammond, S. (eds) High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation. PMBS 2014. Lecture Notes in Computer Science, vol 8966. Springer, Cham. https://doi.org/10.1007/978-3-319-17248-4_5

DOI: https://doi.org/10.1007/978-3-319-17248-4_5
Published: 18 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17247-7
Online ISBN: 978-3-319-17248-4
eBook Packages: Computer Science, Computer Science (R0)
