SaC/C formulations of the all-pairs N-body problem and their performance on SMPs and GPGPUs

Published: 25 March 2014

Abstract

This paper describes our experience in implementing the classical N-body algorithm in SaC and analysing the runtime performance achieved on three different machines: a dual-processor 8-core Dell PowerEdge 2950 (a Beowulf cluster node and the reference machine), a quad-core hyper-threaded Intel Core-i7 based system equipped with an NVidia GTX-480 graphics accelerator, and an Oracle Sparc T4-4 server with a total of 256 hardware threads. We contrast our findings with those resulting from the reference C code and a few variants of it that employ OpenMP pragmas as well as explicit vectorisation. Our experiments demonstrate that the SaC implementation successfully combines a high level of abstraction, very close to the mathematical specification, with very competitive runtimes. In fact, SaC matches or outperforms the hand-vectorised and hand-parallelised C codes on all three systems under investigation without the need for any source code modification. Furthermore, only SaC is able to effectively harness the advanced compute power of the graphics accelerator, again by mere recompilation of the same source code. Our results illustrate the benefits that SaC provides to application programmers in terms of coding productivity, source code and performance portability among different machine architectures, as well as long-term maintainability in evolving hardware environments. Copyright © 2013 John Wiley & Sons, Ltd.

Published In

Concurrency and Computation: Practice & Experience, Volume 26, Issue 4
March 2014
169 pages

Publisher

John Wiley and Sons Ltd.

United Kingdom

Author Tags

  1. auto-parallelisation
  2. data parallelism
  3. functional programming
  4. graphics cards
  5. high performance
  6. high productivity
  7. single assignment C
  8. vectorisation

Qualifiers

  • Article

Cited By

  • (2019) Persistent Asynchronous Adaptive Specialization for Generic Array Programming. International Journal of Parallel Programming 47:2 (164-183). DOI: 10.1007/s10766-018-0567-9. Online publication date: 15-May-2019
  • (2018) A Portability Layer of an All-pairs Operation for Hierarchical N-Body Algorithm Framework Tapas. Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region (241-250). DOI: 10.1145/3149457.3149471. Online publication date: 28-Jan-2018
  • (2017) GPUMap. Proceedings of the 7th Workshop on Python for High-Performance and Scientific Computing (1-10). DOI: 10.1145/3149869.3149875. Online publication date: 12-Nov-2017
  • (2016) Type-driven data layouts for improved vectorisation. Concurrency and Computation: Practice & Experience 28:7 (2092-2119). DOI: 10.1002/cpe.3501. Online publication date: 1-May-2016
  • (2015) Single Assignment C (SAC). Central European Functional Programming School (207-282). DOI: 10.1007/978-3-030-28346-9_7. Online publication date: 6-Jul-2015
  • (2014) SICSA multicore challenge editorial preface. Concurrency and Computation: Practice & Experience 26:4 (929-934). DOI: 10.1002/cpe.3077. Online publication date: 25-Mar-2014
  • (2013) Next Generation Asynchronous Adaptive Specialization for Data-Parallel Functional Array Processing in SAC. Proceedings of the 25th symposium on Implementation and Application of Functional Languages (117-128). DOI: 10.1145/2620678.2620690. Online publication date: 28-Aug-2013
