
N-body computations using skeletal frameworks on multicore CPU/graphics processing unit architectures: an empirical performance evaluation

Published: 25 March 2014

Abstract

With the emergence of general-purpose computation on graphics processing units, high-level approaches that hide the conceptual complexity of the low-level Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL) platforms are the subject of active research. However, these approaches may trade away some achieved performance and utilisation of graphics processing unit hardware, and may impose algorithmic limitations. In this paper, we present and systematically evaluate the parallel performance of three implementations of the brute-force, all-pairs N-body algorithm, with skeletal deployments based on the FastFlow, SkePU and Thrust frameworks. Our results indicate that the skeletal framework implementation achieves a speed-up of up to two orders of magnitude over the serial version on a Tesla M2050, with lower implementation complexity than low-level CUDA programming. Copyright © 2013 John Wiley & Sons, Ltd.
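The brute-force, all-pairs N-body algorithm evaluated in this paper computes, for every body i, the acceleration a_i = G Σ_{j≠i} m_j (r_j − r_i) / (|r_j − r_i|² + ε²)^{3/2}, so each time step costs O(N²) pairwise interactions. The following is a minimal serial C++ sketch of that computation, not the authors' code. The names Body, Accel, accel_on and step, the softening term eps2 (ε²) and the semi-implicit Euler integrator are hypothetical choices made for illustration. Only the standard library is used: std::transform over an index range stands in for the data-parallel map that a skeletal framework such as FastFlow, SkePU or Thrust would distribute across CPU cores or GPU threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical point-mass body: position, velocity and mass.
struct Body {
    double x, y, z;
    double vx, vy, vz;
    double m;
};

struct Accel {
    double ax, ay, az;
};

// Direct (brute-force) summation of the acceleration on body i from every
// other body j. eps2 is an assumed softening term; with eps2 == 0 this is the
// exact Newtonian acceleration for bodies at distinct positions.
Accel accel_on(const std::vector<Body>& bodies, std::size_t i, double G, double eps2) {
    assert(i < bodies.size());
    Accel a{0.0, 0.0, 0.0};
    const Body& bi = bodies[i];
    for (std::size_t j = 0; j < bodies.size(); ++j) {
        if (j == i) continue;  // skip self-interaction
        const Body& bj = bodies[j];
        const double dx = bj.x - bi.x;
        const double dy = bj.y - bi.y;
        const double dz = bj.z - bi.z;
        const double r2 = dx * dx + dy * dy + dz * dz + eps2;
        const double s = G * bj.m / (r2 * std::sqrt(r2));  // G * m_j / r^3
        a.ax += s * dx;
        a.ay += s * dy;
        a.az += s * dz;
    }
    return a;
}

// One time step: a "map" over body indices computes every acceleration from
// the old positions, then a semi-implicit Euler update (an illustrative
// choice) advances velocities first and positions second.
void step(std::vector<Body>& bodies, double dt, double G, double eps2) {
    std::vector<std::size_t> idx(bodies.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::vector<Accel> acc(bodies.size());
    // Each index is independent of the others: this is the loop a map
    // skeleton would parallelise.
    std::transform(idx.begin(), idx.end(), acc.begin(),
                   [&](std::size_t i) { return accel_on(bodies, i, G, eps2); });
    for (std::size_t i = 0; i < bodies.size(); ++i) {
        Body& b = bodies[i];
        b.vx += acc[i].ax * dt;
        b.vy += acc[i].ay * dt;
        b.vz += acc[i].az * dt;
        b.x += b.vx * dt;
        b.y += b.vy * dt;
        b.z += b.vz * dt;
    }
}
```

Computing all accelerations before updating any position keeps the map free of data races, which is what makes the outer loop safe to hand to a parallel skeleton; the per-pair arithmetic inside accel_on would typically stay the same in a parallel version.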

Cited By

  • (2018) A Portability Layer of an All-pairs Operation for Hierarchical N-Body Algorithm Framework Tapas. Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, pp. 241-250. DOI: 10.1145/3149457.3149471. Online publication date: 28-Jan-2018
  • (2018) Formalised Composition and Interaction for Heterogeneous Structured Parallelism. International Journal of Parallel Programming 46:1, pp. 120-151. DOI: 10.1007/s10766-017-0511-4. Online publication date: 28-Dec-2018
  • (2018) Autonomic Coordination of Skeleton-Based Applications Over CPU/GPU Multi-Core Architectures. International Journal of Parallel Programming 45:2, pp. 203-224. DOI: 10.1007/s10766-016-0419-4. Online publication date: 28-Dec-2018
  • (2014) SICSA multicore challenge editorial preface. Concurrency and Computation: Practice & Experience 26:4, pp. 929-934. DOI: 10.1002/cpe.3077. Online publication date: 25-Mar-2014

Published In

Concurrency and Computation: Practice & Experience, Volume 26, Issue 4
March 2014
169 pages

Publisher

John Wiley and Sons Ltd.

United Kingdom

Author Tags

  1. GPU
  2. algorithmic skeletons
  3. general-purpose computing on graphics processing units
  4. parallel computing
  5. structured parallelism
