N-body computations using skeletal frameworks on multicore CPU/graphics processing unit architectures: an empirical performance evaluation

Authors:

Horacio González-Vélez

Concurrency and Computation: Practice & Experience, Volume 26, Issue 4

Pages 972 - 986

https://doi.org/10.1002/cpe.3076

Published: 25 March 2014

Abstract

With the emergence of general-purpose computation on graphics processing units (GPUs), high-level approaches that hide the conceptual complexity of the low-level Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL) platforms are the subject of active research. However, these approaches may require a trade-off in achieved performance and GPU hardware utilisation, and may impose algorithmic limitations. In this paper, we present and systematically evaluate the parallel performance of three implementations of the brute-force, all-pairs N-body algorithm, using skeletal deployments based on the FastFlow, SkePU and Thrust frameworks. Our results indicate that the skeletal framework implementation achieves a speed-up of up to two orders of magnitude over the serial version on a Tesla M2050, with lower implementation complexity than low-level CUDA programming. Copyright © 2013 John Wiley & Sons, Ltd.

Published In

Concurrency and Computation: Practice & Experience, Volume 26, Issue 4

March 2014

169 pages

ISSN: 1532-0626

Publisher

John Wiley and Sons Ltd.

United Kingdom

Qualifiers

Article

Bibliometrics

Total citations: 4

Total downloads: 0 (last 12 months: 0; last 6 weeks: 0), reflecting downloads up to 13 Sep 2024

Cited By

Matsuda M, Fukuda K, Maruyama N (2018). A Portability Layer of an All-pairs Operation for Hierarchical N-Body Algorithm Framework Tapas. Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, pp. 241-250. doi:10.1145/3149457.3149471. Online publication date: 28-Jan-2018.
https://dl.acm.org/doi/10.1145/3149457.3149471

Goli M, González-Vélez H (2018). Formalised Composition and Interaction for Heterogeneous Structured Parallelism. International Journal of Parallel Programming 46(1), pp. 120-151. doi:10.1007/s10766-017-0511-4. Online publication date: 28-Dec-2018.
https://dl.acm.org/doi/10.1007/s10766-017-0511-4

Goli M, González-Vélez H (2018). Autonomic Coordination of Skeleton-Based Applications Over CPU/GPU Multi-Core Architectures. International Journal of Parallel Programming 45(2), pp. 203-224. doi:10.1007/s10766-016-0419-4. Online publication date: 28-Dec-2018.
https://dl.acm.org/doi/10.1007/s10766-016-0419-4

Loidl H, Singer J (2014). SICSA multicore challenge editorial preface. Concurrency and Computation: Practice & Experience 26(4), pp. 929-934. doi:10.1002/cpe.3077. Online publication date: 25-Mar-2014.
https://dl.acm.org/doi/10.1002/cpe.3077
