
NoT: a high-level no-threading parallel programming method for heterogeneous systems

Authors:

Zhengdong Zhu

The Journal of Supercomputing, Volume 75, Issue 7

Pages 3810 - 3841

https://doi.org/10.1007/s11227-019-02749-1

Published: 01 July 2019

Abstract

Multithreading is the core of mainstream heterogeneous programming methods such as CUDA and OpenCL. However, multithreaded parallel programming requires programmers to handle low-level runtime details, which makes the programming process complex and error prone. This paper presents no-threading (NoT), a high-level programming method that avoids explicit threads. It introduces the association structure, a new language construct that provides a declarative, runtime-free way to express different forms of data parallelism without using multithreading. The NoT method defines a C-like syntax for the association structure and implements a compiler and runtime system that use OpenCL as an intermediate language. We demonstrate the effectiveness of our techniques with multiple benchmarks. The NoT code is comparable in size to the serial code and far smaller than the benchmark OpenCL code. The compiler generates efficient OpenCL code, yielding performance competitive with or equivalent to that of the manually optimized benchmark OpenCL code on both a GPU platform and an MIC platform.
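The abstract contrasts thread-centric kernels with a declarative construct. The Python sketch below illustrates that contrast under stated assumptions; it is not NoT code, and the association-structure syntax is not reproduced here. The hypothetical functions `vadd_work_item` and `launch_vadd` emulate, sequentially, the OpenCL-style pattern in which the programmer writes the body of a single work-item, rounds the global size up to a multiple of the work-group size, and guards the padded work-items. The hypothetical `associate_add` stands in for the declarative style: it only states that each output element is associated with the matching input elements and leaves the mapping of indices to threads to the implementation.

```python
def vadd_work_item(gid, n, a, b, c):
    """Body of ONE work-item in the thread-centric (SPMD) style (emulated)."""
    if gid >= n:  # bounds guard for padded work-items; easy to forget
        return
    c[gid] = a[gid] + b[gid]


def launch_vadd(n, local_size, a, b, c):
    """Emulated kernel launch: round the global size up to a multiple of the
    work-group size (as OpenCL 1.x requires when a local size is given),
    then run every work-item one after another. Returns the global size."""
    global_size = (n + local_size - 1) // local_size * local_size
    for gid in range(global_size):
        vadd_work_item(gid, n, a, b, c)
    return global_size


def associate_add(a, b):
    """Hypothetical declarative style: state only that output[i] is
    associated with a[i] and b[i]; no thread indices or padding appear."""
    return [x + y for x, y in zip(a, b)]


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5]
    b = [10, 20, 30, 40, 50]
    c = [0] * len(a)
    launch_vadd(len(a), 4, a, b, c)
    print(c, associate_add(a, b))
```

With five elements and a work-group size of four, the emulated launch creates eight work-items, three of which exit at the bounds guard; the declarative version produces the same result without that bookkeeping.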




Published In

The Journal of Supercomputing, Volume 75, Issue 7 (July 2019), 628 pages

ISSN: 0920-8542

Copyright © 2019 Springer Science+Business Media, LLC, part of Springer Nature.

Publisher

Kluwer Academic Publishers, United States
