DOI: 10.1145/1941553.1941591
research-article

Achieving a single compute device image in OpenCL for multiple GPUs

Published: 12 February 2011

Abstract

In this paper, we propose an OpenCL framework that combines multiple GPUs and treats them as a single compute device. Providing a single virtual compute device image to the user makes an OpenCL application written for a single GPU portable to platforms with multiple GPU devices. It also lets the application exploit the full computing power of the multiple GPUs and the total GPU memory available in the platform. At run time, our OpenCL framework automatically distributes an OpenCL kernel written for a single GPU into multiple CUDA kernels that execute on the multiple GPU devices. It applies a run-time memory access range analysis to the kernel by performing a sampling run, and identifies an optimal workload distribution for the kernel. To achieve a single compute device image, the runtime maintains virtual device memory that is allocated in main memory. The OpenCL runtime treats this memory as if it were the memory of a single GPU device and keeps it consistent with the memories of the multiple GPU devices. Our OpenCL-C-to-C translator generates the sampling code from the OpenCL kernel code, and our OpenCL-C-to-CUDA-C translator generates the CUDA kernel code for the distributed OpenCL kernel. We demonstrate the effectiveness of our OpenCL framework by implementing the OpenCL runtime and the two source-to-source translators, and we evaluate its performance on a system containing 8 GPUs with 11 OpenCL benchmark applications.
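The abstract describes dividing one kernel's work among several GPUs and determining which part of each buffer every GPU accesses; knowing those ranges lets a runtime copy only the needed parts of a host-side virtual device memory to each GPU. The Python sketch below illustrates that idea under stated assumptions and is not the authors' runtime or translator output. It assumes a 1-D NDRange with zero global offset and a global size that is a multiple of the work-group size, and it splits work-groups into equal contiguous chunks (the paper instead identifies an optimal distribution, which is not reproduced here). The generated sampling code is replaced by a hypothetical `index_fn` that returns the buffer indices a work-item reads, evaluated for every work-item by brute force; `Partition`, `split_groups`, `access_range`, and `plan` are likewise illustrative names.

```python
"""Illustrative sketch only: split a 1-D OpenCL NDRange across several GPUs
and compute, per GPU, the range of buffer indices its work-items access.
All names here are hypothetical; this is not the paper's runtime code."""
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Partition:
    device: int       # index of the GPU that receives this chunk
    first_group: int  # first work-group ID assigned to the GPU
    num_groups: int   # number of consecutive work-groups assigned
    lo: int           # lowest buffer index accessed (inclusive)
    hi: int           # highest buffer index accessed (inclusive)


def split_groups(num_groups: int, num_devices: int) -> List[range]:
    """Split work-group IDs into contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(num_groups, num_devices)
    chunks, start = [], 0
    for d in range(num_devices):
        size = base + (1 if d < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def access_range(groups: range, local_size: int,
                 index_fn: Callable[[int], List[int]]) -> Tuple[int, int]:
    """Brute-force stand-in for a sampling run: evaluate the kernel's index
    expressions for every work-item in `groups` and return (min, max)."""
    touched = [i
               for g in groups
               for lid in range(local_size)
               # global ID = group ID * work-group size + local ID (zero offset assumed)
               for i in index_fn(g * local_size + lid)]
    return min(touched), max(touched)


def plan(global_size: int, local_size: int, num_devices: int,
         index_fn: Callable[[int], List[int]]) -> List[Partition]:
    """Assign work-groups to GPUs and record which buffer range each GPU needs."""
    if global_size % local_size != 0:
        raise ValueError("global size must be a multiple of the work-group size")
    parts = []
    for d, groups in enumerate(split_groups(global_size // local_size, num_devices)):
        if len(groups) == 0:  # more GPUs than work-groups: leave this GPU idle
            continue
        lo, hi = access_range(groups, local_size, index_fn)
        parts.append(Partition(d, groups.start, len(groups), lo, hi))
    return parts


# Example: a 3-point stencil out[i] = in[i-1] + in[i] + in[i+1] over N elements,
# with indices clamped at the buffer edges.
N = 16


def stencil_indices(gid: int) -> List[int]:
    return [max(gid - 1, 0), gid, min(gid + 1, N - 1)]


if __name__ == "__main__":
    for p in plan(N, 4, 2, stencil_indices):
        print(p)
```

Running the script prints one `Partition` per GPU: GPU 0 gets work-groups 0 and 1 and reads indices 0 to 8, while GPU 1 gets work-groups 2 and 3 and reads indices 7 to 15. The one-element overlap at the boundary is the stencil's halo, which both GPUs need a copy of. Contiguous chunks keep each GPU's accessed range compact, which is why a block split is a reasonable baseline to compare a smarter distribution against.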

Information

Published In

PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
February 2011
326 pages
ISBN:9781450301190
DOI:10.1145/1941553
  • General Chair: Calin Cascaval
  • Program Chair: Pen-Chung Yew
  • ACM SIGPLAN Notices, Volume 46, Issue 8
    PPoPP '11
    August 2011
    300 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/2038037
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. access range analysis
  2. compilers
  3. opencl
  4. runtime
  5. virtual device memory
  6. workload distribution

Qualifiers

  • Research-article

Conference

PPoPP '11

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 35
  • Downloads (last 6 weeks): 3
Reflects downloads up to 01 Sep 2024

Cited By

  • (2024) IRIS: A Performance-Portable Framework for Cross-Platform Heterogeneous Computing. IEEE Transactions on Parallel and Distributed Systems 35(10), pp. 1796-1809. DOI: 10.1109/TPDS.2024.3429010. Online publication date: Oct-2024.
  • (2023) Multiprovision: a Design Space Exploration tool for multi-tenant resource provisioning in CPU–GPU environments. Design Automation for Embedded Systems 27(4), pp. 241-273. DOI: 10.1007/s10617-023-09279-3. Online publication date: 21-Dec-2023.
  • (2022) Dopia. Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 32-45. DOI: 10.1145/3503221.3508421. Online publication date: 2-Apr-2022.
  • (2022) GraphCL: A Framework for Execution of Data-Flow Graphs on Multi-Device Platforms. 2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), pp. 116-121. DOI: 10.1109/PDP55904.2022.00026. Online publication date: Mar-2022.
  • (2022) Lightning: Scaling the GPU Programming Model Beyond a Single GPU. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 492-503. DOI: 10.1109/IPDPS53621.2022.00054. Online publication date: May-2022.
  • (2021) SnuRHAC. Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing, pp. 107-120. DOI: 10.1145/3431379.3460647. Online publication date: 21-Jun-2021.
  • (2021) TRIPP: Transparent Resource Provisioning for Multi-Tenant CPU-GPU based Cloud Environments. 2021 XI Brazilian Symposium on Computing Systems Engineering (SBESC), pp. 1-8. DOI: 10.1109/SBESC53686.2021.9628223. Online publication date: 22-Nov-2021.
  • (2021) IRIS: A Portable Runtime System Exploiting Multiple Heterogeneous Programming Systems. 2021 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-8. DOI: 10.1109/HPEC49654.2021.9622873. Online publication date: 20-Sep-2021.
  • (2020) The Minos Computing Library. Proceedings of the 13th Annual Workshop on General Purpose Processing using Graphics Processing Unit, pp. 1-10. DOI: 10.1145/3366428.3380770. Online publication date: 23-Feb-2020.
  • (2020) Pythia: Scheduling of Concurrent Network Packet Processing Applications on Heterogeneous Devices. 2020 6th IEEE Conference on Network Softwarization (NetSoft), pp. 145-149. DOI: 10.1109/NetSoft48620.2020.9165447. Online publication date: Jun-2020.
