DOI: 10.1145/1854273.1854348 (poster)

An integer programming framework for optimizing shared memory use on GPUs

Published: 11 September 2010

  • Abstract

    General-purpose computing using GPUs is becoming increasingly popular because of GPUs' extremely favorable performance/price ratio. Like standard processors, GPUs have a memory hierarchy, which must be carefully optimized to achieve efficient execution. Specifically, modern NVIDIA GPUs have a very small programmable cache, referred to as shared memory; accesses to it are roughly 100 to 150 times faster than accesses to regular device memory. An automatically generated or hand-written CUDA program can explicitly control which variables and array sections are allocated in shared memory at any point during execution. This, however, leads to a difficult optimization problem.
    In this paper, we formulate and solve the shared memory allocation problem as an integer linear programming problem. We present a global (intraprocedural) framework that can model structured control flow and is not restricted to a single loop nest. We consider allocation of scalars, arrays, and array sections in shared memory. We also briefly show how our framework can suggest useful loop transformations to further improve performance. Our experiments with several non-scientific applications show that our integer programming framework outperforms a recently published heuristic method, and that our loop transformations also improve performance for many applications.
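    The paper's actual ILP formulation is not reproduced in this abstract. As a toy illustration of the flavor of the problem, shared memory placement can be viewed as a 0/1 selection model: each candidate array section either fits in shared memory or stays in device memory, subject to a capacity constraint. The sketch below (hypothetical names, sizes, and benefit values, solved by brute force rather than an ILP solver) is only meant to convey that structure:

```python
from itertools import chain, combinations

# Toy stand-in for a shared-memory placement model. Each candidate
# array section has a size (bytes) and an estimated benefit (device
# memory accesses saved if it is placed in shared memory). These names
# and numbers are illustrative, not taken from the paper.
CAPACITY = 16 * 1024  # e.g. 16 KB of shared memory per thread block

candidates = {
    "A_tile":  (4096,  900),
    "B_tile":  (8192,  1500),
    "lookup":  (2048,  400),
    "scratch": (12288, 1600),
}

def best_placement(cands, capacity):
    """Enumerate all subsets (fine for a handful of candidates) and
    return the one maximizing total benefit within the capacity limit.
    An ILP solver handles the same 0/1 model at realistic scale."""
    names = list(cands)
    subsets = chain.from_iterable(
        combinations(names, k) for k in range(len(names) + 1))
    feasible = (s for s in subsets
                if sum(cands[n][0] for n in s) <= capacity)
    return max(feasible, key=lambda s: sum(cands[n][1] for n in s))

print(sorted(best_placement(candidates, CAPACITY)))
# → ['A_tile', 'B_tile', 'lookup']
```

    The real problem is harder than this knapsack-style toy: as the abstract notes, the framework is global, models structured control flow, and allows allocations to change between program points rather than being fixed once.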



    Published In

    PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques
    September 2010
    596 pages
    ISBN:9781450301787
    DOI:10.1145/1854273


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. CUDA
    2. GPGPU
    3. ILP
    4. memory hierarchy

    Qualifiers

    • Poster

    Conference

    PACT '10
    Sponsors:
    • IFIP WG 10.3
    • IEEE CS TCPP
    • SIGARCH
    • IEEE CS TCAA

    Acceptance Rates

    Overall Acceptance Rate 121 of 471 submissions, 26%



    Cited By

    • (2017) Scratchpad Sharing in GPUs. ACM Transactions on Architecture and Code Optimization 14(2):1-29. DOI: 10.1145/3075619. Online: 26 May 2017.
    • (2017) LD. ACM Transactions on Architecture and Code Optimization 14(1):1-25. DOI: 10.1145/3046678. Online: 21 Mar 2017.
    • (2017) Optimizing Data Placement on GPU Memory. IEEE Transactions on Computers 66(3):473-487. DOI: 10.1109/TC.2016.2604372. Online: 1 Mar 2017.
    • (2017) Performance Modeling for Optimal Data Placement on GPU with Heterogeneous Memory Systems. 2017 IEEE International Conference on Cluster Computing (CLUSTER), pages 166-177. DOI: 10.1109/CLUSTER.2017.42. Online: Sep 2017.
    • (2016) Coherence-Free Multiview. Proceedings of the 2016 International Conference on Supercomputing, pages 1-13. DOI: 10.1145/2925426.2926277. Online: 1 Jun 2016.
    • (2016) Improving GPU Performance Through Resource Sharing. Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, pages 203-214. DOI: 10.1145/2907294.2907298. Online: 31 May 2016.
    • (2014) PORPLE. Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, pages 88-100. DOI: 10.1109/MICRO.2014.20. Online: 13 Dec 2014.
    • (2013) Optimising purely functional GPU programs. ACM SIGPLAN Notices 48(9):49-60. DOI: 10.1145/2544174.2500595. Online: 25 Sep 2013.
    • (2013) Optimising purely functional GPU programs. Proceedings of the 18th ACM SIGPLAN International Conference on Functional Programming, pages 49-60. DOI: 10.1145/2500365.2500595. Online: 25 Sep 2013.
    • (2013) Hybrid Storage System Power Optimization. Proceedings of the 2013 IEEE Green Technologies Conference, pages 285-292. DOI: 10.1109/GreenTech.2013.51. Online: 4 Apr 2013.
