DOI: 10.1145/1854273.1854348 (poster)

An integer programming framework for optimizing shared memory use on GPUs

Published: 11 September 2010

  • Abstract

    General-purpose computing using GPUs is becoming increasingly popular because of GPUs' extremely favorable performance/price ratio. Like standard processors, GPUs have a memory hierarchy, which must be carefully optimized to achieve efficient execution. Specifically, modern NVIDIA GPUs have a very small programmable cache, referred to as shared memory; accesses to it are roughly 100 to 150 times faster than accesses to regular device memory. An automatically generated or hand-written CUDA program can explicitly control which variables and array sections are allocated in shared memory at any point during execution. This, however, leads to a difficult optimization problem.
    In this paper, we formulate and solve the shared memory allocation problem as an integer linear programming problem. We present a global (intraprocedural) framework that can model structured control flow and is not restricted to a single loop nest. We consider allocation of scalars, arrays, and array sections in shared memory. We also briefly show how our framework can suggest useful loop transformations to further improve performance. Our experiments with several non-scientific applications show that our integer programming framework outperforms a recently published heuristic method, and that our loop transformations also improve performance for many applications.
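    The paper's actual ILP formulation is not reproduced in this abstract. As a toy illustration of the flavor of the problem, shared memory placement can be viewed as a 0/1 selection model: each candidate array section either fits in shared memory or stays in device memory, subject to a capacity constraint. The sketch below (hypothetical names, sizes, and benefit values, solved by brute force rather than an ILP solver) is only meant to convey that structure:

```python
from itertools import chain, combinations

# Toy stand-in for a shared-memory placement model. Each candidate
# array section has a size (bytes) and an estimated benefit (device
# memory accesses saved if it is placed in shared memory). These names
# and numbers are illustrative, not taken from the paper.
CAPACITY = 16 * 1024  # e.g. 16 KB of shared memory per thread block

candidates = {
    "A_tile":  (4096,  900),
    "B_tile":  (8192,  1500),
    "lookup":  (2048,  400),
    "scratch": (12288, 1600),
}

def best_placement(cands, capacity):
    """Enumerate all subsets (fine for a handful of candidates) and
    return the one maximizing total benefit within the capacity limit.
    An ILP solver handles the same 0/1 model at realistic scale."""
    names = list(cands)
    subsets = chain.from_iterable(
        combinations(names, k) for k in range(len(names) + 1))
    feasible = (s for s in subsets
                if sum(cands[n][0] for n in s) <= capacity)
    return max(feasible, key=lambda s: sum(cands[n][1] for n in s))

print(sorted(best_placement(candidates, CAPACITY)))
# → ['A_tile', 'B_tile', 'lookup']
```

    The real problem is harder than this knapsack-style toy: as the abstract notes, the framework is global, models structured control flow, and allows allocations to change between program points rather than being fixed once.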



    Published In

    PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques
    September 2010
    596 pages
    ISBN:9781450301787
    DOI:10.1145/1854273


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. CUDA
    2. GPGPU
    3. ILP
    4. memory hierarchy

    Qualifiers

    • Poster

    Conference

    PACT '10
    Sponsors:
    • IFIP WG 10.3
    • IEEE CS TCPP
    • SIGARCH
    • IEEE CS TCAA

    Acceptance Rates

    Overall Acceptance Rate 121 of 471 submissions, 26%



    Cited By

    • (2017) Scratchpad Sharing in GPUs. ACM Transactions on Architecture and Code Optimization 14(2):1-29. DOI: 10.1145/3075619. Online: 26 May 2017.
    • (2017) LD. ACM Transactions on Architecture and Code Optimization 14(1):1-25. DOI: 10.1145/3046678. Online: 21 Mar 2017.
    • (2017) Optimizing Data Placement on GPU Memory. IEEE Transactions on Computers 66(3):473-487. DOI: 10.1109/TC.2016.2604372. Online: 1 Mar 2017.
    • (2017) Performance Modeling for Optimal Data Placement on GPU with Heterogeneous Memory Systems. 2017 IEEE International Conference on Cluster Computing (CLUSTER), pages 166-177. DOI: 10.1109/CLUSTER.2017.42. Online: Sep 2017.
    • (2016) Coherence-Free Multiview. Proceedings of the 2016 International Conference on Supercomputing, pages 1-13. DOI: 10.1145/2925426.2926277. Online: 1 Jun 2016.
    • (2016) Improving GPU Performance Through Resource Sharing. Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, pages 203-214. DOI: 10.1145/2907294.2907298. Online: 31 May 2016.
    • (2014) PORPLE. Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, pages 88-100. DOI: 10.1109/MICRO.2014.20. Online: 13 Dec 2014.
    • (2013) Optimising purely functional GPU programs. ACM SIGPLAN Notices 48(9):49-60. DOI: 10.1145/2544174.2500595. Online: 25 Sep 2013.
    • (2013) Optimising purely functional GPU programs. Proceedings of the 18th ACM SIGPLAN International Conference on Functional Programming, pages 49-60. DOI: 10.1145/2500365.2500595. Online: 25 Sep 2013.
    • (2013) Hybrid Storage System Power Optimization. Proceedings of the 2013 IEEE Green Technologies Conference, pages 285-292. DOI: 10.1109/GreenTech.2013.51. Online: 4 Apr 2013.
