DOI: 10.1145/3168806

Register allocation for Intel processor graphics

Published: 24 February 2018

Abstract

Register allocation is a well-studied problem, but surprisingly little work has been published on assigning registers for GPU architectures. In this paper we present the register allocator in the production compiler for Intel HD and Iris Graphics. Intel GPUs feature a large byte-addressable register file organized into banks, an expressive instruction set that supports variable SIMD-sizes and divergent control flow, and high spill overhead due to relatively long memory latencies. These distinctive characteristics impose challenges for register allocation, as input programs may have arbitrarily-sized variables, partial updates, and complex control flow. Not only should the allocator make a program spill-free, but it must also reduce the number of register bank conflicts and anti-dependencies. Since compilation occurs in a JIT environment, the allocator also needs to incur little overhead.
To manage compilation overhead, our register allocation framework adopts a hybrid approach that separates the assignment of local and global variables. Several extensions are introduced to the traditional graph-coloring algorithm to support variables with different sizes and to accurately model liveness under divergent branches. Different assignment policies are applied to exploit the trade-offs between minimizing register usage and avoiding bank conflicts and anti-dependencies. Experimental results show our framework produces very few spilling kernels and can improve RA JIT time by up to 4x over pure graph-coloring. Our round-robin and bank-conflict-reduction assignment policies can also achieve up to 20% runtime improvements.
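
To illustrate the trade-off between minimizing register usage and avoiding anti-dependencies, the following is a minimal sketch and not the allocator described in the paper. It assumes a flat register file of `num_regs` units, variables that each need a contiguous range of registers, and an interference relation already computed elsewhere. The function `assign`, its parameters, and the example variables are all hypothetical names chosen for this illustration. A first-fit policy searches from register 0 every time, while a round-robin policy resumes the search where the previous assignment ended.

```python
from typing import Dict, List, Optional, Set


def assign(order: List[str],
           size: Dict[str, int],
           interferes: Dict[str, Set[str]],
           num_regs: int,
           round_robin: bool = False) -> Optional[Dict[str, int]]:
    """Give each variable a contiguous register range [start, start + size).

    Returns a map from variable to start register, or None if some
    variable cannot be placed (a real allocator would spill instead).
    """
    start_of: Dict[str, int] = {}
    cursor = 0  # where the round-robin search resumes
    for v in order:
        # Registers held by already-assigned variables that interfere with v.
        busy: Set[int] = set()
        for n in interferes.get(v, ()):
            if n in start_of:
                busy.update(range(start_of[n], start_of[n] + size[n]))

        base = cursor if round_robin else 0
        chosen = None
        for off in range(num_regs):
            s = (base + off) % num_regs
            if s + size[v] > num_regs:
                continue  # a contiguous range may not wrap around
            if all(r not in busy for r in range(s, s + size[v])):
                chosen = s
                break
        if chosen is None:
            return None
        start_of[v] = chosen
        cursor = (chosen + size[v]) % num_regs
    return start_of


# Hypothetical example: a and c are never live at the same time.
size = {"a": 2, "b": 1, "c": 2, "d": 1}
interferes = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
order = ["a", "b", "c", "d"]

first_fit = assign(order, size, interferes, num_regs=8)
# first_fit == {"a": 0, "b": 2, "c": 0, "d": 2}: c reuses a's registers

spread = assign(order, size, interferes, num_regs=8, round_robin=True)
# spread == {"a": 0, "b": 2, "c": 3, "d": 5}: no register is reused
```

In this example first-fit touches three registers, but `c` is written into the registers `a` was just read from, which creates a write-after-read dependency between those instructions. Round-robin touches six registers and avoids that reuse, which is the kind of trade-off an assignment policy has to weigh.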



    Published In

    CGO '18: Proceedings of the 2018 International Symposium on Code Generation and Optimization
    February 2018
    377 pages
    ISBN: 9781450356176
    DOI: 10.1145/3179541
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. GPU
    2. JIT compilers
    3. Register Allocation

    Qualifiers

    • Research-article

    Conference

    CGO '18

    Acceptance Rates

    Overall Acceptance Rate 312 of 1,061 submissions, 29%

    Cited By

    • (2024) PresCount: Effective Register Allocation for Bank Conflict Reduction. Proceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization, 170-181. DOI: 10.1109/CGO57630.2024.10444841. Online publication date: 2-Mar-2024
    • (2024) Giraph-Based Distributed Algorithms for Coloring Large-Scale Graphs. International Journal of Parallel Programming 53:1. DOI: 10.1007/s10766-024-00781-0. Online publication date: 17-Oct-2024
    • (2023) rNdN: Fast Query Compilation for NVIDIA GPUs. ACM Transactions on Architecture and Code Optimization 20:3, 1-25. DOI: 10.1145/3603503. Online publication date: 19-Jul-2023
    • (2023) RL4ReAl: Reinforcement Learning for Register Allocation. Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction, 133-144. DOI: 10.1145/3578360.3580273. Online publication date: 17-Feb-2023
    • (2023) A new distributed graph coloring algorithm for large graphs. Cluster Computing 27:1, 875-891. DOI: 10.1007/s10586-023-03988-x. Online publication date: 23-Mar-2023
    • (2022) Compiler-Directed Incremental Checkpointing for Low Latency GPU Preemption. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 751-761. DOI: 10.1109/IPDPS53621.2022.00078. Online publication date: May-2022
    • (2022) High-performance and balanced parallel graph coloring on multicore platforms. The Journal of Supercomputing 79:6, 6373-6421. DOI: 10.1007/s11227-022-04894-6. Online publication date: 7-Nov-2022
    • (2022) Energy Reduction Method by Compiler Optimization. Artificial Intelligence and Security, 672-683. DOI: 10.1007/978-3-031-06794-5_54. Online publication date: 15-Jul-2022
    • (2022) Research on global register allocation for code containing array-unit dual-usage register names. Concurrency and Computation: Practice and Experience 35:19. DOI: 10.1002/cpe.7519. Online publication date: 5-Dec-2022
    • (2021) C-for-metal. Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization, 289-300. DOI: 10.1109/CGO51591.2021.9370324. Online publication date: 27-Feb-2021
