Research article, DOI: 10.1145/1572769.1572797

Stream compaction for deferred shading

Published: 01 August 2009
  • Abstract

    The GPU achieves SIMD efficiency when shading because it rasterizes one triangle at a time, running the same shader on all of its fragments. Ray tracing sacrifices this shader coherence, so SIMD units must often run different shaders simultaneously, which results in serialization. We study this problem and define a new measure, heterogeneous efficiency, that quantifies SIMD divergence among multiple shaders of different complexities in a ray tracing application. We devise seven algorithms for scheduling shaders onto SIMD processors to avoid divergence. In all but simply shaded scenes, we show that the expense of sorting shaders pays off with better overall shading performance.
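
    The idea of grouping rays by shader before shading can be illustrated with a small sketch. It is not one of the seven algorithms from the paper, which the abstract does not spell out. It assumes a generic scan-based stream compaction, run once per shader, and it counts per-group shader passes as a rough stand-in for serialization rather than the paper's heterogeneous efficiency measure. All function names and the SIMD width of 4 are hypothetical.

```python
from itertools import accumulate


def exclusive_scan(flags):
    """Exclusive prefix sum: out[i] is the sum of flags[0..i-1]."""
    return ([0] + list(accumulate(flags))[:-1]) if flags else []


def compact(items, keep):
    """Stream compaction: keep items[i] where keep[i] == 1, preserving order."""
    offsets = exclusive_scan(keep)
    total = (offsets[-1] + keep[-1]) if keep else 0
    out = [None] * total
    for i, item in enumerate(items):
        if keep[i]:
            out[offsets[i]] = item  # scatter to the scanned output slot
    return out


def group_by_shader(shader_ids, num_shaders):
    """Reorder ray indices so rays with the same shader are contiguous."""
    indices = list(range(len(shader_ids)))
    order = []
    for s in range(num_shaders):
        keep = [1 if sid == s else 0 for sid in shader_ids]
        order.extend(compact(indices, keep))
    return order


def serialized_passes(shader_ids, order, width):
    """A SIMD group holding k distinct shaders needs k serialized passes."""
    passes = 0
    for start in range(0, len(order), width):
        group = order[start:start + width]
        passes += len({shader_ids[i] for i in group})
    return passes


shader_ids = [0, 1, 2, 0, 1, 2, 0, 1]      # shader hit by each ray (made-up data)
unsorted_order = list(range(len(shader_ids)))
grouped_order = group_by_shader(shader_ids, num_shaders=3)

print(serialized_passes(shader_ids, unsorted_order, width=4))  # 6
print(serialized_passes(shader_ids, grouped_order, width=4))   # 4
```

    With this toy input, grouping reduces the pass count from 6 to 4. The sketch ignores the cost of the compaction itself, which is the trade-off the abstract says pays off in all but simply shaded scenes.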

    Published In

    HPG '09: Proceedings of the Conference on High Performance Graphics 2009
    August 2009
    185 pages
    ISBN:9781605586038
    DOI:10.1145/1572769

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Research-article

    Conference

    HPG 2009: High Performance Graphics
    August 1 - 3, 2009
    New Orleans, Louisiana

    Acceptance Rates

    Overall Acceptance Rate 15 of 44 submissions, 34%

    Cited By

    • (2022) GPU Subwarp Interleaving. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 1184-1197. DOI: 10.1109/HPCA53966.2022.00090. Online publication date: Apr-2022
    • (2020) Massively Parallel Rule-Based Interpreter Execution on GPUs Using Thread Compaction. International Journal of Parallel Programming. DOI: 10.1007/s10766-020-00670-2. Online publication date: 24-Jun-2020
    • (2018) General-Purpose Graphics Processor Architectures. Synthesis Lectures on Computer Architecture 13:2, 1-140. DOI: 10.2200/S00848ED1V01Y201804CAC044. Online publication date: 21-May-2018
    • (2017) PrivBayes. ACM Transactions on Database Systems 42:4, 1-41. DOI: 10.1145/3134428. Online publication date: 27-Oct-2017
    • (2017) Vectorized production path tracing. Proceedings of High Performance Graphics, 1-11. DOI: 10.1145/3105762.3105768. Online publication date: 28-Jul-2017
    • (2016) Local shading coherence extraction for SIMD-efficient path tracing on CPUs. Proceedings of High Performance Graphics, 119-128. DOI: 10.5555/2977336.2977352. Online publication date: 20-Jun-2016
    • (2016) Fast parallel stream compaction for IA-based multi/many-core processors. Proceedings of the 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, 736-745. DOI: 10.1109/CCGrid.2016.112. Online publication date: 16-May-2016
    • (2015) Accelerating Occlusion Rendering on a GPU via Ray Classification. International Journal of Creative Interfaces and Computer Graphics 6:2, 1-17. DOI: 10.4018/IJCICG.2015070101. Online publication date: 1-Jul-2015
    • (2015) Complex shading efficiently for ray tracing on GPU. Multimedia Tools and Applications 74:3, 1091-1106. DOI: 10.1007/s11042-013-1712-5. Online publication date: 1-Feb-2015
    • (2014) Fine-grain task aggregation and coordination on GPUs. Proceeding of the 41st annual international symposium on Computer architecuture, 181-192. DOI: 10.5555/2665671.2665701. Online publication date: 14-Jun-2014