research-article

Stream compaction for deferred shading

Authors:

Jared Hoberock, Yuntao Jia, and John C. Hart

HPG '09: Proceedings of the Conference on High Performance Graphics 2009

August 2009

Pages 173-180

https://doi.org/10.1145/1572769.1572797

Published: 01 August 2009

Abstract

The GPU leverages SIMD efficiency when shading because it rasterizes one triangle at a time, running the same shader on all of its fragments. Ray tracing sacrifices this shader coherence, so SIMD units often must run different shaders simultaneously, which serializes their execution. We study this problem and define a new measure, called heterogeneous efficiency, that quantifies SIMD divergence among multiple shaders of different complexities in a ray tracing application. We devise seven different algorithms for scheduling shaders onto SIMD processors to avoid divergence. In all but simply shaded scenes, we show that the expense of sorting shaders pays off with better overall shading performance.
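The general idea of grouping shading work so that each SIMD batch runs a single shader can be illustrated with a small Python sketch. This is not the authors' code, and it implements neither their heterogeneous-efficiency measure nor any of their seven scheduling algorithms. It assumes a hypothetical input of one integer shader ID per ray hit. For each shader, it uses an exclusive prefix sum to compact that shader's hits into a contiguous run (stream compaction). It then scores the result with a toy cost model, also an assumption, in which a SIMD group of `width` lanes pays the full cost of every distinct shader it contains. The names `exclusive_scan`, `compact`, `group_by_shader`, and `simd_cost` are illustrative only.

```python
from typing import List, Sequence


def exclusive_scan(flags: Sequence[int]) -> List[int]:
    """Exclusive prefix sum: out[i] = flags[0] + ... + flags[i-1]."""
    out, running = [], 0
    for f in flags:
        out.append(running)
        running += f
    return out


def compact(items: Sequence, keep: Sequence[bool]) -> list:
    """Stream compaction: each kept item is scattered to the slot given by
    the exclusive scan of the keep flags, preserving the original order."""
    flags = [1 if k else 0 for k in keep]
    offsets = exclusive_scan(flags)
    total = offsets[-1] + flags[-1] if flags else 0
    out = [None] * total
    for item, f, off in zip(items, flags, offsets):
        if f:
            out[off] = item
    return out


def group_by_shader(shader_ids: Sequence[int], num_shaders: int) -> List[int]:
    """Return hit indices reordered so equal shader IDs are contiguous.
    Runs one compaction pass per shader; assumes IDs lie in [0, num_shaders)."""
    indices = list(range(len(shader_ids)))
    order: List[int] = []
    for s in range(num_shaders):
        order += compact(indices, [sid == s for sid in shader_ids])
    return order


def simd_cost(shader_ids: Sequence[int], costs: Sequence[int], width: int) -> int:
    """Toy divergence model (an assumption, not the paper's metric): each group
    of `width` consecutive hits serializes, paying the cost of every distinct
    shader present in that group."""
    total = 0
    for start in range(0, len(shader_ids), width):
        present = set(shader_ids[start:start + width])
        total += sum(costs[s] for s in present)
    return total


if __name__ == "__main__":
    ids = [0, 1, 0, 2, 1, 0, 2, 1]  # hypothetical shader ID per ray hit
    costs = [1, 4, 10]              # hypothetical cost units per shader
    order = group_by_shader(ids, num_shaders=3)
    grouped = [ids[i] for i in order]
    print(grouped)  # [0, 0, 0, 1, 1, 1, 2, 2]
    print(simd_cost(ids, costs, width=4), simd_cost(grouped, costs, width=4))  # 30 19
```

In this toy example, grouping lowers the modeled cost from 30 to 19 units. The model ignores the cost of the grouping pass itself, which is consistent with the abstract's observation that sorting pays off except in simply shaded scenes. One compaction pass per shader does O(kn) work for k shaders; sorting hits by shader ID (for example, with a radix sort) produces the same grouping without a separate pass per shader.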

Index Terms

Stream compaction for deferred shading
1. Computing methodologies
  1. Computer graphics
    1. Graphics systems and interfaces
      1. Graphics processors
    2. Image manipulation
      1. Texturing
  2. Parallel computing methodologies
2. Information systems
  1. Data management systems
    1. Data structures
      1. Data layout
        1. Data compression

Recommendations

Adaptive deferred shading
I3D '16: Proceedings of the 20th ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games

The primary advantage of deferred shading is eliminating wasted shading operations for fragments that are occluded by others during rendering. Deferred shading starts by rendering the entire scene into a series of temporary buffers (a geometry buffer or ...

Subpixel reconstruction antialiasing for deferred shading
I3D '11: Symposium on Interactive 3D Graphics and Games

Subpixel Reconstruction Antialiasing (SRAA) combines single-pixel (1x) shading with subpixel visibility to create antialiased images without increasing the shading cost. SRAA targets deferred-shading renderers, which cannot use multisample antialiasing. ...

Decoupled deferred shading for hardware rasterization
I3D '12: Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games

In this paper we present decoupled deferred shading: a rendering technique based on a new data structure called compact geometry buffer, which stores shading samples independently from the visibility. This enables caching and efficient reuse of shading ...

Information

Published In

HPG '09: Proceedings of the Conference on High Performance Graphics 2009

August 2009

185 pages

ISBN: 9781605586038

DOI: 10.1145/1572769

Editors:
Stephen N. Spencer, University of Washington
David McAllister, NVIDIA
Matt Pharr, Intel
Ingo Wald, Intel

General Chairs:
David Luebke, NVIDIA
Philipp Slusallek, DFKI & Saarland University

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGGRAPH: ACM Special Interest Group on Computer Graphics and Interactive Techniques
EUROGRAPHICS: The European Association for Computer Graphics

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

HPG 2009: High Performance Graphics

August 1-3, 2009

New Orleans, Louisiana

Acceptance Rates

Overall Acceptance Rate 15 of 44 submissions, 34%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 32
Total Downloads: 566
Downloads (Last 12 months): 4
Downloads (Last 6 weeks): 0

Citations

Cited By

Damani S, Stephenson M, Rangan R, Johnson D, Kulkarni R, Keckler S (2022). GPU Subwarp Interleaving. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 1184-1197. Online publication date: Apr-2022. https://doi.org/10.1109/HPCA53966.2022.00090

Köster M, Groß J, Krüger A (2020). Massively Parallel Rule-Based Interpreter Execution on GPUs Using Thread Compaction. International Journal of Parallel Programming. Online publication date: 24-Jun-2020. https://doi.org/10.1007/s10766-020-00670-2

Aamodt T, Fung W, Rogers T (2018). General-Purpose Graphics Processor Architectures. Synthesis Lectures on Computer Architecture 13:2, 1-140. Online publication date: 21-May-2018. https://doi.org/10.2200/S00848ED1V01Y201804CAC044

Zhang J, Cormode G, Procopiuc C, Srivastava D, Xiao X (2017). PrivBayes. ACM Transactions on Database Systems 42:4, 1-41. Online publication date: 27-Oct-2017. https://dl.acm.org/doi/10.1145/3134428

Lee M, Green B, Xie F, Tabellion E, McGuire M, Patney A (2017). Vectorized production path tracing. Proceedings of High Performance Graphics, 1-11. Online publication date: 28-Jul-2017. https://dl.acm.org/doi/10.1145/3105762.3105768

Áfra A, Benthin C, Wald I, Munkberg J, Luebke D, Molnar S (2016). Local shading coherence extraction for SIMD-efficient path tracing on CPUs. Proceedings of High Performance Graphics, 119-128. Online publication date: 20-Jun-2016. https://dl.acm.org/doi/10.5555/2977336.2977352

Sun Q, Yang C, Wu C, Li L, Liu F, Varela C, Castro H, Barrios C (2016). Fast parallel stream compaction for IA-based multi/many-core processors. Proceedings of the 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, 736-745. Online publication date: 16-May-2016. https://dl.acm.org/doi/10.1109/CCGrid.2016.112

Costa V, Pereira J, Jorge J (2015). Accelerating Occlusion Rendering on a GPU via Ray Classification. International Journal of Creative Interfaces and Computer Graphics 6:2, 1-17. Online publication date: 1-Jul-2015. https://dl.acm.org/doi/10.4018/IJCICG.2015070101

Yang X, Xu D, Zhao L, Yang B (2015). Complex shading efficiently for ray tracing on GPU. Multimedia Tools and Applications 74:3, 1091-1106. Online publication date: 1-Feb-2015. https://dl.acm.org/doi/10.1007/s11042-013-1712-5

Orr M, Beckmann B, Reinhardt S, Wood D, Yew P, Zhai A, Keckler S (2014). Fine-grain task aggregation and coordination on GPUs. Proceedings of the 41st Annual International Symposium on Computer Architecture, 181-192. Online publication date: 14-Jun-2014. https://dl.acm.org/doi/10.5555/2665671.2665701
