research-article

GRAMPS: A programming model for graphics pipelines

Authors:

Jeremy Sugerman,

Kayvon Fatahalian,

Solomon Boulos,

Pat Hanrahan

ACM Transactions on Graphics (TOG), Volume 28, Issue 1

Article No.: 4, Pages 1 - 11

https://doi.org/10.1145/1477926.1477930

Published: 09 February 2009

Abstract

We introduce GRAMPS, a programming model that generalizes concepts from modern real-time graphics pipelines by exposing a model of execution containing both fixed-function and application-programmable processing stages that exchange data via queues. GRAMPS allows the number, type, and connectivity of these processing stages to be defined by software, permitting arbitrary processing pipelines or even processing graphs. Applications achieve high performance using GRAMPS by expressing advanced rendering algorithms as custom pipelines, then using the pipeline as a rendering engine. We describe the design of GRAMPS, then evaluate it by implementing three pipelines (Direct3D, a ray tracer, and a hybridization of the two) and running them on emulations of two different GRAMPS implementations: a traditional GPU-like architecture and a CPU-like multicore architecture. In our tests, our GRAMPS schedulers run our pipelines with 500 to 1500 KB of queue usage at their peaks.
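The execution model summarized in the abstract (software-defined stages that exchange data through queues) can be illustrated with a minimal single-threaded sketch. This is not the GRAMPS API: the names `Queue`, `Stage`, `run`, and the toy `rasterize` and `shade` kernels are all hypothetical, and the scheduler is a simple assumption (always run the most downstream stage that has input) chosen only to show how scheduling order affects peak queue occupancy.

```python
from collections import deque

class Queue:
    """FIFO connecting two stages; records its peak occupancy."""
    def __init__(self, name):
        self.name = name
        self.items = deque()
        self.peak = 0

    def push(self, item):
        self.items.append(item)
        self.peak = max(self.peak, len(self.items))

    def pop(self):
        return self.items.popleft()

    def __len__(self):
        return len(self.items)

class Stage:
    """A processing stage: a kernel plus its input and output queues."""
    def __init__(self, name, kernel, inputs, outputs):
        self.name = name
        self.kernel = kernel
        self.inputs = inputs
        self.outputs = outputs

    def runnable(self):
        return any(len(q) > 0 for q in self.inputs)

    def step(self):
        # Consume one element from the first non-empty input queue.
        for q in self.inputs:
            if len(q) > 0:
                self.kernel(q.pop(), self.outputs)
                return

def run(stages):
    """Hypothetical scheduler: repeatedly run one step of the most
    downstream runnable stage until every queue has drained."""
    while True:
        ready = [s for s in reversed(stages) if s.runnable()]
        if not ready:
            break
        ready[0].step()

# Toy two-stage pipeline: each "primitive" n expands into n "fragments".
framebuffer = []

def rasterize(prim, outputs):
    for i in range(prim):
        outputs[0].push((prim, i))

def shade(frag, outputs):
    prim, i = frag
    framebuffer.append(prim * 10 + i)

prim_q = Queue("primitives")
frag_q = Queue("fragments")
for prim in (2, 3, 1):
    prim_q.push(prim)

# The graph (stage count, kernels, connectivity) is plain data built in software.
pipeline = [
    Stage("rasterize", rasterize, [prim_q], [frag_q]),
    Stage("shade", shade, [frag_q], []),
]
run(pipeline)
print(framebuffer, frag_q.peak)
```

Because the downstream stage is preferred in this sketch, the fragment queue never holds more than the output of a single primitive; draining the upstream stage first would instead buffer all six fragments at once.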

Index Terms

GRAMPS: A programming model for graphics pipelines
1. Computing methodologies
  1. Computer graphics
    1. Graphics systems and interfaces
  2. Parallel computing methodologies

Information & Contributors

Information

Published In

ACM Transactions on Graphics Volume 28, Issue 1

January 2009

144 pages

ISSN:0730-0301

EISSN:1557-7368

DOI:10.1145/1477926

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 February 2009

Accepted: 01 August 2008

Received: 01 June 2008

Published in TOG Volume 28, Issue 1

Qualifiers

Research-article
Research
Refereed

Funding Sources

U.S. Army Research Laboratory

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 50
Total Downloads: 1,587
Downloads (Last 12 months): 16
Downloads (Last 6 weeks): 2

Reflects downloads up to 15 Oct 2024

Citations

Cited By

Wu Q, Li R, Beard J, John L, Rodríguez G, Sadayappan P, Sukumaran-Rajam A (2024). BLQ: Light-Weight Locality-Aware Runtime for Blocking-Less Queuing. Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction, 100-112. DOI: 10.1145/3640537.3641568. Online publication date: 17-Feb-2024
https://dl.acm.org/doi/10.1145/3640537.3641568
Willis B, Shrivastava A, Mack J, Dave S, Chakrabarti C, Brunhaver J (2024). Cyclebite: Extracting Task Graphs From Unstructured Compute-Programs. IEEE Transactions on Computers 73:1, 221-234. DOI: 10.1109/TC.2023.3327504. Online publication date: 1-Jan-2024
https://dl.acm.org/doi/10.1109/TC.2023.3327504
Kim M, Baek N (2021). A 3D graphics rendering pipeline implementation based on the openCL massively parallel processing. The Journal of Supercomputing 77:7, 7351-7367. DOI: 10.1007/s11227-020-03581-8. Online publication date: 1-Jul-2021
https://dl.acm.org/doi/10.1007/s11227-020-03581-8
Crawford L, O'Boyle M (2019). Specialization Opportunities in Graphical Workloads. 2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT), 272-283. DOI: 10.1109/PACT.2019.00029. Online publication date: Sep-2019
https://doi.org/10.1109/PACT.2019.00029
Kenzel M, Kerbl B, Schmalstieg D, Steinberger M (2018). A high-performance software graphics pipeline architecture for the GPU. ACM Transactions on Graphics 37:4, 1-15. DOI: 10.1145/3197517.3201374. Online publication date: 30-Jul-2018
https://dl.acm.org/doi/10.1145/3197517.3201374
Crawford L, O'Boyle M (2018). A Cross-platform Evaluation of Graphics Shader Compiler Optimization. 2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 219-228. DOI: 10.1109/ISPASS.2018.00035. Online publication date: Apr-2018
https://doi.org/10.1109/ISPASS.2018.00035
Huang L, Lü Y, Shen L, Wang Z (2017). Improving the Efficiency of GPGPU Work-Queue Through Data Awareness. ACM Transactions on Architecture and Code Optimization 14:4, 1-22. DOI: 10.1145/3151035. Online publication date: 5-Dec-2017
https://dl.acm.org/doi/10.1145/3151035
Zheng Z, Oh C, Zhai J, Shen X, Yi Y, Chen W, Hunter H, Moreno J, Emer J, Sanchez D (2017). Versapipe. Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 587-599. DOI: 10.1145/3123939.3123978. Online publication date: 14-Oct-2017
https://dl.acm.org/doi/10.1145/3123939.3123978
Deng Y, Ni Y, Li Z, Mu S, Zhang W (2017). Toward Real-Time Ray Tracing. ACM Computing Surveys 50:4, 1-41. DOI: 10.1145/3104067. Online publication date: 30-Aug-2017
https://dl.acm.org/doi/10.1145/3104067
Wulf C, Hasselbring W, Ohlemacher J (2017). Parallel and Generic Pipe-and-Filter Architectures with TeeTime. 2017 IEEE International Conference on Software Architecture Workshops (ICSAW), 290-293. DOI: 10.1109/ICSAW.2017.20. Online publication date: Apr-2017
https://doi.org/10.1109/ICSAW.2017.20
