DOI: 10.5555/1280094.1280110

Scan primitives for GPU computing

Published: 04 August 2007

Abstract

The scan primitives are powerful, general-purpose data-parallel primitives that are building blocks for a broad range of applications. We describe GPU implementations of these primitives, specifically an efficient formulation and implementation of segmented scan, on NVIDIA GPUs using the CUDA API. Using the scan primitives, we show novel GPU implementations of quicksort and sparse matrix-vector multiply, and analyze the performance of the scan primitives, several sort algorithms that use the scan primitives, and a graphical shallow-water fluid simulation using the scan framework for a tridiagonal matrix solver.
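
For readers unfamiliar with the primitives named in the abstract, the following is a minimal sequential sketch in Python of what scan and segmented scan compute, and of how sparse matrix-vector multiply can be phrased as a segmented sum. It is not the paper's CUDA implementation and says nothing about how the work is parallelized. The function names, the head-flag representation of segments, and the CSR layout (`row_ptr`, `col_idx`, `vals`) are assumptions chosen for illustration.

```python
from itertools import accumulate
from operator import add


def inclusive_scan(values, op=add):
    """Running reduction: out[i] = values[0] op ... op values[i]."""
    return list(accumulate(values, op))


def segmented_inclusive_scan(values, head_flags, op=add):
    """Scan that restarts wherever head_flags[i] is 1 (start of a segment)."""
    out = []
    for value, is_head in zip(values, head_flags):
        if is_head or not out:
            out.append(value)               # a new segment begins here
        else:
            out.append(op(out[-1], value))  # continue the current segment
    return out


def spmv_csr(row_ptr, col_idx, vals, x):
    """y = A @ x for a CSR matrix, written as a segmented sum over rows."""
    # One product per stored nonzero.
    products = [v * x[c] for v, c in zip(vals, col_idx)]
    # Mark the first nonzero of every row as a segment head.
    flags = [0] * len(products)
    for start in row_ptr[:-1]:
        if start < len(flags):
            flags[start] = 1
    sums = segmented_inclusive_scan(products, flags)
    # The last element of each row's segment is that row's dot product;
    # rows with no stored entries contribute 0.
    return [sums[end - 1] if end > start else 0
            for start, end in zip(row_ptr, row_ptr[1:])]


if __name__ == "__main__":
    print(inclusive_scan([1, 2, 3, 4]))
    print(segmented_inclusive_scan([1, 2, 3, 4, 5, 6], [1, 0, 0, 1, 0, 1]))
    # A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]], x = [1, 2, 3]
    print(spmv_csr([0, 2, 3, 5], [0, 2, 1, 0, 2], [1, 2, 3, 4, 5], [1, 2, 3]))
```

In this sketch the segmented scan treats each matrix row as one segment, so rows of very different lengths are handled by a single pass over the nonzeros rather than by one loop per row.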

Published In

GH '07: Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware
August 2007
119 pages
ISBN: 9781595936257

Publisher

Eurographics Association

Goslar, Germany

Qualifiers

  • Article

Conference

GH07: Graphics Hardware
August 4-5, 2007
San Diego, California

Acceptance Rates

GH '07 Paper Acceptance Rate 12 of 30 submissions, 40%;
Overall Acceptance Rate 37 of 94 submissions, 39%

Cited By

  • (2024) PyGim: An Efficient Graph Neural Network Library for Real Processing-In-Memory Architectures. Proceedings of the ACM on Measurement and Analysis of Computing Systems 8:3 (1-36). DOI: 10.1145/3700434. Online publication date: 10-Dec-2024
  • (2022) SparseP. Proceedings of the ACM on Measurement and Analysis of Computing Systems 6:1 (1-49). DOI: 10.1145/3508041. Online publication date: 28-Feb-2022
  • (2021) HashGraph: Scalable Hash Tables Using a Sparse Graph Data Structure. ACM Transactions on Parallel Computing 8:2 (1-17). DOI: 10.1145/3460872. Online publication date: 15-Jul-2021
  • (2019) Adaptive Sparse Matrix-Vector Multiplication on CPU-GPU Heterogeneous Architecture. Proceedings of the 2019 3rd High Performance Computing and Cluster Technologies Conference (6-10). DOI: 10.1145/3341069.3341072. Online publication date: 22-Jun-2019
  • (2019) A Unified Optimization Approach for CNN Model Inference on Integrated GPUs. Proceedings of the 48th International Conference on Parallel Processing (1-10). DOI: 10.1145/3337821.3337839. Online publication date: 5-Aug-2019
  • (2019) XBFS. Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing (121-131). DOI: 10.1145/3307681.3326606. Online publication date: 17-Jun-2019
  • (2019) On the correctness of GPU programs. Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis (443-447). DOI: 10.1145/3293882.3338989. Online publication date: 10-Jul-2019
  • (2019) Efficient GPU Stream Transform. Proceedings of the Australasian Computer Science Week Multiconference (1-11). DOI: 10.1145/3290688.3290707. Online publication date: 29-Jan-2019
  • (2019) ParaCells. IEEE/ACM Transactions on Computational Biology and Bioinformatics 16:3 (994-1006). DOI: 10.1109/TCBB.2018.2814570. Online publication date: 1-May-2019
  • (2018) Rapid k-d tree construction for sparse volume data. Proceedings of the Symposium on Parallel Graphics and Visualization (69-77). DOI: 10.5555/3293524.3293531. Online publication date: 4-Jun-2018
