Abstract
Modern graphics processing units (GPUs) are becoming more and more suitable for general-purpose computing due to their growing computational power. These commodity processors follow, in general, a parallel SIMD execution model whose efficiency depends on, among other factors, proper exploitation of the explicit memory hierarchy. In this paper we analyze the implementation of the Fast Fourier Transform (FFT) using the programming model of the Compute Unified Device Architecture (CUDA), recently released by NVIDIA for its new graphics platforms. Within this model we propose an FFT implementation that takes into account memory reference locality issues, which are crucial to achieving high execution performance. This proposal has been experimentally tested and compared with other well-known approaches, such as the manufacturer’s FFT library.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gutierrez, E., Romero, S., Trenas, M.A., Zapata, E.L. (2008). Memory Locality Exploitation Strategies for FFT on the CUDA Architecture. In: Palma, J.M.L.M., Amestoy, P.R., Daydé, M., Mattoso, M., Lopes, J.C. (eds) High Performance Computing for Computational Science - VECPAR 2008. VECPAR 2008. Lecture Notes in Computer Science, vol 5336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92859-1_39
DOI: https://doi.org/10.1007/978-3-540-92859-1_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-92858-4
Online ISBN: 978-3-540-92859-1
eBook Packages: Computer Science, Computer Science (R0)