Efficient 3D Transpositions in Graphics Processing Units

Jodra, Jose L.; Gurrutxaga, Ibai; Muguerza, Javier

doi:10.1007/s10766-015-0366-5

Published: 04 April 2015

Volume 43, pages 876–891, (2015)

International Journal of Parallel Programming

Jose L. Jodra¹, Ibai Gurrutxaga² & Javier Muguerza²

Abstract

Matrix transposition is a basic operation in many computing tasks, and transposing a matrix in a computer’s main memory has long been well studied. More recently, out-of-place matrix transposition has been implemented efficiently on graphics processing units (GPUs), which are now widely used for general-purpose computing. However, because of the particular architecture of GPUs, adapting the matrix transposition operation to 3D arrays is not straightforward. In this paper, we describe efficient GPU implementations of the five possible out-of-place 3D transpositions. We also include implementations of the most basic in-place 3D transpositions. The results show that the achieved bandwidth is close to that of a simple array copy and similar to that of the 2D transposition.
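
To make the count in the abstract concrete: a 3D array has 3! = 6 possible axis orders, and leaving out the identity gives the five out-of-place transpositions. The sketch below is a plain CPU reference in Python, not the GPU kernels described in the paper. It assumes a flat, row-major array, and the names `transpose3d`, `inverse_perm` and `NON_IDENTITY_PERMS` are hypothetical, introduced only for illustration.

```python
from itertools import permutations

# All axis orders of a 3D array except the identity (0, 1, 2): five in total.
NON_IDENTITY_PERMS = [p for p in permutations(range(3)) if p != (0, 1, 2)]


def transpose3d(src, shape, perm):
    """Out-of-place 3D transposition of a flat, row-major array (assumed layout).

    src   : flat list with shape[0] * shape[1] * shape[2] elements
    shape : (n0, n1, n2), extents of the input axes
    perm  : output axis k is input axis perm[k]
    Returns (dst, out_shape), where dst is a new flat list.
    """
    n0, n1, n2 = shape
    out_shape = tuple(shape[p] for p in perm)
    _, m1, m2 = out_shape
    dst = [None] * len(src)
    for i0 in range(n0):
        for i1 in range(n1):
            for i2 in range(n2):
                idx = (i0, i1, i2)
                # Reorder the input coordinates into output coordinates.
                o0, o1, o2 = (idx[p] for p in perm)
                dst[(o0 * m1 + o1) * m2 + o2] = src[(i0 * n1 + i1) * n2 + i2]
    return dst, out_shape


def inverse_perm(perm):
    """Return the axis order that undoes `perm`."""
    inv = [0, 0, 0]
    for k, p in enumerate(perm):
        inv[p] = k
    return tuple(inv)


if __name__ == "__main__":
    shape = (2, 3, 4)
    data = list(range(2 * 3 * 4))
    for perm in NON_IDENTITY_PERMS:
        out, out_shape = transpose3d(data, shape, perm)
        back, _ = transpose3d(out, out_shape, inverse_perm(perm))
        print(perm, out_shape, back == data)
```

Each element is read once and written once, which is why a plain array copy is the natural upper bound for the bandwidth of any such transposition. On a GPU the difficulty is keeping both the reads and the writes contiguous; this reference loop makes no attempt to do so.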


Author information

Authors and Affiliations

Department of Electronic Technology, University of the Basque Country, UPV/EHU, Plaza Europa 1, 20018, Donostia-San Sebastián, Spain
Jose L. Jodra
Department of Computer Architecture and Technology, University of the Basque Country, UPV/EHU, Manuel Lardizabal, 1, 20018, Donostia-San Sebastián, Spain
Ibai Gurrutxaga & Javier Muguerza

Corresponding author

Correspondence to Ibai Gurrutxaga.

Additional information

This work was funded by the Department of Education, Universities and Research of the Basque Government (IT395-10 Research Group Grant), by the University of the Basque Country UPV/EHU (ALDAPA Research Group Grant, GIU10/02 and BAILab Research and Training Unit Grant, UFI11/45), and by the Science and Education Department of the Spanish Government (ModelAccess Project, TIN2010-15549).

About this article

Cite this article

Jodra, J.L., Gurrutxaga, I. & Muguerza, J. Efficient 3D Transpositions in Graphics Processing Units. Int J Parallel Prog 43, 876–891 (2015). https://doi.org/10.1007/s10766-015-0366-5

Received: 15 October 2014
Accepted: 27 March 2015
Published: 04 April 2015
Issue Date: October 2015
DOI: https://doi.org/10.1007/s10766-015-0366-5
