Abstract
The rapid growth of wireless communication, multimedia applications, robotics and graphics calls for digital signal processing (DSP) systems that use resources efficiently while delivering high throughput at low power. Across DSP systems ranging from audio/video signal processing to wireless sensor networks, floating-point matrix multiplication is widely used in most of the fundamental processing units. A hardware implementation of floating-point matrix multiplication demands a very large number of arithmetic operations, which affects speed and consumes more area and power. DSP systems essentially use two techniques to reduce dynamic power consumption, pipelining and parallel processing, and both need high-performance processing elements with small area and low power across diverse scientific computing applications. However, the number of adders and multipliers used in the design of the floating-point unit increases as a result. Adders and multipliers are the most area-, delay- and power-consuming data path elements in the processing unit. Delay, power and area in the processing element are therefore reduced at the arithmetic level by selecting appropriate adders and multipliers. This article proposes a parallel multiplication architecture using the Strassen algorithm and the Urdhva Tiryagbhyam multiplier, comprising the design of an efficient parallel matrix multiplier with a flexible implementation on an FPGA (Field Programmable Gate Array) device to analyse computation and area. The design incorporates scheduling of blocks, operations on processing elements, block size determination, parallelization and double buffering for storage of matrix elements.
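The two arithmetic ideas named in the abstract can be illustrated in software. The sketch below is not the article's FPGA architecture, whose details the abstract does not give. It shows the standard Strassen recursion (seven block products instead of eight) and a bit-level "vertically and crosswise" (Urdhva Tiryagbhyam) multiplication of unsigned integers. The function names `strassen` and `urdhva_multiply` are hypothetical, and the code assumes square matrices whose size is a power of two.

```python
def mat_add(A, B):
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_sub(A, B):
    """Element-wise difference of two equally sized matrices."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(M):
    """Split a 2h x 2h matrix into its four h x h quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B):
    """Multiply two n x n matrices (n a power of two) with Strassen's method."""
    if len(A) == 1:
        return [[A[0][0] * B[0][0]]]
    a11, a12, a21, a22 = split(A)
    b11, b12, b21, b22 = split(B)
    # Seven recursive products replace the eight of the block-wise schoolbook method.
    m1 = strassen(mat_add(a11, a22), mat_add(b11, b22))
    m2 = strassen(mat_add(a21, a22), b11)
    m3 = strassen(a11, mat_sub(b12, b22))
    m4 = strassen(a22, mat_sub(b21, b11))
    m5 = strassen(mat_add(a11, a12), b22)
    m6 = strassen(mat_sub(a21, a11), mat_add(b11, b12))
    m7 = strassen(mat_sub(a12, a22), mat_add(b21, b22))
    c11 = mat_add(mat_sub(mat_add(m1, m4), m5), m7)
    c12 = mat_add(m3, m5)
    c21 = mat_add(m2, m4)
    c22 = mat_add(mat_add(mat_sub(m1, m2), m3), m6)
    # Stitch the quadrants back into one matrix.
    top = [l + r for l, r in zip(c11, c12)]
    bottom = [l + r for l, r in zip(c21, c22)]
    return top + bottom


def urdhva_multiply(x, y, width):
    """Multiply two unsigned `width`-bit integers column by column.

    Column k collects every cross product a[i] & b[k - i]; the column sum
    contributes one result bit and passes the rest on as carry.
    """
    a = [(x >> i) & 1 for i in range(width)]
    b = [(y >> i) & 1 for i in range(width)]
    result, carry = 0, 0
    for k in range(2 * width - 1):
        col = carry
        for i in range(max(0, k - width + 1), min(k, width - 1) + 1):
            col += a[i] & b[k - i]
        result |= (col & 1) << k
        carry = col >> 1
    return result | (carry << (2 * width - 1))
```

In a floating-point unit, a multiplier of the second kind would typically operate on the mantissas, while the Strassen recursion reduces how many such multiplications a block matrix product needs. How the article maps these onto processing elements, block scheduling and double buffering is not stated in the abstract.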










Data availability
The data supporting the findings of this study are available within the paper.
Acknowledgements
This research was funded by the research general direction at Universidad Santiago de Cali, Colombia, under call no. 01-2022. It was carried out in collaboration with authors at St. Xavier’s Catholic College of Engineering, Tamil Nadu, India; Gems Educational Institutions, Sbte; Karunya Institute of Technology and Sciences, Coimbatore, India; and the Al-Nahrain Nonrenewable Energy Research Center, Al-Nahrain University, Baghdad, Iraq.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Bessant, Y.R.A., Jency, J.G., Sagayam, K.M. et al. Improved parallel matrix multiplication using Strassen and Urdhvatiryagbhyam method. CCF Trans. HPC 5, 102–115 (2023). https://doi.org/10.1007/s42514-023-00149-9
DOI: https://doi.org/10.1007/s42514-023-00149-9