DOI: 10.1145/3526241.3530314

MI2D: Accelerating Matrix Inversion with 2-Dimensional Tile Manipulations

Published: 06 June 2022

Abstract

Matrix inversion is critical in mathematics and scientific applications. Large-scale dense matrix inversion is especially challenging for modern computers because of the heavy dependencies among matrix elements and the poor temporal data locality of the computation. In this paper, we propose a novel accelerator termed MI2D, which converts matrix inversion into regular matrix multiplications using 2-dimensional cross-tile operations and novel algorithms for efficient data reuse and computation. Our evaluations show that MI2D can be easily integrated with the existing matrix engines of modern high-end CPUs and NPUs, and effectively accelerates matrix inversion, achieving a 2.7× speedup over an Intel Skylake CPU and a 24× speedup over an NVIDIA RTX 2080 Ti GPU.
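
The MI2D tile algorithms themselves are not reproduced on this page. As a rough illustration of the idea the abstract states, namely that a large inversion can be re-expressed as regular matrix multiplications over tiles, the sketch below uses the classic Schur-complement block (tile) inversion in Python/NumPy. The function name blockwise_inverse, the tile size, and the recursive structure are illustrative assumptions and are not the authors' MI2D method.

```python
# A minimal, hypothetical sketch (not the MI2D algorithm): Schur-complement
# block inversion turns one large inversion into smaller tile inversions plus
# ordinary matrix multiplications, the workload a tile-based matrix engine
# handles well.
import numpy as np

def blockwise_inverse(A: np.ndarray, tile: int) -> np.ndarray:
    """Invert A by recursively splitting it into 2x2 blocks of roughly `tile` size."""
    n = A.shape[0]
    if n <= tile:
        return np.linalg.inv(A)          # base case: invert a small tile directly
    k = n // 2
    A11, A12 = A[:k, :k], A[:k, k:]
    A21, A22 = A[k:, :k], A[k:, k:]
    A11_inv = blockwise_inverse(A11, tile)
    S = A22 - A21 @ A11_inv @ A12        # Schur complement of A11
    S_inv = blockwise_inverse(S, tile)
    T = A11_inv @ A12                    # intermediate tile products, reused below
    U = A21 @ A11_inv
    out = np.empty_like(A)
    out[:k, :k] = A11_inv + T @ S_inv @ U
    out[:k, k:] = -T @ S_inv
    out[k:, :k] = -S_inv @ U
    out[k:, k:] = S_inv
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((256, 256)) + 256 * np.eye(256)  # well-conditioned test matrix
    err = np.max(np.abs(blockwise_inverse(A, tile=64) @ A - np.eye(256)))
    print(f"max |A^-1 A - I| = {err:.2e}")
```

In this formulation almost all of the work lands in tile-by-tile matrix products, which standard matrix engines execute efficiently, while the small direct inversions are confined to individual tiles.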

Supplementary Material

MP4 File (GLSVLSI22-099.mp4)
Presentation video for the paper "MI2D: Accelerating Matrix Inversion with 2-Dimensional Tile Manipulations" at GLSVLSI 2022, introducing a new architecture for fast matrix inversion. We propose a novel multi-function on-chip network for flexible tile operations, together with a novel matrix inversion method that maximizes data reuse. The architecture can be integrated into an existing standard matrix processing unit (MPU) to provide high-performance matrix inversion. The experiments show superior performance over high-end CPU and GPU processors.


Published In

GLSVLSI '22: Proceedings of the Great Lakes Symposium on VLSI 2022
June 2022
560 pages
ISBN: 9781450393225
DOI: 10.1145/3526241
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. accelerator
  2. architecture
  3. matrix inversion
  4. on-chip network

Qualifiers

  • Research-article

Funding Sources

  • National Key Research and Development Program of China
  • Key Research and Development Program of Shaanxi

Conference

GLSVLSI '22

Acceptance Rates

Overall Acceptance Rate 312 of 1,156 submissions, 27%

