Parameter Estimation via Time Modeling for MLIR Implementation of GEMM

Romanov, Alexey; Turkin, Andrei; Myakinin, Oleg; Tsupko, Fiodar; Gao, Jiexing

doi:10.1007/978-3-031-47859-8_12

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14395)

Included in the following conference series:

International Conference on Optimization and Applications

Abstract

We consider the problem of identifying optimal parameters for two implementations of the general matrix multiplication (GEMM). Optimal parameters are chosen based on time modeling for two GEMM implementations, which is done by analyzing the structure of each implementation and the characteristics of the hardware. Each implementation has specific packing strategies that influence data movement and time of data access. The data movement, as well as constraints for the registers and each level of the two-level cache, is considered to ensure proper data usage. Based on the proposed models, an exhaustive search procedure for microkernel and tiling parameters was used to obtain the best parameters for each of the considered implementations of GEMM for multi-level intermediate representation (MLIR). The results show that the performance of MLIR-based code generation for these GEMM implementations, when different matrix sizes are used, is comparable with the performance that can be obtained for Basic Linear Algebra Subprograms (BLAS).
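The search described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the cost formula, the capacity constraints, the hardware numbers, and every name in it (`HW`, `GRID`, `fits`, `model_time`, `search`) are assumptions chosen for illustration, loosely following the common Goto/BLIS-style blocking with a microkernel tile `mr x nr` and cache tiles `mc`, `kc`, `nc`. It only shows the general shape of the procedure: enumerate candidate parameters, discard those that violate the register or cache limits, and keep the candidate with the smallest modeled time.

```python
from itertools import product

# Hypothetical hardware description (illustrative numbers, not from the paper).
HW = {
    "elem_bytes": 4,          # float32
    "num_vregs": 16,          # vector registers available to the microkernel
    "vreg_elems": 8,          # elements per vector register
    "l1_bytes": 32 * 1024,
    "l2_bytes": 256 * 1024,
    "peak_flops": 1.0e11,     # flop/s
    "mem_bw": 2.0e10,         # bytes/s from main memory
    "cache_bw": 2.0e11,       # bytes/s from cache into registers
}

# Hypothetical candidate values for each parameter.
GRID = {
    "mr": [4, 6, 8],
    "nr": [8, 16],
    "kc": [128, 256],
    "mc": [48, 96, 192],
    "nc": [256, 512],
}


def ceil_div(a, b):
    return -(-a // b)


def fits(mr, nr, kc, mc, hw):
    """Assumed capacity constraints for registers and a two-level cache."""
    if nr % hw["vreg_elems"]:
        return False
    nr_vecs = nr // hw["vreg_elems"]
    # mr rows of accumulators, one row of B, one broadcast register for A.
    if mr * nr_vecs + nr_vecs + 1 > hw["num_vregs"]:
        return False
    # One micro-panel of A (mr x kc) and one of B (kc x nr) must fit in L1.
    if (mr * kc + kc * nr) * hw["elem_bytes"] > hw["l1_bytes"]:
        return False
    # The packed block of A (mc x kc) must fit in L2.
    return mc * kc * hw["elem_bytes"] <= hw["l2_bytes"]


def model_time(m, n, k, mr, nr, kc, mc, nc, hw):
    """Toy additive time model: compute time plus data-movement time."""
    compute = 2.0 * m * n * k / hw["peak_flops"]
    # Main-memory traffic in elements: A is re-read per column panel,
    # B per row block, C is read and written once per kc-slice.
    mem_elems = (m * k * ceil_div(n, nc)
                 + k * n * ceil_div(m, mc)
                 + 2 * m * n * ceil_div(k, kc))
    # Cache-to-register traffic of the microkernel: (mr + nr) loads
    # for every mr * nr multiply-add updates.
    cache_elems = m * n * k * (1.0 / mr + 1.0 / nr)
    eb = hw["elem_bytes"]
    return compute + mem_elems * eb / hw["mem_bw"] + cache_elems * eb / hw["cache_bw"]


def search(m, n, k, hw, grid):
    """Exhaustively scan the grid; return (time, params) or None."""
    best = None
    for mr, nr, kc, mc, nc in product(grid["mr"], grid["nr"], grid["kc"],
                                      grid["mc"], grid["nc"]):
        if mc % mr or nc % nr or not fits(mr, nr, kc, mc, hw):
            continue
        t = model_time(m, n, k, mr, nr, kc, mc, nc, hw)
        if best is None or t < best[0]:
            best = (t, {"mr": mr, "nr": nr, "kc": kc, "mc": mc, "nc": nc})
    return best


best_time, best_params = search(512, 512, 512, HW, GRID)
```

Exhaustive enumeration is practical here because the constraints prune most of the grid before the model is evaluated, and a closed-form model costs far less to evaluate than a timed run of the kernel.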

Notes

1. https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onemkl.html
2. https://www.openblas.net/
3. https://github.com/flame/blis
4. See, for instance, https://en.wikichip.org/wiki/intel/microarchitectures/coffee_lake

Author information

Authors and Affiliations

Huawei Technologies, Central Research Institute, 2012 Labs, Shenzhen, China
Alexey Romanov, Andrei Turkin, Oleg Myakinin, Fiodar Tsupko & Jiexing Gao

Corresponding author

Correspondence to Jiexing Gao.

Editor information

Editors and Affiliations

FRC CSC RAS, Moscow, Russia
Nicholas Olenev
FRC CSC RAS, Moscow, Russia
Yuri Evtushenko
University of Montenegro, Podgorica, Montenegro
Milojica Jaćimović
Krasovsky Institute of Mathematics and Mechanics, Ekaterinburg, Russia
Michael Khachay
FRC CSC RAS, Moscow, Russia
Vlasta Malkova

About this paper

Cite this paper

Romanov, A., Turkin, A., Myakinin, O., Tsupko, F., Gao, J. (2023). Parameter Estimation via Time Modeling for MLIR Implementation of GEMM. In: Olenev, N., Evtushenko, Y., Jaćimović, M., Khachay, M., Malkova, V. (eds) Optimization and Applications. OPTIMA 2023. Lecture Notes in Computer Science, vol 14395. Springer, Cham. https://doi.org/10.1007/978-3-031-47859-8_12

DOI: https://doi.org/10.1007/978-3-031-47859-8_12
Published: 10 November 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47858-1
Online ISBN: 978-3-031-47859-8
eBook Packages: Computer Science, Computer Science (R0)
