DOI: 10.1145/3176364.3176374
HPC Asia Conference Proceedings · Short paper

OpenMP-based parallel implementation of matrix-matrix multiplication on the Intel Knights Landing

Published: 31 January 2018

Abstract

The second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL), has emerged with a 2D tiled mesh architecture. Implementing general matrix-matrix multiplication on a new architecture is an important exercise, yet to date there has been no sufficient description of a parallel implementation of general matrix-matrix multiplication on this processor. In this study, we describe a parallel implementation of double-precision general matrix-matrix multiplication (DGEMM) with OpenMP on the KNL. The implementation is based on blocked matrix-matrix multiplication. We propose a method for choosing the cache block sizes and discuss the parallelism within the implementation of DGEMM. We show that the performance of DGEMM varies with the thread-affinity environment variables. We conducted performance experiments on the Intel Xeon Phi 7210 and 7250; the results validate our method.



        Published In

HPCAsia '18 Workshops: Proceedings of Workshops of HPC Asia
January 2018, 86 pages
ISBN: 9781450363471
DOI: 10.1145/3176364

        Sponsors

        • IPSJ: Information Processing Society of Japan

Publisher

Association for Computing Machinery, New York, NY, United States


        Author Tags

1. high-performance
2. Knights Landing
3. manycore
4. matrix-matrix multiplication
5. OpenMP affinity

        Qualifiers

        • Short-paper

        Funding Sources

        • Korea Ministry of Science and ICT (MSIT)

        Conference

HPC Asia 2018 WS: Workshops of HPC Asia 2018
Sponsor: IPSJ
January 31, 2018
Chiyoda, Tokyo, Japan

        Acceptance Rates

        Overall Acceptance Rate 69 of 143 submissions, 48%


        Cited By

• (2022) Seamless optimization of the GEMM kernel for task-based programming models. In Proceedings of the 36th ACM International Conference on Supercomputing, 1-11. DOI: 10.1145/3524059.3532385. Online publication date: 28 June 2022.
• (2022) Optimization of Matrix-Matrix Multiplication Algorithm for Matrix-Panel Multiplication on Intel KNL. In 2022 IEEE/ACS 19th International Conference on Computer Systems and Applications (AICCSA), 1-7. DOI: 10.1109/AICCSA56895.2022.10017947. Online publication date: December 2022.
• (2021) Improving blocked matrix-matrix multiplication routine by utilizing AVX-512 instructions on Intel Knights Landing and Xeon Scalable processors. Cluster Computing 26, 5, 2539-2549. DOI: 10.1007/s10586-021-03274-8. Online publication date: 12 April 2021.
• (2020) Evaluating performance of Parallel Matrix Multiplication Routine on Intel KNL and Xeon Scalable Processors. In 2020 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C), 42-47. DOI: 10.1109/ACSOS-C51401.2020.00027. Online publication date: August 2020.
• (2019) Optimizing parallel GEMM routines using auto-tuning with Intel AVX-512. In Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 101-110. DOI: 10.1145/3293320.3293334. Online publication date: 14 January 2019.
• (2019) Optimizing Xeon Phi for Interactive Data Analysis. In 2019 IEEE High Performance Extreme Computing Conference (HPEC), 1-6. DOI: 10.1109/HPEC.2019.8916300. Online publication date: September 2019.
• (2018) Auto-tuning GEMM kernels on the Intel KNL and Intel Skylake-SP processors. The Journal of Supercomputing 75, 12, 7895-7908. DOI: 10.1007/s11227-018-2702-1. Online publication date: 26 November 2018.
