DOI: 10.1109/ICPADS.2012.97
Article

Model-driven Level 3 BLAS Performance Optimization on Loongson 3A Processor

Published: 17 December 2012
Abstract

    Every mainstream processor vendor provides an optimized BLAS implementation for its CPUs, because BLAS is a fundamental math library in scientific computing. The Loongson 3A is a general-purpose 64-bit MIPS64 quad-core processor developed by the Institute of Computing Technology, Chinese Academy of Sciences. To date, no sufficiently optimized BLAS has been available for the Loongson 3A. This research aims to optimize level 3 BLAS performance on that processor. We analyzed the Loongson 3A architecture and built a performance model that identifies L1 data cache misses as the key issue, which differs from level 3 BLAS optimization on mainstream x86 CPUs. We therefore employed a variety of methods to avoid L1 cache misses in single-thread optimization: cache and register blocking, the Loongson 3A 128-bit memory access extension instructions, software prefetching, and single-precision floating-point SIMD instructions. Furthermore, we improved parallel performance by reducing bank conflicts among multiple threads in the shared L2 cache. We created an open source BLAS project, OpenBLAS, to demonstrate the performance improvement on the Loongson 3A quad-core processor.
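To make the single-thread techniques named in the abstract more concrete, the following is a minimal portable C sketch of cache blocking, 2x2 register blocking, and software prefetching applied to a double-precision matrix multiply. It is not the OpenBLAS kernel described in the paper: the function names (`gemm_naive`, `gemm_blocked`, `edge_update`), the tile sizes `MC`, `KC`, `NC`, and the prefetch distance `PF_DIST` are all hypothetical values chosen for illustration, and it uses neither the Loongson 3A 128-bit load/store extensions nor SIMD instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile sizes and prefetch distance, chosen only for illustration;
   a tuned library would derive them from a cache model of the target CPU. */
enum { MC = 8, KC = 8, NC = 8, PF_DIST = 4 };

#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(addr) __builtin_prefetch((addr), 0, 3)
#else
#define PREFETCH(addr) ((void)0)
#endif

static size_t min_sz(size_t x, size_t y) { return x < y ? x : y; }

/* Reference kernel: C += A * B with row-major A (m x k), B (k x n), C (m x n). */
void gemm_naive(size_t m, size_t n, size_t k,
                const double *a, const double *b, double *c)
{
    assert(a && b && c);
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++) {
            double acc = 0.0;
            for (size_t p = 0; p < k; p++)
                acc += a[i * k + p] * b[p * n + j];
            c[i * n + j] += acc;
        }
}

/* Scalar update of one C element over the k-range [p0, p1); handles tile edges. */
static void edge_update(size_t i, size_t j, size_t p0, size_t p1,
                        size_t n, size_t k,
                        const double *a, const double *b, double *c)
{
    double acc = 0.0;
    for (size_t p = p0; p < p1; p++)
        acc += a[i * k + p] * b[p * n + j];
    c[i * n + j] += acc;
}

/* Blocked kernel: the three outer loops walk MC x KC x NC tiles (cache
   blocking); inside a tile, a 2x2 block of C is accumulated in local
   variables that the compiler can keep in registers (register blocking). */
void gemm_blocked(size_t m, size_t n, size_t k,
                  const double *a, const double *b, double *c)
{
    assert(a && b && c);
    for (size_t i0 = 0; i0 < m; i0 += MC) {
        size_t i1 = min_sz(i0 + MC, m);
        for (size_t p0 = 0; p0 < k; p0 += KC) {
            size_t p1 = min_sz(p0 + KC, k);
            for (size_t j0 = 0; j0 < n; j0 += NC) {
                size_t j1 = min_sz(j0 + NC, n);
                size_t i = i0;
                for (; i + 2 <= i1; i += 2) {
                    size_t j = j0;
                    for (; j + 2 <= j1; j += 2) {
                        double c00 = 0.0, c01 = 0.0, c10 = 0.0, c11 = 0.0;
                        for (size_t p = p0; p < p1; p++) {
                            /* Hint a row of B that will be needed soon;
                               the guard keeps the address inside the tile. */
                            if (p + PF_DIST < p1)
                                PREFETCH(&b[(p + PF_DIST) * n + j]);
                            double a0 = a[i * k + p], a1 = a[(i + 1) * k + p];
                            double b0 = b[p * n + j], b1 = b[p * n + j + 1];
                            c00 += a0 * b0; c01 += a0 * b1;
                            c10 += a1 * b0; c11 += a1 * b1;
                        }
                        c[i * n + j] += c00;       c[i * n + j + 1] += c01;
                        c[(i + 1) * n + j] += c10; c[(i + 1) * n + j + 1] += c11;
                    }
                    for (; j < j1; j++) {          /* leftover column */
                        edge_update(i, j, p0, p1, n, k, a, b, c);
                        edge_update(i + 1, j, p0, p1, n, k, a, b, c);
                    }
                }
                for (; i < i1; i++)                /* leftover row */
                    for (size_t j = j0; j < j1; j++)
                        edge_update(i, j, p0, p1, n, k, a, b, c);
            }
        }
    }
}
```

Both functions accumulate into `c`, so callers should zero it first when a plain product is wanted. Real level 3 BLAS kernels typically also pack tiles of A and B into contiguous buffers and use hand-written assembly for the inner loop; those steps are omitted here to keep the sketch short.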

            Published In

            ICPADS '12: Proceedings of the 2012 IEEE 18th International Conference on Parallel and Distributed Systems
            December 2012
            954 pages
            ISBN:9780769549033

            Publisher

            IEEE Computer Society

            United States

            Author Tags

            1. BLAS
            2. Loongson 3A
            3. MIPS64
            4. Multi-core
            5. Optimization

            Cited By

            • (2024) Optimizing Attention by Exploiting Data Reuse on ARM Multi-core CPUs. Proceedings of the 38th ACM International Conference on Supercomputing, pp. 137-149. DOI: 10.1145/3650200.3656620. Online publication date: 30-May-2024
            • (2023) Julia Cloud Matrix Machine: Dynamic Matrix Language Acceleration on Multicore Clusters in the Cloud. Proceedings of the 14th International Workshop on Programming Models and Applications for Multicores and Manycores, pp. 1-10. DOI: 10.1145/3582514.3582518. Online publication date: 25-Feb-2023
            • (2023) To Pack or Not to Pack: A Generalized Packing Analysis and Transformation. Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization, pp. 14-27. DOI: 10.1145/3579990.3580024. Online publication date: 17-Feb-2023
            • (2023) LAGrad: Statically Optimized Differentiable Programming in MLIR. Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction, pp. 228-238. DOI: 10.1145/3578360.3580259. Online publication date: 17-Feb-2023
            • (2022) Transfer-Tuning. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pp. 28-39. DOI: 10.1145/3559009.3569682. Online publication date: 8-Oct-2022
            • (2022) The Linear Algebra Mapping Problem. Current State of Linear Algebra Languages and Libraries. ACM Transactions on Mathematical Software 48:3, pp. 1-30. DOI: 10.1145/3549935. Online publication date: 10-Sep-2022
            • (2022) IATF: An Input-Aware Tuning Framework for Compact BLAS Based on ARMv8 CPUs. Proceedings of the 51st International Conference on Parallel Processing, pp. 1-11. DOI: 10.1145/3545008.3545032. Online publication date: 29-Aug-2022
            • (2021) KernelFaRer. ACM Transactions on Architecture and Code Optimization 18:3, pp. 1-22. DOI: 10.1145/3459010. Online publication date: 28-Jun-2021
            • (2019) MnnFast. Proceedings of the 46th International Symposium on Computer Architecture, pp. 250-263. DOI: 10.1145/3307650.3322214. Online publication date: 22-Jun-2019
            • (2019) Optimizing parallel GEMM routines using auto-tuning with Intel AVX-512. Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, pp. 101-110. DOI: 10.1145/3293320.3293334. Online publication date: 14-Jan-2019