DOI: 10.1145/3636480.3637093
research-article
Open access

Introducing software pipelining for the A64FX processor into LLVM

Published: 11 January 2024 Publication History
  • Abstract

    Software pipelining is an essential optimization for accelerating High-Performance Computing (HPC) applications on CPUs. Modern CPUs achieve high performance through many cores and wide SIMD instructions, and software pipelining complements these features to improve HPC application performance further. Although open-source compilers such as GCC and LLVM implement software pipelining, it is underutilized on the AArch64 architecture. To improve this situation, we have implemented software pipelining for the A64FX processor in LLVM. This paper describes the details of the implementation and confirms that it improves the performance of several benchmark programs.
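
    As a rough illustration of the idea only, the sketch below pipelines a simple scaling loop by hand at the C source level. It is not the paper's implementation, which operates inside LLVM on machine instructions; the function names (scale_plain, scale_pipelined, versions_agree) and the two-stage split are assumptions made for this example. The pipelined version issues the load for iteration i+1 alongside the multiply and store for iteration i, with a prologue and an epilogue to fill and drain the pipeline.

```c
#include <assert.h>
#include <stddef.h>

/* Plain loop: each iteration loads, multiplies, and stores in sequence. */
void scale_plain(const double *x, double *y, double a, size_t n) {
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i];
    }
}

/* Hand-pipelined variant with two stages (hypothetical split):
 *   stage 1: load x[i]
 *   stage 2: multiply and store y[i]
 * The kernel overlaps stage 1 of iteration i+1 with stage 2 of iteration i. */
void scale_pipelined(const double *x, double *y, double a, size_t n) {
    if (n == 0) {
        return;
    }
    double loaded = x[0];              /* prologue: stage 1 of iteration 0 */
    for (size_t i = 0; i + 1 < n; i++) {
        double next = x[i + 1];        /* stage 1 of iteration i+1 */
        y[i] = a * loaded;             /* stage 2 of iteration i   */
        loaded = next;
    }
    y[n - 1] = a * loaded;             /* epilogue: stage 2 of the last iteration */
}

/* Returns 1 when both versions produce identical output (n at most 16). */
int versions_agree(const double *x, double a, size_t n) {
    double p[16], q[16];
    assert(n <= 16);
    scale_plain(x, p, a, n);
    scale_pipelined(x, q, a, n);
    for (size_t i = 0; i < n; i++) {
        if (p[i] != q[i]) {
            return 0;
        }
    }
    return 1;
}
```

    A compiler-level pipeliner performs a comparable overlap automatically and chooses the stage assignment from the target's instruction latencies and resources, so the hand-written form above is only meant to show the prologue, kernel, and epilogue structure.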

      Published In

      HPCAsia '24 Workshops: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Workshops
      January 2024
      134 pages
      ISBN: 9798400716522
      DOI: 10.1145/3636480
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. A64FX
      2. AArch64
      3. LLVM
      4. compiler
      5. optimization
      6. software pipelining

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      HPCAsiaWS 2024

      Acceptance Rates

      Overall Acceptance Rate 69 of 143 submissions, 48%
