research-article

Open access

Introducing software pipelining for the A64FX processor into LLVM

Authors:

Naoto Fukumoto,

Hitoshi Murai

HPCAsia '24 Workshops: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Workshops

Pages 1 - 6

https://doi.org/10.1145/3636480.3637093

Published: 11 January 2024


Abstract

Software pipelining is an essential optimization for accelerating High-Performance Computing (HPC) applications on CPUs. Modern CPUs achieve high performance through many cores and wide SIMD instructions, and software pipelining complements these features to further improve the performance of HPC applications. Although open-source compilers such as GCC and LLVM implement software pipelining, it is underutilized on the AArch64 architecture. To improve this situation, we have implemented software pipelining for the A64FX processor in LLVM. This paper describes the details of this implementation. We also confirm that our implementation improves the performance of several benchmark programs.
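The abstract describes software pipelining only at a high level. As a rough illustration of the general idea (not the authors' LLVM implementation, which works on machine instructions inside the compiler back end), the C sketch below hand-pipelines a simple loop into a two-stage schedule with a prologue, a kernel, and an epilogue. The function names `axpy_plain` and `axpy_pipelined` are hypothetical, and the two-stage split (stage 0 loads, stage 1 computes and stores) is an assumption chosen for readability.

```c
#include <stddef.h>

/* Reference loop: each iteration loads, computes, and stores in sequence. */
void axpy_plain(size_t n, double a, const double *restrict x,
                double *restrict y) {
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}

/* Hypothetical hand-pipelined version with two stages:
 *   stage 0: load x[i] and y[i]
 *   stage 1: compute a * x[i] + y[i] and store it to y[i]
 * In the kernel, stage 1 of iteration i overlaps stage 0 of iteration i+1,
 * so load latency can be hidden behind arithmetic.
 * `restrict` states the no-aliasing assumption that makes it legal to load
 * y[i+1] and x[i+1] before storing y[i]. */
void axpy_pipelined(size_t n, double a, const double *restrict x,
                    double *restrict y) {
    if (n == 0) {
        return;
    }

    /* Prologue: stage 0 of iteration 0. */
    double xv = x[0];
    double yv = y[0];

    /* Kernel: steady state, one iteration completes per trip. */
    for (size_t i = 0; i + 1 < n; i++) {
        double x_next = x[i + 1]; /* stage 0 of iteration i+1 */
        double y_next = y[i + 1];
        y[i] = a * xv + yv;       /* stage 1 of iteration i */
        xv = x_next;
        yv = y_next;
    }

    /* Epilogue: stage 1 of the last iteration. */
    y[n - 1] = a * xv + yv;
}
```

Both functions compute the same result when `x` and `y` do not overlap. A compiler performing software pipelining must prove this kind of independence itself (or be told it, as `restrict` does here) before reordering loads ahead of earlier stores; it also chooses the number of stages and their overlap from the target's instruction latencies and resources rather than fixing them by hand.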


Index Terms

1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Retargetable compilers


Published In

HPCAsia '24 Workshops: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Workshops

January 2024

134 pages

ISBN:9798400716522

DOI:10.1145/3636480

Copyright © 2024 ACM.


Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article
Research
Refereed limited

Conference

HPCAsiaWS 2024

HPCAsiaWS 2024: International Conference on High Performance Computing in Asia-Pacific Region Workshops

January 25 - 27, 2024

Nagoya, Japan

Acceptance Rates

Overall Acceptance Rate 69 of 143 submissions, 48%
