research-article

Fast Instruction Selection for Fast Digital Signal Processing

Authors:

Alexander J Root,

Maaz Bin Safeer Ahmad,

Dillon Sharlet,

Jonathan Ragan-Kelley

ASPLOS '23: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4

Pages 125 - 137

https://doi.org/10.1145/3623278.3624768

Published: 07 February 2024

Abstract

Modern vector processors support a wide variety of instructions for fixed-point digital signal processing. These instructions offer many rounding, saturating, and type-conversion modes, and are often fused combinations of more primitive operations. While these are common idioms in fixed-point signal processing, they are difficult to use in portable code. It is challenging for programmers to write portable integer arithmetic in a C-like language that corresponds exactly to one of these instructions, and even more challenging for compilers to recognize when these instructions can be used. Our system, Pitchfork, defines a portable fixed-point intermediate representation, FPIR, that captures common idioms in fixed-point code. FPIR can be used directly by programmers experienced with fixed-point arithmetic, or Pitchfork can automatically lift integer operations into FPIR using a term-rewriting system (TRS) composed of verified manual and automatically synthesized rules. Pitchfork then lowers FPIR into target-specific fixed-point instructions using a set of target-specific TRSs. We show that this approach improves the runtime performance of portably written fixed-point signal processing code in Halide, across a range of benchmarks, by a geometric mean of 1.31x on x86 with AVX2, 1.82x on ARM Neon, and 2.44x on Hexagon HVX compared to a standard LLVM-based compiler flow, while maintaining or improving existing compile times.
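To make the lifting step concrete, the following is a minimal Python sketch of one rewrite rule, not Pitchfork's implementation. It assumes a toy expression encoding (nested tuples), a single hand-written rule, and a hypothetical idiom name `rounding_halving_add`; none of these are taken from the paper. The rule recognizes the portable pattern "widen both 8-bit inputs, add, add 1, shift right by 1, narrow" and replaces it with one fused idiom. An exhaustive check over all 8-bit inputs then stands in for the rule verification that the abstract mentions.

```python
# Illustrative sketch only. The tuple encoding, the rule, and the name
# "rounding_halving_add" are assumptions, not Pitchfork's actual FPIR.

def is_op(e, op, arity):
    """True if e is an operator node (op, arg1, ..., argN) with N == arity."""
    return isinstance(e, tuple) and len(e) == arity + 1 and e[0] == op

def lift(expr):
    """Bottom-up rewrite of
    narrow((widen(a) + widen(b) + 1) >> 1) into rounding_halving_add(a, b)."""
    if not isinstance(expr, tuple):
        return expr  # variable name (str) or integer constant
    expr = (expr[0],) + tuple(lift(e) for e in expr[1:])
    if is_op(expr, "narrow", 1):
        shr = expr[1]
        if is_op(shr, ">>", 2) and shr[2] == 1:
            outer = shr[1]
            if is_op(outer, "+", 2) and outer[2] == 1:
                inner = outer[1]
                if (is_op(inner, "+", 2)
                        and is_op(inner[1], "widen", 1)
                        and is_op(inner[2], "widen", 1)):
                    return ("rounding_halving_add", inner[1][1], inner[2][1])
    return expr

def evaluate(expr, env):
    """Reference semantics: variables are unsigned 8-bit, widened values 16-bit."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, *args = expr
    vals = [evaluate(a, env) for a in args]
    if op == "widen":
        return vals[0]                        # u8 -> u16, value-preserving
    if op == "narrow":
        return vals[0] & 0xFF                 # u16 -> u8, wrapping
    if op == "+":
        return (vals[0] + vals[1]) & 0xFFFF   # 16-bit wrapping add
    if op == ">>":
        return vals[0] >> vals[1]
    if op == "rounding_halving_add":
        return (vals[0] + vals[1] + 1) >> 1   # fused idiom, no intermediate overflow
    raise ValueError(f"unknown op: {op}")

def equivalent(e1, e2):
    """Exhaustively compare two expressions over all u8 values of a and b."""
    return all(
        evaluate(e1, {"a": a, "b": b}) == evaluate(e2, {"a": a, "b": b})
        for a in range(256) for b in range(256)
    )

# Portable integer arithmetic, as a programmer might write it.
portable = ("narrow", (">>", ("+", ("+", ("widen", "a"), ("widen", "b")), 1), 1))
lifted = lift(portable)
```

Calling `equivalent(portable, lifted)` checks the rule on all 65,536 input pairs. A lowering pass in the same style would map the lifted node to a target instruction where one exists, for example a rounding average on 8-bit lanes; that mapping is not shown here.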




Published In

ASPLOS '23: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4

March 2023

430 pages

ISBN:9798400703942

DOI:10.1145/3623278

Chair:
Tor Aamodt,
Program Chair:
Michael M Swift,
Program Co-chair:
Natalie Enright Jerger

Copyright © 2023 Owner/Author(s).

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

Sponsors

In-Cooperation

SIGBED: ACM Special Interest Group on Embedded Systems

Publisher

Association for Computing Machinery

New York, NY, United States


Funding Sources

NSF (National Science Foundation)

Conference

ASPLOS '23: 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4

March 25 - 29, 2023

Vancouver, BC, Canada

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%
