Towards neural architecture-aware exploration of compiler optimizations in a deep learning graph compiler

Published: 17 May 2022

Abstract

Deep Neural Networks (DNNs) form the basis for many existing and emerging applications. Many deep learning (DL) compilers analyze the computation graph and apply various optimizations at different stages. These high-level optimizations are applied as compiler passes before the resulting computation graph is handed off for low-level, hardware-specific optimization. With advances in DNN architectures and backend hardware, the search space of compiler optimizations has grown manifold. Moreover, including passes without knowledge of the computation graph increases compilation time while having only a slight influence on the intermediate representation. This paper presents preliminary results 1) summarizing the relevance of pass selection and ordering in a DL compiler, 2) neural architecture-aware selection of optimization passes, and 3) pruning the search space for the phase-selection problem in a DL compiler. We use TVM as the compiler and report experimental results on NVIDIA A100 and GeForce RTX 2080 GPUs, establishing the relevance of neural architecture-aware selection of optimization passes in DL compilers for DNNs.
Experimental evaluation with seven models categorized into four architecturally different classes demonstrated performance gains for most neural networks. For ResNets, the average throughput increased by 24% and 32% for TensorFlow and PyTorch frameworks, respectively. Additionally, we observed an average 15% decrease in the compilation time for ResNets, 45% for MobileNet, and 54% for SSD-based models without impacting the throughput. BERT models showed a dramatic improvement with a 92% reduction in the compile time.
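The scale of the phase-selection and phase-ordering problem the abstract alludes to can be illustrated with a small back-of-the-envelope sketch. The pass names and per-architecture lists below are hypothetical, chosen only to illustrate the pruning idea; TVM's actual Relay pass set and the paper's learned pass subsets differ.

```python
from math import factorial

def selection_space(n_passes: int) -> int:
    # Phase selection: each pass is independently enabled or disabled,
    # so the space of candidate pass subsets is 2^n.
    return 2 ** n_passes

def ordering_space(n_passes: int) -> int:
    # Phase ordering: every permutation of the chosen passes is a
    # distinct candidate, so the space grows as n!.
    return factorial(n_passes)

# Hypothetical architecture-aware pruning: restrict the candidate set
# to passes that (e.g. via profiling) are known to change the IR for a
# given class of networks. These lists are illustrative only.
RELEVANT_PASSES = {
    "resnet": ["FoldConstant", "FuseOps", "SimplifyExpr"],
    "bert":   ["FoldConstant", "FuseOps"],
}

def pruned_selection_space(arch_class: str) -> int:
    return selection_space(len(RELEVANT_PASSES[arch_class]))

print(selection_space(30))             # full space: 2^30 = 1073741824
print(pruned_selection_space("bert"))  # pruned space: 2^2 = 4
```

Even modest pruning by architecture class collapses an intractable exponential search into a handful of candidates, which is consistent with the compile-time reductions reported above.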


Cited By

View all
  • (2025) Deep Learning Library Testing: Definition, Methods and Challenges. ACM Computing Surveys. DOI: 10.1145/3716497. Online publication date: 5-Feb-2025
  • (2024) Cross-Feature Transfer Learning for Efficient Tensor Program Generation. Applied Sciences 14(2), 513. DOI: 10.3390/app14020513. Online publication date: 6-Jan-2024

      Published In

      CF '22: Proceedings of the 19th ACM International Conference on Computing Frontiers
      May 2022
      321 pages
      ISBN:9781450393386
      DOI:10.1145/3528416
      © 2022 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. deep learning compilers
      2. neural architecture
      3. pass selection
      4. search space pruning

      Funding Sources

      • U.S. Office of the Under Secretary of Defense for Research and Engineering (OUSD(R&E))

      Conference

      CF '22
      Acceptance Rates

Overall acceptance rate: 273 of 785 submissions (35%)


