Towards neural architecture-aware exploration of compiler optimizations in a deep learning graph compiler

Published: 17 May 2022

Abstract

Deep Neural Networks (DNNs) form the basis for many existing and emerging applications. Many deep learning (DL) compilers analyze the computation graph and apply various optimizations at different stages. These high-level optimizations are applied as compiler passes before the resulting computation graph is handed off for low-level, hardware-specific optimization. With advances in DNN architectures and backend hardware, the search space of compiler optimizations has grown manifold. Moreover, including passes without knowledge of the computation graph increases compilation time while having only a slight influence on the intermediate representation. This paper presents preliminary results 1) summarizing the relevance of pass selection and ordering in a DL compiler, 2) neural architecture-aware selection of optimization passes, and 3) pruning the search space for the phase-selection problem in a DL compiler. We use TVM as the compiler and report experimental results on NVIDIA A100 and GeForce RTX 2080 GPUs, establishing the relevance of neural architecture-aware selection of optimization passes in DL compilers for DNNs.
Experimental evaluation with seven models categorized into four architecturally different classes demonstrated performance gains for most neural networks. For ResNets, the average throughput increased by 24% and 32% for TensorFlow and PyTorch frameworks, respectively. Additionally, we observed an average 15% decrease in the compilation time for ResNets, 45% for MobileNet, and 54% for SSD-based models without impacting the throughput. BERT models showed a dramatic improvement with a 92% reduction in the compile time.
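The scale of the phase-selection and phase-ordering problem the abstract alludes to can be illustrated with a small back-of-the-envelope sketch. The pass names and per-architecture lists below are hypothetical, chosen only to illustrate the pruning idea; TVM's actual Relay pass set and the paper's learned pass subsets differ.

```python
from math import factorial

def selection_space(n_passes: int) -> int:
    # Phase selection: each pass is independently enabled or disabled,
    # so the space of candidate pass subsets is 2^n.
    return 2 ** n_passes

def ordering_space(n_passes: int) -> int:
    # Phase ordering: every permutation of the chosen passes is a
    # distinct candidate, so the space grows as n!.
    return factorial(n_passes)

# Hypothetical architecture-aware pruning: restrict the candidate set
# to passes that (e.g. via profiling) are known to change the IR for a
# given class of networks. These lists are illustrative only.
RELEVANT_PASSES = {
    "resnet": ["FoldConstant", "FuseOps", "SimplifyExpr"],
    "bert":   ["FoldConstant", "FuseOps"],
}

def pruned_selection_space(arch_class: str) -> int:
    return selection_space(len(RELEVANT_PASSES[arch_class]))

print(selection_space(30))             # full space: 2^30 = 1073741824
print(pruned_selection_space("bert"))  # pruned space: 2^2 = 4
```

Even modest pruning by architecture class collapses an intractable exponential search into a handful of candidates, which is consistent with the compile-time reductions reported above.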


Cited By

View all
  • (2025) Deep Learning Library Testing: Definition, Methods and Challenges. ACM Computing Surveys. DOI: 10.1145/3716497. Online publication date: 5-Feb-2025
  • (2024) Cross-Feature Transfer Learning for Efficient Tensor Program Generation. Applied Sciences 14(2), 513. DOI: 10.3390/app14020513. Online publication date: 6-Jan-2024

      Published In

      CF '22: Proceedings of the 19th ACM International Conference on Computing Frontiers
      May 2022
      321 pages
      ISBN:9781450393386
      DOI:10.1145/3528416
      © 2022 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. deep learning compilers
      2. neural architecture
      3. pass selection
      4. search space pruning

      Funding Sources

      • U.S. Office of the Under Secretary of Defense for Research and Engineering (OUSD(R&E))

      Conference

      CF '22
      Acceptance Rates

Overall acceptance rate: 273 of 785 submissions (35%)


