
Source-to-Source Automatic Differentiation of OpenMP Parallel Loops

Published: 16 February 2022

Abstract

This article presents our work toward correct and efficient automatic differentiation of OpenMP parallel worksharing loops in forward and reverse mode. Automatic differentiation is a method to obtain gradients of numerical programs, which are crucial in optimization, uncertainty quantification, and machine learning. The computational cost to compute gradients is a common bottleneck in practice. For applications that are parallelized for multicore CPUs or GPUs using OpenMP, one also wishes to compute the gradients in parallel. We propose a framework to reason about the correctness of the generated derivative code, from which we justify our OpenMP extension to the differentiation model. We implement this model in the automatic differentiation tool Tapenade and present test cases that are differentiated following our extended differentiation procedure. Performance of the generated derivative programs in forward and reverse mode is better than sequential, although our reverse mode often scales worse than the input programs.
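
To make the forward/reverse distinction concrete, the following hand-written sketch differentiates a small OpenMP worksharing loop. It is not Tapenade output and is not taken from the article: the loop body, the function names (`loop`, `loop_d`, `loop_b`, `dot_product_gap`), and the use of `atomic` updates are assumptions chosen for illustration, with the `_d`/`_b` suffixes only loosely mirroring common tangent/adjoint naming. It shows why reverse mode is the harder case: reads of `x[i]` and `x[i+1]` in the input loop become increments of `xb[i]` and `xb[i+1]` in the adjoint loop, so neighbouring iterations may update the same location and need some form of protection, which is one plausible reason for reverse mode scaling worse than the input program.

```c
#include <assert.h>

#define MAXN 64

/* Input program: each iteration reads x[i] and x[i+1], writes only y[i]. */
void loop(const double *x, double *y, int n) {
    assert(n >= 2);
    #pragma omp parallel for
    for (int i = 0; i < n - 1; i++)
        y[i] = x[i] * x[i] * x[i + 1];
}

/* Forward (tangent) mode: same access pattern as the input loop,
 * so the same parallelization remains valid. */
void loop_d(const double *x, const double *xd, double *y, double *yd, int n) {
    assert(n >= 2);
    #pragma omp parallel for
    for (int i = 0; i < n - 1; i++) {
        yd[i] = 2.0 * x[i] * x[i + 1] * xd[i] + x[i] * x[i] * xd[i + 1];
        y[i] = x[i] * x[i] * x[i + 1];
    }
}

/* Reverse (adjoint) mode: reads of x turn into increments of xb.
 * Iterations i and i+1 both increment xb[i+1], so the updates are
 * protected here with atomics (an assumption of this sketch; a
 * reduction would be another option). xb must be zeroed by the caller. */
void loop_b(const double *x, double *xb, const double *yb, int n) {
    assert(n >= 2);
    #pragma omp parallel for
    for (int i = 0; i < n - 1; i++) {
        double gi = 2.0 * x[i] * x[i + 1] * yb[i];
        double gn = x[i] * x[i] * yb[i];
        #pragma omp atomic
        xb[i] += gi;
        #pragma omp atomic
        xb[i + 1] += gn;
    }
}

/* Dot-product test: <yd, yb> should equal <xd, xb> up to rounding
 * if the tangent and adjoint loops are consistent with each other. */
double dot_product_gap(int n) {
    double x[MAXN], xd[MAXN], xb[MAXN], y[MAXN], yd[MAXN], yb[MAXN];
    double lhs = 0.0, rhs = 0.0;
    assert(n >= 2 && n <= MAXN);
    for (int i = 0; i < n; i++) {
        x[i] = 0.1 * (i + 1);      /* arbitrary evaluation point */
        xd[i] = 1.0 / (i + 1);     /* arbitrary input direction */
        yb[i] = 0.5 + 0.25 * i;    /* arbitrary output weights */
        xb[i] = 0.0;
        y[i] = 0.0;
        yd[i] = 0.0;
    }
    loop_d(x, xd, y, yd, n);
    loop_b(x, xb, yb, n);
    for (int i = 0; i < n - 1; i++) lhs += yd[i] * yb[i];
    for (int i = 0; i < n; i++) rhs += xd[i] * xb[i];
    return lhs - rhs;
}
```

Calling `dot_product_gap(16)` should return a value close to zero whether or not the file is compiled with OpenMP enabled (for example with `-fopenmp`); without that flag the pragmas are ignored and the loops run sequentially.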


Published In

ACM Transactions on Mathematical Software, Volume 48, Issue 1
March 2022
320 pages
ISSN: 0098-3500
EISSN: 1557-7295
DOI: 10.1145/3505199

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 February 2022
Accepted: 01 June 2021
Revised: 01 June 2021
Received: 01 August 2020
Published in TOMS Volume 48, Issue 1

Author Tags

  1. Automatic differentiation
  2. OpenMP
  3. shared-memory parallel
  4. multicore

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • U.S. Department of Energy, Office of Science
