Logical Methods in Computer Science, vol. 18, issue 1, paper 41. Submitted Jan. 19, 2021; published Mar. 22, 2022.

Higher Order Automatic Differentiation of Higher Order Functions

Mathieu Huot (a), Sam Staton (a), and Matthijs Vákár (b)
(a) University of Oxford
(b) Utrecht University
Abstract.

We present semantic correctness proofs of automatic differentiation (AD). We consider a forward-mode AD method on a higher order language with algebraic data types, and we characterise it as the unique structure preserving macro given a choice of derivatives for basic operations. We describe a rich semantics for differentiable programming, based on diffeological spaces. We show that it interprets our language, and we phrase what it means for the AD method to be correct with respect to this semantics. We show that our characterisation of AD gives rise to an elegant semantic proof of its correctness based on a gluing construction on diffeological spaces. We explain how this is, in essence, a logical relations argument. Throughout, we show how the analysis extends to AD methods for computing higher order derivatives using a Taylor approximation.

Key words and phrases:
automatic differentiation, software correctness, denotational semantics
The authors contributed equally to this work.

1. Introduction

Automatic differentiation (AD), loosely speaking, is the process of taking a program describing a function, and constructing the derivative of that function by applying the chain rule across the program code. As gradients play a central role in many aspects of machine learning, so too do automatic differentiation systems such as TensorFlow [AAB+16], PyTorch [PGC+17] or Stan [CHB+15].
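
To make the phrase "applying the chain rule across the program code" concrete, here is a minimal sketch of forward-mode AD using dual numbers. It is an illustration only, not the macro studied in this paper: the names `Dual`, `lift`, `derivative` and `example` are hypothetical, and only addition, multiplication and sine are supported.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative (hypothetical helper type)."""
    val: float
    tan: float

    def __add__(self, other):
        other = lift(other)
        # Sum rule: derivatives add.
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __mul__(self, other):
        other = lift(other)
        # Product rule.
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)

    __rmul__ = __mul__


def lift(x):
    """Treat a plain number as a constant, i.e. with zero derivative."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


def sin(x):
    # A basic operation together with a chosen derivative (chain rule).
    x = lift(x)
    return Dual(math.sin(x.val), math.cos(x.val) * x.tan)


def derivative(f, x):
    """Derivative of f at x, obtained by running f on a dual number."""
    return f(Dual(float(x), 1.0)).tan


def example(x):
    # An ordinary program; it never mentions derivatives.
    return x * sin(x) + 3 * x
```

Calling `derivative(example, x0)` runs the unchanged program `example` on a dual number, so the derivative is assembled operation by operation. Systems such as those cited above use far more elaborate machinery, but the underlying idea of propagating derivatives through each primitive is the same.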

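[Diagram: a square. Top arrow: Programs to Programs, labelled "automatic differentiation". Bottom arrow: Differential geometry to Differential geometry, labelled "math differentiation". Both vertical arrows, from Programs to Differential geometry, are labelled "denotational semantics".]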
Figure 1. Overview of semantics/correctness of AD.

Differentiation has a well-developed mathematical theory in terms of differential geometry. The aim of this paper is to formalize this connection between differential geometry and the syntactic operations of AD, particularly for AD methods that calculate higher order derivatives. In this way we achieve two things: (1) a compositional, denotational understanding of differentiable programming and AD; (2) an explanation of the correctness of AD.

This intuitive correspondence (summarized in Fig. 1) is in fact rather complicated. In this paper, we focus on resolving the following problem: higher order functions play a key role in programming, and yet they have no counterpart in traditional differential geometry. Moreover, we resolve this problem while retaining the compositionality of denotational semantics.

1.0.1. Higher order functions and differentiation.

A major application of higher order functions is to support disciplined code reuse. The need for code reuse is particularly acute in machine learning. For example, a multi-layer neural network might be built of millions of near-identical neurons, as follows.

\[
\begin{aligned}
&\mathrm{neuron}_n : (\mathbf{real}^n * (\mathbf{real}^n * \mathbf{real})) \to \mathbf{real}\\
&\mathrm{neuron}_n \stackrel{\mathrm{def}}{=} \lambda\langle x,\langle w,b\rangle\rangle.\,\varsigma(w\cdot x+b)\\
&\mathrm{layer}_n : ((\tau_1 * P)\to\tau_2)\to(\tau_1 * P^n)\to\tau_2^n\\
&\mathrm{layer}_n \stackrel{\mathrm{def}}{=} \lambda f.\,\lambda\langle x,\langle p_1,\dots,p_n\rangle\rangle.\,\langle f\langle x,p_1\rangle,\dots,f\langle x,p_n\rangle\rangle\\
&\mathrm{comp} : (((\tau_1 * P)\to\tau_2) * ((\tau_2 * Q)\to\tau_3))\to(\tau_1 * (P * Q))\to\tau_3\\
&\mathrm{comp} \stackrel{\mathrm{def}}{=} \lambda\langle f,g\rangle.\,\lambda\langle x,(p,q)\rangle.\,g\langle f\langle x,p\rangle,q\rangle
\end{aligned}
\]

[Plot: the function ς(x) against x, for x from -5 to 5; the vertical axis runs from 0 to 1.]
}\pgfsys@endscope\pgfsys@beginscope% \pgfsys@invoke{ } {}{{}}{} {}{{}}\pgfsys@beginscope\pgfsys@invoke{ }\definecolor{pgfstrokecolor}{rgb}{% 0.5,0.5,0.5}\pgfsys@color@gray@stroke{0.5}\pgfsys@invoke{ }\pgfsys@roundcap% \pgfsys@invoke{ }{}\pgfsys@moveto{-3.5pt}{30.77252pt}\pgfsys@stroke% \pgfsys@invoke{ }\hbox{\hbox{{\pgfsys@beginscope\pgfsys@invoke{ }{{}{}{{{}{}}}% {{}{}} {{}{{}}}{{}{}}{}{{}{}}{}{}{}{}{} { }{{{{}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{1.0}{0.0}{0.0}{1% .0}{-18.05489pt}{28.19475pt}\pgfsys@invoke{ }\hbox{{\definecolor{% pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }% \pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\hbox{\footnotesize{$0.5$}} }}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope {{ {{ }}{{{}}} {{ }}{{{}}} {}}} {{ { { { {{ }}{{{}}} {{ }}{{{}}} {}}{}} } { { { {{ }}{{{}}} {{ }}{{{}}} {}}{}} } {{ { {{ }}{{{}}} {{ }}{{{}}} {}}{ {{}{}} }{}}} } } { } {{}{{}}} {{}{{}}} \pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@setlinewidth{0.4pt}\pgfsys@invoke{ % } \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope\pgfsys@beginscope% \pgfsys@invoke{ } {}{{}}{} {}{{}}\pgfsys@beginscope\pgfsys@invoke{ }\definecolor{pgfstrokecolor}{rgb}{% 0.5,0.5,0.5}\pgfsys@color@gray@stroke{0.5}\pgfsys@invoke{ }\pgfsys@roundcap% \pgfsys@invoke{ }{}\pgfsys@moveto{-3.5pt}{61.55244pt}\pgfsys@stroke% \pgfsys@invoke{ }\hbox{\hbox{{\pgfsys@beginscope\pgfsys@invoke{ }{{}{}{{{}{}}}% {{}{}} {{}{{}}}{{}{}}{}{{}{}}{}{}{}{}{} { }{{{{}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{1.0}{0.0}{0.0}{1% .0}{-10.166pt}{58.97467pt}\pgfsys@invoke{ }\hbox{{\definecolor{pgfstrokecolor}% {rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }% \pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\hbox{\footnotesize{$1$}} }}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} 
\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope} \pgfsys@beginscope\pgfsys@invoke{ }{} \pgfsys@setlinewidth{0.4pt}\pgfsys@invoke{ } \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope\pgfsys@beginscope% \pgfsys@invoke{ }{} {}\pgfsys@setlinewidth{0.4pt}\pgfsys@invoke{ } \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope\pgfsys@beginscope% \pgfsys@invoke{ }{} {}\pgfsys@setlinewidth{0.4pt}\pgfsys@invoke{ } \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope\pgfsys@beginscope% \pgfsys@invoke{ }{} \pgfsys@setlinewidth{0.4pt}\pgfsys@invoke{ } \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope\pgfsys@beginscope% \pgfsys@invoke{ }{} {}\pgfsys@setlinewidth{0.4pt}\pgfsys@invoke{ } \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope\pgfsys@beginscope% \pgfsys@invoke{ }{} {}\pgfsys@setlinewidth{0.4pt}\pgfsys@invoke{ } \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope \pgfsys@beginscope\pgfsys@invoke{ }{} {} {{{ {{ }}{{{}}} {}}}}{{}{ {}{}{}{}{}}{}}{{}}\hbox{\hbox{{\pgfsys@beginscope\pgfsys@invoke{ }{{}{}{{ {}{}}}{ {}{}} {{}{{}}}{{}{}}{}{{}{}} { }{{{{}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{1.0}{0.0}{0.0}{1% .0}{33.19095pt}{-21.12946pt}\pgfsys@invoke{ }\hbox{{\definecolor{% pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }% \pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\hbox{\small{$\mathit{x}$}} }}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope\pgfsys@beginscope% \pgfsys@invoke{ }{} {} {{{ {{ }}{{{}}} {}}}}{{ {}{}{}{}{}}{}{}}{{}}{{}}\hbox{\hbox{{\pgfsys@beginscope\pgfsys@invoke{ }{{}{}{% { {}{}}}{ {}{}} {{}{{}}}{{}{}}{}{{}{}}{}{}{}{}{} { }{{{{}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{0.0}{1.0}{-1.0}{% 0.0}{-24.83789pt}{23.06683pt}\pgfsys@invoke{ 
}\hbox{{\definecolor{% pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }% \pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\hbox{\small{$\varsigma(x)$}} }}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope {}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope{}{}{}\hss}% \pgfsys@discardpath\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope\hss}}% \lxSVG@closescope\endpgfpicture}} }\end{array}start_ARRAY start_ROW start_CELL start_ROW start_CELL end_CELL start_CELL roman_neuron start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT : bold_( bold_real start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT bold_∗ bold_( bold_real start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT bold_∗ bold_real bold_) bold_) → bold_real end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL roman_neuron start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_RELOP SUPERSCRIPTOP start_ARG = end_ARG start_ARG roman_def end_ARG end_RELOP italic_λ ⟨ italic_x , ⟨ italic_w , italic_b ⟩ ⟩ . italic_ς ( italic_w ⋅ italic_x + italic_b ) end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL roman_layer start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT : ( bold_( italic_τ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT bold_∗ italic_P bold_) → italic_τ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) → bold_( italic_τ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT bold_∗ italic_P start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT bold_) → italic_τ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL roman_layer start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_RELOP SUPERSCRIPTOP start_ARG = end_ARG start_ARG roman_def end_ARG end_RELOP italic_λ italic_f . 
italic_λ ⟨ italic_x , ⟨ italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_p start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ⟩ ⟩ . ⟨ italic_f ⟨ italic_x , italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⟩ , … , italic_f ⟨ italic_x , italic_p start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ⟩ ⟩ end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL roman_comp : bold_( ( bold_( italic_τ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT bold_∗ italic_P bold_) → italic_τ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) bold_∗ ( bold_( italic_τ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT bold_∗ italic_Q bold_) → italic_τ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ) bold_) → bold_( italic_τ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT bold_∗ bold_( italic_P bold_∗ italic_Q bold_) bold_) → italic_τ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL roman_comp start_RELOP SUPERSCRIPTOP start_ARG = end_ARG start_ARG roman_def end_ARG end_RELOP italic_λ ⟨ italic_f , italic_g ⟩ . italic_λ ⟨ italic_x , ( italic_p , italic_q ) ⟩ . italic_g ⟨ italic_f ⟨ italic_x , italic_p ⟩ , italic_q ⟩ end_CELL end_ROW end_CELL start_CELL - 5 0 5 0 0.5 1 italic_x italic_ς ( italic_x ) end_CELL end_ROW end_ARRAY

(Here $\varsigma(x) \stackrel{\mathrm{def}}{=} \frac{1}{1+e^{-x}}$ is the sigmoid function, as illustrated.) We can use these functions to build a network as follows (see also Fig. 2):

\[
\mathrm{comp}\langle \mathrm{layer}_m(\mathrm{neuron}_k), \mathrm{comp}\langle \mathrm{layer}_n(\mathrm{neuron}_m), \mathrm{neuron}_n\rangle\rangle : (\mathbf{real}^k * P) \to \mathbf{real} \tag{1}
\]
[Diagram: input nodes $1, \dots, k$, followed by two hidden layers with nodes $1, \dots, m$ and $1, \dots, n$.]
Figure 2. The network in (1) with $k$ inputs and two hidden layers.

Here $P \cong \mathbf{real}^p$ with $p = m(k+1) + n(m+1) + n + 1$. This program (1) describes a smooth (infinitely differentiable) function. The goal of automatic differentiation is to find its derivative.
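As an informal illustration, the combinators above and the network (1) can be transcribed into Python. This is only a sketch under stated assumptions: nested tuples stand in for product types, the dimensions $k$, $m$, $n$ are left implicit in the tuple lengths, and the helpers `zero_params` and `count_reals` are hypothetical names introduced here to check the parameter count $p$.

```python
import math

def sigmoid(x):
    """The sigmoid function 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(arg):
    """neuron_n: takes <x, <w, b>> and returns sigmoid(w . x + b)."""
    x, (w, b) = arg
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def layer(f):
    """layer_n: applies f to the same input x with each parameter p_i."""
    def run(arg):
        x, ps = arg
        return tuple(f((x, p)) for p in ps)
    return run

def comp(fg):
    """comp: feeds the output of f into g, pairing up their parameters."""
    f, g = fg
    def run(arg):
        x, (p, q) = arg
        return g((f((x, p)), q))
    return run

# The network (1); k, m and n are implicit in the tuple lengths.
network = comp((layer(neuron), comp((layer(neuron), neuron))))

def zero_params(k, m, n):
    """Hypothetical helper: all-zero parameters, nested as in the type of (1)."""
    first = tuple(((0.0,) * k, 0.0) for _ in range(m))   # m neurons on k inputs
    second = tuple(((0.0,) * m, 0.0) for _ in range(n))  # n neurons on m inputs
    last = ((0.0,) * n, 0.0)                             # one output neuron
    return (first, (second, last))

def count_reals(p):
    """Hypothetical helper: number of reals in a nested tuple of floats."""
    return sum(count_reals(q) for q in p) if isinstance(p, tuple) else 1

k, m, n = 3, 2, 2
params = zero_params(k, m, n)
output = network(((1.0, 2.0, 3.0), params))
```

With all weights and biases set to zero every neuron outputs $\varsigma(0)$, and the number of reals in `params` agrees with the formula for $p$ for these particular values of $k$, $m$ and $n$.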

If we $\beta$-reduce all the $\lambda$'s, we end up with a very long function expression just built from the sigmoid function and linear algebra. We can then find a program for calculating its derivative by applying the chain rule. However, automatic differentiation can also be expressed without first $\beta$-reducing, in a compositional way, by explaining how higher order functions like $(\mathrm{layer})$ and $(\mathrm{comp})$ propagate derivatives. This paper is a semantic analysis of this compositional approach.

The general idea of denotational semantics is to interpret types as spaces and programs as functions between the spaces. In this paper, we propose to use diffeological spaces and smooth functions [Sou80, IZ13] to this end. These satisfy the following three desiderata:

  • $\mathbb{R}$ is a space, and the smooth functions $\mathbb{R} \to \mathbb{R}$ are exactly the functions that are infinitely differentiable;

  • The set of smooth functions $X \to Y$ between spaces again forms a space, so we can interpret function types;

  • The disjoint union of a sequence of spaces again forms a space, and this enables us to interpret variant types and inductive types, e.g. lists of reals form the space $\biguplus_{i=0}^{\infty} \mathbb{R}^i$.

We emphasise that the most standard formulation of differential geometry, using manifolds, does not support spaces of functions. Diffeological spaces seem to us the simplest notion of space that satisfies these conditions, but there are other candidates [BH11, Sta11]. A diffeological space is in particular a set $X$ equipped with a chosen set of curves $C_X \subseteq X^{\mathbb{R}}$, and a smooth map $f : X \to Y$ must be such that if $\gamma \in C_X$ then $\gamma; f \in C_Y$. This is reminiscent of the method of logical relations.

1.0.2. From smoothness to automatic derivatives at higher types.

Our denotational semantics in diffeological spaces guarantees that all definable functions are smooth. But we need more than just to know that a definable function happens to have a mathematical derivative: we need to be able to find that derivative.

In this paper we focus on forward mode automatic differentiation methods for computing higher derivatives, which are macro translations on syntax (called $\overrightarrow{\mathcal{D}}$ in Section 3). We are able to show that they are correct, using our denotational semantics.

Here there is one subtle point that is central to our development. Although differential geometry provides established derivatives for first order functions (such as $\mathrm{neuron}$ above), there is no canonical notion of derivative for higher order functions (such as $\mathrm{layer}$ and $\mathrm{comp}$) in the theory of diffeological spaces (e.g. [CW14]). We propose a new way to resolve this, by interpreting types as triples $(X, X', S)$ where, intuitively, $X$ is a space of inhabitants of the type, $X'$ is a space serving as a chosen bundle of tangents (or jets, in the case of higher order derivatives) over $X$, and $S \subseteq X^{\mathbb{R}} \times {X'}^{\mathbb{R}}$ is a binary relation between curves, informally relating curves in $X$ with their tangent (resp. jet) curves in $X'$. This new model gives a denotational semantics for higher order automatic differentiation on a language with higher order functions.

In Section 4 we boil this new approach down to a straightforward and elementary logical relations argument for the correctness of higher order automatic differentiation. The approach is explained in detail in Section 6. We explore some subtleties of non-uniqueness of derivatives of higher order functions in Section 7.

1.0.3. Related work and context.

AD has a long history and has many implementations. AD was perhaps first phrased in a functional setting in [PS08], and there are now a number of teams working on AD in the functional setting (e.g. [WWE+19, SFVPJ19, Ell18]), some providing efficient implementations. Although that work does not involve formal semantics, it is inspired by intuitions from differential geometry and category theory.

This paper adds to a very recent body of work on verified automatic differentiation. In the first order setting, there are recent accounts based on denotational semantics in manifolds [FST19, LYRY20] and based on synthetic differential geometry [CGM19], work making a categorical abstraction [CCG+20] and work connecting operational semantics with denotational semantics [AP20, Plo18], as well as work focussing on how to correctly differentiate programs that operate on tensors [BML+20] and programs that make use of quantum computing [ZHCW20]. Recently there has also been significant progress at higher types. Brunel et al. [BMP20] and Mazza and Pagani [MP21] give formal correctness proofs for reverse-mode derivatives on a linear $\lambda$-calculus with a particular operational semantics. Barthe et al. [BCLG20] provide a general discussion of some new syntactic logical relations arguments, including one very similar to our syntactic proof of Theorem 3. Sherman et al. [SMC20] discuss a differential programming technique that works at higher types, based on exact real arithmetic, and relate it to a computable semantics. We understand that the authors of [CGM19] are working on higher types. Vákár et al. [Vák21, VS21, LNV21] phrase and prove correct a reverse mode AD technique on a higher order language based on a similar gluing technique. Vákár [Vák20] extends a standard $\lambda$-calculus with type recursion, and proves correct a forward-mode AD on such a higher-order language, also using a gluing argument.

The differential $\lambda$-calculus [ER03] is related to AD, and explicit connections are made in [MO20, Man12]. One difference is that the differential $\lambda$-calculus allows the addition of terms at all types, and hence vector space models are suitable to interpret all types. This choice would appear peculiar with the variant and inductive types that we consider here, as the dimension of a disjoint union of spaces is only defined locally.

This paper builds on our previous work [HSV20a, Vák20] in which we gave denotational correctness proofs for forward mode AD algorithms for computing first derivatives. Here, we explain how these techniques extend to methods that calculate higher derivatives.

The Faà di Bruno construction has also been investigated [CS11] in the context of Cartesian differential categories.

The idea of directly calculating higher order derivatives by using automatic differentiation methods that work with Taylor approximations (also known as jets in differential geometry) is well-known [GUW00] and it has recently gained renewed interest [Bet18, BJD19]. So far, such “Taylor-mode AD” methods have only been applied to first order functional languages, however. This paper shows how to extend these higher order AD methods to languages with support for higher order functions and algebraic data types.

The two main methods for implementing AD are operator overloading and, the method used in this paper, source code transformation [VMBBL18]. Taylor-mode AD has been seen to be significantly faster than iterated AD in the context of operator overloading [BJD19] in Jax [FJL18]. There are other notable implementations of forward Taylor-mode [BS96, BS97, Kar01, PS07, WGP16]. Some of them are implemented in a functional language [Kar01, PS07]. Taylor-mode implementations exploit the rich algebraic structure of derivatives to avoid, and to share, many of the redundant computations that arise in iterated first order methods. Perhaps the simplest example to see this is with the sin function, whose iterated derivatives only involve sin, cos, and negation. Importantly, most AD tools have the right complexity up to a constant factor, but this constant is quite important in practice and Taylor-mode helps achieve better performance. Another striking result for a version of Taylor-mode was obtained in [LMG18], where a performance gain of up to two orders of magnitude was achieved for computing certain Hessian-vector products using Ricci calculus. In essence, the algorithm used is a mixed-mode one that is derived via jets in [Bet18]. This is further improved in [LMG20]. Taylor-mode can also be useful for ODE solvers and hence will be important for neural differential equations [CRBD18].
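The remark about the sin function can be made concrete with a small sketch. The function name `sin_derivatives` is hypothetical and this is not taken from any of the cited implementations; it only illustrates that all derivatives up to a given order can share a single evaluation of sin and of cos.

```python
import math

def sin_derivatives(x, order):
    """Derivatives 0..order of sin at x, from one sin and one cos evaluation."""
    s, c = math.sin(x), math.cos(x)
    # sin, sin', sin'', sin''' are sin, cos, -sin, -cos, and then the cycle repeats.
    cycle = (s, c, -s, -c)
    return [cycle[i % 4] for i in range(order + 1)]

derivs = sin_derivatives(0.0, 4)
```

An iterated first order method would instead differentiate the derivative program again at each order, re-evaluating sin and cos along the way.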

Finally, we emphasise that we have chosen the neural network (1) as our running example mainly for its simplicity. Indeed one would typically use reverse-mode AD to train neural networks in practice. There are many other examples of AD outside the neural networks literature: AD is useful whenever derivatives need to be calculated on high dimensional spaces. This includes optimization problems more generally, where the derivative is passed to a gradient descent method (e.g. [RM51, KW+52, Qia99, KB14, DHS11, LN89]). Optimization problems involving higher order functions naturally show up in the calculus of variations and its applications in physics, where one typically looks for a function minimizing a certain integral [GSS00]. Other applications of AD are in advanced integration methods, since derivatives play a role in Hamiltonian Monte Carlo [Nea11, HG14] and variational inference [KTR+17]. Second order methods for gradient descent have also been extensively studied. As the basic second order Newton method requires inverting a high dimensional Hessian matrix, several alternatives and approximations have been studied. Some of them still require Taylor-like modes of differentiation and require a matrix-vector product where the matrix resembles the Hessian or inverse Hessian [KK04, Mar10, Ama12].

1.0.4. Summary of contributions.

We have provided a semantic analysis of higher order automatic differentiation. Our syntactic starting points are higher order forward-mode AD macros on a typed higher order language that extend their well-known first order equivalents (e.g. [SFVPJ19, WWE+19, HSV20a]). We present these in Section 3 for function types, and in Section 5 we extend them to inductive types and variants. The main contributions of this paper are as follows.

  • We give a denotational semantics for the language in diffeological spaces, showing that every definable expression is smooth (Section 4).

  • We show correctness of the higher order AD macros by a logical relations argument (Th. 3).

  • We give a categorical analysis of this correctness argument with two parts: a universal property satisfied by the macro in terms of syntactic categories, and a new notion of glued space that abstracts the logical relation (Section 6).

  • We then use this analysis to state and prove a correctness argument at all first order types (Th. 8).

Relation to previous work

This paper extends and develops the paper [HSV20a] presented at the 23rd International Conference on Foundations of Software Science and Computation Structures (FoSSaCS 2020). This version includes numerous elaborations, notably the extension of the definition, semantics and correctness of automatic differentiation methods for computing higher order derivatives (introduced in Sections 2.2-2.4) and a novel discussion about derivatives of higher-order functions (Section 7).

2. Rudiments of differentiation: how to calculate with dual numbers and Taylor approximations

2.1. First order differentiation: the chain rule and dual numbers.

We will now recall the definition of the gradient of a differentiable function, the goal of AD, and what it means for AD to be correct. Recall that the derivative of a function $f : \mathbb{R} \to \mathbb{R}$, if it exists, is a function $\nabla f : \mathbb{R} \to \mathbb{R}$ such that for all $a$, $\nabla f(a)$ is the gradient of $f$ at $a$ in the sense that the function $x \mapsto f(a) + \nabla f(a) \cdot (x - a)$ gives the best linear approximation of $f$ at $a$. (The gradient $\nabla f(a)$ is often written $\frac{\mathrm{d} f(x)}{\mathrm{d} x}(a)$.)

The chain rule for differentiation tells us that we can calculate $\nabla(f; g)(a) = \nabla f(a) \cdot \nabla g(f(a))$. In that sense, the chain rule tells us how linear approximations to a function transform under post-composition with another function.
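As a quick numerical sanity check of this formula, here is a sketch for one arbitrarily chosen pair of functions, $f(x) = x^2$ and $g(y) = \sin y$. The names below are illustrative assumptions, and the finite difference is used only as an independent approximation to compare against.

```python
import math

def f(x):
    return x * x

def grad_f(x):
    return 2.0 * x

def g(y):
    return math.sin(y)

def grad_g(y):
    return math.cos(y)

def compose(first, second):
    """Diagrammatic composition first;second, i.e. apply first, then second."""
    return lambda x: second(first(x))

def grad_compose(a):
    """Chain rule: grad(f;g)(a) = grad f(a) * grad g(f(a))."""
    return grad_f(a) * grad_g(f(a))

def central_difference(fn, a, eps=1e-6):
    """Numerical approximation of the derivative of fn at a."""
    return (fn(a + eps) - fn(a - eps)) / (2.0 * eps)

a = 0.7
exact = grad_compose(a)
approx = central_difference(compose(f, g), a)
```

The two values agree up to the error of the finite difference approximation.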

To find $\nabla f$ in a compositional way, using the chain rule, two generalizations are reasonable:

  • We need both $f$ and $\nabla f$ when calculating $\nabla(f; g)$ of a composition $f; g$, using the chain rule, so we are really interested in the pair $(f, \nabla f) : \mathbb{R} \to \mathbb{R} \times \mathbb{R}$;

  • In building $f$ we will need to consider functions of multiple arguments, such as $+ : \mathbb{R}^2 \to \mathbb{R}$, and these functions should propagate derivatives.

Thus we are more generally interested in transforming a function $g : \mathbb{R}^n \to \mathbb{R}$ into a function $h : (\mathbb{R} \times \mathbb{R})^n \to \mathbb{R} \times \mathbb{R}$ in such a way that for any $f_1, \dots, f_n : \mathbb{R} \to \mathbb{R}$,

\[
(f_1, \nabla f_1, \dots, f_n, \nabla f_n); h = ((f_1, \dots, f_n); g,\ \nabla((f_1, \dots, f_n); g))\text{.} \tag{2}
\]

Automatically computing the program representing $h$, given a program representing $g$, is the goal of automatic differentiation. An intuition for $h$ is often given in terms of dual numbers. The transformed function operates on pairs of numbers, $(x, x')$, and it is common to think of such a pair as $x + x'\epsilon$ for an 'infinitesimal' $\epsilon$. But while this is a helpful intuition, the formalization of infinitesimals can be intricate, and the development in this paper is focussed on the elementary formulation in (2).
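The dual number intuition can be sketched with a small Python class. This is an illustration under stated assumptions, not the macro studied in this paper: the class `Dual`, the lifted operation `dual_sin`, and the example functions `g`, `f1` and `f2` are all hypothetical choices made here to check the property (2) at one point.

```python
import math

class Dual:
    """A pair (x, x'), read informally as x + x'*eps with eps*eps = 0."""
    def __init__(self, primal, tangent):
        self.primal = primal
        self.tangent = tangent

    def __add__(self, other):
        # Sum rule: (x + y)' = x' + y'
        return Dual(self.primal + other.primal, self.tangent + other.tangent)

    def __mul__(self, other):
        # Product rule: (x * y)' = x' * y + x * y'
        return Dual(self.primal * other.primal,
                    self.tangent * other.primal + self.primal * other.tangent)

def dual_sin(d):
    """sin lifted to pairs, using the chain rule for its derivative cos."""
    return Dual(math.sin(d.primal), math.cos(d.primal) * d.tangent)

def g(x1, x2):
    """An example g : R^2 -> R."""
    return math.sin(x1) * x2 + x1

def h(d1, d2):
    """The same expression as g, with every operation replaced by its lifted version."""
    return dual_sin(d1) * d2 + d1

# Two example functions R -> R together with their derivatives.
f1, grad_f1 = (lambda t: t * t), (lambda t: 2.0 * t)
f2, grad_f2 = (lambda t: 3.0 * t), (lambda t: 3.0)

t = 0.5
result = h(Dual(f1(t), grad_f1(t)), Dual(f2(t), grad_f2(t)))
# Derivative of t |-> sin(t^2) * 3t + t^2, worked out by hand.
expected_tangent = math.cos(t * t) * 2.0 * t * 3.0 * t + math.sin(t * t) * 3.0 + 2.0 * t
```

The first component of `result` is the value of $(f_1, f_2); g$ at `t` and the second component is its derivative there, which is what (2) asks of $h$.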

A function $h$ satisfying (2) encodes all the partial derivatives of $g$. For example, if $g \colon \mathbb{R}^2 \to \mathbb{R}$, then with $f_1(x) \stackrel{\mathrm{def}}{=} x$ and $f_2(x) \stackrel{\mathrm{def}}{=} x_2$, by applying (2) to $x_1$ we obtain $h(x_1, 1, x_2, 0) = (g(x_1, x_2), \frac{\partial g(x, x_2)}{\partial x}(x_1))$ and similarly $h(x_1, 0, x_2, 1) = (g(x_1, x_2), \frac{\partial g(x_1, x)}{\partial x}(x_2))$. And conversely, if $g$ is differentiable in each argument, then a unique $h$ satisfying (2) can be found by taking linear combinations of partial derivatives, for example:

\[
h(x_{1},x_{1}^{\prime},x_{2},x_{2}^{\prime})=\Bigl(g(x_{1},x_{2}),\; x_{1}^{\prime}\cdot\frac{\partial g(x,x_{2})}{\partial x}(x_{1})+x_{2}^{\prime}\cdot\frac{\partial g(x_{1},x)}{\partial x}(x_{2})\Bigr)\text{.}
\]

(Here, recall that the partial derivative $\frac{\partial g(x,x_{2})}{\partial x}(x_{1})$ is a particular notation for the gradient $\nabla(g(-,x_{2}))(x_{1})$, i.e. with $x_{2}$ fixed.)

In summary, the idea of differentiation with dual numbers is to transform a differentiable function $g:\mathbb{R}^{n}\to\mathbb{R}$ to a function $h:\mathbb{R}^{2n}\to\mathbb{R}^{2}$ which captures $g$ and all its partial derivatives. We packaged this up in (2) as an invariant which is useful for building derivatives of compound functions $\mathbb{R}\to\mathbb{R}$ in a compositional way. The idea of (first order) forward mode automatic differentiation is to perform this transformation at the source code level.
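The transformation summarized above can be illustrated with a minimal sketch at the level of values rather than source code. The names `Dual`, `dsin`, `h` and `partials`, as well as the particular choice $g(x_1,x_2)=x_1\cdot x_2+\sin(x_1)$, are assumptions made for this illustration only; this is not the macro studied in this paper, which operates on the syntax of a typed language.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Dual:
    """A pair (value, tangent), i.e. a point of R^2 read as a dual number."""
    val: float
    tan: float

    def __add__(self, other):
        # Sum rule: tangents add.
        return Dual(self.val + other.val, self.tan + other.tan)

    def __mul__(self, other):
        # Product rule on the tangent component.
        return Dual(self.val * other.val,
                    self.val * other.tan + self.tan * other.val)


def dsin(x):
    # Chain rule for a basic operation whose derivative (cos) is assumed known.
    return Dual(math.sin(x.val), math.cos(x.val) * x.tan)


def h(x1, x2):
    """Dual-number counterpart of the illustrative g(x1, x2) = x1 * x2 + sin(x1)."""
    return x1 * x2 + dsin(x1)


def partials(x1, x2):
    # Seeding the tangents with (1, 0) and (0, 1) reads off the two partial derivatives.
    d1 = h(Dual(x1, 1.0), Dual(x2, 0.0)).tan
    d2 = h(Dual(x1, 0.0), Dual(x2, 1.0)).tan
    return d1, d2
```

Calling `partials(2.0, 3.0)` seeds the tangent slots exactly as in $h(x_1,1,x_2,0)$ and $h(x_1,0,x_2,1)$; a general tangent $(x_1',x_2')$ yields the corresponding linear combination of the two partial derivatives.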

We say that a macro for AD is correct if, given a semantic model $\llbracket-\rrbracket$, the program $P$ representing $g=\llbracket P\rrbracket$ is transformed by the macro to a program $P^{\prime}$ representing $h=\llbracket P^{\prime}\rrbracket$. This means in particular that $P^{\prime}$ computes correct partial derivatives of (the function represented by) $P$.

Smooth functions.

In what follows we will often speak of smooth functions $\mathbb{R}^{k}\to\mathbb{R}$, which are functions that are continuous and differentiable, such that their derivatives are also continuous and differentiable, and so on.

2.2. Higher order differentiation: the Faà di Bruno formula and Taylor approximations.

We now generalize the above in two directions:

  • We look for the best local approximations to $f$ with polynomials of some order $R$, generalizing the above use of linear functions ($R=1$).

  • We can work directly with multivariate functions $\mathbb{R}^{k}\to\mathbb{R}$ instead of functions of one variable $\mathbb{R}\to\mathbb{R}$ ($k=1$).

To make this precise, we recall that, given a smooth function $f:\mathbb{R}^{k}\to\mathbb{R}$ and a natural number $R\geq 0$, the $R$-th order Taylor approximation of $f$ at $a\in\mathbb{R}^{k}$ is defined in terms of the partial derivatives of $f$:

\begin{align*}
\mathbb{R}^{k} &\to \mathbb{R}\\
x &\mapsto \sum_{\{(\alpha_{1},\ldots,\alpha_{k})\in\mathbb{N}^{k}\,\mid\,\alpha_{1}+\ldots+\alpha_{k}\leq R\}}
\frac{1}{\alpha_{1}!\cdot\ldots\cdot\alpha_{k}!}\,
\frac{\partial^{\alpha_{1}+\ldots+\alpha_{k}}f(x)}{\partial x_{1}^{\alpha_{1}}\cdots\partial x_{k}^{\alpha_{k}}}(a)\cdot
(x_{1}-a_{1})^{\alpha_{1}}\cdot\ldots\cdot(x_{k}-a_{k})^{\alpha_{k}}.
\end{align*}

This is an $R$-th order polynomial. Similarly to the case of first order derivatives, we can recover the partial derivatives of $f$ up to the $R$-th order from its Taylor approximation by evaluating the series at basis vectors. See Section 2.3 below for an example.

Recall that the ordering of partial derivatives does not matter for smooth functions (Schwarz/Clairaut’s theorem). So there will be $\binom{R+k-1}{k-1}$ $R$-th order partial derivatives, and altogether there are $\binom{R+k}{k}$ summands in the $R$-th order Taylor approximation. (This can be seen by a ‘stars-and-bars’ argument.)

Since there are $\binom{R+k}{k}$ partial derivatives of $f_{i}$ of order $\leq R$, we can store them in the Euclidean space $\mathbb{R}^{\binom{R+k}{k}}$, which can also be regarded as the space of $k$-variate polynomials of degree $\leq R$.

We use a convention of coordinates $\left(y_{\alpha_{1}\ldots\alpha_{k}}\in\mathbb{R}\right)_{(\alpha_{1},\ldots,\alpha_{k})\in\{(\alpha_{1},\ldots,\alpha_{k})\in\mathbb{N}^{k}\,\mid\,0\leq\alpha_{1}+\ldots+\alpha_{k}\leq R\}}$ where $y_{\alpha_{1}\ldots\alpha_{k}}$ is intended to represent a partial derivative $\frac{\partial^{\alpha_{1}+\ldots+\alpha_{k}}f}{\partial x_{1}^{\alpha_{1}}\cdots\partial x_{k}^{\alpha_{k}}}(a)$ for some function $f:\mathbb{R}^{k}\to\mathbb{R}$. We will choose these coordinates in lexicographic order of the multi-indices $(\alpha_{1},\ldots,\alpha_{k})$, that is, the indexes in the Euclidean space $\mathbb{R}^{\binom{R+k}{k}}$ will typically range from $(0,\ldots,0)$ to $(R,0,\ldots,0)$.
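The counting argument and the lexicographic coordinate convention can be sanity-checked by brute-force enumeration. The following is a small sketch under our own naming (`multi_indices` and `count_check` are hypothetical helpers, not part of the development in this paper), assuming $k\geq 1$.

```python
from itertools import product
from math import comb


def multi_indices(k, R):
    """All (alpha_1, ..., alpha_k) in N^k with alpha_1 + ... + alpha_k <= R.

    itertools.product enumerates tuples in lexicographic order, so the
    result runs from (0, ..., 0) to (R, 0, ..., 0).
    """
    return [alpha for alpha in product(range(R + 1), repeat=k)
            if sum(alpha) <= R]


def count_check(k, R):
    """Compare the enumeration with the two binomial coefficients (k >= 1)."""
    idx = multi_indices(k, R)
    exactly_R = [alpha for alpha in idx if sum(alpha) == R]
    return (len(exactly_R) == comb(R + k - 1, k - 1)
            and len(idx) == comb(R + k, k))
```

For instance, `multi_indices(2, 2)` lists the six coordinates of the full $(2,2)$ representation, of which three have total order exactly $2$.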

The $(k,R)$-Taylor representation of a function $g:\mathbb{R}^{n}\to\mathbb{R}$ is a function $h:\left(\mathbb{R}^{\binom{R+k}{k}}\right)^{n}\to\mathbb{R}^{\binom{R+k}{k}}$ that transforms the partial derivatives of $f:\mathbb{R}^{k}\to\mathbb{R}^{n}$ of order $\leq R$ under postcomposition with $g$:

\[
\left(\left(\frac{\partial^{\alpha_{1}+\ldots+\alpha_{k}}f_{j}(x)}{\partial x_{1}^{\alpha_{1}}\cdots\partial x_{k}^{\alpha_{k}}}\right)_{(\alpha_{1},\ldots,\alpha_{k})=(0,\ldots,0)}^{(R,0,\ldots,0)}\right)_{j=1}^{n};h
=\left(\frac{\partial^{\alpha_{1}+\ldots+\alpha_{k}}((f_{1},\ldots,f_{n});g)(x)}{\partial x_{1}^{\alpha_{1}}\cdots\partial x_{k}^{\alpha_{k}}}\right)_{(\alpha_{1},\ldots,\alpha_{k})=(0,\ldots,0)}^{(R,0,\ldots,0)}\text{.}
\tag{3}
\]

Thus the Taylor representation generalizes the dual numbers representation ($R=k=1$).

To explicitly calculate the Taylor representation for a smooth function, we recall a generalization of the chain rule to higher derivatives. The chain rule tells us how the coefficients of linear approximations transform under composition of the functions. The Faà di Bruno formula [Sav06, EM03, CS96] tells us how coefficients of Taylor approximations, that is, higher derivatives, transform under composition. We recall the multivariate form from [Sav06, Theorem 2.1]. Given functions $f=(f_{1},\ldots,f_{l}):\mathbb{R}^{k}\to\mathbb{R}^{l}$ and $g:\mathbb{R}^{l}\to\mathbb{R}$, for $\alpha_{1}+\ldots+\alpha_{k}>0$,

\begin{align*}
\frac{\partial^{\alpha_{1}+\ldots+\alpha_{k}}(f;g)(x)}{\partial x_{1}^{\alpha_{1}}\cdots\partial x_{k}^{\alpha_{k}}}(a)
={}&\alpha_{1}!\cdot\ldots\cdot\alpha_{k}!\cdot
\sum_{\{(\beta_{1},\ldots,\beta_{l})\in\mathbb{N}^{l}\,\mid\,1\leq\beta_{1}+\ldots+\beta_{l}\leq\alpha_{1}+\ldots+\alpha_{k}\}}
\frac{\partial^{\beta_{1}+\ldots+\beta_{l}}g(y)}{\partial y_{1}^{\beta_{1}}\cdots\partial y_{l}^{\beta_{l}}}(f(a))\cdot{}\\
&\sum_{\{((e^{1}_{1},\ldots,e^{1}_{l}),\ldots,(e^{q}_{1},\ldots,e^{q}_{l}))\in(\mathbb{N}^{l})^{q}\,\mid\,e_{j}^{1}+\ldots+e_{j}^{q}=\beta_{j},\;(e^{1}_{1}+\ldots+e^{1}_{l})\cdot\alpha^{1}_{i}+\ldots+(e^{q}_{1}+\ldots+e^{q}_{l})\cdot\alpha^{q}_{i}=\alpha_{i}\}}\\
&\prod_{r=1}^{q}\prod_{j=1}^{l}\frac{1}{e^{r}_{j}!}
\left(\frac{1}{\alpha^{r}_{1}!\cdot\ldots\cdot\alpha^{r}_{k}!}\,
\frac{\partial^{\alpha^{r}_{1}+\cdots+\alpha^{r}_{k}}f_{j}(x)}{\partial x_{1}^{\alpha^{r}_{1}}\cdots\partial x_{k}^{\alpha^{r}_{k}}}(a)\right)^{e^{r}_{j}},
\end{align*}

where $(\alpha^{1}_{1},\ldots,\alpha^{1}_{k}),\ldots,(\alpha^{q}_{1},\ldots,\alpha^{q}_{k})\in\mathbb{N}^{k}$ are an enumeration of all the vectors $(\alpha^{r}_{1},\ldots,\alpha^{r}_{k})$ of $k$ natural numbers such that $\alpha^{r}_{j}\leq\alpha_{j}$ and $\alpha^{r}_{1}+\ldots+\alpha^{r}_{k}>0$, and we write $q$ for the number of such vectors. The details of this formula reflect the complicated combinatorics that arise from repeated applications of the chain and product rules for differentiation that one uses to prove it. Conceptually, however, it is rather straightforward: it tells us that the coefficients of the $R$-th order Taylor approximation of $f;g$ can be expressed exclusively in terms of those of $f$ and $g$.
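As a sanity check, it may help to unfold the formula in the smallest nontrivial case, $k=l=1$ and $\alpha_{1}=2$ (an instance we choose purely for illustration). The enumeration of admissible vectors is then $\alpha^{1}=1$, $\alpha^{2}=2$, so $q=2$, and $\beta$ ranges over $\{1,2\}$. For $\beta=1$ the constraints $e^{1}+e^{2}=1$ and $e^{1}\cdot 1+e^{2}\cdot 2=2$ force $(e^{1},e^{2})=(0,1)$; for $\beta=2$ they force $(e^{1},e^{2})=(2,0)$. Hence

\[
\frac{\partial^{2}(f;g)(x)}{\partial x^{2}}(a)
=2!\cdot\left(g'(f(a))\cdot\frac{1}{1!}\cdot\frac{f''(a)}{2!}
+g''(f(a))\cdot\frac{1}{2!}\left(\frac{f'(a)}{1!}\right)^{2}\right)
=g'(f(a))\,f''(a)+g''(f(a))\,f'(a)^{2},
\]

which is the familiar second derivative of a composite obtained from one application each of the chain rule and the product rule.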

Thus the Faà di Bruno formula uniquely determines the Taylor representation $h:\left(\mathbb{R}^{\binom{R+k}{k}}\right)^{n}\to\mathbb{R}^{\binom{R+k}{k}}$ in terms of the derivatives of $g:\mathbb{R}^{n}\to\mathbb{R}$ of order $\leq R$, and we can also recover all such derivatives from $h$.

2.3. Example: a two-dimensional second order Taylor series

As an example, we can specialize the Faà di Bruno formula above to the second order Taylor series of a function $f:\mathbb{R}^{2}\to\mathbb{R}^{l}$ and its behaviour under postcomposition with a smooth function $g:\mathbb{R}^{l}\to\mathbb{R}$:

\[
\frac{\partial^{2}(f;g)(x)}{\partial x_{i}\partial x_{i^{\prime}}}(a)
=\sum_{j=1}^{l}\frac{\partial g(y)}{\partial y_{j}}(f(a))\frac{\partial^{2}f_{j}(x)}{\partial x_{i}\partial x_{i^{\prime}}}(a)
+\sum_{j,j^{\prime}=1}^{l}\frac{\partial^{2}g(y)}{\partial y_{j}\partial y_{j^{\prime}}}(f(a))\frac{\partial f_{j^{\prime}}(x)}{\partial x_{i}}(a)\frac{\partial f_{j}(x)}{\partial x_{i^{\prime}}}(a),
\]

where $i,i^{\prime}\in\{1,2\}$ might either coincide or be distinct.
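The second order formula above can be checked on a concrete instance. The sketch below is only an illustration under stated assumptions: the helper names are ours, and the choice $f(x_1,x_2)=(x_1x_2,\;x_1+x_2^2)$, $g(y_1,y_2)=y_1y_2$ is hypothetical, with all derivatives of $f$ and $g$ entered by hand. Indices are zero-based in the code, so `i = 0` corresponds to $x_1$.

```python
def second_partial_of_composite(dg, d2g, df, d2f, i, ip):
    """Right-hand side of the second order chain rule.

    dg[j]          : dg/dy_j evaluated at f(a)
    d2g[j][jp]     : d^2 g / dy_j dy_jp evaluated at f(a)
    df[j][i]       : df_j/dx_i evaluated at a
    d2f[j][i][ip]  : d^2 f_j / dx_i dx_ip evaluated at a
    """
    l = len(dg)
    first = sum(dg[j] * d2f[j][i][ip] for j in range(l))
    second = sum(d2g[j][jp] * df[jp][i] * df[j][ip]
                 for j in range(l) for jp in range(l))
    return first + second


def example_data(a1, a2):
    """Hand-computed derivatives for the illustrative choice
    f(x1, x2) = (x1 * x2, x1 + x2**2) and g(y1, y2) = y1 * y2."""
    f = (a1 * a2, a1 + a2 ** 2)
    df = [[a2, a1], [1, 2 * a2]]
    d2f = [[[0, 1], [1, 0]], [[0, 0], [0, 2]]]
    dg = [f[1], f[0]]          # dg/dy1 = y2, dg/dy2 = y1
    d2g = [[0, 1], [1, 0]]
    return dg, d2g, df, d2f


def hessian_of_composite(a1, a2):
    """All four second order partial derivatives of f;g at a = (a1, a2)."""
    dg, d2g, df, d2f = example_data(a1, a2)
    return [[second_partial_of_composite(dg, d2g, df, d2f, i, ip)
             for ip in range(2)] for i in range(2)]
```

Here $(f;g)(x_1,x_2)=x_1^2x_2+x_1x_2^3$, so the result can be compared against the directly computed second derivatives $2x_2$, $2x_1+3x_2^2$ and $6x_1x_2$.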

Rather than working with the full $(2,2)$-Taylor representation of $g$, we ignore the non-mixed second order derivatives $y_{02}^{j}=\frac{\partial^{2}f_{j}(x)}{\partial x_{2}^{2}}$ and $y_{20}^{j}=\frac{\partial^{2}f_{j}(x)}{\partial x_{1}^{2}}$ for the moment, and we represent the derivatives of order $\leq 2$ of $f_{j}:\mathbb{R}^{2}\to\mathbb{R}$ (at some point $a$) as the numbers

\[
(y_{00}^{j},y_{01}^{j},y_{10}^{j},y_{11}^{j})=\left(f_{j}(a),\frac{\partial f_{j}(x)}{\partial x_{2}}(a),\frac{\partial f_{j}(x)}{\partial x_{1}}(a),\frac{\partial^{2}f_{j}(x)}{\partial x_{1}\partial x_{2}}(a)\right)\in\mathbb{R}^{4}
\]

and we can choose a similar representation for the derivatives of $(f;g)$. Then, we observe that the Faà di Bruno formula induces the function $h:(\mathbb{R}^{4})^{l}\to\mathbb{R}^{4}$

\begin{align*}
&h((y_{00}^{1},y_{01}^{1},y_{10}^{1},y_{11}^{1}),\ldots,(y_{00}^{l},y_{01}^{l},y_{10}^{l},y_{11}^{l}))=\\
&\left(\begin{array}{l}
g(y_{00}^{1},\ldots,y_{00}^{l})\\
\sum_{j=1}^{l}\frac{\partial g(y^{1},\ldots,y^{l})}{\partial y_{j}}(y_{00}^{1},\ldots,y_{00}^{l})\cdot y_{01}^{j}\\
\sum_{j=1}^{l}\frac{\partial g(y^{1},\ldots,y^{l})}{\partial y_{j}}(y_{00}^{1},\ldots,y_{00}^{l})\cdot y_{10}^{j}\\
\sum_{j=1}^{l}\frac{\partial g(y^{1},\ldots,y^{l})}{\partial y_{j}}(y_{00}^{1},\ldots,y_{00}^{l})\cdot y_{11}^{j}+\sum_{j,j^{\prime}=1}^{l}\frac{\partial^{2}g(y^{1},\ldots,y^{l})}{\partial y_{j}\partial y_{j^{\prime}}}(y_{00}^{1},\ldots,y_{00}^{l})\cdot y_{10}^{j}\cdot y_{01}^{j^{\prime}}
\end{array}\right).
\end{align*}

In particular, we can note that

\begin{align*}
h((y_{00}^{1},y_{01}^{1},y_{10}^{1},0),\ldots,(y_{00}^{l},y_{01}^{l},y_{10}^{l},0))=\left(\begin{array}{l}
g(y_{00}^{1},\ldots,y_{00}^{l})\\
\sum_{j=1}^{l}\frac{\partial g(y^{1},\ldots,y^{l})}{\partial y_{j}}(y_{00}^{1},\ldots,y_{00}^{l})\cdot y_{01}^{j}\\
\sum_{j=1}^{l}\frac{\partial g(y^{1},\ldots,y^{l})}{\partial y_{j}}(y_{00}^{1},\ldots,y_{00}^{l})\cdot y_{10}^{j}\\
\sum_{j,j^{\prime}=1}^{l}\frac{\partial^{2}g(y^{1},\ldots,y^{l})}{\partial y_{j}\partial y_{j^{\prime}}}(y_{00}^{1},\ldots,y_{00}^{l})\cdot y_{10}^{j}\cdot y_{01}^{j^{\prime}}
\end{array}\right).
\end{align*}

We see that we can use this method to calculate any directional first and second order derivative of $g$ in one pass. For example, if $l=3$, so $g:\mathbb{R}^{3}\to\mathbb{R}$, then the last component of $h((x,x^{\prime},x^{\prime\prime},0),(y,y^{\prime},y^{\prime\prime},0),(z,z^{\prime},z^{\prime\prime},0))$ is the result of taking the first derivative in direction $(x^{\prime},y^{\prime},z^{\prime})$ and the second derivative in direction $(x^{\prime\prime},y^{\prime\prime},z^{\prime\prime})$, and evaluating at $(x,y,z)$.

In the proper Taylor representation we explicitly include the non-mixed second order derivatives as inputs and outputs, leading to a function $h^{\prime}:(\mathbb{R}^{6})^{l}\to\mathbb{R}^{6}$. Above we have followed a common trick to avoid some unnecessary storage and computation, since these extra inputs and outputs are not required for computing the second order derivatives of $g$. For instance, if $l=2$ then the last component of $h((x,1,1,0),(y,0,0,0))$ computes $\frac{\partial^{2}g(x,y)}{\partial x^{2}}(x,y)$.
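As a rough illustration of the four-component map $h$ described above, the following Python sketch evaluates the rule for a concrete $g$ whose gradient and Hessian are written out by hand. The names `h22`, `g`, `grad_g` and `hess_g`, and the choice $g(x,y)=x^{2}y$, are assumptions made for this example only; a real AD system would derive these quantities from the program rather than take them as arguments.

```python
def h22(g, grad_g, hess_g, inputs):
    """Sketch of the four-component (2,2) rule for composing with g.

    inputs is a list of l tuples (y00, y01, y10, y11). The arguments g,
    grad_g and hess_g are hand-written stand-ins for g, its gradient and
    its Hessian (assumptions of this example).
    """
    point = [y[0] for y in inputs]
    l = len(inputs)
    grad = grad_g(*point)
    hess = hess_g(*point)
    value = g(*point)
    d01 = sum(grad[j] * inputs[j][1] for j in range(l))
    d10 = sum(grad[j] * inputs[j][2] for j in range(l))
    d11 = sum(grad[j] * inputs[j][3] for j in range(l)) + sum(
        hess[j][k] * inputs[j][2] * inputs[k][1]
        for j in range(l)
        for k in range(l)
    )
    return (value, d01, d10, d11)


# Hypothetical example function g(x, y) = x^2 * y.
def g(x, y):
    return x * x * y


def grad_g(x, y):
    return [2 * x * y, x * x]


def hess_g(x, y):
    return [[2 * y, 2 * x], [2 * x, 0]]


x, y = 3.0, 5.0
# Seeds (x, 1, 1, 0) and (y, 0, 0, 0): last component is d^2 g / dx^2.
print(h22(g, grad_g, hess_g, [(x, 1, 1, 0), (y, 0, 0, 0)]))
# x varies with x1 (y10 = 1), y varies with x2 (y01 = 1):
# last component is the mixed partial d^2 g / dx dy.
print(h22(g, grad_g, hess_g, [(x, 0, 1, 0), (y, 1, 0, 0)]))
```

In this sketch a single call yields the mixed partial, which is the point of keeping the two first-order slots separate.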

2.4. Example: a one-dimensional second order Taylor series

As opposed to $(2,2)$-AD, $(1,2)$-AD computes the first and second order derivatives in the same direction. For example, if $g:\mathbb{R}^{2}\to\mathbb{R}$ is a smooth function, then $h:(\mathbb{R}^{3})^{2}\to\mathbb{R}^{3}$. An intuition for $h$ can be given in terms of triple numbers. The transformed function operates on triples of numbers, $(x,x^{\prime},x^{\prime\prime})$, and it is common to think of such a triple as $x+x^{\prime}\epsilon+x^{\prime\prime}\epsilon^{2}$ for an 'infinitesimal' $\epsilon$ which has the property that $\epsilon^{3}=0$. For instance we have

\begin{align*}
h((x_{1},1,0),(x_{2},0,0))&=\left(g(x_{1},x_{2}),\frac{\partial g(x,x_{2})}{\partial x}(x_{1}),\frac{\partial^{2}g(x,x_{2})}{\partial x^{2}}(x_{1})\right)\\
h((x_{1},0,0),(x_{2},1,0))&=\left(g(x_{1},x_{2}),\frac{\partial g(x_{1},x)}{\partial x}(x_{2}),\frac{\partial^{2}g(x_{1},x)}{\partial x^{2}}(x_{2})\right)\\
h((x_{1},1,0),(x_{2},1,0))&=\Bigg(g(x_{1},x_{2}),\frac{\partial g(x,x_{2})}{\partial x}(x_{1})+\frac{\partial g(x_{1},x)}{\partial x}(x_{2}),\\
&\qquad\frac{\partial^{2}g(x,x_{2})}{\partial x^{2}}(x_{1})+\frac{\partial^{2}g(x_{1},x)}{\partial x^{2}}(x_{2})+2\frac{\partial^{2}g(x,y)}{\partial x\partial y}(x_{1},x_{2})\Bigg)
\end{align*}

We see that we directly get the non-mixed second-order partial derivatives but not the mixed ones. We can recover $\frac{\partial^{2}g(x,y)}{\partial x\partial y}(x_{1},x_{2})$ as the third component of $\frac{1}{2}(h((x_{1},1,0),(x_{2},1,0))-h((x_{1},1,0),(x_{2},0,0))-h((x_{1},0,0),(x_{2},1,0)))$.
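The three evaluations above can be reproduced with a small operator-overloading sketch. The class `Triple`, the test function `g` and the sample point are hypothetical. We assume here that a triple stores (value, first derivative, second derivative), in line with the formulas above, so the product rule carries a factor 2 on the cross term.

```python
class Triple:
    """(value, first derivative, second derivative) along one direction."""

    def __init__(self, v, d1=0.0, d2=0.0):
        self.v, self.d1, self.d2 = v, d1, d2

    def __add__(self, other):
        return Triple(self.v + other.v, self.d1 + other.d1, self.d2 + other.d2)

    def __mul__(self, other):
        # Leibniz rule up to order 2: (ab)'' = a''b + 2a'b' + ab''.
        return Triple(
            self.v * other.v,
            self.d1 * other.v + self.v * other.d1,
            self.d2 * other.v + 2 * self.d1 * other.d1 + self.v * other.d2,
        )

    def as_tuple(self):
        return (self.v, self.d1, self.d2)


def g(x, y):
    # Hypothetical test function g(x, y) = x^2 * y + y^3.
    return x * x * y + y * y * y


def h(seed_x, seed_y, x1=3.0, x2=5.0):
    """Evaluate g on the triples (x1, seed_x, 0) and (x2, seed_y, 0)."""
    return g(Triple(x1, seed_x), Triple(x2, seed_y)).as_tuple()


both, only_x, only_y = h(1.0, 1.0), h(1.0, 0.0), h(0.0, 1.0)
# Polarization: subtract the non-mixed second derivatives and halve the rest.
mixed = 0.5 * (both[2] - only_x[2] - only_y[2])
print(only_x, only_y, both, mixed)
```

Under this convention, `only_x` and `only_y` carry the non-mixed second derivatives in their last components, and `mixed` is the mixed partial at the sample point, obtained from three passes.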

More generally, if $g:\mathbb{R}^{l}\to\mathbb{R}$, then $h:(\mathbb{R}^{3})^{l}\to\mathbb{R}^{3}$ satisfies:

\begin{align*}
h((x_{1},x^{\prime}_{1},0),\ldots,(x_{l},x^{\prime}_{l},0))=\left(\begin{array}{l}
g(x_{1},\ldots,x_{l})\\
\sum_{i=1}^{l}\frac{\partial g(x_{1},\ldots,x_{l})}{\partial x_{i}}(x_{1},\ldots,x_{l})\cdot x^{\prime}_{i}\\
\sum_{i,j=1}^{l}\frac{\partial^{2}g(x_{1},\ldots,x_{l})}{\partial x_{i}\partial x_{j}}(x_{1},\ldots,x_{l})\cdot x^{\prime}_{i}\cdot x^{\prime}_{j}
\end{array}\right).
\end{align*}

We can always recover the mixed second order partial derivatives from this, but doing so requires several computations involving $h$. This is thus different from the $(2,2)$ method, which was more direct.
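To make the cost of this recovery concrete, the sketch below rebuilds a full Hessian using only the last component of the $(1,2)$ rule, with one evaluation per coordinate direction and one per pair of directions. The helper names `h12_last` and `hessian_by_polarization` and the sample matrix `H` are assumptions for illustration; `h12_last` simply evaluates the quadratic form from the displayed formula instead of running an AD pass.

```python
def h12_last(hess, direction):
    """Last component of the (1,2) rule when every x''_i is 0:
    the sum over i, j of hess[i][j] * direction[i] * direction[j]."""
    l = len(direction)
    return sum(
        hess[i][j] * direction[i] * direction[j]
        for i in range(l)
        for j in range(l)
    )


def hessian_by_polarization(second_directional, l):
    """Rebuild an l x l symmetric Hessian from directional second derivatives.

    second_directional(v) stands in for the last component of h seeded with
    tangent v. Returns the matrix and the number of calls made.
    """
    def unit(i):
        return [1.0 if k == i else 0.0 for k in range(l)]

    calls = 0
    result = [[0.0] * l for _ in range(l)]
    for i in range(l):
        result[i][i] = second_directional(unit(i))
        calls += 1
    for i in range(l):
        for j in range(i + 1, l):
            v = [a + b for a, b in zip(unit(i), unit(j))]
            mixed = 0.5 * (second_directional(v) - result[i][i] - result[j][j])
            result[i][j] = result[j][i] = mixed
            calls += 1
    return result, calls


# Hypothetical symmetric Hessian of some g : R^3 -> R at a fixed point.
H = [[2.0, 1.0, 0.0], [1.0, 4.0, -3.0], [0.0, -3.0, 6.0]]
recovered, calls = hessian_by_polarization(lambda v: h12_last(H, v), 3)
print(recovered == H, calls)
```

This particular scheme uses $l(l+1)/2$ evaluations for $l$ inputs, whereas the $(2,2)$ rule yields each mixed partial from a single evaluation.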

2.5. Remark

In the rest of this article, we study forward-mode $(k,R)$-automatic differentiation for a language with higher-order functions. The reader may like to fix $k=R=1$ for a standard automatic differentiation with first-order derivatives, based on dual numbers. This is the approach taken in the conference version of this paper [HSV20b]. But the generalization to higher-order derivatives with arbitrary $k$ and $R$ flows straightforwardly through the whole narrative.

3. A Higher Order Forward-Mode AD Translation

3.1. A simple language of smooth functions.

We consider a standard higher order typed language with a first order type $\mathbf{real}$ of real numbers. The types $(\tau,\sigma)$ and terms $(t,s)$ are as follows.

\[
\begin{array}[t]{l@{\quad}l@{\;}l@{\qquad}l}
\tau,\sigma,\rho & ::= & & \text{types}\\
& \mathrel{\lvert} & \mathbf{real} & \text{real numbers}\\
& \mathrel{\lvert} & \boldsymbol{(}\tau_{1}\boldsymbol{\mathop{*}}\dots\boldsymbol{\mathop{*}}\tau_{n}\boldsymbol{)} & \text{finite product}\\
& \mathrel{\lvert} & \tau\to\sigma & \text{function}\\[6pt]
t,s,r & ::= & & \text{terms}\\
& & x & \text{variable}\\
& \mathrel{\lvert} & \mathsf{op}(t_{1},\ldots,t_{n}) & \text{operations (including constants)}\\
& \mathrel{\lvert} & \langle t_{1},\dots,t_{n}\rangle\ \mathrel{\lvert}\ \mathbf{case}\,t\,\mathbf{of}\,\langle x_{1},\dots,x_{n}\rangle\to s & \text{tuples/pattern matching}\\
& \mathrel{\lvert} & \lambda x.t\ \mathrel{\lvert}\ t\,s & \text{function abstraction/application}
\end{array}
\]
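One way to make the grammar concrete is as an abstract syntax tree. The Python dataclasses below are a hypothetical encoding with one class per production; the class and field names are ours and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Types: real | (tau_1 * ... * tau_n) | tau -> sigma
@dataclass(frozen=True)
class Real:
    pass


@dataclass(frozen=True)
class Prod:
    components: Tuple["Type", ...]


@dataclass(frozen=True)
class Fun:
    domain: "Type"
    codomain: "Type"


Type = Union[Real, Prod, Fun]


# Terms: variables, operations, tuples, pattern matching, lambda, application
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Op:
    name: str
    args: Tuple["Term", ...]


@dataclass(frozen=True)
class Tup:
    items: Tuple["Term", ...]


@dataclass(frozen=True)
class Case:
    scrutinee: "Term"
    binders: Tuple[str, ...]
    body: "Term"


@dataclass(frozen=True)
class Lam:
    param: str
    body: "Term"


@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"


Term = Union[Var, Op, Tup, Case, Lam, App]

# lambda x. case x of <a, b> -> a * b
mul_pair = Lam("x", Case(Var("x"), ("a", "b"), Op("*", (Var("a"), Var("b")))))
print(mul_pair)
```

Frozen dataclasses give structural equality for free, which is convenient when comparing types or terms in later passes.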

The typing rules are in Figure 3. We have included some abstract basic $n$-ary operations $\mathsf{op}\in\mathsf{Op}_{n}$ for every $n\in\mathbb{N}$. These are intended to include the usual (smooth) mathematical operations that are used in programs to which automatic differentiation is applied. For example,

  • for any real constant $c\in\mathbb{R}$, we typically include a constant $\underline{c}\in\mathsf{Op}_{0}$; we slightly abuse notation and will simply write $\underline{c}$ for $\underline{c}()$ in our examples;

  • we include some unary operations such as $\varsigma\in\mathsf{Op}_{1}$ which we intend to stand for the usual sigmoid function, $\varsigma(x)\stackrel{\mathrm{def}}{=}\frac{1}{1+e^{-x}}$;

  • we include some binary operations such as addition and multiplication $(+),(*)\in\mathsf{Op}_{2}$.

We add some simple syntactic sugar $t-u\stackrel{\mathrm{def}}{=}t+\underline{(-1)}*u$ and, for some natural number $n$,

\[
n\cdot t\stackrel{\mathrm{def}}{=}\overbrace{t+\dots+t}^{\text{$n$ times}}\qquad\text{and}\qquad t^{n}\stackrel{\mathrm{def}}{=}\overbrace{t*\dots*t}^{\text{$n$ times}}
\]

Similarly, we will frequently denote repeated sums and products using $\sum$- and $\prod$-signs, respectively: for example, we write $t_{1}+\dots+t_{n}$ as $\sum_{i\in\{1,\ldots,n\}}t_{i}$ and $t_{1}*\dots*t_{n}$ as $\prod_{i\in\{1,\ldots,n\}}t_{i}$. This is in addition to programming sugar such as $\mathbf{let}\,x=\,t\,\mathbf{in}\,s$ for $(\lambda x.s)\,t$ and $\lambda\langle x_{1},\ldots,x_{n}\rangle.t$ for $\lambda x.\mathbf{case}\,x\,\mathbf{of}\,\langle x_{1},\ldots,x_{n}\rangle\to t$.
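The sugar above can be read as a handful of term-building functions. The sketch below uses a hypothetical tuple encoding of terms (`("var", x)`, `("op", name, args)`, `("lam", x, body)`, `("app", f, a)`); none of these names come from the paper. The left-nested association of repeated sums and products and the restriction to $n\geq 1$ are also assumptions of this example.

```python
from functools import reduce


def var(x):
    return ("var", x)


def op(name, *args):
    return ("op", name, args)


def const(c):
    # Nullary operation standing for the constant c.
    return op(("const", c))


def add(t, u):
    return op("+", t, u)


def mul(t, u):
    return op("*", t, u)


def sub(t, u):
    # t - u  =def  t + (-1) * u
    return add(t, mul(const(-1), u))


def times(n, t):
    # n . t  =def  t + ... + t  (n copies, n >= 1, nested to the left)
    return reduce(add, [t] * n)


def power(t, n):
    # t^n  =def  t * ... * t  (n copies, n >= 1, nested to the left)
    return reduce(mul, [t] * n)


def let(x, t, s):
    # let x = t in s  =def  (lambda x. s) t
    return ("app", ("lam", x, s), t)


def evaluate(term, env):
    """Tiny evaluator for the fragment used here, only to exercise the sugar."""
    tag = term[0]
    if tag == "var":
        return env[term[1]]
    if tag == "app":  # only the beta-redexes produced by `let`
        _, (_, x, body), arg = term
        return evaluate(body, {**env, x: evaluate(arg, env)})
    _, name, args = term
    vals = [evaluate(a, env) for a in args]
    if name == "+":
        return vals[0] + vals[1]
    if name == "*":
        return vals[0] * vals[1]
    return name[1]  # ("const", c)


# let y = x - 2 in 3 . y + y^2
term = let("y", sub(var("x"), const(2)), add(times(3, var("y")), power(var("y"), 2)))
print(evaluate(term, {"x": 5}))
```

Because each helper only builds core terms, any later translation needs to handle just the constructs of the grammar and never the sugar itself.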

\[
\begin{array}{c}
\inferrule{\Gamma\vdash t_1:\mathbf{real}\;\;\dots\;\;\Gamma\vdash t_n:\mathbf{real}}{\Gamma\vdash\mathsf{op}(t_1,\ldots,t_n):\mathbf{real}}\;(\mathsf{op}\in\mathsf{Op}_n)
\\[12pt]
\inferrule{\Gamma\vdash t_1:\tau_1\;\;\dots\;\;\Gamma\vdash t_n:\tau_n}{\Gamma\vdash\langle t_1,\dots,t_n\rangle:\boldsymbol{(}\tau_1\boldsymbol{\mathop{*}}\dots\boldsymbol{\mathop{*}}\tau_n\boldsymbol{)}}
\qquad
\inferrule{\Gamma\vdash t:\boldsymbol{(}\sigma_1\boldsymbol{\mathop{*}}\dots\boldsymbol{\mathop{*}}\sigma_n\boldsymbol{)}\;\;\Gamma,x_1:\sigma_1,\dots,x_n:\sigma_n\vdash s:\tau}{\Gamma\vdash\mathbf{case}\,t\,\mathbf{of}\,\langle x_1,\dots,x_n\rangle\to s:\tau}
\\[12pt]
\inferrule{~}{\Gamma\vdash x:\tau}\;((x:\tau)\in\Gamma)
\qquad
\inferrule{\Gamma,x:\tau\vdash t:\sigma}{\Gamma\vdash\lambda x:\tau.t:\tau\to\sigma}
\qquad
\inferrule{\Gamma\vdash t:\sigma\to\tau\\ \Gamma\vdash s:\sigma}{\Gamma\vdash t\,s:\tau}
\end{array}
\]

Figure 3. Typing rules for the simple language.
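The typing rules of Figure 3 can be read as a recursive type checker. The following is a minimal sketch under stated assumptions: terms and types are encoded as nested Python tuples, constants are treated as nullary operations of type real, and the table `ARITY` is an illustrative signature. None of these names come from the paper.

```python
REAL = "real"
# Illustrative operation signature: every operation takes reals and returns a real.
ARITY = {"+": 2, "*": 2, "sigmoid": 1}

# Hypothetical encodings:
#   types: REAL | ("prod", (t1, ..., tn)) | ("fun", dom, cod)
#   terms: ("var", x) | ("const", c) | ("op", name, [args]) | ("tuple", [ts])
#          | ("case", t, [x1, ..., xn], s) | ("lam", x, tau, body) | ("app", t, s)

def typecheck(ctx, term):
    """Return the type of `term` in context `ctx`, or raise TypeError."""
    tag = term[0]
    if tag == "var":
        if term[1] not in ctx:
            raise TypeError(f"unbound variable {term[1]}")
        return ctx[term[1]]
    if tag == "const":
        return REAL
    if tag == "op":
        _, name, args = term
        if len(args) != ARITY[name]:
            raise TypeError(f"wrong number of arguments for {name}")
        if any(typecheck(ctx, a) != REAL for a in args):
            raise TypeError(f"{name} expects arguments of type real")
        return REAL
    if tag == "tuple":
        return ("prod", tuple(typecheck(ctx, t) for t in term[1]))
    if tag == "case":
        _, t, xs, s = term
        ty = typecheck(ctx, t)
        if not (isinstance(ty, tuple) and ty[0] == "prod" and len(ty[1]) == len(xs)):
            raise TypeError("case scrutinee must be a tuple of matching length")
        return typecheck({**ctx, **dict(zip(xs, ty[1]))}, s)
    if tag == "lam":
        _, x, tau, body = term
        return ("fun", tau, typecheck({**ctx, x: tau}, body))
    if tag == "app":
        _, f, a = term
        fty = typecheck(ctx, f)
        if not (isinstance(fty, tuple) and fty[0] == "fun"):
            raise TypeError("applying a non-function")
        if typecheck(ctx, a) != fty[1]:
            raise TypeError("argument type mismatch")
        return fty[2]
    raise TypeError(f"unknown term tag: {tag}")
```

Each branch corresponds to one rule of the figure, with the premises checked by the recursive calls and the context extended exactly where the rule extends $\Gamma$.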

3.2. Syntactic automatic differentiation: a functorial macro.

The aim of higher order forward mode AD is to find the $(k,R)$-Taylor representation of a function by syntactic manipulations, for some choice of $(k,R)$ that we fix. For our simple language, we implement this as the following inductively defined macro $\overrightarrow{\mathcal{D}}_{(k,R)}$ on both types and terms (see also [WWE+19, SFVPJ19]). For the sake of legibility, we simply write $\overrightarrow{\mathcal{D}}_{(k,R)}$ as $\overrightarrow{\mathcal{D}}$ here and leave the dimension $k$ and order $R$ of the Taylor representation implicit. The following definition is for general $k$ and $R$, but we treat specific cases afterwards in Example 3.2.

\begin{align*}
&\overrightarrow{\mathcal{D}}(\tau\to\sigma) \stackrel{\mathrm{def}}{=} \overrightarrow{\mathcal{D}}(\tau)\to\overrightarrow{\mathcal{D}}(\sigma)
\qquad
\overrightarrow{\mathcal{D}}(\tau_1\boldsymbol{\mathop{*}}\dots\boldsymbol{\mathop{*}}\tau_n) \stackrel{\mathrm{def}}{=} \overrightarrow{\mathcal{D}}(\tau_1)\boldsymbol{\mathop{*}}\dots\boldsymbol{\mathop{*}}\overrightarrow{\mathcal{D}}(\tau_n)\\
&\overrightarrow{\mathcal{D}}(\mathbf{real}) \stackrel{\mathrm{def}}{=} \mathbf{real}^{\binom{R+k}{k}}
\quad\text{(i.e., the type of tuples of reals of length $\textstyle\binom{R+k}{k}$)}\\
&\overrightarrow{\mathcal{D}}(x) \stackrel{\mathrm{def}}{=} x
\hspace{80pt}
\overrightarrow{\mathcal{D}}(\underline{c}) \stackrel{\mathrm{def}}{=} \langle\underline{c},0\rangle\\
&\overrightarrow{\mathcal{D}}(\lambda x.t) \stackrel{\mathrm{def}}{=} \lambda x.\overrightarrow{\mathcal{D}}(t)
\hspace{12pt}
\overrightarrow{\mathcal{D}}(t\,s) \stackrel{\mathrm{def}}{=} \overrightarrow{\mathcal{D}}(t)\,\overrightarrow{\mathcal{D}}(s)
\hspace{12pt}
\overrightarrow{\mathcal{D}}(\langle t_1,\dots,t_n\rangle) \stackrel{\mathrm{def}}{=} \langle\overrightarrow{\mathcal{D}}(t_1),\dots,\overrightarrow{\mathcal{D}}(t_n)\rangle\\
&\overrightarrow{\mathcal{D}}(\mathbf{case}\,t\,\mathbf{of}\,\langle x_1,\dots,x_n\rangle\to s) \stackrel{\mathrm{def}}{=} \mathbf{case}\,\overrightarrow{\mathcal{D}}(t)\,\mathbf{of}\,\langle x_1,\dots,x_n\rangle\to\overrightarrow{\mathcal{D}}(s)
\end{align*}
\[
\begin{array}{ll}
\overrightarrow{\mathcal{D}}(\mathsf{op}(t_1,\ldots,t_n)) \stackrel{\mathrm{def}}{=}
& \mathbf{case}\,\overrightarrow{\mathcal{D}}(t_1)\,\mathbf{of}\,\langle x_{0\dots0}^{1},\dots,x_{R,0\dots0}^{1}\rangle\to\\
& \vdots\\
& \mathbf{case}\,\overrightarrow{\mathcal{D}}(t_n)\,\mathbf{of}\,\langle x_{0\dots0}^{n},\dots,x_{R,0\dots0}^{n}\rangle\to\\
& \begin{array}{lll}
\langle & D^{0\dots0}\mathsf{op}(x_{0\dots0}^{1},\dots,x_{R,0\dots0}^{1},\dots,x_{0\dots0}^{n},\dots,x_{R,0\dots0}^{n}), & \\
& \cdots, & \\
& D^{R\dots0}\mathsf{op}(x_{0\dots0}^{1},\dots,x_{R,0\dots0}^{1},\dots,x_{0\dots0}^{n},\dots,x_{R,0\dots0}^{n}) & \rangle
\end{array}
\end{array}
\]
where
\[
D^{0\dots0}\mathsf{op}(x_{0\dots0}^{1},\dots,x_{R,0\dots0}^{1},\dots,x_{0\dots0}^{n},\dots,x_{R,0\dots0}^{n}) \stackrel{\mathrm{def}}{=} \mathsf{op}(x_{0\dots0}^{1},\dots,x_{0\dots0}^{n})
\]
\[
\begin{array}{@{}l}
D^{\alpha_1\dots\alpha_k}\mathsf{op}(x_{0\dots0}^{1},\dots,x_{R,0\dots0}^{1},\dots,x_{0\dots0}^{n},\dots,x_{R,0\dots0}^{n}) \stackrel{\mathrm{def}}{=} \qquad\text{\scriptsize(for $\alpha_1+\dots+\alpha_k>0$)}\\
\begin{array}{c}
\alpha_1!\cdot\ldots\cdot\alpha_k!\cdot\displaystyle\sum_{\left\{(\beta_1,\ldots,\beta_n)\in\mathbb{N}^{l}\,\mid\,1\leq\beta_1+\ldots+\beta_n\leq\alpha_1+\ldots+\alpha_k\right\}}\partial_{\beta_1\cdots\beta_n}\mathsf{op}(x_{0\dots0}^{1},\ldots,x_{0\dots0}^{n})\;*\\
\displaystyle\sum_{\left\{((e^1_1,\ldots,e^1_l),\ldots,(e^q_1,\ldots,e^q_l))\in(\mathbb{N}^{l})^{q}\,\mid\,e_j^1+\ldots+e_j^q=\beta_j,\ (e^1_1+\ldots+e^1_l)\cdot\alpha^1_i+\ldots+(e^q_1+\ldots+e^q_l)\cdot\alpha^q_i=\alpha_i\right\}}\\
\displaystyle\prod_{r=1}^{q}\prod_{j=1}^{l}\frac{1}{e^r_j!}\cdot\left(\frac{1}{\alpha^r_1!\cdot\ldots\cdot\alpha^r_k!}\cdot x_{\alpha_1\cdots\alpha_k}^{j}\right)^{e^r_j}\text{.}
\end{array}
\end{array}
\]

Here, $(\partial_{\beta_1\cdots\beta_n}\mathsf{op})(x_1,\ldots,x_n)$ are some chosen terms of type $\mathbf{real}$ in the language with free variables from $x_1,\ldots,x_n$. We think of these terms as implementing the partial derivative $\frac{\partial^{\beta_1+\dots+\beta_n}\llbracket\mathsf{op}\rrbracket(x_1,\dots,x_n)}{\partial x_1^{\beta_1}\cdots\partial x_n^{\beta_n}}$ of the smooth function $\llbracket\mathsf{op}\rrbracket:\mathbb{R}^n\to\mathbb{R}$ that $\mathsf{op}$ implements. For example, we could choose the following representations of derivatives of order $\leq 2$ of our example operations:

\[
\begin{array}{ll}
\partial_{01}(+)(x_1,x_2)=\underline{1} & \partial_{02}(+)(x_1,x_2)=\underline{0}\\
\partial_{10}(+)(x_1,x_2)=\underline{1} & \partial_{11}(+)(x_1,x_2)=\underline{0}\\
\partial_{20}(+)(x_1,x_2)=\underline{0} & \\[6pt]
\partial_{01}(*)(x_1,x_2)=x_1 & \partial_{02}(*)(x_1,x_2)=\underline{0}\\
\partial_{10}(*)(x_1,x_2)=x_2 & \partial_{11}(*)(x_1,x_2)=\underline{1}\\
\partial_{20}(*)(x_1,x_2)=\underline{0} & \\[6pt]
\partial_{1}(\varsigma)(x)=\mathbf{let}\,y=\varsigma(x)\,\mathbf{in}\,y*(\underline{1}-y) & \partial_{2}(\varsigma)(x)=\mathbf{let}\,y=\varsigma(x)\,\mathbf{in}\\
 & \phantom{\partial_{2}(\varsigma)(x)=\;}\mathbf{let}\,z=y*(\underline{1}-y)\,\mathbf{in}\,z*(\underline{1}-\underline{2}*y)
\end{array}
\]
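As a sanity check on the chosen derivative terms for the sigmoid $\varsigma$, one can transcribe them into Python and compare them against finite differences. This is only a numerical illustration: the function names and the step size `h` are assumptions, and the check is no substitute for the correctness proof.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def d1_sigmoid(x):
    # let y = sigmoid(x) in y * (1 - y)
    y = sigmoid(x)
    return y * (1.0 - y)

def d2_sigmoid(x):
    # let y = sigmoid(x) in let z = y * (1 - y) in z * (1 - 2 * y)
    y = sigmoid(x)
    z = y * (1.0 - y)
    return z * (1.0 - 2.0 * y)

def central_diff(f, x, h=1e-5):
    """Symmetric finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

if __name__ == "__main__":
    for x in (-1.0, 0.3, 2.0):
        print(x, d1_sigmoid(x) - central_diff(sigmoid, x),
              d2_sigmoid(x) - central_diff(d1_sigmoid, x))
```

The printed differences should be close to zero, which is consistent with the two terms implementing the first and second derivative of the sigmoid.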

Note that our rules, in particular, imply that $\overrightarrow{\mathcal{D}}(\underline{c})=\langle\underline{c},\underline{0},\ldots,\underline{0}\rangle$.
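The action of the macro on types and on constants can be sketched in a few lines of Python. The tuple encoding of types and the names `taylor_length`, `multi_indices`, `d_type` and `d_const` are hypothetical; the only facts taken from the text are that $\overrightarrow{\mathcal{D}}(\mathbf{real})$ has $\binom{R+k}{k}$ components, that the macro acts homomorphically on function and product types, and that a constant is translated to itself followed by zeros.

```python
from itertools import product
from math import comb

REAL = "real"

def taylor_length(k, R):
    """Number of components of D(real): binomial(R + k, k)."""
    return comb(R + k, k)

def multi_indices(k, R):
    """Multi-indices (a_1, ..., a_k) with a_1 + ... + a_k <= R, in lexicographic order."""
    return [a for a in product(range(R + 1), repeat=k) if sum(a) <= R]

def d_type(ty, k, R):
    """Translate a type; types are REAL | ("prod", (..)) | ("fun", dom, cod)."""
    if ty == REAL:
        return ("prod", (REAL,) * taylor_length(k, R))
    if ty[0] == "fun":
        return ("fun", d_type(ty[1], k, R), d_type(ty[2], k, R))
    if ty[0] == "prod":
        return ("prod", tuple(d_type(t, k, R) for t in ty[1]))
    raise ValueError(f"unknown type: {ty}")

def d_const(c, k, R):
    """Translate a constant c to the tuple (c, 0, ..., 0)."""
    return (c,) + (0.0,) * (taylor_length(k, R) - 1)
```

With these definitions, `taylor_length(1, 1)` is 2 and `taylor_length(2, 2)` is 6, and `multi_indices(2, 2)` enumerates the index pairs 00, 01, 02, 10, 11, 20, one per component of the translated real.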

{exa}

[$(1,1)$- and $(2,2)$-AD] Our choices of partial derivatives of the example operations are sufficient to implement $(k,R)$-Taylor forward AD with $R\leq 2$. To be explicit, the distinctive formulas for $(1,1)$- and $(2,2)$-AD methods (specializing our abstract definition of $\overrightarrow{\mathcal{D}}_{(k,R)}$ above) are

\begin{align*}
&\overrightarrow{\mathcal{D}}_{(1,1)}(\mathbf{real}) = \boldsymbol{(}\mathbf{real}\boldsymbol{\mathop{*}}\mathbf{real}\boldsymbol{)}\\
&\overrightarrow{\mathcal{D}}_{(1,1)}(\mathsf{op}(t_1,\ldots,t_n)) =\\
&\qquad\begin{array}{l}
\mathbf{case}\,\overrightarrow{\mathcal{D}}_{(1,1)}(t_1)\,\mathbf{of}\,\langle x^1_0,x^1_1\rangle\to\ldots\to\mathbf{case}\,\overrightarrow{\mathcal{D}}_{(1,1)}(t_n)\,\mathbf{of}\,\langle x_0^n,x_1^n\rangle\to\\
\langle\mathsf{op}(x_0^1,\ldots,x_0^n),\ \sum_{i=1}^{n}x^i_1*\partial_i\mathsf{op}(x_0^1,\ldots,x_0^n)\rangle
\end{array}
\end{align*}
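The $(1,1)$ case can be sketched in Python as a shallow embedding, assuming that object-language functions are represented by Python functions and a translated real $\langle x_0,x_1\rangle$ by a Python pair. The table `OPS` and the helpers `d_op` and `d_const` are hypothetical names; the partial derivatives are the ones chosen for the example operations.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Operation name -> (primal function, list of first-order partial derivatives).
OPS = {
    "+": (lambda a, b: a + b, [lambda a, b: 1.0, lambda a, b: 1.0]),
    "*": (lambda a, b: a * b, [lambda a, b: b, lambda a, b: a]),
    "sigmoid": (sigmoid, [lambda a: sigmoid(a) * (1.0 - sigmoid(a))]),
}

def d_op(name, *duals):
    """Translated operation: takes pairs (x0, x1) and returns
    (op(x0^1, ..., x0^n), sum_i x1^i * d_i op(x0^1, ..., x0^n))."""
    f, partials = OPS[name]
    primals = [x0 for x0, _ in duals]
    tangent = sum(x1 * d(*primals) for (_, x1), d in zip(duals, partials))
    return (f(*primals), tangent)

def d_const(c):
    """Translated constant: the pair (c, 0)."""
    return (c, 0.0)

# Lambda abstraction and application are translated homomorphically, so a
# higher order program keeps its shape; only the primitive operations change.
square = lambda x: d_op("*", x, x)
twice = lambda g: lambda x: g(g(x))

if __name__ == "__main__":
    # Seeding the tangent with 1.0 asks for the derivative with respect to x.
    print(twice(square)((2.0, 1.0)))
```

Here `twice(square)` represents $x\mapsto x^4$, so evaluating it at the pair `(2.0, 1.0)` yields the value 16 together with the derivative $4\cdot 2^3 = 32$.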
\begin{align*}
&\overrightarrow{\mathcal{D}}_{(2,2)}(\mathbf{real}) = \mathbf{real}^{6}\\
&\overrightarrow{\mathcal{D}}_{(2,2)}(\mathsf{op}(t_1,\ldots,t_n)) =\\
&\qquad\begin{array}{l}
\mathbf{case}\,\overrightarrow{\mathcal{D}}_{(2,2)}(t_1)\,\mathbf{of}\,\langle x^1_{00},x^1_{01},x^1_{02},x^1_{10},x^1_{11},x^1_{20}\rangle\to\\
\vdots\\
\mathbf{case}\,\overrightarrow{\mathcal{D}}_{(2,2)}(t_n)\,\mathbf{of}\,\langle x^n_{00},x^n_{01},x^n_{02},x^n_{10},x^n_{11},x^n_{20}\rangle\to\\
\qquad\begin{array}{l}
\langle\mathsf{op}(x^1_{00},\ldots,x^n_{00}),\\
\sum_{i=1}^{n}x^i_{01}*\partial_{\hat{i}}\mathsf{op}(x_{00}^1,\ldots,x_{00}^n),\\
\sum_{i=1}^{n}x^i_{02}*\partial_{\hat{i}}\mathsf{op}(x_{00}^1,\ldots,x_{00}^n)+\sum_{i,j=1}^{n}x^i_{01}*x^j_{01}*\partial_{\widehat{i,j}}\mathsf{op}(x_{00}^1,\ldots,x_{00}^n),\\
\sum_{i=1}^{n}x^i_{10}*\partial_{\hat{i}}\mathsf{op}(x_{00}^1,\ldots,x_{00}^n),\\
\sum_{i=1}^{n}x^i_{11}*\partial_{\hat{i}}\mathsf{op}(x_{00}^1,\ldots,x_{00}^n)+\sum_{i,j=1}^{n}x^i_{10}*x^j_{01}*\partial_{\widehat{i,j}}\mathsf{op}(x_{00}^1,\ldots,x_{00}^n),\\
\sum_{i=1}^{n}x^i_{20}*\partial_{\hat{i}}\mathsf{op}(x_{00}^1,\ldots,x_{00}^n)+\sum_{i,j=1}^{n}x^i_{10}*x^j_{10}*\partial_{\widehat{i,j}}\mathsf{op}(x_{00}^1,\ldots,x_{00}^n)\rangle
\end{array}
\end{array}
\end{align*}
start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT ∗ ∂ start_POSTSUBSCRIPT over^ start_ARG italic_i end_ARG end_POSTSUBSCRIPT sansserif_op ( italic_x start_POSTSUBSCRIPT 00 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , … , italic_x start_POSTSUBSCRIPT 00 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ) + ∑ start_POSTSUBSCRIPT italic_i , italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 10 end_POSTSUBSCRIPT ∗ italic_x start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 01 end_POSTSUBSCRIPT ∗ ∂ start_POSTSUBSCRIPT over^ start_ARG italic_i , italic_j end_ARG end_POSTSUBSCRIPT sansserif_op ( italic_x start_POSTSUBSCRIPT 00 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , … , italic_x start_POSTSUBSCRIPT 00 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ) , end_CELL end_ROW start_ROW start_CELL ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 20 end_POSTSUBSCRIPT ∗ ∂ start_POSTSUBSCRIPT over^ start_ARG italic_i end_ARG end_POSTSUBSCRIPT sansserif_op ( italic_x start_POSTSUBSCRIPT 00 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , … , italic_x start_POSTSUBSCRIPT 00 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ) + ∑ start_POSTSUBSCRIPT italic_i , italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 10 end_POSTSUBSCRIPT ∗ italic_x start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 10 end_POSTSUBSCRIPT ∗ ∂ start_POSTSUBSCRIPT over^ start_ARG italic_i , italic_j end_ARG 
end_POSTSUBSCRIPT sansserif_op ( italic_x start_POSTSUBSCRIPT 00 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , … , italic_x start_POSTSUBSCRIPT 00 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ) ⟩ end_CELL end_ROW end_ARRAY end_CELL end_ROW end_ARRAY

where we informally write $\hat{i}$ for the one-hot encoding of $i$ (the sequence of length $n$ consisting exclusively of zeros except in position $i$, where it has a $1$) and $\widehat{i,j}$ for the two-hot encoding of $i$ and $j$ (the sequence of length $n$ consisting exclusively of zeros except in positions $i$ and $j$, where it has a $1$ if $i \neq j$ and a $2$ if $i = j$).
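The two index encodings can be made concrete with a short sketch. The function names `one_hot` and `two_hot` are our own, chosen for illustration; positions are 1-based to match the text.

```python
def one_hot(i, n):
    """Length-n sequence of zeros with a 1 in position i (1-based, as in the text)."""
    return tuple(1 if k == i else 0 for k in range(1, n + 1))


def two_hot(i, j, n):
    """Entrywise sum of one_hot(i, n) and one_hot(j, n).

    Has a 1 in positions i and j when i != j, and a 2 in position i when i == j.
    """
    return tuple(a + b for a, b in zip(one_hot(i, n), one_hot(j, n)))
```

Read as multi-indices, `one_hot(i, n)` selects the first partial derivative in argument $i$, and `two_hot(i, j, n)` selects the second partial derivative in arguments $i$ and $j$ (the unmixed one when $i = j$).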

As noted in Section 2, it is often unnecessary to include all components of the $(2,2)$-algorithm, for example when computing a second order directional derivative. In that case, we may define a restricted $(2,2)$-AD algorithm that drops the non-mixed second order derivatives from the definitions above and defines $\overrightarrow{\mathcal{D}}_{(2,2)'}(\mathbf{real}) = \mathbf{real}^4$ and
\[
\begin{array}{ll}
\overrightarrow{\mathcal{D}}_{(2,2)'}(\mathsf{op}(t_1,\ldots,t_n)) = & \mathbf{case}\,\overrightarrow{\mathcal{D}}_{(2,2)'}(t_1)\,\mathbf{of}\,\langle x^1_{00}, x^1_{01}, x^1_{10}, x^1_{11}\rangle \to \\
& \vdots \\
& \mathbf{case}\,\overrightarrow{\mathcal{D}}_{(2,2)'}(t_n)\,\mathbf{of}\,\langle x^n_{00}, x^n_{01}, x^n_{10}, x^n_{11}\rangle \to \\
& \begin{array}{l}
\langle \mathsf{op}(x^1_{00},\ldots,x^n_{00}), \\
\sum_{i=1}^{n} x^i_{01} * \partial_{\hat{i}}\mathsf{op}(x_{00}^1,\ldots,x_{00}^n), \\
\sum_{i=1}^{n} x^i_{10} * \partial_{\hat{i}}\mathsf{op}(x_{00}^1,\ldots,x_{00}^n), \\
\sum_{i=1}^{n} x^i_{11} * \partial_{\hat{i}}\mathsf{op}(x_{00}^1,\ldots,x_{00}^n) \\
{} + \sum_{i,j=1}^{n} x^i_{10} * x^j_{01} * \partial_{\widehat{i,j}}\mathsf{op}(x_{00}^1,\ldots,x_{00}^n) \rangle.
\end{array}
\end{array}
\]
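To make the restricted rule concrete, here is a minimal numeric sketch for a single operation. It reads the mixed term as a sum over all pairs $(i, j)$ indexed by the two-hot encoding, exactly as in the $x_{11}$ component of the full $(2,2)$ rule. The helper `d22_restricted` and the callables `grad` and `hess` are hypothetical names introduced only for this illustration; they supply the first and second partial derivatives that the macro assumes are given for each basic operation.

```python
def d22_restricted(op, grad, hess, args):
    """Restricted (2,2)' rule for one n-ary operation.

    args: list of 4-tuples (x00, x01, x10, x11), one per argument of op.
    grad(xs): list of first partials of op at xs (entry i is the one-hot index i).
    hess(xs): n-by-n nested list of second partials of op at xs
              (entry [i][j] is the two-hot index i,j; the diagonal holds
              the unmixed second partials).
    All three callables are hypothetical helpers for this sketch.
    """
    xs = [a[0] for a in args]
    g, h = grad(xs), hess(xs)
    n = len(args)
    primal = op(xs)
    d01 = sum(args[i][1] * g[i] for i in range(n))
    d10 = sum(args[i][2] * g[i] for i in range(n))
    d11 = sum(args[i][3] * g[i] for i in range(n)) + sum(
        args[i][2] * args[j][1] * h[i][j] for i in range(n) for j in range(n)
    )
    return (primal, d01, d10, d11)


# Multiplication as the basic operation: op(z, y) = z * y.
mul = lambda xs: xs[0] * xs[1]
mul_grad = lambda xs: [xs[1], xs[0]]
mul_hess = lambda xs: [[0, 1], [1, 0]]

result = d22_restricted(mul, mul_grad, mul_hess, [(2, 3, 5, 7), (11, 13, 17, 19)])
```

For these inputs `result` is `(22, 59, 89, 231)`. The last component agrees with the product rule for a mixed second derivative, $7 \cdot 11 + 3 \cdot 17 + 5 \cdot 13 + 2 \cdot 19 = 231$.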

We extend $\overrightarrow{\mathcal{D}}$ to contexts: $\overrightarrow{\mathcal{D}}(\{x_1{:}\tau_1, \ldots, x_n{:}\tau_n\}) \stackrel{\mathrm{def}}{=} \{x_1{:}\overrightarrow{\mathcal{D}}(\tau_1), \ldots, x_n{:}\overrightarrow{\mathcal{D}}(\tau_n)\}$. This turns $\overrightarrow{\mathcal{D}}$ into a well-typed, functorial macro in the following sense.

Lemma 1 (Functorial macro).

If $\Gamma \vdash t : \tau$ then $\overrightarrow{\mathcal{D}}(\Gamma) \vdash \overrightarrow{\mathcal{D}}(t) : \overrightarrow{\mathcal{D}}(\tau)$.
If $\Gamma, x : \sigma \vdash t : \tau$ and $\Gamma \vdash s : \sigma$ then $\overrightarrow{\mathcal{D}}(\Gamma) \vdash \overrightarrow{\mathcal{D}}(t[^{s}\!/\!_{x}]) = \overrightarrow{\mathcal{D}}(t)[^{\overrightarrow{\mathcal{D}}(s)}\!/\!_{x}]$.

Proof 3.1.

By induction on the structure of typing derivations.
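The substitution clause of the lemma can be checked mechanically on a toy fragment. The sketch below is a simplification of our own, not the paper's macro: terms are nested tuples over variables, constants, `add` and `mul`, and the $(1,1)$ macro is represented in $\beta$-reduced form as a pair (primal term, tangent term) rather than as nested $\mathbf{case}$ expressions. The tangent of a variable `x` is taken to be a variable named `x'`, and source variable names are assumed not to contain primes.

```python
# Terms of a toy first order fragment, as nested tuples:
#   ("var", name) | ("const", c) | ("add", t1, t2) | ("mul", t1, t2)

def subst(t, x, s):
    """Replace variable x by term s in t (there are no binders, so no capture)."""
    if t[0] == "var":
        return s if t[1] == x else t
    if t[0] == "const":
        return t
    return (t[0], subst(t[1], x, s), subst(t[2], x, s))


def D(t):
    """Toy (1,1) macro in beta-reduced form: returns (primal term, tangent term)."""
    tag = t[0]
    if tag == "var":
        return (t, ("var", t[1] + "'"))
    if tag == "const":
        return (t, ("const", 0))
    a, da = D(t[1])
    b, db = D(t[2])
    if tag == "add":
        return (("add", a, b), ("add", da, db))
    if tag == "mul":
        return (("mul", a, b), ("add", ("mul", a, db), ("mul", da, b)))
    raise ValueError(f"unknown term: {t!r}")


def subst_D(pair, x, s_pair):
    """Substitute D(s) for x: the primal of s for x, the tangent of s for x'."""
    ps, ds = s_pair
    go = lambda u: subst(subst(u, x, ps), x + "'", ds)
    return (go(pair[0]), go(pair[1]))


x, y = ("var", "x"), ("var", "y")
t = ("add", ("mul", x, y), x)   # x * y + x
s = ("mul", y, y)               # y * y, which does not mention x

lhs = D(subst(t, "x", s))        # D(t[s/x])
rhs = subst_D(D(t), "x", D(s))   # D(t)[D(s)/x]
```

Here `lhs` and `rhs` are syntactically identical terms. The equality holds because `D` is defined compositionally on the term structure, which is the essence of the induction in the proof.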

Example 3.1 (Inner products). Let us write $\tau^n$ for the $n$-fold product $(\tau * \dots * \tau)$. Then, given $\Gamma \vdash t, s : \mathbf{real}^n$ we can define their inner product

\[
\begin{array}{ll}
\Gamma \vdash t \cdot_n s \stackrel{\mathrm{def}}{=} & \mathbf{case}\,t\,\mathbf{of}\,\langle z_1, \ldots, z_n\rangle \to \\
& \mathbf{case}\,s\,\mathbf{of}\,\langle y_1, \ldots, y_n\rangle \to z_1 * y_1 + \dots + z_n * y_n : \mathbf{real}
\end{array}
\]

To illustrate the calculation of $\overrightarrow{\mathcal{D}}_{(1,1)}$, let us expand (and $\beta$-reduce) $\overrightarrow{\mathcal{D}}_{(1,1)}(t \cdot_2 s)$:

\[
\begin{array}{l}
\mathbf{case}\,\overrightarrow{\mathcal{D}}_{(1,1)}(t)\,\mathbf{of}\,\langle z_1, z_2\rangle \to \mathbf{case}\,\overrightarrow{\mathcal{D}}_{(1,1)}(s)\,\mathbf{of}\,\langle y_1, y_2\rangle \to \\
\mathbf{case}\,z_1\,\mathbf{of}\,\langle z_{1,1}, z_{1,2}\rangle \to \mathbf{case}\,y_1\,\mathbf{of}\,\langle y_{1,1}, y_{1,2}\rangle \to \\
\mathbf{case}\,z_2\,\mathbf{of}\,\langle z_{2,1}, z_{2,2}\rangle \to \mathbf{case}\,y_2\,\mathbf{of}\,\langle y_{2,1}, y_{2,2}\rangle \to \\
\qquad \langle z_{1,1} * y_{1,1} + z_{2,1} * y_{2,1}\ ,\ z_{1,1} * y_{1,2} + z_{1,2} * y_{1,1} + z_{2,1} * y_{2,2} + z_{2,2} * y_{2,1} \rangle
\end{array}
\]
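The $(1,1)$ expansion of the 2-dimensional inner product can be replayed numerically. This is a minimal sketch assuming the $(1,1)$ rules for $+$ and $*$ are the sum rule and the product rule; the helper names `d_add`, `d_mul` and `d_inner2` are ours.

```python
def d_add(a, b):
    """(1,1) rule for +: add primals and add tangents."""
    return (a[0] + b[0], a[1] + b[1])


def d_mul(a, b):
    """(1,1) rule for *: product rule on the tangent component."""
    return (a[0] * b[0], a[0] * b[1] + a[1] * b[0])


def d_inner2(t, s):
    """(1,1) derivative of the 2-dimensional inner product.

    t = (z1, z2) and s = (y1, y2), where each zi and yi is a pair (value, tangent).
    """
    (z1, z2), (y1, y2) = t, s
    return d_add(d_mul(z1, y1), d_mul(z2, y2))


t = ((1, 2), (3, 4))   # z1 = (z11, z12), z2 = (z21, z22)
s = ((5, 6), (7, 8))   # y1 = (y11, y12), y2 = (y21, y22)
out = d_inner2(t, s)
```

With these inputs `out` is `(26, 68)`: the first component is $1 \cdot 5 + 3 \cdot 7$ and the second is $1 \cdot 6 + 2 \cdot 5 + 3 \cdot 8 + 4 \cdot 7$, matching the two components of the expanded tuple.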
Let us also expand the calculation of $\overrightarrow{\mathcal{D}}_{(2,2)'}(t \cdot_2 s)$:

\[
\begin{array}{l}
\mathbf{case}\,\overrightarrow{\mathcal{D}}_{(2,2)'}(t)\,\mathbf{of}\,\langle z_1, z_2\rangle \to \mathbf{case}\,\overrightarrow{\mathcal{D}}_{(2,2)'}(s)\,\mathbf{of}\,\langle y_1, y_2\rangle \to \\
\mathbf{case}\,z_1\,\mathbf{of}\,\langle z_1, z_{1,1}', z_{1,2}', z_1''\rangle \to \mathbf{case}\,y_1\,\mathbf{of}\,\langle y_1, y_{1,1}', y_{1,2}', y_1''\rangle \to \\
\mathbf{case}\,z_2\,\mathbf{of}\,\langle z_2, z_{2,1}', z_{2,2}', z_2''\rangle \to \mathbf{case}\,y_2\,\mathbf{of}\,\langle y_2, y_{2,1}', y_{2,2}', y_2''\rangle \to \\
\qquad \langle z_1 * y_1 + z_2 * y_2, \\
\qquad\ z_1 * y_{1,1}' + z_{1,1}' * y_1 + z_2 * y_{2,1}' + z_{2,1}' * y_2, \\
\qquad\ z_1 * y_{1,2}' + z_{1,2}' * y_1 + z_2 * y_{2,2}' + z_{2,2}' * y_2, \\
\qquad\ z_1'' * y_1 + z_2'' * y_2 + y_1'' * z_1 + y_2'' * z_2 + {} \\
\qquad\ z_{1,1}' * y_{1,2}' + z_{1,2}' * y_{1,1}' + z_{2,1}' * y_{2,2}' + z_{2,2}' * y_{2,1}' \rangle
\end{array}
\]
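As a sanity check on the restricted $(2,2)'$ expansion, the sketch below propagates 4-tuples through the 2-dimensional inner product and compares the result with derivatives computed by hand. It assumes that the second and third tuple components hold the partial derivatives in two input directions $u$ and $v$ and that the fourth holds the mixed derivative $\partial^2/\partial u\,\partial v$; the names `j_add`, `j_mul` and `j_inner2` are ours. The cross terms in `j_mul` come from the product rule for a mixed second derivative, $(zy)_{uv} = z_{uv} y + z\,y_{uv} + z_u y_v + z_v y_u$.

```python
def j_add(a, b):
    """Componentwise sum: all derivative components of + are linear."""
    return tuple(p + q for p, q in zip(a, b))


def j_mul(a, b):
    """Product of two tuples (value, d/du, d/dv, d^2/du dv)."""
    a0, a1, a2, a12 = a
    b0, b1, b2, b12 = b
    return (
        a0 * b0,
        a0 * b1 + a1 * b0,
        a0 * b2 + a2 * b0,
        a12 * b0 + a0 * b12 + a1 * b2 + a2 * b1,
    )


def j_inner2(t, s):
    """Restricted second order derivative of the 2-dimensional inner product."""
    (z1, z2), (y1, y2) = t, s
    return j_add(j_mul(z1, y1), j_mul(z2, y2))


u, v = 2, 3
t = ((u, 1, 0, 0), (v, 0, 1, 0))            # t(u, v) = (u, v)
s = ((u * v, v, u, 1), (u + v, 1, 1, 0))    # s(u, v) = (u * v, u + v)
out = j_inner2(t, s)

# Hand-computed derivatives of f(u, v) = u*(u*v) + v*(u + v) = u^2 v + u v + v^2.
expected = (u * u * v + u * v + v * v, 2 * u * v + v, u * u + u + 2 * v, 2 * u + 1)
```

At $(u, v) = (2, 3)$ both `out` and `expected` equal `(27, 15, 12, 5)`.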
Example (Neural networks). In our introduction, we provided a program (1) in our language to build a neural network out of expressions $\mathrm{neuron}, \mathrm{layer}, \mathrm{comp}$; this program makes use of the inner product of Ex. 3.1. We can similarly calculate the derivatives of deep neural nets by mechanically applying the macro $\overrightarrow{\mathcal{D}}$.

4. Semantics of differentiation

Consider for a moment the first order fragment of the language in Section 3, with only one type, $\mathbf{real}$, and no $\lambda$'s or pairs. This has a simple semantics in the category of cartesian spaces and smooth maps. Indeed, a term $x_1, \dots, x_n : \mathbf{real} \vdash t : \mathbf{real}$ has a natural reading as a function $\llbracket t \rrbracket : \mathbb{R}^n \to \mathbb{R}$ by interpreting our operation symbols by the well-known operations on $\mathbb{R}^n \to \mathbb{R}$ with the corresponding name. In fact, the functions that are definable in this first order fragment are smooth. Let us write $\mathbf{CartSp}$ for this category of cartesian spaces ($\mathbb{R}^n$ for some $n$) and smooth functions.

The category $\mathbf{CartSp}$ has cartesian products, and so we can also interpret product types, tupling and pattern matching, giving us a useful syntax for constructing functions into and out of products of $\mathbb{R}$. For example, the interpretation of $(\mathrm{neuron}_n)$ in (1) becomes

\[
\mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R} \xrightarrow{\llbracket \cdot_n \rrbracket \times \mathrm{id}_{\mathbb{R}}} \mathbb{R} \times \mathbb{R} \xrightarrow{\llbracket + \rrbracket} \mathbb{R} \xrightarrow{\llbracket \varsigma \rrbracket} \mathbb{R}
\]

where $\llbracket \cdot_n \rrbracket$, $\llbracket + \rrbracket$ and $\llbracket \varsigma \rrbracket$ are the usual inner product, addition and the sigmoid function on $\mathbb{R}$, respectively.
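The composite can be written out directly as ordinary functions on tuples of reals. This is a minimal sketch: the argument names `xs`, `ws` and `b`, and the choice of the logistic function $r \mapsto 1/(1 + e^{-r})$ for the sigmoid, are assumptions made for illustration.

```python
import math


def inner(xs, ws):
    """Usual inner product on R^n (the interpretation of the term t ._n s)."""
    return sum(x * w for x, w in zip(xs, ws))


def sigmoid(r):
    """Logistic sigmoid, assumed here as the interpretation of the symbol for sigma."""
    return 1.0 / (1.0 + math.exp(-r))


def neuron(xs, ws, b):
    """Composite R^n x R^n x R -> R x R -> R -> R: inner product, then +, then sigmoid."""
    return sigmoid(inner(xs, ws) + b)


out = neuron([0.0, 0.0], [1.0, 1.0], 0.0)
```

Each stage is a smooth map between cartesian spaces, so the composite is again a morphism of $\mathbf{CartSp}$.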

Inside this category, we can straightforwardly study the first order language without $\lambda$'s, and automatic differentiation. In fact, we can prove the following by plain induction on the syntax:
The interpretation of the (syntactic) forward AD $\overrightarrow{\mathcal{D}}(t)$ of a first order term $t$ equals the usual (semantic) derivative of the interpretation of $t$ as a smooth function.
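As an illustration of this first order statement, the following is a minimal sketch of forward-mode AD with dual numbers, checked against the derivative computed by hand for a two-input neuron. The names `Dual`, `const`, `sigmoid` and `neuron` are illustrative choices made here and are not taken from the paper; the sketch covers first derivatives only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative along one input direction."""
    val: float
    tan: float

    def __add__(self, other: "Dual") -> "Dual":
        return Dual(self.val + other.val, self.tan + other.tan)

    def __mul__(self, other: "Dual") -> "Dual":
        # Product rule: (uv)' = u'v + uv'.
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)


def const(c: float) -> Dual:
    """An input that is held fixed, so its tangent is zero."""
    return Dual(c, 0.0)


def sigmoid(x: Dual) -> Dual:
    # Chain rule with sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)).
    s = 1.0 / (1.0 + math.exp(-x.val))
    return Dual(s, s * (1.0 - s) * x.tan)


def neuron(xs, ws, b):
    """sigmoid(<xs, ws> + b), written with the overloaded operations."""
    acc = b
    for x, w in zip(xs, ws):
        acc = acc + x * w
    return sigmoid(acc)


# Differentiate with respect to the first weight: seed its tangent with 1.
xs = [const(0.5), const(-1.0)]
ws = [Dual(2.0, 1.0), const(3.0)]
b = const(0.1)
out = neuron(xs, ws, b)

# The same derivative computed by hand from the smooth function.
z = 0.5 * 2.0 + (-1.0) * 3.0 + 0.1
s = 1.0 / (1.0 + math.exp(-z))
expected = 0.5 * s * (1.0 - s)
```

Here `out.tan` is the syntactically propagated derivative and `expected` is the semantic one; the two agree up to floating point rounding.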

However, as is well-known, the category $\mathbf{CartSp}$ does not support function spaces. To see this, notice that we have polynomial terms

\[
x_1,\ldots,x_d:\mathbf{real}\vdash\lambda y.\,\textstyle\sum_{n=1}^{d}x_n y^n:\mathbf{real}\to\mathbf{real}
\]

for each $d$, and so if we could interpret $(\mathbf{real}\to\mathbf{real})$ as a Euclidean space $\mathbb{R}^p$ then, by interpreting these polynomial expressions, we would be able to find continuous injections $\mathbb{R}^d\to\mathbb{R}^p$ for every $d$, which is topologically impossible for any $p$, for example as a consequence of the Borsuk-Ulam theorem (see Appx. A).
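To make the injectivity step concrete, here is a worked instance for $d=2$. It is an illustration added here, not part of the original argument, and it only checks injectivity of the coefficient map, not continuity.

```latex
% The term for d = 2 denotes the map
%   (x_1, x_2) \mapsto p_{x_1,x_2}, \qquad p_{x_1,x_2}(y) = x_1 y + x_2 y^2 .
% Evaluating at y = 1 and y = -1 recovers both coefficients:
\[
  p_{x_1,x_2}(1) = x_1 + x_2, \qquad p_{x_1,x_2}(-1) = -x_1 + x_2,
\]
\[
  x_1 = \tfrac{1}{2}\bigl(p_{x_1,x_2}(1) - p_{x_1,x_2}(-1)\bigr), \qquad
  x_2 = \tfrac{1}{2}\bigl(p_{x_1,x_2}(1) + p_{x_1,x_2}(-1)\bigr).
\]
% Hence distinct coefficient pairs give distinct functions, so the map
% from \mathbb{R}^2 into the function space is injective.
```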

This lack of function spaces means that we cannot interpret the functions $(\mathrm{layer})$ and $(\mathrm{comp})$ from (1) in $\mathbf{CartSp}$, as they are higher order functions, even though they are very useful and innocent building blocks for differential programming! Clearly, we could define neural nets such as (1) directly as smooth functions without any higher order subcomponents, though that would quickly become cumbersome for deep networks. A problematic consequence of the lack of a semantics for higher order differential programs is that we have no obvious way of establishing compositional semantic correctness of $\overrightarrow{\mathcal{D}}$ for the given implementation of (1).

We now show that every definable function is smooth, and then in Section 4.2 we show that the $\overrightarrow{\mathcal{D}}$ macro witnesses its derivatives.

4.1. Smoothness at higher types and diffeologies

The aim of this section is to introduce diffeological spaces as a semantic model for the simple language in Section 3. By way of motivation, we begin with a standard set theoretic semantics, where types are interpreted as follows

\[
\llfloor\mathbf{real}\rrceil\stackrel{\mathrm{def}}{=}\mathbb{R}\qquad
\llfloor\boldsymbol{(}\tau_1\boldsymbol{\mathop{*}}\dots\boldsymbol{\mathop{*}}\tau_n\boldsymbol{)}\rrceil\stackrel{\mathrm{def}}{=}\textstyle\prod_{i=1}^{n}\llfloor\tau_i\rrceil\qquad
\llfloor\tau\to\sigma\rrceil\stackrel{\mathrm{def}}{=}(\llfloor\tau\rrceil\to\llfloor\sigma\rrceil)
\]

and a term $x_1:\tau_1,\dots,x_n:\tau_n\vdash t:\sigma$ is interpreted as a function $\prod_{i=1}^{n}\llfloor\tau_i\rrceil\to\llfloor\sigma\rrceil$, mapping a valuation of the context to a result.

We can show that the interpretation of a term $x_1:\mathbf{real},\dots,x_n:\mathbf{real}\vdash t:\mathbf{real}$ is always a smooth function $\mathbb{R}^n\to\mathbb{R}$, even if it has higher order subterms. We begin with a fairly standard logical relations proof of this, and then move from this to the semantic model of diffeological spaces.

Proposition 2.

If $x_1:\mathbf{real},\dots,x_n:\mathbf{real}\vdash t:\mathbf{real}$ then the function $\llfloor t\rrceil:\mathbb{R}^n\to\mathbb{R}$ is smooth.

Proof 4.1.

For each type $\tau$ define a set $Q_\tau\subseteq[\mathbb{R}^k\to\llfloor\tau\rrceil]$ by induction on the structure of types:

\begin{align*}
Q_{\mathbf{real}} &= \{f:\mathbb{R}^k\to\mathbb{R}~|~f\text{ is smooth}\}\\
Q_{\boldsymbol{(}\tau_1\boldsymbol{\mathop{*}}\dots\boldsymbol{\mathop{*}}\tau_n\boldsymbol{)}} &= \{\textstyle f:\mathbb{R}^k\to\prod_{i=1}^{n}\llfloor\tau_i\rrceil~|~\forall\vec{r}\in\mathbb{R}^k.\ \forall i.\ f_i(\vec{r})\in Q_{\tau_i}\}\\
Q_{\tau\to\sigma} &= \{f:\mathbb{R}^k\to\llfloor\tau\rrceil\to\llfloor\sigma\rrceil~|~\forall g\in Q_\tau.\,\lambda(\vec{r}).\,f(\vec{r})(g(\vec{r}))\in Q_\sigma\}
\end{align*}

Now we show the fundamental lemma: if $x_1:\tau_1,\dots,x_n:\tau_n\vdash u:\sigma$ and $g_1\in Q_{\tau_1},\dots,g_n\in Q_{\tau_n}$ then $((g_1\dots g_n);\llfloor u\rrceil)\in Q_\sigma$. This is shown by induction on the structure of typing derivations. The only interesting step here is that the basic operations ($+$, $*$, $\varsigma$ etc.) are smooth. We deduce the statement of the proposition by putting $u=t$, $k=n$, and letting $g_i:\mathbb{R}^n\to\mathbb{R}$ be the projections.

At higher types, the logical relations $Q$ show that we can only define functions that send smooth functions to smooth functions, meaning that we can never use them to build first order functions that are not smooth. For example, $(\mathrm{comp})$ in (1) has this property.

This logical relations proof suggests building a semantic model by interpreting types as sets with structure: for each type we have a set $X$ together with a set $Q^{\mathbb{R}^k}_X\subseteq[\mathbb{R}^k\to X]$ of plots.

Definition 4.1. A diffeological space $(X,\mathcal{P}_X)$ consists of a set $X$ together with, for each $n$ and each open subset $U$ of $\mathbb{R}^n$, a set $\mathcal{P}_X^U\subseteq[U\to X]$ of functions, called plots, such that

  • all constant functions are plots;

  • if $f:V\to U$ is a smooth function and $p\in\mathcal{P}_X^U$, then $f;p\in\mathcal{P}_X^V$;

  • if $\left(p_i\in\mathcal{P}_X^{U_i}\right)_{i\in I}$ is a compatible family of plots $(x\in U_i\cap U_j\Rightarrow p_i(x)=p_j(x))$ and $\left(U_i\right)_{i\in I}$ covers $U$, then the gluing $p:U\to X:x\in U_i\mapsto p_i(x)$ is a plot.

We call a function $f:X\to Y$ between diffeological spaces smooth if, for all plots $p\in\mathcal{P}_X^U$, we have that $p;f\in\mathcal{P}_Y^U$. We write $\mathbf{Diff}(X,Y)$ for the set of smooth maps from $X$ to $Y$. Smooth functions compose, and so we have a category $\mathbf{Diff}$ of diffeological spaces and smooth functions.

A diffeological space is thus a set equipped with structure. Many constructions of sets carry over straightforwardly to diffeological spaces.

Example (Cartesian diffeologies). Each open subset $U$ of $\mathbb{R}^n$ can be given the structure of a diffeological space by taking all the smooth functions $V\to U$ as $\mathcal{P}_U^V$. Smooth functions from $V\to U$ in the traditional sense coincide with smooth functions in the sense of diffeological spaces [IZ13]. Thus diffeological spaces have a profound relationship with ordinary calculus.

In categorical terms, this gives a full embedding of $\mathbf{CartSp}$ in $\mathbf{Diff}$.

Example (Product diffeologies). Given a family $\left(X_i\right)_{i\in I}$ of diffeological spaces, we can equip the product $\prod_{i\in I}X_i$ of sets with the product diffeology in which $U$-plots are precisely the functions of the form $\left(p_i\right)_{i\in I}$ for $p_i\in\mathcal{P}_{X_i}^U$.

This gives us the categorical product in $\mathbf{Diff}$.

Example (Functional diffeology). We can equip the set $\mathbf{Diff}(X,Y)$ of smooth functions between diffeological spaces with the functional diffeology in which $U$-plots consist of functions $f:U\to\mathbf{Diff}(X,Y)$ such that $(u,x)\mapsto f(u)(x)$ is an element of $\mathbf{Diff}(U\times X,Y)$.

This specifies the categorical function object in $\mathbf{Diff}$.
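A small worked instance may help to see what the functional diffeology demands. The two families below are illustrations chosen here, with $U=X=Y=\mathbb{R}$ carrying the standard diffeology, so that smoothness of the uncurried map is smoothness in the ordinary sense.

```latex
% A plot of Diff(R, R): the uncurried map is an ordinary smooth function.
\[
  f:\mathbb{R}\to\mathbf{Diff}(\mathbb{R},\mathbb{R}),\qquad
  f(u)(x) = u\,x^2,\qquad
  (u,x)\mapsto u\,x^2 \ \text{is smooth on } \mathbb{R}\times\mathbb{R}.
\]
% Not a plot: every g(u) is a smooth function of x, but the uncurried map
% is not differentiable in u at u = 0 whenever x \neq 0.
\[
  g:\mathbb{R}\to\mathbf{Diff}(\mathbb{R},\mathbb{R}),\qquad
  g(u)(x) = |u|\,x,\qquad
  (u,x)\mapsto |u|\,x \ \text{is not smooth}.
\]
```

The second family shows that pointwise smoothness of each $g(u)$ is not enough: the dependence on the parameter $u$ must be smooth as well.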

We can now give a denotational semantics to our language from Section 3 in the category of diffeological spaces. We interpret each type $\tau$ as a set $\llbracket\tau\rrbracket$ equipped with the relevant diffeology, by induction on the structure of types:

\begin{align*}
\llbracket\mathbf{real}\rrbracket &\stackrel{\mathrm{def}}{=}\mathbb{R} &&\text{with the standard diffeology}\\
\llbracket\boldsymbol{(}\tau_1\boldsymbol{\mathop{*}}\dots\boldsymbol{\mathop{*}}\tau_n\boldsymbol{)}\rrbracket &\stackrel{\mathrm{def}}{=}\textstyle\prod_{i=1}^{n}\llbracket\tau_i\rrbracket &&\text{with the product diffeology}\\
\llbracket\tau\to\sigma\rrbracket &\stackrel{\mathrm{def}}{=}\mathbf{Diff}(\llbracket\tau\rrbracket,\llbracket\sigma\rrbracket) &&\text{with the functional diffeology}
\end{align*}

A context $\Gamma=(x_1\colon\tau_1\dots x_n\colon\tau_n)$ is interpreted as a diffeological space $\llbracket\Gamma\rrbracket\stackrel{\mathrm{def}}{=}\prod_{i=1}^{n}\llbracket\tau_i\rrbracket$. Now well typed terms $\Gamma\vdash t:\tau$ are interpreted as smooth functions $\llbracket t\rrbracket:\llbracket\Gamma\rrbracket\to\llbracket\tau\rrbracket$, giving a meaning for $t$ for every valuation of the context. This is routinely defined by induction on the structure of typing derivations once we choose a smooth function $\llbracket\mathsf{op}\rrbracket:\mathbb{R}^n\to\mathbb{R}$ to interpret each $n$-ary operation $\mathsf{op}\in\mathsf{Op}_n$.
For example, constants $\underline{c}:\mathbf{real}$ are interpreted as constant functions; and the first order operations ($+,*,\varsigma$) are interpreted by composing with the corresponding functions, which are smooth: e.g., $\llbracket\varsigma(t)\rrbracket(\rho)\stackrel{\mathrm{def}}{=}\varsigma(\llbracket t\rrbracket(\rho))$, where $\rho\in\llbracket\Gamma\rrbracket$. Variables are interpreted as $\llbracket x_i\rrbracket(\rho)\stackrel{\mathrm{def}}{=}\rho_i$. The remaining constructs are interpreted as follows, and it is straightforward to show that smoothness is preserved.

\begin{align*}
\llbracket\langle t_1,\dots,t_n\rangle\rrbracket(\rho) &\stackrel{\mathrm{def}}{=}(\llbracket t_1\rrbracket(\rho),\dots,\llbracket t_n\rrbracket(\rho)) &
\llbracket\lambda x{:}\tau.\,t\rrbracket(\rho)(a) &\stackrel{\mathrm{def}}{=}\llbracket t\rrbracket(\rho,a)\ \text{($a\in\llbracket\tau\rrbracket$)}\\
\llbracket\mathbf{case}\,t\,\mathbf{of}\,\langle\ldots\rangle\to s\rrbracket(\rho) &\stackrel{\mathrm{def}}{=}\llbracket s\rrbracket(\rho,\llbracket t\rrbracket(\rho)) &
\llbracket t\,s\rrbracket(\rho) &\stackrel{\mathrm{def}}{=}\llbracket t\rrbracket(\rho)(\llbracket s\rrbracket(\rho))
\end{align*}
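The clauses of this interpretation can be sketched as a small evaluator. This is only an illustration under stated assumptions: the tagged-tuple encoding of terms, the table `OPS` and the function `interp` are hypothetical names introduced here, valuations $\rho$ are modelled as dictionaries, and the sketch computes underlying functions only, without tracking plots or checking smoothness.

```python
import math

# Chosen interpretations of the basic operations (all smooth functions).
OPS = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "sigmoid": lambda a: 1.0 / (1.0 + math.exp(-a)),
}


def interp(t, rho):
    """Meaning of term t under the valuation rho (variable name -> value)."""
    tag = t[0]
    if tag == "const":   # constants are constant functions of rho
        return t[1]
    if tag == "var":     # [[x_i]](rho) = rho_i
        return rho[t[1]]
    if tag == "op":      # compose with the chosen function for the operation
        return OPS[t[1]](*(interp(s, rho) for s in t[2]))
    if tag == "tuple":   # [[<t_1..t_n>]](rho) = ([[t_1]](rho), ..., [[t_n]](rho))
        return tuple(interp(s, rho) for s in t[1])
    if tag == "case":    # bind the components of the scrutinee, then run the body
        _, scrutinee, names, body = t
        values = interp(scrutinee, rho)
        return interp(body, {**rho, **dict(zip(names, values))})
    if tag == "lam":     # [[lambda x. t]](rho)(a) = [[t]](rho, a)
        _, x, body = t
        return lambda a: interp(body, {**rho, x: a})
    if tag == "app":     # [[t s]](rho) = [[t]](rho)([[s]](rho))
        _, f, s = t
        return interp(f, rho)(interp(s, rho))
    raise ValueError(f"unknown term constructor: {tag}")


# (lambda f. lambda z. f (f z)) (lambda y. y * y) x
twice = ("lam", "f", ("lam", "z",
         ("app", ("var", "f"), ("app", ("var", "f"), ("var", "z")))))
square = ("lam", "y", ("op", "*", [("var", "y"), ("var", "y")]))
term = ("app", ("app", twice, square), ("var", "x"))
result = interp(term, {"x": 3.0})

# case <x, 2> of <a, b> -> a + b
pair_sum = ("case", ("tuple", [("var", "x"), ("const", 2.0)]), ["a", "b"],
            ("op", "+", [("var", "a"), ("var", "b")]))
pair_result = interp(pair_sum, {"x": 3.0})
```

The first example applies a higher order subterm yet has type $\mathbf{real}$ overall, which is the situation covered by Proposition 2.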

The logical relations proof of Proposition 2 is reminiscent of diffeological spaces. We now briefly remark on the suitability of the axioms of diffeological spaces (Def 4.1) for a semantic model of smooth programs. The first axiom says that we only consider reflexive logical relations. From the perspective of the interpretation, it recognizes in particular that the semantics of an expression of type $(\mathbf{real}\to\mathbf{real})\to\mathbf{real}$ is defined by its value on smooth functions rather than arbitrary arguments. That is to say, the set-theoretic semantics at the beginning of this section, $\llfloor(\mathbf{real}\to\mathbf{real})\to\mathbf{real}\rrceil$, is different to the diffeological semantics, $\llbracket(\mathbf{real}\to\mathbf{real})\to\mathbf{real}\rrbracket$. The second axiom for diffeological spaces ensures that the smooth maps in $\mathbf{Diff}(U,X)$ are exactly the plots in $\mathcal{P}_X^U$. The third axiom ensures that categories of manifolds fully embed into $\mathbf{Diff}$; it will not play a visible role in this paper. In fact, [BCLG20] prove similar results for a simple language like ours by using plain logical relations (over $\mathbf{Set}$) and without demanding the diffeology axioms. However, we expect the third axiom to be crucial for programming with other smooth structures or partiality.

4.2. Correctness of AD

We have shown that a term $x_1\colon\mathbf{real},\dots,x_n\colon\mathbf{real}\vdash t:\mathbf{real}$ is interpreted as a smooth function $\llbracket t\rrbracket:\mathbb{R}^n\to\mathbb{R}$, even if $t$ involves higher order functions (like (1)). Moreover, the macro differentiation $\overrightarrow{\mathcal{D}}_{(k,R)}(t)$ is interpreted as a function $\llbracket\overrightarrow{\mathcal{D}}_{(k,R)}(t)\rrbracket:(\mathbb{R}^{\binom{R+k}{k}})^n\to\mathbb{R}^{\binom{R+k}{k}}$ (Proposition 1). This enables us to state a limited version of our main correctness theorem:

Theorem 3 (Semantic correctness of $\overrightarrow{\mathcal{D}}$ (limited)).

For any term
$x_1\colon\mathbf{real},\dots,x_n\colon\mathbf{real}\vdash t:\mathbf{real}$, the function $\llbracket\overrightarrow{\mathcal{D}}_{(k,R)}(t)\rrbracket$ is the $(k,R)$-Taylor representation (3) of $\llbracket t\rrbracket$. In detail: for any smooth functions $f_1\dots f_n:\mathbb{R}^k\to\mathbb{R}$,

\begin{align*}
&\left(\left(\frac{\partial^{\alpha_1+\ldots+\alpha_k}f_j(x)}{\partial x_1^{\alpha_1}\cdots\partial x_k^{\alpha_k}}\right)_{(\alpha_1,...,\alpha_k)=(0,...,0)}^{(R,0,...,0)}\right)_{j=1}^{n};\llbracket\overrightarrow{\mathcal{D}}_{(k,R)}(t)\rrbracket=\\
&\qquad\quad\left(\frac{\partial^{\alpha_1+\ldots+\alpha_k}((f_1,\ldots,f_n);\llbracket t\rrbracket)(x)}{\partial x_1^{\alpha_1}\cdots\partial x_k^{\alpha_k}}\right)_{(\alpha_1,...,\alpha_k)=(0,...,0)}^{(R,0,...,0)}\text{.}
\end{align*}

For instance, if $n=2$, then $\llbracket\overrightarrow{\mathcal{D}}_{(1,1)}(t)\rrbracket(x_1,1,x_2,0)=\big(\llbracket t\rrbracket(x_1,x_2),\frac{\partial\llbracket t\rrbracket(x,x_2)}{\partial x}(x_1)\big)$.
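The $n=2$ instance can be checked numerically on a term with a higher order subterm. The sketch below is a hand translation, under the assumption that the $(1,1)$ macro sends $\mathbf{real}$ to pairs (value, tangent), multiplication to its product-rule version, and acts homomorphically on $\lambda$-abstraction and application; the names `d_mul` and `d_t` are hypothetical. The term used is $t=(\lambda f.\,f\,(f\,x_1))\,(\lambda y.\,y*x_2)$, whose denotation is $x_1x_2^2$.

```python
def d_mul(a, b):
    """Multiplication on (value, tangent) pairs, using the product rule."""
    (x, dx), (y, dy) = a, b
    return (x * y, dx * y + x * dy)


def d_t(x1, x2):
    """Hand-translated derivative program for t = (lambda f. f (f x1)) (lambda y. y * x2)."""
    scale = lambda y: d_mul(y, x2)   # translation of: lambda y. y * x2
    twice = lambda f: f(f(x1))       # translation of: lambda f. f (f x1)
    return twice(scale)


# Evaluate at (x1, x2) = (3, 2), seeding tangent 1 for x1 and 0 for x2.
value, tangent = d_t((3.0, 1.0), (2.0, 0.0))
```

At this point $\llbracket t\rrbracket(3,2)=3\cdot 2^2=12$ and $\partial\llbracket t\rrbracket/\partial x_1=2^2=4$, which is the pair the translated program returns.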

Proof 4.2.

We prove this by logical relations. A categorical version of this proof is in Section 6.2.

For each type τ𝜏{\tau}italic_τ, we define a binary relation Sτsubscript𝑆𝜏S_{{\tau}}italic_S start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT between (open) k𝑘kitalic_k-dimensional plots in τdelimited-⟦⟧𝜏\llbracket{\tau}\rrbracket⟦ italic_τ ⟧ and (open) k𝑘kitalic_k-dimensional plots in 𝒟(k,R)(τ)delimited-⟦⟧subscript𝒟𝑘𝑅𝜏\llbracket\scalebox{0.8}{$\overrightarrow{\mathcal{D}}$}_{(k,R)}({\tau})\rrbracket⟦ over→ start_ARG caligraphic_D end_ARG start_POSTSUBSCRIPT ( italic_k , italic_R ) end_POSTSUBSCRIPT ( italic_τ ) ⟧, i.e. Sτ𝒫τk×𝒫𝒟(k,R)(τ)ksubscript𝑆𝜏superscriptsubscript𝒫delimited-⟦⟧𝜏superscript𝑘superscriptsubscript𝒫delimited-⟦⟧subscript𝒟𝑘𝑅𝜏superscript𝑘S_{{\tau}}\subseteq\mathcal{P}_{\llbracket{\tau}\rrbracket}^{\mathbb{R}^{k}}% \times\mathcal{P}_{\llbracket\scalebox{0.8}{$\overrightarrow{\mathcal{D}}$}_{(% k,R)}({\tau})\rrbracket}^{\mathbb{R}^{k}}italic_S start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ⊆ caligraphic_P start_POSTSUBSCRIPT ⟦ italic_τ ⟧ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT × caligraphic_P start_POSTSUBSCRIPT ⟦ over→ start_ARG caligraphic_D end_ARG start_POSTSUBSCRIPT ( italic_k , italic_R ) end_POSTSUBSCRIPT ( italic_τ ) ⟧ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT, by induction on τ𝜏{\tau}italic_τ:

\begin{align*}
S_{\mathbf{real}} &\stackrel{\mathrm{def}}{=} \left\{\left(f,\left(\frac{\partial^{\alpha_{1}+\ldots+\alpha_{k}}f(x)}{\partial x_{1}^{\alpha_{1}}\cdots\partial x_{k}^{\alpha_{k}}}\right)_{(\alpha_{1},\ldots,\alpha_{k})=(0,\ldots,0)}^{(R,0,\ldots,0)}\right)~\Big|~f:\mathbb{R}^{k}\to\mathbb{R}\text{ smooth}\right\}\\
S_{\boldsymbol{(}\tau_{1}\boldsymbol{\mathop{*}}\ldots\boldsymbol{\mathop{*}}\tau_{n}\boldsymbol{)}} &\stackrel{\mathrm{def}}{=} \{((f_{1},\ldots,f_{n}),(g_{1},\ldots,g_{n}))\mid(f_{1},g_{1})\in S_{\tau_{1}},\ldots,(f_{n},g_{n})\in S_{\tau_{n}}\}\\
S_{\tau\to\sigma} &\stackrel{\mathrm{def}}{=} \{(f_{1},f_{2})\mid\forall(g_{1},g_{2})\in S_{\tau}.\,(x\mapsto f_{1}(x)(g_{1}(x)),\,x\mapsto f_{2}(x)(g_{2}(x)))\in S_{\sigma}\}
\end{align*}

Then, we establish the following ‘fundamental lemma’:

If $x_{1}{:}\tau_{1},\ldots,x_{n}{:}\tau_{n}\vdash t:\sigma$ and, for all $1\leq i\leq n$, $f_{i}:\mathbb{R}^{k}\to\llbracket\tau_{i}\rrbracket$ and $g_{i}:\mathbb{R}^{k}\to\llbracket\overrightarrow{\mathcal{D}}_{(k,R)}(\tau_{i})\rrbracket$ are such that $(f_{i},g_{i})$ is in $S_{\tau_{i}}$, then we have that

\[
\Big((f_{1},\ldots,f_{n});\llbracket t\rrbracket,\ (g_{1},\ldots,g_{n});\llbracket\overrightarrow{\mathcal{D}}_{(k,R)}(t)\rrbracket\Big)
\]

is in $S_{\sigma}$.

This is proved routinely by induction on the typing derivation of $t$. The case for $\mathsf{op}(t_{1},\ldots,t_{n})$ relies on the precise definition of $\overrightarrow{\mathcal{D}}_{(k,R)}(\mathsf{op}(t_{1},\ldots,t_{n}))$.

We conclude the theorem from the fundamental lemma by considering the case where $\tau_{i}=\sigma=\mathbf{real}$, $m=n$ and $s_{i}=y_{i}$.
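To make the relation $S_{\mathbf{real}}$ concrete, the following sketch treats the case $k=1$, $R=2$, where a plot $f$ is paired with $(f,f',f'')$. The class `Jet2`, the helpers `jexp` and `variable`, and the sample term are hypothetical names chosen for illustration; the derivative rules used for the basic operations are the standard Leibniz and chain rules, assumed here rather than taken from the definitions in this paper.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Jet2:
    """Value, first and second derivative of a smooth function R -> R at a point."""
    val: float
    d1: float
    d2: float

    def __mul__(self, other: "Jet2") -> "Jet2":
        # Leibniz rule up to order 2: (uv)'' = u''v + 2u'v' + uv''.
        return Jet2(
            self.val * other.val,
            self.d1 * other.val + self.val * other.d1,
            self.d2 * other.val + 2.0 * self.d1 * other.d1 + self.val * other.d2,
        )


def jexp(a: Jet2) -> Jet2:
    # Chain rule up to order 2: (exp u)'' = exp(u) * (u'^2 + u'').
    e = math.exp(a.val)
    return Jet2(e, e * a.d1, e * (a.d1 ** 2 + a.d2))


def variable(x: float) -> Jet2:
    # The identity plot x |-> x is related to (x, 1, 0).
    return Jet2(x, 1.0, 0.0)


def term(x: Jet2) -> Jet2:
    """Hypothetical sample term t(x) = x * exp(x)."""
    return x * jexp(x)


jet = term(variable(1.0))
```

Since the input jet is related to the identity plot, the fundamental lemma predicts that `jet` holds $(t(1),t'(1),t''(1))=(e,2e,3e)$ for this sample term.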

5. Extending the language: variant and inductive types

In this section, we show that the definition of forward AD and the semantics generalize if we extend the language of Section 3 with variants and inductive types. As an example of inductive types, we consider lists. This specific choice is only for expository purposes, and the whole development works at the level of generality of arbitrary algebraic data types generated as initial algebras of (polynomial) type constructors formed by finite products and variants. These types are easily interpreted in the category of diffeological spaces in much the same way. The categorically minded reader may regard this as a consequence of $\mathbf{Diff}$ being a concrete Grothendieck quasitopos, e.g. [BH11], and hence complete and cocomplete.

5.1. Language.

We additionally consider the following types and terms:

\[
\begin{array}{llll}
\tau,\sigma,\rho & ::= & \dots & \text{types}\\
 & \mathrel{\lvert} & \{\mathsf{\ell_1}\,\tau_{1}\mathrel{\big\lvert}\ldots\mathrel{\big\lvert}\mathsf{\ell_n}\,\tau_{n}\} & \text{variant}\\
 & \mathrel{\lvert} & \mathbf{list}(\tau) & \text{list}\\[6pt]
t,s,r & ::= & \dots & \text{terms}\\
 & \mathrel{\lvert} & \tau.\ell\,t & \text{variant constructor}\\
 & \mathrel{\lvert} & [\,]\ \mathrel{\lvert}\ t::s & \text{empty list and cons}\\
 & \mathrel{\lvert} & \mathbf{case}\,t\,\mathbf{of}\,\{\mathsf{\ell_1}\,x_{1}\to s_{1}\mathrel{\big\lvert}\cdots\mathrel{\big\lvert}\mathsf{\ell_n}\,x_{n}\to s_{n}\} & \text{pattern matching: variants}\\
 & \mathrel{\lvert} & \mathbf{fold}\,(x_{1},x_{2}).t\,\mathbf{over}\,s\,\mathbf{from}\,r & \text{list fold}
\end{array}
\]

We extend the type system according to the rules of Fig. 4.

\[
\begin{array}{@{}c@{}}
\inferrule{\Gamma\vdash t:\tau_{i}}{\Gamma\vdash\tau.\ell_{i}\,t:\tau}((\mathsf{\ell_i}\,\tau_{i})\in\tau)\quad
\inferrule{~}{\Gamma\vdash[\,]:\mathbf{list}(\tau)}\quad
\inferrule{\Gamma\vdash t:\tau\\ \Gamma\vdash s:\mathbf{list}(\tau)}{\Gamma\vdash t::s:\mathbf{list}(\tau)}\\
\\
\inferrule{\Gamma\vdash t:\{\mathsf{\ell_1}\,\tau_{1}\mathrel{\big\lvert}\ldots\mathrel{\big\lvert}\mathsf{\ell_n}\,\tau_{n}\}\\ \text{for each $1\leq i\leq n$: }\Gamma,x_{i}:\tau_{i}\vdash s_{i}:\tau}{\Gamma\vdash\mathbf{case}\,t\,\mathbf{of}\,\{\mathsf{\ell_1}\,x_{1}\to s_{1}\mathrel{\big\lvert}\cdots\mathrel{\big\lvert}\mathsf{\ell_n}\,x_{n}\to s_{n}\}:\tau}\\
\\
\inferrule{\Gamma\vdash s:\mathbf{list}(\tau)\\ \Gamma\vdash r:\sigma\\ \Gamma,x_{1}:\tau,x_{2}:\sigma\vdash t:\sigma}{\Gamma\vdash\mathbf{fold}\,(x_{1},x_{2}).t\,\mathbf{over}\,s\,\mathbf{from}\,r:\sigma}
\end{array}
\]

Figure 4. Additional typing rules for the extended language.

We can then extend $\overrightarrow{\mathcal{D}}_{(k,R)}$ (again, writing it as $\overrightarrow{\mathcal{D}}$, for legibility) to our new types and terms by

\begin{align*}
&\overrightarrow{\mathcal{D}}(\{\mathsf{\ell_1}\,\tau_{1}\mathrel{\big\lvert}\ldots\mathrel{\big\lvert}\mathsf{\ell_n}\,\tau_{n}\})\stackrel{\mathrm{def}}{=}\{\mathsf{\ell_1}\,\overrightarrow{\mathcal{D}}(\tau_{1})\mathrel{\big\lvert}\ldots\mathrel{\big\lvert}\mathsf{\ell_n}\,\overrightarrow{\mathcal{D}}(\tau_{n})\}\\
&\overrightarrow{\mathcal{D}}(\mathbf{list}(\tau))\stackrel{\mathrm{def}}{=}\mathbf{list}(\overrightarrow{\mathcal{D}}(\tau))\\
&\overrightarrow{\mathcal{D}}(\tau.\ell\,t)\stackrel{\mathrm{def}}{=}\overrightarrow{\mathcal{D}}(\tau).\ell\,\overrightarrow{\mathcal{D}}(t)\\
&\overrightarrow{\mathcal{D}}([\,])\stackrel{\mathrm{def}}{=}[\,]\\
&\overrightarrow{\mathcal{D}}(t::s)\stackrel{\mathrm{def}}{=}\overrightarrow{\mathcal{D}}(t)::\overrightarrow{\mathcal{D}}(s)\\
&\overrightarrow{\mathcal{D}}(\mathbf{case}\,t\,\mathbf{of}\,\{\mathsf{\ell_1}\,x_{1}\to s_{1}\mathrel{\big\lvert}\cdots\mathrel{\big\lvert}\mathsf{\ell_n}\,x_{n}\to s_{n}\})\stackrel{\mathrm{def}}{=}\\
&\quad\mathbf{case}\,\overrightarrow{\mathcal{D}}(t)\,\mathbf{of}\,\{\mathsf{\ell_1}\,x_{1}\to\overrightarrow{\mathcal{D}}(s_{1})\mathrel{\big\lvert}\cdots\mathrel{\big\lvert}\mathsf{\ell_n}\,x_{n}\to\overrightarrow{\mathcal{D}}(s_{n})\}\\
&\overrightarrow{\mathcal{D}}(\mathbf{fold}\,(x_{1},x_{2}).t\,\mathbf{over}\,s\,\mathbf{from}\,r)\stackrel{\mathrm{def}}{=}\mathbf{fold}\,(x_{1},x_{2}).\overrightarrow{\mathcal{D}}(t)\,\mathbf{over}\,\overrightarrow{\mathcal{D}}(s)\,\mathbf{from}\,\overrightarrow{\mathcal{D}}(r)
\end{align*}

To demonstrate the practical use of expressive type systems for differential programming, we consider the following two examples.

Example (Lists of inputs for neural nets). Usually, we run a neural network on a large data set, the size of which might be determined at runtime. To evaluate a neural network on multiple inputs, in practice, one often sums the outcomes. This can be coded in our extended language as follows. Suppose that we have a network $f:\boldsymbol{(}\mathbf{real}^{n}\boldsymbol{\mathop{*}}P\boldsymbol{)}\to\mathbf{real}$ that operates on single input vectors. We can construct one that operates on lists of inputs as follows:

\[
g\stackrel{\mathrm{def}}{=}\lambda\langle l,w\rangle.\mathbf{fold}\,(x_{1},x_{2}).f\langle x_{1},w\rangle+x_{2}\,\mathbf{over}\,l\,\mathbf{from}\,\underline{0}:\boldsymbol{(}\mathbf{list}(\mathbf{real}^{n})\boldsymbol{\mathop{*}}P\boldsymbol{)}\to\mathbf{real}
\]
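The construction of $g$ can be sketched in Python with first-order dual numbers. Everything named here is a hypothetical illustration: the `Dual` class, the two-weight linear "network" `f`, and the helper `fold`, which is assumed to be a right fold matching the typing rule for $\mathbf{fold}$ (element first, accumulator second). Because the macro maps a fold to a fold, differentiating `g` amounts to running the same fold over dual numbers.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


@dataclass(frozen=True)
class Dual:
    """A primal value paired with one first-order tangent."""
    val: float
    tan: float = 0.0

    def __add__(self, other: "Dual") -> "Dual":
        return Dual(self.val + other.val, self.tan + other.tan)

    def __mul__(self, other: "Dual") -> "Dual":
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)


def fold(step: Callable[[A, B], B], xs: Sequence[A], init: B) -> B:
    # Assumed right fold: step(x, fold of the tail), starting from init.
    acc = init
    for x in reversed(xs):
        acc = step(x, acc)
    return acc


def f(x: Tuple[Dual, Dual], w: Tuple[Dual, Dual]) -> Dual:
    """Hypothetical network on single inputs (n = 2, two weights)."""
    return w[0] * x[0] + w[1] * x[1]


def g(l: Sequence[Tuple[Dual, Dual]], w: Tuple[Dual, Dual]) -> Dual:
    # g = fold (x1, x2). f<x1, w> + x2 over l from 0
    return fold(lambda x1, x2: f(x1, w) + x2, l, Dual(0.0))


# Data points carry zero tangents; the first weight is seeded with tangent 1.
data = [(Dual(1.0), Dual(2.0)), (Dual(3.0), Dual(4.0))]
weights = (Dual(0.5, 1.0), Dual(-1.0))
out = g(data, weights)
```

With this seeding, `out.val` is the summed network output and `out.tan` is its derivative with respect to the first weight; the list length is only known at runtime, as in the example above.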
Example (Missing data). In practically every application of statistics and machine learning, we face the problem of missing data: for some observations, only partial information is available.

In an expressive typed programming language such as the one we consider, we can model missing data conveniently using the data type $\mathbf{maybe}(\tau)=\{\mathsf{Nothing}\,\boldsymbol{(}\,\boldsymbol{)}\mathrel{\big\lvert}\mathsf{Just}\,\tau\}$. In the context of a neural network, one might use it as follows. First, define some helper functions

\begin{align*}
&\mathrm{fromMaybe}\stackrel{\mathrm{def}}{=}\lambda x.\lambda m.\mathbf{case}\,m\,\mathbf{of}\,\{\mathsf{Nothing}\,\_\to x\mathrel{\big\lvert}\mathsf{Just}\,x'\to x'\}\\
&\mathrm{fromMaybe}^{n}\stackrel{\mathrm{def}}{=}\lambda\langle x_{1},\ldots,x_{n}\rangle.\lambda\langle m_{1},\ldots,m_{n}\rangle.\langle\mathrm{fromMaybe}\,x_{1}\,m_{1},\ldots,\mathrm{fromMaybe}\,x_{n}\,m_{n}\rangle\\
&\qquad\qquad:(\mathbf{maybe}(\mathbf{real}))^{n}\to\mathbf{real}^{n}\to\mathbf{real}^{n}\\
&\mathrm{map}\stackrel{\mathrm{def}}{=}\lambda f.\lambda l.\mathbf{fold}\,(x_{1},x_{2}).f\,x_{1}::x_{2}\,\mathbf{over}\,l\,\mathbf{from}\,[\,]:(\tau\to\sigma)\to\mathbf{list}(\tau)\to\mathbf{list}(\sigma)
\end{align*}

Given a neural network $f:\boldsymbol{(}\mathbf{list}(\mathbf{real}^{k})\boldsymbol{\mathop{*}}P\boldsymbol{)}\to\mathbf{real}$, we can build a new one that operates on a data set for which some covariates (features) are missing, by passing in default values to replace the missing covariates:

\[
\lambda\langle l,\langle m,w\rangle\rangle.f\langle\mathrm{map}\,(\mathrm{fromMaybe}^{k}\,m)\,l,w\rangle:\boldsymbol{(}\mathbf{list}((\mathbf{maybe}(\mathbf{real}))^{k})\boldsymbol{\mathop{*}}\boldsymbol{(}\mathbf{real}^{k}\boldsymbol{\mathop{*}}P\boldsymbol{)}\boldsymbol{)}\to\mathbf{real}
\]

Then, given a data set $l$ with missing covariates, we can perform automatic differentiation on this network to optimize, simultaneously, the ordinary network parameters $w$ and the default values for missing covariates $m$.
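A minimal Python sketch of this example follows, with `None` standing in for $\mathsf{Nothing}$. The `Dual` class, the linear `network`, the helper `lift` and the sample data are all hypothetical and chosen only for illustration; `from_maybe` follows the definition of $\mathrm{fromMaybe}$ above, taking the default first and the optional value second.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Dual:
    """A primal value paired with one first-order tangent."""
    val: float
    tan: float = 0.0

    def __add__(self, other: "Dual") -> "Dual":
        return Dual(self.val + other.val, self.tan + other.tan)

    def __mul__(self, other: "Dual") -> "Dual":
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)


def from_maybe(default: Dual, m: Optional[Dual]) -> Dual:
    # case m of {Nothing _ -> default | Just x' -> x'}
    return default if m is None else m


def from_maybe_k(defaults: Sequence[Dual],
                 row: Sequence[Optional[Dual]]) -> Tuple[Dual, ...]:
    # Componentwise fromMaybe on a k-tuple of optional covariates.
    return tuple(from_maybe(d, m) for d, m in zip(defaults, row))


def network(rows: Sequence[Sequence[Dual]], w: Sequence[Dual]) -> Dual:
    """Hypothetical network: sum over rows of the dot product with w."""
    total = Dual(0.0)
    for row in rows:
        for wi, xi in zip(w, row):
            total = total + wi * xi
    return total


def with_defaults(rows, defaults, w) -> Dual:
    # f <map (fromMaybe^k m) l, w>
    return network([from_maybe_k(defaults, r) for r in rows], w)


def lift(row: Sequence[Optional[float]]) -> Tuple[Optional[Dual], ...]:
    # Observed covariates are constants, so they carry zero tangent.
    return tuple(None if v is None else Dual(v) for v in row)


data: List[Tuple[Optional[Dual], ...]] = [
    lift(r) for r in [(1.0, None), (None, 4.0), (None, None)]
]
defaults = (Dual(10.0, 1.0), Dual(20.0))  # seed: d/d(first default value)
w = (Dual(2.0), Dual(3.0))
out = with_defaults(data, defaults, w)
```

Seeding a weight instead of a default value yields the derivative with respect to that network parameter, so the same program supports optimizing $w$ and $m$ together.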

5.2. Semantics.

In Section 4 we gave a denotational semantics for the simple language in diffeological spaces. This extends to the language in this section, as follows. As before, each type $\tau$ is interpreted as a diffeological space, which is a set equipped with a family of plots:

  • A variant type $\{\mathsf{\ell_1}\,\tau_{1}\mathrel{\big\lvert}\ldots\mathrel{\big\lvert}\mathsf{\ell_n}\,\tau_{n}\}$ is inductively interpreted as the disjoint union of the semantic spaces, $\llbracket\{\mathsf{\ell_1}\,\tau_{1}\mathrel{\big\lvert}\dots\mathrel{\big\lvert}\mathsf{\ell_n}\,\tau_{n}\}\rrbracket\stackrel{\mathrm{def}}{=}\biguplus_{i=1}^{n}\llbracket\tau_{i}\rrbracket$, with $U$-plots

    \[
    \mathcal{P}_{\llbracket\{\mathsf{\ell_1}\,\tau_{1}\mathrel{\big\lvert}\ldots\mathrel{\big\lvert}\mathsf{\ell_n}\,\tau_{n}\}\rrbracket}^{U}\stackrel{\mathrm{def}}{=}\left\{\left.\left[U_{j}\xrightarrow{f_{j}}\llbracket\tau_{j}\rrbracket\to\biguplus_{i=1}^{n}\llbracket\tau_{i}\rrbracket\right]_{j=1}^{n}~\right|~U=\biguplus_{j=1}^{n}U_{j},\;f_{j}\in\mathcal{P}_{\llbracket\tau_{j}\rrbracket}^{U_{j}}\right\}.
    \]
  • A list type $\mathbf{list}(\tau)$ is interpreted as the union of the sets of length $i$ tuples for all natural numbers $i$, $\llbracket\mathbf{list}(\tau)\rrbracket\stackrel{\mathrm{def}}{=}\biguplus_{i=0}^{\infty}\llbracket\tau\rrbracket^{i}$, with $U$-plots

    \[
    \mathcal{P}_{\llbracket\mathbf{list}(\tau)\rrbracket}^{U}\stackrel{\mathrm{def}}{=}\left\{\left.\left[U_{j}\xrightarrow{f_{j}}\llbracket\tau\rrbracket^{j}\to\biguplus_{i=0}^{\infty}\llbracket\tau\rrbracket^{i}\right]_{j=0}^{\infty}~\right|~U=\biguplus_{j=0}^{\infty}U_{j},\;f_{j}\in\mathcal{P}_{\llbracket\tau\rrbracket^{j}}^{U_{j}}\right\}.
    \]

The constructors and destructors for variants and lists are interpreted as in the usual set theoretic semantics.

It is routine to show inductively that these interpretations are smooth. Thus every term $\Gamma \vdash t : \tau$ in the extended language is interpreted as a smooth function $\llbracket t\rrbracket : \llbracket\Gamma\rrbracket \to \llbracket\tau\rrbracket$ between diffeological spaces. List objects as initial algebras are computed as usual in a cocomplete category (e.g. [JR11]). More generally, the interpretation for algebraic data types follows exactly the usual categorical semantics of variant types and inductive types (e.g. [Pit95]).

6. Categorical analysis of (higher order) forward AD and its correctness

This section has three parts. First, we give a categorical account of the functoriality of AD (Ex. 6.1). Then we introduce our gluing construction, and relate it to the correctness of AD (dgm. (4)). Finally, we state and prove a correctness theorem for all first order types by considering a category of manifolds (Th. 8).

6.1. Syntactic categories.

The key contribution of this subsection is that the AD macro translation (Section 3.2) has a canonical status as a unique functor between categories with structure. To this end, we build a syntactic category $\mathbf{Syn}$ from our language, which has the property of a free category with certain structure. This means that for any category $\mathcal{C}$ with this structure, there is a unique structure-preserving functor $\mathbf{Syn} \to \mathcal{C}$, which is an interpretation of our language in that category. Generally speaking, this is the categorical view of denotational semantics (e.g. [Pit95]). But in this particular setting, the category $\mathbf{Syn}$ itself admits alternative forms of this structure, given by the dual numbers interpretation, the triple numbers interpretation etc. of Section 2. This gives canonical functors $\mathbf{Syn} \to \mathbf{Syn}$ translating the language into itself, which are the AD macro translations (Section 3.2). A key point is that $\mathbf{Syn}$ is almost entirely determined by universal properties (for example, cartesian closure for the function space); the only freedom is in the choice of interpretation of

  1. the real numbers $\mathbf{real}$, which can be taken as the plain type $\mathbf{real}$, or as the dual numbers interpretation $\mathbf{real} \ast \mathbf{real}$ etc.;

  2. the primitive operations $\mathsf{op}$, which can be taken as the operation $\mathsf{op}$ itself, or as the derivative of the operation etc.

\begin{align*}
\mathbf{case}\,\langle t_1,\ldots,t_n\rangle\,\mathbf{of}\,\langle x_1,\ldots,x_n\rangle \to s &= s[^{t_1}\!/\!_{x_1},\ldots,^{t_n}\!/\!_{x_n}] \\
s[^{t}\!/\!_{y}] &\stackrel{\#x_1,\ldots,x_n}{=} \mathbf{case}\,t\,\mathbf{of}\,\langle x_1,\ldots,x_n\rangle \to s[^{\langle x_1,\ldots,x_n\rangle}\!/\!_{y}] \\
\mathbf{case}\,\ell_i\,t\,\mathbf{of}\,\{\ell_1\,x_1 \to s_1 \mid \cdots \mid \ell_n\,x_n \to s_n\} &= s_i[^{t}\!/\!_{x_i}] \\
s[^{t}\!/\!_{y}] &\stackrel{\#x_1,\ldots,x_n}{=} \mathbf{case}\,t\,\mathbf{of}\,\{\ell_1\,x_1 \to s[^{\ell_1\,x_1}\!/\!_{y}] \mid \cdots \mid \ell_n\,x_n \to s[^{\ell_n\,x_n}\!/\!_{y}]\} \\
\mathbf{fold}\,(x_1,x_2).t\,\mathbf{over}\,[\,]\,\mathbf{from}\,r &= r \\
\mathbf{fold}\,(x_1,x_2).t\,\mathbf{over}\,s_1 :: s_2\,\mathbf{from}\,r &= t[^{s_1}\!/\!_{x_1},^{\mathbf{fold}\,(x_1,x_2).t\,\mathbf{over}\,s_2\,\mathbf{from}\,r}\!/\!_{x_2}] \\
u = s[^{[\,]}\!/\!_{y}],\; r[^{s}\!/\!_{x_2}] = s[^{x_1 :: y}\!/\!_{y}] \;\Rightarrow\; s[^{t}\!/\!_{y}] &\stackrel{\#x_1,x_2}{=} \mathbf{fold}\,(x_1,x_2).r\,\mathbf{over}\,t\,\mathbf{from}\,u \\
(\lambda x.t)\,s &= t[^{s}\!/\!_{x}] \\
t &\stackrel{\#x}{=} \lambda x.t\,x
\end{align*}

We write $\stackrel{\#x_1,\ldots,x_n}{=}$ to indicate that the variables $x_1,\ldots,x_n$ are fresh in the left hand side, that is, they do not occur free there.

Figure 5. Standard $\beta\eta$-laws (e.g. [Pit95]) for products, functions, variants and lists.
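To make the two $\beta$-laws for $\mathbf{fold}$ in Fig. 5 concrete, here is a minimal Python sketch that reads them as a recursive definition of a right fold. The function name `fold` and its argument order are illustrative assumptions and not part of the paper's language; the callback `step(x1, x2)` stands in for the body $t$ with free variables $x_1, x_2$.

```python
from typing import Callable, Sequence, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def fold(step: Callable[[A, B], B], xs: Sequence[A], init: B) -> B:
    """Right fold that mirrors the two beta-laws for `fold` (illustrative sketch).

    `step(x1, x2)` plays the role of the body t with free variables x1 and x2.
    """
    if len(xs) == 0:
        # fold (x1,x2).t over [] from r  =  r
        return init
    head, tail = xs[0], xs[1:]
    # fold (x1,x2).t over s1::s2 from r  =  t[s1/x1, (fold (x1,x2).t over s2 from r)/x2]
    return step(head, fold(step, tail, init))


# Summing a list: the accumulator is the result of folding the tail.
total = fold(lambda x, acc: x + acc, [1.0, 2.0, 3.0], 0.0)

# Rebuilding a list with cons shows that the fold associates to the right.
rebuilt = fold(lambda x, acc: [x] + acc, [1, 2, 3], [])
```

The recursion follows the shape of the list, which is why the second law substitutes the fold over the tail $s_2$ for $x_2$.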

In more detail, our language induces a syntactic category as follows.

Definition. Let $\mathbf{Syn}$ be the category whose objects are types, and where a morphism $\tau \to \sigma$ is a term in context $x : \tau \vdash t : \sigma$ modulo the $\beta\eta$-laws (Fig. 5). Composition is by substitution.

For simplicity, we do not impose identities involving the primitive operations, such as the arithmetic identity $x + y = y + x$ in $\mathbf{Syn}$. As is standard, this category has the following universal property.

Lemma 4 (e.g. [Pit95]).

For every bicartesian closed category $\mathcal{C}$ with list objects, and every choice of an object $F(\mathbf{real}) \in \mathcal{C}$ and morphisms $F(\mathsf{op}) \in \mathcal{C}(F(\mathbf{real})^{n}, F(\mathbf{real}))$ for all $\mathsf{op} \in \mathsf{Op}_n$ and $n \in \mathbb{N}$, in $\mathcal{C}$, there is a unique functor $F : \mathbf{Syn} \to \mathcal{C}$ respecting the interpretation and preserving the bicartesian closed structure as well as list objects.

Proof 6.1 (Proof notes).

The functor $F : \mathbf{Syn} \to \mathcal{C}$ is a canonical denotational semantics for the language, interpreting types as objects of $\mathcal{C}$ and terms as morphisms. For instance, $F(\tau \to \sigma) \stackrel{\mathrm{def}}{=} (F\tau \to F\sigma)$, the function space in the category $\mathcal{C}$, and $F(t\,s)$ is the composite $(Ft, Fs); \mathit{eval}$.

When $\mathcal{C} = \mathbf{Diff}$, the denotational semantics of the language in diffeological spaces (Sections 4 and 5.2) can be understood as the unique structure preserving functor $\llbracket - \rrbracket : \mathbf{Syn} \to \mathbf{Diff}$ satisfying $\llbracket\mathbf{real}\rrbracket = \mathbb{R}$, $\llbracket\varsigma\rrbracket = \varsigma$ and so on.

Example (Canonical definition of forward AD). The forward AD macro $\overrightarrow{\mathcal{D}}_{(k,R)}$ (Sections 3 and 5.1) arises as a canonical bicartesian closed functor on $\mathbf{Syn}$ that preserves list objects. Consider the unique bicartesian closed functor $F : \mathbf{Syn} \to \mathbf{Syn}$ that preserves list objects such that $F(\mathbf{real}) = \mathbf{real}^{\binom{R+k}{k}}$ and

\[F(\mathsf{op}) = z : \boldsymbol{(}F(\mathbf{real}) \boldsymbol{*} \cdots \boldsymbol{*} F(\mathbf{real})\boldsymbol{)} \vdash \mathbf{case}\,z\,\mathbf{of}\,\langle x_1,\ldots,x_n\rangle \to \overrightarrow{\mathcal{D}}_{(k,R)}(\mathsf{op}(x_1,\ldots,x_n)) : F(\mathbf{real}).\]

Then for any type $\tau$, $F(\tau) = \overrightarrow{\mathcal{D}}_{(k,R)}(\tau)$, and for any term $x : \tau \vdash t : \sigma$, $F(t) = \overrightarrow{\mathcal{D}}_{(k,R)}(t)$ as morphisms $F(\tau) \to F(\sigma)$ in the syntactic category.

This observation is a categorical counterpart to Lemma 1.
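The idea that an interpretation is fixed once we choose what $\mathbf{real}$ and the operations $\mathsf{op}$ mean can be sketched with a small evaluator. This is only an illustration under stated assumptions: the tuple encoding of terms, the function `interp` and the two dictionaries `STANDARD` and `DUAL` are hypothetical names, the evaluator works semantically rather than as a source-to-source macro, and only the dual numbers case $(k,R) = (1,1)$ is shown.

```python
import math

# Hypothetical term encoding (not the paper's syntax):
#   ("var", x) | ("const", c) | ("op", name, [args]) | ("lam", x, body) | ("app", f, a)


def interp(term, env, alg):
    """Evaluate a term; everything except constants and ops is handled uniformly."""
    tag = term[0]
    if tag == "var":
        return env[term[1]]
    if tag == "const":
        return alg["const"](term[1])
    if tag == "op":
        args = [interp(a, env, alg) for a in term[2]]
        return alg["ops"][term[1]](*args)
    if tag == "lam":
        _, x, body = term
        return lambda v: interp(body, {**env, x: v}, alg)
    if tag == "app":
        return interp(term[1], env, alg)(interp(term[2], env, alg))
    raise ValueError(f"unknown term: {tag}")


# Choice 1: reals are floats, operations are themselves.
STANDARD = {
    "const": float,
    "ops": {"add": lambda a, b: a + b, "mul": lambda a, b: a * b, "sin": math.sin},
}

# Choice 2: reals are dual numbers (value, derivative), operations carry their derivatives.
DUAL = {
    "const": lambda c: (float(c), 0.0),
    "ops": {
        "add": lambda a, b: (a[0] + b[0], a[1] + b[1]),
        "mul": lambda a, b: (a[0] * b[0], a[0] * b[1] + a[1] * b[0]),
        "sin": lambda a: (math.sin(a[0]), math.cos(a[0]) * a[1]),
    },
}

x = ("var", "x")
square = ("lam", "y", ("op", "mul", [("var", "y"), ("var", "y")]))
# (lambda f. f (x*x + x)) (lambda y. y*y): a function of x that uses a higher order helper.
term = ("app", ("lam", "f", ("app", ("var", "f"), ("op", "add", [("op", "mul", [x, x]), x]))), square)

value = interp(term, {"x": 3.0}, STANDARD)
value_and_derivative = interp(term, {"x": (3.0, 1.0)}, DUAL)
```

Here the term denotes $x \mapsto (x^2 + x)^2$, so at $x = 3$ the value is $144$ and the derivative $2(x^2+x)(2x+1)$ is $168$. The cases for variables, abstraction and application are shared by both interpretations, which mirrors the fact that the functor is determined by its action on $\mathbf{real}$ and $\mathsf{op}$.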

6.2. Categorical gluing and logical relations.

Gluing is a method for building new categorical models which has been used for many purposes, including logical relations and realizability [MS92]. Our logical relations argument in the proof of Theorem 3 can be understood in this setting. (In fact we originally found the proof of Theorem 3 in this way.) In this subsection, for the categorically minded, we explain this, and in doing so we quickly recover a correctness result for the more general language in Section 5 and for arbitrary first order types.

The general, established idea of categorical logical relations starts from the observation that logical relations are defined by induction on the structure of types. Types have universal properties in a categorical semantics (e.g. cartesian closure for the function space), and so we can organize the logical relations argument by defining some category $\mathcal{C}$ of relations and observing that it has the requisite categorical structure. The interpretation of types as relations can then be understood as coming from a unique structure preserving map $\mathbf{Syn} \to \mathcal{C}$. In this paper, our logical relations are not quite as simple as a binary relation on sets; rather they are relations between plots. Nonetheless, this still forms a category with the appropriate structure, which follows because it can still be regarded as arising from a gluing construction, as we now explain.

We define a category $\mathbf{Gl}_k$ whose objects are triples $(X, X', S)$ where $X$ and $X'$ are diffeological spaces and $S \subseteq \mathcal{P}_{X}^{\mathbb{R}^k} \times \mathcal{P}_{X'}^{\mathbb{R}^k}$ is a relation between their $k$-dimensional plots. A morphism $(X, X', S) \to (Y, Y', T)$ is a pair of smooth functions $f \colon X \to Y$, $f' \colon X' \to Y'$, such that if $(g, g') \in S$ then $(g; f, g'; f') \in T$.
The idea is that this is a semantic domain where we can simultaneously interpret the language and its automatic derivatives.
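The morphism condition can be probed numerically in the simplest case $k = 1$, $R = 1$, where a plot $g$ of $\mathbb{R}$ is related to the pair $(g, g')$. The sketch below is an assumption-laden illustration and not a proof: the helper `in_S_real` is a hypothetical name, membership in the relation is only tested at a few sample points, and the derivative is approximated by a central finite difference.

```python
import math


def in_S_real(g, g_lifted, ts, h=1e-5, tol=1e-6):
    """Approximate test that (g, g_lifted) is a related pair for k = 1, R = 1.

    g_lifted(t) should equal (g(t), g'(t)); g' is estimated by a central difference.
    """
    for t in ts:
        value, deriv = g_lifted(t)
        approx = (g(t + h) - g(t - h)) / (2 * h)
        if abs(value - g(t)) > tol or abs(deriv - approx) > tol:
            return False
    return True


# Candidate morphism (f, f_lifted): sine together with its tangent map.
f = math.sin


def f_lifted(p):
    x, dx = p
    return (math.sin(x), math.cos(x) * dx)


# A related pair of plots: g(t) = t^2 with (g(t), g'(t)) = (t^2, 2t).
def g(t):
    return t * t


def g_lifted(t):
    return (t * t, 2 * t)


samples = [-1.0, 0.0, 0.7, 2.0]
related_before = in_S_real(g, g_lifted, samples)
# Post-composing with (f, f_lifted) should again give a related pair.
related_after = in_S_real(lambda t: f(g(t)), lambda t: f_lifted(g_lifted(t)), samples)

# A lift that forgets the chain rule factor cos(x) should fail the check.
def bad_lifted(p):
    return (math.sin(p[0]), p[1])


related_bad = in_S_real(lambda t: f(g(t)), lambda t: bad_lifted(g_lifted(t)), samples)
```

In this special case the morphism condition is exactly the chain rule: the second component of $g'; f'$ must be the derivative of $g; f$.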

Proposition 5.

The category $\mathbf{Gl}_k$ is bicartesian closed, has list objects, and the projection functor $\mathrm{proj} : \mathbf{Gl}_k \to \mathbf{Diff} \times \mathbf{Diff}$ preserves this structure.

Proof 6.2 (Proof notes).

The category $\mathbf{Gl}_k$ is a full subcategory of the comma category $\mathrm{id}_{\mathbf{Set}} \downarrow \mathbf{Diff}(\mathbb{R}^k, -) \times \mathbf{Diff}(\mathbb{R}^k, -)$. The result thus follows by the general theory of categorical gluing (e.g. [JLS07, Lemma 15]).

We give a semantics $\llparenthesis - \rrparenthesis = (\llparenthesis - \rrparenthesis_0, \llparenthesis - \rrparenthesis_1, S_{-})$ for the language in $\mathbf{Gl}_k$, interpreting types $\tau$ as objects $(\llparenthesis\tau\rrparenthesis_0, \llparenthesis\tau\rrparenthesis_1, S_{\tau})$, and terms as morphisms. We let $\llparenthesis\mathbf{real}\rrparenthesis_0 \stackrel{\mathrm{def}}{=} \mathbb{R}$ and $\llparenthesis\mathbf{real}\rrparenthesis_1 \stackrel{\mathrm{def}}{=} \mathbb{R}^{\binom{R+k}{k}}$, with the relation

\[S_{\mathbf{real}} \stackrel{\mathrm{def}}{=} \left\{ \left(f, \left(\frac{\partial^{\alpha_1+\ldots+\alpha_k} f(x)}{\partial x_1^{\alpha_1} \cdots \partial x_k^{\alpha_k}}\right)_{(\alpha_1,\ldots,\alpha_k)=(0,\ldots,0)}^{(R,0,\ldots,0)}\right) \;\middle|\; f : \mathbb{R}^k \to \mathbb{R} \text{ smooth} \right\}.\]
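As a quick sanity check on the dimension $\binom{R+k}{k}$, the following Python sketch enumerates the multi-indices $(\alpha_1,\ldots,\alpha_k)$ with $\alpha_1 + \ldots + \alpha_k \le R$. The function name `multi_indices` is hypothetical, and the lexicographic ordering from $(0,\ldots,0)$ to $(R,0,\ldots,0)$ is an assumption read off from the index range in the definition of $S_{\mathbf{real}}$.

```python
from itertools import product
from math import comb


def multi_indices(k, R):
    """All (alpha_1, ..., alpha_k) with alpha_1 + ... + alpha_k <= R, in lexicographic order."""
    return [alpha for alpha in product(range(R + 1), repeat=k) if sum(alpha) <= R]


# For k = 2 variables and order R = 2 there are comb(4, 2) = 6 partial derivatives.
indices_2_2 = multi_indices(2, 2)
count_matches = len(indices_2_2) == comb(2 + 2, 2)
```

For $k = 2$ and $R = 2$ this produces the six indices $00, 01, 02, 10, 11, 20$, with first entry $(0,0)$ and last entry $(2,0)$.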

We interpret the operations $\mathsf{op}$ according to $\llbracket\mathsf{op}\rrbracket$ in $\llparenthesis - \rrparenthesis_0$, but according to the $(k,R)$-Taylor representation of $\llbracket\mathsf{op}\rrbracket$ in $\llparenthesis - \rrparenthesis_1$. For instance, when $k = 2$ and $R = 2$, we have $\binom{R+k}{k} = 6$ and $\llparenthesis * \rrparenthesis_1 : \mathbb{R}^6 \times \mathbb{R}^6 \to \mathbb{R}^6$ is

\begin{align*}
\llparenthesis * \rrparenthesis_1 &((x_{00}, x_{01}, x_{02}, x_{10}, x_{11}, x_{20}), (y_{00}, y_{01}, y_{02}, y_{10}, y_{11}, y_{20})) \stackrel{\mathrm{def}}{=} \\
(&x_{00}y_{00}, \\
&x_{00}y_{01} + x_{01}y_{00}, \\
&x_{02}y_{00} + 2x_{01}y_{01} + x_{00}y_{02}, \\
&x_{00}y_{10} + x_{10}y_{00}, \\
&x_{11}y_{00} + x_{01}y_{10} + x_{10}y_{01} + x_{00}y_{11}, \\
&x_{20}y_{00} + 2x_{10}y_{10} + x_{00}y_{20})\text{.}
\end{align*}
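The six components above are instances of the general Leibniz rule for partial derivatives of a product. The Python sketch below implements that rule for arbitrary $(k, R)$ and checks it on one hand-computed example. The names `multi_indices` and `taylor_mul`, and the representation of a tuple of derivatives as a dictionary keyed by multi-index, are illustrative assumptions rather than the paper's definitions.

```python
from itertools import product
from math import comb


def multi_indices(k, R):
    """All multi-indices of length k with total order at most R, in lexicographic order."""
    return [alpha for alpha in product(range(R + 1), repeat=k) if sum(alpha) <= R]


def taylor_mul(x, y, k, R):
    """Derivatives of a product from the derivatives of its factors (Leibniz rule).

    x and y map each multi-index alpha to the corresponding partial derivative.
    """
    out = {}
    for alpha in multi_indices(k, R):
        total = 0
        # Sum over all beta <= alpha componentwise.
        for beta in product(*(range(a + 1) for a in alpha)):
            coeff = 1
            for a, b in zip(alpha, beta):
                coeff *= comb(a, b)
            gamma = tuple(a - b for a, b in zip(alpha, beta))
            total += coeff * x[beta] * y[gamma]
        out[alpha] = total
    return out


# f(x1, x2) = x1 * x2 and g(x1, x2) = x1 + x2 at the point (2, 3); derivatives computed by hand.
f_jet = {(0, 0): 6, (0, 1): 2, (0, 2): 0, (1, 0): 3, (1, 1): 1, (2, 0): 0}
g_jet = {(0, 0): 5, (0, 1): 1, (0, 2): 0, (1, 0): 1, (1, 1): 0, (2, 0): 0}
fg_jet = taylor_mul(f_jet, g_jet, 2, 2)
```

Here $f \cdot g = x_1^2 x_2 + x_1 x_2^2$, whose value and partial derivatives up to order 2 at $(2, 3)$ are $30, 16, 4, 21, 10, 6$ in the index order $00, 01, 02, 10, 11, 20$, in agreement with the formula in the text.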

At this point one checks that these interpretations are indeed morphisms in $\mathbf{Gl}_k$. This is equivalent to the statement that $\llparenthesis\mathsf{op}\rrparenthesis_1$ is the $(k,R)$-Taylor representation of $\llbracket\mathsf{op}\rrbracket$ (3). The remaining constructions of the language are interpreted using the categorical structure of $\mathbf{Gl}_k$, following Lemma 4.

Notice that the diagram below commutes. One can check this by hand or note that it follows from the initiality of $\mathbf{Syn}$ (Lemma 4): all the functors preserve all the structure.

\[\xymatrix{
\mathbf{Syn} \ar[r]^-{(\mathrm{id},\,\overrightarrow{\mathcal{D}}_{(k,R)}(-))} \ar[d]_-{\llparenthesis - \rrparenthesis} & \mathbf{Syn} \times \mathbf{Syn} \ar[d]^-{\llbracket - \rrbracket \times \llbracket - \rrbracket} \\
\mathbf{Gl}_k \ar[r]_-{\mathrm{proj}} & \mathbf{Diff} \times \mathbf{Diff}
} \tag{4}\]

We thus arrive at a restatement of the correctness theorem (Th. 3), which holds even for the extended language with variants and lists, because for any $x_1 : \mathbf{real}, \ldots, x_n : \mathbf{real} \vdash t : \mathbf{real}$, the interpretations $(\llbracket t\rrbracket, \llbracket\overrightarrow{\mathcal{D}}_{(k,R)}(t)\rrbracket)$ are in the image of the projection $\mathbf{Gl}_k \to \mathbf{Diff} \times \mathbf{Diff}$, and hence $\llbracket\overrightarrow{\mathcal{D}}_{(k,R)}(t)\rrbracket$ is a $(k,R)$-Taylor representation of $\llbracket t\rrbracket$.

6.3. Correctness at all first order types, via manifolds.

We now generalize Theorem 3 to hold at all first order types, not just the reals.

So far, we have shown that our macro translation (Section 3.2) gives correct derivatives to functions of the real numbers, even if other types are involved in the definitions of the functions (Theorem 3 and Section 6.2). We can state this formally because functions of the real numbers have well understood derivatives (Section 2). There are no established mathematical notions of derivatives at higher types, and so we cannot even begin to argue that our syntactic derivatives of functions $(\mathbf{real}\to\mathbf{real})\to(\mathbf{real}\to\mathbf{real})$ match with some existing mathematical notion (see also Section 7).

However, for functions of first order type, like $\mathbf{list}(\mathbf{real})\to\mathbf{list}(\mathbf{real})$, there are established mathematical notions of derivative, because we can understand $\mathbf{list}(\mathbf{real})$ as the manifold of all tuples of reals, and then appeal to the well-known theory of manifolds and jet bundles. We do this now, to achieve a correctness theorem for all first order types (Theorem 8). The key high level points are that

  • manifolds support a notion of differentiation, and an interpretation of all first order types, but not an interpretation of higher types;

  • diffeological spaces support all types, including higher types, but not an established notion of differentiation in general;

  • manifolds and smooth maps embed fully and faithfully in diffeological spaces, preserving the interpretation of first order types, so we can use the two notions together.

We now explain this development in more detail.

For our purposes, a smooth manifold $M$ is a second-countable Hausdorff topological space together with a smooth atlas. In more detail, a topological space $X$ is second-countable when there exists a collection $U:=\{U_i\}_{i\in\mathbb{N}}$ of open subsets of $X$ such that any open subset of $X$ can be written as a union of elements of $U$. A topological space $X$ is Hausdorff if for every pair of distinct points $x$ and $y$, there exist disjoint open subsets $U,V$ of $X$ such that $x\in U,y\in V$. A smooth atlas of a topological space $X$ is an open cover $\mathcal{U}$ together with homeomorphisms $\left(\phi_U:U\to\mathbb{R}^{n(U)}\right)_{U\in\mathcal{U}}$ (called charts, or local coordinates) such that $\phi_U^{-1};\phi_V$ is smooth on its domain of definition for all $U,V\in\mathcal{U}$.
A function $f:M\to N$ between manifolds is smooth if $\phi_U^{-1};f;\psi_V$ is smooth for all charts $\phi_U$ and $\psi_V$ of $M$ and $N$, respectively. Let us write $\mathbf{Man}$ for the category of smooth manifolds and smooth functions between them. This definition of manifolds is a slight generalisation of the more usual one from differential geometry because different charts in an atlas may have different finite dimensions $n(U)$. Thus we consider manifolds with dimensions that are potentially unbounded, albeit locally finite.

Each open subset of $\mathbb{R}^n$ can be regarded as a manifold. This lets us regard the category of manifolds $\mathbf{Man}$ as a full subcategory of the category of diffeological spaces. We consider a manifold $(X,\{\phi_U\}_U)$ as a diffeological space with the same carrier set $X$ and where the plots $\mathcal{P}_X^U$, called the manifold diffeology, are the smooth functions in $\mathbf{Man}(U,X)$. A function $X\to Y$ is smooth in the sense of manifolds if and only if it is smooth in the sense of diffeological spaces [IZ13]. For the categorically minded reader, this means that we have a full embedding of $\mathbf{Man}$ into $\mathbf{Diff}$. Moreover, the natural interpretation of the first order fragment of our language in $\mathbf{Man}$ coincides with that in $\mathbf{Diff}$. That is, the embedding of $\mathbf{Man}$ into $\mathbf{Diff}$ preserves finite products and countable coproducts (hence initial algebras of polynomial endofunctors).

Proposition 6.

Suppose that a type $\tau$ is first order, i.e. it is just built from reals, products, variants, and lists (or, again, arbitrary inductive types), and not function types. Then the diffeological space $\llbracket\tau\rrbracket$ is a manifold.

Proof 6.3 (Proof notes).

This is proved by induction on the structure of types. In fact, one may show that every such $\llbracket\tau\rrbracket$ is isomorphic to a manifold of the form $\biguplus_{i=1}^{n}\mathbb{R}^{d_i}$ where the bound $n$ is either finite or $\infty$, but this isomorphism is typically not an identity function.
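
As a small worked illustration (not taken from the proof itself, and assuming the usual interpretation of a list type as the countable coproduct of finite powers, $\llbracket\mathbf{list}(\tau)\rrbracket\cong\biguplus_{m\geq 0}\llbracket\tau\rrbracket^{m}$), the type $\mathbf{list}(\mathbf{real}\times\mathbf{real})$ would unfold as

\[
\llbracket\mathbf{list}(\mathbf{real}\times\mathbf{real})\rrbracket
\;\cong\;\biguplus_{m=0}^{\infty}(\mathbb{R}\times\mathbb{R})^{m}
\;\cong\;\biguplus_{i=1}^{\infty}\mathbb{R}^{d_i},
\qquad d_i = 2(i-1).
\]

The second isomorphism flattens a list of $m$ pairs into a tuple of $2m$ reals, which is one way to see why the isomorphism in the statement above need not be an identity function.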

We recall how the Taylor representation of any morphism $f:M\to N$ of manifolds is given by its action on jets [KSM99, Chapter IV]. For each point $x$ in a manifold $M$, define the $(k,R)$-jet space $\mathcal{J}^{(k,R)}_x M$ to be the set $\{\gamma\in\mathbf{Man}(\mathbb{R}^k,M)\mid\gamma(0)=x\}/\sim$ of equivalence classes $[\gamma]$ of $k$-dimensional plots $\gamma$ in $M$ based at $x$, where we identify $\gamma_1\sim\gamma_2$ iff all partial derivatives of order $\leq R$ coincide in the sense that

\[
\frac{\partial^{\alpha_1+\ldots+\alpha_k}(\gamma_1;f)(x)}{\partial x_1^{\alpha_1}\cdots\partial x_k^{\alpha_k}}(0)
=
\frac{\partial^{\alpha_1+\ldots+\alpha_k}(\gamma_2;f)(x)}{\partial x_1^{\alpha_1}\cdots\partial x_k^{\alpha_k}}(0)
\]

for all smooth $f:M\to\mathbb{R}$ and all multi-indices $(\alpha_1,\ldots,\alpha_k)=(0,\ldots,0),\ldots,(R,0,\ldots,0)$. In the case of $(k,R)=(1,1)$, a $(k,R)$-jet space is better known as a tangent space. The $(k,R)$-jet bundle (a.k.a. tangent bundle, in case $(k,R)=(1,1)$) of $M$ is the set $\mathcal{J}^{(k,R)}(M)\stackrel{\mathrm{def}}{=}\biguplus_{x\in M}\mathcal{J}^{(k,R)}_x(M)$. The charts of $M$ equip $\mathcal{J}^{(k,R)}(M)$ with a canonical manifold structure.
The (manifold) diffeology of these jet bundles can be concisely summarized by the plots $\mathcal{P}_{\mathcal{J}^{(k,R)}(M)}^{U}=\left\{f:U\to|\mathcal{J}^{(k,R)}(M)|\;\mid\;\exists g\in\mathcal{P}_{M}^{U\times\mathbb{R}^k}.\ \forall u\in U.\ (g(u,0),[v\mapsto g(u,v)])=f(u)\right\}$.
Then $\mathcal{J}^{(k,R)}$ acts on smooth maps $f:M\to N$ to give $\mathcal{J}^{(k,R)}(f):\mathcal{J}^{(k,R)}(M)\to\mathcal{J}^{(k,R)}(N)$, defined as $\mathcal{J}^{(k,R)}(f)(x,[\gamma])\stackrel{\mathrm{def}}{=}(f(x),[\gamma;f])$. In local coordinates, this action $\mathcal{J}^{(k,R)}(f)$ is seen to coincide precisely with the $(k,R)$-Taylor representation of $f$ given by the Faà di Bruno formula [Mer04]. All told, the $(k,R)$-jet bundle is a functor $\mathcal{J}^{(k,R)}:\mathbf{Man}\to\mathbf{Man}$ [KSM99].
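
To make the action $[\gamma]\mapsto[\gamma;f]$ concrete, the following is a minimal Python sketch for the special case $k=1$, $R=2$ and $M=N=\mathbb{R}$. It is an illustration under stated assumptions, not the paper's construction: a jet is stored as normalised Taylor coefficients (a representation chosen here for convenience), and the names `Jet`, `const`, `f_jet` and `derivatives` are hypothetical. Pushing a jet through the polynomial $f(x)=x^3+2x$ by truncated series arithmetic should then agree with the first and second derivatives predicted by the chain rule and the Faà di Bruno formula.

```python
from dataclasses import dataclass
from math import factorial

R = 2  # truncation order; an assumption made for this illustration


@dataclass(frozen=True)
class Jet:
    """Coefficients c[j] = gamma^(j)(0) / j! (j = 0..R) of a curve gamma: R -> R."""
    c: tuple

    def __add__(self, other):
        return Jet(tuple(a + b for a, b in zip(self.c, other.c)))

    def __mul__(self, other):
        # Cauchy product of the two series, truncated at order R.
        return Jet(tuple(
            sum(self.c[i] * other.c[j - i] for i in range(j + 1))
            for j in range(R + 1)
        ))


def const(a):
    """Jet of the constant curve t |-> a."""
    return Jet((a,) + (0.0,) * R)


def f_jet(x):
    """Action on jets of f(x) = x^3 + 2x, obtained by overloading + and *."""
    return x * x * x + const(2.0) * x


def derivatives(jet):
    """Convert normalised coefficients back to derivatives at 0."""
    return tuple(factorial(j) * cj for j, cj in enumerate(jet.c))


# Jet of a curve with gamma(0) = 1, gamma'(0) = 3, gamma''(0) = 5.
gamma = Jet((1.0, 3.0, 5.0 / 2))
pushed = derivatives(f_jet(gamma))

# Hand-computed derivatives of f at gamma(0) = 1: f' = 3x^2 + 2, f'' = 6x.
f1, f2 = 3 * 1.0**2 + 2, 6 * 1.0
chain_rule = f1 * 3.0                  # (gamma;f)'(0)
faa_di_bruno = f2 * 3.0**2 + f1 * 5.0  # (gamma;f)''(0)

print(pushed)                    # (3.0, 15.0, 79.0)
print(chain_rule, faa_di_bruno)  # 15.0 79.0
```

The design choice of overloading arithmetic on `Jet` mirrors, in a very restricted setting, the idea that the jet of a composite is determined by the jets of its parts.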

We can understand the jet bundle of a composite space in terms of that of its parts.

Lemma 7.

There are canonical isomorphisms $\mathcal{J}^{(k,R)}(\biguplus_{i=1}^{\infty}M_i)\cong\biguplus_{i=1}^{\infty}\mathcal{J}^{(k,R)}(M_i)$ and $\mathcal{J}^{(k,R)}(M_1\times\ldots\times M_n)\cong\mathcal{J}^{(k,R)}(M_1)\times\ldots\times\mathcal{J}^{(k,R)}(M_n)$.

Proof 6.4 (Proof notes).

For disjoint unions, notice that smooth morphisms from $\mathbb{R}^k$ into a disjoint union of manifolds always factor over a single inclusion, because $\mathbb{R}^k$ is connected. For products, it is well-known that partial derivatives of a morphism $(f_1,\ldots,f_n)$ are calculated component-wise [Lee13, ex. 3-2].

We define a canonical isomorphism $\phi_{\tau}^{\overrightarrow{\mathcal{D}}\mathcal{J}}:\llbracket\overrightarrow{\mathcal{D}}_{(k,R)}(\tau)\rrbracket\to\mathcal{J}^{(k,R)}(\llbracket\tau\rrbracket)$ for every type $\tau$, by induction on the structure of types. We let $\phi_{\mathbf{real}}^{\overrightarrow{\mathcal{D}}\mathcal{J}}:\llbracket\overrightarrow{\mathcal{D}}_{(k,R)}(\mathbf{real})\rrbracket\to\mathcal{J}^{(k,R)}(\llbracket\mathbf{real}\rrbracket)$ be given by $\phi_{\mathbf{real}}^{\overrightarrow{\mathcal{D}}\mathcal{J}}(x,x')\stackrel{\mathrm{def}}{=}(x,[t\mapsto x+x't])$. For the other types, we use Lemma 7. We can now phrase correctness at all first order types.

Theorem 8 (Semantic correctness of $\overrightarrow{\mathcal{D}}_{(k,R)}$ (full)).

For any ground $\tau$, any first order context $\Gamma$ and any term $\Gamma\vdash t:\tau$, the syntactic translation $\overrightarrow{\mathcal{D}}_{(k,R)}$ coincides with the $(k,R)$-jet bundle functor, modulo these canonical isomorphisms:

\[
\begin{array}{ccc}
\llbracket\overrightarrow{\mathcal{D}}_{(k,R)}(\Gamma)\rrbracket & \xrightarrow{\;\llbracket\overrightarrow{\mathcal{D}}_{(k,R)}(t)\rrbracket\;} & \llbracket\overrightarrow{\mathcal{D}}_{(k,R)}(\tau)\rrbracket \\
{\scriptstyle\phi_{\Gamma}^{\overrightarrow{\mathcal{D}}\mathcal{J}}}\big\downarrow{\scriptstyle\cong} & & {\scriptstyle\cong}\big\downarrow{\scriptstyle\phi_{\tau}^{\overrightarrow{\mathcal{D}}\mathcal{J}}} \\
\mathcal{J}^{(k,R)}(\llbracket\Gamma\rrbracket) & \xrightarrow{\;\mathcal{J}^{(k,R)}(\llbracket t\rrbracket)\;} & \mathcal{J}^{(k,R)}(\llbracket\tau\rrbracket)
\end{array}
\]
Proof 6.5 (Proof notes).

For any $k$-dimensional plot $\gamma\in\mathbf{Man}(\mathbb{R}^k,M)$, let $\bar{\gamma}\in\mathbf{Man}(\mathbb{R}^k,\mathcal{J}^{(k,R)}(M))$ be the $(k,R)$-jet curve, given by $\bar{\gamma}(x)=(\gamma(x),[t\mapsto\gamma(x+t)])$. First, we note that a smooth map $h:\mathcal{J}^{(k,R)}(M)\to\mathcal{J}^{(k,R)}(N)$ is of the form $\mathcal{J}^{(k,R)}(g)$ for some $g:M\to N$ if for all smooth $\gamma:\mathbb{R}^k\to M$ we have $\bar{\gamma};h=\overline{(\gamma;g)}:\mathbb{R}^k\to\mathcal{J}^{(k,R)}(N)$. This generalizes (3).
Second, for any first order type $\tau$, $S_{\llbracket\tau\rrbracket}=\{(f,\tilde{f})~|~\tilde{f};\phi_{\tau}^{\overrightarrow{\mathcal{D}}\mathcal{J}}=\bar{f}\}$. This is shown by induction on the structure of types. We conclude the theorem from diagram (4), by putting these two observations together.

7. Discussion: What are derivatives of higher order functions?

In our gluing categories $\mathbf{Gl}_k$ of Section 6.2, we have avoided the question of what semantic derivatives should be associated with higher order functions. Our syntactic macro $\overrightarrow{\mathcal{D}}$ provides a specific derivative for every definable function, but in the model $\mathbf{Gl}_k$ there is only a relation between plots and their corresponding Taylor representations, and this relation is not necessarily single-valued. Our approach has been rather indifferent about what “the” correct derivative of a higher order function should be. Instead, all we have cared about is that we are using “a” derivative that is correct in the sense that it can never be used to produce incorrect derivatives for first order functions, where we do have an unambiguous notion of correct derivative.

7.1. Automatic derivatives of higher order functions may not be unique!

For a concrete example to show that derivatives of higher order functions might not be unique in our framework, let us consider the case $(k,R)=(1,1)$ and focus on first derivatives of the evaluation function

\begin{align*}
\mathrm{ev}:\ & \mathbb{R}\to\llbracket(\mathbf{real}\to\mathbf{real})\to\mathbf{real}\rrbracket=\mathbb{R}\to(\mathbb{R}\Rightarrow\mathbb{R})\Rightarrow\mathbb{R};\\
& r\mapsto(f\mapsto f(r)).
\end{align*}

Our macro $\overrightarrow{\mathcal{D}}$ will return $\lambda a:\mathbb{R}.\lambda f:\mathbb{R}\times\mathbb{R}\Rightarrow\mathbb{R}\times\mathbb{R}.f(a,1)$. In this section we show that the lambda term $\lambda a:\mathbb{R}.\lambda f:\mathbb{R}\times\mathbb{R}\Rightarrow\mathbb{R}\times\mathbb{R}.\mathrm{sort}\,f(a,1)$ is also a valid derivative of the evaluation map, where $\mathrm{sort}:(\mathbb{R}\times\mathbb{R}\Rightarrow\mathbb{R}\times\mathbb{R})\Rightarrow(\mathbb{R}\times\mathbb{R}\Rightarrow\mathbb{R}\times\mathbb{R})$ is defined by

\[
\mathrm{sort}:=\lambda f.\lambda((r,\_).(\pi_1(f(r,0)),\pi_2\overrightarrow{\mathcal{D}}((-,0);f;\pi_1)(r))).
\]

This map is idempotent and it converts any map $\mathbb{R}\times\mathbb{R}\to\mathbb{R}\times\mathbb{R}$ into the dual-numbers representation of its first component. For example, $(\mathrm{sort}(\mathrm{swap}))$ is the constantly $(0,0)$ function, where we write

\begin{align*}
\mathrm{swap}:\ & \mathbb{R}\times\mathbb{R}\to\mathbb{R}\times\mathbb{R}\\
& (r,r')\mapsto(r',r).
\end{align*}

According to our gluing semantics, a function $g:\llparenthesis\tau\rrparenthesis_1\to\llparenthesis\sigma\rrparenthesis_1$ defines a correct $(k,R)$-Taylor representation of a function $f:\llparenthesis\tau\rrparenthesis_0\to\llparenthesis\sigma\rrparenthesis_0$ iff $(f,g)$ defines a morphism $\llparenthesis\tau\rrparenthesis\to\llparenthesis\sigma\rrparenthesis$ in $\mathbf{Gl}_k$. In particular, there is no guarantee that every $f$ has a unique correct $(k,R)$-Taylor representation $g$. (Although such Taylor representations are, in fact, unique when $\tau,\sigma$ are first order types.) The gluing relation $\llparenthesis(\mathbf{real}\to\mathbf{real})\to\mathbf{real}\rrparenthesis$ in $\mathbf{Gl}_1$ relates curves $\gamma:\mathbb{R}\to(\mathbb{R}\Rightarrow\mathbb{R})\Rightarrow\mathbb{R}$ to “tangent curves” $\gamma':\mathbb{R}\to(\mathbb{R}\times\mathbb{R}\Rightarrow\mathbb{R}\times\mathbb{R})\Rightarrow\mathbb{R}\times\mathbb{R}$. In this relation, the function $\mathrm{ev}$ is related to at least two different tangent curves.
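
The two candidate derivatives of $\mathrm{ev}$ can be compared in a small Python sketch. This is only an illustration under assumptions: pairs `(r, dr)` stand for dual numbers, the exact derivative $\nabla$ is replaced by a central finite difference (a numerical stand-in, not the smooth operation used in the text), and the names `nabla`, `ev_tangent_macro`, `ev_tangent_sorted` and `dual_square` are hypothetical. On an argument that already is a dual-number representation of a smooth function the two candidates agree up to numerical error, whereas on `swap`, which is not such a representation, they differ.

```python
def nabla(h, r, eps=1e-6):
    """Central finite difference; a numerical stand-in for the exact derivative."""
    return (h(r + eps) - h(r - eps)) / (2 * eps)


def sort(f):
    """Turn f : R x R -> R x R into the dual-number representation of
    r |-> pi_1(f(r, 0)), ignoring the tangent component of the input."""
    def first_component(r):
        return f((r, 0.0))[0]

    def g(pair):
        r, _ = pair
        return (first_component(r), nabla(first_component, r))

    return g


def ev_tangent_macro(a):
    """Candidate 1: lambda a. lambda f. f(a, 1), as returned by the macro."""
    return lambda f: f((a, 1.0))


def ev_tangent_sorted(a):
    """Candidate 2: lambda a. lambda f. sort f (a, 1)."""
    return lambda f: sort(f)((a, 1.0))


def dual_square(pair):
    """Dual-number representation of r |-> r * r."""
    r, dr = pair
    return (r * r, 2.0 * r * dr)


def swap(pair):
    """(r, r') |-> (r', r); not a dual-number representation of any function."""
    r, dr = pair
    return (dr, r)


# Agreement (up to finite-difference error) on a genuine dual-number representation.
print(ev_tangent_macro(3.0)(dual_square))
print(ev_tangent_sorted(3.0)(dual_square))

# Disagreement on swap: the macro candidate returns swap(3, 1), while
# sort(swap) is the constantly (0, 0) function.
print(ev_tangent_macro(3.0)(swap))
print(ev_tangent_sorted(3.0)(swap))
```

Only arguments of the first kind are related to a function $\mathbb{R}\to\mathbb{R}$ by the gluing relation, which is consistent with both candidates being acceptable as derivatives of $\mathrm{ev}$ in this framework.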

Lemma 9.

We have a smooth map

\begin{align*}
\mathrm{sort}:\ & (\mathbb{R}\times\mathbb{R}\Rightarrow\mathbb{R}\times\mathbb{R})\to(\mathbb{R}\times\mathbb{R}\Rightarrow\mathbb{R}\times\mathbb{R})\\
& f\mapsto((r,\_)\mapsto(\pi_1(f(r,0)),\nabla((-,0);f;\pi_1)(r))).
\end{align*}
Proof 7.1.

Let $f\in\mathcal{P}_{\mathbb{R}\times\mathbb{R}\Rightarrow\mathbb{R}\times\mathbb{R}}^{U}$ and let $\gamma_1,\gamma_2\in\mathcal{P}_{\mathbb{R}}^{U}$. Then, also $u\mapsto(\gamma_1(u),0);f(u);\pi_1\in\mathcal{P}_{\mathbb{R}}^{U}=\mathbf{Man}(U,\mathbb{R})$ by definition of the exponential in $\mathbf{Diff}$.
Therefore, we also have that $u\mapsto\nabla(u'\mapsto(\gamma_1(u'),0);f(u');\pi_1)\in\mathbf{Man}(U,\mathbb{R})=\mathcal{P}_{\mathbb{R}}^{U}$, as we are working with infinitely differentiable smooth maps. Consequently,

u(f;sort)(u)(γ1(u),γ2(u))=(π1(f(u))(γ1(u),0),(u(γ1(u),0);f(u);π1))𝒫×U,maps-to𝑢𝑓sort𝑢subscript𝛾1𝑢subscript𝛾2𝑢subscript𝜋1𝑓𝑢subscript𝛾1𝑢0maps-tosuperscript𝑢subscript𝛾1superscript𝑢0𝑓superscript𝑢subscript𝜋1superscriptsubscript𝒫𝑈u\mapsto(f;\mathrm{sort})(u)(\gamma_{1}(u),\gamma_{2}(u))=(\pi_{1}(f(u))(% \gamma_{1}(u),0),\nabla(u^{\prime}\mapsto(\gamma_{1}(u^{\prime}),0);f(u^{% \prime});\pi_{1}))\in\mathcal{P}_{\mathbb{R}\times\mathbb{R}}^{U},italic_u ↦ ( italic_f ; roman_sort ) ( italic_u ) ( italic_γ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_u ) , italic_γ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_u ) ) = ( italic_π start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_f ( italic_u ) ) ( italic_γ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_u ) , 0 ) , ∇ ( italic_u start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ↦ ( italic_γ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_u start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) , 0 ) ; italic_f ( italic_u start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) ; italic_π start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ) ∈ caligraphic_P start_POSTSUBSCRIPT blackboard_R × blackboard_R end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_U end_POSTSUPERSCRIPT ,

by definition of the product in 𝐃𝐢𝐟𝐟𝐃𝐢𝐟𝐟\mathbf{Diff}bold_Diff. It follows that (f;sort)𝒫××U𝑓sortsuperscriptsubscript𝒫𝑈(f;\mathrm{sort})\in\mathcal{P}_{\mathbb{R}\times\mathbb{R}\Rightarrow\mathbb{% R}\times\mathbb{R}}^{U}( italic_f ; roman_sort ) ∈ caligraphic_P start_POSTSUBSCRIPT blackboard_R × blackboard_R ⇒ blackboard_R × blackboard_R end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_U end_POSTSUPERSCRIPT.

Proposition 10.

We have that both $(\mathrm{ev},\mathrm{ev}'_1)\in\llparenthesis(\mathbf{real}\to\mathbf{real})\to\mathbf{real}\rrparenthesis$ and $(\mathrm{ev},\mathrm{ev}'_2)\in\llparenthesis(\mathbf{real}\to\mathbf{real})\to\mathbf{real}\rrparenthesis$ for

\begin{align*}
\mathrm{ev}'_1:\ & \mathbb{R}\to(\mathbb{R}\times\mathbb{R}\Rightarrow\mathbb{R}\times\mathbb{R})\Rightarrow\mathbb{R}\times\mathbb{R}\\
& a\mapsto(f\mapsto f(a,1))\\
\mathrm{ev}'_2:\ & \mathbb{R}\to(\mathbb{R}\times\mathbb{R}\Rightarrow\mathbb{R}\times\mathbb{R})\Rightarrow\mathbb{R}\times\mathbb{R}\\
& a\mapsto(f\mapsto(\mathrm{sort}\,f)(a,1)).
\end{align*}
Proof 7.2.

By definition of $\llparenthesis-\rrparenthesis$, we need to show that for any $(\gamma,\gamma')\in\llparenthesis\mathbf{real}\to\mathbf{real}\rrparenthesis$, we have that $(x\mapsto\mathrm{ev}(x)(\gamma(x)),x\mapsto\mathrm{ev}'_i(x)(\gamma'(x)))\in\llparenthesis\mathbf{real}\rrparenthesis$. This means that we need to show that for $i=1,2$

\[
x\mapsto\mathrm{ev}'_i(x)(\gamma'(x))=(x\mapsto\mathrm{ev}(x)(\gamma(x)),\nabla(x\mapsto\mathrm{ev}(x)(\gamma(x))))
\]

Unrolling further, this means we need to show that for any $\gamma:\mathbb{R}\to\mathbb{R}\Rightarrow\mathbb{R}$ and $\gamma':\mathbb{R}\to\mathbb{R}\times\mathbb{R}\Rightarrow\mathbb{R}\times\mathbb{R}$ such that for any $(\delta,\delta')\in\llparenthesis\mathbf{real}\rrparenthesis$ (which means that $\delta:\mathbb{R}\to\mathbb{R}$ and $\delta'=(\delta,\nabla\delta)$), we have that

\[
\Big(r\mapsto\gamma(r)(\delta(r)),r\mapsto\gamma'(r)(\delta'(r))\Big)\in\llparenthesis\mathbf{real}\rrparenthesis
\]

The latter part finally means that we need to show that

\[
r\mapsto\gamma'(r)(\delta(r),\nabla\delta(r))=(r\mapsto\gamma(r)(\delta(r)),\nabla(r\mapsto\gamma(r)(\delta(r))))
\]

Now, focussing on $\mathrm{ev}'_1$: we need to show that

\[
x\mapsto\mathrm{ev}'_1(x)(\gamma'(x))=(x\mapsto\mathrm{ev}(x)(\gamma(x)),\nabla(x\mapsto\mathrm{ev}(x)(\gamma(x))))
\]

Inlining the definition of $\mathrm{ev}'_1$: we need to show that

\[
x\mapsto\gamma'(x)(x,1)=(x\mapsto\gamma(x)(x),\nabla(x\mapsto\gamma(x)(x)))
\]

This follows by assumption by choosing $\delta(r)=r$, and hence $\delta'(r)=(r,1)$.
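
As a concrete check of the $\mathrm{ev}'_1$ case, consider one illustrative related pair $(\gamma,\gamma')$. This pair is our own choice for illustration and does not come from the development above. The first line defines the pair, the second verifies that it satisfies the assumption for an arbitrary smooth $\delta$ (by the product rule), and the third instantiates $\delta(r)=r$:

\begin{align*}
\gamma(x)(y) &= x\cdot y, \qquad \gamma'(x)(y,v) = (x\cdot y,\; y+x\cdot v),\\
\gamma'(r)(\delta(r),\nabla\delta(r)) &= (r\cdot\delta(r),\; \delta(r)+r\cdot\nabla\delta(r)) = (\gamma(r)(\delta(r)),\; \nabla(r\mapsto\gamma(r)(\delta(r)))(r)),\\
\gamma'(x)(x,1) &= (x^2,\; 2x) = (\gamma(x)(x),\; \nabla(x\mapsto\gamma(x)(x))(x)).
\end{align*}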

Focussing on $\mathrm{ev}'_2$: we need to show that

\[
x\mapsto\mathrm{ev}'_2(x)(\gamma'(x))=(x\mapsto\mathrm{ev}(x)(\gamma(x)),\nabla(x\mapsto\mathrm{ev}(x)(\gamma(x))))
\]

Inlining $\mathrm{ev}'_2$’s definition: we need to show that

\begin{align*}
&\Big(x\mapsto((\pi_1(\gamma'(x)(x,0)),x\mapsto\nabla((-,0);\gamma'(x);\pi_1)(x)))\Big)=\\
&\Big(x\mapsto((r,\_)\mapsto(\pi_1(\gamma'(x)(r,0)),\nabla((-,0);\gamma'(x);\pi_1)(r)))(x,1)\Big)
\end{align*}

is equal to

\begin{align*}
&x\mapsto(\mathrm{sort}~\gamma'(x))(x,1)=\\
&\Big(x\mapsto\gamma(x)(x),\nabla(x\mapsto\gamma(x)(x))\Big)
\end{align*}

That is, we need to show that $\pi_1(\gamma'(x)(x,0))=\gamma(x)(x)$ for all $x\in\mathbb{R}$, which holds by the assumption that $(\gamma,\gamma')\in\llparenthesis\mathbf{real}\to\mathbf{real}\rrparenthesis$ by choosing $\delta(x')=x$ (and hence $\delta'(x')=(x,0)$) and then specializing to $x=x'$.

Yet, $\mathrm{ev}'_1\neq\mathrm{ev}'_2$ as $\mathrm{ev}'_1(a)(\mathrm{swap})=(1,a)$ and $\mathrm{ev}'_2(a)(\mathrm{swap})=(0,0)$. This shows that $\mathrm{ev}'_1$ and $\mathrm{ev}'_2$ are distinct, yet both are “valid” semantic derivatives of the evaluation function $\mathrm{ev}$ in our framework. In particular, it shows that semantic derivatives of higher order functions might not be unique. Our macro $\overrightarrow{\mathcal{D}}$ will return $\mathrm{ev}'_1$, but everything would still work just as well if it instead returned $\mathrm{ev}'_2$.
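
The two values at $\mathrm{swap}$ can be reproduced with a short Python sketch. It is only an illustration under stated assumptions: the names `Dual`, `lift`, `nabla`, `ev1`, `ev2` and `square_lift` are hypothetical, $\nabla$ is computed with a minimal dual-number class that supports only `+` and `*`, and $\mathrm{swap}$ is taken to be $(x,v)\mapsto(v,x)$, as read off from the value $\mathrm{ev}'_1(a)(\mathrm{swap})=(1,a)$. The sketch checks only the computed values; it does not check membership in the gluing relation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Hypothetical dual number: a value paired with a tangent."""
    val: float
    tan: float

    def __add__(self, other):
        other = lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __mul__(self, other):
        other = lift(other)
        # Product rule on the tangent component.
        return Dual(self.val * other.val,
                    self.val * other.tan + self.tan * other.val)

    __rmul__ = __mul__


def lift(x):
    """Treat a plain number as a constant (zero tangent)."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


def nabla(h):
    """Derivative of h: R -> R, assuming h is built from + and * only."""
    return lambda r: lift(h(Dual(float(r), 1.0))).tan


def sort(f):
    """sort f = (r, _) |-> (pi_1(f(r, 0)), nabla(s |-> pi_1(f(s, 0)))(r))."""
    def sorted_f(r, _):
        first = lambda s: f(s, 0.0)[0]
        return (lift(first(r)).val, nabla(first)(r))
    return sorted_f


def ev1(a):
    return lambda f: f(a, 1.0)


def ev2(a):
    return lambda f: sort(f)(a, 1.0)


def swap(x, v):
    return (v, x)


def square_lift(x, v):
    """Dual-number version of x |-> x * x (an illustrative argument)."""
    return (x * x, 2 * x * v)


print(ev1(3.0)(swap))         # (1.0, 3.0)
print(ev2(3.0)(swap))         # (0.0, 0.0)
print(ev1(3.0)(square_lift))  # (9.0, 6.0)
print(ev2(3.0)(square_lift))  # (9.0, 6.0)
```

In this sketch the two candidates agree on `square_lift`, which is the dual-number version of a first order function, and differ on `swap`, which is not of that form.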

7.2. Canonical derivatives of higher order functions?

Differential geometers and analysts have long pursued notions of a canonical derivative of various higher order functions arising, for example, in the calculus of variations and in the study of infinite dimensional Lie groups [KM97]. Such an uncontroversial notion of derivative exists on various (infinite dimensional) spaces of functions that form suitable (so-called convenient) vector spaces, or manifolds locally modelled on such vector spaces. At the level of generality of diffeological spaces, however, various natural notions of derivative that coincide in convenient vector spaces start to diverge, and it is no longer clear what the best definition of a derivative is [CW14]. Another, fundamentally different setting that defines canonical derivatives of many higher order functions is given by synthetic differential geometry [Koc06].

While derivatives of higher order functions are of deep interest and have rightly been studied in their own right in differential geometry, we believe the situation is subtly different in computer science:

  (1) In programming applications, we use higher order programs only to construct the first order functions that we ultimately end up running and calculating derivatives of. Automatic differentiation methods can exploit this freedom: derivatives of higher order functions only matter in so far as they can be used to construct the correct derivatives of first order functions, so we can choose a simple and cheap notion of derivative among the valid options. As such, the fact that our semantics does not commit to a single notion of derivative of higher order functions can be seen as a feature rather than a bug, one that models the pragmatics of programming practice.

  (2) While function spaces in differential geometry are typically infinite dimensional objects that are unsuitable for representation in the finite memory of a computer, higher order functions as used in programming are much more restricted: all they can do is call a function on finitely many arguments and analyse the function outputs. As such, function types in programming can be thought of as (locally) finite dimensional. In case a canonical notion of automatic derivative of higher order functions is really desired, it may be worth pursuing a more intensional notion of semantics, such as one based on game semantics. Such intensional techniques could capture the computational notion of higher order function better than our current (and other) extensional semantics using existing techniques from differential geometry. We hope that an exploration of such techniques might lead to an appropriate notion of computable derivative, even for higher order functions.

8. Discussion and future work

8.1. Summary

We have shown that diffeological spaces provide a denotational semantics for a higher order language with variants and inductive types (Sections 4 and 5). We have used this to show correctness of simple forward-mode AD translations for calculating higher derivatives (Theorem 3, Theorem 8).

The structure of our elementary correctness argument for Theorem 3 is a typical logical relations proof over a denotational semantics. As explained in Section 6, this can equivalently be understood as a denotational semantics in a new kind of space obtained by categorical gluing.

Overall, then, there are two logical relations at play. One is in diffeological spaces, which ensures that all definable functions are smooth. The other is in the correctness proof (equivalently in the categorical gluing), which explicitly tracks the derivative of each function, and tracks the syntactic AD even at higher types.

8.2. Connection to the state of the art in AD implementation

As is common in denotational semantics research, we have here focused on an idealized language and simple translations to illustrate the main aspects of the method. There are a number of points where our approach is simplistic compared to the advanced current practice, as we now explain.

8.2.1. Representation of vectors

In our examples we have treated $n$-vectors as tuples of length $n$. This style of programming does not scale to large $n$. A better solution would be to use array types, following [SFVPJ19]. As demonstrated by [CJS20], our categorical semantics and correctness proofs straightforwardly extend to cover them, in a similar way to our treatment of lists. In fact, [CJS20] formalizes our correctness arguments in Coq and extends them to apply to the system of [SFVPJ19].

8.2.2. Efficient forward-mode AD

For AD to be useful, it must be fast. The $(1,1)$-AD macro $\overrightarrow{\mathcal{D}}_{(1,1)}$ that we use is the basis of an efficient AD library [SFVPJ19]. Numerous optimizations are needed, ranging from algebraic manipulations, to partial evaluations, to the use of an optimizing C compiler, but the resulting implementation is performant in experiments [SFVPJ19]. The Coq formalization [CJS20] validates some of these manipulations using a similar semantics to ours. We believe the implementation in [SFVPJ19] can be extended to apply to the more general $(k,R)$-AD methods we described in this paper through minor changes.
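
To give a rough feel for how forward-mode AD generalizes from first to higher derivatives, the following Python sketch propagates a value together with its first and second derivatives through `+` and `*`. It is a minimal illustration under our own assumptions and is not the implementation of [SFVPJ19] or the macro defined in this paper: the names `Jet2`, `lift`, `derivatives_up_to_2`, `square`, `cube` and `twice` are hypothetical, only one input variable is handled, and derivatives are stored directly rather than as Taylor coefficients.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Jet2:
    """Hypothetical triple (f(x), f'(x), f''(x)) for one input variable."""
    val: float
    d1: float
    d2: float

    def __add__(self, other):
        o = lift(other)
        return Jet2(self.val + o.val, self.d1 + o.d1, self.d2 + o.d2)

    __radd__ = __add__

    def __mul__(self, other):
        o = lift(other)
        # Leibniz rule: (fg)' = f'g + fg', (fg)'' = f''g + 2f'g' + fg''.
        return Jet2(self.val * o.val,
                    self.d1 * o.val + self.val * o.d1,
                    self.d2 * o.val + 2 * self.d1 * o.d1 + self.val * o.d2)

    __rmul__ = __mul__


def lift(c):
    """Treat a plain number as a constant (all derivatives zero)."""
    return c if isinstance(c, Jet2) else Jet2(float(c), 0.0, 0.0)


def derivatives_up_to_2(f, x):
    """Return (f(x), f'(x), f''(x)), assuming f uses only + and *."""
    out = lift(f(Jet2(float(x), 1.0, 0.0)))
    return (out.val, out.d1, out.d2)


def square(x):
    return x * x


def cube(x):
    return x * x * x


def twice(g):
    """A higher order program used only to build a first order function."""
    return lambda x: g(g(x))


print(derivatives_up_to_2(cube, 2.0))           # (8.0, 12.0, 12.0)
print(derivatives_up_to_2(twice(square), 2.0))  # (16.0, 32.0, 48.0)
```

The higher order function `twice` is never differentiated on its own here; it only serves to construct the first order function that is eventually differentiated.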

8.2.3. Reverse-mode and mixed-mode AD

While forward-mode AD methods are useful, many applications require reverse-mode AD, or even mixed-mode AD for efficiency. In [HSV20a], we described how our correctness proof applies to a continuation-based AD technique that closely resembles reverse-mode AD, but only has the correct complexity under a non-standard operational semantics [BMP20] (in particular, the linear factoring rule is crucial). It remains to be seen whether this technique and its correctness proof can be adapted to yield genuine reverse AD under a standard operational semantics.

Alternatively, by relying on a variation of our techniques, [Vák21] gives a correctness proof of a rather different $(1,1)$-reverse AD algorithm that stores the (primal, adjoint)-vector pair as a struct-of-arrays rather than as an array-of-structs. Future work could explore extending its analysis to $(k,R)$-reverse AD and mixed-mode AD for efficiently computing higher order derivatives.

8.2.4. Other language features

The idealized languages that we considered so far do not touch on several useful language constructs. For example: the use of functions that are partial (such as division) or partly-smooth (such as ReLU); phenomena such as iteration and recursion; and probabilities. Recent work by MV [Vák20] shows how our analysis of $(1,1)$-AD extends to apply to partiality, iteration, and recursion. This development is orthogonal to the one in this paper: its methods combine directly with those in the present paper to analyze $(k,R)$-forward mode AD of recursive programs. We leave the analysis of AD of probabilistic programs for future work.

Acknowledgment

We have benefited from discussing this work with many people, including M. Betancourt, B. Carpenter, O. Kammar, C. Mak, L. Ong, B. Pearlmutter, G. Plotkin, A. Shaikhha, J. Sigal, and others. In the course of this work, MV has also been employed at Oxford (EPSRC Project EP/M023974/1) and at Columbia in the Stan development team. This project has also received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 895827; a Royal Society University Research Fellowship; the ERC BLAST grant; the Air Force Office of Scientific Research under award number FA9550–21–1–0038; and a Facebook Research Award.

References

  • [AAB+16] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. Tensorflow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 265–283, 2016.
  • [Ama12] Shun-ichi Amari. Differential-geometrical methods in statistics, volume 28. Springer Science & Business Media, 2012.
  • [AP20] Martín Abadi and Gordon D Plotkin. A simple differentiable programming language. In Proc. POPL 2020. ACM, 2020.
  • [BCLG20] Gilles Barthe, Raphaëlle Crubillé, Ugo Dal Lago, and Francesco Gavazzo. On the versatility of open logical relations: Continuity, automatic differentiation, and a containment theorem. In Proc. ESOP 2020. Springer, 2020. To appear.
  • [Bet18] Michael Betancourt. A geometric theory of higher-order automatic differentiation. arXiv preprint arXiv:1812.11592, 2018.
  • [BH11] John Baez and Alexander Hoffnung. Convenient categories of smooth spaces. Transactions of the American Mathematical Society, 363(11):5789–5825, 2011.
  • [BJD19] Jesse Bettencourt, Matthew J Johnson, and David Duvenaud. Taylor-mode automatic differentiation for higher-order derivatives in JAX. 2019.
  • [BML+20] Gilbert Bernstein, Michael Mara, Tzu-Mao Li, Dougal Maclaurin, and Jonathan Ragan-Kelley. Differentiating a tensor language. arXiv preprint arXiv:2008.11256, 2020.
  • [BMP20] Alois Brunel, Damiano Mazza, and Michele Pagani. Backpropagation in the simply typed lambda-calculus with linear negation. In Proc. POPL 2020, 2020.
  • [BS96] Claus Bendtsen and Ole Stauning. Fadbad, a flexible C++ package for automatic differentiation. Technical report, Technical Report IMM–REP–1996–17, Department of Mathematical Modelling, Technical University of Denmark, Lyngby, 1996.
  • [BS97] Claus Bendtsen and Ole Stauning. Tadiff, a flexible C++ package for automatic differentiation. TU of Denmark, Department of Mathematical Modelling, Lyngby. Technical report IMM-REP-1997-07, 1997.
  • [CCG+20] J. Robin B. Cockett, Geoff S. H. Cruttwell, Jonathan Gallagher, Jean-Simon Pacaud Lemay, Benjamin MacAdam, Gordon D. Plotkin, and Dorette Pronk. Reverse derivative categories. In Proc. CSL 2020, 2020.
  • [CGM19] Geoff Cruttwell, Jonathan Gallagher, and Ben MacAdam. Towards formalizing and extending differential programming using tangent categories. In Proc. ACT 2019, 2019.
  • [CHB+15] Bob Carpenter, Matthew D Hoffman, Marcus Brubaker, Daniel Lee, Peter Li, and Michael Betancourt. The Stan math library: Reverse-mode automatic differentiation in C++. arXiv preprint arXiv:1509.07164, 2015.
  • [CJS20] Curtis Chin Jen Sem. Formalized correctness proofs of automatic differentiation in Coq. Master’s Thesis, Utrecht University, 2020. Thesis: https://dspace.library.uu.nl/handle/1874/400790. Coq code: https://github.com/crtschin/thesis.
  • [CRBD18] Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations. In Advances in Neural Information Processing Systems, pages 6571–6583, 2018.
  • [CS96] G Constantine and T Savits. A multivariate Faa di Bruno formula with applications. Transactions of the American Mathematical Society, 348(2):503–520, 1996.
  • [CS11] J Robin B Cockett and Robert AG Seely. The Faa di Bruno construction. Theory and Applications of Categories, 25(15):394–425, 2011.
  • [CW14] J Daniel Christensen and Enxin Wu. Tangent spaces and tangent bundles for diffeological spaces. arXiv preprint arXiv:1411.5425, 2014.
  • [DHS11] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121–2159, 2011.
  • [Ell18] Conal Elliott. The simple essence of automatic differentiation. Proceedings of the ACM on Programming Languages, 2(ICFP):70, 2018.
  • [EM03] L Hernández Encinas and J Munoz Masque. A short proof of the generalized Faà di Bruno’s formula. Applied Mathematics Letters, 16(6):975–979, 2003.
  • [ER03] Thomas Ehrhard and Laurent Regnier. The differential lambda-calculus. Theoretical Computer Science, 309(1-3):1–41, 2003.
  • [FJL18] Roy Frostig, Matthew James Johnson, and Chris Leary. Compiling machine learning programs via high-level tracing. Systems for Machine Learning, 2018.
  • [FST19] Brendan Fong, David Spivak, and Rémy Tuyéras. Backprop as functor: A compositional perspective on supervised learning. In 2019 34th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), pages 1–13. IEEE, 2019.
  • [GSS00] Izrail Moiseevitch Gelfand and Richard A Silverman. Calculus of variations. Courier Corporation, 2000.
  • [GUW00] Andreas Griewank, Jean Utke, and Andrea Walther. Evaluating higher derivative tensors by forward propagation of univariate Taylor series. Mathematics of Computation, 69(231):1117–1130, 2000.
  • [HG14] Matthew D Hoffman and Andrew Gelman. The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15(1):1593–1623, 2014.
  • [HSV20a] Mathieu Huot, Sam Staton, and Matthijs Vákár. Correctness of automatic differentiation via diffeologies and categorical gluing. In FoSSaCS, pages 319–338, 2020.
  • [HSV20b] Mathieu Huot, Sam Staton, and Matthijs Vákár. Correctness of automatic differentiation via diffeologies and categorical gluing. Full version, 2020. arxiv:2001.02209.
  • [IZ13] Patrick Iglesias-Zemmour. Diffeology. American Mathematical Soc., 2013.
  • [JLS07] Peter T Johnstone, Stephen Lack, and P Sobocinski. Quasitoposes, quasiadhesive categories and Artin glueing. In Proc. CALCO 2007, 2007.
  • [JR11] Bart Jacobs and JMMM Rutten. An introduction to (co)algebras and (co)induction. In Advanced Topics in Bisimulation and Coinduction, pages 38–99. CUP, 2011.
  • [Kar01] Jerzy Karczmarczuk. Functional differentiation of computer programs. Higher-Order and Symbolic Computation, 14(1):35–57, 2001.
  • [KB14] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • [KK04] Dana A Knoll and David E Keyes. Jacobian-free Newton–Krylov methods: a survey of approaches and applications. Journal of Computational Physics, 193(2):357–397, 2004.
  • [KM97] Andreas Kriegl and Peter W Michor. The convenient setting of global analysis, volume 53. American Mathematical Soc., 1997.
  • [Koc06] Anders Kock. Synthetic differential geometry, volume 333. Cambridge University Press, 2006.
  • [KSM99] Ivan Kolár, Jan Slovák, and Peter W Michor. Natural operations in differential geometry. 1999.
  • [KTR+17] Alp Kucukelbir, Dustin Tran, Rajesh Ranganath, Andrew Gelman, and David M Blei. Automatic differentiation variational inference. The Journal of Machine Learning Research, 18(1):430–474, 2017.
  • [KW+52] Jack Kiefer, Jacob Wolfowitz, et al. Stochastic estimation of the maximum of a regression function. The Annals of Mathematical Statistics, 23(3):462–466, 1952.
  • [Lee13] John M Lee. Smooth manifolds. In Introduction to Smooth Manifolds, pages 1–31. Springer, 2013.
  • [LMG18] Sören Laue, Matthias Mitterreiter, and Joachim Giesen. Computing higher order derivatives of matrix and tensor expressions. Advances in Neural Information Processing Systems, 31:2750–2759, 2018.
  • [LMG20] Sören Laue, Matthias Mitterreiter, and Joachim Giesen. A simple and efficient tensor calculus. In AAAI, pages 4527–4534, 2020.
  • [LN89] Dong C Liu and Jorge Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical programming, 45(1-3):503–528, 1989.
  • [LNV21] Fernando Lucatelli Nunes and Matthijs Vákár. CHAD for expressive total languages. arXiv e-prints, pages arXiv–2110, 2021.
  • [LYRY20] Wonyeol Lee, Hangyeol Yu, Xavier Rival, and Hongseok Yang. On correctness of automatic differentiation for non-differentiable functions. In Advances in Neural Information Processing Systems, 2020.
  • [Man12] Oleksandr Manzyuk. A simply typed $\lambda$-calculus of forward automatic differentiation. In Proc. MFPS 2012, 2012.
  • [Mar10] James Martens. Deep learning via Hessian-free optimization. In ICML, volume 27, pages 735–742, 2010.
  • [Mer04] Joel Merker. Four explicit formulas for the prolongations of an infinitesimal Lie symmetry and multivariate Faa di Bruno formulas. arXiv preprint math/0411650, 2004.
  • [MO20] Carol Mak and Luke Ong. A differential-form pullback programming language for higher-order reverse-mode automatic differentiation. arxiv:2002.08241, 2020.
  • [MP21] Damiano Mazza and Michele Pagani. Automatic differentiation in PCF. Proc. ACM Program. Lang., 5(POPL):1–27, 2021. doi:10.1145/3434309.
  • [MS92] John C Mitchell and Andre Scedrov. Notes on sconing and relators. In International Workshop on Computer Science Logic, pages 352–378. Springer, 1992.
  • [Nea11] Radford M Neal. MCMC using Hamiltonian dynamics. In Handbook of Markov Chain Monte Carlo, chapter 5. Chapman & Hall / CRC Press, 2011.
  • [PGC+17] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. 2017.
  • [Pit95] Andrew M Pitts. Categorical logic. Technical report, University of Cambridge, Computer Laboratory, 1995.
  • [Plo18] Gordon D Plotkin. Some principles of differential programming languages. Invited talk, POPL 2018, 2018.
  • [PS07] Barak A Pearlmutter and Jeffrey Mark Siskind. Lazy multivariate higher-order forward-mode AD. ACM SIGPLAN Notices, 42(1):155–160, 2007.
  • [PS08] Barak A Pearlmutter and Jeffrey Mark Siskind. Reverse-mode AD in a functional framework: Lambda the ultimate backpropagator. ACM Transactions on Programming Languages and Systems (TOPLAS), 30(2):7, 2008.
  • [Qia99] Ning Qian. On the momentum term in gradient descent learning algorithms. Neural networks, 12(1):145–151, 1999.
  • [RM51] Herbert Robbins and Sutton Monro. A stochastic approximation method. The Annals of Mathematical Statistics, pages 400–407, 1951.
  • [Sav06] Thomas H Savits. Some statistical applications of Faà di Bruno. Journal of Multivariate Analysis, 97(10):2131–2140, 2006.
  • [SFVPJ19] Amir Shaikhha, Andrew Fitzgibbon, Dimitrios Vytiniotis, and Simon Peyton Jones. Efficient differentiable programming in a functional array-processing language. Proceedings of the ACM on Programming Languages, 3(ICFP):97, 2019.
  • [SMC20] Benjamin Sherman, Jesse Michel, and Michael Carbin. $\lambda_{S}$: Computable semantics for differentiable programming with higher-order functions and datatypes. arXiv preprint arXiv:2007.08017, 2020.
  • [Sou80] Jean-Marie Souriau. Groupes différentiels. In Differential geometrical methods in mathematical physics, pages 91–128. Springer, 1980.
  • [Sta11] Andrew Stacey. Comparative smootheology. Theory Appl. Categ., 25(4):64–117, 2011.
  • [Vák20] Matthijs Vákár. Denotational correctness of foward-mode automatic differentiation for iteration and recursion. arXiv preprint arXiv:2007.05282, 2020.
  • [Vák21] Matthijs Vákár. Reverse AD at higher types: Pure, principled and denotationally correct. In ESOP, pages 607–634, 2021.
  • [VMBBL18] Bart Van Merriënboer, Olivier Breuleux, Arnaud Bergeron, and Pascal Lamblin. Automatic differentiation in ML: Where we are and where we should be going. In Advances in Neural Information Processing Systems, pages 8757–8767, 2018.
  • [VS21] Matthijs Vákár and Tom Smeding. CHAD: Combinatory homomorphic automatic differentiation. arXiv preprint arXiv:2103.15776, 2021.
  • [WGP16] Mu Wang, Assefaw Gebremedhin, and Alex Pothen. Capitalizing on live variables: new algorithms for efficient Hessian computation via automatic differentiation. Mathematical Programming Computation, 8(4):393–433, 2016.
  • [WWE+19] Fei Wang, Xilun Wu, Gregory Essertel, James Decker, and Tiark Rompf. Demystifying differentiable programming: Shift/reset the penultimate backpropagator. Proceedings of the ACM on Programming Languages, 3(ICFP), 2019.
  • [ZHCW20] Shaopeng Zhu, Shih-Han Hung, Shouvanik Chakrabarti, and Xiaodi Wu. On the principles of differentiable quantum programming languages. In Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2020, London, UK, June 15-20, 2020, pages 272–285. ACM, 2020. doi:10.1145/3385412.3386011.

Appendix A $\mathbf{CartSp}$ and $\mathbf{Man}$ are not cartesian closed categories

Lemma 11.

There is no continuous injection $\mathbb{R}^{d+1}\to\mathbb{R}^{d}$.

Proof A.1.

If there were, it would restrict to a continuous injection $S^{d}\to\mathbb{R}^{d}$. The Borsuk-Ulam theorem, however, tells us that every continuous $f:S^{d}\to\mathbb{R}^{d}$ has some $x\in S^{d}$ such that $f(x)=f(-x)$. Since $x\neq -x$, this contradicts injectivity.
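
For $d=1$ the Borsuk-Ulam statement can be illustrated numerically. The sketch below is not part of the paper: it assumes a continuous function on the circle is given as a $2\pi$-periodic function of the angle, and the name `example_f` is a hypothetical test function. It uses the fact that $g(\theta)=f(\theta)-f(\theta+\pi)$ satisfies $g(\theta+\pi)=-g(\theta)$, so $g$ changes sign on $[0,\pi]$ and bisection locates an antipodal pair with equal values.

```python
import math


def antipodal_coincidence(f, tol=1e-12):
    """Find an angle theta with f(theta) == f(theta + pi), up to tol.

    f is assumed to be a continuous, 2*pi-periodic real-valued function,
    i.e. a continuous function on the circle S^1.
    """
    def g(theta):
        return f(theta) - f(theta + math.pi)

    lo, hi = 0.0, math.pi
    g_lo = g(lo)
    if g_lo == 0.0:
        return lo
    # g(pi) = -g(0), so g changes sign on [0, pi]; bisect on that interval.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_mid == 0.0:
            return mid
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def example_f(theta):
    # Hypothetical test function on the circle (an assumption, not from the paper).
    return math.cos(theta) + 0.5 * math.sin(2 * theta) + 0.3 * math.sin(theta)


theta = antipodal_coincidence(example_f)
# The gap between the values at theta and at its antipode is numerically zero.
print(abs(example_f(theta) - example_f(theta + math.pi)))
```

Because every such $f$ identifies some antipodal pair, no continuous map $S^{1}\to\mathbb{R}$ is injective, which is the $d=1$ instance of the argument above.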

Let us define the terms:

$$x_{0}:\mathbf{real},\ldots,x_{n-1}:\mathbf{real}\vdash t_{n}=\lambda y.\,x_{0}+x_{1}*y+\dots+x_{n-1}*y^{n-1}:\mathbf{real}\to\mathbf{real}$$

Assuming that $\mathbf{CartSp}$/$\mathbf{Man}$ is cartesian closed, observe that these get interpreted as injective continuous (because smooth) functions $\mathbb{R}^{n}\to\llbracket\mathbf{real}\to\mathbf{real}\rrbracket$ in $\mathbf{CartSp}$ and $\mathbf{Man}$. They are injective because distinct coefficient tuples define distinct polynomial functions.
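
The injectivity claim can be made concrete with a small sketch. This is an illustration under assumptions, not the paper's construction: it models an element of $\llbracket\mathbf{real}\to\mathbf{real}\rrbracket$ as a plain Python function, and the helper names `t` and `recover` are hypothetical. The coefficient tuple is recovered exactly from the values of the polynomial at as many distinct points as there are coefficients, so two tuples with the same interpretation must be equal.

```python
from fractions import Fraction


def t(coeffs):
    """Map a coefficient tuple (x_0, x_1, ...) to the function y -> sum of x_i * y**i."""
    def poly(y):
        result = 0
        for c in reversed(coeffs):  # Horner's rule
            result = result * y + c
        return result
    return poly


def recover(f, n):
    """Recover n coefficients from the values of f at the points 0, 1, ..., n-1.

    Assumes f is a polynomial function with n coefficients and n >= 1.
    """
    xs = [Fraction(i) for i in range(n)]
    dd = [Fraction(f(x)) for x in xs]
    # Newton divided differences, computed in place.
    for k in range(1, n):
        for i in range(n - 1, k - 1, -1):
            dd[i] = (dd[i] - dd[i - 1]) / (xs[i] - xs[i - k])
    # Expand the Newton form into monomial coefficients.
    mono = [Fraction(0)] * n
    for k in range(n - 1, -1, -1):
        # mono <- mono * (y - xs[k]) + dd[k]
        shifted = [Fraction(0)] + mono[:-1]
        mono = [s - xs[k] * m for s, m in zip(shifted, mono)]
        mono[0] += dd[k]
    return mono


coeffs = [3, -1, 0, 2]
# The coefficients are determined by the function alone.
print(recover(t(coeffs), len(coeffs)) == coeffs)
```

Exact rational arithmetic via `fractions.Fraction` is used so that the round trip is an equality rather than a floating-point approximation.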

Theorem 12.

$\mathbf{CartSp}$ is not cartesian closed.

Proof A.2.

If $\mathbf{CartSp}$ were cartesian closed, we would have $\llbracket\mathbf{real}\to\mathbf{real}\rrbracket=\mathbb{R}^{n}$ for some $n$. Then, in particular, we would get a continuous injection $\llbracket t_{n+1}\rrbracket:\mathbb{R}^{n+1}\to\mathbb{R}^{n}$, which contradicts Lemma 11.

Theorem 13.

$\mathbf{Man}$ is not cartesian closed.

Proof A.3.

Observe that we have $\iota_{n}:\mathbb{R}^{n}\to\mathbb{R}^{n+1}$; $\langle a_{0},\ldots,a_{n-1}\rangle\mapsto\langle a_{0},\ldots,a_{n-1},0\rangle$ and that $\iota_{n};\llbracket t_{n+1}\rrbracket=\llbracket t_{n}\rrbracket$. Let us write $A_{n}$ for the image of $\llbracket t_{n}\rrbracket$ and $A=\bigcup_{n\in\mathbb{N}}A_{n}$. Then, $A_{n}$ is connected because it is the continuous image of a connected set. Similarly, $A$ is connected because it is a union of connected sets that pairwise intersect (indeed, $A_{n}\subseteq A_{n+1}$). This means that $A$ lies in a single connected component of $\llbracket\mathbf{real}\to\mathbf{real}\rrbracket$, and this component is a manifold of some finite dimension, say $d$.

Take some $x\in\mathbb{R}^{d+1}$ (say, $0$), take some open $d$-ball $U$ around $\llbracket t_{d+1}\rrbracket(x)$, and take some open $(d+1)$-ball $V$ around $x$ in $\llbracket t_{d+1}\rrbracket^{-1}(U)$. Then, $\llbracket t_{d+1}\rrbracket$ restricts to a continuous injection from $V$ to $U$, or equivalently (since open balls are homeomorphic to Euclidean space), from $\mathbb{R}^{d+1}$ to $\mathbb{R}^{d}$, which contradicts Lemma 11.
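
The identity $\iota_{n};\llbracket t_{n+1}\rrbracket=\llbracket t_{n}\rrbracket$ used in this proof, and hence the inclusion $A_{n}\subseteq A_{n+1}$, can be spot-checked with a short sketch. As before, this models the interpretation of $t_{n}$ as an ordinary Python function, which is an assumption made for illustration only; `t` and `iota` are hypothetical helper names. Agreement at sample points is only an illustration: the identity itself holds because the padded coefficient contributes a zero term.

```python
from fractions import Fraction


def t(coeffs):
    """Map a coefficient tuple (x_0, x_1, ...) to the function y -> sum of x_i * y**i."""
    def poly(y):
        result = 0
        for c in reversed(coeffs):  # Horner's rule
            result = result * y + c
        return result
    return poly


def iota(coeffs):
    """Pad a coefficient tuple with a trailing zero, as the inclusion iota_n does."""
    return tuple(coeffs) + (0,)


coeffs = (Fraction(1, 2), -3, 4)
samples = [Fraction(k, 3) for k in range(-6, 7)]
# Padding with a zero coefficient does not change the polynomial function.
agree = all(t(iota(coeffs))(y) == t(coeffs)(y) for y in samples)
print(agree)
```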