Universal approximation on non-geometric rough paths and applications to financial derivatives pricing

Fred Espen Benth and Fabian A. Harang and Fride Straum

Abstract.

We present a novel perspective on the universal approximation theorem for rough path functionals, introducing a polynomial-based approximation class. We extend universal approximation to non-geometric rough paths within the tensor algebra. This development addresses critical needs in finance, where no-arbitrage conditions necessitate Itô integration. Furthermore, our findings motivate a hypothesis for payoff functionals in financial markets, allowing straightforward analysis of signature payoffs proposed in [Arr18].

Key words and phrases:

Signature payoff, derivatives pricing, path dependence, rough paths, signatures, model free finance, universal approximation

2020 Mathematics Subject Classification:

60L10; 60L90; 60H05; 91G20; 91G60

We acknowledge the support from the Center for Advanced Study (CAS) in Oslo, Norway, which funded and hosted our research project "Signatures for Images" during the 2023/24 academic year.

1. Introduction

Financial derivatives allow firms to hedge exposure to diverse financial risks, such as price, currency, or interest rate fluctuations. In commodity markets, participants contend with uncertainties stemming from production volumes, demand variability, and transportation logistics. To address these challenges, a variety of financial derivatives are traded, designed to mitigate risk and stabilize revenues. These instruments often involve complex transactions with multivariate payoffs that replicate potential future cash flows. For example, power producers manage both price and volume risks, while renewable energy producers face weather-driven uncertainties, such as wind or cloud cover. Retailers, on the other hand, contend with temperature-sensitive demand variations. Multivariate derivatives, such as spread and quanto options, are particularly attractive for managing such intertwined risks, although their pricing remains a significant challenge due to the lack of closed-form solutions.

Pricing these products typically relies on numerical approximations of theoretical prices using techniques like Monte Carlo simulations or numerical solutions of partial differential equations. However, the inherent complexity and high dimensionality of such problems demand novel mathematical tools to improve efficiency and accuracy.

In recent years, insights from the theory of rough paths, introduced by Lyons [Lyo98], have emerged as a powerful framework for understanding stochastic processes and their functionals. A central concept in this theory is the signature, a collection of iterated integrals that captures the fundamental characteristics of a path. Initially studied by Chen [Che54] from an algebraic topology perspective, the signature framework has since evolved into a dynamic field of research, offering new methodologies for modelling and computation in various domains. More precisely, for a smooth path $X:[0,T]\rightarrow\mathbb{R}^{d}$ we write $\mathop{}\!\mathrm{d}X_{t}=\frac{\mathop{}\!\mathrm{d}}{\mathop{}\!\mathrm{d}t}X_{t}\mathop{}\!\mathrm{d}t$, and for $s\leq t$ and $n\in\mathbb{N}$ we define

\mathbf{X}^{n}_{s,t}:=\int_{s<r_{1}<\dots<r_{n}<t}\bigotimes_{i=1}^{n}\mathop{}\!\mathrm{d}X_{r_{i}}\in(\mathbb{R}^{d})^{\otimes n}.

(1.1)

For $p\in\mathbb{N}$ , we define the $p$ -truncated signature $\mathbf{X}^{\leq p}$ by

\mathbf{X}^{\leq p}_{s,t}=\left(1,\mathbf{X}^{1}_{s,t},\ldots,\mathbf{X}^{p}_{s,t}\right)\in\{1\}\oplus\bigoplus_{i=1}^{p}(\mathbb{R}^{d})^{\otimes i}\subset T^{p}(\mathbb{R}^{d})

and the signature $\mathbf{X}^{\leq\infty}$ as the limit when $p\rightarrow\infty$. The space $T^{p}(\mathbb{R}^{d})$ is known as the truncated tensor algebra, and we call $T((\mathbb{R}^{d}))$, given formally as $T^{\infty}(\mathbb{R}^{d})$, the (extended) tensor algebra. A more detailed introduction will be provided in Section 2. Beyond smooth paths, the theory extends to stochastic paths, where probabilistic tools enable the construction of these integrals under frameworks like Itô or Stratonovich integration (see e.g. [FV10]).

In many ways, the iterated integral signature can be seen as an infinite-dimensional extension of polynomials. In fact, it shares many of the key features of polynomials, but for functions of one parameter (paths) rather than for points. Some key properties are:

(i)

Chen’s relation holds, i.e., the signature is a multiplicative functional. More specifically, for any $u\in[s,t]$ we have $\mathbf{X}_{s,u}\otimes\mathbf{X}_{u,t}=\mathbf{X}_{s,t}$. The tensor product is understood in the tensor algebra $T((\mathbb{R}^{d}))$, which is defined in Section 2. This not only provides computational efficiency but also serves as a fundamental building block in rough integration theory.
(ii)

The signature uniquely characterizes the path up to tree-like equivalence [HL10].
(iii)

It is re-parametrization invariant: if $\phi:[0,T]\rightarrow[0,T]$ is a monotone increasing function with $\phi(0)=0$ and $\phi(T)=T$, and we define $\bar{X}_{t}=X_{\phi(t)}$, then $\bar{\mathbf{X}}^{\leq\infty}_{0,T}=\mathbf{X}^{\leq\infty}_{0,T}$ [FV10].
(iv)

The signature is invariant to translation: if $Y_{t}=X_{t}+a$ for some $a\in\mathbb{R}^{d}$, then $\mathbf{Y}=\mathbf{X}$.
(v)

It is associated with a rich algebraic structure (see e.g. [FV10] for an introduction).
(vi)

The signature is tightly connected to stochastic integration theory and naturally encodes information about the choice of stochastic integral, enriching the understanding of stochastic integration beyond classical martingale theory [FH14].
(vii)

The signature characterizes the law of stochastic processes [CO22].
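Properties (i) and (iv) can be checked numerically on a simple example. The following is a minimal sketch, assuming a piecewise-linear path and a truncation at level 2; the helper names `segment_signature`, `chen_product` and `path_signature` are illustrative and not taken from any library. It uses the fact that the signature of a straight line with increment $\Delta$ is $(1,\Delta,\Delta\otimes\Delta/2,\dots)$ and glues segments together with Chen's relation.

```python
def segment_signature(delta):
    """Levels 1 and 2 of the signature of a straight line with increment `delta`."""
    d = len(delta)
    level1 = [float(x) for x in delta]
    level2 = [[delta[i] * delta[j] / 2.0 for j in range(d)] for i in range(d)]
    return level1, level2


def chen_product(a, b):
    """Tensor product of two level-2 truncated signatures (level 0 is always 1)."""
    a1, a2 = a
    b1, b2 = b
    d = len(a1)
    c1 = [a1[i] + b1[i] for i in range(d)]
    c2 = [[a2[i][j] + a1[i] * b1[j] + b2[i][j] for j in range(d)] for i in range(d)]
    return c1, c2


def path_signature(points):
    """Level-2 signature of the piecewise-linear path through `points`."""
    d = len(points[0])
    sig = ([0.0] * d, [[0.0] * d for _ in range(d)])  # signature of a constant path
    for p, q in zip(points, points[1:]):
        delta = [q[i] - p[i] for i in range(d)]
        sig = chen_product(sig, segment_signature(delta))
    return sig


path = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (3.0, 1.0)]
full = path_signature(path)

# (i) Chen's relation: split the path at the third point and glue the pieces.
glued = chen_product(path_signature(path[:3]), path_signature(path[2:]))

# (iv) Translation invariance: shift every point by the same vector.
shifted = path_signature([(x + 5.0, y - 2.0) for x, y in path])
```

Here `glued` and `shifted` agree with `full` up to floating-point error, since only increments of the path enter the computation.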

Importantly, the signature serves as a universal approximation basis for continuous functionals on path space, akin to how polynomials approximate real-valued functions. This universal approximation property allows any continuous functional on the space of Lipschitz paths to be approximated arbitrarily well by a linear functional of the signature [CPSF23]. The universal approximation property is proven through the Stone-Weierstrass theorem, and thus requires a feature set that forms a sub-algebra of the continuous functionals on path space.

The signature framework extends to stochastic functional approximation. A key challenge lies in encoding the choice of stochastic integration—such as Itô or Stratonovich—into the functional representation. While the universal approximation property has been extended to geometric rough paths, which naturally align with Stratonovich integration, this leaves a significant gap for Itô-based financial functionals, which are most prevalent in practice.

This article addresses the challenges of applying signature-based universal approximation to non-geometric rough paths, with a focus on practical implementation in financial markets. By bridging the gap between non-geometric rough paths and universal approximation, we investigate the framework for efficient pricing in the Itô setting for complex financial derivatives, providing several examples throughout the text. The remainder of the paper develops the theoretical foundations and explores applications in detail.

1.1. Main ideas and contribution

In this article we present a universal approximation result for functionals of non-geometric rough paths. The main challenge with non-geometric rough paths, and by extension the non-geometric signature, is that the product of two signature terms is in general not again a linear functional of the same signature. Consequently, the linear span of signature terms is not closed under multiplication and hence is not an algebra, which is a strict requirement of the Stone-Weierstrass theorem. To overcome this challenge, we enrich the feature set, the linear span of our signature terms, by polynomials of the signature terms. This is a very large class of features, but it is sufficient to guarantee universal approximation in the non-geometric setting. When applying this theorem to geometric rough paths, the approximation can be written as a linear functional acting on the signature through the shuffle product.

In finance the universal approximation theorem with signatures has been successfully used in the context of pricing complex financial derivatives, see [Arr18, LNPA20, LNPA19]. Such derivatives typically have a payoff of functional form, in practice often a so-called "Asian" structure. There the payoff depends not on the price at a given time, but on the average price over a time period, thus introducing an integral $\int_{a}^{b}X_{r}\mathop{}\!\mathrm{d}r$ with the price being $t\mapsto X_{t}$. For even more complicated structures, one can encounter derivatives in energy markets, where the payoff is given as an integral over the product of two stochastic processes, $\int_{a}^{b}X_{r}Y_{r}\mathop{}\!\mathrm{d}r$, for example representing electricity price and temperature. In even more complicated models, one could imagine compositions of a finite number of such structures. The point is that payoff functionals, mapping paths to prices $X\mapsto F(X)$, are typically given in a very specific form of (lower-order) signature functionals. In the examples above, the payoff functional can be seen as a continuous function $f$ acting on a finite number of terms in the signature of the price signals, enriched with a time component.
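To make the Asian example concrete, the following minimal sketch computes $\int_{0}^{T}X_{r}\mathop{}\!\mathrm{d}r$ from a single level-2 signature term of the time-enhanced path $(t,X_{t})$. It assumes a scalar price observed on a grid and interpolated linearly, and it uses the identity $\int_{0}^{T}(X_{r}-X_{0})\mathop{}\!\mathrm{d}r=\int_{0<u_{1}<u_{2}<T}\mathop{}\!\mathrm{d}X_{u_{1}}\mathop{}\!\mathrm{d}u_{2}$, i.e. the term indexed by the word "price, then time". The function names and the sample data are hypothetical.

```python
def asian_integral_via_signature(times, prices):
    """Integral of the price over [times[0], times[-1]] from a level-2 signature term."""
    s_x = 0.0   # level-1 term of the price component: X_t - X_0
    s_xt = 0.0  # level-2 term "price, then time": integral of (X_r - X_0) dr
    for k in range(len(times) - 1):
        dt = times[k + 1] - times[k]
        dx = prices[k + 1] - prices[k]
        # Chen's relation for this coordinate: old term + cross term + segment term.
        s_xt += s_x * dt + 0.5 * dx * dt
        s_x += dx
    horizon = times[-1] - times[0]
    return prices[0] * horizon + s_xt


def trapezoid_integral(times, prices):
    """Reference value: trapezoidal rule, exact for piecewise-linear paths."""
    return sum(
        0.5 * (prices[k] + prices[k + 1]) * (times[k + 1] - times[k])
        for k in range(len(times) - 1)
    )


times = [0.0, 0.5, 1.0, 2.0]
prices = [100.0, 102.0, 101.0, 105.0]
from_signature = asian_integral_via_signature(times, prices)
reference = trapezoid_integral(times, prices)
```

Both values coincide for piecewise-linear paths, so an Asian payoff such as $\max(\frac{1}{T}\int_{0}^{T}X_{r}\mathop{}\!\mathrm{d}r-K,0)$ is a continuous function of one signature term and the initial price.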

This motivates a working hypothesis of the paper: we consider functionals that are given as continuous functions acting on finitely many terms from the signature. This hypothesis is relevant in many practical applications, and under it one can resort to classical function approximation techniques to approximate the continuous function of interest. Approximating this function by a polynomial can, under various assumptions, yield convergence rates for the approximation, providing theoretical guarantees that are important for practical implementation. This simplifies the derivatives pricing problem from [Arr18], where machine learning techniques are suggested to find the functional approximation, resulting in an approximation which is difficult to analyze from a practical perspective. In contrast, we believe that our approach, combining the new universal approximation method with a hypothesis suited for payoff functionals, provides a simple way of using signature features for practical pricing problems arising in financial markets.

Throughout the article, we illustrate our results and contributions through examples, with an emphasis on derivatives pricing in energy markets, where there exist many complex derivative structures. We emphasise deriving rather explicit conditions ensuring convergence rates for the approximations that we introduce.

1.2. Organization of the paper

In Section 2 we provide a basic introduction to the signature and build the algebraic foundation for our analysis. As this article targets an audience in financial mathematics, we have chosen to provide a detailed introduction to the algebraic side. In Section 3 we recall the state of the art in universal approximation, and propose the new universal approximation for non-geometric rough paths. In Section 4 we discuss stochastic price paths and the computation of signature correlators. Section 5 combines our considerations and provides some approximation results for financial derivatives. This is highlighted with a discussion of applications in energy markets. Finally, we provide a conclusion with an outlook on future developments in Section 6.

1.3. Notation

For a complete metric space $(E,d_{E})$, we denote by $C([0,T];E)$ the space of continuous paths $X:[0,T]\rightarrow E$ with the uniform topology. The set of paths from $[0,T]$ into $E$ with finite $p$-variation, $p\geq 1$, is denoted by $V^{p}([0,T];E)$, and is equipped with the norm

\|X\|_{p,[0,T]}:=\left(\sup_{P\in\mathcal{D}}\sum_{t_{k}\in P}d_{E}(X_{t_{k}},X_{t_{k+1}})^{p}\right)^{\frac{1}{p}},

where $\mathcal{D}$ is the collection of all partitions over $[0,T]$. The subset of continuous paths with finite $p$-variation is denoted by $V^{p}_{c}([0,T];E)$. Whenever the interval $[0,T]$ under consideration is clear from the context, we simply write $\|X\|_{p}=\|X\|_{p,[0,T]}$. Recall that when $x\in\mathbb{N}\setminus\{0\}$, the gamma function $\Gamma$ satisfies $\Gamma(x+1)=x!$, and by a slight abuse of notation we will write $x!:=\Gamma(x+1)$ for $x>0$. Throughout the article, we will occasionally consider Hilbert spaces, denoted by $H$, where the inner product is given by $\langle\cdot,\cdot\rangle_{H}$ and the associated norm is denoted by $|\cdot|_{H}$. If the space $H$ under consideration is clear, we drop the index and simply write $\langle\cdot,\cdot\rangle$ and $|\cdot|$ for the inner product and norm. For a path $X:[0,T]\rightarrow\mathbb{R}^{d}$ we define the time-enhanced path $\hat{X}:[0,T]\rightarrow\mathbb{R}^{d+1}$, where $\hat{X}_{t}=(t,X_{t})$. This notation will be used consistently throughout the article. Frequently, we resort to the notation $\triangle_{T}=\{(s,t)\in[0,T]^{2}\mid 0\leq s\leq t\leq T\}$.

2. Basics of words and signatures

In this section, we will provide a fundamental overview of the conventions and concepts related to weighted words, denoted by $\pi$ , and signatures, represented by $\mathbf{X}^{\leq\infty}$ , along with their pairing $\langle\pi,\mathbf{X}^{\leq\infty}\rangle$ .

2.1. Words

Given that we are working with $\mathbb{R}^{d}$ -valued paths $X:[0,T]\rightarrow\mathbb{R}^{d}$ , the alphabet of our consideration is $\mathcal{A}=\{1,\dots,d\}$ . Throughout this article, $\mathcal{A}$ will denote the set $\{1,\dots,d\}$ unless otherwise stated.

Definition 2.1.

A word of length $n\in\mathbb{N}$ is a sequence $w=\mathbf{i}_{1}\dots\mathbf{i}_{n}$ where $\mathbf{i}_{j}\in\mathcal{A}$ for every $j=1,\dots,n$ . We denote by $\mathcal{W}_{n}$ the set of all words of length $n\in\mathbb{N}$ . For $n=0$ we have that $\mathcal{W}_{0}=\{\emptyset\}$ where $\emptyset$ is the empty word. We further let $\mathcal{W}$ denote the set of all words.

The algebra $\mathbb{R}\langle\mathcal{A}\rangle$ is introduced as $T(\text{vect}_{\mathbb{R}}(\mathcal{A}))$ , representing the vector space generated by $\mathcal{W}$ .

Definition 2.2.

The algebra of all non-commutative polynomials in $\mathcal{A}$ is defined to be

\mathbb{R}\langle\mathcal{A}\rangle=\left\{\pi=\sum_{w\in\mathcal{W}}\alpha_{w}w\mid\alpha_{w}\in\mathbb{R},\;\alpha_{w}\neq 0\text{ for a finite number of }w\in\mathcal{W}\right\}.

We refer to $\pi\in\mathbb{R}\langle\mathcal{A}\rangle$ as a weighted word.

$\mathbb{R}\langle\mathcal{A}\rangle$ forms an algebra with respect to concatenation, which for $w=\mathbf{i}_{1}\dots\mathbf{i}_{n}$ and $w^{\prime}=\mathbf{j}_{1}\dots\mathbf{j}_{m}$ is given by $w\cdot w^{\prime}=ww^{\prime}=\mathbf{i}_{1}\dots\mathbf{i}_{n}\mathbf{j}_{1}\dots\mathbf{j}_{m}$ and extended bilinearly to weighted words.

Example 2.3.

Suppose we work with an alphabet given by $\mathcal{A}=\{\mathbf{a,b,c}\}$ . Then we have that

\mathcal{W}=\{\emptyset,\mathbf{a,b,c,aa,ab,ac},\dots\},

where $\mathbf{ab}\neq\mathbf{ba}$ . As an example, one element $\pi\in\mathbb{R}\langle\mathcal{A}\rangle$ is

\pi=\emptyset+2\mathbf{a}+\sqrt{3}\mathbf{ab}+\cdots+100\mathbf{abcbac}.

Moreover, the concatenation of the weighted words $(3\mathbf{ab+a})$ and $(\mathbf{b+c})$ is

(3\mathbf{ab+a})\cdot(\mathbf{b+c})=3\mathbf{abb}+3\mathbf{abc+ab+ac}.

Note that in this paper we use alphabets consisting of the natural numbers up to the dimension $d$. However, to distinguish clearly between the weights $\alpha_{w}$ and the words $w$, we used letters for the alphabet in this example.
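The concatenation in Example 2.3 can be reproduced with a few lines of code. The sketch below is one possible encoding, assuming a weighted word is stored as a dictionary from words (strings, with the empty string playing the role of $\emptyset$) to real coefficients; the function name `concatenate` is illustrative.

```python
from collections import defaultdict


def concatenate(pi, rho):
    """Bilinear extension of word concatenation to weighted words."""
    out = defaultdict(float)
    for w, a in pi.items():
        for v, b in rho.items():
            out[w + v] += a * b  # concatenate the words, multiply the weights
    return {w: c for w, c in out.items() if c != 0.0}


pi = {"ab": 3.0, "a": 1.0}   # 3ab + a
rho = {"b": 1.0, "c": 1.0}   # b + c
product = concatenate(pi, rho)

# Concatenation is not commutative: ab differs from ba.
ab = concatenate({"a": 1.0}, {"b": 1.0})
ba = concatenate({"b": 1.0}, {"a": 1.0})
```

Here `product` holds the four terms $3\mathbf{abb}+3\mathbf{abc}+\mathbf{ab}+\mathbf{ac}$ from the example.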

Remark 2.4.

In the next Subsection, we will introduce the concept of signatures. However, it is worth noting that the signature $\mathbf{X}^{\leq\infty}$ of a path $X:[0,T]\rightarrow\mathbb{R}^{d}$ can also be defined recursively through the projections of words. Since the path is $\mathbb{R}^{d}$ -valued, let $\mathcal{W}$ denote the collection of words formed from the alphabet $\mathcal{A}=\{1,\dots,d\}$ . We can regard the signature $\mathbf{X}^{\leq\infty}$ as an element of $C(\triangle_{T})^{\mathcal{W}}$ . To formalize this, we define it recursively using the projections $\langle w,\mathbf{X}^{\leq\infty}\rangle$ for $w\in\mathcal{W}$ , as follows:

First, for the empty word, we set $\langle\emptyset,\mathbf{X}^{\leq\infty}_{s,t}\rangle:=1$ . For any non-empty word $w=\mathbf{i}_{1}\dots\mathbf{i}_{n}\in\mathcal{W}\setminus\{\emptyset\}$ , we define the projection as:

\langle w,\mathbf{X}^{\leq\infty}_{s,t}\rangle:=\int_{s}^{t}\langle\mathbf{i}_{1}\dots\mathbf{i}_{n-1},\mathbf{X}^{\leq\infty}_{s,r}\rangle\,dX^{i_{n}}_{r}.

2.2. Signatures

In this Subsection, we introduce the basic concepts of signatures. We begin by presenting key algebras and a Hilbert space that plays a central role in the theory of signatures. Next, we define what a signature is, and finally, we demonstrate how to pair a weighted word with a signature. Let us start with the setup. Let $H$ denote a general $\mathbb{F}$ -Hilbert space, where the field $\mathbb{F}$ is either $\mathbb{R}$ or $\mathbb{C}$ . We need to introduce a triplet of spaces $T(H)\subset\mathcal{F}(H)\subset T((H))$ . First of all, by convention we let $H^{\otimes 0}=\mathbb{F}.$

Definition 2.5.

For a Hilbert space $H$ we call

T((H))=\prod_{n=0}^{\infty}H^{\otimes n}=\{x=(x_{0},x_{1},\dots,x_{n},\dots)\mid x_{n}\in H^{\otimes n}\text{ for }n=0,1,2,\dots\}

the extended tensor algebra.

As we will see later, any well-defined signature naturally belongs to the extended tensor algebra $T((H))$ . Another space closely linked to the theory of words and signatures is the tensor algebra, which we introduce next:

Definition 2.6.

The tensor algebra is given by the algebraic direct sum

T(H)=\bigoplus_{n=0}^{\infty}H^{\otimes n}=\{x\in T((H))\mid\exists\,N\in\mathbb{N}\text{ such that }x_{n}=0\;\forall n\geq N\}.

We equip $T((H))$ with a sum $+$, scalar multiplication, and a product $\otimes$. These operations are defined as follows: for elements $x=(x_{0},x_{1},\dots,x_{n},\dots)$ and $y=(y_{0},y_{1},\dots,y_{n},\dots)$ in $T((H))$, we define the sum of $x$ and $y$ element-wise, i.e.,

x+y=(x_{0}+y_{0},x_{1}+y_{1},\dots,x_{n}+y_{n},\dots).

Scalar multiplication is also defined element-wise as

\lambda x=(\lambda x_{0},\lambda x_{1},\dots),\quad\lambda\in\mathbb{F}.

Lastly, the product of $x$ and $y$ is given by

x\otimes y=(z_{0},z_{1},\dots,z_{n},\dots)\quad\mathrm{where}\quad z_{n}=\sum_{k=0}^{n}x_{k}\otimes y_{n-k}.

These operations turn the tensor algebra $T(H)$ into an algebra. Moreover, we denote by

T^{N}(H)=\{x\in T(H)\mid x_{n}=0\;\forall n>N\}

the $N$ -truncated tensor algebra.

Remark 2.7.

We present next two useful facts about the tensor algebra and the extended tensor algebra when $H=\mathbb{R}^{d}$ . First, we have the isomorphisms

T(\mathbb{R}^{d})\simeq T((\mathbb{R}^{d})^{*})\simeq\mathbb{R}\langle\mathcal{A}\rangle,

see e.g. [Lan02]. Thus, every weighted word $\pi\in\mathbb{R}\langle\mathcal{A}\rangle$ can be viewed as an element in $T(\mathbb{R}^{d})$. Secondly, we have a dual pairing between $T(\mathbb{R}^{d})$ and $T((\mathbb{R}^{d}))$, see e.g. [LCL07], because of the fact that

T((\mathbb{R}^{d})^{*})\simeq T((\mathbb{R}^{d}))^{*}

where $T((\mathbb{R}^{d}))^{*}$ denotes the algebraic dual of $T((\mathbb{R}^{d}))$ . Hence for $\pi\in T(\mathbb{R}^{d})$ and $x\in T((\mathbb{R}^{d}))$ we denote by $\langle\pi,x\rangle$ the algebraic dual pairing between $\pi$ and $x$ .

Putting these two facts together, we find that

\mathbb{R}\langle\mathcal{A}\rangle\simeq T((\mathbb{R}^{d}))^{*}

and therefore every weighted word $\pi\in\mathbb{R}\langle\mathcal{A}\rangle$ can be viewed as a linear functional on the extended tensor algebra, $T((\mathbb{R}^{d}))$ .

Now we want to introduce a Hilbert space to which a large class of signatures belongs, called the full Fock space. For elementary tensors $x=x_{1}\otimes\dots\otimes x_{n}$, $y=y_{1}\otimes\dots\otimes y_{n}\in H^{\otimes n}$ we define an inner product on $H^{\otimes n}$, extended by linearity, by

\langle x,y\rangle_{n}=\prod_{i=1}^{n}\langle x_{i},y_{i}\rangle_{H}.

Hence, the norm on $H^{\otimes n}$ becomes $|x|_{n}^{2}=\prod_{i=1}^{n}|x_{i}|^{2}_{H}$ . Then for $x=(x_{n}),y=(y_{n})\in\bigoplus_{n=0}^{\infty}H^{\otimes n},$ where $H^{\otimes 0}=\mathbb{F},$ we can define an inner product by

\langle x,y\rangle_{\mathcal{F}(H)}=\sum_{n=0}^{\infty}\langle x_{n},y_{n}% \rangle_{n},

and hence a norm

\|x\|_{\mathcal{F}(H)}^{2}=\sum_{n=0}^{\infty}|x_{n}|_{n}^{2}.

The full Fock space over $H$ is the Hilbert space given by the topological direct sum, $\widehat{\bigoplus}$ ,

\mathcal{F}(H):=\widehat{\bigoplus}_{n=0}^{\infty}H^{\otimes n}=\left\{x=(x_{n})\mid x_{n}\in H^{\otimes n},\,n\in\mathbb{N},\,\|x\|_{\mathcal{F}(H)}^{2}<\infty\right\}.

Moreover, $T(H)$ is a dense subspace in $\mathcal{F}(H)$ .

Example 2.8.

Recall that for $\mathcal{A}=\{1,\dots,d\}$ we have that $\mathbb{R}\langle\mathcal{A}\rangle\simeq T(\mathbb{R}^{d})$ and therefore any $\pi\in\mathbb{R}\langle\mathcal{A}\rangle$ can be recognized as an element $e_{\pi}\in T(\mathbb{R}^{d})$ . Moreover, since $T(\mathbb{R}^{d})$ is a (dense) subspace of $\mathcal{F}(\mathbb{R}^{d})$ , we can compute the Fock norm of a weighted word $\pi\in\mathbb{R}\langle\mathcal{A}\rangle.$ For example, let $d=2$ so that $\mathcal{A}=\{1,2\}$ and consider the weighted word

\pi=2\emptyset+3\cdot\mathbf{1}+\mathbf{12}.

Then

e_{\pi}=2e_{\emptyset}+3e_{1}+e_{1}\otimes e_{2}=(2e_{\emptyset},3e_{1},e_{1}\otimes e_{2})

where $e_{\emptyset}=1$ is the basis vector in the field $\mathbb{R}$ while $e_{1},e_{2}\in\mathbb{R}^{2}$ are the basis vectors in $\mathbb{R}^{2}$ . Hence, we get that

\|\pi\|^{2}_{\mathcal{F}(\mathbb{R}^{d})}:=\|e_{\pi}\|^{2}_{\mathcal{F}(\mathbb{R}^{d})}=2^{2}|e_{\emptyset}|^{2}+3^{2}|e_{1}|^{2}+|e_{1}|^{2}|e_{2}|^{2}=4+9+1=14.
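The computation in Example 2.8 generalises directly: since the elementary tensors $e_{i_{1}}\otimes\dots\otimes e_{i_{n}}$ built from the standard basis of $\mathbb{R}^{d}$ are orthonormal for the inner products above, the squared Fock norm of a weighted word is the sum of its squared weights. A minimal sketch, assuming words are encoded as tuples of letters (the empty tuple standing for $\emptyset$) and using a hypothetical helper name:

```python
def fock_norm_squared(pi):
    """Squared Fock norm of a weighted word given as {word tuple: weight}."""
    # Distinct words correspond to orthonormal basis tensors, so no cross terms appear.
    return sum(weight * weight for weight in pi.values())


# The weighted word 2*(empty word) + 3*(1) + (12) from Example 2.8.
pi = {(): 2.0, (1,): 3.0, (1, 2): 1.0}
norm_sq = fock_norm_squared(pi)
```

For this input `norm_sq` equals 14, matching the value obtained by hand.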

Next we recall the definition of a multiplicative functional, a fundamental object in the theory of rough paths [LCL07]:

Definition 2.9.

A multiplicative functional of degree $N\in\mathbb{N}$ is a continuous map

\triangle_{T}\ni(s,t)\mapsto\mathbf{X}_{s,t}=(1,X^{1}_{s,t},\dots,X^{N}_{s,t})\in T^{N}(H)

satisfying Chen’s identity:

\mathbf{X}_{s,t}=\mathbf{X}_{s,u}\otimes\mathbf{X}_{u,t}\quad\forall\;0\leq s\leq u\leq t\leq T.

For $p\geq 1$ , a multiplicative functional $\mathbf{X}=(1,X^{1},\dots,X^{N})$ of degree $N\in\mathbb{N}$ is said to have finite $p$ -variation if

\|\mathbf{X}\|_{p,[0,T]}:=\max_{1\leq i\leq N}\sup_{P\in\mathcal{D}}\sum_{t_{k}\in P}|X^{i}_{t_{k},t_{k+1}}|_{i}^{p/i}<\infty,

where $\mathcal{D}$ is the collection of all finite partitions over $[0,T]$.

Definition 2.10.

Let $p\geq 1$ and let $\mathbf{X}_{s,t}=(1,X^{1}_{s,t},\dots,X^{\lfloor p\rfloor}_{s,t})$ be a multiplicative functional of degree $\lfloor p\rfloor\in\mathbb{N}$ . We call $\mathbf{X}$ a $p$ -rough path if $X^{i}$ is of finite $p$ -variation for all $i=1,\dots,\lfloor p\rfloor.$

Theorem 2.11.

[Lyons’ extension theorem [LCL07]] Let $\mathbf{X}$ be a $p$-rough path. Then for any $n\geq\lfloor p\rfloor+1$ there exists a unique continuous map

X^{n}:\triangle_{T}\rightarrow H^{\otimes n}

such that

\mathbf{X}^{\leq\infty}=(1,X^{1},\dots,X^{\lfloor p\rfloor},\dots,X^{n},\dots):\triangle_{T}\rightarrow T((H))

is a multiplicative functional with finite $p$ -variation. We call $\mathbf{X}^{\leq\infty}$ the signature of $\mathbf{X}$ . Moreover, for any norm $|\cdot|_{k}$ on $H^{\otimes k}$ we have that

|X^{k}_{s,t}|_{k}\leq\frac{\|\mathbf{X}\|_{p,[s,t]}^{k}}{\beta(p)(k/p)!}

where $\beta(p)$ is a constant depending only on $p$.

Definition 2.12 (Geometric rough paths, [LCL07]).

A geometric $p$ -rough path is a $p$ -rough path that can be expressed as a limit of $1$ -rough paths in the $p$ -variation distance. The space of geometric $p$ -rough paths in $H$ is denoted by $G^{p}(H)$ .

We now present a useful theorem: the signature of a $p$ -rough path is an element of the Fock space, instead of the entire extended tensor algebra, $T((H))$ . To this end, let

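E_{\alpha,\beta}(z)=\sum_{n=0}^{\infty}\frac{z^{n}}{(\alpha n+\beta)!},\quad\alpha,\beta\in\mathbb{C},\;\mathfrak{R}(\alpha)>0,\;\mathfrak{R}(\beta)\geq 0,\;z\in\mathbb{C}

be the Mittag-Leffler function, written with the factorial convention $x!=\Gamma(x+1)$ from Section 1.3; in the classical two-parameter notation, whose denominator is $\Gamma(\alpha n+\beta)$, this is the function with second parameter $\beta+1$.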

Corollary 2.13.

For $p\geq 1$ let $\mathbf{X}^{\leq p}$ be a $p$ -rough path. We have that $\mathbf{X}^{\leq\infty}_{s,t}\in\mathcal{F}(H)$ for any $(s,t)\in\triangle_{T}$ .

Proof.

From Lyons’ extension theorem (Theorem 2.11), we have that $\mathbf{X}^{\leq\infty}_{s,t}\in T((H))$, and, moreover, since the $\ell^{2}$-norm of a sequence is dominated by its $\ell^{1}$-norm,

\|\mathbf{X}^{\leq\infty}_{s,t}\|_{\mathcal{F}(H)}\leq\sum_{n=0}^{\infty}|X^{n}_{s,t}|_{n}\leq\sum_{n=0}^{\infty}\frac{\|\mathbf{X}\|_{p,[s,t]}^{n}}{\beta(p)(n/p)!}=\frac{E_{1/p,0}(\|\mathbf{X}\|_{p,[s,t]})}{\beta(p)}<\infty.

The result follows. ∎
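The finiteness of the bound rests on the factorial decay of the series $\sum_{n\geq 0}z^{n}/(n/p)!$ with $x!=\Gamma(x+1)$. The following minimal sketch evaluates partial sums of this series numerically for $z>0$; the function name and the choice of 400 terms are assumptions made for illustration, and terms are computed in log-space to avoid overflow.

```python
import math


def factorial_series(z, p, terms=400):
    """Partial sum of sum_n z**n / Gamma(n/p + 1) for z >= 0 and p >= 1."""
    if z == 0.0:
        return 1.0  # only the n = 0 term survives
    log_z = math.log(z)
    # exp(n*log(z) - lgamma(n/p + 1)) equals z**n / Gamma(n/p + 1).
    return sum(math.exp(n * log_z - math.lgamma(n / p + 1.0)) for n in range(terms))


smooth_case = factorial_series(2.0, 1)   # p = 1: the series is the exponential
rough_case = factorial_series(2.0, 2.5)  # p = 2.5: slower, but still factorial, decay
```

For $p=1$ the series reduces to $e^{z}$, which gives a simple sanity check; for larger $p$ the sum grows but remains finite for every fixed $z$.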

We wish to utilize the dual pairing in Remark 2.7 between weighted words $\pi\in\mathbb{R}\langle\mathcal{A}\rangle$ and signatures $\mathbf{X}^{\leq\infty}_{s,t}\in T((\mathbb{R}^{d}))$. Hence, from now on we consider the case when $H=\mathbb{R}^{d}$. Given an $\mathbb{R}^{d}$-valued path $X$ for which the signature $\mathbf{X}^{\leq\infty}_{T}:=\mathbf{X}^{\leq\infty}_{0,T}\in T((\mathbb{R}^{d}))$ is well-defined, and a word $w=\mathbf{i}_{1}\dots\mathbf{i}_{n}\in\mathcal{W}_{n}$, we can identify $w$ with an element $e_{w}\in T(\mathbb{R}^{d})\simeq T((\mathbb{R}^{d}))^{*}$, and we have the following relation with the signature

\langle w,\mathbf{X}^{\leq\infty}_{T}\rangle:=\langle e_{w},\mathbf{X}^{\leq\infty}_{T}\rangle=\underset{0<u_{1}<\dots<u_{n}<T}{\int\dots\int}dX_{u_{1}}^{i_{1}}\dots dX_{u_{n}}^{i_{n}}.

(2.1)

Here and in the sequel we use the generic notation $\pi$ to signify a (finite) linear combination of elements $e_{w}$, or, after identification, a finite linear combination of words (as in Example 2.8 above). Indeed, if $\pi=\sum_{w\in\mathcal{W}}a_{w}w$ with $a_{w}\in\mathbb{R}$ non-zero for only finitely many $w\in\mathcal{W}$, then

\langle\pi,\mathbf{X}^{\leq\infty}_{T}\rangle=\sum_{w\in\mathcal{W}}a_{w}\langle w,\mathbf{X}^{\leq\infty}_{T}\rangle,

with $\langle w,\mathbf{X}^{\leq\infty}_{T}\rangle$ given as in (2.1).

The shuffle product turns $T(\mathbb{R}^{d})$ into a commutative algebra and is very convenient when operating with products of signatures. Indeed, the shuffle product is the key operation in rough path theory that linearises nonlinear functionals, at least approximately, as can be seen in the Universal Approximation Theorem (recalled for the convenience of the reader in Proposition 3.1). We define the shuffle product $\shuffle$ next. The shuffle product of two words is the sum of all words obtained by interleaving the letters of the two words while preserving the order of the letters within each word; it is extended bilinearly to weighted words. As an example, if we shuffle together the words $\mathbf{ab}$ and $\mathbf{c}$, we get

\mathbf{ab}\shuffle\mathbf{c=abc+acb+cab}.

Note that $\mathbf{bac},\mathbf{bca}$ and $\mathbf{cba}$ are not in the sum on the right-hand side since they violate the order of $\mathbf{ab}$.
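The verbal description of the shuffle product translates into a short recursion on the last letters of the two words. The sketch below is one possible implementation, assuming words are encoded as strings and weighted words as dictionaries from strings to coefficients; the names `shuffle_words` and `shuffle` are illustrative.

```python
from collections import Counter, defaultdict


def shuffle_words(u, v):
    """Shuffle of two words, returned as a Counter mapping word -> multiplicity."""
    if not u:
        return Counter({v: 1})
    if not v:
        return Counter({u: 1})
    out = Counter()
    # The last letter of any interleaving is the last letter of u or of v.
    for w, c in shuffle_words(u[:-1], v).items():
        out[w + u[-1]] += c
    for w, c in shuffle_words(u, v[:-1]).items():
        out[w + v[-1]] += c
    return out


def shuffle(pi, rho):
    """Bilinear extension of the shuffle product to weighted words."""
    out = defaultdict(float)
    for u, a in pi.items():
        for v, b in rho.items():
            for w, c in shuffle_words(u, v).items():
                out[w] += a * b * c
    return dict(out)


example = shuffle_words("ab", "c")   # the three words abc, acb, cab
square = shuffle_words("a", "a")     # the word aa with multiplicity 2
```

Repeated letters produce multiplicities, as in `square`, which is why the result is stored as a counter rather than a set.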

Below follows an important Lemma for products of signatures and the shuffle product:

Lemma 2.14 (Shuffle property, [LCL07]).

Let $\pi,\pi^{\prime}\in T(\mathbb{R}^{d})$ and let $\mathbf{X}$ be a geometric rough path, according to Definition 2.12. Then the following identity holds

\langle\pi,\mathbf{X}^{\leq\infty}_{T}\rangle\langle\pi^{\prime},\mathbf{X}^{\leq\infty}_{T}\rangle=\langle\pi\shuffle\pi^{\prime},\mathbf{X}^{\leq\infty}_{T}\rangle.

Interestingly, one can link shuffle product monomials to monomials of the signatures: given $\pi\in T(\mathbb{R}^{d})$ we define $\pi_{n}\in T(\mathbb{R}^{d})$ for every $n\in\mathbb{N}$ by

\pi_{0}=\emptyset,\quad\pi_{1}=\pi,\quad\pi_{2}=\pi_{1}\shuffle\pi_{1},\quad\pi_{n}=\pi_{n-1}\shuffle\pi_{1}=\overbrace{\pi_{1}\shuffle\dots\shuffle\pi_{1}}^{n\text{ times}}.

Then by the shuffle property we have that

\langle\pi_{n},\mathbf{X}^{\leq\infty}_{T}\rangle=\langle\pi_{n-1},\mathbf{X}^{\leq\infty}_{T}\rangle\langle\pi_{1},\mathbf{X}^{\leq\infty}_{T}\rangle=\langle\pi,\mathbf{X}^{\leq\infty}_{T}\rangle^{n}.

This gives a convenient link between monomials of signatures and monomials of words.

3. Functional approximation with signatures

With the goal of approximating complex pricing functionals from financial markets, we recall here some basic properties of universality of the signature. In addition, we provide a new statement of the universal approximation property for functionals acting on general rough paths (not restricted to geometric rough paths).

3.1. Universal approximation for geometric rough paths

The functional approximation setup outlined in [LNPA20], based on ideas also formulated in [Arr18], relies heavily on the universal approximation property of the signature. This property is a consequence of the Stone-Weierstrass theorem, using the fact that the linear span of signatures of geometric rough paths forms an algebra, and that when lifting the underlying path $X$ to its time extension $\hat{X}_{t}=(t,X_{t})$, the signature $\hat{\mathbf{X}}^{\leq\infty}$ uniquely determines the path, and thus separates points. The universal approximation theorem relied upon in [Arr18, LNPA20] is formulated for Lipschitz paths. The following universal approximation theorem is based on the recently proposed extension by Cuchiero et al. in [CPSF23] to the setting of $G^{\lfloor p\rfloor}(\mathbb{R}^{d})$-valued paths.

Proposition 3.1 (Universal approximation).

For $p\geq 1$ let $\mathcal{K}\subset\hat{V}_{c}^{p}([0,T];G^{\lfloor p\rfloor}(\mathbb{R}^{d}))$ be a compact subset which is bounded in the $p$-variation norm. Suppose $F$ is a continuous functional on $\hat{V}_{c}^{p}([0,T];G^{\lfloor p\rfloor}(\mathbb{R}^{d}))$. Let $\hat{\mathbf{X}}^{\leq\lfloor p\rfloor}\in\mathcal{K}$ be a $p$-rough path defined according to Definition 2.10 over the extended path $\hat{X}_{t}=(t,X_{t})$ for some $X:[0,T]\rightarrow\mathbb{R}^{d}$. Then for each $\epsilon>0$ there exists a linear functional $\pi\in T(\mathbb{R}^{d})^{*}$ such that

\max_{\hat{\mathbf{X}}^{\leq p}\in\mathcal{K}}|F(\hat{\mathbf{X}}^{\leq p})_{[0,T]}-\langle\pi,\hat{\mathbf{X}}^{\leq\infty}_{[0,T]}\rangle|<\epsilon.

The universal approximation applied to $G^{\lfloor p\rfloor}(\mathbb{R}^{d})$-valued paths allows us, in particular, to consider functionals of truly rough signals $\hat{X}$ that only have finite $p$-variation for some $p\geq 1$, as long as we have constructed the rough path $\hat{\mathbf{X}}^{\leq p}$ over $\hat{X}$. For instance, we can consider functionals of paths of Brownian motion. However, in its current form, it is only formulated for Stratonovich lifts of the iterated integral, excluding the more natural choice of rough path lift for financial applications, namely the Itô lift. One might try to work around this problem by identifying the Itô-Stratonovich correction term and implementing this in the functional and signature. However, under the working hypothesis that will be used in the remainder of the text, we can circumvent this challenge completely.

3.2. General universal approximation on rough paths

The universal approximation theorem in Proposition 3.1 relies heavily upon the geometric structure of the rough paths on which the functionals $F$ act. The reason is that when applying the Stone-Weierstrass theorem to check for denseness of the linear span of signature terms in the space of continuous functionals on $\hat{V}^{p}([0,T];G^{\lfloor p\rfloor}(\mathbb{R}^{d}))$, one relies upon being able to multiply two signature terms $\langle\pi_{1},\hat{\mathbf{X}}^{\leq\infty}\rangle$ and $\langle\pi_{2},\hat{\mathbf{X}}^{\leq\infty}\rangle$ to obtain a new signature term $\langle\pi,\hat{\mathbf{X}}^{\leq\infty}\rangle$. When $\hat{\mathbf{X}}^{\leq\infty}$ is geometric (i.e. takes values in $G^{\lfloor p\rfloor}(\mathbb{R}^{d})$) it follows from Lemma 2.14 that this holds with $\pi=\pi_{1}\shuffle\pi_{2}$. However, this is a restrictive class of signatures; in the semi-martingale setting it corresponds to Stratonovich lifts of the rough path. In financial applications, one typically works with functionals that act on non-geometric rough paths, such as Itô lifts of semi-martingales. Thus, extending the universal approximation property to any rough path has practical consequences for several applications.
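The failure of the shuffle identity for Itô lifts can already be seen for a scalar path at level 2, where $\mathbf{1}\shuffle\mathbf{1}=2\cdot\mathbf{11}$. The following minimal sketch, assuming a discretised Brownian path on a uniform grid, approximates the second iterated integral once with left-point sums (Itô) and once with midpoint sums (Stratonovich); the function name, grid size and random seed are arbitrary illustrative choices.

```python
import random


def level_two_integrals(increments):
    """Return (B_T, Ito sum, Stratonovich sum, quadratic variation) for a path from 0."""
    b = ito = strat = qv = 0.0
    for db in increments:
        ito += b * db                 # left-point rule: integrand evaluated before the step
        strat += (b + 0.5 * db) * db  # midpoint rule: average of both endpoints
        qv += db * db                 # realised quadratic variation
        b += db
    return b, ito, strat, qv


random.seed(0)
n, T = 10_000, 1.0
dt = T / n
increments = [random.gauss(0.0, dt ** 0.5) for _ in range(n)]
b_T, ito, strat, qv = level_two_integrals(increments)

# Shuffle identity at level 2 would require B_T**2 == 2 * (second iterated integral).
gap_stratonovich = b_T ** 2 - 2.0 * strat  # vanishes up to rounding
gap_ito = b_T ** 2 - 2.0 * ito             # equals the quadratic variation, close to T
```

For the midpoint sums the gap is zero by a telescoping argument, whereas for the left-point sums it equals the realised quadratic variation, which is why products of Itô signature terms leave the linear span of signature terms.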

It turns out that a simple combination of ideas from the Stone-Weierstrass theorem over polynomials with the classical signature universal approximation yields such a generalized universal approximation. To this end we need a so-called separation of points property of the sub-algebra of continuous functionals that serves as the basis for functional approximation. We therefore recall the following technical lemma from [Bre11, Cor. 4.24].

Lemma 3.2.

Let $\Omega\subset\mathbb{R}^{d}$ be open, and suppose $u\in L^{1}_{loc}(\Omega)$ is such that

\int_{\Omega}u(z)f(z)\mathop{}\!\mathrm{d}z=0,\quad\forall f\in C^{\infty}_{c}(\Omega).

Then $u\equiv 0$ a.e. on $\Omega$.

With this lemma at hand we are now ready to prove a generalized version of polynomial universal approximation over rough paths with values in the tensor algebra $T((\mathbb{R}^{d}))$ .

Theorem 3.3 (Generalized universal approximation).

Let $F$ be a continuous functional on a compact set $\mathcal{K}\subset\hat{V}^{p}([0,T];T^{\lfloor p\rfloor}(\mathbb{R}^{d}))$. Then for any $\epsilon>0$ there exist $n\in\mathbb{N}$, a finite set $\mathcal{N}\subset\mathbb{N}^{n}$, a polynomial $\bar{f}_{\mathcal{N}}:\mathbb{R}^{n}\rightarrow\mathbb{R}$ given by

\bar{f}_{\mathcal{N}}(x)=\sum_{m\in\mathcal{N}}\alpha_{m}x^{m},

(3.1)

and a collection of linear operators $\{\pi_{i}\}_{i=1}^{n}\subset T(\mathbb{R}^{d})$ such that for all $\hat{\mathbf{X}}^{\leq p}\in\mathcal{K}$,

\left|F(\hat{\mathbf{X}}^{\leq p})-\bar{f}_{\mathcal{N}}\left(\langle\pi_{1},\hat{\mathbf{X}}^{\leq\infty}\rangle,\ldots,\langle\pi_{n},\hat{\mathbf{X}}^{\leq\infty}\rangle\right)\right|<\epsilon.

Proof.

Let $\mathcal{E}^{p}:=\hat{V}^{p}([0,T];T^{\lfloor p\rfloor}(\mathbb{R}^{d}))$ . Define for $n\in\mathbb{N}$ and $p\geq 1$ ,

\mathcal{A}_{n,p}=\mathrm{span}\{\hat{\mathbf{X}}^{\leq p}\mapsto\prod_{i=1}^{n}\langle\pi_{i},\hat{\mathbf{X}}^{\leq\infty}\rangle^{m_{i}}\,|\,\,m\in\mathbb{N}^{n},\,\{\pi_{i}\}_{i=1}^{n}\subset T((\mathbb{R}^{d})^{*}),\,\hat{\mathbf{X}}^{\leq p}\in\mathcal{E}^{p}\}.

(3.2)

Furthermore, let $\mathcal{A}_{p}=\bigcup_{n=1}^{\infty}\mathcal{A}_{n,p}$. Clearly, $\mathcal{A}_{p}\subset C(\mathcal{E}^{p};\mathbb{R})$. With the goal of applying the Stone-Weierstrass theorem to prove denseness of $\mathcal{A}_{p}$ in $C(\mathcal{E}^{p};\mathbb{R})$, we check that the following holds:

i)

The set $\mathcal{A}_{p}$ forms a sub-algebra. Indeed, closedness under addition is clear. Furthermore, for two elements $A=\prod_{i=1}^{n}\langle\pi_{i},\hat{\mathbf{X}}^{\leq\infty}\rangle^{m_{i}}\in\mathcal{A}_{n,p}$ and $B=\prod_{j=1}^{k}\langle\pi^{\prime}_{j},\hat{\mathbf{Y}}^{\leq\infty}\rangle^{m^{\prime}_{j}}\in\mathcal{A}_{k,p}$ we can choose $\{\tilde{\pi}_{i}\}$ such that

AB=\prod_{i=1}^{n}\langle\pi_{i},\hat{\mathbf{X}}^{\leq\infty}\rangle^{m_{i}}\prod_{j=1}^{k}\langle\pi^{\prime}_{j},\hat{\mathbf{Y}}^{\leq\infty}\rangle^{m^{\prime}_{j}}=\prod_{i=1}^{n+k}\langle\tilde{\pi}_{i},(\hat{\mathbf{X}},\hat{\mathbf{Y}})^{\leq\infty}\rangle^{\tilde{m}_{i}}\in\mathcal{A}_{k+n,p},

where $(\hat{\mathbf{X}},\hat{\mathbf{Y}})^{\leq\infty}$ denotes the signature of $(\hat{\mathbf{X}},\hat{\mathbf{Y}})\in\mathcal{E}^{p}$, $\tilde{\pi}_{i}=\pi_{i}$ for $i\in\{1,\ldots,n\}$ and $\tilde{\pi}_{i}=\pi^{\prime}_{i-n}$ for $i=n+1,\ldots,n+k$, and $\tilde{m}=(m,m^{\prime})$ is the concatenation of the multi-indices $m$ and $m^{\prime}$.

ii)

$\mathcal{A}_{p}$ separates points. Since $\mathcal{A}_{p}$ is built over the time-extended paths $(t,X_{t})$ and $(t,Y_{t})$, the following two integral functionals exist in $\mathcal{A}_{p}$:

I^{n}_{X}=\int_{0}^{T}s^{n}(X_{s}^{j}-X_{0}^{j})\mathop{}\!\mathrm{d}s=\langle w_{1},\hat{\mathbf{X}}^{\leq\infty}\rangle,

I^{n}_{Y}=\int_{0}^{T}s^{n}(Y_{s}^{j}-Y_{0}^{j})\mathop{}\!\mathrm{d}s=\langle w_{1},\hat{\mathbf{Y}}^{\leq\infty}\rangle,

where we recall that $\hat{X}_{t}=(t,X_{t})\in\mathbb{R}^{d+1}$, with the time component being the first component of the $d+1$ dimensional vector, and

w_{1}=\underbrace{{\bf 1\ldots 1}}_{n-\text{times}}\,\mathrm{\bf j}.

Suppose now that $I_{X}^{n}=I_{Y}^{n}$ for all $n$. Since polynomials are dense in $C([0,T];\mathbb{R})$, it follows by Lemma 3.2 that $X=Y$. Consequently, $\mathcal{A}_{p}$ separates points.

iii)

The constant function $1$ is in $\mathcal{A}_{p}$, since by choosing $\pi=\emptyset\in T((\mathbb{R}^{d})^{*})$ we get that $\langle\emptyset,\mathbf{X}^{\leq\infty}\rangle=1$ and therefore in particular $\langle\emptyset,\hat{\mathbf{X}}^{\leq p}\rangle=1$.

From these three properties it follows by the Stone-Weierstrass theorem that $\mathcal{A}_{p}$ is dense in $C(\mathcal{E}^{p};\mathbb{R})$, which concludes the proof. ∎

3.3. Signature associated to price paths

As discussed in the introduction, even the most complex financial derivatives typically have a simple functional structure. By this we mean that the path-dependent nature of the functional enters either through an average over the price path (as in Asian-style derivatives) or through products of price paths (quanto-style options), possibly combined with baskets of different assets having these structures. It is therefore natural to assume that, given a price path $X$ and the extended signature $\hat{\mathbf{X}}^{\leq\infty}$, the payoff functional $F$ can be written in terms of a function $f:\mathbb{R}^{n}\rightarrow\mathbb{R}$ and $n$ weighted words $\pi_{1},\ldots,\pi_{n}$ such that

F(\hat{\mathbf{X}}^{\leq p})=f(\langle\pi_{1},\hat{\mathbf{X}}^{\leq\infty}\rangle,\ldots,\langle\pi_{n},\hat{\mathbf{X}}^{\leq\infty}\rangle).

This will therefore be the main working hypothesis of the subsequent sections, and we will illustrate several numerical and analytic advantages of using this specific structure. We will also give examples to show exactly how this hypothesis applies for various exotic derivatives. As a first simple example, we have the following:

Example 3.4.

Asian options are contracts that pay the holder an amount of money according to the average price over a period of time. If $X$ is the price process, the holder receives $f(\int_{0}^{T}X_{s}ds)$ at time $T$. Considering the time-enhanced price path $\hat{X}$ defined by $\hat{X}_{t}=(t,X_{t})$, we see (recall Example 2.8 with $d=2$ and $\mathcal{A}=\{\mathbf{1},\mathbf{2}\}$) that, for a price path started at $X_{0}=0$, $\int_{0}^{T}X_{t}dt=\int_{0}^{T}\int_{0}^{t}dX_{r}dt=\langle w,\hat{\mathbf{X}}^{\leq\infty}_{0,T}\rangle$ for $w=\mathbf{2}\mathbf{1}$; for a general initial value the known constant $TX_{0}$ is absorbed into $f$. Another example is a spread option between two assets with price dynamics $X_{1}$ and $X_{2}$, respectively, paying the holder $\max(X_{1,T}-cX_{2,T},0)$ at time $T$, for a conversion constant $c$ (here, $c$ may convert the currency of the second asset into the currency of the first, say). With $\hat{X}=(t,X_{1},X_{2})$, we can (still following the notation in Example 2.8, now with $d=3$) express the payoff as

f(\langle w_{1},\hat{\mathbf{X}}_{0,T}^{\leq\infty}\rangle,\langle w_{2},\hat{\mathbf{X}}_{0,T}^{\leq\infty}\rangle)=\max(\langle w_{1},\hat{\mathbf{X}}_{0,T}^{\leq\infty}\rangle-c\langle w_{2},\hat{\mathbf{X}}_{0,T}^{\leq\infty}\rangle,0)

with $w_{1}=\mathbf{2}$ and $w_{2}=\mathbf{3}$, or, more simply,

f(\langle\pi,\hat{\mathbf{X}}_{0,T}^{\leq\infty}\rangle)=\max(\langle\pi,\hat{\mathbf{X}}_{0,T}^{\leq\infty}\rangle,0)

with the weighted word $\pi=w_{1}-cw_{2}$. Yet another example from energy finance is that of so-called quanto options (see e.g. [BLM15]), where the holder receives a payment at exercise time $T$ according to a product of two payoffs on the average of the spot energy price and temperature, say. Denoting by $X_{1}$ the energy spot price and by $X_{2}$ the temperature process, and setting $\hat{X}_{t}=(t,X_{1,t},X_{2,t})$, we have a payoff

f(\langle w_{1},\hat{\mathbf{X}}_{0,T}^{\leq\infty}\rangle,\langle w_{2},\hat{\mathbf{X}}_{0,T}^{\leq\infty}\rangle)=g(\langle w_{1},\hat{\mathbf{X}}_{0,T}^{\leq\infty}\rangle)h(\langle w_{2},\hat{\mathbf{X}}_{0,T}^{\leq\infty}\rangle)

where $w_{1}=\mathbf{2}\mathbf{1}$ and $w_{2}=\mathbf{3}\mathbf{1}$, and $g,h$ are the payoff functions written on the average of the spot energy price and temperature, respectively.
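The payoffs of this example can be evaluated numerically once the few signature terms involved are available. The following Python sketch is a minimal illustration under our own assumptions: the path is given on a time grid, integrals against time are approximated by left-point Riemann sums, and all function names as well as the test paths are hypothetical. It computes the terms for the words $\mathbf{2}$, $\mathbf{3}$, $\mathbf{21}$ and $\mathbf{31}$ of the time-extended path $\hat{X}=(t,X_{1},X_{2})$ and plugs them into the spread and quanto payoffs.

```python
def time_extended_words(times, x1, x2):
    """Approximate <2,.>, <3,.>, <21,.>, <31,.> for hat X = (t, X1, X2) on [times[0], times[-1]]."""
    avg1 = 0.0
    avg2 = 0.0
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        avg1 += (x1[i] - x1[0]) * dt     # int (X1_t - X1_0) dt, word 21
        avg2 += (x2[i] - x2[0]) * dt     # int (X2_t - X2_0) dt, word 31
    return {
        "2": x1[-1] - x1[0],             # increment of X1
        "3": x2[-1] - x2[0],             # increment of X2
        "21": avg1,
        "31": avg2,
    }


def spread_payoff(words, c):
    """max(<2,.> - c <3,.>, 0), i.e. f applied to the weighted word 2 - c*3."""
    return max(words["2"] - c * words["3"], 0.0)


def quanto_payoff(words, g, h):
    """g(<21,.>) * h(<31,.>) for user-supplied payoff functions g and h."""
    return g(words["21"]) * h(words["31"])


# Hypothetical deterministic test paths X1_t = 2t and X2_t = t on [0, 1].
n = 1000
times = [i / n for i in range(n + 1)]
x1 = [2.0 * t for t in times]
x2 = [1.0 * t for t in times]

words = time_extended_words(times, x1, x2)
spread = spread_payoff(words, c=0.5)
quanto = quanto_payoff(words, g=lambda a: max(a - 0.5, 0.0), h=lambda b: max(b - 0.25, 0.0))
print(words, spread, quanto)
```

Only four numbers are extracted from the path, regardless of how fine the time grid is; the payoff functions $f$, $g$ and $h$ act on these numbers alone.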

Let us make precise the hypothesis we work under in the remainder of this paper.

Hypothesis 3.5.

Let $p\geq 1$ and assume that for a given continuous functional $F$ acting on $\hat{V}^{p}([0,T];T^{\lfloor p\rfloor}(\mathbb{R}^{d}))$, there exists a collection of linear operators $\{\pi_{i}\}_{i=1}^{n}\subset T(\mathbb{R}^{d})$ and a function $f:\mathbb{R}^{n}\rightarrow\mathbb{R}$ such that for any $\hat{\mathbf{X}}^{\leq p}\in\hat{V}^{p}([0,T];T^{\lfloor p\rfloor}(\mathbb{R}^{d}))$

F(\hat{\mathbf{X}}^{\leq p})=f(\langle\pi_{1},\hat{\mathbf{X}}^{\leq\infty}\rangle,\ldots,\langle\pi_{n},\hat{\mathbf{X}}^{\leq\infty}\rangle).

(3.3)

Remark 3.6.

Define $g:\hat{V}^{p}\rightarrow\mathbb{R}^{n}$ to be the signature functional

g(\hat{\mathbf{X}}^{\leq p})=(\langle\pi_{1},\hat{\mathbf{X}}^{\leq\infty}\rangle,\ldots,\langle\pi_{n},\hat{\mathbf{X}}^{\leq\infty}\rangle),

and define $G:=g(\hat{V}^{p})\subset\mathbb{R}^{n}$. Under the assumption that $F$ is a continuous functional in Hypothesis 3.5, it follows that $f:G\rightarrow\mathbb{R}$ must also be continuous. Indeed, $\hat{\mathbf{X}}^{\leq p}\mapsto\hat{\mathbf{X}}^{\leq\infty}$ is a continuous mapping by Lyons' Extension Theorem 2.11, so $g$ is continuous, and the right-hand side of (3.3) is the composition of $f$ with $g$, whose image is $G$. Since $F$ is continuous by assumption and $g$ is continuous, it follows that $f$ must be continuous as well. The converse is also true: if we start with a continuous function $f$, then $F=f\circ g$ is a continuous functional, being the composition of two continuous maps.

The main advantage of invoking Hypothesis 3.5 is that analytic functions are dense in the space of continuous functions. Thus, if Hypothesis 3.5 holds for a continuous function $f$ , then we may approximate the functional $\hat{\mathbf{X}}^{\leq p}\mapsto F(\hat{\mathbf{X}}^{\leq p})$ by a (multivariate) polynomial in a finite number of signature coefficients. More precisely, if Hypothesis 3.5 holds, there exists an analytic function $\bar{f}\equiv f$ such that for $a\in\mathbb{R}^{n}$ , $\bar{f}:\mathbb{R}^{n}\rightarrow\mathbb{R}$ and

\bar{f}(x)=\sum_{m\in\mathbb{N}^{n}}\alpha_{m}(x-a)^{m}

where for a multi-index $m\in\mathbb{N}^{n}$ and $x\in\mathbb{R}^{n}$ , we write $x^{m}=x_{1}^{m_{1}}\dots x_{n}^{m_{n}}$ . The coefficients $\alpha_{m}$ are real numbers labeled by the multi-index $m\in\mathbb{N}^{n}$ . This leads to the following simplified version of the universal approximation theorem, being a consequence of the Stone-Weierstrass theorem and Hypothesis 3.5.

Theorem 3.7.

Let $F$ be a continuous functional on $\hat{V}^{p}([0,T];T^{\lfloor p\rfloor}(\mathbb{R}^{d}))$, and suppose Hypothesis 3.5 holds with a sequence $\{\pi_{i}\}_{i=1}^{n}$ and a continuous function $f$. Then for any $\epsilon>0$ there exist a finite set $\mathcal{N}\subset\mathbb{N}^{n}$, a hypercube $\mathcal{K}_{\epsilon}\subset\mathbb{R}^{n}$, and a polynomial $\bar{f}_{\mathcal{N}}:\mathbb{R}^{n}\rightarrow\mathbb{R}$ given by

\bar{f}_{\mathcal{N}}(x)=\sum_{m\in\mathcal{N}}\alpha_{m}x^{m},

with the property that

\left|F(\hat{\mathbf{X}}^{\leq p})-\bar{f}_{\mathcal{N}}\left(\langle\pi_{1},\hat{\mathbf{X}}^{\leq\infty}\rangle,\ldots,\langle\pi_{n},\hat{\mathbf{X}}^{\leq\infty}\rangle\right)\right|<\epsilon,

(3.4)

for all $\hat{\mathbf{X}}^{\leq p}\in\hat{V}^{p}([0,T];T^{\lfloor p\rfloor}(\mathbb{R}^{d}))$ such that $\left(\langle\pi_{1},\hat{\mathbf{X}}^{\leq\infty}\rangle,\ldots,\langle\pi_{n},\hat{\mathbf{X}}^{\leq\infty}\rangle\right)\in\mathcal{K}_{\epsilon}$. Furthermore, let $K>0$ be the constant from Theorem 2.11 such that for $\pi_{i}\in T((\mathbb{R}^{d})^{*})$ of the form $\pi_{i}=\sum_{j=1}^{N_{i}}\kappa_{ij}e_{w_{ij}}$ with $\{\kappa_{ij}\}_{j=1}^{N_{i}}\subset\mathbb{R}$,

|\langle\pi_{i},\hat{\mathbf{X}}^{\leq\infty}\rangle|\leq\sum_{j=1}^{N_{i}}|\kappa_{ij}|\frac{K^{|w_{ij}|}}{(|w_{ij}|/p)!},

where $|w_{ij}|=|e_{w_{ij}}|$. Suppose the coefficients $\{\alpha_{m}\}_{m\in\mathcal{N}}$ satisfy, for some $C>0$,

|\alpha_{m}|\leq\frac{C^{|m|}}{m!}\prod_{i=1}^{n}\left(\sum_{j=1}^{N_{i}}|\kappa_{ij}|\frac{(|w_{ij}|/p)!}{K^{|w_{ij}|}}\right)^{m_{i}},

(3.5)

where $m!=m_{1}!m_{2}!\ldots m_{n}!$ . Then we have that

\left|F(\hat{\mathbf{X}}^{\leq p})-\bar{f}_{\mathcal{N}}\left(\langle\pi_{1},\hat{\mathbf{X}}^{\leq\infty}\rangle,\ldots,\langle\pi_{n},\hat{\mathbf{X}}^{\leq\infty}\rangle\right)\right|\leq\frac{\exp(C)C^{|\mathcal{N}|+1}}{(n-1)!(|\mathcal{N}|-n-1)!}.

(3.6)

Proof.

This is a simple consequence of the Stone-Weierstrass approximation theorem for continuous functions, using that there always exists a compact subset $\mathcal{K}_{\epsilon}\subset\mathbb{R}^{n}$ such that

\left(\langle\pi_{1},\hat{\mathbf{X}}^{\leq\infty}\rangle,\ldots,\langle\pi_{n},\hat{\mathbf{X}}^{\leq\infty}\rangle\right)\in\mathcal{K}_{\epsilon}.

Restricting the domain of $f$ to $\mathcal{K}_{\epsilon}$ , we are done showing (3.4). For the convergence rate (3.6), define the remainder term

R:=\sum_{m\in\mathbb{N}^{n}\setminus\mathcal{N}}\alpha_{m}\left(\langle\pi_{1},\hat{\mathbf{X}}^{\leq\infty}\rangle,\ldots,\langle\pi_{n},\hat{\mathbf{X}}^{\leq\infty}\rangle\right)^{m}.

Invoking the bound on the signature decay and the assumption on $|\alpha_{m}|$, we see that

|R|\leq\sum_{m\in\mathbb{N}^{n}\setminus\mathcal{N}}|\alpha_{m}|\prod_{i=1}^{n}\left(\sum_{j=1}^{N_{i}}|\kappa_{ij}|\frac{K^{|w_{ij}|}}{(|w_{ij}|/p)!}\right)^{m_{i}}\leq\sum_{m\in\mathbb{N}^{n}\setminus\mathcal{N}}\frac{C^{|m|}}{m!}.

The right hand side of this inequality corresponds to the remainder term of a multivariate Taylor approximation of the function $\exp(C\prod_{i=1}^{n}x_{i})$ around $0$ up to order $m\in\mathcal{N}$ . Thus from the multivariate Taylor theorem, it follows that

|R|\leq\frac{\exp(C)C^{|\mathcal{N}|+1}}{|\mathcal{N}|!}\sum_{|m|=|\mathcal{N}|+1}1.

By an elementary combinatorial argument (using the so-called "stars and bars"-argument) we see that $\sum_{|m|=|\mathcal{N}|+1}1=\binom{|\mathcal{N}|}{n-1}$ . This concludes the proof.

∎
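The majorant $\sum_{m\in\mathbb{N}^{n}\setminus\mathcal{N}}C^{|m|}/m!$ appearing in the proof is easy to evaluate numerically, which gives a quick feeling for how fast the remainder decays. The sketch below is only an illustration under an assumption of ours that is not made in the theorem: the index set is taken to be the full simplex $\mathcal{N}=\{m\in\mathbb{N}^{n}:|m|\leq k\}$. In that case the multinomial theorem gives $\sum_{|m|=j}C^{j}/m!=(nC)^{j}/j!$, so the tail has a closed form that can be checked against a brute-force enumeration. The function names are hypothetical.

```python
import math
from itertools import product


def tail_closed_form(C, n, k):
    """Sum of C^|m| / m! over m in N^n with |m| > k, via the multinomial theorem."""
    head = sum((n * C) ** j / math.factorial(j) for j in range(k + 1))
    return math.exp(n * C) - head


def tail_brute_force(C, n, k, cutoff):
    """The same sum, enumerated directly over k < |m| <= cutoff (cutoff is a numerical truncation)."""
    total = 0.0
    for m in product(range(cutoff + 1), repeat=n):
        order = sum(m)
        if k < order <= cutoff:
            denom = 1
            for mi in m:
                denom *= math.factorial(mi)
            total += C ** order / denom
    return total


C, n = 0.5, 2
for k in (1, 3, 5, 7):
    print(k, tail_closed_form(C, n, k), tail_brute_force(C, n, k, cutoff=30))
```

For this choice of $\mathcal{N}$ the tail decays factorially in the truncation order $k$, which is the qualitative behaviour the coefficient condition (3.5) is designed to produce.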

Remark 3.8.

While Hypothesis 3.5 is certainly limiting the class of functionals $F$ that we can analyze, universal approximation becomes easier. In addition, the assumption that $F$ only acts on compact subsets of $p$ -variation paths with values in the space of geometric rough paths, i.e. $\hat{V}^{p}_{c}([0,T];G^{\lfloor p\rfloor}(\mathbb{R}^{d}))$ , is dropped, allowing for an easier verification of universality. For later probabilistic arguments related to financial prices as expected functionals, this point will simplify computations and discussions.

Furthermore, the compactness statement in Theorem 3.7 is significantly simpler than the classical assumption that the approximation holds over compact subsets $\mathcal{K}\subset\hat{V}^{p}([0,T];T^{\lfloor p\rfloor}(\mathbb{R}^{d}))$, as in Theorem 3.3. Indeed, describing compact subsets of $\hat{V}^{p}([0,T];T^{\lfloor p\rfloor}(\mathbb{R}^{d}))$ can be a challenging task, as illustrated in e.g. [Gul24]. In Theorem 3.7 one essentially only needs to choose a bound $M>0$, and one can then consider any $\hat{\mathbf{X}}^{\leq p}\in\hat{V}^{p}([0,T];T^{\lfloor p\rfloor}(\mathbb{R}^{d}))$ such that

|\langle\pi_{i},\hat{\mathbf{X}}^{\leq\infty}\rangle|\leq M,\quad\forall\,\,i=1,\ldots,n.

The compact subsets of an infinite dimensional space are therefore replaced by what may be interpreted as bounded subsets. This can also make probabilistic statements easier, as will be illustrated in subsequent sections.

Remark 3.9.

It is important to note that the statement of Theorem 3.7 allows for any functional acting on the space of $p$-variation paths with values in the truncated tensor algebra. This is a significant difference from the classical universal approximation theorem for signatures stated in Proposition 3.1, as the space of geometric rough paths limits the possible structure of the $F$ under consideration. In particular, a canonical asset pricing model would be constructed from semi-martingales and Itô processes. To preserve a martingale property of the derivative prices, one then uses the Itô integral for computing derivatives prices, an integration choice which in the sense of signatures is not geometric. In contrast, under Hypothesis 3.5 one can easily work with functionals that structurally contain Itô integration, and still obtain a direct and descriptive approximation of the functional in terms of the signature associated to the price path.

Remark 3.10.

Computationally, under Hypothesis 3.5 and given a specific $F$, one only needs to compute $n$ terms of the signature rather than the full signature, and with these $n$ terms one can achieve as high an accuracy as desired in the functional approximation. This is in stark contrast to the much more general universal approximation theorems in Proposition 3.1 and Theorem 3.3, where the accuracy of the approximation is dictated by the number of signature terms included. Invoking Hypothesis 3.5 therefore has the potential to reduce computational time significantly.

Remark 3.11.

The condition assumed on the coefficients $\{\alpha_{m}\}$ in (3.5) yields the bound in (3.6). Different assumptions on $\{\alpha_{m}\}$ will yield different convergence rates. While the condition in (3.5) is seemingly abstract, it can be verified to be weaker than the conditions satisfied by the coefficients in a Taylor expansion. On the other hand, the condition is not satisfied by a much more slowly convergent polynomial series, such as the Bernoulli polynomials. A clearer illustration of this condition will be given by the subsequent examples.

A restriction of the functional approximation in Theorem 3.7 to the case of functionals on geometric rough paths can readily be seen as a special case of the classical universal approximation theorem presented in Proposition 3.1. Indeed, we have the following corollary:

Corollary 3.12.

Let $F$ be a continuous functional on the space of $p$-variation (extended) geometric rough paths, $\hat{V}^{p}([0,T];G^{\lfloor p\rfloor}(\mathbb{R}^{d}))$, and suppose Hypothesis 3.5 holds with a sequence $\{\pi_{i}\}_{i=1}^{n}$ and a continuous function $f$. Then for any $\epsilon>0$ there exist a finite set $\mathcal{N}\subset\mathbb{N}^{n}$, coefficients $\{\alpha_{m}\}_{m\in\mathcal{N}}\subset\mathbb{R}$, linear functionals $\{\phi_{m}\}_{m\in\mathcal{N}}\subset T((\mathbb{R}^{d})^{*})$ and a compact $\mathcal{K}\subset\hat{V}^{p}([0,T];G^{\lfloor p\rfloor}(\mathbb{R}^{d}))$, such that

\left|F(\hat{\mathbf{X}}^{\leq p})-\sum_{m\in\mathcal{N}}\alpha_{m}\langle\phi_{m},\hat{\mathbf{X}}^{\leq\infty}\rangle\right|<\epsilon,\quad\forall\hat{\mathbf{X}}^{\leq p}\in\mathcal{K}.

Proof.

Since now $\hat{\mathbf{X}}^{\leq\infty}$ is a geometric rough path, it follows from Lemma 2.14 that for $m\in\mathbb{N}^{n}$ there exists a linear functional $\phi_{m}\in T((\mathbb{R}^{d})^{*})$ such that

\left(\langle\pi_{1},\hat{\mathbf{X}}^{\leq\infty}\rangle,\ldots,\langle\pi_{n},\hat{\mathbf{X}}^{\leq\infty}\rangle\right)^{m}=\langle\phi_{m},\hat{\mathbf{X}}^{\leq\infty}\rangle.

Thus, inserting this into the polynomial, the result follows. ∎

Remark 3.13.

Note that the linear functionals $\phi_{m}$ quickly become very large sums of words, even when the $\pi_{i}$'s consist of elementary words. As an example, for some single letter $a\in\{1,\ldots,d\}$, consider the product $\langle a,\hat{\mathbf{X}}^{\leq\infty}\rangle^{k}$ for some potentially large $k\in\mathbb{N}$. Taking the $k$-th shuffle power of $a$ yields the word $a\ldots a$ ($a$ repeated $k$ times) with the weight $k!$ in front, i.e.,

\langle a,\hat{\mathbf{X}}^{\leq\infty}\rangle^{k}=k!\langle a\ldots a,\hat{\mathbf{X}}^{\leq\infty}\rangle.

See, e.g., [BB02] and the references therein for a longer exposition of the shuffle product and algebras. It quickly becomes expensive to compute higher order signature terms. However, as long as Hypothesis 3.5 is in place, signature computations can be made much more efficient if all one really needs is to compute powers of the number $\langle a,\hat{\mathbf{X}}^{\leq\infty}\rangle$. When computing expected values, this is often the situation.

3.4. Examples of functions

In this subsection we consider a few examples of functions that can be approximated, and investigate their convergence properties. As already discussed, most examples of financial payoff functionals fall under the simpler case in which Hypothesis 3.5 holds. That is, for each specific payoff functional $F(\hat{\mathbf{X}}^{\leq p})$ there exists a finite number of linear operators $\pi_{i}\in T(\mathbb{R}^{d})$ for $i=1,\ldots,n$ such that

F(\hat{\mathbf{X}}^{\leq p})_{T}=f(\langle\pi_{1},\hat{\mathbf{X}}^{\leq\infty}_{T}\rangle,\ldots,\langle\pi_{n},\hat{\mathbf{X}}^{\leq\infty}_{T}\rangle).

From either the universal approximation theorem in Proposition 3.1 or Theorem 3.7 we know there exists an associated approximation in terms of the signature of the (rough path lifted) price path (either as a linear combination of signature terms, or as a polynomial of a finite number of signature terms). However, while there is no standard way of finding and describing the linear functional $\pi$ in Proposition 3.1, there is much theory available to compute potential sequences of $\{\alpha_{m}\}_{m\in\mathcal{N}}$ to obtain a good approximation in Theorem 3.7.

We provide now three elementary examples of such approximation choices.

Example 3.14 (Taylor polynomials).

Suppose the payoff functional $F$ can be identified through Hypothesis 3.5 with an infinitely differentiable function $f\in C^{\infty}(\mathbb{R})$ whose Taylor series at $0$ converges to $f$. Then an elementary Taylor expansion of $f(\langle\pi,\hat{\mathbf{X}}^{\leq\infty}\rangle)$ around $0$ yields

F(\hat{\mathbf{X}}^{\leq p})_{T}=f(\langle\pi,\hat{\mathbf{X}}^{\leq\infty}_{T}\rangle)=\sum_{n=0}^{\infty}\frac{f^{(n)}(0)}{n!}\langle\pi,\hat{\mathbf{X}}^{\leq\infty}_{T}\rangle^{n},

where $f^{(n)}$ denotes the $n$-th derivative of $f$. Moreover, under less restrictive regularity assumptions, we can truncate this sum at any level $k\in\mathbb{N}$ and explicitly determine the error by the formula

f(\langle\pi,\hat{\mathbf{X}}^{\leq\infty}_{T}\rangle)=\sum_{n=0}^{k}\frac{f^{(n)}(0)}{n!}\langle\pi,\hat{\mathbf{X}}^{\leq\infty}_{T}\rangle^{n}+R_{k}(\langle\pi,\hat{\mathbf{X}}^{\leq\infty}_{T}\rangle)

where, writing $x=\langle\pi,\hat{\mathbf{X}}^{\leq\infty}_{T}\rangle$, there exists some $b$ between $0$ and $x$ such that

R_{k}(x)=\frac{f^{(k+1)}(b)}{(k+1)!}x^{k+1}.

This yields an analytic expression for the remainder term in Theorem 3.7 when $f$ is sufficiently regular.
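As a small numerical illustration of the truncation and its remainder, the following Python sketch takes the hypothetical choice $f=\exp$ (our assumption, chosen because all derivatives are explicit) and a hypothetical value of the signature term $\langle\pi,\hat{\mathbf{X}}^{\leq\infty}_{T}\rangle$. For this $f$ the Lagrange remainder satisfies $|R_{k}(x)|\leq e^{|x|}|x|^{k+1}/(k+1)!$, which the code compares with the actual truncation error.

```python
import math


def taylor_exp(x, k):
    """Taylor polynomial of exp around 0, truncated at order k."""
    return sum(x ** j / math.factorial(j) for j in range(k + 1))


def lagrange_bound_exp(x, k):
    """Bound on |R_k(x)| using |f^(k+1)(b)| <= exp(|x|) for b between 0 and x."""
    return math.exp(abs(x)) * abs(x) ** (k + 1) / math.factorial(k + 1)


sig_value = 0.8   # hypothetical value of the signature term <pi, X>
for k in range(0, 10, 3):
    error = abs(math.exp(sig_value) - taylor_exp(sig_value, k))
    print(k, error, lagrange_bound_exp(sig_value, k))
```

Only powers of the single number `sig_value` enter the approximation, so refining the truncation level $k$ does not require any further signature terms.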

Example 3.15 (Hermite polynomials).

Let $\pi\in T((\mathbb{R}^{d})^{*})$ and $K>0$, and suppose we have an option that pays

F(\hat{\mathbf{X}}^{\leq p})=f(\langle\pi,\hat{\mathbf{X}}^{\leq\infty}\rangle)=\max(\langle\pi,\hat{\mathbf{X}}^{\leq\infty}\rangle-K,0).

The function $\max(x-K,0)$ can be approximated by Hermite polynomials. More precisely, for $n\in\mathbb{N}_{0}$ the $n$ ’th Hermite polynomial is given by

\xi_{n}(x)=(-1)^{n}\frac{1}{w(x)}\frac{d^{n}}{dx^{n}}w(x)\quad\mathrm{where}\quad w(x)=\frac{1}{\sqrt{2\pi}}e^{\frac{-x^{2}}{2}},

is the density of the standard normal distribution $\mathcal{N}(0,1)$ . Note that $\xi_{0}=1$ . Moreover, for the Hilbert space $L^{2}(\mathbb{R},w(x)dx)=:L^{2}_{w}$ with inner product

\langle g,h\rangle_{L^{2}_{w}}:=\int_{-\infty}^{\infty}g(x)h(x)w(x)dx

we obtain an orthonormal basis $\{e_{n}\}_{n=0}^{\infty}$ given by

e_{n}(x)=\frac{\xi_{n}(x)}{\sqrt{n!}}.

In particular, any function $g\in L^{2}_{w}$ can be written as

g(x)=\sum_{n=0}^{\infty}\alpha_{n}e_{n}(x),\quad\mathrm{where}\quad\alpha_{n}=\langle g,e_{n}\rangle_{L^{2}_{w}}.

As argued in [Ben21], the function $f(x)=\max(x-K,0)$ for some constant $K\in\mathbb{R}$ belongs to $L_{w}^{2}$. From this Hermite polynomial expansion, understood in $L^{2}_{w}$, we have an exact formula for $F(\hat{\mathbf{X}}^{\leq p})$ given by

F(\hat{\mathbf{X}}^{\leq p})=\sum_{n=0}^{\infty}\langle f,e_{n}\rangle_{L^{2}_{w}}e_{n}(\langle\pi,\hat{\mathbf{X}}^{\leq\infty}\rangle).

Moreover, we can truncate this sum at any desired level $N$ to reach a suitable approximation by

F_{N}(\hat{\mathbf{X}}^{\leq p}):=\sum_{n=0}^{N}\langle f,e_{n}\rangle_{L^{2}_{w}}e_{n}(\langle\pi,\hat{\mathbf{X}}^{\leq\infty}\rangle).
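The truncated Hermite expansion can be sketched in a few lines of Python. The code below is an illustration under assumptions of ours: the Hermite polynomials are generated by the standard three-term recurrence $\xi_{n+1}(x)=x\xi_{n}(x)-n\xi_{n-1}(x)$, and the coefficients $\langle f,e_{n}\rangle_{L^{2}_{w}}$ for $f(x)=\max(x-K,0)$ are evaluated in closed form by repeated integration by parts against the Gaussian density, which gives $w(K)-K(1-\Phi(K))$ for $n=0$, $1-\Phi(K)$ for $n=1$ and $w(K)\xi_{n-2}(K)$ for $n\geq 2$, each divided by $\sqrt{n!}$, with $\Phi$ the standard normal distribution function. These closed forms and all function names are not taken from the text above.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def hermite_values(x, N):
    """[xi_0(x), ..., xi_N(x)] via the recurrence xi_{n+1} = x xi_n - n xi_{n-1}."""
    vals = [1.0, x]
    for n in range(1, N):
        vals.append(x * vals[n] - n * vals[n - 1])
    return vals[: N + 1]


def call_coefficients(K, N):
    """alpha_n = <f, e_n>_{L^2_w} for f(x) = max(x - K, 0), n = 0, ..., N."""
    xi_at_K = hermite_values(K, max(N, 2))
    alphas = []
    for n in range(N + 1):
        if n == 0:
            raw = normal_pdf(K) - K * (1.0 - normal_cdf(K))
        elif n == 1:
            raw = 1.0 - normal_cdf(K)
        else:
            raw = normal_pdf(K) * xi_at_K[n - 2]
        alphas.append(raw / math.sqrt(math.factorial(n)))
    return alphas


def truncated_payoff(x, K, N):
    """F_N evaluated at a signature value x = <pi, X>."""
    alphas = call_coefficients(K, N)
    xi = hermite_values(x, N)
    return sum(a * v / math.sqrt(math.factorial(n)) for n, (a, v) in enumerate(zip(alphas, xi)))


alphas = call_coefficients(K=0.0, N=10)
print(alphas[:4])
print(truncated_payoff(0.7, K=0.0, N=10), max(0.7 - 0.0, 0.0))
```

Since the expansion converges in $L^{2}_{w}$ rather than uniformly, the pointwise quality of `truncated_payoff` is best for moderate values of the signature term and deteriorates in the tails, where the Gaussian weight is small.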

Example 3.16 (Bernstein approximation).

The standard choice of approximation of a continuous function by a polynomial is arguably the Bernstein polynomial. Any continuous function $f:[0,1]\rightarrow\mathbb{R}$ can be approximated arbitrarily well as follows

B_{n}(f)(x)=\sum_{k=0}^{n}f\left(\frac{k}{n}\right)b_{k,n}(x),\quad\mathrm{where}\quad b_{k,n}(x)={n\choose k}x^{k}(1-x)^{n-k},

and $\lim_{n\rightarrow\infty}B_{n}(f)=f$ uniformly on $[0,1]$. Again, if an option pays $F(\hat{\mathbf{X}}^{\leq p})$, and it satisfies Hypothesis 3.5 with a continuous function $f:[0,1]\rightarrow\mathbb{R}$ and a $\pi\in\mathcal{F}(\mathbb{R}^{d})$ such that $\langle\pi,\hat{\mathbf{X}}^{\leq\infty}_{T}\rangle\in[0,1]$ and

F(\hat{\mathbf{X}}^{\leq p})_{T}=f(\langle\pi,\hat{\mathbf{X}}^{\leq\infty}_{T}\rangle),

we have the Bernstein approximation

F(\hat{\mathbf{X}}^{\leq p})_{T}\approx\sum_{k=0}^{n}f\left(\frac{k}{n}\right)b_{k,n}(\langle\pi,\hat{\mathbf{X}}^{\leq\infty}_{T}\rangle).
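A minimal Python sketch of the Bernstein approximation is given below, with the sum running over $k=0,\ldots,n$ as in the standard definition of the Bernstein polynomial. The payoff function, the strike $0.5$ and the evaluation grid are hypothetical choices of ours, used only to illustrate the slow but uniform convergence around the kink of a call-type payoff.

```python
import math


def bernstein(f, n, x):
    """B_n(f)(x) = sum_{k=0}^n f(k/n) * C(n,k) * x^k * (1-x)^(n-k) for x in [0, 1]."""
    return sum(
        f(k / n) * math.comb(n, k) * x ** k * (1.0 - x) ** (n - k)
        for k in range(n + 1)
    )


def payoff(x):
    """Hypothetical call-type payoff on a signature term rescaled to [0, 1]."""
    return max(x - 0.5, 0.0)


def max_error(f, n, grid_size=100):
    """Largest error of B_n(f) on an equidistant grid of [0, 1]."""
    grid = [i / grid_size for i in range(grid_size + 1)]
    return max(abs(f(x) - bernstein(f, n, x)) for x in grid)


for n in (20, 200):
    print(n, max_error(payoff, n))
```

The error is largest at the kink of the payoff and decays roughly like $n^{-1/2}$ there, which is consistent with the Bernstein polynomial being a robust but slowly converging choice.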

4. Stochastic market prices - lifting to rough paths

While the methodologies for numerical approximation of complex derivative payoffs we propose here will be in the spirit of model free finance, we will also connect the results to classical pricing when the underlying stock is assumed to be a semi-martingale. More specifically, we consider the case of an Itô process of the form

\mathop{}\!\mathrm{d}Y_{t}=\mu_{t}\mathop{}\!\mathrm{d}t+\sigma_{t}\mathop{}\!\mathrm{d}B_{t},\quad Y_{0}=y\in\mathbb{R}^{d}.

(4.1)

Here $\mu$ and $\sigma$ are square integrable processes adapted to the filtration $\{\mathcal{F}_{t}\}_{t\in[0,T]}$ generated by the Brownian motion $\{B_{t}\}_{t\in[0,T]}$. The process $Y$ may be used to model the log-price dynamics of an asset, the absolute price, the volatility or any other relevant stochastic asset dynamic.

In light of the theory for rough paths and signatures presented briefly in the beginning of Section 2, it is only natural to ask whether the signature can be constructed above the stochastic process $\{Y_{t}\}_{t\in[0,T]}$. More precisely, one wants to make sure that for almost all $\omega\in\Omega$, the following map exists

Y(\omega)\mapsto\mathbf{Y}^{\leq\infty}(\omega)=\left(1,Y(\omega),\left(\int Y\otimes\mathop{}\!\mathrm{d}Y\right)(\omega),\ldots\right)\in T((\mathbb{R}^{d})).

Since $B$ is a Brownian motion, $Y$ will in general be of finite $p$-variation only for $p>2$. There is therefore no canonical construction of the iterated integral $\int Y(\omega)\mathop{}\!\mathrm{d}Y(\omega)$. However, since $Y$ is a semi-martingale, we can use this probabilistic structure to construct the iterated integrals $\int Y\mathop{}\!\mathrm{d}Y$ as random variables in $L^{2}(\Omega)$. As is well-known, there exist different ways of constructing this integral as a random variable, with Itô or Stratonovich integration as the most commonly used. Which integral to use depends on the application at hand. Typically, in financial models, Itô integration is selected as this preserves adaptedness and a martingale structure, necessary for arbitrage-free pricing. Given the choice of integration, using the Burkholder-Davis-Gundy inequality, one can then apply Kolmogorov's continuity theorem to identify a subset $\Omega^{*}\subset\Omega$ of full measure such that for each $\omega\in\Omega^{*}$ there exists a realization of the iterated integral $\left(\int Y\otimes\mathop{}\!\mathrm{d}Y\right)(\omega)$, see e.g. [FH14, Section 3]. Moreover, one can verify that this object satisfies Chen's relation

Y_{s,u}(\omega)\otimes Y_{u,t}(\omega)=\left(\int_{s}^{t}Y_{s,r}\otimes\mathop{}\!\mathrm{d}Y_{r}\right)(\omega)-\left(\int_{s}^{u}Y_{s,r}\otimes\mathop{}\!\mathrm{d}Y_{r}\right)(\omega)-\left(\int_{u}^{t}Y_{u,r}\otimes\mathop{}\!\mathrm{d}Y_{r}\right)(\omega).

We can therefore conclude that $\left(Y(\omega),\left(\int Y\otimes\mathop{}\!\mathrm{d}Y\right)(\omega)\right)$ is a $2$ -rough path according to Definition 2.10. By applying Theorem 2.11 we know that also the signature $\mathbf{Y}^{\leq\infty}(\omega)$ exists.
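Chen's relation can be checked directly on a discretised path, where the Itô iterated integral is replaced by left-point Riemann sums. The Python sketch below is an illustration only; the simulated two-dimensional path, the grid indices and the function names are hypothetical choices of ours. For left-point sums the relation holds exactly (up to floating point error), which mirrors the algebraic nature of the identity.

```python
import random


def ito_second_level(path, s, t):
    """Left-point approximation of int_s^t Y_{s,r} (x) dY_r between grid indices s < t."""
    d = len(path[0])
    level2 = [[0.0] * d for _ in range(d)]
    for r in range(s, t):
        for i in range(d):
            for j in range(d):
                level2[i][j] += (path[r][i] - path[s][i]) * (path[r + 1][j] - path[r][j])
    return level2


def chen_defect(path, s, u, t):
    """Largest entry of Y_{s,u} (x) Y_{u,t} - (L_{s,t} - L_{s,u} - L_{u,t})."""
    d = len(path[0])
    full = ito_second_level(path, s, t)
    left = ito_second_level(path, s, u)
    right = ito_second_level(path, u, t)
    defect = 0.0
    for i in range(d):
        for j in range(d):
            lhs = (path[u][i] - path[s][i]) * (path[t][j] - path[u][j])
            rhs = full[i][j] - left[i][j] - right[i][j]
            defect = max(defect, abs(lhs - rhs))
    return defect


rng = random.Random(1)
path = [[0.0, 0.0]]
for _ in range(200):
    path.append([path[-1][0] + rng.gauss(0.0, 0.1), path[-1][1] + rng.gauss(0.0, 0.1)])

print(chen_defect(path, s=0, u=80, t=200))
```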

For practical financial purposes, we are interested in computing the expected value of multivariate monomials of signature functionals, which will turn out to be a crucial component of the pricing approximation. More precisely, the price $p_{t}$ at time $t\geq 0$ of a contingent claim on a financial asset with payoff at some future time $T\geq t$ can be written as

p_{t}=\mathbb{E}[F(\hat{Y})\,|\,\mathcal{F}_{t}],

where $F$ is a pay-off functional, possibly dependent on the whole price path $Y$. As previously described, a pay-off functional $F$ is regarded as a functional on the rough path lift $\hat{\mathbf{Y}}^{\leq p}$, which makes explicit the dependence on the chosen rough path lift, i.e. the choice of stochastic integration. In the next section we will show how this price, given as the conditional expectation, can be approximated by a sum of different correlators of signature terms of the form

\mathbb{E}[\prod_{i=1}^{n}\langle\pi_{i},\hat{\mathbf{Y}}^{\leq\infty}\rangle^{m_{i}}],\quad\mathrm{for}\quad\{\pi_{i}\}_{i=1}^{n}\subset T((\mathbb{R}^{d})^{*}),\quad m\in\mathbb{N}^{n}.

This is in contrast to the functional approximation considered in [LNPA20] where one computes the complete expected signature $\mathbb{E}[\hat{\mathbf{Y}}^{\leq\infty}]$, or dynamically $\mathbb{E}[\hat{\mathbf{Y}}^{\leq\infty}\,|\,\mathcal{F}_{t}]$, and then considers $\langle\pi,\mathbb{E}[\hat{\mathbf{Y}}^{\leq\infty}]\rangle$. In the setting of an Itô process $Y$, the latter methodology requires one to solve a (very) high dimensional Kolmogorov equation as described in [LN15]. However, since $Y$ is an Itô process, one can show that $t\mapsto\langle\pi,\hat{\mathbf{Y}}^{\leq\infty}_{t}\rangle$ is a real valued Itô process, as proven in the proposition below. Thus computing $\mathbb{E}[f(\langle\pi,\hat{\mathbf{Y}}^{\leq\infty}\rangle)]$ can be done by standard use of a (low) dimensional Kolmogorov equation, and one then carries out this computation for $f(x)=x^{n}$ and several different $n$. Indeed, we provide a simple proof of this claim:

Proposition 4.1.

Suppose $Y$ is an Itô process of the form (4.1) where $\mu\in L^{2}(\Omega\times[0,T])$ and $\sigma\in L^{\infty}(\Omega\times[0,T])$ are adapted to the filtration generated by the Brownian motion. Consider the Itô lift $Y\mapsto\mathbf{Y}^{\leq\infty}$ . Then for any $\pi\in T((\mathbb{R}^{d})^{*})$ the process $t\mapsto\langle\pi,\mathbf{Y}^{\leq\infty}\rangle$ is an Itô process.

Proof.

Let $\pi=w$ be a single word of length $n$ , given of the form $w=\mathbf{i}_{1}\dots\mathbf{i}_{n}$ for $\mathbf{i}_{j}\in\{1,\ldots,d\}$ for all $j=1,\ldots,n$ . If $n=1$ , it follows that $t\mapsto\langle\mathbf{i}_{1},\mathbf{Y}^{\leq\infty}\rangle$ is an Itô process by by definition. When $n\geq 2$ , assume that for all words $w$ of length $n-1$ , $\langle w,\mathbf{Y}^{\leq\infty}\rangle$ is a square integrable and adapted. We then use the recursive definition of the signature in 2.4 to see that

\langle\mathbf{i}_{1}\dots\mathbf{i}_{n},\mathbf{Y}^{\leq\infty}_{s,t}\rangle=% \int_{s}^{t}\langle\mathbf{i}_{1}\dots\mathbf{i}_{n-1},\mathbf{Y}^{\leq\infty}% _{s,r}\rangle\mu_{r}^{i_{n}}\mathop{}\!\mathrm{d}r\\ +\sum_{j=1}^{d}\int_{s}^{t}\langle\mathbf{i}_{1}\dots\mathbf{i}_{n-1},\mathbf{% Y}^{\leq\infty}_{s,r}\rangle\sigma_{r}^{i_{n},j}\mathop{}\!\mathrm{d}B^{j}_{r}.

To see that this process is an Itô process we must verify that it is square integrable. Since $\sigma\in L^{\infty}(\Omega\times[0,T])$, Hölder's inequality gives for all $[s,t]\subset[0,T]$

	$\displaystyle\|\langle\mathbf{i}_{1}\dots\mathbf{i}_{n-1},\mathbf{Y}^{\leq\infty}_{s,\cdot}\rangle\sigma_{\cdot}^{i_{n},j}\|_{L^{2}(\Omega\times[s,t])}$
	$\displaystyle\qquad\qquad\leq\|\sigma^{i_{n},j}\|_{L^{\infty}(\Omega\times[s,t])}\|\langle\mathbf{i}_{1}\dots\mathbf{i}_{n-1},\mathbf{Y}^{\leq\infty}_{s,\cdot}\rangle\|_{L^{2}(\Omega\times[s,t])}.$

By the inductive hypothesis, $\|\langle\mathbf{i}_{1}\dots\mathbf{i}_{n-1},\mathbf{Y}^{\leq\infty}_{s,\cdot}\rangle\|_{L^{2}(\Omega\times[0,T])}<\infty$, and so the product of the two is square integrable. Adaptedness follows immediately from the inductive hypothesis. ∎

For certain choices of $\pi\in T((\mathbb{R}^{d})^{*})$ and assumptions on the underlying stochastic process $Y$, we can compute explicit expressions for the signature moments $\mathbb{E}[\langle\pi,\mathbf{Y}^{\leq\infty}\rangle^{n}]$ for $n\in\mathbb{N}$. We illustrate this through some common choices in the following examples.

Example 4.2.

Let $\{(B_{t}^{1},B^{2}_{t})\}_{t\in[0,T]}$ be a two-dimensional Brownian motion with independent components, and consider the path $\hat{Y}_{t}=(t,B_{t}^{1},B_{t}^{2})$ and the word $\pi=\mathbf{21}-\mathbf{31}$. Recall that this choice of $\pi$ relates to the spread options case considered in Example 3.4. We are interested in computing the $n$-th moment of $\langle\pi,\hat{\mathbf{Y}}^{\leq\infty}\rangle$. We first see that $\langle\pi,\hat{\mathbf{Y}}^{\leq\infty}_{0,t}\rangle=\int_{0}^{t}(B_{r}^{1}-B^{2}_{r})\mathop{}\!\mathrm{d}r$. Since $(B^{1}_{r},B^{2}_{r})$ is a normally distributed random vector with independent components, the difference $\bar{B}_{r}:=B^{1}_{r}-B^{2}_{r}$ satisfies $\bar{B}_{r}\sim\mathcal{N}(0,2r)$. A simple argument based on the Itô formula for $t\bar{B}_{t}$ shows that

\int_{0}^{t}\bar{B}_{r}\mathop{}\!\mathrm{d}r=\int_{0}^{t}(t-r)\mathop{}\!\mathrm{d}\bar{B}_{r}=\int_{0}^{t}(t-r)\sqrt{2}\mathop{}\!\mathrm{d}B_{r},

where $B_{r}$ is a standard Brownian motion. Thus, using the Itô isometry, we see that

\mathbb{E}[\langle\pi,\hat{\mathbf{Y}}^{\leq\infty}_{0,t}\rangle^{2}]=\mathbb{E}\left[\left(\int_{0}^{t}\bar{B}_{r}\mathop{}\!\mathrm{d}r\right)^{2}\right]=2\int_{0}^{t}(t-r)^{2}\mathop{}\!\mathrm{d}r=\frac{2}{3}t^{3}.

By Gaussianity, it furthermore follows that

\mathbb{E}[\langle\pi,\hat{\mathbf{Y}}^{\leq\infty}_{0,t}\rangle^{n}]=\left(\frac{2}{3}t^{3}\right)^{\frac{n}{2}}(n-1)!!\quad\mathrm{for\,\,even\,\,}n,

(4.2)

and $\mathbb{E}[\langle\pi,\hat{\mathbf{Y}}^{\leq\infty}_{0,t}\rangle^{n}]=0$ for odd $n$. Here, $m!!=m(m-2)(m-4)\cdots 3\cdot 1$ for odd $m\in\mathbb{N}$.
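
The closed form (4.2) can be checked numerically. The following Python sketch is only an illustration under our own assumptions: the function names are hypothetical, the time integral is discretized with a left-endpoint Riemann sum, and the Monte Carlo sample size is chosen for speed rather than accuracy.

```python
import math
import random


def double_factorial(m: int) -> int:
    """Return m!! = m (m-2) (m-4) ..., with the convention (-1)!! = 0!! = 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def closed_form_moment(n: int, t: float) -> float:
    """n-th moment of int_0^t (B^1_r - B^2_r) dr, a centred Gaussian with variance 2 t^3 / 3."""
    if n % 2 == 1:
        return 0.0
    variance = 2.0 * t**3 / 3.0
    return variance ** (n // 2) * double_factorial(n - 1)


def monte_carlo_moment(n, t, n_paths=4000, n_steps=100, seed=0):
    """Crude Monte Carlo estimate of the same moment (illustrative discretization)."""
    rng = random.Random(seed)
    h = t / n_steps
    # Increments of B^1 - B^2 over a step of size h are N(0, 2h).
    step_std = math.sqrt(2.0 * h)
    total = 0.0
    for _ in range(n_paths):
        b_bar = 0.0
        integral = 0.0
        for _ in range(n_steps):
            integral += b_bar * h          # left-endpoint Riemann sum
            b_bar += rng.gauss(0.0, step_std)
        total += integral**n
    return total / n_paths
```

Comparing `closed_form_moment(2, 1.0)` with `monte_carlo_moment(2, 1.0)` gives agreement up to discretization and sampling error; the estimate is not meant as a production pricing routine.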

The above example shows that the moments of words applied to signatures are very easy to calculate when the path is the time-extended Brownian motion. [CSF21, Section 4.5] derive an explicit formula for the expected signature of the time-extended path of a $d$ -dimensional Brownian motion. According to formula (4.18) in [CSF21], one has

\langle w,\mathbb{E}[\mathbf{Y}^{\leq N}_{0,t}]\rangle=\frac{(t/2)^{n/2}}{(n/2)!}\prod_{k=0}^{n/2-1}\mathbf{1}_{i_{n-2k}=i_{n-2k-1}}

(4.3)

for words $w=e_{i_{1}}\otimes\cdots\otimes e_{i_{n}}$ where $(i_{1},\ldots,i_{n})\in\{1,\ldots,d\}^{n}$, $n\in\mathbb{N}$ is even with $n\leq N$, and $e_{i}$ is the $i$-th unit vector in $\mathbb{R}^{d}$. The tensor products are here the non-symmetric ones, and the iterated integrals in the signature are interpreted in the Stratonovich sense. When $n$ is odd, the expected signature is zero. We recall that in a financial setting, however, we are mostly interested in martingale structures, typically guaranteed by the Itô lift, and must therefore also compute certain Itô-Stratonovich corrections that we do not consider further here. An extension of (4.3) to correlated Brownian motions is found in [CGSF23, Thm. A.1]. Equation (4.3) is of similar complexity to (4.2) in the example above. However, in the expected signature approach we need to find the whole representation of the expected signature (up to depth $N$) before applying it to words, whereas in our approach we first identify the terms of the signature required by the given words, and then compute the moments in question. The words in the former approach might be very long, since the moments have been converted into linear representations, and thus $N$ becomes large.
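
To make the structure of (4.3) concrete, here is a minimal Python sketch that evaluates its right-hand side for a single word. It assumes, as the product of indicators suggests, that words are tuples of letters indexing Brownian components and that consecutive letters are paired from the start of the word; the function name is ours and not taken from [CSF21].

```python
import math


def expected_stratonovich_signature(word, t):
    """Evaluate the right-hand side of (4.3) for a word given as a tuple of letters."""
    n = len(word)
    if n % 2 == 1:
        # Odd levels of the expected signature vanish.
        return 0.0
    # The product of indicators requires the letters to agree in consecutive pairs.
    for k in range(0, n, 2):
        if word[k] != word[k + 1]:
            return 0.0
    half = n // 2
    return (t / 2.0) ** half / math.factorial(half)
```

For instance, the word `(1, 1)` gives $t/2$, which matches $\mathbb{E}[\int_{0}^{t}B_{r}\circ\mathrm{d}B_{r}]=\mathbb{E}[B_{t}^{2}]/2$.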

Towards signature approximations of complex derivative prices, we will need to investigate correlators of functionals of the signature. More precisely, given a multi-index $m\in\mathbb{N}^{n}$ and $n$ weighted words $\{\pi_{i}\}_{i=1}^{n}\subset T((\mathbb{R}^{d})^{*})$, we are interested in computing the correlator

\rho_{m}=\mathbb{E}\left[\left(\langle\pi_{1},\mathbf{Y}^{\leq\infty}\rangle,\ldots,\langle\pi_{n},\mathbf{Y}^{\leq\infty}\rangle\right)^{m}\right],

(4.4)

where we recall that for a vector $x\in\mathbb{R}^{n}$ we define $x^{m}=x_{1}^{m_{1}}\cdots x_{n}^{m_{n}}$. Correlators appear in statistical turbulence theory as interesting objects to study, see [BNBV18]. We also mention [BL21] for correlators applied to financial derivatives pricing along with polynomial processes. Computing these correlators for an arbitrary sequence $\{\pi_{i}\}_{i=1}^{n}$ can be challenging, and if $n$ and $d$ get large, it might even become infeasible. However, for certain linear functionals $\pi_{i}$, typically given as short words or even single letters, the computation may be analytically tractable by invoking stochastic structures, even for large $n$.

In contrast, the signature methodologies presented in [LNPA19, LNPA20] would require one to compute the expected signature term $\langle\phi,\mathbb{E}[\mathbf{Y}^{\leq\infty}]\rangle$ for some $\phi\in T((\mathbb{R}^{d})^{*})$ which potentially becomes a very long sum of very large words. Indeed, given that $\mathbf{Y}^{\leq\infty}$ is a geometric signature, as already discussed in Remark 3.13, using the shuffle product from Lemma 2.14, it is possible to find a $\phi\in T((\mathbb{R}^{d})^{*})$ such that

\left(\langle\pi_{1},\mathbf{Y}^{\leq\infty}\rangle,\ldots,\langle\pi_{n},\mathbf{Y}^{\leq\infty}\rangle\right)^{m}=\langle\phi,\mathbf{Y}^{\leq\infty}\rangle.

Computationally, it will be more expensive to compute the right hand side than the left hand side, since one would need to compute the complete signature up to a very high degree (i.e. the length of the longest single word in $\phi$ ). Furthermore, we have that the correlator $\rho_{m}=\mathbb{E}[\langle\phi,\mathbf{Y}^{\leq\infty}\rangle]$ , and thus computing these expected signature terms will be challenging. Even in the setting where the underlying path solves an Itô SDE, one is required to solve a Fokker-Planck equation with values in the tensor algebra $T((\mathbb{R}^{d}))$ , which is numerically very challenging (see e.g. [LN15]).
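
The growth of $\phi$ can be made tangible with a small Python sketch of the shuffle product on words. This is an illustration under our own conventions (words as tuples of letters, linear combinations as `Counter` objects, hypothetical function names), not code from the cited references.

```python
from collections import Counter


def shuffle(u, v):
    """Shuffle product of two words (tuples of letters) as a Counter {word: multiplicity}."""
    if not u:
        return Counter({tuple(v): 1})
    if not v:
        return Counter({tuple(u): 1})
    out = Counter()
    # Recursion on the last letter: u sh v = (u' sh v) a + (u sh v') b,
    # where u = u'a and v = v'b.
    for w, c in shuffle(u[:-1], v).items():
        out[w + (u[-1],)] += c
    for w, c in shuffle(u, v[:-1]).items():
        out[w + (v[-1],)] += c
    return out


def shuffle_power(word, m):
    """m-fold shuffle power of a single word, returned as a Counter."""
    result = Counter({(): 1})
    for _ in range(m):
        nxt = Counter()
        for w, c in result.items():
            for w2, c2 in shuffle(w, tuple(word)).items():
                nxt[w2] += c * c2
        result = nxt
    return result
```

For example, `shuffle_power((2, 1), 3)` consists only of words of length 6 with total multiplicity 90, so linearizing the third power of $\langle\mathbf{21},\mathbf{Y}^{\leq\infty}\rangle$ already requires the signature up to level 6, whereas the left-hand side only needs the single level-2 term.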

5. Approximation of exotic derivatives

We are now ready to present the core results of this article, namely, an approximation formula for exotic, path dependent, financial derivatives. To this end, we begin with an assumption on the probability spaces we work with for the admissible price paths.

Hypothesis 5.1 (Market prices and probability measures).

We consider a complete stochastic basis $(\Omega,\mathcal{F},\{\mathcal{F}_{t}\}_{t\in[0,T]},\mathbb{P})$ supporting market prices as measurable maps from $\Omega$ to $\hat{V}^{p}_{c}([0,T];T^{\lfloor p\rfloor}(\mathbb{R}^{d}))$, with the property that for any $\tilde{\epsilon}>0$ there exists a compact subset $\mathcal{K}_{\tilde{\epsilon}}\subset\hat{V}^{p}_{c}([0,T];T^{\lfloor p\rfloor}(\mathbb{R}^{d}))$ such that

\mathbb{P}(\hat{\mathbf{X}}^{\leq p}\in\mathcal{K}_{\tilde{\epsilon}})\geq 1-\tilde{\epsilon}.

Remark 5.2.

Let $E$ be a separable and complete metric space. Then every Borel probability measure $\mu$ on $E$ is tight; that is, for every $\epsilon>0$ , there exists a compact set $\mathcal{K}_{\epsilon}\subseteq E$ such that $\mu(\mathcal{K}_{\epsilon})>1-\epsilon$ , see [Lin86]. Consequently, if $E:=\hat{V}^{p}([0,T];T^{\lfloor p\rfloor}(\mathbb{R}^{d}))$ were separable, the hypothesis above would be unnecessary.

However, as shown in [FV10, Sec. 8], the space $V_{c}^{p}([0,T],G^{N}(\mathbb{R}^{d}))$ is in general not separable. In contrast, the closure in $p$-variation norm of smooth paths with values in the $N$-step free nilpotent Lie group $G^{N}(\mathbb{R}^{d})$, denoted by $\overline{C^{\infty}([0,T],G^{N}(\mathbb{R}^{d}))}^{p}$, is separable. Note also that we have the inclusion of spaces

\bigcup_{1\leq q<p}V_{c}^{q}([0,T],G^{N}(\mathbb{R}^{d}))\subseteq\overline{C^{\infty}([0,T],G^{N}(\mathbb{R}^{d}))}^{p}\subseteq V_{c}^{p}([0,T],G^{N}(\mathbb{R}^{d})).

Since $E$ is not separable, we must explicitly assume that all distributions $\mathbb{P}_{\hat{\mathbf{X}}^{\leq p}}$ on $E$ are tight, or equivalently, that they are Radon measures (finite tight Borel measures). In the non-separable setting of $E$, Radon measures, which are characterized by their separable image, are the "right" type of measures to consider. For further details, see [Lin86] or [Bil68].

The above market prices and probability measures provide a broad framework for pricing. In the following we will not deal with the problem of risk-neutral prices, and rather refer the reader to [Arr18, LNPA20] for a discussion on this point. For the purpose here, the reader may assume that the probability measure $\mathbb{P}$ chosen is a risk-neutral measure in the context of the given pricing problem.

Theorem 5.3.

Let $(\Omega,\mathcal{F},\{\mathcal{F}_{t}\}_{t\in[0,T]},\mathbb{P})$ be a filtered probability space satisfying Hypothesis 5.1, and suppose the price of a financial derivative, denoted by $p$ , can be represented by an adapted payoff functional $F$ acting on the set of random price paths $L^{k}(\Omega;\hat{V}^{p}([0,T];T^{\lfloor p\rfloor}(\mathbb{R}^{d})))$ for some sufficiently large $k$ (see Remark 5.4), and is given by

p=\mathbb{E}[F(\hat{\mathbf{X}}^{\leq p})].

(5.1)

Furthermore, suppose for all $t\in[0,T]$ , $\mathbb{E}[|F(\hat{\mathbf{X}}^{\leq p})|^{q}]=M<\infty$ for some $q\geq 1$ . Then for any $\epsilon>0$ there exists a finite set $\mathcal{N}\subset\mathbb{N}^{d}$ and a sequence of numbers $\{\alpha_{m}\}_{m\in\mathcal{N}}$ such that

\left|p-\sum_{m\in\mathcal{N}}\alpha_{m}\rho_{m}\right|<\epsilon,\quad\mathbb{P}\text{-a.s.},

where $\rho_{m}$ denotes the signature correlator from (4.4) for the stochastic process $\hat{\mathbf{X}}^{\leq\infty}$.

Remark 5.4.

Here we consider payoff functionals as functionals acting on random variables with finite $k$ -moments. The requirement on $k$ will depend on the approximation accuracy that is desired, since finiteness of the correlators $\rho_{m}$ is only guaranteed from the moments of the price paths.

Proof.

We begin by observing that for a sufficiently large compact subset $\mathcal{K}_{\tilde{\epsilon}}\subset\hat{V}^{p}([0,T];T^{\lfloor p\rfloor}(\mathbb{R}^{d}))$ we have

	$\displaystyle|\mathbb{E}[F(\hat{\mathbf{X}}^{\leq p})]|$	$\displaystyle\leq|\mathbb{E}[F(\hat{\mathbf{X}}^{\leq p})\mathbf{1}_{\mathcal{K}_{\tilde{\epsilon}}}]|+\mathbb{E}[|F(\hat{\mathbf{X}}^{\leq p})\mathbf{1}_{\mathcal{K}_{\tilde{\epsilon}}^{c}}|]$
		$\displaystyle\leq M^{\frac{1}{q}}\tilde{\epsilon}^{\frac{q-1}{q}}+|\mathbb{E}[F(\hat{\mathbf{X}}^{\leq p})\mathbf{1}_{\mathcal{K}_{\tilde{\epsilon}}}]|.$

In the last inequality we have applied Hölder's inequality to the product $\mathbf{1}_{\mathcal{K}_{\tilde{\epsilon}}^{c}}F(\hat{\mathbf{X}}^{\leq p})$, invoking the bound on the $q$-th moment of $F(\hat{\mathbf{X}}^{\leq p})$ as well as Hypothesis 5.1 on the probability measure to get that $(\mathbb{E}[|\mathbf{1}_{\mathcal{K}_{\tilde{\epsilon}}^{c}}|^{\frac{q}{q-1}}])^{\frac{q-1}{q}}=\mathbb{P}(\hat{\mathbf{X}}^{\leq p}\notin\mathcal{K}_{\tilde{\epsilon}})^{\frac{q-1}{q}}\leq\tilde{\epsilon}^{\frac{q-1}{q}}$. Now apply Theorem 3.3 to the payoff functional $F(\hat{\mathbf{X}}^{\leq p})$; taking the expectation of the corresponding signature polynomial from Theorem 3.3 then yields the correlators from (4.4) applied to the signature $\hat{\mathbf{X}}^{\leq\infty}$. Since $\tilde{\epsilon}$ can be chosen arbitrarily small by assumption (just choose $\mathcal{K}_{\tilde{\epsilon}}$ larger), and likewise for the error in the universal approximation, the proof is complete. ∎

The next corollary enables further simplifications for pricing approximation by invoking Hypothesis 3.5 on the payoff functional.

Corollary 5.5.

Suppose the price of a financial derivative, denoted by $p$, can be represented by a payoff functional $F$ acting on the set of admissible price paths in $\hat{V}^{p}([0,T];T^{\lfloor p\rfloor}(\mathbb{R}^{d}))$, and is given by (5.1). Furthermore, suppose that Hypothesis 3.5 holds for $F$ with some function $f:\mathbb{R}^{n}\rightarrow\mathbb{R}$. Then for any $\epsilon>0$ there exists a finite set $\mathcal{N}\subset\mathbb{N}^{d}$ and a sequence of numbers $\{\alpha_{m}\}_{m\in\mathcal{N}}$ such that

\left|p-\sum_{m\in\mathcal{N}}\alpha_{m}\rho_{m}\right|<\epsilon.

Proof.

This follows from the universal approximation in Theorem 3.7, together with a similar probabilistic analysis as in the proof of Theorem 5.3. ∎

We illustrate the above corollary by considering a specific choice of functional that is common in financial practice, namely the max function acting on a signature term, which gives the payoff functional of certain Asian options.

Example 5.6 (Simple Asian option).

We will in this example consider a very basic Asian option, and show how moments of the signature can be used to approximate this price. Of course, there are well known formulas and approximations (see e.g. [GY93]) for the price of an Asian option in the Itô setting, but we believe that this is instructive. In incomplete markets, one could also imagine that $X$ is not an Itô process, and therefore the following expansion could still be an interesting pricing technique.

Let us consider the standard Asian call option payoff function

F(\hat{\mathbf{X}}^{\leq p})=f(\langle\mathbf{21},\hat{\mathbf{X}}^{\leq\infty}\rangle)=\max\left(0,\int_{0}^{t}X_{s}\,ds-K\right).

A smooth approximation of the max-function $x\mapsto\max(0,x)$ is given by $\bar{f}(x)=x\sigma(Nx)=\frac{x}{1+e^{-Nx}}$ for some large enough $N\in\mathbb{N}$, where $\sigma(x)=\frac{1}{1+e^{-x}}$ is the well-known sigmoid function. The figure below shows how this function, $\bar{f}$, resembles the max-function.

Figure 5.1. Plot of $\bar{f}(x)=\frac{x}{1+e^{-Nx}}$

The sigmoid function $\sigma(Nx)=\frac{1}{1+e^{-Nx}}$ has a Maclaurin series, valid for $|Nx|<\pi$, and multiplying by $x$ we get

\bar{f}(x)=\sum_{n=0}^{\infty}\frac{(-1)^{n}E_{n}(0)}{2\,n!}N^{n}x^{n+1},

(5.2)

where $E_{n}(x)$ is the $n$-th Euler polynomial. Truncating this series at level $M$, one gets the price approximation

p=\mathbb{E}[F(\hat{\mathbf{X}}^{\leq p})]\approx\sum_{n=0}^{M}\frac{(-1)^{n}E_{n}(0)}{2\,n!}N^{n}\mathbb{E}\left[\langle\mathbf{21},\hat{\mathbf{X}}^{\leq\infty}\rangle^{n+1}\right].

Providing further theoretical convergence rates can be done under certain assumptions on the moments $\mathbb{E}[\langle\mathbf{21},\hat{\mathbf{X}}^{\leq\infty}\rangle^{n}]$ , but will not be further dealt with here. However, we believe that the example highlights how moments of signatures can be used in derivative price approximation.
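
The series expansion of $\bar{f}$ can be checked numerically. The sketch below is an illustration under our own assumptions: it computes $E_{n}(0)$ from the generating function $\frac{2}{e^{t}+1}=\sum_{n\geq 0}E_{n}(0)\frac{t^{n}}{n!}$, which gives $\sigma(y)=\sum_{n\geq 0}\frac{(-1)^{n}E_{n}(0)}{2\,n!}y^{n}$, and it compares the partial sum with the closed form. Since the sigmoid has poles at $y=\pm i\pi$, the comparison is only meaningful for $|Nx|<\pi$; all function names are hypothetical.

```python
import math
from fractions import Fraction


def euler_polynomial_at_zero(n_max):
    """Return [E_0(0), ..., E_{n_max}(0)] from 2/(e^t + 1) = sum_n E_n(0) t^n / n!."""
    c = [Fraction(1)]  # c[n] = E_n(0) / n!
    for n in range(1, n_max + 1):
        # Matching coefficients in (e^t + 1) * sum_n c[n] t^n = 2 gives
        # 2 c[n] + sum_{k<n} c[k] / (n-k)! = 0 for n >= 1.
        s = sum(c[k] / math.factorial(n - k) for k in range(n))
        c.append(-s / 2)
    return [c[n] * math.factorial(n) for n in range(n_max + 1)]


def smooth_max_series(x, N, M):
    """Partial sum up to n = M of x * sigmoid(N x), coefficients (-1)^n E_n(0) / (2 n!)."""
    e0 = euler_polynomial_at_zero(M)
    total = 0.0
    for n in range(M + 1):
        coeff = (-1) ** n * e0[n] / (2 * math.factorial(n))
        total += float(coeff) * N**n * x ** (n + 1)
    return total


def smooth_max(x, N):
    """Closed form x / (1 + exp(-N x))."""
    return x / (1.0 + math.exp(-N * x))
```

With $N=2$ and $x=0.5$, so that $Nx=1<\pi$, the partial sum with $M=40$ agrees with `smooth_max(0.5, 2)` to high accuracy. For larger $|Nx|$ the partial sums should not be expected to converge, which is relevant when choosing $N$ relative to the range of $\langle\mathbf{21},\hat{\mathbf{X}}^{\leq\infty}\rangle$.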

5.1. Pricing exotic derivatives in electricity markets

We consider here some cases of interest in pricing and valuation in electricity markets.

The so-called quality factor is used to assess the profitability of renewable power production such as solar or wind. It measures the income relative to a plant with fixed base load price producing the same volume. The quality factor is defined as

Q=\frac{\int_{0}^{T}V_{s}P_{s}\,ds}{\int_{0}^{T}V_{s}\,ds\frac{1}{T}\int_{0}^{T}P_{s}\,ds}\,.

(5.3)

Here, $V_{t}$ is the volume of power produced at time $t$ and $P_{t}$ is the spot power price. A natural question is what the expected quality factor $\mathbb{E}[Q]$ is. The expectation is taken either under the risk-adjusted probability or under the market probability.

The volume produced $V_{t}$ is given by the installed capacity $c$ (measured in megawatt (MW)) times the capacity factor $C_{t}$ . The process $(C_{t})$ takes values between 0 and 1, measuring the amount of production from solar or wind in a power plant of capacity 1 MW. We also introduce a maximal power price for the market, denoted $P_{\infty}$ , which can be an upper limit we believe never will be exceeded in practice. Hence, with $V_{t}=cC_{t}$ and $S_{t}=P_{t}/P_{\infty}$ we get

	$\displaystyle Q$	$\displaystyle=\frac{\frac{1}{T}\int_{0}^{T}C_{s}S_{s}\,ds}{\frac{1}{T}\int_{0}^{T}C_{s}\,ds\,\frac{1}{T}\int_{0}^{T}S_{s}\,ds}$
		$\displaystyle=\frac{1}{T}\int_{0}^{T}C_{s}S_{s}\,ds\,\frac{1}{1+(\frac{1}{T}\int_{0}^{T}C_{s}\,ds-1)}\,\frac{1}{1+(\frac{1}{T}\int_{0}^{T}S_{s}\,ds-1)}\,.$

But since $C_{s},S_{s}\in(0,1)$, we have that $(1/T)\int_{0}^{T}C_{s}\,ds-1\in(-1,0)$ and similarly $(1/T)\int_{0}^{T}S_{s}\,ds-1\in(-1,0)$, so that the geometric series converges and yields

	$\displaystyle Q$	$\displaystyle=\frac{1}{T}\int_{0}^{T}C_{s}S_{s}\,ds\sum_{m=0}^{\infty}(-1)^{m}\left(\frac{1}{T}\int_{0}^{T}(C_{s}-1)\,ds\right)^{m}$
		$\displaystyle\qquad\times\sum_{n=0}^{\infty}(-1)^{n}\left(\frac{1}{T}\int_{0}^{T}(S_{s}-1)\,ds\right)^{n}\,.$

Introduce the time-enhanced process $Y$ where $Y_{t}=(t,C_{t},S_{t})$, and write $C_{0,s}=C_{s}-C_{0}$ and $S_{0,s}=S_{s}-S_{0}$ for the increments. Since $C_{0,s}S_{0,s}=\langle\mathbf{2},\hat{\mathbf{Y}}_{s}^{\leq\infty}\rangle\langle\mathbf{3},\hat{\mathbf{Y}}_{s}^{\leq\infty}\rangle$, it follows from the shuffle property in Lemma 2.14 that

\int_{0}^{T}C_{s}S_{s}\,ds=\int_{0}^{T}C_{0,s}S_{0,s}\,ds+C_{0}\int_{0}^{T}S_{0,s}\,ds+S_{0}\int_{0}^{T}C_{0,s}\,ds+C_{0}S_{0}T=\langle\pi_{1},\hat{\mathbf{Y}}_{T}^{\leq\infty}\rangle\,,

where

\pi_{1}=(\mathbf{2}\shuffle\mathbf{3})\mathbf{1}+C_{0}\mathbf{3}\mathbf{1}+S_{0}\mathbf{2}\mathbf{1}+C_{0}S_{0}\mathbf{1},

as long as $Y$ is a geometric rough path. Moreover, we notice that $\int_{0}^{T}(C_{s}-1)\,ds=\langle\pi_{2},\hat{\mathbf{Y}}_{T}^{\leq\infty}\rangle$ and $\int_{0}^{T}(S_{s}-1)\,ds=\langle\pi_{3},\hat{\mathbf{Y}}_{T}^{\leq\infty}\rangle$ with $\pi_{2}=\mathbf{2}\mathbf{1}+(C_{0}-1)\mathbf{1}$ and $\pi_{3}=\mathbf{3}\mathbf{1}+(S_{0}-1)\mathbf{1}$, respectively. Hence, after truncation, we compute an approximation of the expected value of the quality factor by

\mathbb{E}[Q]\approx\frac{1}{T}\sum_{m=0}^{M}\sum_{n=0}^{N}(-T)^{-(n+m)}\mathbb{E}\left[\langle\pi_{1},\hat{\mathbf{Y}}_{T}^{\leq\infty}\rangle\langle\pi_{2},\hat{\mathbf{Y}}_{T}^{\leq\infty}\rangle^{m}\langle\pi_{3},\hat{\mathbf{Y}}_{T}^{\leq\infty}\rangle^{n}\right].

(5.4)

We can simulate the functionals inside the above expectation given stochastic models of $C$ and $S$. Notice that we only need simulations of the three functionals in order to compute the expectations by Monte Carlo for any orders $m$ and $n$. If we appeal to Lemma 2.14, we can use the shuffle property to restate the product of signature functionals inside the expectation as $\langle\pi,\hat{\mathbf{Y}}_{T}^{\leq\infty}\rangle$; however, $\pi$ will depend on $m$ and $n$, and therefore we would need to consider higher and higher levels of the signature in the calculation. This is the computational advantage of our approach: the three low-order functionals are simulated once and reused for all $m$ and $n$.
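
As a pathwise illustration of the truncated expansion of the quality factor, the following Python sketch evaluates the three time integrals on a uniform grid with the trapezoid rule and compares the truncated double series with the direct quotient. The grid, the quadrature rule and all function names are our own assumptions; in a Monte Carlo setting one would average the summands over simulated paths of $C$ and $S$.

```python
import math


def trapezoid(values, h):
    """Composite trapezoid rule on a uniform grid with spacing h."""
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


def quality_factor_direct(C, S, T):
    """Quality factor as the quotient of time integrals of normalized C and S."""
    h = T / (len(C) - 1)
    num = trapezoid([c * s for c, s in zip(C, S)], h)
    return num / (trapezoid(C, h) * trapezoid(S, h) / T)


def quality_factor_series(C, S, T, M, N):
    """Truncated double geometric series for the same path."""
    h = T / (len(C) - 1)
    a = trapezoid([c * s for c, s in zip(C, S)], h)  # int_0^T C_s S_s ds
    b = trapezoid([c - 1.0 for c in C], h)           # int_0^T (C_s - 1) ds
    d = trapezoid([s - 1.0 for s in S], h)           # int_0^T (S_s - 1) ds
    total = 0.0
    for m in range(M + 1):
        for n in range(N + 1):
            total += (-T) ** (-(m + n)) * a * b**m * d**n
    return total / T
```

The speed of convergence in $M$ and $N$ is governed by $1-\frac{1}{T}\int_{0}^{T}C_{s}\,ds$ and $1-\frac{1}{T}\int_{0}^{T}S_{s}\,ds$, so low average capacity factors or low normalized prices require more terms.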

Remark 5.7.

In the above discussion of the computation of the expected quality factor, we have to assume $Y$ being a geometric rough path in order to re-express a product between the capacity factor and price processes $C_{s}P_{s}$ as a functional on the signature. The capacity factor process is derived from wind speeds or solar irradiation, where empirical studies indicate that higher-order continuous time autoregressive (CAR) processes are suitable (see e.g. [BvB09] for wind and [LGB23] for solar). The CAR-processes are of order 2 or higher, implying that the paths are continuously differentiable and thus of finite 1-variation. On the other hand, different studies point to power spot prices being sums of Ornstein-Uhlenbeck processes (see [LS02]), or even fractional models with Hurst parameters less than 0.5 (see [Ben17]), and therefore may have regularity at most as Brownian motion. We notice however, that power spot prices (which is what the process $P$ models) are by definition only available on a discrete time grid (typically of hourly granularity), and hence we can in principle imagine paths which are of high regularity when considering continuous-time paths.

Recall Example 3.4. Quanto options are options with a product payoff on two underlying assets, typically a call or put on a price variable and on a volume variable. The volume variable indicates production indirectly, for example through temperature (which controls the demand for power) or wind speed (which controls the amount of renewable wind power that can be generated). Let now $V_{t}$ be the volume variable (temperature, wind, etc.). In power markets, a typical option is a call on the average volume over a period combined with a put on the average price over the same period (this gives insurance against too low prices when renewable power production is high), with payoff

\max\left(\frac{1}{T}\int_{0}^{T}V_{s}\,ds-K,0\right)\max\left(L-\frac{1}{T}\int_{0}^{T}P_{s}\,ds,0\right).

In general, this can be expressed as

F(Y)_{T}=f_{-K}(\langle w_{2},\hat{\mathbf{Y}}^{\leq\infty}_{T}\rangle)f_{L}(\langle w_{3},\hat{\mathbf{Y}}^{\leq\infty}_{T}\rangle)

where $f_{a}(x)=\max(0,x+a)$ for $a\in\mathbb{R}$, with words $w_{2}=\frac{1}{T}(\mathbf{2}\mathbf{1}+V_{0}\mathbf{1})$ and $w_{3}=-\frac{1}{T}(\mathbf{3}\mathbf{1}+P_{0}\mathbf{1})$, so that $\langle w_{2},\hat{\mathbf{Y}}^{\leq\infty}_{T}\rangle=\frac{1}{T}\int_{0}^{T}V_{s}\,ds$ and $\langle w_{3},\hat{\mathbf{Y}}^{\leq\infty}_{T}\rangle=-\frac{1}{T}\int_{0}^{T}P_{s}\,ds$. Here, the time-enhanced process is $Y_{t}=(t,V_{t},P_{t})$. If we have power series expansions of $f_{-K}$ and $f_{L}$ available, we can find an approximation of the price expressed by the risk-adjusted expectation by computing terms of the kind

\mathbb{E}\left[\langle w_{2},\hat{\mathbf{Y}}_{T}^{\leq\infty}\rangle^{m}\langle w_{3},\hat{\mathbf{Y}}_{T}^{\leq\infty}\rangle^{n}\right]\,.

These expected values are simplified versions of the expectation for the quality factor in (5.4) discussed above. The payoff of quanto options motivates studying the following general structure: let $h:\mathbb{R}^{d}\rightarrow\mathbb{R}$ be some measurable function, and consider the payoff

F(Y)_{T}=h(\langle\pi_{1},\hat{\mathbf{Y}}^{\leq\infty}_{T}\rangle,\ldots,\langle\pi_{d},\hat{\mathbf{Y}}^{\leq\infty}_{T}\rangle)

for words $\pi_{1},\ldots,\pi_{d}$ and path $Y$. Indeed, the quality factor is itself an example of a specification of $h$. The time-enhanced price path may also consist of more variables than $V$ and $P$. For example, one could have an option settled on temperature, wind and price.

Example 5.8.

In this example we illustrate how specific structural assumptions on the underlying price processes can be used to make price approximations explicit by leveraging moment computations. Consider two correlated Ornstein-Uhlenbeck processes

	$\displaystyle\mathrm{d}Y^{1}_{t}$	$\displaystyle=-a_{1}Y^{1}_{t}\,\mathrm{d}t+\sigma_{1}\,\mathrm{d}B^{1}_{t}$
	$\displaystyle\mathrm{d}Y^{2}_{t}$	$\displaystyle=-a_{2}Y^{2}_{t}\,\mathrm{d}t+\sigma_{2}(\rho\mathop{}\!\mathrm{d}B_{t}^{1}+\sqrt{1-\rho^{2}}\mathop{}\!\mathrm{d}B^{2}_{t})$

for two independent Brownian motions $B^{1}$ and $B^{2}$ , and $\rho\in(-1,1)$ is the correlation coefficient. Solving explicitly, we see that the difference $\bar{Y}_{t}=Y^{1}_{t}-Y^{2}_{t}$ is given by

\bar{Y}_{t}=e^{-a_{1}t}Y^{1}_{0}-e^{-a_{2}t}Y^{2}_{0}+\int_{0}^{t}\left(\sigma_{1}e^{-a_{1}(t-s)}-\rho\sigma_{2}e^{-a_{2}(t-s)}\right)\mathop{}\!\mathrm{d}B^{1}_{s}-\int_{0}^{t}e^{-a_{2}(t-s)}\sigma_{2}\sqrt{1-\rho^{2}}\mathop{}\!\mathrm{d}B_{s}^{2}.

(5.5)

We then see that $\bar{Y}_{t}$ is normally distributed with mean $e^{-a_{1}t}Y^{1}_{0}-e^{-a_{2}t}Y^{2}_{0}$ and variance given by

\mathrm{Var}(\bar{Y}_{t})=\int_{0}^{t}\left(\sigma_{1}^{2}e^{-2a_{1}(t-s)}+\sigma_{2}^{2}e^{-2a_{2}(t-s)}-2\sigma_{1}\rho\sigma_{2}e^{-(a_{1}+a_{2})(t-s)}\right)\mathop{}\!\mathrm{d}s.

(5.6)

Now, just as in Example 4.2, we again consider the pairing $\langle\pi,\hat{\mathbf{Y}}^{\leq\infty}\rangle=\int_{0}^{t}\bar{Y}_{s}\,ds=:Z_{t}$, where $\pi=\mathbf{21}-\mathbf{31}$. A simple use of the stochastic Fubini theorem yields that

Z_{t}=\frac{Y^{1}_{0}}{a_{1}}(1-e^{-a_{1}t})-\frac{Y^{2}_{0}}{a_{2}}(1-e^{-a_{2}t})+\int_{0}^{t}\frac{\sigma_{1}}{a_{1}}\left(1-e^{-a_{1}(t-u)}\right)\,\mathrm{d}B^{1}_{u}-\int_{0}^{t}\frac{\rho\sigma_{2}}{a_{2}}\left(1-e^{-a_{2}(t-u)}\right)\,\mathrm{d}B^{1}_{u}-\int_{0}^{t}\frac{\sigma_{2}\sqrt{1-\rho^{2}}}{a_{2}}\left(1-e^{-a_{2}(t-u)}\right)\,\mathrm{d}B^{2}_{u}.

(5.7)

Using the Itô isometry, the mean and variance of $Z_{t}=\langle\pi,\hat{\mathbf{Y}}^{\leq\infty}\rangle$ are given by

	$\displaystyle\mu_{Z}:=\mathbb{E}[Z_{t}]$	$\displaystyle=\frac{Y^{1}_{0}}{a_{1}}(1-e^{-a_{1}t})-\frac{Y^{2}_{0}}{a_{2}}(1-e^{-a_{2}t}),$
	$\displaystyle\sigma^{2}_{Z}:=\mathbb{E}[(Z_{t}-\mu_{Z})^{2}]$	$\displaystyle=\int_{0}^{t}\left(\frac{\sigma_{1}}{a_{1}}(1-e^{-a_{1}(t-u)})-\frac{\rho\sigma_{2}}{a_{2}}(1-e^{-a_{2}(t-u)})\right)^{2}\,\mathrm{d}u$
		$\displaystyle+\int_{0}^{t}\left(\frac{\sigma_{2}\sqrt{1-\rho^{2}}}{a_{2}}(1-e^{-a_{2}(t-u)})\right)^{2}\,\mathrm{d}u.$

Here the two $B^{1}$-integrands in (5.7) are combined before squaring, so that the cross term proportional to $\rho\sigma_{1}\sigma_{2}$ is retained.

Moreover, since $Z_{t}=\langle\pi,\hat{\mathbf{Y}}^{\leq\infty}\rangle$ is Gaussian, the higher order moments become

\mathbb{E}[\langle\pi,\hat{\mathbf{Y}}^{\leq\infty}\rangle^{n}]=\mathbb{E}[Z_{t}^{n}]=\sigma_{Z}^{n}(-i\sqrt{2})^{n}U\left(-\frac{n}{2},\frac{1}{2},-\frac{1}{2}\left(\frac{\mu_{Z}}{\sigma_{Z}}\right)^{2}\right),

where $U$ is the confluent hypergeometric function. Now, the correlator is given by $\rho_{n}=\mathbb{E}\left[\langle\pi,\hat{\mathbf{Y}}^{\leq\infty}\rangle^{n}\right]$; hence from Corollary 5.5 we know that there exist an $N\in\mathbb{N}$ and $\{\alpha_{n}\}_{n=0}^{N}\subset\mathbb{R}$ such that

p:=\mathbb{E}\left[F(\hat{\mathbf{Y}}^{\leq p})\right]\approx\sum_{n=0}^{N}\alpha_{n}\rho_{n}=\sum_{n=0}^{N}\alpha_{n}\sigma_{Z}^{n}(-i\sqrt{2})^{n}U\left(-\frac{n}{2},\frac{1}{2},-\frac{1}{2}\left(\frac{\mu_{Z}}{\sigma_{Z}}\right)^{2}\right).

Consider the setting where the payoff functional is that of an Asian option, as analyzed in Example 5.6. One can then use the coefficients found there to get an explicit formula for the approximation of an Asian spread option in the case when the underlying processes are assumed to be Ornstein-Uhlenbeck processes.
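
The moments $\rho_{n}$ of the Gaussian variable $Z_{t}$ can also be evaluated without special functions. The Python sketch below is an illustration under our own assumptions: it computes the mean of $Z_{t}$ and its variance in closed form, where the variance is obtained from (5.7) by combining the two $B^{1}$-integrands before applying the Itô isometry (so the $\rho\sigma_{1}\sigma_{2}$ cross term is kept), and it then uses the standard recursion $m_{n}=\mu m_{n-1}+(n-1)\sigma^{2}m_{n-2}$ for raw moments of $\mathcal{N}(\mu,\sigma^{2})$ in place of the hypergeometric representation. All function names are hypothetical.

```python
import math


def _cross_integral(a, b, t):
    """int_0^t (1 - e^{-a v}) (1 - e^{-b v}) dv for a, b > 0."""
    return (t + math.expm1(-a * t) / a + math.expm1(-b * t) / b
            - math.expm1(-(a + b) * t) / (a + b))


def integrated_spread_mean_var(y1, y2, a1, a2, s1, s2, rho, t):
    """Mean and variance of Z_t = int_0^t (Y^1_s - Y^2_s) ds for correlated OU processes."""
    mean = -y1 * math.expm1(-a1 * t) / a1 + y2 * math.expm1(-a2 * t) / a2
    var = ((s1 / a1) ** 2 * _cross_integral(a1, a1, t)
           - 2.0 * rho * s1 * s2 / (a1 * a2) * _cross_integral(a1, a2, t)
           + (s2 / a2) ** 2 * _cross_integral(a2, a2, t))
    return mean, var


def gaussian_raw_moments(mean, var, n_max):
    """[E[Z^0], ..., E[Z^n_max]] for Z ~ N(mean, var) via the two-term recursion."""
    m = [1.0, mean]
    for n in range(2, n_max + 1):
        m.append(mean * m[n - 1] + (n - 1) * var * m[n - 2])
    return m[: n_max + 1]
```

Given coefficients $\alpha_{n}$, for example those from the truncated series in Example 5.6, the price approximation is then the finite sum of $\alpha_{n}$ times the entries of `gaussian_raw_moments(mean, var, N)`. As a consistency check, for small mean-reversion rates and $\rho=0$, $\sigma_{1}=\sigma_{2}=1$, the variance approaches the value $\frac{2}{3}t^{3}$ from Example 4.2.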

6. Conclusion

In this work, we develop a new type of universal approximation theorem for non-geometric rough paths, addressing the practical challenges of financial applications that naturally involve Itô integration. The results presented provide a robust framework for advancing derivatives pricing methodologies in financial markets, ensuring both computational efficiency and theoretical rigor. By introducing a polynomial-based approximation framework, we demonstrated how complex payoff functionals for financial derivatives can be efficiently represented and approximated using signature terms. This approach bridges the gap between rough path theory and the practical requirements of financial markets, providing a robust tool for pricing exotic derivatives and path-dependent contracts.

Our results highlight the versatility of signatures in capturing the intricacies of stochastic paths while enabling computational efficiency, and show how this can be used in finance. The proposed framework not only broadens the scope of universal approximation beyond geometric rough paths but also lays a foundation for further exploration of functional approximation in stochastic finance. Specifically, it builds further on the research developed in [LNPA19, LNPA20, Arr18], and provides a new perspective in the Itô setting. The methodology for universal approximation presented here also seems promising in the context of the Volterra signature induced from the analysis in [HT21], which is the subject of ongoing work. Several further applications of this method appear to be within reach.

References

[Arr18] Imanol Perez Arribas. Derivatives pricing using signature payoffs, 2018.
[BB02] Douglas Bowman and David M. Bradley. The algebra and combinatorics of shuffles and multiple zeta values. Journal of Combinatorial Theory, Series A, 97(1):43–61, 2002.
[Ben17] Mikkel Bennedsen. A rough multi-factor model of electricity spot prices. Energy Economics, 63:301–313, 2017.
[Ben21] Fred Espen Benth. Pricing of commodity and energy derivatives for polynomial processes. Mathematics, 9(2), 2021.
[Bil68] Patrick Billingsley. Convergence of probability measures. John Wiley & Sons, Inc., New York-London-Sydney, 1968.
[BL21] Fred Espen Benth and Silvia Lavagnini. Correlators of polynomial processes. SIAM Journal of Financial Mathematics, 12(4):1374–1415, 2021.
[BLM15] Fred Espen Benth, Nina Lange, and Tor Aage Myklebust. Pricing and hedging quanto options in energy markets. Journal of Energy Markets, 8(1):1–35, 2015.
[BNBV18] Ole E. Barndorff-Nielsen, Fred Espen Benth, and Almut Veraart. Ambit Stochastics, volume 88 of Probability Theory and Stochastic Modelling. Springer Nature, Cham, 2018.
[Bre11] Haim Brezis. Functional Analysis, Sobolev Spaces and Partial Differential Equations. Universitext. Springer, New York, 2011.
[BvB09] Fred Espen Benth and Jurate Šaltytė Benth. Dynamic pricing of wind futures. Energy Economics, 31(1):16–24, 2009.
[CGSF23] Christa Cuchiero, Guido Gazzani, and Sara Svaluto-Ferro. Signature-based models: theory and calibration. SIAM Journal of Financial Mathematics, 14(3), 2023.
[Che54] Kuo-Tsai Chen. Iterated integrals and exponential homomorphisms. Proceedings of the London Mathematical Society, s3-4(1):502–512, 1954.
[CO22] Ilya Chevyrev and Harald Oberhauser. Signature moments to characterize laws of stochastic processes. Journal of Machine Learning Research, 23(176):1–42, 2022.
[CPSF23] Christa Cuchiero, Francesca Primavera, and Sara Svaluto-Ferro. Universal approximation theorems for continuous functions of càdlàg paths and Lévy-type signature models, 2023.
[CSF21] Christa Cuchiero and Sara Svaluto-Ferro. Infinite-dimensional polynomial processes. Finance & Stochastics, 25:383–426, 2021.
[FH14] Peter K. Friz and Martin Hairer. A Course on Rough Paths. Universitext. Springer, Cham, 2014.
[FV10] Peter K. Friz and Nicolas B. Victoir. Multidimensional Stochastic Processes as Rough Paths: Theory and Applications. Cambridge Studies in Advanced Mathematics. Cambridge University Press, 2010.
[Gul24] Jacek Gulgowski. Compactness in the spaces of functions of bounded variation. Zeitschrift für Analysis und ihre Anwendungen, 42, 01 2024.
[GY93] Helyette Geman and Marc Yor. Bessel processes, Asian options, and perpetuities. Mathematical Finance, 3:349–375, 1993.
[HL10] Ben Hambly and Terry Lyons. Uniqueness for the signature of a path of bounded variation and the reduced path group. Annals of Mathematics, 171(1):109–167, 2010.
[HT21] Fabian A. Harang and Samy Tindel. Volterra equations driven by rough signals. Stochastic Process. Appl., 142:34–78, 2021.
[Lan02] Serge Lang. Algebra, volume 211 of Graduate Texts in Mathematics. Springer-Verlag, New York, third edition, 2002.
[LCL07] Terry J. Lyons, Michael Caruana, and Thierry Lévy. Differential Equations Driven by Rough Paths, volume 1908 of Lecture Notes in Mathematics. Springer, Berlin, 2007.
[LGB23] Karl Larsson, Rikard Green, and Fred Espen Benth. A stochastic time-series model for solar irradiation. Energy Economics, 117, 2023.
[Lin86] Werner Linde. Probability in Banach spaces—stable and infinitely divisible distributions. A Wiley-Interscience Publication. John Wiley & Sons, Ltd., Chichester, second edition, 1986.
[LN15] Terry Lyons and Hao Ni. Expected signature of a Brownian motion up to the first exit time from a bounded domain. The Annals of Probability, 43(5):2729–2762, 2015.
[LNPA19] Terry Lyons, Sina Nejad, and Imanol Perez Arribas. Numerical method for model-free pricing of exotic derivatives in discrete time using rough path signatures. Applied Mathematical Finance, 26(6):583–597, 2019.
[LNPA20] Terry Lyons, Sina Nejad, and Imanol Perez Arribas. Non-parametric pricing and hedging of exotic derivatives. Applied Mathematical Finance, 27(6):457–494, 2020.
[LS02] Julio J. Lucia and Eduardo S. Schwartz. Electricity prices and power derivatives: Evidence from the Nordic power exchange. Review of Derivatives Research, 5(1):5–50, 2002.
[Lyo98] T. Lyons. Differential equations driven by rough signals. Revista Matemática Iberoamericana, pages 215–310, 1998.