License: arXiv.org perpetual non-exclusive license
arXiv:2305.12511v2 [cs.LG] 06 Apr 2024

PCF-GAN: generating sequential data via the characteristic function of measures on the path space

Hang Lou
Department of Mathematics
University College London
hang.lou.19@ucl.ac.uk
Siran Li
Department of Mathematics
Shanghai Jiao Tong University
sl4025@nyu.edu
Hao Ni
Department of Mathematics
University College London
h.ni@ucl.ac.uk
Abstract

Generating high-fidelity time series data using generative adversarial networks (GANs) remains a challenging task, as it is difficult to capture the temporal dependence of joint probability distributions induced by time-series data. Towards this goal, a key step is the development of an effective discriminator to distinguish between time series distributions. We propose the so-called PCF-GAN, a novel GAN that incorporates the path characteristic function (PCF) as the principled representation of time series distribution into the discriminator to enhance its generative performance. On the one hand, we establish theoretical foundations of the PCF distance by proving its characteristicity, boundedness, differentiability with respect to generator parameters, and weak continuity, which ensure the stability and feasibility of training the PCF-GAN. On the other hand, we design efficient initialisation and optimisation schemes for PCFs to strengthen the discriminative power and accelerate training efficiency. To further boost the capabilities of complex time series generation, we integrate the auto-encoder structure via sequential embedding into the PCF-GAN, which provides additional reconstruction functionality. Extensive numerical experiments on various datasets demonstrate the consistently superior performance of PCF-GAN over state-of-the-art baselines, in both generation and reconstruction quality.

1 Introduction

Generative Adversarial Networks (GANs) have been a powerful tool for generating complex data distributions, e.g., image data. The original GAN suffers from optimisation instability and mode collapse, partially remedied later by an alternative training scheme using integral probability metric (IPM) in lieu of Jensen–Shannon divergence. The IPMs, e.g., metrics based on Wasserstein distances or Maximum Mean Discrepancy (MMD), consistently yield good measures between generated and real data distributions, thus resulting in more powerful GANs on empirical data ([14, 2, 24]).

More recently, [1] proposed an IPM based on the characteristic function (CF) of measures on $\mathbb{R}^{d}$, which has the characteristic property, boundedness, and differentiability. Such properties enable the GAN constructed using this IPM as discriminator (“CF-GAN”) to stabilise training and improve generative performance. However, being ineffective in capturing the temporal dependency of sequential data, such a CF-metric fails to address high-frequency cases due to the curse of dimensionality. To tackle this issue, we take the continuous-time perspective of time series and lift discrete time series to the path space ([28, 29, 23]). This allows us to treat time series of variable length, unequal sampling, and high frequency in a unified approach. We propose a path characteristic function (PCF) to characterise distributions on the path space, and the corresponding PCF distance as a novel IPM to quantify the distance between measures on the path space.

Built on top of the unitary feature of paths ([26]), our proposed PCF has theoretical foundations deeply rooted in rough path theory ([7]), which exploits the non-commutativity and the group structure of the unitary feature to encode information on the order of paths. The CF may be regarded as the special case of the PCF with a linear random path and a $1\times 1$ unitary matrix. We show that the PCF distance (PCFD) possesses favourable analytic properties, including boundedness and differentiability in model parameters, and we establish the linkages between PCFD and MMD. These results vastly generalise classical theorems on measures on $\mathbb{R}^{d}$ ([1]), with much more technically involved proofs due to the infinite-dimensionality of the path space.

On the numerical side, we design an efficient algorithm which, by optimising the trainable parameters of PCFD, maximises the discriminative power and improves the stability and efficiency of GAN training. Inspired by [25, 41], we integrate the proposed PCF into the IPM-GAN framework, utilising an auto-encoder architecture specifically tailored to sequential data. This model design enables our algorithm to generate and reconstruct realistic time series simultaneously, which has advantages in diverse applications, including privacy preservation ([35]) and semantic representation extraction for downstream tasks ([10]). To assess the efficacy of our PCF-GAN, we conduct extensive numerical experiments on several standard time series benchmarking datasets for both generation and reconstruction tasks.

We summarize key contributions of this work below:

  • proposing a new metric for the distributions on the path space via PCF;

  • providing theoretical proofs for analytic properties of the proposed loss metric which benefit GAN training;

  • introducing a novel PCF-GAN to generate and reconstruct time series simultaneously; and

  • reporting substantial empirical results validating the out-performance of our approach, compared with several state-of-the-art GANs with different loss functions on various time series generation and reconstruction tasks.

Related work. Given the wide practical use of, and challenges for, realistic time series synthesis ([3, 4]), various approaches have been proposed to improve the quality of GANs for synthetic time series generation. Several works, e.g., [43, 45, 36], are devoted to improving the discriminator of GANs to be better suited to distributions induced by time series. Among them, COT-GAN in [43] shares a similar philosophy with PCF-GAN by introducing a novel discriminator based on causal optimal transport (which can be seen as an improved variant of the Sinkhorn divergence tailored to sequential data), while TimeGAN ([45]) shares a similar auto-encoder structure, which improves the generator’s quality and enables time series reconstruction. Unlike in PCF-GAN, the reconstruction and generation modules of TimeGAN are separated, and TimeGAN has an additional stepwise supervised loss and discriminative loss. In a different vein, CEGEN [36], GT-GAN [17], COSCI-GAN [39], and EWGAN [37] focus primarily on the design of network framework and generator architecture, which achieve state-of-the-art results on several benchmarking datasets.

2 Preliminaries

The characteristic function of a measure on $\mathbb{R}^{d}$, namely its Fourier transform, plays a central role in probability theory and analysis. The path characteristic function (PCF) is a natural extension of the characteristic function to the path space.

2.1 Characteristic function distance (CFD) between random variables in $\mathbb{R}^{d}$

Let $X$ be an $\mathbb{R}^{d}$-valued random variable with the law $\mu=\mathbb{P}\circ X^{-1}$. The characteristic function of $X$, denoted as $\Phi_{X}:\mathbb{R}^{d}\rightarrow\mathbb{C}$, maps each $\lambda\in\mathbb{R}^{d}$ to the expectation of its complex unitary transform: $\Phi_{X}:\lambda\longmapsto\mathbb{E}_{X\sim\mu}\left[e^{i\langle\lambda,X\rangle}\right]$. Here $U_{\lambda}:\mathbb{R}^{d}\rightarrow\mathbb{C},\ x\mapsto e^{i\langle\lambda,x\rangle}$ is the solution to the linear controlled differential equation:

$${\rm d}U_{\lambda}(x)=iU_{\lambda}(x)\langle\lambda,{\rm d}x\rangle,\qquad U_{\lambda}(\mathbf{0})=1, \tag{1}$$

where $\mathbf{0}$ is the zero vector in $\mathbb{R}^{d}$ and $\langle\cdot,\cdot\rangle$ is the Euclidean inner product on $\mathbb{R}^{d}$.

References [11, 16] studied the squared characteristic function distance (CFD) between two $\mathbb{R}^{d}$-valued random variables $X$ and $Y$ with respect to another probability distribution $\boldsymbol{\Lambda}$ on $\mathbb{R}^{d}$:

$$\text{CFD}^{2}_{\boldsymbol{\Lambda}}(X,Y)=\mathbb{E}_{Z\sim\boldsymbol{\Lambda}}\left[\big|\Phi_{X}(Z)-\Phi_{Y}(Z)\big|^{2}\right]. \tag{2}$$

It is proved in [25, 1] that if the support of $\boldsymbol{\Lambda}$ is $\mathbb{R}^{d}$, then ${\rm CFD}_{\boldsymbol{\Lambda}}$ is a distance metric, so that $\text{CFD}^{2}_{\boldsymbol{\Lambda}}(X,Y)=0$ if and only if $X$ and $Y$ have the same distribution. This justifies the usage of $\text{CFD}^{2}_{\boldsymbol{\Lambda}}$ as a discriminator for GAN training to learn finite-dimensional random variables from data.
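To make Eq. (2) concrete, the following is a minimal Monte Carlo sketch of the squared CFD using only the Python standard library. The function names, the toy Gaussian data, and the choice of a standard Gaussian for $\boldsymbol{\Lambda}$ are illustrative assumptions and are not taken from [1] or from this paper. Both expectations are replaced by sample means.

```python
import cmath
import random


def empirical_cf(samples, freq):
    """Empirical characteristic function: mean of exp(i <freq, x>) over samples."""
    total = 0j
    for x in samples:
        inner = sum(f * xi for f, xi in zip(freq, x))
        total += cmath.exp(1j * inner)
    return total / len(samples)


def empirical_cfd_squared(xs, ys, freqs):
    """Monte Carlo estimate of CFD^2: average of |Phi_X(z) - Phi_Y(z)|^2 over freqs."""
    gaps = [abs(empirical_cf(xs, z) - empirical_cf(ys, z)) ** 2 for z in freqs]
    return sum(gaps) / len(gaps)


rng = random.Random(0)
d, n_samples, n_freqs = 2, 500, 64

# Hypothetical data: xs and xs2 share a law, ys is shifted by 1 in each coordinate.
xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_samples)]
xs2 = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_samples)]
ys = [[rng.gauss(1.0, 1.0) for _ in range(d)] for _ in range(n_samples)]

# Frequencies Z ~ Lambda (assumed standard Gaussian, which has full support on R^d).
freqs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_freqs)]

same_law = empirical_cfd_squared(xs, xs2, freqs)
shifted = empirical_cfd_squared(xs, ys, freqs)
print(f"CFD^2 same law: {same_law:.4f}, shifted law: {shifted:.4f}")
```

With finitely many samples the estimate for two batches drawn from the same law is small but not exactly zero; only the population quantity vanishes.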

2.2 Unitary feature of a path

Let ${\rm BV}\left([0,T];\mathbb{R}^{d}\right)$ be the space of $\mathbb{R}^{d}$-valued paths of bounded variation over $[0,T]$. Consider

$$\mathcal{X}:=\left\{\bar{\mathbf{x}}:[0,T]\rightarrow\mathbb{R}^{d+1}:\ \bar{\mathbf{x}}(t)=(t,\mathbf{x}(t))\text{ for }t\in[0,T];\ \mathbf{x}\in{\rm BV}\left([0,T];\mathbb{R}^{d}\right);\ \mathbf{x}(0)=0\right\}. \tag{3}$$

For a discrete time series $x=(t_{i},x_{i})_{i=0}^{N}$, where $0=t_{0}<t_{1}<\cdots<t_{N}=T$ and $x_{i}\in\mathbb{R}^{d}$ ($i\in\{0,\cdots,N\}$), we can embed it into some $\mathbf{x}\in\mathcal{X}$ whose evaluation at $(t_{i})_{i=1}^{N}$ coincides with $x$. This is well suited for sequence-valued data in the high-frequency limit with finer time-discretisation and is often robust in practice ([27, 26]). Such embeddings are not unique. In this work, we adopt the linear interpolation for embedding, following [23, 18, 32].
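The linear-interpolation embedding can be sketched as follows. This is an illustrative helper rather than the authors' implementation: the name `embed_time_series` is hypothetical, and subtracting the initial value $x_{0}$ so that the path starts at the origin (matching $\mathbf{x}(0)=0$ in Eq. (3)) is an assumption made here.

```python
from bisect import bisect_right


def embed_time_series(times, values):
    """Return the time-augmented, piecewise-linear path t -> (t, x(t) - x(t_0))."""
    start = values[0]
    # Assumption: translate the series so that the embedded path starts at 0.
    shifted = [[v - s for v, s in zip(point, start)] for point in values]

    def path(t):
        if not times[0] <= t <= times[-1]:
            raise ValueError("t lies outside [t_0, t_N]")
        # Index of the segment [t_k, t_{k+1}] containing t.
        k = min(bisect_right(times, t) - 1, len(times) - 2)
        weight = (t - times[k]) / (times[k + 1] - times[k])
        point = [(1 - weight) * a + weight * b
                 for a, b in zip(shifted[k], shifted[k + 1])]
        return [t] + point

    return path


# Hypothetical, unevenly sampled 2-dimensional series.
times = [0.0, 0.5, 2.0]
values = [[1.0, 1.0], [2.0, 0.0], [2.0, 3.0]]
path = embed_time_series(times, values)

print(path(0.0))   # start of the path
print(path(1.25))  # halfway along the second segment
```

Because the path is defined for every $t\in[0,T]$, series with different lengths or sampling grids are mapped into the same space, which is the point of the continuous-time view.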

Let $\mathbb{C}^{m\times m}:=\left\{m\times m\text{ complex matrices}\right\}$, $I_{m}$ be the identity matrix, and $*$ be the conjugate transpose. Write $U(m)$ and $\mathfrak{u}(m)$ for the Lie group of $m\times m$ unitary matrices and its Lie algebra, respectively:

$$U(m)=\{A\in\mathbb{C}^{m\times m}:A^{*}A=I_{m}\},\qquad\mathfrak{u}(m):=\{A\in\mathbb{C}^{m\times m}:A^{*}+A=0\}.$$
Definition 2.1.

Let $\mathbf{x}\in{\rm BV}\left([0,T];\mathbb{R}^{d}\right)$ be a continuous path and $M:\mathbb{R}^{d}\rightarrow\mathfrak{u}(m)$ be a linear map. The unitary feature of $\mathbf{x}$ under $M$ is the solution $\mathbf{y}:[0,T]\to U(m)$ to the following equation:

$${\rm d}\mathbf{y}_{t}=\mathbf{y}_{t}\cdot M({\rm d}\mathbf{x}_{t}),\qquad\mathbf{y}_{0}=I_{m}. \tag{4}$$

We write $\mathcal{U}_{M}(\mathbf{x}):=\mathbf{y}_{T}$, i.e., the endpoint of the solution path.

By a slight abuse of notation, $\mathcal{U}_{M}(\mathbf{x})$ is also called the unitary feature of $\mathbf{x}$ under $M$. The unitary feature is a special case of the Cartan/path development, for which one may consider paths taking values in any Lie group $G$. We take only $G=U(m)$ here; $m\neq d$ in general ([6, 30]).

Example 2.2.

For $M\in\mathcal{L}\left(\mathbb{R}^{d},\mathfrak{u}(m)\right)$ and $\mathbf{x}\in{\rm BV}\left([0,T];\mathbb{R}^{d}\right)$ linear, $\mathcal{U}_{M}(\mathbf{x})=e^{M(\mathbf{x}_{T}-\mathbf{x}_{0})}$. In particular, when $m=1$, $\mathfrak{u}(1)$ is reduced to $i\mathbb{R}$ and $M(y)=i\left\langle\lambda_{M},y\right\rangle$ for some $\lambda_{M}\in\mathbb{R}^{d}$.
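As a brief check of the formula in Example 2.2, one can solve Eq. (4) directly along a linear path. The shorthand $A$ below is introduced only for this sketch.

```latex
\mathbf{x}_t = \mathbf{x}_0 + \tfrac{t}{T}\,(\mathbf{x}_T - \mathbf{x}_0)
\;\Longrightarrow\;
M({\rm d}\mathbf{x}_t) = \tfrac{1}{T}\,A\,{\rm d}t,
\qquad A := M(\mathbf{x}_T - \mathbf{x}_0) \in \mathfrak{u}(m).

\text{Eq. (4) becomes}\quad
\dot{\mathbf{y}}_t = \tfrac{1}{T}\,\mathbf{y}_t A,\quad \mathbf{y}_0 = I_m
\;\Longrightarrow\;
\mathbf{y}_t = e^{tA/T},
\qquad
\mathcal{U}_M(\mathbf{x}) = \mathbf{y}_T = e^{A}.

m = 1:\quad
\mathcal{U}_M(\mathbf{x}) = e^{\,i\langle \lambda_M,\ \mathbf{x}_T - \mathbf{x}_0\rangle}.
```

For $m=1$ the unitary feature of a linear path is therefore the classical integrand $e^{i\langle\lambda,x\rangle}$ evaluated at the increment $\mathbf{x}_{T}-\mathbf{x}_{0}$, which is the sense in which the CF is a special case of the PCF.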

Motivated by the universality and characteristic property of unitary features ([7], see Section A.3), we construct a unitary layer which transforms any $d$-dimensional time series $x=(x_{0},\cdots,x_{N})$ to the unitary feature of its piecewise linear interpolation $\mathbf{X}$. It is a special case of the path development layer [26], when the Lie algebra is chosen as $\mathfrak{u}(m)$. In fact, the explicit formula holds: $\mathcal{U}_{M}(\mathbf{X})=\prod_{i=1}^{N}\exp\left(M(\Delta x_{i})\right)$, where $\Delta x_{i}:=x_{i}-x_{i-1}$ and $\exp$ is the matrix exponential.
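The product-of-exponentials formula can be sketched in plain Python as below, with one factor per increment $\Delta x_{i}=x_{i}-x_{i-1}$. This is not the authors' unitary layer: all function names are hypothetical, the matrix exponential is a simple truncated-Taylor, scaling-and-squaring routine written for self-containment (a practical implementation would call a numerical library), and the anti-Hermitian matrices are drawn at random purely for illustration.

```python
import random


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def identity(n):
    return [[complex(i == j) for j in range(n)] for i in range(n)]


def expm(a, squarings=8, terms=12):
    """Matrix exponential via scaling and squaring of a truncated Taylor series."""
    small = [[v / 2 ** squarings for v in row] for row in a]
    result, term = identity(len(a)), identity(len(a))
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in matmul(term, small)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def random_anti_hermitian(m, rng):
    """Draw B with complex Gaussian entries and return (B - B^*) / 2, which lies in u(m)."""
    b = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(m)] for _ in range(m)]
    return [[(b[i][j] - b[j][i].conjugate()) / 2 for j in range(m)] for i in range(m)]


def unitary_feature(series, theta):
    """Ordered product of exp(M(dx_i)) over the increments dx_i = x_i - x_{i-1}."""
    m = len(theta[0])
    out = identity(m)
    for prev, cur in zip(series, series[1:]):
        # M(dx) = sum_j theta[j] * dx[j] is again anti-Hermitian.
        gen = [[sum(th[r][c] * (b - a) for th, a, b in zip(theta, prev, cur))
                for c in range(m)] for r in range(m)]
        out = matmul(out, expm(gen))
    return out


def unitarity_defect(u):
    """Largest entry of |U^* U - I|; close to zero when U is unitary."""
    n = len(u)
    star = [[u[j][i].conjugate() for j in range(n)] for i in range(n)]
    gram = matmul(star, u)
    return max(abs(gram[i][j] - (i == j)) for i in range(n) for j in range(n))


rng = random.Random(0)
d, m = 2, 3
theta = [random_anti_hermitian(m, rng) for _ in range(d)]
series = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]    # move along axis 1, then axis 2
swapped = [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # same increments, opposite order

u = unitary_feature(series, theta)
v = unitary_feature(swapped, theta)
order_gap = max(abs(a - b) for ra, rb in zip(u, v) for a, b in zip(ra, rb))
print(unitarity_defect(u), order_gap)
```

The two toy series share the same endpoints and the same set of increments, yet their features differ because the matrix factors do not commute; this is how the unitary feature records the order of events.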

Convention 2.3.

The space $\mathcal{L}\left(\mathbb{R}^{d},\mathfrak{u}(m)\right)$ in which $M$ of Eq. (4) resides is isomorphic to $\mathfrak{u}(m)^{d}$, where $\mathfrak{u}(m)$ is a Lie algebra isomorphic, as a real vector space, to $\mathbb{R}^{m^{2}}$. For each $\theta\in\mathfrak{u}(m)^{d}$ given by anti-Hermitian matrices $\left\{\theta^{(i)}\right\}_{i=1}^{d}$, a linear map $M$ is uniquely induced: $M(x)=\sum_{i=1}^{d}\theta^{(i)}\left\langle x,e_{i}\right\rangle,\ \forall x\in\mathbb{R}^{d}$.
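Convention 2.3 amounts to storing $M$ as a list of $d$ anti-Hermitian matrices. A minimal sketch with hypothetical helper names and hand-picked matrices for $d=2$, $m=2$:

```python
def linear_map_from_theta(theta):
    """Return M with M(x) = sum_i theta[i] * x[i], for a list theta of m-by-m matrices."""
    m = len(theta[0])

    def apply(x):
        return [[sum(th[r][c] * xi for th, xi in zip(theta, x)) for c in range(m)]
                for r in range(m)]

    return apply


def is_anti_hermitian(a, tol=1e-12):
    """Check A^* + A = 0 entrywise, up to a tolerance."""
    n = len(a)
    return all(abs(a[j][i].conjugate() + a[i][j]) <= tol
               for i in range(n) for j in range(n))


# Hypothetical parameters: two anti-Hermitian 2-by-2 matrices.
theta = [
    [[0j, 1j], [1j, 0j]],     # i * sigma_x
    [[1j, 0j], [0j, -1j]],    # i * sigma_z
]
M = linear_map_from_theta(theta)
image = M([0.5, -2.0])
print(image, is_anti_hermitian(image))
```

Since real linear combinations of anti-Hermitian matrices remain anti-Hermitian, every $M(x)$ lies in $\mathfrak{u}(m)$, so its exponential is unitary.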

3 Path characteristic function loss

3.1 Path characteristic function (PCF)

The unitary feature of a path $\mathbf{x}\in\mathcal{X}$ plays a role similar to that played by $e^{i\langle x,\lambda\rangle}$ for an $\mathbb{R}^{d}$-valued random variable. Thus, for a random path $\mathbf{X}$, the expected unitary feature can be viewed as the characteristic function for measures on the path space ([7]).

Definition 3.1.

Let $\mathbf{X}$ be an $\mathcal{X}$-valued random variable and $\mathbb{P}_{\mathbf{X}}$ be its measure. The path characteristic function (PCF) of $\mathbf{X}$ of order $m\in\mathbb{N}$ is the map $\mathbf{\Phi}^{(m)}_{\mathbf{X}}:\mathcal{L}\left(\mathbb{R}^{d},\mathfrak{u}(m)\right)\to\mathbb{C}^{m\times m}$ given by

$$\mathbf{\Phi}_{\mathbf{X}}(M):=\mathbb{E}[\mathcal{U}_{M}(\mathbf{X})]=\int_{\mathcal{X}}\mathcal{U}_{M}(\mathbf{x})\,{\rm d}\mathbb{P}_{\mathbf{X}}(\mathbf{x}).$$

The path characteristic function (PCF) $\mathbf{\Phi}_{\mathbf{X}}:\bigoplus_{m=0}^{\infty}\mathcal{L}\left(\mathbb{R}^{d},\mathfrak{u}(m)\right)\to\bigoplus_{m=0}^{\infty}\mathbb{C}^{m\times m}$ is defined by the natural grading: $\mathbf{\Phi}_{\mathbf{X}}\big|_{\mathcal{L}\left(\mathbb{R}^{d},\mathfrak{u}(m)\right)}=\mathbf{\Phi}_{\mathbf{X}}^{(m)}$ for each $m\in\mathbb{N}$.

In the above, $\mathcal{U}_{M}(\mathbf{x})\in U(m)$ is the unitary feature of the path $\mathbf{x}$ under $M$; see Definition 2.1.

Similarly to the characteristic function of $\mathbb{R}^{d}$-valued random variables, the PCF always exists. Moreover, we have the following important result, whose proof is presented in Appendix A.

Theorem 3.2 (Characteristicity).

Let $\mathbf{X}$ and $\mathbf{Y}$ be $\mathcal{X}$-valued random variables. They have the same distribution (denoted as $\mathbf{X}\stackrel{\text{d}}{=}\mathbf{Y}$) if and only if $\mathbf{\Phi}_{\mathbf{X}}=\mathbf{\Phi}_{\mathbf{Y}}$.

3.2 A new distance measure via PCF

We now introduce a novel and natural distance metric, which measures the discrepancy between distributions on the path space by comparing their PCFs. Throughout, $d_{\rm HS}$ denotes the metric associated with the Hilbert–Schmidt norm $\|\bullet\|_{\rm HS}$ on $\mathbb{C}^{m\times m}$:

$$d_{\rm HS}(A,B):=\sqrt{\left\lVert A-B\right\rVert^{2}_{\rm HS}}=\sqrt{{\rm tr}\left[(A-B)(A-B)^{*}\right]}.$$
Definition 3.3.

Let $\mathbf{X},\mathbf{Y}:[0,T]\to\mathbb{R}^{d}$ be stochastic processes and $\mathbb{P}_{\mathcal{M}}$ be a probability distribution on $\mathfrak{u}(m)^{d}:=\mathcal{L}\left(\mathbb{R}^{d},\mathfrak{u}(m)\right)$ (recall Convention 2.3). Define the squared PCF-based distance (PCFD) between $\mathbf{X}$ and $\mathbf{Y}$ with respect to $\mathbb{P}_{\mathcal{M}}$ as

$${\rm PCFD}^{2}_{\mathcal{M}}(\mathbf{X},\mathbf{Y})=\mathbb{E}_{M\sim\mathbb{P}_{\mathcal{M}}}\left[d_{\rm HS}^{2}\big(\mathbf{\Phi}_{\mathbf{X}}(M),\mathbf{\Phi}_{\mathbf{Y}}(M)\big)\right]. \tag{5}$$

We shall not distinguish between $\mathcal{M}$ and $\mathbb{P}_{\mathcal{M}}$ for simplicity.
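A Monte Carlo version of Eq. (5) replaces both expectations by sample means: the PCF by an average of unitary features over a batch of paths, and the outer expectation by an average over finitely many sampled linear maps. The sketch below is an illustration under stated assumptions, not the paper's algorithm. All names are hypothetical, the linear maps are drawn once at random and kept fixed (rather than trained), the toy data are Gaussian random walks, and the matrix exponential is a simple standard-library routine.

```python
import random


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def expm(a, squarings=8, terms=12):
    """Matrix exponential via scaling and squaring of a truncated Taylor series."""
    eye = [[complex(i == j) for j in range(len(a))] for i in range(len(a))]
    small = [[v / 2 ** squarings for v in row] for row in a]
    result, term = eye, eye
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in matmul(term, small)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def random_anti_hermitian(m, rng):
    """Return (B - B^*) / 2 for a complex Gaussian matrix B; the result lies in u(m)."""
    b = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(m)] for _ in range(m)]
    return [[(b[i][j] - b[j][i].conjugate()) / 2 for j in range(m)] for i in range(m)]


def unitary_feature(series, theta):
    """Ordered product of exp(M(dx_i)) over the increments of a discrete series."""
    m = len(theta[0])
    out = [[complex(i == j) for j in range(m)] for i in range(m)]
    for prev, cur in zip(series, series[1:]):
        gen = [[sum(th[r][c] * (b - a) for th, a, b in zip(theta, prev, cur))
                for c in range(m)] for r in range(m)]
        out = matmul(out, expm(gen))
    return out


def empirical_pcf(batch, theta):
    """Sample mean of the unitary features of a batch of series (an m-by-m matrix)."""
    feats = [unitary_feature(s, theta) for s in batch]
    m = len(theta[0])
    return [[sum(f[r][c] for f in feats) / len(feats) for c in range(m)] for r in range(m)]


def hs_distance_squared(a, b):
    """Squared Hilbert-Schmidt distance: sum of squared moduli of the entries of A - B."""
    return sum(abs(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def empirical_pcfd_squared(batch_x, batch_y, thetas):
    """Average over sampled linear maps of the squared HS gap between empirical PCFs."""
    gaps = [hs_distance_squared(empirical_pcf(batch_x, th), empirical_pcf(batch_y, th))
            for th in thetas]
    return sum(gaps) / len(gaps)


def random_walk(n_steps, d, step_std, rng):
    """Hypothetical toy data: a d-dimensional Gaussian random walk started at the origin."""
    path = [[0.0] * d]
    for _ in range(n_steps):
        path.append([p + rng.gauss(0.0, step_std) for p in path[-1]])
    return path


rng = random.Random(0)
d, m, n_paths, n_maps = 2, 2, 40, 4
thetas = [[random_anti_hermitian(m, rng) for _ in range(d)] for _ in range(n_maps)]
walks_a = [random_walk(8, d, 0.1, rng) for _ in range(n_paths)]
walks_b = [random_walk(8, d, 0.1, rng) for _ in range(n_paths)]   # same law as walks_a
walks_c = [random_walk(8, d, 1.0, rng) for _ in range(n_paths)]   # larger step size

near = empirical_pcfd_squared(walks_a, walks_b, thetas)
far = empirical_pcfd_squared(walks_a, walks_c, thetas)
print(f"same law: {near:.4f}, different law: {far:.4f}")
```

The squared Hilbert–Schmidt distance is computed entrywise, which agrees with ${\rm tr}\left[(A-B)(A-B)^{*}\right]$. As with any finite-sample estimate, the value for two batches from the same law is small but positive.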

PCFD exhibits several mathematical properties, which provide the theoretical justification for its efficacy as the discriminator on the space of measures on the path space, leading to a boost in empirical performance. First, PCFD has the characteristic property.

Lemma 3.4 (Separation of points).

Let $\mathbf{X},\mathbf{Y}\in\mathcal{P}(\mathcal{X})$ and $\mathbf{X}\neq\mathbf{Y}$. Then there exists $m\in\mathbb{N}$ such that, if $\mathcal{M}$ is a $\mathfrak{u}(m)^{d}$-valued random variable with full support, then $\text{PCFD}_{\mathcal{M}}(\mathbf{X},\mathbf{Y})\neq 0$.

Furthermore, ${\rm PCFD}_{\mathcal{M}}$ has a simple uniform upper bound for any fixed $m\in\mathbb{N}$:

Lemma 3.5.

Let $\mathcal{M}$ be a $\mathfrak{u}(m)^{d}$-valued random variable. Then, for any ${\rm BV}\left([0,T];\mathbb{R}^{d}\right)$-valued random variables $\mathbf{X}$ and $\mathbf{Y}$, it holds that ${\rm PCFD}^{2}_{\mathcal{M}}(\mathbf{X},\mathbf{Y})\leq 2m^{2}$.

Under mild conditions, ${\rm PCFD}$ is a.e. differentiable with respect to a continuous parameter, thus ensuring the feasibility of gradient descent in training.

Theorem 3.6 (Lipschitz dependence on continuous parameter).

Let $\mathcal{X}$ and $\mathcal{Z}$ be subsets of ${\rm BV}\left([0,T];\mathbb{R}^{d}\right)$, $\left(\Theta,\rho\right)$ be a metric space, $\mathbb{Q}$ be a Borel probability measure on $\mathcal{Z}$, and $\mathcal{M}$ be a Borel probability measure on $\mathfrak{u}(m)^{d}$. Assume that $g:\Theta\times\mathcal{Z}\to\mathcal{X}$, $(\theta,\mathbf{Z})\mapsto g_{\theta}(\mathbf{Z})$ is Lipschitz in $\theta$ such that ${\rm Tot.Var.}\left[g_{\theta}(\mathbf{Z})-g_{\theta'}(\mathbf{Z})\right]\leq\omega(\mathbf{Z})\,\rho\left(\theta,\theta'\right)$. In addition, suppose that $\mathbb{E}_{M\sim\mathbb{P}_{\mathcal{M}}}\left[|\|M|\|^{2}\right]<\infty$ and $\mathbb{E}_{\mathbf{Z}\sim\mathbb{Q}}\left[\omega(\mathbf{Z})\right]<\infty$. Then ${\rm PCFD}_{\mathcal{M}}\left(g_{\theta}(\mathbf{Z}),\mathbf{X}\right)$ is Lipschitz in $\theta$. Moreover, it holds that

$$\left|{\rm PCFD}_{\mathcal{M}}\left(g_{\theta}(\mathbf{Z}),\mathbf{X}\right)-{\rm PCFD}_{\mathcal{M}}\left(g_{\theta'}(\mathbf{Z}),\mathbf{X}\right)\right|\leq\sqrt{\mathbb{E}_{M\sim\mathbb{P}_{\mathcal{M}}}\left[|\|M|\|^{2}\right]}\;\mathbb{E}_{\mathbf{Z}\sim\mathbb{Q}}\left[\omega(\mathbf{Z})\right]\,\rho\left(\theta,\theta'\right)$$

for any $\theta,\theta'\in\Theta$, $\mathbf{Z}\in\mathcal{Z}$, $\mathbf{X}\in\mathcal{X}$, and $\mathcal{M}\in\mathcal{P}\left(\mathfrak{u}(m)^{d}\right)$.

Remark 3.7.

The parameter space $\left(\Theta,\rho\right)$ is usually taken to be $\mathbb{R}^{\bar{d}}$ for some $\bar{d}\in\mathbb{N}$. In this case, by Rademacher's theorem, ${\rm PCFD}_{\mathcal{M}}\left(g_{\theta}(\mathbf{Z}),\mathbf{X}\right)$ is a.e. differentiable in $\theta$.

Similarly to metrics on measures over $\mathbb{R}^{d}$ (cf. [2, 24]), we construct a metric based on PCFD, denoted as $\widetilde{\rm PCFD}$, on the space $\mathcal{P}(\mathcal{X})$ of Borel probability measures over the path space, and we prove that it metrises the weak-star topology on $\mathcal{P}(\mathcal{X})$. Throughout, $\stackrel{\text{d}}{\rightarrow}$ denotes convergence in law.

Theorem 3.8 (Informal, convergence in law).

Let $\{\mathbf{X}_{n}\}_{n\in\mathbb{N}}$ and $\mathbf{X}$ be $\mathcal{X}$-valued random variables with measures supported in a compact subset of $\mathcal{X}$. Then $\widetilde{\rm PCFD}(\mathbf{X}_{n},\mathbf{X})\rightarrow 0\iff\mathbf{X}_{n}\stackrel{\text{d}}{\rightarrow}\mathbf{X}$.

The formal statement and proof can be found in Lemma B.2 and Theorem B.8 in the Appendix.

Similar to [40] for $\mathbb{R}^{d}$, we prove that PCFD can be interpreted as an MMD with a specific kernel $\kappa$ (see Appendix B.3). Example B.12 illustrates that PCFD has superior test power for hypothesis testing on stochastic processes compared with the CF distance on the flattened time series.

3.3 Computing PCFD under empirical measures

Now, we shall illustrate how to compute the PCFD on the path space.

Let $\bar{\mathbf{X}}:=\{\mathbf{x}_{i}\}_{i=1}^{n}$ and $\bar{\mathbf{Y}}:=\{\mathbf{y}_{i}\}_{i=1}^{n'}$ be i.i.d. samples drawn respectively from the $\mathcal{X}$-valued random variables $\mathbf{X}$ and $\mathbf{Y}$. First, for any linear map $M\in\mathfrak{u}(m)^{d}$, the empirical estimator of $\mathbf{\Phi}_{\mathbf{X}}(M)$ is the average of the unitary features of all observations in $\bar{\mathbf{X}}$, i.e., $\mathbf{\Phi}_{\bar{\mathbf{X}}}(M)=\frac{1}{n}\sum_{i=1}^{n}\mathcal{U}_{M}(\mathbf{x}_{i})$. We then parameterise the $\mathfrak{u}(m)^{d}$-valued random variable $\mathcal{M}$ via the empirical measure $\mathcal{M}_{\theta_{M}}$, i.e., $\mathcal{M}_{\theta_{M}}=\sum_{i=1}^{k}\delta_{M_{i}}$, where $\theta_{M}:=\left\{M_{i}\right\}_{i=1}^{k}\in\mathfrak{u}(m)^{d\times k}$ are the trainable model parameters. Finally, define the corresponding empirical path characteristic function distance (EPCFD) as

$${\rm EPCFD}_{\theta_{M}}\left(\bar{\mathbf{X}},\bar{\mathbf{Y}}\right)=\sqrt{\frac{1}{k}\sum_{i=1}^{k}\left\lVert\mathbf{\Phi}_{\bar{\mathbf{X}}}(M_{i})-\mathbf{\Phi}_{\bar{\mathbf{Y}}}(M_{i})\right\rVert_{\rm HS}^{2}}. \tag{6}$$
Figure 1: Flowchart of calculating the PCF $\mathbf{\Phi}_{\mathbf{X}}(M_{\theta})$.
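The computation of the EPCFD in (6) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the unitary feature of a piecewise-linear path is assumed to be the time-ordered product of the matrix exponentials of its increments (cf. §2.2), paths are stored as arrays of shape `(n, n_T, d)`, and the helper names `random_anti_hermitian`, `expm_anti_hermitian`, `unitary_feature` and `epcfd` are hypothetical.

```python
import numpy as np


def random_anti_hermitian(k, d, m, rng):
    """Draw k elements of u(m)^d as an array of shape (k, d, m, m) with A^H = -A."""
    a = rng.normal(size=(k, d, m, m)) + 1j * rng.normal(size=(k, d, m, m))
    return 0.5 * (a - np.conj(np.swapaxes(a, -1, -2)))


def expm_anti_hermitian(a):
    """Matrix exponential of anti-Hermitian matrices (batched over leading axes).

    If A is anti-Hermitian, then H = -iA is Hermitian, so with H = V diag(w) V^H
    we get exp(A) = V diag(exp(i w)) V^H, which is unitary.
    """
    w, v = np.linalg.eigh(-1j * a)
    return (v * np.exp(1j * w)[..., None, :]) @ np.conj(np.swapaxes(v, -1, -2))


def unitary_feature(paths, M):
    """Assumed unitary feature: ordered product of exp(sum_j dx_j M_j) over time steps.

    paths: real array (n, n_T, d); M: complex array (d, m, m). Returns (n, m, m).
    """
    increments = np.diff(paths, axis=1)                      # (n, n_T - 1, d)
    generators = np.einsum("ntd,dab->ntab", increments, M)   # (n, n_T - 1, m, m)
    steps = expm_anti_hermitian(generators)
    m = M.shape[-1]
    out = np.broadcast_to(np.eye(m, dtype=complex), (paths.shape[0], m, m)).copy()
    for t in range(steps.shape[1]):
        out = out @ steps[:, t]
    return out


def epcfd(x, y, theta_M):
    """Empirical PCF distance (6); theta_M has shape (k, d, m, m)."""
    squared = []
    for M in theta_M:
        diff = unitary_feature(x, M).mean(axis=0) - unitary_feature(y, M).mean(axis=0)
        squared.append(np.sum(np.abs(diff) ** 2))  # squared Hilbert-Schmidt norm
    return float(np.sqrt(np.mean(squared)))


rng = np.random.default_rng(0)
theta_M = random_anti_hermitian(k=4, d=2, m=3, rng=rng)
x = 0.1 * np.cumsum(rng.normal(size=(32, 10, 2)), axis=1)  # toy random-walk paths
y = 0.3 * np.cumsum(rng.normal(size=(32, 10, 2)), axis=1)
distance_xy = epcfd(x, y, theta_M)
distance_xx = epcfd(x, x, theta_M)
```

Here `theta_M` is drawn at random purely for illustration; in the model it is a trainable parameter. The eigendecomposition route is one convenient way to exponentiate anti-Hermitian matrices in plain NumPy and is a design choice of this sketch only.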

Our approach to approximating $\mathcal{M}$ via the empirical distribution differs from that in [25], where $\mathcal{M}$ is parameterised by a mixture of Gaussian distributions. In §4.1 and §5, it is shown that, by optimising the empirical distribution, a moderately sized $k$ is sufficient for achieving superior performance, in contrast to the larger sample size required by [25].

4 PCF-GAN for time series generation

4.1 Training of the EPCFD

In this subsection, we use the EPCFD as the discriminator in GAN training for time series generation. We train the generator to minimise the EPCFD between the true and synthetic data distributions, whereas the empirical distribution of $\mathcal{M}$, characterised by $\theta_{M}\in\mathfrak{u}(m)^{d\times k}$, is optimised by maximising the EPCFD.

By an abuse of notation, let $\mathcal{X}:=\mathbb{R}^{d\times n_{T}}$ ($\mathcal{Z}:=\mathbb{R}^{e\times n_{T}}$, resp.) denote the data (noise, resp.) space, consisting of $\mathbb{R}^{d}$-valued ($\mathbb{R}^{e}$-valued, resp.) time series of length $n_{T}$. As discussed in §2.2, $\mathcal{X}$ and $\mathcal{Z}$ can be viewed as path spaces via linear interpolation. As in standard GANs, our model comprises a generator $G_{\theta_{g}}:\mathcal{Z}\rightarrow\mathbb{R}^{d\times n_{T}}$ and the discriminator ${\rm EPCFD}_{\theta_{M}}:\mathcal{P}(\mathcal{X})\times\mathcal{P}(\mathcal{X})\rightarrow\mathbb{R}^{+}$, where $\theta_{M}\in\mathfrak{u}(m)^{d\times k}$ is the model parameter of the discriminator, which fully characterises the empirical measure of $\mathcal{M}$. The pre-specified noise random variable $\mathbf{Z}=(Z_{t_{i}})_{i=0}^{n_{T}-1}$ is the discretised Brownian motion on $[0,1]$ with time mesh $\frac{1}{n_{T}}$. The induced distribution of the fake data is given by $G_{\theta_{g}}(\mathbf{Z})$. Hence, the min-max objective of our basic version of PCF-GAN is

$$\min_{\theta_{g}}\max_{\theta_{M}}\ {\rm EPCFD}_{\theta_{M}}\left(G_{\theta_{g}}(\mathbf{Z}),\mathbf{X}\right).$$
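The noise input $\mathbf{Z}$ described above, a discretised Brownian motion on $[0,1]$ with time mesh $1/n_T$, can be sampled along the following lines. This is a hedged sketch: the function name `sample_brownian_noise`, the array layout `(batch, n_T, e)` and the convention that every path starts at zero are assumptions of the example, not details confirmed by the text.

```python
import numpy as np


def sample_brownian_noise(batch, n_T, e, rng):
    """Sample discretised e-dimensional Brownian paths at n_T time points.

    Assumptions of this sketch: the mesh is dt = 1 / n_T, each path starts at 0,
    and the returned array has shape (batch, n_T, e).
    """
    dt = 1.0 / n_T
    # Independent Gaussian increments with variance dt.
    increments = rng.normal(scale=np.sqrt(dt), size=(batch, n_T - 1, e))
    paths = np.zeros((batch, n_T, e))
    paths[:, 1:] = np.cumsum(increments, axis=1)
    return paths


rng = np.random.default_rng(0)
Z = sample_brownian_noise(batch=8, n_T=64, e=3, rng=rng)
```

A batch such as `Z` would then be fed to the generator $G_{\theta_g}$ at each training step.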

We apply mini-batch gradient descent to optimise the model parameters of the generator and discriminator in an alternating manner. In particular, to compute gradients of the discriminator parameter $\theta_{M}$, we use the efficient backpropagation-through-time algorithm introduced in [26], which effectively leverages the Lie group-valued outputs and the recurrence structure of the unitary feature. The initialisation of $\theta_{M}$ for the optimisation is outlined in Section B.4.1.

Learning time-dependent Ornstein–Uhlenbeck process

Following [19], we apply the proposed PCF-GAN to the toy example of learning the distribution of synthetic time series data simulated via the time-dependent Ornstein–Uhlenbeck (OU) process. Let $(\mathbf{X}_{t})_{t\in[0,T]}$ be an $\mathbb{R}$-valued stochastic process described by the SDE $d\mathbf{X}_{t}=\left(\mu t-\theta\mathbf{X}_{t}\right)dt+\sigma\,d\mathbf{B}_{t}$ with $\mathbf{X}_{0}\sim\mathcal{N}(0,1)$, where $(\mathbf{B}_{t})_{t\in[0,T]}$ is a 1D Brownian motion and $\mathcal{N}(0,1)$ is the standard normal distribution. We set $\mu=0.01$, $\theta=0.02$, $\sigma=0.4$ and time discretisation $\delta t=0.1$. We generate 10000 samples from $t=0$ to $t=63$, down-sampled at each integer time point. Figure 2 shows that the synthetic data generated by our GAN model, which uses the EPCFD discriminator, are visually indistinguishable from the true data. Also, our model accurately captures the marginal distribution at various time points.

Figure 2: (a) Left: Sample paths generated from the time-dependent OU process and synthetic paths from PCF-GAN. (b) Right: The marginal distribution comparison at $t\in\{10,20,30,40,50,60\}$.
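For readers who wish to reproduce the training data of this toy example, the time-dependent OU process above can be simulated roughly as follows. The text does not state which numerical scheme was used, so the Euler–Maruyama discretisation here is an assumption, and the function name `simulate_time_dependent_ou` is hypothetical; the parameter defaults are those quoted in the text.

```python
import numpy as np


def simulate_time_dependent_ou(n_samples, mu=0.01, theta=0.02, sigma=0.4,
                               dt=0.1, t_end=63.0, rng=None):
    """Euler-Maruyama sketch of dX_t = (mu * t - theta * X_t) dt + sigma dB_t.

    Returns an array of shape (n_samples, t_end + 1) holding each path
    down-sampled at the integer time points t = 0, 1, ..., t_end.
    """
    rng = np.random.default_rng() if rng is None else rng
    steps_per_unit = int(round(1.0 / dt))
    n_steps = int(round(t_end / dt))
    x = rng.normal(size=n_samples)          # X_0 ~ N(0, 1)
    kept = [x.copy()]
    for step in range(n_steps):
        t = step * dt
        noise = rng.normal(scale=np.sqrt(dt), size=n_samples)
        x = x + (mu * t - theta * x) * dt + sigma * noise
        if (step + 1) % steps_per_unit == 0:  # keep integer time points only
            kept.append(x.copy())
    return np.stack(kept, axis=1)


paths = simulate_time_dependent_ou(n_samples=1000, rng=np.random.default_rng(0))
```

With the default settings each path has 64 retained points; the text uses 10000 samples, and 1000 are drawn here only to keep the example light.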

4.2 PCF-GAN: learning with PCFD and sequential embedding

To learn the distribution of high-dimensional or complex time series effectively, using the EPCFD loss alone as the GAN discriminator is not the best approach, owing to the computational limitations imposed by the sample size $k$ and the order $m$ of the EPCFD. To overcome this issue, we adopt the approach of [41, 25] and train a generator that matches the distribution of the embedding of time series via the auto-encoder structure. Figure 3 illustrates the mechanics of our model.

To proceed, let us first recall the generator $G_{\theta_{g}}:\mathcal{Z}\rightarrow\mathcal{X}$ and introduce the embedding layer $F_{\theta_{f}}$, which maps $\mathcal{X}$ to $\mathcal{Z}$ (the noise space). Here $\theta_{f}$ denotes the model parameters of the embedding layer, which will be learned from data. It is then natural to optimize the model parameters $\theta_{g}$ of the generator by minimising the generative loss $L_{\text{generator}}$, which is the EPCFD between the embeddings of the true distribution $\mathbf{X}$ and the synthetic distribution $G_{\theta_{g}}(\mathbf{Z})$; in formula,

$$L_{\text{generator}}(\theta_{g},\theta_{M},\theta_{f})={\rm EPCFD}_{\theta_{M}}\left(F_{\theta_{f}}(G_{\theta_{g}}(\mathbf{Z})),F_{\theta_{f}}(\mathbf{X})\right). \tag{7}$$
Figure 3: Visualization of the PCF-GAN architecture

Encoder ($F_{\theta_{f}}$)-decoder ($G_{\theta_{g}}$) structure: The motivation for the auto-encoder structure is the observation that the embedding might degenerate when optimizing $L_{\text{generator}}$. For example, regardless of whether the true and synthetic distributions agree, $F_{\theta_{f}}$ could simply be a constant function and thereby achieve the perfect generator loss 0. Such degeneracy is ruled out if $F_{\theta_{f}}$ is injective. In heuristic terms, a "good" embedding should capture essential information about the real time series $\mathbf{X}$ and allow the reconstruction of $\mathbf{X}$ from its embedding $F_{\theta_{f}}(\mathbf{X})$. This motivates us to train the embedding $F_{\theta_{f}}$ such that $F_{\theta_{f}}\circ G_{\theta_{g}}$ is close to the identity map. If this condition is satisfied, it implies that $F_{\theta_{f}}$ and $G_{\theta_{g}}$ are pseudo-inverses of each other, thereby ensuring the desired injectivity. In this way, $F_{\theta_{f}}$ and $G_{\theta_{g}}$ serve as the encoder and decoder of raw data, respectively.

To impose the injectivity of $F_{\theta_{f}}$, we consider two additional loss functions for training $\theta_{f}$ as follows:

Reconstruction loss $L_{\text{recovery}}$: It is defined as the $l^{2}$ samplewise distance between the original noise and the noise reconstructed by $F_{\theta_{f}}\circ G_{\theta_{g}}$, i.e., $L_{\text{recovery}}=\mathbb{E}\left[|\mathbf{Z}-F_{\theta_{f}}(G_{\theta_{g}}(\mathbf{Z}))|^{2}\right]$. Note that $L_{\text{recovery}}=0$ implies that $F_{\theta_{f}}(G_{\theta_{g}}(\mathbf{z}))=\mathbf{z}$ for almost every sample $\mathbf{z}$ in the support of $\mathbf{Z}$.

Regularization loss $L_{\text{regularization}}$: It is proposed to match the distribution of the original noise variable $\mathbf{Z}$ with that of the embedding of the true distribution $\mathbf{X}$. It is motivated by the observation that if the generator is perfect, i.e., $G_{\theta_{g}}(\mathbf{Z})=\mathbf{X}$, and $F_{\theta_{f}}\circ G_{\theta_{g}}$ is the identity map, then $\mathbf{Z}=F_{\theta_{f}}(\mathbf{X})$. Specifically,

$$L_{\text{regularization}}={\rm EPCFD}_{\theta_{M}'}\left(\mathbf{Z},F_{\theta_{f}}(\mathbf{X})\right), \tag{8}$$

where we distinguish $\theta_{M}'$ from $\theta_{M}$ in $L_{\text{generator}}$. The regularization loss effectively stabilises the training and resolves the mode collapse [41] caused by the lack of injectivity of the embedding.

Training the embedding parameters $\theta_{f}$: The embedding layer $F_{\theta_{f}}$ aims not only to discriminate the real and fake data distributions as a critic, but also to preserve injectivity. Hence we optimise the embedding parameter $\theta_{f}$ by the following hybrid loss function:

$$\max_{\theta_{f}}\left(L_{\text{generator}}-\lambda_{1}L_{\text{recovery}}-\lambda_{2}L_{\text{regularization}}\right), \tag{9}$$

where $\lambda_{1}$ and $\lambda_{2}$ are hyper-parameters that balance the three losses.
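To make the roles of the reconstruction loss and the hybrid objective (9) concrete, the following NumPy sketch estimates $L_{\text{recovery}}$ on a batch and combines three scalar loss values as in (9). It is an illustration only: `recovery_loss` and `encoder_objective` are hypothetical names, and the toy maps `G` and `F` (a rotation and its inverse) merely stand in for the neural networks $G_{\theta_g}$ and $F_{\theta_f}$.

```python
import numpy as np


def recovery_loss(z, z_reconstructed):
    """Monte Carlo estimate of E[|Z - F(G(Z))|^2] over a batch.

    Both arrays have shape (batch, n_T, e); the squared l2 distance is
    taken per sample and then averaged over the batch.
    """
    squared_distance = np.sum((z - z_reconstructed) ** 2, axis=(1, 2))
    return float(np.mean(squared_distance))


def encoder_objective(l_generator, l_recovery, l_regularization, lambda_1, lambda_2):
    """Quantity maximised over theta_f in (9), given the three loss values."""
    return l_generator - lambda_1 * l_recovery - lambda_2 * l_regularization


rng = np.random.default_rng(0)
# Toy stand-ins (assumptions): G rotates each time step, F undoes the rotation.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
G = lambda z: z @ Q
F = lambda x: x @ Q.T

z = rng.normal(size=(16, 10, 3))
loss_inverse = recovery_loss(z, F(G(z)))          # F inverts G, so this is ~0
loss_constant = recovery_loss(z, np.zeros_like(z))  # a constant embedding is penalised
objective = encoder_objective(1.0, 0.5, 0.25, lambda_1=2.0, lambda_2=4.0)
```

The constant-embedding case shows why the recovery term matters: an embedding that collapses every input to the same point incurs a strictly positive penalty, whereas an embedding that inverts the generator incurs essentially none.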

Training the EPCFD parameters $(\theta_{M},\theta_{M}')$: Note that $L_{\text{generator}}$ and $L_{\text{regularization}}$ have trainable parameters of EPCFD, i.e., $\theta_{M}$ and $\theta_{M}'$. Similar to the basic PCF-GAN, we optimize $\theta_{M}$ and $\theta_{M}'$ by maximising the EPCFD to improve the discriminative power:

$$\max_{\theta_{M}}L_{\text{generator}},\quad\max_{\theta_{M}'}L_{\text{regularization}}. \tag{10}$$

By doing so, we enhance the discriminative power of ${\rm EPCFD}_{\theta_{M}}$ and ${\rm EPCFD}_{\theta_{M}'}$. Consequently, this facilitates the training of the generator such that the embedding of the true data aligns with both the noise distribution and the reconstructed noise distribution.

Differentiability of EPCFD with respect to the parameters of the embedding layer and the generator is guaranteed by Theorem 3.6, as long as $F_{\theta_{f}}\circ G_{\theta_{g}}$ satisfies the Lipschitz condition thereof. Let us also stress two key advantages of our proposed PCF-GAN. First, it can generate synthetic time series with reconstruction functionality, thanks to the auto-encoder structure in PCF-GAN. Second, by virtue of the uniform boundedness of PCFD shown in Lemma 3.5, our PCF-GAN does not require any additional gradient constraints on the embedding layer and EPCFD parameters, in contrast to other MMD-based GANs and Wasserstein-GAN. This improves training efficiency and alleviates the vanishing gradient problem in training sequential networks like RNNs.

We provide the pseudo-code for the proposed PCF-GAN in Algorithm 1.

Algorithm 1 PCF-GAN.
1: Input: $\mathbb{P}_d$ (real time series distribution), $\mathbb{P}_z$ (noise distribution), $\theta_M, \theta_M', \theta_f, \theta_g$ (model parameters for EPCFD, critic $F$ and generator $G$), $\lambda_1, \lambda_2 \in \mathbb{R}^+$ (penalty weights), $b$ (batch size), $\eta \in \mathbb{R}$ (learning rate), $n_c$ (number of discriminator iterations per generator update).
2: while $\theta_M, \theta_M', \theta_f, \theta_g$ have not converged do
3:     for $i \in \{1, \dots, n_c\}$ do
4:         # train the unitary linear maps in EPCFD
5:         Sample from distributions: $X \sim \mathbb{P}_d$, $Z \sim \mathbb{P}_z$.
6:         Generator Loss: $L_{\text{generator}} = \text{EPCFD}_{\theta_M}(F_{\theta_f}(X), F_{\theta_f}(G_{\theta_g}(Z)))$
7:         Update: $\theta_M \leftarrow \theta_M + \eta \cdot \nabla_{\theta_M} L_{\text{generator}}$
8:         Regularization Loss: $L_{\text{regularization}} = \text{EPCFD}_{\theta_M'}(Z, F_{\theta_f}(X))$
9:         Update: $\theta_M' \leftarrow \theta_M' + \eta \cdot \nabla_{\theta_M'} L_{\text{regularization}}$
10:         # train the embedding
11:         Reconstruction Loss: $L_{\text{recovery}} = \mathbb{E}[|Z - F_{\theta_f}(G_{\theta_g}(Z))|^2]$
12:         Loss on critic: $L_c = L_{\text{generator}} - \lambda_1 \cdot L_{\text{recovery}} - \lambda_2 \cdot L_{\text{regularization}}$
13:         Update: $\theta_f \leftarrow \theta_f + \eta \cdot \nabla_{\theta_f} L_c$
14:     end for
15:     # train the generator
16:     Sample from distributions: $X \sim \mathbb{P}_d$, $Z \sim \mathbb{P}_z$.
17:     Generator Loss: $L_{\text{generator}} = \text{EPCFD}_{\theta_M}(F_{\theta_f}(X), F_{\theta_f}(G_{\theta_g}(Z)))$
18:     Update: $\theta_g \leftarrow \theta_g - \eta \cdot \nabla_{\theta_g} L_{\text{generator}}$
19: end while
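
To make the update order and sign conventions of Algorithm 1 concrete, the following is a minimal toy sketch and not the authors' implementation. It rests on several assumptions made purely for illustration: samples are scalars rather than paths, the critic is $F(x) = f \cdot x$, the generator is $G(z) = g + z$, EPCFD is replaced by a one-frequency classical characteristic function distance, and gradients are taken by finite differences. The names `cf_distance`, `losses`, `grad` and `train_round` are hypothetical.

```python
import cmath
import random


def cf_distance(freq, xs, ys):
    """|E[exp(i*freq*x)] - E[exp(i*freq*y)]| for scalar samples (toy stand-in for EPCFD)."""
    phi_x = sum(cmath.exp(1j * freq * x) for x in xs) / len(xs)
    phi_y = sum(cmath.exp(1j * freq * y) for y in ys) / len(ys)
    return abs(phi_x - phi_y)


def losses(theta, xs, zs):
    """Generator, regularization and recovery losses for the toy maps F and G."""
    embed = lambda v: theta["f"] * v       # assumed critic / embedding F
    fake = [theta["g"] + z for z in zs]    # assumed generator G
    l_gen = cf_distance(theta["M"], [embed(x) for x in xs], [embed(y) for y in fake])
    l_reg = cf_distance(theta["M_prime"], zs, [embed(x) for x in xs])
    l_rec = sum((z - embed(y)) ** 2 for z, y in zip(zs, fake)) / len(zs)
    return l_gen, l_reg, l_rec


def grad(objective, theta, name, eps=1e-4):
    """Central finite-difference derivative of objective(theta) w.r.t. theta[name]."""
    up, down = dict(theta), dict(theta)
    up[name] += eps
    down[name] -= eps
    return (objective(up) - objective(down)) / (2 * eps)


def train_round(theta, sample, eta, n_c, lam1, lam2):
    """One pass of the outer loop: n_c discriminator steps, then one generator step."""
    for _ in range(n_c):
        xs, zs = sample()
        # Gradient ascent on the distance parameters (steps 6-9).
        theta["M"] += eta * grad(lambda t: losses(t, xs, zs)[0], theta, "M")
        theta["M_prime"] += eta * grad(lambda t: losses(t, xs, zs)[1], theta, "M_prime")

        # Gradient ascent on the critic objective L_c (steps 11-13).
        def critic_objective(t):
            l_gen, l_reg, l_rec = losses(t, xs, zs)
            return l_gen - lam1 * l_rec - lam2 * l_reg

        theta["f"] += eta * grad(critic_objective, theta, "f")
    xs, zs = sample()
    # Gradient descent on the generator loss (steps 16-18).
    theta["g"] -= eta * grad(lambda t: losses(t, xs, zs)[0], theta, "g")
    return theta


rng = random.Random(0)


def sample(batch=64):
    """Toy data: 'real' samples from N(2, 1), noise from N(0, 1)."""
    xs = [rng.gauss(2.0, 1.0) for _ in range(batch)]
    zs = [rng.gauss(0.0, 1.0) for _ in range(batch)]
    return xs, zs


theta = {"M": 0.5, "M_prime": 0.5, "f": 1.0, "g": 0.0}
for _ in range(10):
    theta = train_round(theta, sample, eta=0.02, n_c=2, lam1=1.0, lam2=1.0)
```

The point of the sketch is only the schedule: the distance parameters and the critic are updated by ascent inside the inner loop, and the generator by descent once per outer iteration. The stand-in distance is bounded by 2 for any frequency, which loosely mirrors the boundedness property invoked above, but it says nothing about the behaviour of the actual EPCFD on path-valued data.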

5 Numerical Experiments

To validate its efficacy, we apply our proposed PCF-GAN to a broad range of time series data and benchmark with state-of-the-art GANs for time series generation using various test metrics. Full details on numerics (dataset, evaluation metrics, and hyperparameter choices) are in Appendix C. Additional ablation studies and visualisations of generated samples are reported in Appendix D.

Baselines: We take Recurrent GAN (RGAN) [12], TimeGAN [45], and COT-GAN [43] as benchmarking models. These are representative GANs with strong empirical performance for time series generation. For fairness, we compare our model to the baselines while fixing the generators and embedding/discriminator to be a common sequential neural network (2 layers of LSTMs).

Dataset: We benchmark our model on four different time series datasets with various characteristics: dimensions, sample frequency, periodicity, noise level, and correlation. (1) Rough Volatility: High-frequency synthetic time series data with a low noise-to-signal ratio. (2) Stock: The daily historical data on ten publicly traded stocks from 2013 to 2021, including as features the volume and high, low, opening, closing, and adjusted closing prices. (3) Beijing Air Quality [47]: A UCI multivariate time series of hourly air pollutant data from different monitoring sites. (4) EEG Eye State [38]: A UCI dataset of high-frequency, continuous EEG eye measurements. We summarise the key statistics of the datasets in Table 1.

Table 1: Summary statistics for the four datasets

| Dataset | Dimension | Length | Sample rate | Auto-cor (lag 1) | Auto-cor (lag 5) | Cross-cor |
|---|---|---|---|---|---|---|
| RV | 2 | 200 | - | 0.967 | 0.916 | -0.014 |
| Stock | 5 | 20 | 1 day | 0.958 | 0.922 | 0.604 |
| Air | 10 | 24 | 1 hour | 0.947 | 0.752 | 0.0487 |
| EEG | 14 | 20 | 8 ms | 0.517 | 0.457 | 0.418 |

Evaluation metrics: The following three metrics are used to assess the quality of generative models. For time series generation/reconstruction, we compare the true distribution with the fake distribution and with the distribution reconstructed by $G_{\theta_g} \circ F_{\theta_f}$ via the test metrics below. (1) Discriminative score [45]: We train a post-hoc classifier to distinguish true and fake data and report the classification error on the test data. A better generative model yields a lower score, meaning that the classifier struggles to differentiate between true and fake data. (2) Predictive score [45, 12]: We train a post-hoc sequence-to-sequence regression model on the generated data to predict the latter part of a time series given the first part. We then evaluate and report the mean square error (MSE) on the true time series data. A lower MSE indicates that the generated data are better suited to training a predictive model. (3) Sig-MMD [9, 42]: We use MMD with the signature feature as a generic metric on time series distributions. Smaller values are better, as they indicate closer distributions. To compute the three evaluation metrics, we randomly generate 10,000 samples from the true and the synthetic (respectively, reconstructed) distributions. We report the mean and standard deviation of each metric over 10 repeated random samplings.
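
As a rough illustration of metric (3), the sketch below computes an MMD between two samples of piecewise-linear paths using signature features. It is an assumption-laden simplification, not the evaluation code used here: the signature is truncated at depth 2, the kernel is the plain inner product on these truncated features (so the MMD is the Euclidean distance between mean feature vectors), and the function names `signature_depth2`, `mean_features` and `sig_mmd` are hypothetical. The kernel and truncation actually used may differ.

```python
import math


def signature_depth2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path (sequence of d-dim points)."""
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for prev, cur in zip(path, path[1:]):
        delta = [c - p for p, c in zip(prev, cur)]
        # Chen's identity for appending one linear segment:
        # S2 <- S2 + S1 (x) delta + 0.5 * delta (x) delta, using S1 before its update.
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            s1[i] += delta[i]
    return s1 + [v for row in s2 for v in row]


def mean_features(paths):
    """Average truncated signature over a sample of paths."""
    feats = [signature_depth2(p) for p in paths]
    return [sum(col) / len(feats) for col in zip(*feats)]


def sig_mmd(paths_a, paths_b):
    """Biased MMD estimate with a linear kernel on depth-2 signature features."""
    mu_a, mu_b = mean_features(paths_a), mean_features(paths_b)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mu_a, mu_b)))


# A straight segment and the same segment with an extra midpoint have equal signatures.
straight = [[(0.0, 0.0), (1.0, 2.0)]]
refined = [[(0.0, 0.0), (0.5, 1.0), (1.0, 2.0)]]
corner = [[(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]]
same = sig_mmd(straight, refined)
different = sig_mmd(straight, corner)
```

The usage lines show that inserting an extra sampling point on a straight segment leaves the features unchanged, whereas a path with the same endpoints but a different shape is separated by the level-2 terms.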

5.1 Time series generation

Table 2 indicates that PCF-GAN consistently outperforms the other baselines across all datasets, as demonstrated by all three test metrics. Specifically, in terms of the discriminative score, PCF-GAN achieves a remarkable performance with values of 0.0108 and 0.0784 on the Rough Volatility and Stock datasets, respectively. These values are 61% and 39% lower than those achieved by the second-best model. Regarding the predictive score, PCF-GAN achieves the best result across all four datasets. While COT-GAN surpasses PCF-GAN in terms of the Sig-MMD metric on the EEG dataset, PCF-GAN consistently outperforms the other models on the remaining three datasets. Additionally, to assess the fitting on auto-correlation, cross-correlation and marginal distribution, we include the corresponding numerical results in Table 4 in Appendix D.4. For a qualitative analysis of generative quality, we provide the visualizations of generated samples for all models and datasets in Appendix D without selective bias. Furthermore, to showcase the effectiveness of our auto-encoder architecture for the generation task, we present an ablation study in Appendix D.

Table 2: Performance comparison of PCF-GAN and baselines. Best for each task shown in bold. The first four model columns report the generation task; the last two, marked (R), report the reconstruction task.

| Dataset | Test metric | RGAN | COT-GAN | TimeGAN | PCF-GAN | TimeGAN (R) | PCF-GAN (R) |
|---|---|---|---|---|---|---|---|
| RV | Discriminative | .0271±.048 | .0499±.068 | .0327±.019 | **.0108±.006** | .5000±.000 | **.2820±.082** |
| RV | Predictive | .0393±.000 | .0395±.000 | .0395±.001 | **.0390±.000** | .0590±.003 | **.0398±.001** |
| RV | Sig-MMD | .0163±.004 | .0116±.003 | .0027±.004 | **.0024±.001** | 3.308±1.34 | **.0960±.050** |
| Stock | Discriminative | .1283±.015 | .4966±.002 | .3286±.063 | **.0784±.028** | .4943±.002 | **.3181±.038** |
| Stock | Predictive | .0132±.000 | .0144±.000 | .0139±.000 | **.0125±.000** | .1180±.012 | **.0127±.000** |
| Stock | Sig-MMD | .0248±.008 | .0029±.000 | .0272±.006 | **.0017±.000** | .7587±.186 | **.0078±.004** |
| Air | Discriminative | .4549±.012 | .4992±.002 | .3460±.025 | **.2326±.058** | .4999±.000 | **.4140±.013** |
| Air | Predictive | .0261±.001 | .0260±.001 | .0256±.000 | **.0237±.000** | .0619±.004 | **.0289±.000** |
| Air | Sig-MMD | .0456±.015 | .0128±.002 | .0146±.026 | **.0126±.005** | .4141±.078 | **.0359±.012** |
| EEG | Discriminative | .4908±.003 | .4931±.007 | .4771±.008 | **.3660±.025** | .5000±.000 | **.4959±.003** |
| EEG | Predictive | .0315±.000 | .0304±.000 | .0342±.001 | **.0246±.000** | .0499±.001 | **.0328±.001** |
| EEG | Sig-MMD | .0602±.010 | **.0102±.002** | .0640±.025 | .0180±.004 | .0700±.021 | **.0641±.019** |

5.2 Time series reconstruction

As TimeGAN is the only baseline model incorporating reconstruction capability, for reconstruction tasks we only compare with TimeGAN. The reconstructed examples of time series using both PCF-GAN and TimeGAN are shown in Figure 4; see Appendix D for more samples.

Figure 4: Examples of time series reconstruction using PCF-GAN and TimeGAN.

Visually, the PCF-GAN achieves better reconstruction results than TimeGAN by producing more accurate reconstructed time series samples. Notably, the reconstructed samples from PCF-GAN preserve the temporal dependency of original time series for all four datasets, while some reconstructed samples from TimeGAN in EEG and Stock datasets are completely mismatched. This is further quantified in Table 2 on the reconstruction task, where the reconstructed samples from PCF-GAN consistently outperform those from TimeGAN in terms of all test metrics.

5.3 Training stability and efficiency

Figure 5: Training curves for PCF-GAN of real (left) and generated (right) time series distributions on the Rough Volatility dataset at different training iterations. Plotted by a moving average over a window with 500 iterations.

Figure 5 demonstrates the training progress of the PCF-GAN on the RV dataset. Compared to the fluctuating generator loss typically observed in traditional GANs, the PCF-GAN yields better convergence by leveraging the autoencoder structure. This is achieved by minimising reconstruction and regularisation losses, which ensures the injectivity of $F_{\theta_f}$ and enables production of a semantic embedding throughout the training process. The decay of generator loss in the embedding space directly reflects the improvement in the quality of the generated time series. This is particularly useful for debugging and conducting hyperparameter searches. Furthermore, decay in both recovery and regularisation loss signifies the enhanced performance of the autoencoder.

By leveraging the effective critic $F_{\theta_f}$, we achieve enhanced performance with a moderate increase in parameters (ranging from 1200 to 6400) within $\theta_M$ of EPCFD. The training of these additional parameters is highly efficient in PCF-GAN, while still outperforming all baseline models. Specifically, our algorithm is approximately twice as fast as TimeGAN (using three extra critic modules) and three times as fast as COT-GAN (with one additional critic module and the Sinkhorn algorithm). However, it takes 1.5 times as long as RGAN due to the extra training required on $\theta_M$.

6 Conclusion & Broader impact

Conclusion We introduce a novel, principled and efficient PCF-GAN model based on PCF for generating high-fidelity sequential data. With theoretical support, it achieves state-of-the-art generative performance with additional reconstruction functionality in various tasks of time series generation.

Limitation and future work In this work, we use LSTM-based networks for the autoencoder and do not explore other sequential models (e.g., transformers). A suitable choice of network architecture for the autoencoder may further improve the efficacy of the proposed PCF-GAN on more complicated data, e.g., video and skeletal human action sequences, which merits further investigation. As a distance metric on time series, PCFD can be flexibly combined with other advanced generators of time series GAN models, which may further improve performance. For example, one can replace the average cross-entropy loss used in [17, 39] and the Wasserstein distance in [36] by PCFD, with some simple modifications to the discriminators. Furthermore, although we establish the link between PCFD and MMD, it would be interesting to design efficient algorithms to compute the kernel specified in Section B.3.

Broader impact Like other GAN models, this model has the potential to aid data-hungry algorithms by augmenting small datasets. Additionally, it can enable data sharing in domains such as finance and healthcare, where sensitive time series data is plentiful. However, it is important to acknowledge that the generation of synthetic data also carries the risk of potential misuse (e.g. generating fake news).

Acknowledgments and Disclosure of Funding

The research of SL is supported by NSFC (National Natural Science Foundation of China) Grant No. 12201399, and the Shanghai Frontiers Science Center of Modern Analysis. This research project is also supported by SL’s visiting scholarship at New York University-Shanghai. HN is supported by the EPSRC under the program grant EP/S026347/1 and The Alan Turing Institute under the EPSRC grant EP/N510129/1. LH is supported by University College London and the China Scholarship Council under the UCL-CSC scholarship (No. 201908060002). SL and HN are supported by the SJTU-UCL joint seed fund WH610160507/067. HN and HL are supported by the Ecosystem Leadership Award under the EPSRC Grant OobfJ22///100020 and The Alan Turing Institute in part. HN is grateful to Jiajie Tao and Zijiu Lyu for proofreading the paper.

References

  • [1] Abdul Fatir Ansari, Jonathan Scarlett, and Harold Soh. A characteristic function approach to deep implicit generative modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7478–7487, 2020.
  • [2] Martin Arjovsky and Léon Bottou. Towards principled methods for training generative adversarial networks. arXiv preprint arXiv:1701.04862, 2017.
  • [3] Samuel A Assefa, Danial Dervovic, Mahmoud Mahfouz, Robert E Tillman, Prashant Reddy, and Manuela Veloso. Generating synthetic data in finance: opportunities, challenges and pitfalls. In Proceedings of the First ACM International Conference on AI in Finance, pages 1–8, 2020.
  • [4] Steven M Bellovin, Preetam K Dutta, and Nathan Reitinger. Privacy and synthetic datasets. Stan. Tech. L. Rev., 22:1, 2019.
  • [5] Lukas Biewald. Experiment tracking with weights and biases, 2020. Software available from wandb.com.
  • [6] Horatio Boedihardjo and Xi Geng. SL_2(R)-developments and signature asymptotics for planar paths with bounded variation. arXiv preprint arXiv:2009.13082, 2020.
  • [7] Ilya Chevyrev, Terry Lyons, et al. Characteristic functions of measures on geometric rough paths. The Annals of Probability, 44(6):4049–4082, 2016.
  • [8] Ilya Chevyrev, Vidit Nanda, and Harald Oberhauser. Persistence paths and signature features in topological data analysis. IEEE transactions on pattern analysis and machine intelligence, 42(1):192–202, 2018.
  • [9] Ilya Chevyrev and Harald Oberhauser. Signature moments to characterize laws of stochastic processes. Journal of Machine Learning Research, 23(176):1–42, 2022.
  • [10] Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
  • [11] Kacper P Chwialkowski, Aaditya Ramdas, Dino Sejdinovic, and Arthur Gretton. Fast two-sample testing with analytic representations of probability measures. Advances in Neural Information Processing Systems, 28:1981–1989, 2015.
  • [12] Cristóbal Esteban, Stephanie L Hyland, and Gunnar Rätsch. Real-valued (medical) time series generation with recurrent conditional GANs. arXiv preprint arXiv:1706.02633, 2017.
  • [13] Marián Fabian, Petr Habala, Petr Hájek, Vicente Montesinos Santalucía, Jan Pelant, and Václav Zizler. Functional analysis and infinite-dimensional geometry, volume 8 of CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC. Springer-Verlag, New York, 2001.
  • [14] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C Courville. Improved training of Wasserstein GANs. Advances in Neural Information Processing Systems, 30:5769–5779, 2017.
  • [15] Ben Hambly and Terry Lyons. Uniqueness for the signature of a path of bounded variation and the reduced path group. Annals of Mathematics, pages 109–167, 2010.
  • [16] Christopher R Heathcote. The integrated squared error estimation of parameters. Biometrika, 64(2):255–264, 1977.
  • [17] Jinsung Jeon, Jeonghak Kim, Haryong Song, Seunghyeon Cho, and Noseong Park. GT-GAN: General purpose time series synthesis with generative adversarial networks. Advances in Neural Information Processing Systems, 35:36999–37010, 2022.
  • [18] Patrick Kidger, Patric Bonnier, Imanol Perez Arribas, Cristopher Salvi, and Terry Lyons. Deep signature transforms. Advances in Neural Information Processing Systems, 32:3082–3092, 2019.
  • [19] Patrick Kidger, James Foster, Xuechen Li, and Terry J Lyons. Neural SDEs as infinite-dimensional GANs. In International Conference on Machine Learning, pages 5453–5463. PMLR, 2021.
  • [20] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • [21] Achim Klenke. Probability theory: a comprehensive course. Springer Science & Business Media, 2013.
  • [22] Erich Leo Lehmann, Joseph P Romano, and George Casella. Testing statistical hypotheses, volume 3. Springer, 2005.
  • [23] Daniel Levin, Terry Lyons, and Hao Ni. Learning from the past, predicting the statistics for the future, learning an evolving system. arXiv preprint arXiv:1309.0260, 2013.
  • [24] Chun-Liang Li, Wei-Cheng Chang, Yu Cheng, Yiming Yang, and Barnabás Póczos. MMD GAN: Towards deeper understanding of moment matching network. Advances in Neural Information Processing Systems, 30:2200–2210, 2017.
  • [25] Shengxi Li, Zeyang Yu, Min Xiang, and Danilo Mandic. Reciprocal adversarial learning via characteristic functions. Advances in Neural Information Processing Systems, 33:217–228, 2020.
  • [26] Hang Lou, Siran Li, and Hao Ni. Path development network with finite-dimensional Lie group representation. arXiv preprint arXiv:2204.00740, 2022.
  • [27] Terry Lyons. Rough paths, signatures and the modelling of functions on streams. arXiv preprint arXiv:1405.4537, 2014.
  • [28] Terry J Lyons. Differential equations driven by rough signals. Revista Matemática Iberoamericana, 14(2):215–310, 1998.
  • [29] Terry J Lyons, Michael Caruana, and Thierry Lévy. Differential equations driven by rough paths. Springer, 2007.
  • [30] Terry J Lyons and Weijun Xu. Hyperbolic development and inversion of signature. Journal of Functional Analysis, 272(7):2933–2955, 2017.
  • [31] Hao Ni, Lukasz Szpruch, Marc Sabate-Vidales, Baoren Xiao, Magnus Wiese, and Shujian Liao. Sig-Wasserstein GANs for time series generation. In Proceedings of the Second ACM International Conference on AI in Finance, pages 1–8, 2021.
  • [32] Hao Ni, Lukasz Szpruch, Magnus Wiese, Shujian Liao, and Baoren Xiao. Conditional Sig-Wasserstein GANs for time series generation. arXiv preprint arXiv:2006.05421, 2020.
  • [33] Kalyanapuram R. Parthasarathy. Probability measures on metric spaces, volume 3 of Probability and Mathematical Statistics. Academic Press, Inc., New York-London, 1967.
  • [34] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc., 2019.
  • [35] Vibhor Rastogi and Suman Nath. Differentially private aggregation of distributed time-series with transformation and encryption. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pages 735–746, 2010.
  • [36] Carl Remlinger, Joseph Mikael, and Romuald Elie. Conditional loss and deep Euler scheme for time series generation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 8098–8105, 2022.
  • [37] Jinfu Ren, Yang Liu, and Jiming Liu. EWGAN: Entropy-based Wasserstein GAN for imbalanced learning. In Proceedings of the AAAI conference on artificial intelligence, volume 33, pages 10011–10012, 2019.
  • [38] Oliver Roesler, Lucas Bader, Jan Forster, Yoshikatsu Hayashi, Stefan Heßler, and David Suendermann-Oeft. Comparison of EEG devices for eye state classification. Proc. of the AIHLS, 2014.
  • [39] Ali Seyfi, Jean-Francois Rajotte, and Raymond Ng. Generating multivariate time series with COmmon Source CoordInated GAN (COSCI-GAN). Advances in Neural Information Processing Systems, 35:32777–32788, 2022.
  • [40] Bharath K Sriperumbudur, Arthur Gretton, Kenji Fukumizu, Bernhard Schölkopf, and Gert RG Lanckriet. Hilbert space embeddings and metrics on probability measures. The Journal of Machine Learning Research, 11:1517–1561, 2010.
  • [41] Akash Srivastava, Lazar Valkov, Chris Russell, Michael U Gutmann, and Charles Sutton. VEEGAN: Reducing mode collapse in GANs using implicit variational learning. Advances in Neural Information Processing Systems, 30:3310–3320, 2017.
  • [42] Csaba Toth and Harald Oberhauser. Bayesian learning from sequential data using Gaussian processes with signature covariances. In International Conference on Machine Learning, pages 9548–9560. PMLR, 2020.
  • [43] Tianlin Xu, Li Kevin Wenliang, Michael Munn, and Beatrice Acciaio. COT-GAN: Generating sequential data via causal optimal transport. Advances in Neural Information Processing Systems, 33:8798–8809, 2020.
  • [44] Yasin Yazıcı, Chuan-Sheng Foo, Stefan Winkler, Kim-Hui Yap, Georgios Piliouras, and Vijay Chandrasekhar. The unusual effectiveness of averaging in GAN training. In International Conference on Learning Representations, 2019.
  • [45] Jinsung Yoon, Daniel Jarrett, and Mihaela Van der Schaar. Time-series generative adversarial networks. Advances in Neural Information Processing Systems, 32:5509–5519, 2019.
  • [46] Kôsaku Yosida. Functional analysis. Sixth edition, volume 123 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin-New York, 1980.
  • [47] Shuyi Zhang, Bin Guo, Anlan Dong, Jing He, Ziping Xu, and Song Xi Chen. Cautionary tales on air-quality improvement in Beijing. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 473(2205):20170457, 2017.
  • [48] Robert J Zimmer. Essential results of functional analysis. University of Chicago Press, 1990.

In Appendix A, we collect some notation and properties of paths and of the unitary feature of a path. Appendix B gives a thorough introduction to the distance function defined via the path characteristic function, together with detailed proofs of the theoretical results on PCFD. Appendix C discusses experimental details and Appendix D presents supplementary numerical results.

Appendix A Preliminaries

A.1 Paths with bounded variation

Definition A.1.

Let $X:[0,T]\rightarrow\mathbb{R}^d$ be a continuous path. The total variation of $X$ on the interval $[0,T]$ is defined by

$$\mathrm{Tot.Var.}(X) := \sup_{\mathcal{D}\subset[0,T]} \left\{ \sum_{\ell} \left| X_{t_\ell} - X_{t_{\ell-1}} \right| \right\} \tag{11}$$

where the supremum is taken over all finite partitions $\mathcal{D} = \{t_\ell\}_{\ell=0}^{N}$ of $[0,T]$. When $\mathrm{Tot.Var.}(X)$ is finite, we say that $X$ is a path of bounded variation (BV-path) on $[0,T]$ and write $X \in \mathrm{BV}\left([0,T];\mathbb{R}^d\right)$.

BV-paths can be defined without the continuity assumption, but we shall not seek greater generality in this work. It is well known that

$$\|X\|_{\rm BV}:=\|X\|_{C^{0}([0,T])}+{\rm Tot.Var.}(X)$$

defines a norm (the BV-norm). There is a more general notion of paths of finite $p$-variation for $p\geq 1$ (see [29]), where the case $p=1$ corresponds to the BV-paths discussed above. We restrict ourselves to $p=1$, which suffices in practice because sequential data are handled as piecewise linear approximations of continuous paths.
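For a piecewise linear path, the supremum in Eq. (11) is attained on the partition formed by the knots, so the total variation is the sum of the Euclidean lengths of the increments, and the $C^{0}$-norm is attained at a knot. The following is a minimal sketch of this special case; the function names `total_variation` and `bv_norm` are illustrative and do not come from the paper.

```python
import math


def total_variation(points):
    """Total variation of the piecewise linear path through `points`.

    For a piecewise linear path the supremum over partitions is attained
    at the knots, so it is the sum of Euclidean increment lengths.
    """
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def bv_norm(points):
    """BV-norm = sup-norm + total variation (piecewise linear case).

    The Euclidean norm is convex, so its maximum along each linear
    segment is attained at an endpoint, i.e. at a knot.
    """
    sup_norm = max(math.hypot(*p) for p in points)
    return sup_norm + total_variation(points)


path = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
print(total_variation(path))  # 9.0  (segments of length 5 and 4)
print(bv_norm(path))          # 14.0 (sup-norm 5 plus total variation 9)
```

Refining the partition by adding points on a segment does not change the result, which is the discrete counterpart of the supremum in Definition A.1.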

Definition A.2.

(Concatenation of paths) Let $X:[0,s]\rightarrow\mathbb{R}^{d}$ and $Y:[s,t]\rightarrow\mathbb{R}^{d}$ be two continuous paths. Their concatenation, denoted by $X\star Y:[0,t]\rightarrow\mathbb{R}^{d}$, is the path defined by

$$(X\star Y)_{u}=\begin{cases}X_{u},&u\in[0,s],\\ Y_{u}-Y_{s}+X_{s},&u\in[s,t].\end{cases}$$

Definition A.3 (Tree-like equivalence).

A continuous path $X:[0,T]\rightarrow\mathbb{R}^{d}$ is called tree-like if there is an $\mathbb{R}$-tree $\mathcal{T}$, a continuous function $\phi:[0,T]\rightarrow\mathcal{T}$, and a function $\psi:\mathcal{T}\rightarrow\mathbb{R}^{d}$ such that $\phi(0)=\phi(T)$ and $X=\psi\circ\phi$.

Let $\overleftarrow{X}:[0,T]\rightarrow\mathbb{R}^{d}$ denote the time-reversal of a continuous path $X$, namely $\overleftarrow{X}(t)=X(T-t)$. We say that $X$ and $Y$ are in tree-like equivalence (denoted by $X\sim_{\tau}Y$) if $X\star\overleftarrow{Y}$ is tree-like.
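On sampled data, concatenation and time-reversal act on lists of points. The sketch below is a discrete analogue of Definition A.2 and of the time-reversal above; the helper names `concatenate` and `reverse` are hypothetical, and paths are represented by their knots only.

```python
def concatenate(x, y):
    """Discrete analogue of X * Y: translate y so that it starts at the
    endpoint of x, then append it (the shared point is kept once)."""
    shift = [a - b for a, b in zip(x[-1], y[0])]
    shifted = [tuple(c + s for c, s in zip(p, shift)) for p in y]
    return list(x) + shifted[1:]


def reverse(x):
    """Time-reversal: traverse the same points in the opposite order."""
    return list(reversed(x))


X = [(0, 0), (1, 0), (1, 2)]
Y = [(5, 5), (5, 7)]

print(concatenate(X, Y))           # [(0, 0), (1, 0), (1, 2), (1, 4)]
print(concatenate(X, reverse(X)))  # [(0, 0), (1, 0), (1, 2), (1, 0), (0, 0)]
```

The second output retraces `X` back to its starting point, which is the simplest instance of a tree-like path and reflects that every path is tree-like equivalent to itself.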

An important example is when the path $X$ is a time re-parameterisation of $Y$. That is, for $X\in{\rm BV}\left([0,T];\mathbb{R}^{d}\right)$, take a nondecreasing surjection $\lambda:[0,T]\rightarrow[T_{1},T_{2}]$, and take $X(t)=Y(\lambda(t))$.

A.2 Matrix groups and algebras

The unitary group $U(m)$ and the symplectic group $Sp(2m,\mathbb{C})$ are subsets of the spaces of $m\times m$ and $2m\times 2m$ complex matrices, respectively:

$$U(m):=\left\{A\in\mathbb{C}^{m\times m}:\,A^{*}A=AA^{*}=I_{m}\right\},$$
$$Sp(2m,\mathbb{C}):=\left\{A\in\mathbb{C}^{2m\times 2m}:\,A^{*}J_{m}A=J_{m}\right\},$$

where $J_{m}:=\begin{pmatrix}0&I_{m}\\ -I_{m}&0\end{pmatrix}$ and $I_{m}\in\mathbb{C}^{m\times m}$ is the identity. Their corresponding Lie algebras are

$$\mathfrak{u}(m):=\left\{A\in\mathbb{C}^{m\times m}:\,A^{*}+A=0\right\},$$
$$\mathfrak{sp}(2m,\mathbb{C}):=\left\{A\in\mathbb{C}^{2m\times 2m}:\,A^{*}J_{m}+J_{m}A=0\right\}.$$

The unitary group is compact and is a group of isometries of matrix multiplication with respect to the Hilbert–Schmidt norm. Such properties are crucial for establishing theorems and properties related to the path characteristic function (PCF), as discussed in subsequent sections.

The compact symplectic group ${\rm Sp}(m)$ is the simply-connected maximal compact real Lie subgroup of ${\rm Sp}(2m,\mathbb{C})$. It is the real form of ${\rm Sp}(2m,\mathbb{C})$, and satisfies

$${\rm Sp}(m)={\rm Sp}(2m,\mathbb{C})\cap U(2m).$$

Note that $U(m)$ and ${\rm Sp}(m)$ are both real Lie groups, although their elements have complex entries in general.
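As a quick sanity check of the correspondence between $\mathfrak{u}(m)$ and $U(m)$ under the matrix exponential, the two smallest cases can be worked out by hand. The examples below are illustrations chosen here and are not taken from the paper.

```latex
% m = 1: skew-Hermitian 1x1 matrices are purely imaginary numbers,
% and their exponentials sweep out the unit circle.
\mathfrak{u}(1)=\{ i\theta : \theta\in\mathbb{R}\}, \qquad
U(1)=\{ e^{i\theta} : \theta\in\mathbb{R}\}, \qquad
\exp(i\theta)=e^{i\theta}.

% m = 2: a real antisymmetric generator A satisfies A^* + A = 0.
% Since A^2 = -\theta^2 I_2, the exponential series sums to a rotation,
% which is unitary.
A=\begin{pmatrix} 0 & \theta \\ -\theta & 0 \end{pmatrix}\in\mathfrak{u}(2),
\qquad
\exp(A)=\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}\in U(2).
```

In both cases $\exp(A)^{*}\exp(A)=\exp(-A)\exp(A)=I_{m}$, which is the general reason why exponentials of skew-Hermitian matrices are unitary.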

A.3 Unitary feature of a path

Recall Definition 2.1 for the unitary feature, reproduced below:

Definition A.4.

Let $M:\mathbb{R}^{d}\rightarrow\mathfrak{u}(m)$ be a linear map and let $\mathbf{x}\in{\rm BV}\left([0,T];\mathbb{R}^{d}\right)$ be a BV-path. The unitary feature (a.k.a. the path development on the unitary group $U(m)$) of $\mathbf{x}$ under $M$ is the solution to the equation

$${\rm d}\mathbf{y}_{t}=\mathbf{y}_{t}\cdot M({\rm d}\mathbf{x}_{t})\qquad\text{for all }t\in[0,T]\text{ with }\mathbf{y}_{0}=I_{m}.$$

We write $\mathcal{U}_{M}(\mathbf{x}):=\mathbf{y}_{T}$.

Definition 2.1 is motivated by [7, §4]. Consider $M\in\mathcal{L}\left(\mathbb{R}^{d},\mathfrak{u}(\mathcal{H}_{\rm fd})\right)$ with $\mathcal{H}_{\rm fd}$ ranging over all finite-dimensional complex Hilbert spaces. Extend $M$ by naturality to the tensor algebra over $\mathbb{R}^{d}$; that is, define $\widetilde{M}:T\left(\left(\mathbb{R}^{d}\right)\right)\equiv\bigoplus_{k=0}^{\infty}\left(\mathbb{R}^{d}\right)^{\otimes k}\to\mathfrak{u}(\mathcal{H}_{\rm fd})$ by linearity and the following rule:

$$\widetilde{M}(v_{1}\otimes\ldots\otimes v_{k}):=M(v_{1})\ldots M(v_{k})\quad\text{for any }k\in\mathbb{N}\text{ and }v_{1},\ldots,v_{k}\in\mathbb{R}^{d}.$$

Then denote by $\mathcal{A}\left(\mathbb{R}^{d}\right)$ the totality of such $\widetilde{M}$. Any element in $\mathcal{A}\left(\mathbb{R}^{d}\right)$ is a unitary representation of the Lie group $\mathcal{G}\left(\left(\mathbb{R}^{d}\right)\right):=\left\{\text{group-like elements in }T\left(\left(\mathbb{R}^{d}\right)\right)\right\}$. See [7, p.4059].

The following two lemmas are contained in [26].

Lemma A.5.

[Multiplicativity] Let $X\in{\rm BV}\left([0,s],\mathbb{R}^{d}\right)$ and $Y\in{\rm BV}\left([s,t],\mathbb{R}^{d}\right)$. Denote by $X\ast Y$ their concatenation: $(X\ast Y)(v)=X(v)$ for $v\in[0,s]$ and $(X\ast Y)(v)=Y(v)-Y(s)+X(s)$ for $v\in[s,t]$. Then $\mathcal{U}_{M}(X\ast Y)=\mathcal{U}_{M}(X)\cdot\mathcal{U}_{M}(Y)$ for all $M\in\mathcal{L}\left(\mathbb{R}^{d},\mathfrak{u}(m)\right)$.

We shall compute by Lemma A.5 and Example 2.2 the unitary feature of piecewise linear paths.
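The computation for piecewise linear paths can be sketched in a few lines. On a single linear segment with increment $\Delta$, the equation ${\rm d}\mathbf{y}_{t}=\mathbf{y}_{t}\cdot M({\rm d}\mathbf{x}_{t})$ has constant coefficients and its solution is multiplied on the right by $\exp(M(\Delta))$ (we assume this is what Example 2.2 states); Lemma A.5 then gives the unitary feature as the ordered product of these exponentials. The code below is a pure-Python illustration under these assumptions: the generators `GENS`, the helper names, and the truncated-series `expm` are choices made here, not the authors' implementation.

```python
import math


def identity(m):
    return [[1 + 0j if i == j else 0j for j in range(m)] for i in range(m)]


def matmul(a, b):
    return [[sum(a[i][r] * b[r][j] for r in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def dagger(a):
    """Conjugate transpose A^*."""
    return [[a[j][i].conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def expm(a, terms=30):
    """Matrix exponential via scaling-and-squaring of a Taylor series."""
    m = len(a)
    norm = max(sum(abs(x) for x in row) for row in a)
    squarings = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[x / 2 ** squarings for x in row] for row in a]
    result, term = identity(m), identity(m)
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)]
                  for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def apply_map(gens, v):
    """M(v) = sum_k v[k] * gens[k]; skew-Hermitian if every gens[k] is."""
    m = len(gens[0])
    return [[sum(c * g[i][j] for c, g in zip(v, gens)) for j in range(m)]
            for i in range(m)]


def unitary_feature(gens, path):
    """Ordered product of exp(M(increment)) over the linear segments."""
    y = identity(len(gens[0]))
    for p, q in zip(path, path[1:]):
        increment = [b - a for a, b in zip(p, q)]
        y = matmul(y, expm(apply_map(gens, increment)))
    return y


def concatenate(x, y):
    shift = [a - b for a, b in zip(x[-1], y[0])]
    return list(x) + [tuple(c + s for c, s in zip(p, shift)) for p in y[1:]]


# Hypothetical linear map M: R^2 -> u(2), given by two skew-Hermitian
# generators that do not commute.
GENS = [
    [[1j, 0j], [0j, -1j]],
    [[0j, 1 + 0j], [-1 + 0j, 0j]],
]
X = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
Y = [(5.0, 5.0), (4.0, 6.0)]

u_x, u_y = unitary_feature(GENS, X), unitary_feature(GENS, Y)
u_xy = unitary_feature(GENS, concatenate(X, Y))
# Each value printed below is a numerical error and should be close to 0.
print(max_abs_diff(matmul(u_x, dagger(u_x)), identity(2)))  # unitarity
print(max_abs_diff(u_xy, matmul(u_x, u_y)))                 # Lemma A.5
```

Because the generators do not commute, swapping the order of the two segments of `X` changes the result, so the feature records the order of events and not only the total increment.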

Lemma A.6 (Invariance under time-reparametrisation).

Let $X\in{\rm BV}\left([0,T],\mathbb{R}^{d}\right)$ and let $\lambda:t\mapsto\lambda_{t}$ be a non-decreasing $\mathcal{C}^{1}$-diffeomorphism from $[0,T]$ onto $[0,S]$. Define $X_{t}^{\lambda}:=X_{\lambda_{t}}$ for $t\in[0,T]$. Then, for all $M\in\mathcal{L}\left(\mathbb{R}^{d},\mathfrak{u}(m)\right)$ and for every $s,t\in[0,T]$, it holds that $\mathcal{U}_{M}\left(X_{\lambda_{s},\lambda_{t}}\right)=\mathcal{U}_{M}\left(X^{\lambda}_{s,t}\right)$.

A key property of the unitary feature is that it completely determines the law of random paths:

Theorem A.7 (Uniqueness of unitary feature).

For any two paths $\mathbf{X}_{1}\neq\mathbf{X}_{2}$ in $\mathcal{X}$, there exists an $M\in\mathcal{L}\left(\mathbb{R}^{d},\mathfrak{u}(m)\right)$ with some $m\in\mathbb{N}$ such that $\mathcal{U}_{M}(\mathbf{X}_{1})\neq\mathcal{U}_{M}(\mathbf{X}_{2})$.

Proof.

For $\mathbf{X}_{1}\neq\mathbf{X}_{2}$ in $\mathcal{X}$, by uniqueness of signature over BV-paths (cf. [15]) one has ${\rm Sig}(\mathbf{X}_{1})\neq{\rm Sig}(\mathbf{X}_{2})$ in $\mathcal{G}\left(\left(\mathbb{R}^{d}\right)\right)$. Here we use the fact that the signatures of BV-paths are group-like elements in the tensor algebra. Then, as $\mathcal{A}\left(\mathbb{R}^{d}\right)$ separates points over $\mathcal{G}\left(\left(\mathbb{R}^{d}\right)\right)$ (cf. [7, Theorem 4.8]), there is $M\in\mathcal{L}\left(\mathbb{R}^{d},\mathfrak{u}(m)\right)$ such that $\widetilde{M}\left[{\rm Sig}(\mathbf{X}_{1})\right]\neq\widetilde{M}\left[{\rm Sig}(\mathbf{X}_{2})\right]$; hence $M(\mathbf{X}_{1})\neq M(\mathbf{X}_{2})$.
Therefore, by considering the $U(m)$-valued equation ${\rm d}\mathbf{Y}_{t}=\mathbf{Y}_{t}\cdot M({\rm d}\mathbf{X}_{t})$ with $\mathbf{Y}_{0}=I_{m}$, we conclude that $\mathcal{U}_{M}(\mathbf{X}_{1})\neq\mathcal{U}_{M}(\mathbf{X}_{2})$. ∎

Theorem A.8 (Universality of unitary feature).

Let $\mathcal{K}\subset{\rm BV}\left([0,T];\mathbb{R}^{d}\right)$ be a compact subset. For any continuous function $f:\mathcal{K}\rightarrow\mathbb{C}$ and any $\epsilon>0$, there exists an $m_{\star}\in\mathbb{N}$ and finitely many $M_{1},\cdots,M_{N}\in\mathcal{L}\left(\mathbb{R}^{d},\mathfrak{u}(m_{\star})\right)$ as well as $L_{1},\ldots,L_{N}\in\mathcal{L}\left(U(m_{\star});\mathbb{C}\right)$, such that

$$\sup_{\mathbf{X}\in\mathcal{K}}\left|f(\mathbf{X})-\sum_{i=1}^{N}L_{i}\circ\mathcal{U}_{M_{i}}(\mathbf{X})\right|<\epsilon. \tag{12}$$

Proof.

It follows from [26, Theorem A.4] and the universality of signature in [8] that Eq. (12) holds with $M_{j}\in\mathcal{L}\left(\mathbb{R}^{d},\bigoplus_{m\in\mathbb{N}}\mathfrak{u}(m)\right)$ and $L_{j}\in\mathcal{L}\left(\bigoplus_{m\in\mathbb{N}}U(m);\mathbb{C}\right)$ and $\epsilon/2$ in place of $\epsilon$. By a simple approximation via restricting the ranges of $M_{j}$ and domains of $L_{j}$, we may obtain (without relabelling) $M_{j}\in\mathcal{L}\left(\mathbb{R}^{d},\bigoplus_{m=0}^{m_{\star}}\mathfrak{u}(m)\right)$ and $L_{j}\in\mathcal{L}\left(\bigoplus_{m=0}^{m_{\star}}U(m);\mathbb{C}\right)$ that verify Eq. (12). We conclude by the flag structure of $U(1)\subset U(2)\subset U(3)\subset\ldots$ and $\mathfrak{u}(1)\subset\mathfrak{u}(2)\subset\mathfrak{u}(3)\subset\ldots$. ∎

Appendix B Path Characteristic loss

B.1 Path Characteristic function

Theorem B.1.

Let $\mathbf{X}$ be an $\mathcal{X}$-valued random variable with associated probability measure $\mathbb{P}_{\mathbf{X}}$. The path characteristic function $\mathbf{\Phi}_{\mathbf{X}}$ uniquely characterises $\mathbb{P}_{\mathbf{X}}$.

Proof.

Assume that $\mathbb{P}_{\mathbf{X}_{1}}\neq\mathbb{P}_{\mathbf{X}_{2}}$. Then ${\rm Sig}(\mathbf{X}_{1})\neq{\rm Sig}(\mathbf{X}_{2})$ by the uniqueness of signature over BV-paths (cf. [15]). It is proved in [26, Lemma 2.6] that $\mathcal{U}_{M}\left(\mathbf{X}_{i}\right)=\widetilde{M}\left({\rm Sig}(\mathbf{X}_{i})\right)$ for any $M\in\mathcal{L}\left(\mathbb{R}^{d},\mathfrak{u}(m)\right)$ and $i\in\{1,2\}$. Hence $\boldsymbol{\Phi}_{\mathbf{X}_{i}}(M)=\int_{\mathcal{X}}\widetilde{M}\left({\rm Sig}(\mathbf{x})\right)\,{\rm d}\mathbb{P}_{\mathbf{X}_{i}}(\mathbf{x})$.
But as in the proof of Theorem A.7, $\mathcal{A}\left(\mathbb{R}^{d}\right)$ separates points over $\mathcal{G}\left(\left(\mathbb{R}^{d}\right)\right)$ (cf. [7, Theorem 4.8]) and the signature of BV-paths lies in $\mathcal{G}\left(\left(\mathbb{R}^{d}\right)\right)$. Therefore, there is an $M\in\mathcal{L}\left(\mathbb{R}^{d},\mathfrak{u}(m)\right)$ such that $\boldsymbol{\Phi}_{\mathbf{X}_{1}}(M)\neq\boldsymbol{\Phi}_{\mathbf{X}_{2}}(M)$. ∎

B.2 Distance metric via path characteristic function

Lemma B.2.

${\rm PCFD}_{\mathcal{M}}$ in Eq. (5) defines a pseudometric on the path space $\mathcal{X}$ for any $m\in\mathbb{N}$ and $\mathcal{M}\in\mathcal{P}\left(\mathcal{L}\left(\mathbb{R}^{d},\mathfrak{u}(m)\right)\right)$. In addition, suppose that $\left\{\mathcal{M}_{j}\right\}_{j\in\mathbb{N}}$ is a countable dense subset in $\mathcal{P}\left(\mathcal{L}\left(\mathbb{R}^{d},\bigoplus_{m\in\mathbb{N}}\mathfrak{u}(m)\right)\right)$. Then the following defines a metric on $\mathcal{X}$:

$$\widetilde{\rm PCFD}(\mathbf{X},\mathbf{Y}):=\sum_{j=1}^{\infty}\frac{\min\left\{1,\,{\rm PCFD}_{\mathcal{M}_{j}}(\mathbf{X},\mathbf{Y})\right\}}{2^{j}}. \tag{13}$$

In Lemma B.2 above, $\mathcal{L}\left(\mathbb{R}^{d},\bigoplus_{m\in\mathbb{N}}\mathfrak{u}(m)\right)\cong\left(\mathbb{R}^{d}\right)^{*}\widehat{\otimes}_{\pi}\left(\bigoplus_{m\in\mathbb{N}}\mathfrak{u}(m)\right)$, where $\widehat{\otimes}_{\pi}$ is the completion of the projective tensor product and $\bigoplus_{m\in\mathbb{N}}\mathfrak{u}(m)$ is a Banach space under the norm $\|T\|:=\sum_{m\in\mathbb{N}}\left\|T^{(m)}\right\|_{\rm HS}<\infty$. Here $T^{(m)}$ denotes the $m^{\text{th}}$-projection of $T$ on $\mathfrak{u}(m)$.
Therefore, such a sequence $\left\{\mathcal{M}_{j}\right\}_{j\in\mathbb{N}}$ always exists since $\mathcal{P}\left(\mathcal{L}\left(\mathbb{R}^{d},\bigoplus_{m\in\mathbb{N}}\mathfrak{u}(m)\right)\right)$, being the space of Borel probability measures over a Polish space, is itself a Polish space. See [33].
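To make the distance concrete, consider the simplest case $m=1$. Since $\mathfrak{u}(1)=i\mathbb{R}$, a linear map $M$ is $v\mapsto i\langle\theta,v\rangle$ for some $\theta\in\mathbb{R}^{d}$, the development equation is scalar, and $\mathcal{U}_{M}(\mathbf{x})=\exp\left(i\langle\theta,\mathbf{x}_{T}-\mathbf{x}_{0}\rangle\right)$. The sketch below estimates the distance from samples under the assumption that Eq. (5) (not reproduced in this appendix) defines ${\rm PCFD}^{2}_{\mathcal{M}}$ as the $\mathbb{P}_{\mathcal{M}}$-average of the squared Hilbert–Schmidt distance between the two characteristic functions. The function names and the finite list `thetas`, standing in for samples from $\mathbb{P}_{\mathcal{M}}$, are hypothetical.

```python
import cmath
import math


def unitary_feature_m1(theta, path):
    """U(1)-valued feature: exp(i <theta, x_T - x_0>)."""
    increment = [b - a for a, b in zip(path[0], path[-1])]
    return cmath.exp(1j * sum(t * v for t, v in zip(theta, increment)))


def empirical_pcf(theta, paths):
    """Sample average of the unitary feature over a batch of paths."""
    return sum(unitary_feature_m1(theta, p) for p in paths) / len(paths)


def empirical_pcfd(thetas, paths_x, paths_y):
    """Root mean square, over the sampled maps, of |Phi_X - Phi_Y|.

    For 1x1 matrices the Hilbert-Schmidt norm is the absolute value.
    """
    squares = [abs(empirical_pcf(t, paths_x) - empirical_pcf(t, paths_y)) ** 2
               for t in thetas]
    return math.sqrt(sum(squares) / len(squares))


# Two single-path "samples" in R^1 and one sampled map theta = 1.
paths_x = [[(0.0,), (1.0,)]]
paths_y = [[(0.0,), (2.0,)]]
thetas = [(1.0,)]

print(empirical_pcfd(thetas, paths_x, paths_x))  # 0.0
print(empirical_pcfd(thetas, paths_x, paths_y))  # |e^{i} - e^{2i}| = 2 sin(1/2)
```

For $m=1$ the feature depends only on the total increment $\mathbf{x}_{T}-\mathbf{x}_{0}$, because $U(1)$ is abelian; capturing the order of events along the path requires non-commutative groups, that is, $m\geq 2$.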

Proof.

Non-negativity, symmetry, and that ${\rm PCFD}_{\mathcal{M}}(\mathbf{X},\mathbf{X})=0$ are clear. That ${\rm PCFD}_{\mathcal{M}}(\mathbf{X},\mathbf{Y})\leq{\rm PCFD}_{\mathcal{M}}(\mathbf{X},\mathbf{Z})+{\rm PCFD}_{\mathcal{M}}(\mathbf{Z},\mathbf{Y})$ follows from the triangle inequality of the Hilbert–Schmidt norm and the linearity of expectation. This shows that ${\rm PCFD}_{\mathcal{M}}$ is a pseudometric for each $\mathcal{M}$.
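The triangle inequality can be spelled out as follows, assuming (as the display later in this proof suggests) that Eq. (5) defines ${\rm PCFD}^{2}_{\mathcal{M}}$ as the integral of the squared Hilbert–Schmidt distance against $\mathbb{P}_{\mathcal{M}}$. The first inequality is the pointwise triangle inequality for $\|\cdot\|_{\rm HS}$; the second is Minkowski's inequality in $L^{2}(\mathbb{P}_{\mathcal{M}})$.

```latex
\begin{aligned}
{\rm PCFD}_{\mathcal{M}}(\mathbf{X},\mathbf{Y})
&=\left(\int \left\|\mathbf{\Phi}_{\mathbf{X}}(M)-\mathbf{\Phi}_{\mathbf{Y}}(M)\right\|_{\rm HS}^{2}
   \,{\rm d}\mathbb{P}_{\mathcal{M}}(M)\right)^{1/2}\\
&\leq\left(\int \Big(\left\|\mathbf{\Phi}_{\mathbf{X}}(M)-\mathbf{\Phi}_{\mathbf{Z}}(M)\right\|_{\rm HS}
   +\left\|\mathbf{\Phi}_{\mathbf{Z}}(M)-\mathbf{\Phi}_{\mathbf{Y}}(M)\right\|_{\rm HS}\Big)^{2}
   \,{\rm d}\mathbb{P}_{\mathcal{M}}(M)\right)^{1/2}\\
&\leq{\rm PCFD}_{\mathcal{M}}(\mathbf{X},\mathbf{Z})+{\rm PCFD}_{\mathcal{M}}(\mathbf{Z},\mathbf{Y}).
\end{aligned}
```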

In addition, ${\rm PCFD}_{\mathcal{M}}(\mathbf{X},\mathbf{Y})=0$ implies that

$$\begin{aligned}
&\int_{\mathcal{L}\left(\mathbb{R}^{d},\mathfrak{u}(m)\right)}d^{2}_{\rm HS}\big(\mathbf{\Phi}_{\mathbf{X}}(M),\mathbf{\Phi}_{\mathbf{Y}}(M)\big)\,{\rm d}\mathbb{P}_{\mathcal{M}}(M)\\
&\qquad\qquad=\int_{\mathcal{L}\left(\mathbb{R}^{d},\mathfrak{u}(m)\right)}\big\|\mathbb{E}\left[\mathcal{U}_{M}(\mathbf{X})\right]-\mathbb{E}\left[\mathcal{U}_{M}(\mathbf{Y})\right]\big\|^{2}_{\rm HS}\,{\rm d}\mathbb{P}_{\mathcal{M}}(M)=0.
\end{aligned}$$

So, if $\mathbb{P}_{\mathcal{M}}$ is supported on the whole of $\mathcal{L}\left(\mathbb{R}^{d},\mathfrak{u}(m)\right)$, then $\mathbf{\Phi}_{\mathbf{X}}(M)=\mathbf{\Phi}_{\mathbf{Y}}(M)$ for any $M\sim\mathbb{P}_{\mathcal{M}}$.

Now, by density of $\left\{\mathcal{M}_{j}\right\}_{j\in\mathbb{N}}$ in $\mathcal{P}\left(\mathcal{L}\left(\mathbb{R}^{d},\bigoplus_{m\in\mathbb{N}}\mathfrak{u}(m)\right)\right)$, there exists a subsequence $\left\{\mathcal{M}_{j(m)}\right\}$ such that $\mathcal{M}_{j(m)}$ has full support on $\mathcal{L}\left(\mathbb{R}^{d},\mathfrak{u}(m)\right)$ for each $m\in\mathbb{N}$. Thus, $\widetilde{\rm PCFD}(\mathbf{X},\mathbf{Y})=0$ implies that $\mathbf{\Phi}_{\mathbf{X}}=\mathbf{\Phi}_{\mathbf{Y}}$ on a dense subset of $\mathcal{L}\left(\mathbb{R}^{d},\mathfrak{u}(m)\right)$ for every $m\in\mathbb{N}$. We conclude by the characteristicity Theorem 3.2 and a continuity argument. ∎

Lemma B.3 (Lemma 3.5).

Let $\mathcal{M}$ be an $\mathcal{L}\left(\mathbb{R}^{d},\mathfrak{u}(m)\right)$-valued random variable. Then for any ${\rm BV}\left([0,T];\mathbb{R}^{d}\right)$-valued random variables $\mathbf{X}$ and $\mathbf{Y}$, it holds that

$${\rm PCFD}_{\mathcal{M}}(\mathbf{X},\mathbf{Y})\leq 2m^{2}.$$
Proof.

As $\left(\mathbb{C}^{m\times m},\|\bullet\|_{\rm HS}\right)$ is a Hilbert space, from the Pythagorean theorem one deduces that $d^{2}_{\rm HS}\left(\mathbf{\Phi}_{\mathbf{X}}(M),\mathbf{\Phi}_{\mathbf{Y}}(M)\right)\leq\left\|\mathbf{\Phi}_{\mathbf{X}}(M)\right\|_{\rm HS}^{2}+\left\|\mathbf{\Phi}_{\mathbf{Y}}(M)\right\|_{\rm HS}^{2}$. Both $\mathbf{\Phi}_{\mathbf{X}}(M)$ and $\mathbf{\Phi}_{\mathbf{Y}}(M)$ are expectations of $U(m)$-valued random variables, and $\|U\|_{\rm HS}:=\sqrt{{\rm tr}(UU^{*})}=\sqrt{{\rm tr}(I_{m})}=\sqrt{m}$ for $U\in U(m)$.
Thus $d_{\rm HS}\left(\mathbf{\Phi}_{\mathbf{X}}(M),\mathbf{\Phi}_{\mathbf{Y}}(M)\right)\leq\sqrt{2}m$. We take expectation over $\mathbb{P}_{\mathcal{M}}$ to conclude. ∎
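The boundedness argument can be probed numerically. The sketch below is only illustrative: it draws arbitrary random unitary matrices as stand-ins for samples of $\mathcal{U}_{M}(\mathbf{X})$ and $\mathcal{U}_{M}(\mathbf{Y})$ (an assumption, since no actual paths are involved), and the helper names `random_skew_hermitian` and `expm_skew` are hypothetical. It checks that every unitary has Hilbert–Schmidt norm $\sqrt{m}$ and that the distance between two sample means stays below the triangle-inequality bound $2\sqrt{m}$, which lies within the bound of Lemma B.3.

```python
import numpy as np

def random_skew_hermitian(m, rng):
    """Sample an element of u(m), i.e. a matrix A with A^* = -A."""
    B = rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))
    return (B - B.conj().T) / 2

def expm_skew(A):
    """exp(A) for skew-Hermitian A, via the eigendecomposition of the Hermitian matrix iA."""
    w, V = np.linalg.eigh(1j * A)           # iA = V diag(w) V^*
    return (V * np.exp(-1j * w)) @ V.conj().T

rng = np.random.default_rng(0)
m = 4

# Random unitaries standing in for samples of the unitary feature.
unitaries_x = [expm_skew(random_skew_hermitian(m, rng)) for _ in range(200)]
unitaries_y = [expm_skew(random_skew_hermitian(m, rng)) for _ in range(200)]

# Hilbert-Schmidt (Frobenius) norm of each unitary matrix.
hs_norms = [np.linalg.norm(U) for U in unitaries_x]

# Sample means play the role of the expectations Phi_X(M) and Phi_Y(M).
phi_x = np.mean(unitaries_x, axis=0)
phi_y = np.mean(unitaries_y, axis=0)
distance = np.linalg.norm(phi_x - phi_y)

print("max |HS norm - sqrt(m)|:", max(abs(h - np.sqrt(m)) for h in hs_norms))
print("d_HS between sample means:", distance, " 2*sqrt(m):", 2 * np.sqrt(m))
```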

The result below is formulated in terms of the Hilbert–Schmidt norm of matrices in $\mathbb{C}^{m\times m}$. Any other norm on $\mathbb{C}^{m\times m}$ is equivalent to it, up to a constant depending on $m$ only. In fact, the inequality $\|T\|_{\rm op}\leq\|T\|_{\rm HS}$ holds for $T\in\mathbb{C}^{m\times m}$. See, e.g., [48, Lemma 3.1.10, p.55].

Lemma B.4.

Let $L,\tilde{L}:[a,b]\to\mathbb{R}^{d}$ be two linear paths, and let $M\in\mathfrak{u}(m)^{d}:=\mathcal{L}\left(\mathbb{R}^{d},\mathfrak{u}(m)\right)$ as before. Denote by $\|\bullet\|_{\rm e}$ the usual Euclidean norm on $\mathbb{R}^{d}$ and by $\||M\||$ the operator norm of $M:\left(\mathbb{R}^{d},\|\bullet\|_{\rm e}\right)\to\left(\mathfrak{u}(m),\|\bullet\|_{\rm HS}\right)$. Then we have

$$\left\|e^{M(L(t))}-e^{M(\tilde{L}(t))}\right\|_{\rm HS}\leq\||M\||\left\|L(t)-\tilde{L}(t)\right\|_{\rm e}\qquad\text{for each $t\in[a,b]$}.$$
Proof.

Let $\Gamma(t,s):=M\left((1-s)L(t)+s\tilde{L}(t)\right)$ with $t\in[a,b]$ and $s\in[0,1]$. This is the linear interpolation between $\Gamma(t,0)=M\circ L(t)$ and $\Gamma(t,1)=M\circ\tilde{L}(t)$. Then we have

$$\begin{aligned}
\left\|e^{ML(t)}-e^{M\tilde{L}(t)}\right\|_{\rm HS}&=\left\|\int_{0}^{1}\frac{\partial e^{\Gamma}}{\partial s}(t,s)\,{\rm d}s\right\|_{\rm HS}\\
&=\left\|\int_{0}^{1}\int_{0}^{1}e^{(1-r)\Gamma(t,1-s)}\frac{\partial\Gamma}{\partial s}(t,s)e^{r\Gamma(t,s)}\,{\rm d}r\,{\rm d}s\right\|_{\rm HS},
\end{aligned}$$

thanks to an identity for differentiation of matrix exponential and the inequality $\|T_{1}T_{2}\|_{\rm HS}\leq\|T_{1}\|_{\rm op}\|T_{2}\|_{\rm HS}$. Here $e^{(1-r)\Gamma(t,1-s)}$ and $e^{r\Gamma(t,s)}$ take values in $U(m)$, hence are of operator norm 1 for any parameters $t,s,r$. So we infer that

$$\begin{aligned}
\left\|e^{ML(t)}-e^{M\tilde{L}(t)}\right\|_{\rm HS}&\leq\int_{0}^{1}\int_{0}^{1}\left\|\frac{\partial\Gamma}{\partial s}(t,s)\right\|_{\rm HS}\,{\rm d}r\,{\rm d}s\\
&=\left\|M\left(\tilde{L}(t)-L(t)\right)\right\|_{\rm HS}\leq\||M\||\left\|\tilde{L}(t)-L(t)\right\|_{\rm e},
\end{aligned}$$

where the first inequality holds for Bochner integrals. See [46, Corollary 1, p.133]. ∎
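As a sanity check on Lemma B.4, the following sketch compares both sides of the inequality for random inputs. It assumes that $M\in\mathfrak{u}(m)^{d}$ is stored as an array of $d$ skew-Hermitian matrices acting by $M(x)=\sum_{i}x_{i}M_{i}$, and it evaluates the paths at a single time, so `x` and `y` play the roles of $L(t)$ and $\tilde{L}(t)$. All function names are hypothetical.

```python
import numpy as np

def random_skew_hermitian(m, rng):
    """Sample an element of u(m), i.e. a matrix A with A^* = -A."""
    B = rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))
    return (B - B.conj().T) / 2

def expm_skew(A):
    """exp(A) for skew-Hermitian A, via the eigendecomposition of the Hermitian matrix iA."""
    w, V = np.linalg.eigh(1j * A)           # iA = V diag(w) V^*
    return (V * np.exp(-1j * w)) @ V.conj().T

def apply_M(M, x):
    """Evaluate the linear map M(x) = sum_i x_i M_i for M of shape (d, m, m)."""
    return np.tensordot(x, M, axes=1)

def operator_norm(M):
    """Operator norm of M from (R^d, Euclidean) to (u(m), Hilbert-Schmidt)."""
    # Column i holds the real and imaginary parts of vec(M_i); for real x the
    # HS norm of M(x) equals the Euclidean norm of this real matrix times x.
    cols = [np.concatenate([Mi.real.ravel(), Mi.imag.ravel()]) for Mi in M]
    return np.linalg.norm(np.stack(cols, axis=1), 2)

rng = np.random.default_rng(1)
d, m = 3, 4
M = np.stack([random_skew_hermitian(m, rng) for _ in range(d)])
norm_M = operator_norm(M)

gaps = []  # right-hand side minus left-hand side; should never be negative
for _ in range(100):
    x, y = 0.2 * rng.normal(size=d), 0.2 * rng.normal(size=d)
    lhs = np.linalg.norm(expm_skew(apply_M(M, x)) - expm_skew(apply_M(M, y)))
    rhs = norm_M * np.linalg.norm(x - y)
    gaps.append(rhs - lhs)

print("smallest gap rhs - lhs:", min(gaps))
```

Stacking real and imaginary parts is one convenient way to reduce the operator norm to a largest singular value; other equivalent formulations would work equally well.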

Lemma B.5 (Subadditivity of unitary feature).

Let $X,Y\in{\rm BV}\left([0,T];\mathbb{R}^{d}\right)$ be BV-paths, and $\mathcal{U}_{M}$ be the unitary feature associated with $M\in\mathcal{L}\left(\mathbb{R}^{d},\mathfrak{u}(m)\right)=\mathfrak{u}(m)^{d}$. For any $0<t<T$ we have

$$\left\lVert\mathcal{U}_{M}(X)-\mathcal{U}_{M}(Y)\right\rVert_{\rm HS}\leq\left\lVert\mathcal{U}_{M}(X_{0,t})-\mathcal{U}_{M}(Y_{0,t})\right\rVert_{\rm HS}+\left\lVert\mathcal{U}_{M}(X_{t,T})-\mathcal{U}_{M}(Y_{t,T})\right\rVert_{\rm HS}.$$
Proof.

We apply the multiplicative property of unitary feature in Lemma A.5, the triangle inequality, and the unitary invariance of the Hilbert–Schmidt norm to estimate that

$$\begin{aligned}
&\left\lVert\mathcal{U}_{M}(X)-\mathcal{U}_{M}(Y)\right\rVert_{\rm HS}\\
&\qquad=\left\lVert\mathcal{U}_{M}(X_{0,t})\cdot\mathcal{U}_{M}(X_{t,T})-\mathcal{U}_{M}(Y_{0,t})\cdot\mathcal{U}_{M}(Y_{t,T})\right\rVert_{\rm HS}\\
&\qquad\leq\left\lVert(\mathcal{U}_{M}(X_{0,t})-\mathcal{U}_{M}(Y_{0,t}))\cdot\mathcal{U}_{M}(X_{t,T})\right\rVert_{\rm HS}+\left\lVert\mathcal{U}_{M}(Y_{0,t})(\mathcal{U}_{M}(X_{t,T})-\mathcal{U}_{M}(Y_{t,T}))\right\rVert_{\rm HS}\\
&\qquad=\left\lVert\mathcal{U}_{M}(X_{0,t})-\mathcal{U}_{M}(Y_{0,t})\right\rVert_{\rm HS}+\left\lVert\mathcal{U}_{M}(X_{t,T})-\mathcal{U}_{M}(Y_{t,T})\right\rVert_{\rm HS}.
\end{aligned}$$
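The subadditivity estimate can be illustrated on piecewise linear paths. The sketch below assumes that the unitary feature of a piecewise linear path is the ordered product of the exponentials $e^{M(\Delta X_{i})}$ over its linear segments, which is how the multiplicative property reads for such paths; the function `unitary_feature` and the other names are hypothetical, not taken from the authors' code.

```python
import numpy as np

def random_skew_hermitian(m, rng):
    """Sample an element of u(m), i.e. a matrix A with A^* = -A."""
    B = rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))
    return (B - B.conj().T) / 2

def expm_skew(A):
    """exp(A) for skew-Hermitian A, via the eigendecomposition of the Hermitian matrix iA."""
    w, V = np.linalg.eigh(1j * A)           # iA = V diag(w) V^*
    return (V * np.exp(-1j * w)) @ V.conj().T

def unitary_feature(M, path):
    """Assumed unitary feature of a piecewise linear path with vertices path[0..n]."""
    U = np.eye(M.shape[1], dtype=complex)
    for dx in np.diff(path, axis=0):
        # One linear segment contributes exp(M(dx)) with M(dx) = sum_i dx_i M_i.
        U = U @ expm_skew(np.tensordot(dx, M, axes=1))
    return U

rng = np.random.default_rng(2)
d, m, n = 3, 4, 10
M = np.stack([random_skew_hermitian(m, rng) for _ in range(d)])
X = np.cumsum(rng.normal(scale=0.3, size=(n + 1, d)), axis=0)
Y = np.cumsum(rng.normal(scale=0.3, size=(n + 1, d)), axis=0)

k = 4  # index of the splitting time t within the common partition
lhs = np.linalg.norm(unitary_feature(M, X) - unitary_feature(M, Y))
rhs = (np.linalg.norm(unitary_feature(M, X[:k + 1]) - unitary_feature(M, Y[:k + 1]))
       + np.linalg.norm(unitary_feature(M, X[k:]) - unitary_feature(M, Y[k:])))

print("lhs:", lhs, " rhs:", rhs)
```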

Proposition B.6.

For $\mathbf{X},\mathbf{Y}\in\mathcal{X}$, the unitary feature $\mathcal{U}_{M}(\mathbf{X})$ with $M\in\mathcal{L}\left(\mathbb{R}^{d},\mathfrak{u}(m)\right)=\mathfrak{u}(m)^{d}$ satisfies

$$\left\|\mathcal{U}_{M}(\mathbf{X})-\mathcal{U}_{M}(\mathbf{Y})\right\|_{\rm HS}\leq\||M\||\,\,{\rm Tot.Var.}[\mathbf{X}-\mathbf{Y}],$$

where ${\rm Tot.Var.}[\mathbf{X}-\mathbf{Y}]$ denotes the total variation over $[0,T]$ of the path $\mathbf{X}-\mathbf{Y}$.

Proof.

Given BV-paths $\mathbf{X},\mathbf{Y}$ with the same initial point, there are piecewise linear approximations $\left\{\mathbf{X}^{n}\right\}$, $\left\{\mathbf{Y}^{n}\right\}$ with common partition $0=t_{0}<t_{1}<\cdots<t_{n}=T$ converging respectively to $\mathbf{X}$ and $\mathbf{Y}$ in the $p$-variation metric, $p\in(1,2)$. Applying Lemma B.5 recursively, we obtain that

$$\left\lVert\mathcal{U}_{M}(\mathbf{X}^{n})-\mathcal{U}_{M}(\mathbf{Y}^{n})\right\rVert_{\rm HS}\leq\sum_{i=0}^{n-1}\left\lVert\mathcal{U}_{M}(\mathbf{X}^{n}_{t_{i},t_{i+1}})-\mathcal{U}_{M}(\mathbf{Y}^{n}_{t_{i},t_{i+1}})\right\rVert_{\rm HS}.$$

By definition of unitary feature and Lemma B.4, one deduces that

$$\left\lVert\mathcal{U}_{M}(\mathbf{X}^{n})-\mathcal{U}_{M}(\mathbf{Y}^{n})\right\rVert_{\rm HS}\leq\sum_{i=0}^{n-1}\||M\||\left\|\mathbf{X}^{n}_{t_{i},t_{i+1}}-\mathbf{Y}^{n}_{t_{i},t_{i+1}}\right\|_{\rm e}.$$

We may now conclude by taking the supremum over all partitions and sending $n\to\infty$. The passage to the limit is justified by the continuity of the Itô map with respect to the driving path in the $p$-variation topology, since the vector field in (4) is Lipschitz ([27, Theorem 1.20]). ∎
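For piecewise linear paths on a common partition, the bound of Proposition B.6 can be checked directly, since the total variation of $\mathbf{X}-\mathbf{Y}$ is then the sum of the Euclidean norms of its increments. The sketch below makes the same assumptions as before: $M$ is stored as $d$ skew-Hermitian matrices, and the unitary feature of a piecewise linear path is taken to be the ordered product of segment exponentials. All names are hypothetical.

```python
import numpy as np

def random_skew_hermitian(m, rng):
    """Sample an element of u(m), i.e. a matrix A with A^* = -A."""
    B = rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))
    return (B - B.conj().T) / 2

def expm_skew(A):
    """exp(A) for skew-Hermitian A, via the eigendecomposition of the Hermitian matrix iA."""
    w, V = np.linalg.eigh(1j * A)           # iA = V diag(w) V^*
    return (V * np.exp(-1j * w)) @ V.conj().T

def unitary_feature(M, path):
    """Assumed unitary feature of a piecewise linear path with vertices path[0..n]."""
    U = np.eye(M.shape[1], dtype=complex)
    for dx in np.diff(path, axis=0):
        U = U @ expm_skew(np.tensordot(dx, M, axes=1))
    return U

def operator_norm(M):
    """Operator norm of M from (R^d, Euclidean) to (u(m), Hilbert-Schmidt)."""
    cols = [np.concatenate([Mi.real.ravel(), Mi.imag.ravel()]) for Mi in M]
    return np.linalg.norm(np.stack(cols, axis=1), 2)

def total_variation(path):
    """Total variation of a piecewise linear path: sum of segment lengths."""
    return np.sum(np.linalg.norm(np.diff(path, axis=0), axis=1))

rng = np.random.default_rng(3)
d, m, n = 3, 4, 20
M = np.stack([random_skew_hermitian(m, rng) for _ in range(d)])

gaps = []  # right-hand side minus left-hand side for several random path pairs
for _ in range(50):
    X = np.cumsum(rng.normal(scale=0.1, size=(n + 1, d)), axis=0)
    Y = X + np.cumsum(rng.normal(scale=0.02, size=(n + 1, d)), axis=0)
    lhs = np.linalg.norm(unitary_feature(M, X) - unitary_feature(M, Y))
    rhs = operator_norm(M) * total_variation(X - Y)
    gaps.append(rhs - lhs)

print("smallest gap rhs - lhs:", min(gaps))
```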

Theorem B.7 (Dependence on continuous parameter).

Let $\mathcal{X}$ and $\mathcal{Z}$ be subsets of ${\rm BV}\left([0,T];\mathbb{R}^{d}\right)$, $\left(\Theta,\rho\right)$ be a metric space, $\mathbb{Q}$ be a Borel probability measure on $\mathcal{Z}$, and $\mathcal{M}$ be a Borel probability measure on $\mathfrak{u}(m)^{d}$. Assume that $g:\Theta\times\mathcal{Z}\to\mathcal{X}$, $(\theta,\mathbf{Z})\mapsto g_{\theta}(\mathbf{Z})$ is Lipschitz in $\theta$ such that ${\rm Tot.Var.}\left[g_{\theta}(\mathbf{Z})-g_{\theta'}(\mathbf{Z})\right]\leq\omega(\mathbf{Z})\rho\left(\theta,\theta'\right)$.
In addition, suppose that $\mathbb{E}_{M\sim\mathbb{P}_{\mathcal{M}}}\left[\||M\||^{2}\right]<\infty$ and $\mathbb{E}_{\mathbf{Z}\sim\mathbb{Q}}\left[\omega(\mathbf{Z})\right]<\infty$. Then ${\rm PCFD}_{\mathcal{M}}\left(g_{\theta}(\mathbf{Z}),\mathbf{X}\right)$ is Lipschitz in $\theta$.

Moreover, it holds that

$$\left|{\rm PCFD}_{\mathcal{M}}\left(g_{\theta}(\mathbf{Z}),\mathbf{X}\right)-{\rm PCFD}_{\mathcal{M}}\left(g_{\theta'}(\mathbf{Z}),\mathbf{X}\right)\right|\leq\sqrt{\mathbb{E}_{M\sim\mathbb{P}_{\mathcal{M}}}\big[\||M\||^{2}\big]}\,\mathbb{E}_{\mathbf{Z}\sim\mathbb{Q}}\left[\omega(\mathbf{Z})\right]\,\rho\left(\theta,\theta'\right)$$

for any $\theta,\theta'\in\Theta$, $\mathbf{Z}\in\mathcal{Z}$, $\mathbf{X}\in\mathcal{X}$, and $\mathcal{M}\in\mathcal{P}\left(\mathfrak{u}(m)^{d}\right)$.
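The Lipschitz bound can be illustrated with empirical measures in place of $\mathbb{P}_{\mathcal{M}}$ and $\mathbb{Q}$, to which the theorem applies as stated. The sketch below is a toy setup under explicit assumptions: the generator is the hypothetical scaling map $g_{\theta}(\mathbf{Z})=\theta\mathbf{Z}$ with $\Theta=\mathbb{R}$, so that $\omega(\mathbf{Z})$ can be taken as the total variation of $\mathbf{Z}$; paths are piecewise linear; and the unitary feature is computed as an ordered product of segment exponentials. None of the names come from the authors' implementation.

```python
import numpy as np

def random_skew_hermitian(m, rng):
    """Sample an element of u(m), i.e. a matrix A with A^* = -A."""
    B = rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))
    return (B - B.conj().T) / 2

def expm_skew(A):
    """exp(A) for skew-Hermitian A, via the eigendecomposition of the Hermitian matrix iA."""
    w, V = np.linalg.eigh(1j * A)           # iA = V diag(w) V^*
    return (V * np.exp(-1j * w)) @ V.conj().T

def unitary_feature(M, path):
    """Assumed unitary feature of a piecewise linear path with vertices path[0..n]."""
    U = np.eye(M.shape[1], dtype=complex)
    for dx in np.diff(path, axis=0):
        U = U @ expm_skew(np.tensordot(dx, M, axes=1))
    return U

def operator_norm(M):
    """Operator norm of M from (R^d, Euclidean) to (u(m), Hilbert-Schmidt)."""
    cols = [np.concatenate([Mi.real.ravel(), Mi.imag.ravel()]) for Mi in M]
    return np.linalg.norm(np.stack(cols, axis=1), 2)

def pcfd(M_samples, paths_a, paths_b):
    """Empirical PCFD: root mean over M of the squared HS distance between mean features."""
    squared = []
    for M in M_samples:
        phi_a = np.mean([unitary_feature(M, p) for p in paths_a], axis=0)
        phi_b = np.mean([unitary_feature(M, p) for p in paths_b], axis=0)
        squared.append(np.linalg.norm(phi_a - phi_b) ** 2)
    return np.sqrt(np.mean(squared))

rng = np.random.default_rng(4)
d, m, n = 2, 3, 8
M_samples = [np.stack([random_skew_hermitian(m, rng) for _ in range(d)]) for _ in range(5)]
Z = [np.cumsum(rng.normal(scale=0.1, size=(n + 1, d)), axis=0) for _ in range(10)]
X = [np.cumsum(rng.normal(scale=0.1, size=(n + 1, d)), axis=0) for _ in range(10)]

# omega(Z) = Tot.Var.[Z] for the toy generator g_theta(Z) = theta * Z.
omega = [np.sum(np.linalg.norm(np.diff(z, axis=0), axis=1)) for z in Z]

theta, theta_prime = 1.0, 1.3
lhs = abs(pcfd(M_samples, [theta * z for z in Z], X)
          - pcfd(M_samples, [theta_prime * z for z in Z], X))
rhs = (np.sqrt(np.mean([operator_norm(M) ** 2 for M in M_samples]))
       * np.mean(omega) * abs(theta - theta_prime))

print("lhs:", lhs, " rhs:", rhs)
```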

Proof.

As ${\rm PCFD}_{\mathcal{M}}$ is a pseudometric (Lemma B.2), we have

$$\left|{\rm PCFD}_{\mathcal{M}}\left(g_{\theta}(\mathbf{Z}),\mathbf{X}\right)-{\rm PCFD}_{\mathcal{M}}\left(g_{\theta'}(\mathbf{Z}),\mathbf{X}\right)\right|\leq{\rm PCFD}_{\mathcal{M}}\left(g_{\theta}(\mathbf{Z}),g_{\theta'}(\mathbf{Z})\right).$$

We may control the right-hand side as follows, using successively the definitions of ${\rm PCFD}$ and PCF, Proposition B.6, and the assumptions in this theorem:

$$\begin{aligned}
&{\rm PCFD}_{\mathcal{M}}\left(g_{\theta}(\mathbf{Z}),g_{\theta'}(\mathbf{Z})\right)\\
&\qquad=\left\{\int_{\mathfrak{u}(m)^{d}}\left\|\boldsymbol{\Phi}_{g_{\theta}(\mathbf{Z})}(M)-\boldsymbol{\Phi}_{g_{\theta'}(\mathbf{Z})}(M)\right\|^{2}_{\rm HS}\,{\rm d}\mathbb{P}_{\mathcal{M}}\right\}^{\frac{1}{2}}\\
&\qquad=\left\{\int_{\mathfrak{u}(m)^{d}}\left\|\int_{\mathcal{Z}}\left[\mathcal{U}_{M}\left(g_{\theta}(\mathbf{Z})\right)-\mathcal{U}_{M}\left(g_{\theta'}(\mathbf{Z})\right)\right]\,{\rm d}\mathbb{Q}(\mathbf{Z})\right\|^{2}_{\rm HS}\,{\rm d}\mathbb{P}_{\mathcal{M}}\right\}^{\frac{1}{2}}\\
&\qquad\leq\left\{\int_{\mathfrak{u}(m)^{d}}|\|M|\|^{2}\left\{\int_{\mathcal{Z}}{\rm Tot.Var.}\left[g_{\theta}(\mathbf{Z})-g_{\theta'}(\mathbf{Z})\right]\,{\rm d}\mathbb{Q}(\mathbf{Z})\right\}^{2}\,{\rm d}\mathbb{P}_{\mathcal{M}}\right\}^{\frac{1}{2}}\\
&\qquad\leq\sqrt{\mathbb{E}_{M\sim\mathbb{P}_{\mathcal{M}}}\big[|\|M|\|^{2}\big]}\,\left\{\int_{\mathcal{Z}}\omega(\mathbf{Z})\rho\left(\theta,\theta'\right)\,{\rm d}\mathbb{Q}(\mathbf{Z})\right\}.
\end{aligned}$$

This completes the proof. ∎

The unitary feature is universal in the spirit of the Stone–Weierstrass theorem; i.e., continuous functions on paths can be uniformly approximated by linear functionals on unitary features.

As $\widetilde{\rm PCFD}$ metrises the weak topology on the space of path-valued random variables, it emerges as a more sensible distance for training time series generators than discrepancies without this property, e.g., the Jensen–Shannon divergence.

Theorem B.8 (Metrisation of weak-star topology).

Let $\mathcal{K}\subset\mathcal{X}$ be a compact subset. Suppose that $\left\{\mathcal{M}_{j}\right\}_{j\in\mathbb{N}}$ is a countable dense subset in $\mathcal{P}\left(\mathcal{L}\left(\mathbb{R}^{d},\bigoplus_{m\in\mathbb{N}}\mathfrak{u}(m)\right)\right)$. Then $\widetilde{\rm PCFD}$ defined by Eqn. 13 metrises the weak-star topology on $\mathcal{P}(\mathcal{K})$. That is, $\widetilde{\rm PCFD}(\mathbf{X}_{n},\mathbf{X})\rightarrow 0\iff\mathbf{X}_{n}\stackrel{\text{d}}{\rightarrow}\mathbf{X}$ as $n\rightarrow\infty$, where $\stackrel{\text{d}}{\rightarrow}$ denotes convergence in distribution of random variables.

The metrisability of $\mathcal{P}(\mathcal{K})$ follows from general theorems in functional analysis: $\mathcal{K}$ is a compact metric space, hence $C^{0}(\mathcal{K})$ is separable ([13, Lemma 3.23]). Then, viewing $\mathcal{P}(\mathcal{K})$ as a subset of the unit sphere in $\left[C^{0}(\mathcal{K})\right]^{*}$ via Riesz representation, we infer from [13, Proposition 3.24] that $\mathcal{P}(\mathcal{K})$ is metrisable in the weak-star topology, convergence in which is equivalent to the distributional convergence of random variables.

Proof.

The backward direction is straightforward. By the Riesz representation theorem for Radon measures, distributional convergence is equivalent to $\int_{\mathcal{K}}f\,{\rm d}\mathbb{P}_{\mathbf{X}_{n}}\to\int_{\mathcal{K}}f\,{\rm d}\mathbb{P}_{\mathbf{X}}$ for all continuous $f\in C(\mathcal{K})$. Thus $\left\|\int_{\mathcal{K}}\mathcal{U}_{M}\,{\rm d}\mathbb{P}_{\mathbf{X}_{n}}-\int_{\mathcal{K}}\mathcal{U}_{M}\,{\rm d}\mathbb{P}_{\mathbf{X}}\right\|_{\rm HS}\to 0$, namely $\boldsymbol{\Phi}_{\mathbf{X}_{n}}[M]\to\boldsymbol{\Phi}_{\mathbf{X}}[M]$ for each $M\in\mathcal{L}\left(\mathbb{R}^{d};\mathfrak{u}(m)\right)$. The unitary feature $\mathcal{U}_{M}$ is bounded as it is $U(m)$-valued for some $m$, so we deduce from the dominated convergence theorem that $\widetilde{\rm PCFD}(\mathbf{X}_{n},\mathbf{X})\to 0$.
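The domination invoked here can be made explicit. A short sketch, using only that every unitary $m\times m$ matrix has Hilbert–Schmidt norm $\sqrt{m}$ and that the norm of an integral is at most the integral of the norm:

$$\left\|\boldsymbol{\Phi}_{\mathbf{X}}(M)\right\|_{\rm HS}=\left\|\int_{\mathcal{K}}\mathcal{U}_{M}\,{\rm d}\mathbb{P}_{\mathbf{X}}\right\|_{\rm HS}\leq\int_{\mathcal{K}}\left\|\mathcal{U}_{M}\right\|_{\rm HS}\,{\rm d}\mathbb{P}_{\mathbf{X}}=\sqrt{m},\qquad\text{hence}\qquad\left\|\boldsymbol{\Phi}_{\mathbf{X}_{n}}(M)-\boldsymbol{\Phi}_{\mathbf{X}}(M)\right\|^{2}_{\rm HS}\leq 4m.$$

For fixed $m$ the bound $4m$ is a constant, independent of $M$ and $n$, and is therefore integrable against any probability measure on $\mathcal{L}\left(\mathbb{R}^{d};\mathfrak{u}(m)\right)$.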

Conversely, suppose that $\widetilde{\rm PCFD}(\mathbf{X}_{n},\mathbf{X})\to 0$. Then

$$\int_{\mathcal{L}\left(\mathbb{R}^{d};\mathfrak{u}(m)\right)}\left\|\int_{\mathcal{K}}\mathcal{U}_{M}\,{\rm d}\mathbb{P}_{\mathbf{X}_{n}}-\int_{\mathcal{K}}\mathcal{U}_{M}\,{\rm d}\mathbb{P}_{\mathbf{X}}\right\|^{2}_{\rm HS}\,{\rm d}\mathbb{P}_{\mathcal{M}}(M)=0$$

for any $m\in\mathbb{N}$ and $\mathcal{M}\in\mathcal{P}\left(\mathcal{L}\left(\mathbb{R}^{d};\mathfrak{u}(m)\right)\right)$, in particular for those with full support. In view of the universality Theorem A.8 proved above, for any fixed $\epsilon>0$ and any continuous function $f\in C^{0}(\mathcal{K})$, by approximating $f$ with a sum of finitely many $L_{i}\circ\mathcal{U}_{M_{i}}$ (the notation is as in Theorem A.8), one infers that for $n$ and $m_{\star}$ sufficiently large, it holds that

$$\int_{\mathcal{L}\left(\mathbb{R}^{d};\mathfrak{u}(m_{\star})\right)}\left|\int_{\mathcal{K}}f\,{\rm d}\mathbb{P}_{\mathbf{X}_{n}}-\int_{\mathcal{K}}f\,{\rm d}\mathbb{P}_{\mathbf{X}}\right|^{2}\,{\rm d}\mathbb{P}_{\mathcal{M}}(M)<\epsilon.$$

By considering those measures $\mathbb{P}_{\mathcal{M}}$ with ${\rm spt}(\mathbb{P}_{\mathcal{M}})=\mathcal{L}\left(\mathbb{R}^{d};\mathfrak{u}(m_{\star})\right)$, we deduce that

$$\lim_{n\to\infty}\left|\int_{\mathcal{K}}f\,{\rm d}\mathbb{P}_{\mathbf{X}_{n}}-\int_{\mathcal{K}}f\,{\rm d}\mathbb{P}_{\mathbf{X}}\right|=0\qquad\text{for any }f\in C^{0}(\mathcal{K}).$$

This is tantamount to the distributional convergence. ∎

Proof.

We first prove the 'if' direction of the statement. By the Portmanteau theorem [21], convergence in distribution $X_{n}\stackrel{\text{d}}{\rightarrow}X$ implies that, for any bounded continuous map $f$, we have $\mathbb{E}_{x\sim\mathbb{P}_{n}}[f(x)]\rightarrow\mathbb{E}_{x\sim\mathbb{P}}[f(x)]$. Therefore, for any $M\in\mathcal{L}(\mathbb{R}^{d},\mathfrak{u}(m))$, $\mathbb{E}_{x\sim\mathbb{P}_{n}}[\mathcal{U}_{M}(x)]\rightarrow\mathbb{E}_{x\sim\mathbb{P}}[\mathcal{U}_{M}(x)]$, which implies $\|\Phi_{X_{n}}(M)-\Phi_{X}(M)\|_{\rm HS}^{2}\rightarrow 0$ as $n\rightarrow\infty$. Hence, by the dominated convergence theorem (the unitary feature being bounded), it follows that, as $n\rightarrow\infty$,

$${\rm PCFD}^{2}_{\mathcal{M}}(X_{n},X)=\mathbb{E}_{M\sim\mathbb{P}_{\mathcal{M}}}\left\|\Phi_{X_{n}}(M)-\Phi_{X}(M)\right\|_{\rm HS}^{2}\rightarrow 0,$$

which completes the proof of the 'if' direction.

Now we proceed with the 'only if' direction. By the universality of the unitary path development from Theorem A.8, for any continuous function $f:\mathcal{K}\rightarrow\mathbb{C}$ and $\epsilon>0$, there exist $M_{1},\ldots,M_{N}\in\mathcal{L}(\mathbb{R}^{d},\mathfrak{u}(m))$ and $L_{1},\ldots,L_{N}\in\mathcal{L}\left(\mathcal{U}(m);\mathbb{C}\right)$ such that

$$\left|\mathbb{E}_{x\sim\mathbb{P}}\left[f(x)\right]-\sum_{i=1}^{N}L_{i}\circ\mathbb{E}_{x\sim\mathbb{P}}\left[\mathcal{U}_{M_{i}}(x)\right]\right|<\epsilon,\tag{14}$$

or equivalently $\left|\mathbb{E}_{x\sim\mathbb{P}}\left[f(x)\right]-\sum_{i=1}^{N}L_{i}\circ\Phi_{X}(M_{i})\right|<\epsilon$. As the approximation in Theorem A.8 is uniform on $\mathcal{K}$, the same bound holds with $\mathbb{P}$ and $X$ replaced by $\mathbb{P}_{n}$ and $X_{n}$. For simplicity, we denote $\mathbb{E}_{x\sim\mathbb{P}_{n}}$ and $\mathbb{E}_{x\sim\mathbb{P}}$ as $\mathbb{E}_{n}$ and $\mathbb{E}$ respectively. Therefore,

\begin{align}
\left|\mathbb{E}_{n}\left[f(x)\right]-\mathbb{E}\left[f(x)\right]\right|\leq{}&\left|\mathbb{E}_{n}\left[f(x)\right]-\sum_{i=1}^{N}L_{i}\circ\Phi_{X_{n}}(M_{i})\right|+\left|\mathbb{E}\left[f(x)\right]-\sum_{i=1}^{N}L_{i}\circ\Phi_{X}(M_{i})\right|\tag{15}\\
&+\left|\sum_{i=1}^{N}L_{i}\circ\left(\Phi_{X_{n}}(M_{i})-\Phi_{X}(M_{i})\right)\right|\tag{16}\\
\leq{}&2\epsilon+\sum_{i=1}^{N}\left|L_{i}\right|_{op}\left\|\Phi_{X_{n}}(M_{i})-\Phi_{X}(M_{i})\right\|_{\rm HS},\tag{17}
\end{align}

where $|L|_{op}:=\sup_{x\in\mathcal{U}(m)\backslash 0}\frac{|L(x)|}{\|x\|_{\rm HS}}$ is the operator norm. Since ${\rm PCFD}^{2}_{\mathcal{M}}(X_{n},X)=\mathbb{E}_{M\sim\mathbb{P}_{\mathcal{M}}}\|\Phi_{X_{n}}(M)-\Phi_{X}(M)\|_{\rm HS}^{2}\rightarrow 0$ as $n\rightarrow\infty$ and $\epsilon$ is arbitrary, $\mathbb{E}_{x\sim\mathbb{P}_{n}}[f(x)]\rightarrow\mathbb{E}_{x\sim\mathbb{P}}[f(x)]$ for any continuous bounded function $f:\mathcal{K}\rightarrow\mathbb{C}$, which implies $X_{n}\stackrel{\text{d}}{\rightarrow}X$ by the Portmanteau theorem [21]. ∎

B.3 Relation with MMD

We now discuss linkages between PCFD and MMD (maximum mean discrepancy) defined over $\mathcal{P}(\mathcal{X})$, the space of Borel probability measures (equivalently, probability distributions) on $\mathcal{X}$.

Definition B.9.

Given a kernel function $\kappa:\mathcal{X}\times\mathcal{X}\rightarrow\mathbb{R}$, the MMD associated to $\kappa$ is the function ${\rm MMD}_{\kappa}:\mathcal{P}(\mathcal{X})\times\mathcal{P}(\mathcal{X})\rightarrow\mathbb{R}^{+}$ given as follows: for independent random variables $\mathbf{X},\mathbf{Y}$ on $\mathcal{X}$, set

$${\rm MMD}^{2}_{\kappa}(\mathbb{P}_{\mathbf{X}},\mathbb{P}_{\mathbf{Y}})=\mathbb{E}_{\mathbf{X},\mathbf{X}'\overset{\text{iid}}{\sim}\mathbb{P}_{\mathbf{X}}}[\kappa(\mathbf{X},\mathbf{X}')]+\mathbb{E}_{\mathbf{Y},\mathbf{Y}'\overset{\text{iid}}{\sim}\mathbb{P}_{\mathbf{Y}}}[\kappa(\mathbf{Y},\mathbf{Y}')]-2\,\mathbb{E}_{\mathbf{X}\sim\mathbb{P}_{\mathbf{X}},\mathbf{Y}\sim\mathbb{P}_{\mathbf{Y}}}[\kappa(\mathbf{X},\mathbf{Y})].$$
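To make the definition concrete, the following is a minimal sketch of the standard unbiased (U-statistic) estimator of ${\rm MMD}^{2}_{\kappa}$ from finite samples. It is an illustration only: the Gaussian kernel on vectors in $\mathbb{R}^{d}$ is an assumption chosen for simplicity (it is not the path-space kernel of this paper), and the names `gaussian_kernel` and `mmd_squared_unbiased` are hypothetical.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on R^d; an illustrative choice of kappa."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared_unbiased(xs, ys, kernel):
    """Unbiased estimate of MMD^2: the three expectations in the definition
    are replaced by sample averages, excluding the diagonal pairs i == j so
    that X, X' (resp. Y, Y') are distinct, independent draws."""
    n, m = len(xs), len(ys)
    kxx = sum(kernel(xs[i], xs[j]) for i in range(n) for j in range(n) if i != j)
    kyy = sum(kernel(ys[i], ys[j]) for i in range(m) for j in range(m) if i != j)
    kxy = sum(kernel(x, y) for x in xs for y in ys)
    return kxx / (n * (n - 1)) + kyy / (m * (m - 1)) - 2.0 * kxy / (n * m)


# Toy data: two samples from N(0, 1) and one from N(3, 1), in dimension d = 1.
random.seed(0)
same_a = [[random.gauss(0.0, 1.0)] for _ in range(150)]
same_b = [[random.gauss(0.0, 1.0)] for _ in range(150)]
shifted = [[random.gauss(3.0, 1.0)] for _ in range(150)]

print(mmd_squared_unbiased(same_a, same_b, gaussian_kernel))
print(mmd_squared_unbiased(same_a, shifted, gaussian_kernel))
```

The double loops over sample pairs are what make kernel-based MMD estimators quadratic in the sample size.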

The PCFD can be interpreted as an MMD on measures of the path space with a specific kernel. Compare with [40] for the case of $\mathbb{R}^{d}$.

Proposition B.10 (PCFD as MMD).

Let $\mathcal{M}\in\mathcal{P}\left(\mathfrak{u}(m)^{d}\right)$, and let $\mathbf{X}$ and $\mathbf{Y}$ be $\mathcal{X}$-valued random variables with induced distributions $\mathbb{P}_{\mathbf{X}}$ and $\mathbb{P}_{\mathbf{Y}}$, respectively. Then ${\rm PCFD}_{\mathcal{M}}(\mathbf{X},\mathbf{Y})={\rm MMD}_{\kappa}(\mathbb{P}_{\mathbf{X}},\mathbb{P}_{\mathbf{Y}})$ with kernel $\kappa(\mathbf{x},\mathbf{y}):=\mathbb{E}_{M\sim\mathbb{P}_{\mathcal{M}}}\left[\left\langle\mathcal{U}_{M}(\mathbf{x}),\mathcal{U}_{M}(\mathbf{y})\right\rangle_{\rm HS}\right]=\mathbb{E}_{M\sim\mathbb{P}_{\mathcal{M}}}\left[{\rm tr}\left(\mathcal{U}_{M}\left(\mathbf{x}\star\overleftarrow{\mathbf{y}}\right)\right)\right]$.

Throughout, $\star$ designates concatenation of paths and $\overleftarrow{\mathbf{y}}$ is the path obtained by running $\mathbf{y}$ backwards. The operation $\mathbf{x}\star\overleftarrow{\mathbf{y}}$ on the path space is analogous to $\mathbf{x}-\mathbf{y}$ on $\mathbb{R}^{d}$. If $\mathbf{y}=\mathbf{x}$, then $\mathbf{x}\star\overleftarrow{\mathbf{y}}$ is the null path. See the Appendix for proofs and further discussions.
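The trace expression ${\rm tr}\left(\mathcal{U}_{M}\left(\mathbf{x}\star\overleftarrow{\mathbf{y}}\right)\right)$ is the Hilbert–Schmidt inner product of the two unitary features. The sketch below assumes the standard properties of the unitary feature, namely multiplicativity under concatenation, $\mathcal{U}_{M}(\mathbf{x}\star\mathbf{y})=\mathcal{U}_{M}(\mathbf{x})\,\mathcal{U}_{M}(\mathbf{y})$, and inversion under time reversal, $\mathcal{U}_{M}(\overleftarrow{\mathbf{y}})=\mathcal{U}_{M}(\mathbf{y})^{-1}$, together with the convention $\langle A,B\rangle_{\rm HS}={\rm tr}(AB^{*})$:

$$\left\langle\mathcal{U}_{M}(\mathbf{x}),\mathcal{U}_{M}(\mathbf{y})\right\rangle_{\rm HS}={\rm tr}\left(\mathcal{U}_{M}(\mathbf{x})\,\mathcal{U}_{M}(\mathbf{y})^{*}\right)={\rm tr}\left(\mathcal{U}_{M}(\mathbf{x})\,\mathcal{U}_{M}(\mathbf{y})^{-1}\right)={\rm tr}\left(\mathcal{U}_{M}(\mathbf{x})\,\mathcal{U}_{M}(\overleftarrow{\mathbf{y}})\right)={\rm tr}\left(\mathcal{U}_{M}\left(\mathbf{x}\star\overleftarrow{\mathbf{y}}\right)\right).$$

The second equality uses that $\mathcal{U}_{M}(\mathbf{y})$ is unitary. If the opposite ordering convention is used for the development of a concatenation, the result is unchanged by cyclicity of the trace. In particular, taking $\mathbf{y}=\mathbf{x}$ gives ${\rm tr}(I_{m})=m$.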

Remark B.11 (Computational cost complexity).

By Proposition B.10, PCFD is an MMD. However, to compute EPCFD, we may directly calculate the expected distance between the PCFs, without going over the kernel calculations in the MMD approach. Our method is significantly more efficient, especially for large datasets. The computational complexity of EPCFD is linear in sample size, whereas the MMD approach is quadratic.
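The cost difference in this remark can be illustrated numerically. The following is a minimal NumPy sketch, not the authors' implementation: it assumes piecewise-linear paths, so that the unitary feature is an ordered product of matrix exponentials, and the helper names (`expm_antihermitian`, `unitary_feature`, `epcfd_sq_direct`, `epcfd_sq_kernel`) are hypothetical. The direct route averages the features once per sample, whereas the kernel route sums over all pairs; for the biased (V-statistic) MMD estimator the two values coincide. Because the features are complex, the cross term is taken as a real part.

```python
import numpy as np

def expm_antihermitian(a):
    """exp(a) for anti-Hermitian a, computed from the Hermitian matrix 1j * a."""
    w, v = np.linalg.eigh(1j * a)            # a = -1j * h with h = v diag(w) v^*
    return (v * np.exp(-1j * w)) @ v.conj().T

def unitary_feature(path, ms):
    """U_M(path) for a piecewise-linear path (T, d); ms has shape (d, m, m)."""
    u = np.eye(ms.shape[-1], dtype=complex)
    for dx in np.diff(path, axis=0):
        u = u @ expm_antihermitian(np.tensordot(dx, ms, axes=1))
    return u

def features(paths, m_list):
    """Array of shape (k, n, m, m): U_{M_j}(x_i) for every map j and path i."""
    return np.array([[unitary_feature(p, ms) for p in paths] for ms in m_list])

def epcfd_sq_direct(fx, fy):
    """Linear in the sample size: average the features, then one HS norm per map."""
    diff = fx.mean(axis=1) - fy.mean(axis=1)              # shape (k, m, m)
    return float(np.mean(np.sum(np.abs(diff) ** 2, axis=(1, 2))))

def epcfd_sq_kernel(fx, fy):
    """Quadratic in the sample size: biased MMD^2 from all pairwise kernel values."""
    def gram(fa, fb):
        # kappa(a_i, b_l) = mean over j of tr(U_j(a_i) U_j(b_l)^*)
        return np.einsum("jiab,jlab->il", fa, fb.conj()) / fa.shape[0]
    kxx, kyy, kxy = gram(fx, fx), gram(fy, fy), gram(fx, fy)
    return float((kxx.mean() + kyy.mean() - 2 * kxy.mean()).real)

# Toy data (all sizes are illustrative assumptions).
rng = np.random.default_rng(0)
d, m, k, n, steps = 2, 3, 4, 6, 5
g = rng.normal(size=(k, d, m, m)) + 1j * rng.normal(size=(k, d, m, m))
m_list = (g - g.conj().transpose(0, 1, 3, 2)) / 2         # anti-Hermitian matrices
xs = np.cumsum(rng.normal(size=(n, steps, d)), axis=1) * 0.3
ys = np.cumsum(rng.normal(size=(n, steps, d)), axis=1) * 0.6
fx, fy = features(xs, m_list), features(ys, m_list)
print(epcfd_sq_direct(fx, fy), epcfd_sq_kernel(fx, fy))
```

Both routes need the same $n$ feature evaluations per map; the saving comes from replacing the $n^2$ pairwise traces with a single average.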

Proof.

By definition of PCFD, we have

$$
\begin{aligned}
{\rm PCFD}^{2}_{\mathcal{M}}(\mu,\nu) &= \mathbb{E}_{M\sim\mathbb{P}_{\mathcal{M}}}\left[\left\lVert\Phi_{\mathbf{X}}(M)-\Phi_{\mathbf{Y}}(M)\right\rVert_{\rm HS}^{2}\right] \\
&= \mathbb{E}_{M\sim\mathbb{P}_{\mathcal{M}}}\left[\|\Phi_{\mathbf{X}}(M)\|^{2}_{\rm HS}+\|\Phi_{\mathbf{Y}}(M)\|^{2}_{\rm HS}-2\langle\Phi_{\mathbf{X}}(M),\Phi_{\mathbf{Y}}(M)\rangle_{\rm HS}\right],
\end{aligned}
$$

where $\Phi_{\mathbf{X}}(M)=\mathbb{E}_{\mathbf{X}\sim\mu}[\mathcal{U}_{M}(\mathbf{X})]$ and $\Phi_{\mathbf{Y}}(M)=\mathbb{E}_{\mathbf{Y}\sim\nu}[\mathcal{U}_{M}(\mathbf{Y})]$, respectively. Using Fubini's theorem and observing that $\langle\Phi_{\mathbf{X}}(M),\Phi_{\mathbf{Y}}(M)\rangle_{\rm HS}\in L^{2}(\mathbb{P}_{\mathcal{M}})$ (as $\mathcal{U}_{M}(\mathbf{X})$ and $\mathcal{U}_{M}(\mathbf{Y})$ are $U(m)$-valued and $U(m)$ is a compact Lie group under the Hilbert–Schmidt metric, their expectations $\Phi_{\mathbf{X}}(M)$ and $\Phi_{\mathbf{Y}}(M)$ indeed lie in $L^{\infty}\left({\mathbb{C}}^{m\times m};\mathbb{P}_{\mathcal{M}}\right)$), we deduce that

$$
\mathbb{E}_{M\sim\mathbb{P}_{\mathcal{M}}}\left[\langle\Phi_{\mathbf{X}}(M),\Phi_{\mathbf{Y}}(M)\rangle_{\rm HS}\right]=\mathbb{E}_{\mathbf{X}\sim\mu}\left[\mathbb{E}_{\mathbf{Y}\sim\nu}\left[\mathbb{E}_{M\sim\mathbb{P}_{\mathcal{M}}}\left[\langle\mathcal{U}_{M}(\mathbf{X}),\mathcal{U}_{M}(\mathbf{Y})\rangle_{\rm HS}\right]\right]\right].
$$

The first equality then follows from the identification $\kappa(\mathbf{x},\mathbf{y})=\mathbb{E}_{M\sim\mathbb{P}_{\mathcal{M}}}[\langle\mathcal{U}_{M}(\mathbf{x}),\mathcal{U}_{M}(\mathbf{y})\rangle_{\rm HS}]$ and the definition of ${\rm MMD}_{\kappa}$.

On the other hand, by Lemma A.5 and the definition of the Hilbert–Schmidt inner product on $U(m)$, one may rewrite the kernel function as follows:

$$
\begin{aligned}
\kappa(\mathbf{x},\mathbf{y}) &= \mathbb{E}_{M\sim\mathbb{P}_{\mathcal{M}}}\left[\langle\mathcal{U}_{M}(\mathbf{x}),\mathcal{U}_{M}(\mathbf{y})\rangle_{\rm HS}\right] \\
&= \mathbb{E}_{M\sim\mathbb{P}_{\mathcal{M}}}\left[{\rm tr}\left(\mathcal{U}_{M}(\mathbf{x})\cdot\mathcal{U}^{-1}_{M}(\mathbf{y})\right)\right]=\mathbb{E}_{M\sim\mathbb{P}_{\mathcal{M}}}\left[{\rm tr}\left(\mathcal{U}_{M}\left(\mathbf{x}\star\overleftarrow{\mathbf{y}}\right)\right)\right],
\end{aligned}
$$

where $\star$ denotes the concatenation of paths. The second equality now follows. ∎
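The last step of the proof can be checked numerically on discrete paths. The sketch below is illustrative only: it assumes piecewise-linear paths, for which the unitary feature is an ordered product of matrix exponentials of the increments, and the helpers `reverse` and `concatenate` are hypothetical names for $\overleftarrow{\mathbf{y}}$ and $\star$ (the second path is translated to start where the first one ends).

```python
import numpy as np

def expm_antihermitian(a):
    """exp(a) for anti-Hermitian a, computed from the Hermitian matrix 1j * a."""
    w, v = np.linalg.eigh(1j * a)
    return (v * np.exp(-1j * w)) @ v.conj().T

def unitary_feature(path, ms):
    """U_M(path) for a piecewise-linear path (T, d); ms has shape (d, m, m)."""
    u = np.eye(ms.shape[-1], dtype=complex)
    for dx in np.diff(path, axis=0):
        u = u @ expm_antihermitian(np.tensordot(dx, ms, axes=1))
    return u

def reverse(path):
    """The path run backwards in time."""
    return path[::-1]

def concatenate(x, y):
    """x followed by y, with y translated so that it starts where x ends."""
    return np.concatenate([x, x[-1] + (y[1:] - y[0])])

rng = np.random.default_rng(1)
d, m = 2, 3
g = rng.normal(size=(d, m, m)) + 1j * rng.normal(size=(d, m, m))
ms = (g - g.conj().transpose(0, 2, 1)) / 2               # d anti-Hermitian matrices
x = np.cumsum(rng.normal(size=(6, d)), axis=0) * 0.4
y = np.cumsum(rng.normal(size=(4, d)), axis=0) * 0.4

# tr(U(x) U(y)^{-1}); for a unitary matrix the inverse is the conjugate transpose.
lhs = np.trace(unitary_feature(x, ms) @ unitary_feature(y, ms).conj().T)
# tr(U(x * reversed y))
rhs = np.trace(unitary_feature(concatenate(x, reverse(y)), ms))
# Concatenating a path with its own reversal gives the identity matrix.
self_trace = np.trace(unitary_feature(concatenate(x, reverse(x)), ms))
print(lhs, rhs, self_trace)
```

In this discrete setting, reversing $\mathbf{y}$ negates its increments and reverses their order, so its feature is exactly the inverse of $\mathcal{U}_{M}(\mathbf{y})$; for $\mathbf{y}=\mathbf{x}$ the trace reduces to ${\rm tr}(I_{m})=m$.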

B.4 Empirical PCFD

B.4.1 Initialisation of $\mathcal{M}$

A linear map $M\in{\mathcal{L}}\left({\mathbb{R}}^{d},\mathfrak{u}(m)\right)$ can be canonically represented by $d$ independent anti-Hermitian matrices $M_{1},\ldots,M_{d}\in\mathfrak{u}(m)\subset\mathbb{C}^{m\times m}$. To sample the empirical distribution of $\mathcal{M}\in\mathcal{P}\left[{\mathcal{L}}\left({\mathbb{R}}^{d},\mathfrak{u}(m)\right)\right]$ from $\mathbb{P}_{\mathcal{M}}$, we propose a sampling scheme over $\mathfrak{u}(m)$. This can also be used as an effective initialisation of the model parameters $\theta_{M}\in\mathfrak{u}(m)^{d\times k}$ for the empirical measure of $\mathcal{M}$.

In practice, when working with the Lie algebra $\mathfrak{u}(m)$, i.e., the vector space of $m\times m$ complex-valued matrices that are anti-Hermitian ($A^{*}+A=0$, where $A^{*}$ is the conjugate transpose of $A$), we view each anti-Hermitian matrix as a $2m\times 2m$ real matrix via the standard ${\mathbb{R}}$-linear embedding ${\mathbb{C}}^{m\times m}\hookrightarrow{\mathbb{R}}^{2m\times 2m}$.

Under the above identification, we have the decomposition

$$
\mathfrak{u}(m)\cong\mathfrak{o}(m)\oplus\sqrt{-1}\left({\rm Sym}_{m\times m}/\mathfrak{z}(m)\right)\oplus\sqrt{-1}\,\mathfrak{z}(m), \tag{18}
$$

where $\mathfrak{o}(m)$ is the Lie algebra of anti-symmetric $m\times m$ real matrices, ${\rm Sym}_{m\times m}$ is the space of $m\times m$ real symmetric matrices, $\mathfrak{z}(m)$ consists of $m\times m$ real diagonal matrices and ${\rm Sym}_{m\times m}/\mathfrak{z}(m)$ denotes the quotient space of real symmetric matrices by the real diagonal matrices.

The sampling procedure for $\mathbb{P}_{\mathcal{M}}$ is as follows. First, we simulate independent, identically distributed $\mathbb{R}^{m\times m}$-valued random variables $A$ and $B$, whose entries are i.i.d. and follow a pre-specified distribution in $\mathcal{P}(\mathbb{R})$. We decompose $B=D\oplus E$, where $D$ and $E$ are a diagonal random matrix and an off-diagonal random matrix, respectively. Then we construct the anti-symmetric matrix $R=\frac{1}{\sqrt{2}}(A^{T}-A)$, the matrix $C=\frac{1}{\sqrt{2}}(E^{T}+E)$ in the quotient space ${\rm Sym}_{m\times m}/\mathfrak{z}(m)$, and the diagonal matrix $D$. Correspondingly, we simulate $\mathfrak{u}(m)$-valued random variables by virtue of Eq. (18). As the empirical measure of $\mathcal{M}$ is fully characterised by the model parameters $\theta_{M}\in\mathfrak{u}(m)^{d\times k}$, we draw $d\times k$ i.i.d. samples taking values in $\mathfrak{u}(m)$.
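The sampling scheme can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the released code: the pre-specified entry distribution is taken to be the standard normal, the three components are assumed to be combined as $R+\sqrt{-1}\,(C+D)$ following Eq. (18), and the function names `sample_antihermitian` and `init_theta` are hypothetical.

```python
import numpy as np

def sample_antihermitian(m, rng):
    """Draw one element of u(m) from i.i.d. real entries (standard normal assumed)."""
    a = rng.standard_normal((m, m))
    b = rng.standard_normal((m, m))
    d = np.diag(np.diag(b))             # diagonal part D of B
    e = b - d                           # off-diagonal part E of B
    r = (a.T - a) / np.sqrt(2)          # anti-symmetric real matrix R
    c = (e.T + e) / np.sqrt(2)          # symmetric matrix C with zero diagonal
    # Assumed combination via Eq. (18): real part R, imaginary part C + D.
    return r + 1j * (c + d)

def init_theta(d, k, m, seed=0):
    """d x k i.i.d. samples in u(m), returned with shape (d, k, m, m)."""
    rng = np.random.default_rng(seed)
    return np.array([[sample_antihermitian(m, rng) for _ in range(k)]
                     for _ in range(d)])

theta = init_theta(d=3, k=4, m=5)
# Anti-Hermitian check: theta^* + theta = 0 for every sampled matrix.
print(np.abs(theta + theta.conj().transpose(0, 1, 3, 2)).max())
```

With this combination the real part is anti-symmetric and the imaginary part is symmetric, which is exactly the condition $A^{*}+A=0$.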

B.4.2 Hypothesis test

In the following, we illustrate the efficacy of the proposed trainable EPCFD metric in the context of the hypothesis test on stochastic processes.

Example B.12 (Hypothesis testing on fractional Brownian motion).

Consider the 3-dimensional Brownian motion $\mathbf{B}:=(B_{t})_{t\in[0,T]}$ and the fractional Brownian motion $\mathbf{B}^{h}:=(B^{h}_{t})_{t\in[0,T]}$ with Hurst parameter $h$. We simulated 5000 sample paths for both $\mathbf{B}$ and $\mathbf{B}^{h}$ with 50 discretized time steps. We apply the proposed optimized EPCFD metric to the two-sample testing problem: the null hypothesis $H_{0}:\mathbf{B}\stackrel{\text{d}}{=}\mathbf{B}^{h}$ against the alternative $H_{1}:\mathbf{B}\stackrel{\text{d}}{\neq}\mathbf{B}^{h}$. We compare the optimized EPCFD metric with the EPCFD metric under the prespecified distribution (PCF) and with the characteristic function distance (CF) on the flattened time series [25]. The optimized PCFs are trained on a separate set of 5000 training samples to maximise the PCFD. The details of training can be found in Section C.2.

We conduct the permutation test to compute the power of the test (i.e., the probability of correctly rejecting the null $H_{0}$) and the Type I error (i.e., the probability of falsely rejecting the null $H_{0}$ when it is true) for varying $h\in\{0.2+0.1\cdot i\}_{i=0}^{6}$. Note that when $h=0.5$, $\mathbf{B}$ and $\mathbf{B}^{h}$ have the same distribution and hence are indistinguishable. Therefore, the better the test metric, the closer the test power should be to 0 when $h$ is close to $0.5$ and to 1 when $h$ is far from $0.5$. We refer to [22] for more in-depth information on hypothesis testing and permutation test statistics.
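The permutation test itself is generic and can be sketched independently of the metric. In the minimal example below, the statistic `mean_gap` is only a stand-in for EPCFD, the Gaussian samples are a stand-in for the path data, and the function names, the significance level 0.05 and the numbers of trials and permutations are all assumptions chosen for illustration.

```python
import numpy as np

def permutation_test(x, y, statistic, n_perm=200, seed=0):
    """p-value of a two-sample test: how often a random relabelling of the
    pooled samples gives a statistic at least as large as the observed one."""
    rng = np.random.default_rng(seed)
    observed = statistic(x, y)
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        xp, yp = pooled[perm[:len(x)]], pooled[perm[len(x):]]
        count += statistic(xp, yp) >= observed
    return (1 + count) / (1 + n_perm)

def mean_gap(x, y):
    """Stand-in statistic (distance between sample means), not EPCFD."""
    return float(np.linalg.norm(x.mean(axis=0) - y.mean(axis=0)))

def rejection_rate(draw_x, draw_y, statistic, alpha=0.05, n_trials=20,
                   n_perm=100, seed=0):
    """Fraction of independent trials in which H0 is rejected at level alpha.
    This estimates the power if H0 is false and the Type I error if H0 is true."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for trial in range(n_trials):
        p = permutation_test(draw_x(rng), draw_y(rng), statistic,
                             n_perm=n_perm, seed=trial)
        rejections += p <= alpha
    return rejections / n_trials

same = lambda rng: rng.normal(size=(50, 3))
shifted = lambda rng: rng.normal(size=(50, 3)) + 1.0
type1_error = rejection_rate(same, same, mean_gap)       # H0 holds
power = rejection_rate(same, shifted, mean_gap)          # H0 fails
print(type1_error, power)
```

Replacing `mean_gap` with a function that evaluates a path distance on the two samples gives the kind of comparison described above.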

The plot of the test power and Type I error in Figure 6 shows that CF fails in the two-sample tests, whilst both EPCFD and optimised EPCFD can distinguish the samples from the stochastic process when $h\neq 0.5$. It indicates that the EPCFD captures the distribution of time series much more effectively than the conventional CF metric. Moreover, optimization of EPCFD increases the test power while decreasing the Type I error, particularly when $h$ is closer to $0.5$.

Figure 6: Plots of the test power (Left) and the Type I error (Right) against the Hurst parameter $h\in[0.2,0.8]$ on the two-sample tests for the Brownian motion $\mathbf{B}$ against fractional Brownian motions ($\mathbf{B}^{h}$) by using three metrics, i.e., PCFD, optimized EPCFD and CFD.

Appendix C Numerical experiments

C.1 Experimental detail on PCF-GAN

C.1.1 General notes

Code. The code for reproducing all experiments can be found at https://github.com/DeepIntoStreams/PCF-GAN.

Software. We conducted all experiments using PyTorch 1.13.1 [34] and performed hyperparameter tuning with Wandb [5]. To ensure reproducibility, we implemented benchmark models based on open-source code from [45, 43, 12]. We used the Ksig library [42] to calculate the Sig-MMD metrics. The code from [25] was used to compute the characteristic function distance in Example B.12.

Computing infrastructure. The experiments were performed on a computational system running Ubuntu 22.04.2 LTS, comprising three Quadro RTX 8000 and two RTX A6000 GPUs. Each experiment was run independently on a single GPU, with the training phase taking between 6 hours and 3 days, depending on the dataset and models used.

Architectures. To ensure a fair comparison, we employed identical network architectures, with two layers of LSTMs having 32 hidden units, for both the generator and discriminator across all models. For the generator, the output of the LSTM (full sequence) was passed through a Tanh activation function and a linear output layer. All generative models take a multi-dimensional discretized Brownian motion as the noise distribution, scaled so that its values lie within the range $[-1,1]$. The dimension and scaling factor vary with the dataset and are specified in the individual sections below.

The PCF-GAN uses the development layers on the unitary matrix [26] to calculate the PCFD. For all experiments, we fixed the unitary matrix size and the coefficient $\lambda_{2}$ of the regularization loss to 10 and 1, respectively. The number of unitary linear maps and the coefficient $\lambda_{1}$ of the recovery loss were determined via hyper-parameter tuning and vary with the dataset (see the individual sections for details).

For TimeGAN, we followed the approach described in [45] and employed embedding, supervisor, and recovery modules. Each of these modules had two layers of LSTMs with 32 hidden units. For COT-GAN, we used two separate modules for discriminators, each with two layers of LSTMs with 32 hidden units. Based on the recommendation from COT-GAN [43] and informal hyperparameter tuning, we set $\lambda=10$ and $\epsilon=1$ for all experiments.

Optimisation & training. We used the Adam optimizer for all experiments [20], with a learning rate of 0.001 for both generators and discriminators. The learning rate for the unitary development network is 0.005. The initial decay rates in the Adam optimizer are set to $\beta_{1}=0$, $\beta_{2}=0.9$. The discriminator was trained for two iterations per iteration of the generator's training. For TimeGAN, we followed the training scheme for each module as suggested in the original paper. The batch size was 64 for all experiments. These hyperparameters do not substantially affect the results.

To improve the training stability of the GAN, we employed three techniques. Firstly, we applied an exponential decay of 0.97 to the learning rate every 500 generator training iterations. Secondly, we clipped the norm of gradients in both generator and discriminator to 10. Thirdly, we used the Cesàro mean of the generator weights after certain iterations to improve the performance of the final model, as suggested by [44]. In all cases, we selected the number of training iterations such that all methods could produce stable generative samples. The optimal number of training iterations and weight averaging scheme varied for each dataset. More details can be found in the respective sections.
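The three stabilisation techniques are simple to state in code. The following framework-agnostic NumPy sketch shows one possible reading of them; the names `learning_rate`, `clip_by_norm` and `CesaroMean` are hypothetical, the decay is assumed to be applied stepwise every 500 iterations, the clipping is assumed to act on the global gradient norm, and weights are represented as flat arrays for simplicity.

```python
import numpy as np

def learning_rate(iteration, base_lr=1e-3, decay=0.97, every=500):
    """Stepwise exponential decay: multiply by `decay` once per `every` iterations."""
    return base_lr * decay ** (iteration // every)

def clip_by_norm(grads, max_norm=10.0):
    """Rescale a list of gradient arrays so that their global norm is at most max_norm."""
    total = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads]

class CesaroMean:
    """Running arithmetic mean of the weight vectors seen so far."""
    def __init__(self):
        self.count = 0
        self.mean = None

    def update(self, weights):
        self.count += 1
        if self.mean is None:
            self.mean = np.array(weights, dtype=float)
        else:
            self.mean += (weights - self.mean) / self.count
        return self.mean

# Toy usage: average the weights over the final iterations only.
averager = CesaroMean()
for step, w in enumerate([np.array([1.0]), np.array([2.0]), np.array([3.0])]):
    averaged = averager.update(w)
clipped = clip_by_norm([np.full(4, 10.0), np.full(4, 10.0)])
print(learning_rate(0), learning_rate(500), averaged, clipped[0])
```

In a PyTorch training loop, the same roles would typically be played by a learning-rate scheduler, a gradient-clipping utility and an averaged copy of the generator parameters.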

Test metrics. Discriminative score. The network architecture of the post-hoc classifier consists of two layers of LSTMs with 16 hidden units. The dataset was split into equal proportions of real and generated time series with labels 0 and 1, with an 80%/20% train/test split for training and evaluation. The discriminative model was trained for 30 epochs using Adam with a learning rate of 0.001 and a batch size of 64. The best classification error on the test set was reported.

Predictive score. The network architecture of the post-hoc sequence-to-sequence regressor consists of two layers of LSTMs with 16 hidden units. The model was trained on the generated time series and evaluated on the real time series, using the first 80% of the time series to predict the last 20%. The predictive model was trained for 50 epochs using Adam with a learning rate of 0.001 and a batch size of 64. The best mean squared error on the test set was reported.

Sig-MMD. We computed the Sig-MMD directly from the real and generated time series samples. We used the radial basis function kernel applied to the truncated signature features up to depth 5.

C.2 Time dependent Ornstein-Uhlenbeck process

On this dataset, we experimented with the basic version of PCF-GAN, which only utilized the EPCFD as the discriminator without the autoencoder structure. The batch size was 256. The model was trained with 20000 generator training iterations, and weight averaging on the generator was performed over the final 4000 generator training iterations. We used the 2-dimensional discretized Brownian motion as the noise distribution.

C.2.1 Rough volatility model

Following [31], we consider a rough stochastic volatility model for an asset price process $(S_{t})_{t\in[0,1]}$, which satisfies the stochastic differential equation

$$
\begin{aligned}
dS_{t} &= \sqrt{V_{t}}\,S_{t}\,dZ_{t}, && (19)\\
V_{t} &:= \xi(t)\exp\left(\eta B_{t}^{H}-\frac{1}{2}\eta^{2}t^{2H}\right), && (20)
\end{aligned}
$$

where $\xi(t)$ denotes the forward variance and $B_{t}^{H}$ denotes the fractional Brownian motion (fBM) given by

$$
B_{t}^{H}:=\int^{t}_{0}K(t-s)\,dB_{s},\quad K(r):=\sqrt{2H}\,r^{H-0.5},
$$

where $(Z_{t})_{t\in[0,1]}$ and $(B_{t})_{t\in[0,1]}$ are (possibly correlated) Brownian motions. In our experiments, the synthetic dataset is sampled from Equation (19) with $t\in[0,1]$, $H=0.25$, $\xi(t)\sim\mathcal{N}(0.1,0.01)$, $\eta=0.5$ and initial condition $\log(S_{0})\sim\mathcal{N}(0,0.05)$. Each sample path is sampled on a uniform grid over $[0,1]$ with time discretization $\delta t=0.005$, giving 200 time steps. We train the generators to learn the joint distribution of the log price and log volatility.
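A rough simulation of Equations (19) and (20) can be written with a left-point Riemann sum for the Volterra integral and an Euler step for the log price. The sketch below is not the data-generation code of the paper and makes several labelled assumptions: the second arguments of the normal distributions are read as standard deviations, $\xi$ is drawn once per path and held constant in time (and floored at a small positive value), the correlation `rho` between the two Brownian motions is a free parameter set to 0, and the function name `simulate_rough_vol` is hypothetical.

```python
import numpy as np

def simulate_rough_vol(n_paths, n_steps=200, dt=0.005, hurst=0.25, eta=0.5,
                       rho=0.0, xi_mean=0.1, xi_std=0.01, s0_std=0.05, seed=0):
    """Return an array (n_paths, n_steps + 1, 2) holding (log S_t, log V_t)."""
    rng = np.random.default_rng(seed)
    t = dt * np.arange(n_steps + 1)                        # grid t_0, ..., t_n
    db = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
    dw = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
    dz = rho * db + np.sqrt(1.0 - rho ** 2) * dw           # assumed correlation

    # Left-point Riemann sum for B^H_{t_i} = int_0^{t_i} K(t_i - s) dB_s.
    lag = t[1:, None] - t[None, :-1]                       # lag[i-1, j] = t_i - t_j
    kernel = np.tril(np.sqrt(2 * hurst) * np.maximum(lag, dt) ** (hurst - 0.5))
    b_h = np.concatenate([np.zeros((n_paths, 1)), db @ kernel.T], axis=1)

    # Assumption: one forward-variance level per path, kept strictly positive.
    xi = np.maximum(rng.normal(xi_mean, xi_std, size=(n_paths, 1)), 1e-4)
    v = xi * np.exp(eta * b_h - 0.5 * eta ** 2 * t ** (2 * hurst))

    # Euler scheme for d log S = -0.5 V dt + sqrt(V) dZ.
    log_s0 = rng.normal(0.0, s0_std, size=(n_paths, 1))
    steps = -0.5 * v[:, :-1] * dt + np.sqrt(v[:, :-1]) * dz
    log_s = log_s0 + np.concatenate(
        [np.zeros((n_paths, 1)), np.cumsum(steps, axis=1)], axis=1)
    return np.stack([log_s, np.log(v)], axis=-1)

paths = simulate_rough_vol(n_paths=4)
print(paths.shape)
```

The array includes the initial time point, so it has one more entry along the time axis than the number of steps; dropping the first entry gives sequences of length 200.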

All methods were trained with 30000 generator training iterations, and weight averaging on the generator was performed over the final 5000 generator training iterations. The input noise vectors have 5 dimensions and 200 time steps.

For PCF-GAN, the coefficient $\lambda_{1}$ for the recovery loss was 50, and the number of unitary linear maps was 6.

C.2.2 Stocks

We selected 10 large market cap stocks, which are Google, Apple, Amazon, Tesla, Meta, Microsoft, Nvidia, JP Morgan, Visa and P&G, from 2013 to 2021. The dataset consists of 5 features, including daily open, close, high, low prices and volume, available on https://finance.yahoo.com/lookup. We truncated the long stock time series into 20 days. The data were normalized with standard Min-Max normalisation on each feature channel. The Stock dataset used in our study is similar to the one employed in [25] but with a broader range of assets. Unlike the previous approach, we avoided sampling the time series using rolling windows with a stride of 1 to mitigate the presence of strong dependencies between samples.
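The preprocessing described here amounts to cutting each long series into short segments and rescaling every feature channel. A minimal sketch follows, with assumptions stated explicitly: the segments are taken to be non-overlapping (the text only states that a rolling window with stride 1 was avoided), the Min-Max statistics are computed per channel over the whole dataset, and the function names are hypothetical.

```python
import numpy as np

def non_overlapping_windows(series, length=20):
    """Cut a (T, channels) series into (T // length, length, channels) segments."""
    usable = (len(series) // length) * length              # drop the incomplete tail
    return series[:usable].reshape(-1, length, series.shape[1])

def min_max_normalise(windows):
    """Rescale each feature channel to [0, 1] using dataset-wide min and max."""
    lo = windows.min(axis=(0, 1), keepdims=True)
    hi = windows.max(axis=(0, 1), keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)                 # guard constant channels
    return (windows - lo) / span

# Toy series with 5 feature channels standing in for open/close/high/low/volume.
rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=(105, 5)), axis=0)
windows = min_max_normalise(non_overlapping_windows(series))
print(windows.shape, windows.min(), windows.max())
```

Using a stride equal to the window length removes the overlap between neighbouring samples, which is one way to reduce the strong dependence mentioned above.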

All methods were trained with 30000 generator training iterations, and weight averaging on the generator was performed over the final 5000 generator training iterations. The input noise vectors have 5 feature dimensions and 20 time steps.

For PCF-GAN, the coefficient $\lambda_{1}$ for the recovery loss was 400, and the number of unitary linear maps was 6.

C.2.3 Beijing Air Quality

We used a dataset of the air quality in Beijing from the UCI repository [47] and available on https://archive.ics.uci.edu/ml/datasets/Beijing+Multi-Site+Air-Quality+Data. Each sample is a 10-dimensional time series of the SO2, NO2, CO, O3, PM2.5, PM10 concentrations, temperature, pressure, dew point temperature and wind speed. Each time series is recorded hourly over the course of a day. The data were normalized with standard Min-Max normalisation on each feature channel.

All methods were trained with 20000 generator training iterations, and weight averaging on the generator was performed over the final 4000 generator training iterations. The input noise vectors have 5 dimensions and 24 time steps.

For PCF-GAN, the coefficient $\lambda_{1}$ for the recovery loss was 50, and the number of unitary linear maps was 6.

C.2.4 EEG

We obtained the EEG eye state dataset from https://archive.ics.uci.edu/ml/datasets/EEG+Eye+State. The data come from one continuous EEG measurement on 14 variables with 14980 time steps. We truncated the long time series into shorter ones with 20 time steps. We subtract the channel-wise mean from the data, divide by three times the channel-wise standard deviation, and then apply a tanh nonlinearity.
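The normalisation for this dataset can be written as a single transformation. The sketch below uses random data of the stated shape as a placeholder for the real recording; the function name `squash_channels` is hypothetical, and cutting the series into non-overlapping segments after normalising is an assumption about the order and stride of the two steps.

```python
import numpy as np

def squash_channels(x):
    """Centre each channel, scale by three standard deviations, then apply tanh."""
    mean = x.mean(axis=0, keepdims=True)
    std = x.std(axis=0, keepdims=True)
    return np.tanh((x - mean) / (3.0 * std))

# Placeholder for the recording: 14980 time steps, 14 channels.
rng = np.random.default_rng(0)
recording = rng.normal(loc=4000.0, scale=50.0, size=(14980, 14))

normalised = squash_channels(recording)
segments = normalised.reshape(-1, 20, 14)      # 14980 / 20 = 749 segments
print(segments.shape, normalised.min(), normalised.max())
```

Dividing by three standard deviations keeps most values in the near-linear region of tanh, while the tanh bounds outliers within $(-1,1)$, the same range as the generator's Tanh activation.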

All methods were trained with 30000 generator training iterations, and weight averaging on the generator was performed over the final 5000 generator training iterations. The input noise vectors have 8 dimensions and 20 time steps.

For PCF-GAN, the coefficient $\lambda_{1}$ for the recovery loss was 50, and the number of unitary linear maps was 8.

Appendix D Supplementary results

D.1 Ablation study

An ablation study was conducted on the PCF-GAN model to evaluate the importance of its various components. Specifically, the reconstruction loss and regularization loss were disabled in order to assess their impact on model performance across benchmark datasets and various test metrics. Table 3 shows that the full PCF-GAN model outperformed or matched the ablated versions in most settings, supporting the significance of these two losses in the overall model performance.

Table 3: Ablation study of PCF-GAN
| Dataset | Test metric | PCF-GAN | w/o $L_{\text{recovery}}$ | w/o $L_{\text{regularization}}$ | w/o $L_{\text{regularization}}$ & $L_{\text{recovery}}$ |
|---|---|---|---|---|---|
| RV | Discriminative | .0108±.006 | .0178±.017 | .0152±.020 | .0101±.007 |
| RV | Predictive | .0390±.000 | .0389±.000 | .0390±.003 | .0391±.001 |
| RV | Sig-MMD | .0024±.001 | .0037±.001 | .0036±.002 | .0027±.001 |
| Stock | Discriminative | .0784±.028 | .0963±.011 | .2538±.052 | .0815±.001 |
| Stock | Predictive | .0125±.000 | .0123±.000 | .0127±.000 | .0126±.001 |
| Stock | Sig-MMD | .0017±.000 | .0062±.002 | .0024±.001 | .0021±.001 |
| Air | Discriminative | .2326±.058 | .3940±.068 | .4783±.029 | .3875±.009 |
| Air | Predictive | .0237±.000 | .0239±.000 | .0283±.001 | .0240±.000 |
| Air | Sig-MMD | .0126±.005 | .0111±.003 | .0232±.004 | .0163±.004 |
| EEG | Discriminative | .3660±.025 | .4942±.010 | .5000±.000 | .4649±.015 |
| EEG | Predictive | .0246±.000 | .0299±.000 | .0636±.007 | .0248±.000 |
| EEG | Sig-MMD | .0180±.004 | .0296±.008 | 1.197±.234 | .0278±.007 |

Notably, including the two additional losses significantly improved model performance on the high-dimensional time series datasets, Air Quality and EEG, indicating that the proposed auto-encoder architecture effectively learns meaningful low-dimensional sequential embeddings. Conversely, using the reconstruction loss alone led to a notable decrease in model performance, suggesting that the samplewise $l^{2}$ distance might not be suitable for time series data. The additional regularization loss mitigates this issue by ensuring that the sequential embedding space is confined to a predetermined noise space, such as the discretized Brownian motion.

D.2 Generated samples

In this section, we present random samples from the four benchmark datasets generated by PCF-GAN, TimeGAN, RGAN, and COT-GAN. Although sample plots of generated time series are difficult to interpret, we observe that PCF-GAN generates time series that capture the temporal dependencies exhibited in the original time series across all datasets. Conversely, COT-GAN generates trajectories that are smoother than the real time series samples, as seen on the Stock and EEG datasets in Figure 8 and Figure 10, respectively. Figure 10 also shows that TimeGAN occasionally produces samples with higher oscillations than those found in the real samples.

Figure 7: Generated samples from all models on Rough volatility dataset
Figure 8: Generated samples from all models on Stock dataset
Figure 9: Generated samples from all models on Air Quality dataset
Figure 10: Generated samples from all models on EEG dataset

D.3 Reconstructed samples

In this section, we present additional reconstructed time series samples generated by PCF-GAN and TimeGAN. Figure 11 illustrates that PCF-GAN consistently outperforms TimeGAN by producing higher-quality reconstructed samples across all datasets.

Figure 11: Reconstructed samples from PCF-GAN and TimeGAN on all benchmark datasets.

D.4 Test metrics on (auto-)correlation and marginal distribution

This subsection reports supplementary test metrics that measure how well each model fits the autocorrelation, cross-correlation, and marginal distribution of the real data, as presented in Table 4. The table shows that our proposed PCF-GAN achieves the best score on the majority of dataset-metric pairs.
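To make the autocorrelation metric concrete, the sketch below computes the absolute gap between the lag-k autocorrelations of real and generated paths. It is only one plausible form: the exact definition and the aggregation over channels used for Table 4 are not given in this section, and the function names `autocorr` and `autocorr_gap`, the pooling over paths, and the univariate setting are assumptions.

```python
from statistics import fmean


def autocorr(paths, lag):
    """Lag-`lag` autocorrelation pooled over a list of univariate paths."""
    values = [x for p in paths for x in p]
    mu = fmean(values)
    var = fmean([(x - mu) ** 2 for x in values])
    # Average product of centred values that are `lag` steps apart.
    cov = fmean(
        [(p[t] - mu) * (p[t + lag] - mu) for p in paths for t in range(len(p) - lag)]
    )
    return cov / var


def autocorr_gap(real, fake, lag):
    """Absolute difference between real and generated autocorrelations."""
    return abs(autocorr(real, lag) - autocorr(fake, lag))


# Illustrative data: one strictly alternating path and one slower pattern.
real = [[1.0, -1.0] * 10]
fake = [[1.0, 1.0, -1.0, -1.0] * 5]
gap = autocorr_gap(real, fake, lag=1)
```

A smaller gap means the generated paths reproduce the temporal dependence of the real paths more closely at that lag. A multichannel version would compare full correlation matrices instead of scalars.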

Table 4: Performance comparison of PCF-GAN and baselines on auto-correlation, cross-correlation and marginal distribution metrics. Best for each task is shown in bold.
Task: Generation

| Dataset | Test Metrics | RGAN | COT-GAN | TimeGAN | PCF-GAN |
| --- | --- | --- | --- | --- | --- |
| RV | Auto-cor (lag 1) | .0393±.001 | .0608±.001 | .0031±.001 | .0022±.000 |
| | Auto-cor (lag 5) | .0134±.002 | .119±.002 | .0035±.002 | .0030±.002 |
| | Cross-cor (lag 0) | .0193±.007 | .0234±.002 | .0187±.011 | .0264±.011 |
| | Cross-cor (lag 5) | .0222±.007 | .1441±.012 | .0219±.010 | .0158±.011 |
| | Marginal Dist | .311±1.13 | .2157±.306 | .1636±.223 | .1234±.126 |
| Stock | Auto-cor (lag 1) | .127±.005 | .202±.0035 | .210±.005 | .0123±.005 |
| | Auto-cor (lag 5) | .149±.009 | .267±.006 | .104±.006 | .0187±.006 |
| | Cross-cor (lag 0) | .145±.031 | .169±.041 | .549±.034 | .1815±.058 |
| | Cross-cor (lag 5) | .341±.031 | .456±.053 | .747±.038 | .2510±.062 |
| | Marginal Dist | .3276±.044 | .2826±.061 | .4264±.063 | .2730±.033 |
| Air | Auto-cor (lag 1) | .1678±.010 | .320±.006 | .1949±.006 | .0927±.003 |
| | Auto-cor (lag 5) | .3226±.016 | .520±.028 | .5349±.034 | .4739±.023 |
| | Cross-cor (lag 0) | 2.608±.106 | 1.942±.059 | 2.844±.0812 | 2.687±.149 |
| | Cross-cor (lag 5) | 3.181±.101 | 2.176±.116 | 2.536±.112 | 2.115±.121 |
| | Marginal Dist | .5527±.523 | .5142±.600 | .6229±.595 | .5066±.572 |
| EEG | Auto-cor (lag 1) | 5.918±.116 | 6.202±.111 | 5.754±.083 | 5.668±.079 |
| | Auto-cor (lag 5) | 4.285±.074 | 5.911±.107 | 5.265±.083 | 4.467±.127 |
| | Cross-cor (lag 0) | 51.16±.508 | 24.12±.702 | 26.84±.638 | 22.27±.550 |
| | Cross-cor (lag 5) | 47.97±.354 | 31.31±.920 | 25.95±.466 | 19.43±.412 |
| | Marginal Dist | 15.18±21.94 | 8.518±13.6 | 13.35±21.7 | 10.09±16.6 |