A Subsequence Approach to Topological Data Analysis for Irregularly-Spaced Time Series

Sixtus Dakurah
Department of Statistics, University of Wisconsin-Madison
and
Jessi Cisewski-Kehe¹¹footnotemark: 1
Department of Statistics, University of Wisconsin-Madison The authors were supported by NSF Grant DMS-2038556. Support for this research was also provided by the University of Wisconsin-Madison Office of the Vice Chancellor for Research and Graduate Education with funding from the Wisconsin Alumni Research Foundation. The authors thank Chris Geoga from the University of Wisconsin-Madison for useful discussion related to this work.

Abstract

A time-delay embedding (TDE), grounded in the framework of Takens’s Theorem, provides a mechanism to represent and analyze the inherent dynamics of time-series data. Recently, topological data analysis (TDA) methods have been applied to study this time series representation mainly through the lens of persistent homology. Current literature on the fusion of TDE and TDA are adept at analyzing uniformly-spaced time series observations. This work introduces a novel subsequence embedding method for irregularly-spaced time-series data. We show that this method preserves the original state space topology while reducing spurious homological features. Theoretical stability results and convergence properties of the proposed method in the presence of noise and varying levels of irregularity in the spacing of the time series are established. Numerical studies and an application to real data illustrates the performance of the proposed method.

Keywords: Persistent Homology, Time-Delay Embedding, State Space Reconstruction, Astrostatistics

1 Introduction

A time series measurement $x(t)\in\mathbb{R}$ at time $t$ can be considered as the outcome of a data-generating space of some dynamical system (i.e., mathematical models that describes the evolution of variables over time) with state vector $\mathbf{s}(t)\in\mathbb{R}^{N}$ . Constructing a meaningful approximation of this underlying data-generating space when only the scalar time series is observed can uncover latent patterns and structures not readily apparent in the raw time series. Time-delay embeddings (TDEs) are often employed for this state space reconstruction. The TDE transforms the time-series data from the time-domain to an estimate of the state space, which can reveal properties of the system such as periodicity and other structures not apparent in the time domain. The principle underlying TDEs is Takens’s Theorem, which asserts that even if the actual dynamics (i.e., the system’s behavior over time) are not known, a single time series can be treated as a one-dimensional projection of the path traced by the system’s state vector in a multi-dimensional space. An approximation to the actual dynamics can be constructed from this projection (Takens, 2006). Takens proved that assuming uniformly-spaced and noise-free measurements of unlimited length, there exists a diffeomorphism (i.e., a smooth and invertible function) between the true high-dimensional system and the (TDE) reconstructed system. This theorem forms the foundation for much of the discussions on reconstructing the multi-dimensional state of a system from a single time series (Ali et al., 2007).

More recently, there is renewed interest in coupling the TDE method with tools from topological data analysis (TDA) to study the underlying dynamics of time-series data using various geometric and topological features of the TDE such as clusters, loops, voids, and their higher dimensional analogs (Gholizadeh and Zadrozny, 2018). In particular, the reconstructed state space obtained from the TDE reveals consistent patterns and structures that reflect the shape and characteristics of the underlying time-series dynamics (Takens, 2006). TDA is a computational method for studying the shape of data, which can be applied to characterize the topological features of these reconstructed state spaces. The characterization is often carried out using persistent homology, a tool of TDA, which employs a multi-scale approach to quantify certain topological features (Edelsbrunner et al., 2000; Edelsbrunner and Harer, 2022) (see Section 2.1 for more details). TDA and TDE have been successfully applied to quantify periodicity in time-series data (Perea and Harer, 2015), analyze human speech (Brown and Knudson, 2009), detect motion patterns in video (Tralie and Perea, 2018), and in wheeze detection (Emrani et al., 2014).

In the applications noted above, the observed time series is uniformly-spaced. However, time series is often not uniformly-spaced due to measurement lapses (Stark et al., 1997), process errors (Casdagli et al., 1991), or inherent features of the data generating process (Stark et al., 1997; Lekscha and Donner, 2018), etc. The standard Takens’s theorem does not handle irregularly-spaced time series, but several options exist in the literature to address issues related to irregularly-spaced time series observations to make it amenable to TDE. Broadly, these can be classified into imputation or exclusion methods. Imputation methods involve predicting the missing observations, and then the analysis is carried out assuming a uniformly-spaced time series has been observed (Harvey and Pierse, 1984; Casdagli et al., 1991; Lekscha and Donner, 2018). Exclusion methods initially ignore the presence of missing values and assume a uniformly-spaced set. The TDE maps are then constructed and any map with a missing value is excluded (Boker et al., 2018; Johnson and Munch, 2022). If the imputation model is misspecified, it can produce structures in the TDE that do not reflect true properties of the data, and the exclusion method can significantly alter the shape of the TDE space (Huke and Broomhead, 2007; Boker et al., 2018). Since TDA can provide quantification of qualitative properties of the reconstructed state space, the drawbacks of the imputation and exclusion methods may distort topological features constructed from the TDE spaces.

In this work, we propose a subsequence method for carrying out a TDE of irregularly-spaced time-series data that preserves certain properties of the reconstructed state space. The level of irregularity of the time-series data is controlled by the regularity score (defined in Section 3.3). We show that the proposed method preserves the topological features of the original underlying state space of the time series while reducing spurious shape features. Theoretically, we prove stability and convergence results of the proposed subsequence method in the presence of noise and for varying levels of irregularity in the observed time series. Further, we demonstrate the competitiveness of the proposed subsequence method through simulation studies and an application to real data. We view this work as laying a necessary foundation for new statistical TDA methodology that has the potential for characterizing properties of time-series signals beyond period estimation; this point is expanded upon in the Discussion Section of the supplementary material (Section S2) included at the end of this article after the references.

2 Background on Topological Data Analysis

One of the main constructs for studying the topological features of data is persistent homology (Edelsbrunner et al., 2000; Edelsbrunner and Harer, 2022), which detects different dimensional holes in data. Persistent homology provides a multi-scale characterization of the homological features (holes) of a topological space by keeping track of when these topological features first appear (birth) and when they disappear (death) in a filtration. The main topological object in this work is a point cloud resulting from the $(M+1)$ -dimensional TDE space of time-series data. The following provide an introduction to persistent homology that are used in the proposed method.

2.1 Vietoris-Rips Filtration and Persistent Homology

Homology is an area of mathematics that looks for holes in a topological space, and persistent homology looks for holes in data. Topological holes are identified using ideas from algebraic topology and are characterized by different dimensional homology groups (Hatcher et al., 2002; Edelsbrunner and Harer, 2022). The zero-dimensional homology group ( ${H}_{0}$ ) contains connected components (clusters), the one-dimensional homology group ( ${H}_{1}$ ) contains loops, the two-dimensional homology group ( $H_{2}$ ) contains voids like the interior of a balloon, and more generally, the k-dimensional homology group ( ${H}_{k}$ ) can be thought of as representing k-dimensional holes.

The data for which we use persistent homology in the proposed method are point clouds derived from a timeseries. If we only computed the homology of the point-cloud data, it would not be useful because the data points are not connected resulting in only $H_{0}$ features. Instead, an intermediate structure is defined with simplicial complexes. A zero-simplex is a point, a one-simplex is a segment, a two-simplex is a triangle, and, more generally, a $k$ -simplex $\sigma=(v_{0},\cdots,v_{k})$ is a $k$ -dimensional polytope of $k+1$ affinely independent points (i.e., zero-simplexes) $v_{0},\cdots,v_{k}$ . A simplicial complex ${K}$ is a finite set of simplices such that for any $\sigma_{1},\sigma_{2}\in{K}$ , $\sigma_{1}\cap\sigma_{2}$ is a face (i.e., the convex hull of any non-empty subset of points that define the simplex) of both simplices, or the empty set; and a face of any simplex $\sigma\in{K}$ is also a simplex in ${K}$ . To compute the homology, simplicial complexes are built along a sequence of filtration values, which is discussed next.

A common approach to constructing simplicial complexes in TDA is the Vietoris-Rips (VR) complex (Vietoris, 1927; Edelsbrunner and Harer, 2022). A VR complex is constructed over a finite set of points $S=\{v_{1},v_{2},\cdots,v_{n}\}$ using a distance parameter $\delta\geq 0$ . For any subset of k-points $\{v_{i_{1}},\cdots,v_{i_{k}}\}$ , a $(k-1)$ -dimensional simplex is formed when the pairwise Euclidean distance between all points is at most $\delta$ . A collection of all such simplices forms the VR complex denoted as $VR(S,\delta)$ . The composition of the VR complex progresses hierarchically as $\delta$ increases. This leads to the concept of filtration, which defines an inclusion relation between the simplicial complexes for a set of $\delta$ values. More formally, for an ordered sequence of $\delta$ values: $0<\delta_{1}<\delta_{2}<\cdots<\delta_{q}<\infty$ , the VR complexes admits a nested structure, $VR(S,0)\subset VR(S,\delta_{1})\subset\cdots\subset VR(S,\delta_{q})\subset VR% (S,\infty).$ Figure 1 provides an illustration of a VR complex. The black points (zero-simplices) in Figure 1(a) and Figure 1(b) denotes the data with cyan balls of diameter $\delta=0.8$ and $\delta=1.5$ , respectively, along with the resulting one-simplices and two-simplices. The simplicial complex in Figure 1(a) is a subset of the simplicial complex displayed in Figure 1(b).

Refer to caption — (a) Point set with $\delta=0.8$

The inclusion relation between the VR complexes induces a map between the homology groups, ${H}_{k}(VR(S,0))\xrightarrow{}{H}_{k}(VR(S,\delta_{1}))\xrightarrow{}\cdots% \xrightarrow{}{H}_{k}(VR(S,\infty)).$ The notion of persistent homology is developed through these homology maps by tracking the changes in the generators (i.e., the different dimensional holes) of these nested homology groups. The birth time and death time of homology group generators along this sequence encodes topological changes in the groups. For a homology group ${H}_{k}$ , we denote the birth time and death time of the $j$ -th homology group generator (i.e., the $j$ -th hole) by $b_{j}$ and $d_{j}$ , respectively. The persistence, or lifetime, of the generator is given by $d_{j}-b_{j}$ , and a longer persistence is often considered to be topological signal while shorter persistence often represents topological noise. If we let $k_{j}$ be the homology group dimension of the $j$ -th generator, and ${J}$ the index set of the generators of the homology groups, then the set $\text{Dgm}(S)=\{(b_{j},d_{j},k_{j}):\forall j\in J\}\cup\Delta$ fully characterizes the persistence (life span) of the homology generators, and is used to construct a graphical summary referred to as the persistence diagram; see example in Figure 1(c). The $\Delta$ represents points where the birth time is equal to the death time, and it included in the definition of persistence diagrams for technical reasons (Mileyko et al., 2011).

A persistence diagram is a summary statistic that characterizes the birth and death times of different dimensional holes detected in a dataset. These diagrams can be useful on their own for visualizing topological information of complex data, but are often used for inference or prediction tasks (e.g., Fasy et al., 2014; Bubenik et al., 2015; Adams et al., 2017; Robinson and Turner, 2017; Xu et al., 2019; Berry et al., 2020; Pun et al., 2022; Glenn et al., 2024). In this work, persistence diagrams are used to characterize the TDE from time series and to quantify the performance of the proposed and comparison methods for reconstructing the state space of irregularly-spaced time-series data.

3 Method

The persistence of topological features forms the basis for most of our analysis on the TDE reconstructed state spaces. A TDE transforms a time series over a sliding window of size $M\tau$ to a point cloud in $\mathbb{R}^{M+1}$ . While Takens’s theorem guarantees exact reconstruction for uniformly-spaced time-series data with appropriate $M$ and $\tau$ , no such guarantees exist for irregularly-spaced time series. Here, we propose a new reconstruction method for irregularly-spaced time series that better preserves the topological features of the reconstructed state space compared to existing methods.

3.1 A Note on Terminology

For this work, the term uniformly-spaced time series is used to describe time series that have equally-spaced time intervals between successive observations, and is considered the “true” times series for purposes of evaluating the proposed method. The term irregularly-spaced time series refers to observations with unequally-spaced time interval between successive observations. To characterize how the two forms of time series are related, it is assumed throughout this work that the irregularly-spaced time series is a subset of the uniformly-spaced time series. Hence the irregularly-spaced time series always have fewer time measurements than the corresponding uniformly-spaced time series.

The concept of TDE is also referred to in the literature as a delayed coordinate embedding, a sliding window embedding, or simply a Takens embedding. For this work, only the term TDE is used. For any irregularly-spaced time series, it is assumed there is a “true” underlying uniformly-spaced time series. The use of “TDE” exclusively refers to an embedding constructed from this true underlying uniformly-spaced time series. After applying the proposed subsequence method, the resulting embedding is referred as the subsequence embedding (SSE). In instances where an exposition applies to both the TDE and SSE, the term embedding map is used as a collective reference to the two concepts.

For a given time interval, it is assumed that missing or unobserved values occurs with a given probability. That is, for a given time point in a time interval, a measurement is not observed at that point with some probability. Such probabilistic mechanism governing the observations of time series values is not uncommon in the literature (e.g., Dunsmuir and Robinson, 1981). This probability can be fixed for all time points or it can vary for each time point. This characterization is referred to as the missingness structure of the time series in context.

3.2 Basics of Time-Delay Embeddings

For this work, the discussion on TDEs is restricted to univariate time series. Assume this univariate time series is generated by a system with a state vector $\mathbf{s}(t)$ on a manifold which is a subset of some $N$ -dimensional space $\mathbb{R}^{N}$ . The state vector $\mathbf{s}(t)$ is not directly observable, however some measurement of it, denoted $x(t)=h(\mathbf{s}(t))$ , is observed through the measurement function $h(\cdot)$ . The measurement function $h(\cdot)$ can be thought of as rule that transforms the high-dimensional state vector into the observed univariate time series $x(t)$ . For instance, in astronomy, $\mathbf{s}(t)$ might include variables such as positions, velocities, and luminosities of various celestial bodies, such as exoplanets, stars, or galaxies. However, the measurement function is specifically designed to extract a single scalar value from this vector. The specific form of $h(\cdot)$ is influenced by many considerations, for example, the limitation of observational tools. For a star, the measurement function could be designed to extract a key observable from the state vector, such as its luminosity. Thus the measurement $h(\mathbf{s}(t))$ reflects the observed brightness of the star at any given time $t$ .

The scalar value $x(t)$ is the observed time series measurement. Define the function $F:\mathbb{R}^{N}\xrightarrow{}\mathbb{R}^{M+1}$ as the embedding map with the form $F(\mathbf{s}(t))=\left[x(t),x(t+\tau),\cdots,x(t+M\tau)\right]$ . If the measurement function $h(\cdot)$ is noise-free, and the embedding dimension $M+1$ is chosen to be more than twice the dimension of the attractor (i.e., the $N$ -dimensional region toward which the system evolves) of the system’s state space, Takens’s theorem guarantees that the embedding map $F(\mathbf{s}(t))$ has a one-to-one correspondence between the original state space of the system (from which the time series is derived) and the reconstructed state space formed by $F(\mathbf{s}(t))$ (Takens, 2006). This ensures that the dynamics of the system can be studied in the reconstructed space as if it were being studied in the original space. Figure 2 demonstrates this reconstruction process by mapping the scalar time series to the TDE matrix $F$ to reconstruct the state space.

The choice of embedding dimension $M+1$ and step size $\tau$ is a subject of considerable research in the literature (e.g., Cao, 1997; Kim et al., 1999). In this work, the embedding dimension is chosen manually. However, the method of false nearest neighbors is one common method for determining this dimension, which identifies points in a low-dimensional space that appear to be near each other but are not actually neighbors when the data is viewed in a high-dimensional space. By systematically increasing the embedding dimension, and evaluating the percentage of false nearest neighbors, the dimension can be set where this percentage drops significantly, indicating a suitable dimension. More details about this and other procedures for determining $M$ and $\tau$ can be found in Cao, (1997); Kim et al., (1999). A large value of $M$ is often preferred as it enables the embedding to capture more details inherent in the time series. If $M$ is too large, there may be an insufficient number of points in the embedding space. Furthermore, if $M\tau$ is too small due to a small $\tau$ , relatively fewer points fall in each embedding window. This results in points repeatedly appearing in windows, which can lead to redundant information. If $M\tau$ is too large due to a large value of $\tau$ , the reconstructed state space can be distorted because relevant periodic behavior of the time series may not be captured (Casdagli et al., 1991). Hence the choice of $\tau$ and $M$ is such that $M\tau$ is not too large or too small, but is application-dependent and requires empirical testing.

3.3 Subsequence Method

The TDE construction in the previous section assumes the observed time series is uniformly-spaced, but a time series is often irregularly-spaced in real data. We propose a method to extract uniformly-spaced subsequences from the observed irregularly-spaced time series and prove its topology-preserving properties, along with consistency and convergence results.

3.3.1 Subsequence construction

Let $\mathbf{x}=\left(x(t):t\in\mathcal{T}\right)$ be a time series of length $n$ where $\mathcal{T}=\{t_{1},\cdots,t_{n}\}\subset\mathbb{N}$ . Further assume that this time series is not uniformly spaced, that is, $t_{i+1}-t_{i}\neq t_{i+2}-t_{i+1}$ , for some $t_{i}\in\mathcal{T}$ such that $t_{i}<t_{i+1}$ . In this work, a subsequence of the set $\mathbf{x}$ is defined as any subset that omits elements of $\mathbf{x}$ without changing the order of the remaining elements. This definition does not guarantee that $t_{i+1}-t_{i}=t_{i+2}-t_{i+1},$ $\forall t_{i}\in\mathcal{T}$ , which is a condition we want to achieve with the proposed subsequence construction. Let $\mathbf{x}_{p,r}\subseteq\mathbf{x}$ be a subset of the original time series with time index $\mathcal{T}_{p,r}\subseteq\mathcal{T}$ with the condition that $t_{p,i+1}-t_{p,i}=r,$ $\forall t_{p,i}\in\mathcal{T}_{p,r}$ . The set $\mathbf{x}_{p,r}$ is the $p$ -th subsequence of regularity $r$ , and it is a uniformly-spaced subsequence. For any non-uniformly spaced time series, we can build a collection of such subsequence for various values of $r$ . The goal is to first obtain the longest subsequence for a small $r$ . As the subsequence length reduces for a given $r$ , the regularity value $r$ can increase to obtain more uniformly-spaced subsequences. An algorithm for computing this collection of subsequences is displayed in Algorithm 1, which is adapted from an algorithm that finds the longest arithmetic progression in a sequence developed in (Erickson, 1999). In the statement of the algorithm, the following notation is used: (i) the union symbol $\cup$ denotes the addition of a set (element) to a set (vector), (ii) the number of elements in a set or a vector $A$ is denoted by $|A|$ , and (iii) the notation $A\backslash B$ represents subset of elements in set $A$ obtained by excluding all elements from set $B$ .

Algorithm 1 Uniform subsequence construction

1:Time points

\mathcal{T}=\{t_{1},\cdots,t_{n}\}

, regularity score

r

, minimum sequence length

m

r\leq\ t_{n}-t_{1}

m\leq n

\mathcal{T}_{p}\leftarrow\{\dots\}

, temporary time index,

\mathbf{T}_{\text{reg}}\leftarrow\{\}

, uniformly-spaced subsequences.

4:while number of elements in

\mathcal{T}

is greater than m do

5: for

i=1:(|\mathcal{T}|-1)

\mathcal{T}_{\text{sub}}\leftarrow\mathcal{T}[i]

\triangleright

Initialize a subsequence.

7: for

j=(i+1):|\mathcal{T}|

8: if

\mathcal{T}[j]-\mathcal{T}_{\text{sub}}[j-i]=r

then

\triangleright

Check the regularity condition.

\mathcal{T}_{\text{sub}}\leftarrow\mathcal{T}_{\text{sub}}\cup\mathcal{T}[j]

; if

|\mathcal{T}_{\text{sub}}|>|\mathcal{T}_{p}|

then

\mathcal{T}_{p}\leftarrow\mathcal{T}_{\text{sub}}

10: else break

\triangleright

Initialize with the next point in the sequence.

11: if

|\mathcal{T}_{p}|\geq m

and

\mathcal{T}_{p}

is not identical to any other subsequence in

\mathbf{T}_{reg}

then

12:

\mathbf{T}_{reg}\leftarrow\mathbf{T}_{reg}\cup\mathcal{T}_{p}

;

\mathcal{T}\leftarrow\mathcal{T}\backslash\mathcal{T}_{p}

\triangleright

Remove the subset from the sequence.

13: else break

\triangleright

No uniformly-spaced subsequence of the required length exist.

14:return

\mathbf{T}_{\text{reg}}

\triangleright

Set of all regularly spaced subsequences each of regularity

r

Algorithm 1 returns all possible uniformly-spaced time points from the time index set $\mathcal{T}$ with regularity score $r$ . Note that for uniformly-spaced $\mathcal{T}$ , it returns the full sequence. The uniformly-spaced observations can now be obtained by simply matching these observations to the time points in each subsequence. Not all the subsequences returned by Algorithm 1 are required in the SSE (see Remark 1).

3.3.2 Subsequence embedding method

Takens’s theorem guiding the construction of the TDE in Section 3.2 involves a single measurement function $h(\cdot)$ , which generates each time series measurement (Takens, 2006). A generalization considers each coordinate in the embedding maps as a measurement function (see Remark $2.9$ in Sauer et al., (1991) and Theorem $2$ in Deyle and Sugihara, (2011)). Such generalizations allow for the extension of Takens’s theorem to multiple measurement functions involving multiple time series. This motivates the proposed SSE method where each subsequence is viewed as distinct time series.

To construct the proposed SSE, a single distinct measurement function is defined on each subsequence. Let $h_{p}(\cdot)$ be the measurement function associated with the $p$ -th subsequence. Then the $p$ -th embedding mapping has the form

F_{p}(\mathbf{s}(t_{p,i}))=\left[x_{p,r}(t_{p,i}),x_{p,r}(t_{p,i}+\tau_{p}),% \cdots,x_{p,r}(t_{p,i}+M\tau_{p})\right],\quad x_{p,r}(t_{p,i})=h_{p}(\mathbf{% s}(t_{p,i})).

(1)

The delay step $\tau_{p}$ is fixed for each subsequence map. The map is also constructed under the assumption that the length of each subsequence $n_{p}>\max(M+1,M*\tau_{p})$ . This ensures that there are sufficient observations within each subsequence to construct a point in the embedding space. The embedding matrix from the $p$ -th subsequence has the form

\mathbf{F}_{p}=\begin{bmatrix}F(\mathbf{s}(t_{p,1}))^{\top}&F(\mathbf{s}(t_{p,% 2}))^{\top}&\cdots&F(\mathbf{s}(t_{p,n_{p}-M}))^{\top}\end{bmatrix}^{\top}.

(2)

The embedding matrix of the irregularly-spaced time series is then obtained by vertically stacking the embedding matrices from each of the subsequences. This is denoted as $\mathbf{F}=\begin{bmatrix}\mathbf{F}_{1};&\cdots&;\mathbf{F}_{P}\end{bmatrix}$ , where $P$ is the total number of uniformly-spaced subsequences. The matrix $\mathbf{F}$ is of dimension $\left(\sum_{p=1}^{P}(n_{p}-M)\right)\times(M+1)$ . Note that when the original time series is uniformly-spaced, the SSE method is identical to the TDE. To see this, observe that the longest subsequence in the uniformly-spaced time series is the original sequence.

To illustrate the SSE framework, Figure 3(a) shows measurements at 1000 uniformly-spaced time points (orange points and blue diamonds combined) of which about $20\%$ are designated as missing values (blue diamonds), which creates an irregularly-spaced time series (orange points). Both the uniformly-spaced and irregularly-space time series were embedded into $\mathbb{R}^{4}$ using the TDE and SSE methods, respectively. Figure 3(b)-top gives the TDE of the uniformly-spaced 1000 measurements and contains two identical elliptical shapes. Figure 3(b)-bottom shows the proposed SSE of the irregularly-spaced time series and also contains two similar elliptical shapes, however, there is visible non-uniform spacing of the points compared to the TDE space. This is primarily due to the SSE using a subset of the original time series (i.e., it constructs a uniform subsample from the irregularly-spaced time series based on Algorithm 1); the SSE space may be considered as a sparse representation of the TDE space. The persistence diagram for the TDE is shown in Figure 3(c). Since Figure 3(b)-top has two identical elliptical shapes, the $H_{1}$ features have overlapping birth and death time, hence the appearance of a single blue triangle. Figure 3(d) shows the persistence diagram for the SSE, and correctly identifies the two loops but the birth and death time are non-overlapping due to the non-identical spacing of the points in the two elliptical shapes. In general, the SSE converges to the TDE in terms of the topological similarity of the reconstructed spaces and in the closeness of the persistence diagrams as the time sampling becomes more uniform. A more formal theoretical justification of this assertion, and other technical considerations are discussed in the next section.

Remark 1.

The choice of the number of subsequences $P$ and the number of subsequences of different regularity scores $r$ depend on the context and goals of the analysis. To reconstruct an $(M+1)$ -dimensional state space, subsequences must satisfy $n_{p}>\max(M+1,M*\tau_{p})$ . A set of subsequences with the same regularity score can lead to better reconstruction accuracy as it captures the dominant patterns of the underlying data-generating space of the time series more coherently. Combining subsequences with different regularity scores can improve topological approximations as it captures a wider range of structures of the underlying data-generating space. Thus, there is a trade-off between a better topology approximation and improved reconstruction accuracy. If subsequences with the same regularity score capture most of the time series, combining sequences with different regularity scores may offer limited benefits. While the simulations and real data analysis in this work utilize subsequences with the same regularity score, the methodology and theoretical results apply to subsequences with the same or different regularity scores.

4 Stability and Convergence Results

The reconstructed state space using the proposed SSE is an estimate of the “true” state space based on a uniformly-sampled time series (i.e., the TDE space). Persistence diagrams are used to quantify the stability of the estimate by measuring its closeness to the true state space. The results are stated below, and the proofs are included in the supplementary materials Section S1.

4.1 Stability Theory in Persistence Homology

The set of persistence diagrams from the true state space (TDE) and the reconstructed state space (SSE) can be endowed with a distance measure, such as the popular bottleneck distance. The bottleneck distance gives the minimal bijection between any two diagrams as a measure of the distance between the diagrams. Let $\mathbf{F}^{1}$ and $\mathbf{F}^{2}$ be two finite compact subsets of $\mathbb{R}^{M+1}$ , with $\text{Dgm}(\mathbf{F}^{1})$ and $\text{Dgm}(\mathbf{F}^{2})$ as their corresponding persistence diagrams. The bottleneck distance, $\text{d}_{\text{B}}$ , between the two persistence diagrams is defined as

\text{d}_{\text{B}}\left(\text{Dgm}(\mathbf{F}^{1}),\text{Dgm}(\mathbf{F}^{2})% \right)=\inf_{\gamma}\sup_{\mu\in\text{Dgm}(\mathbf{F}^{1})}||\mu-\gamma(\mu)|% |_{\infty},

(3)

where the infimum is taken over all bijections $\gamma:\text{Dgm}(\mathbf{F}^{1})\xrightarrow{}\text{Dgm}(\mathbf{F}^{2})$ . For the metric space $(\mathbb{R}^{M+1},\lVert\cdot\rVert_{2})$ , where $\lVert\cdot\rVert_{2}$ denotes the $l^{2}$ -norm, define the distance function as: $d(F,\mathbf{F}^{1})=\inf_{F^{1}\in\mathbf{F}^{1}}\lVert F-F^{1}\rVert_{2},$ $\forall F\in\mathbb{R}^{M+1}.$ Then the Hausdorff distance, $d_{H}$ , is given by

d_{H}(\mathbf{F}^{1},\mathbf{F}^{2})=\max\left\{\sup_{F^{1}\in\mathbf{F}^{1}}d% (F^{1},\mathbf{F}^{2}),\sup_{F^{2}\in\mathbf{F}^{2}}d(F^{2},\mathbf{F}^{1})% \right\}.

(4)

The Hausdorff distance quantifies the level to which two metric spaces represent the same object. A modification of the Hausdorff distance that is blind to isometric transformations is the Gromov-Hausdorff distance. Let $\nu_{1}:\mathbf{F}^{1}\hookrightarrow\mathbb{R}^{M+1}$ and $\nu_{2}:\mathbf{F}^{2}\hookrightarrow\mathbb{R}^{M+1}$ be isometric (distance preserving) embeddings into the metric space $(\mathbb{R}^{M+1},\lVert\cdot\rVert_{2})$ . Then the Gromov-Hausdorff distance between $\mathbf{F}^{1}$ and $\mathbf{F}^{2}$ is given by: $d_{GH}(\mathbf{F}^{1},\mathbf{F}^{2})=\inf_{\nu_{1},\nu_{2}}d_{H}(\nu_{1}(% \mathbf{F}^{1}),\nu_{2}(\mathbf{F}^{2}))$ . A fundamental result on persistence diagrams is that they are stable summaries in many settings (i.e., a small change in a point cloud result in a small change in the corresponding persistence diagram) (Chazal and Michel, 2021). The stability relation of the persistence diagrams is stated as

\text{d}_{\text{B}}\left(\text{Dgm}(\mathbf{F}^{1}),\text{Dgm}(\mathbf{F}^{2})% \right)\leq 2\text{d}_{\text{GH}}(\mathbf{F}^{1},\mathbf{F}^{2})\leq 2\text{d}% _{\text{H}}(\mathbf{F}^{1},\mathbf{F}^{2}).

(5)

In what follows, these stability results are established for the proposed SSE method and a denoising procedure to reduce the noise present in the observed time series.

4.2 Stability of Denoising Procedure

Time-series data is typically observed with noise. The level of noise in the reconstructed space influences the presence and the persistence of the topological features. A Fourier denoising procedure is proposed to filter out noise in the observed time series.

Let $\mathbf{x}=\begin{bmatrix}x(t_{1}),&x(t_{2}),&\cdots&,x(t_{n})\end{bmatrix}^{\top}$ be an observed time series vector. The first step in the denoising procedure is to transform this observed signal to the frequency domain. The discrete Fourier transform (DFT) of $x(t_{k})$ , denoted as $\tilde{x}(t_{k})$ is given by

\tilde{x}(t_{k})=\sum_{r=1}^{n}x(t_{r})e^{-j2\pi w_{r}f_{k}}=\sum_{r=1}^{n}x(t% _{r}){\phi}_{kr},\quad 1\leq k\leq n,

(6)

where $j$ is the imaginary unit ( $j^{2}=-1$ ), $\phi_{kr}=e^{-j2\pi w_{r}f_{k}}$ , $0\leq w_{r}\leq 1$ are sample points, $0\leq f_{k}\leq n$ are frequencies, and $\tilde{x}(t_{k})$ is the $k$ -th sample of the power spectrum at $f_{k}$ .

To filter out noise, the power spectral density is computed for each observation $\tilde{x}(t_{k})$ . Then a threshold is chosen, and any $\tilde{x}(t_{k})$ with power spectral density less than the threshold is set to zero. In selecting the threshold, the goal is to choose a value that does not smooth out the peaks in the true signal. The derivation that follows assumes the selected threshold preserves the peaks in the true signal. To simplify notations, the thresholded observations are also denoted as $\tilde{x}(t_{k})$ . The thresholded $\tilde{x}(t_{k})$ are transformed back to the time domain to get the noise-reduced signal. which typically involves multiplying $\tilde{x}(t_{k})$ by the inverse of a Fourier transform matrix.

Let $\tilde{\mathbf{x}}$ be the DFT of the time series vector with corresponding Fourier basis $\boldsymbol{\phi}_{k}$ , where

\tilde{\mathbf{x}}=\begin{bmatrix}\tilde{x}(t_{1}),&\tilde{x}(t_{2}),&\cdots,&% \tilde{x}(t_{n})\end{bmatrix}^{\top},\boldsymbol{\phi}_{k}=\begin{bmatrix}\phi% _{k1},&\phi_{k2},&\cdots,&\phi_{kn}\end{bmatrix}^{\top},k=1,\ldots,n.

(7)

The forward transform in Equation (6) can be vectorized:

\mathbf{\tilde{x}}=\boldsymbol{\Phi}\mathbf{x},\quad\boldsymbol{\Phi}=\begin{% bmatrix}\boldsymbol{\phi}_{1}&\boldsymbol{\phi}_{2}&\cdots&\boldsymbol{\phi}_{% n}\end{bmatrix}^{\top}.

(8)

The backward transform can then be determined by inverting the matrix $\boldsymbol{\Phi}$ . However, due to the non-uniformity in the spacing of the time series $\mathbf{x}$ , the columns of $\boldsymbol{\Phi}$ are not orthogonal, and it is not directly invertible, so the pseudo-inverse is used instead. The backward transform then has the form: $\mathbf{x}=\frac{1}{n}(\boldsymbol{\Phi}^{H}\boldsymbol{\Phi})^{\dagger}% \boldsymbol{\Phi}^{H}\tilde{\mathbf{x}}$ , where $\mathbf{A}^{H}$ and $\mathbf{A}^{\dagger}$ denote the complex conjugate transpose and the Moore-Penrose inverse of the matrix $\mathbf{A}$ , respectively. The matrix $(\boldsymbol{\Phi}^{H}\boldsymbol{\Phi})^{\dagger}\boldsymbol{\Phi}^{H}$ projects the frequency vector $\mathbf{\tilde{x}}$ onto the column space of $\boldsymbol{\Phi}$ . Let $\Pi_{\boldsymbol{\Phi}}(\mathbf{x})$ denote this projection operation. The modulus of off-diagonal elements of $\boldsymbol{\Phi}^{H}\boldsymbol{\Phi}$ is bounded as: $\left|\sum_{k=1}^{n}e^{j2\pi(w_{l_{1}}-w_{l_{2}})f_{k}}\right|\leq\sum_{k=1}^{% n}\left|e^{j2\pi(w_{l_{1}}-w_{l_{2}})f_{k}}\right|=n$ , where the equality follows from the definition of the complex modulus $|{z}|=\sqrt{{z}\bar{z}}$ , with $\bar{z}$ as the conjugate of the complex value $z$ . The matrix $\boldsymbol{\Phi}\boldsymbol{\Phi}^{H}$ has the structure

(\boldsymbol{\Phi}\boldsymbol{\Phi}^{H})_{l_{1},l_{2}}=\sum_{k=1}^{n}e^{(-1)^{% \delta}j2\pi(f_{l_{2}}-f_{l_{1}})w_{k}},\quad\delta=\mathbbm{1}(l_{1}\leq l_{2% }),

(9)

for indicator function $\mathbbm{1}(l_{1}\leq l_{2})$ , and $(\boldsymbol{\Phi}\boldsymbol{\Phi})^{H}_{l_{1},l_{2}}$ denotes the value in the $l_{1}$ -th row and $l_{2}$ -th column of $\boldsymbol{\Phi}\boldsymbol{\Phi}^{H}$ . Then $\boldsymbol{\Phi}\boldsymbol{\Phi}^{H}$ is Toeplitz when the frequency components are uniformly-spaced such that $f_{k}=k$ , and $\boldsymbol{\Phi}\boldsymbol{\Phi}^{H}$ is fully specified by its first row elements (HuoLiu and YuanTang, 1998).

Using these results, the following proposition asserts that the DFT preserves topological features and is stable with respect to the bottleneck distance. In particular, the bottleneck distance between the persistence diagrams of the embeddings of the noise-free and smoothed (i.e., noise-reduced) time series is bounded above by the embeddings of the observed noisy and noise-free time series.

Proposition 4.1.

Given $\mathbf{x}^{*}\in\mathbb{R}^{n}$ as a possibly irregularly-spaced scalar time series with additive noise of the form $\mathbf{x}^{*}=\mathbf{x}+\mathbf{\varepsilon}$ , where $\mathbf{x}$ is a noise-free scalar time series, and $\varepsilon$ is a zero-mean noise term, then let $\mathbf{x}^{\prime}$ be the time series vector after applying the proposed Fourier denoising to $\mathbf{x}^{*}$ , and $\mathbf{F}$ , $\mathbf{F}^{*}$ , and $\mathbf{F}^{\prime}$ be the embedding matrices associated with $\mathbf{x}$ , $\mathbf{x}^{*}$ , and $\mathbf{x}^{\prime}$ , respectively. Also, let $\text{Dgm}(\mathbf{F})$ and $\text{Dgm}({\mathbf{F}^{\prime}})$ denote the persistence diagrams associated with the Vietoris-Rips complex constructed from $\mathbf{F}$ and $\mathbf{F}^{\prime}$ , respectively. Then the bottleneck distance between these two persistence diagrams is bounded as

\text{d}_{\text{B}}(\text{Dgm}({\mathbf{F}}),\text{Dgm}({\mathbf{F}^{\prime}})% )\leq\frac{4n-2}{\gamma}\left(\sup\limits_{i,p}||F^{*}(\mathbf{s}(t_{p,i}))-F(% \mathbf{s}(t_{p,i}))||_{2}\right),

(10)

where $0<\gamma\leq 1,\quad 1\leq i\leq n_{p}-M\tau_{p},\quad 1\leq p\leq P.$

Remark 2.

When the samples are uniformly-spaced in both the time and frequency domain, the matrix $\boldsymbol{\Phi}$ is Hermitian with orthogonal columns, hence the factor $({2n-1})/{\gamma}$ $(\text{not }({4n-2})/{\gamma}$ , as the scaling by $2$ is still required $)$ is not required for the bound to hold. The factor $({2n-1})/{\gamma}$ makes the bound conservative. However, if the denoising is well constructed, this has minimal impact on the scale of the bound. Numerical experiments in the supplementary materials Section S2.1 suggest this is the case for the settings considered.

4.3 Stability of the Subsequence Embedding Method

In this section, an error bound and stability results for the SSE approximation of the TDE are established. To simplify the notation, the embedding matrices are represented as sets where the elements of the set are the row vectors of the corresponding TDE or SSE. Also, for an embedding from a single uniformly-spaced time series, the step-size is assumed to be $\tau$ . When constructing from a set of $P$ subsequences, a step-size of $\tau_{p}$ is assumed for the $p$ -th subsequence, where $1\leq p\leq P$ .

Because the SSE can have fewer elements than the TDE, the following lemma addresses how to expand the SSE without affecting its topology by repeating already existing points in the SSE, so distances can be computed between the TDE and the expanded SSE.

Lemma 4.2 (Topology-preserving transform).

Let $\mathbf{F}^{1}$ be an embedding matrix from a uniformly-spaced time sequence of length $n$ with the form:

\mathbf{F}^{1}=\left\{F^{1}(\mathbf{s}(t_{1})),F^{1}(\mathbf{s}(t_{2})),\cdots% ,F^{1}(\mathbf{s}(t_{n-M\tau}))\right\}\subset\mathbb{R}^{M+1}.

(11)

Also, let $\mathbf{F}^{2}$ be an embedding matrix from a set of $P$ subsequences with the form:

\mathbf{F}^{2}=\left\{F^{2}(\mathbf{s}(t_{1,1})),\cdots,F^{2}(\mathbf{s}(t_{1,% n_{1}-M\tau_{1}})),\cdots,F^{2}(\mathbf{s}(t_{P,1}),\cdots,F^{2}(\mathbf{s}(t_% {P,n_{P}-M\tau_{P}}))\right\}\subset\mathbb{R}^{M+1},

(12)

where $\sum_{p=1}^{P}(n_{p}-M\tau_{p})\leq n-M\tau$ . Consider the set extension $\widehat{\mathbf{F}}^{2}=\left\{\mathbf{F}^{2},{\mathbf{F}}^{2}_{k}\right\}$ , where ${\mathbf{F}}^{2}_{k}$ is a subset of $k$ elements from $\mathbf{F}^{2}$ . Then the persistence diagrams associated with $\mathbf{F}^{2}$ and $\widehat{\mathbf{F}}^{2}$ are identical, that is, $\text{Dgm}(\mathbf{F}^{2})\equiv\text{Dgm}(\widehat{\mathbf{F}}^{2})$ , and the three embedded spaces are related through the bottleneck distance as follows: $\text{d}_{\text{B}}(\text{Dgm}(\mathbf{F}^{1}),\text{Dgm}(\mathbf{F}^{2}))=% \text{d}_{\text{B}}(\text{Dgm}(\mathbf{F}^{1}),\text{Dgm}(\widehat{\mathbf{F}}% ^{2})).$

Lemma 4.2 asserts that a particular set of points can be added to an embedding matrix without changing the SSE’s persistence diagram. This is used to establish a bound on the SSE as an approximation to the TDE in the following proposition. In Lemma 4.2, when $k=(n-M\tau)-\sum_{p=1}^{P}(n_{p}-M\tau_{p})$ , the row dimension of $\widehat{\mathbf{F}}^{2}$ is the same as that of $\mathbf{F}^{1}$ . In such instances, when the interest is in a row-wise comparison of $\widehat{\mathbf{F}}^{2}$ and $\mathbf{F}^{1}$ the subsequence indexing in the time variable for any $\widehat{F}^{2}(\mathbf{s}(t_{p,i}))\in\widehat{\mathbf{F}}^{2}$ is ignored and a row is simply written as $\widehat{F}^{2}(\mathbf{s}(t_{i}))$ , where $1\leq i\leq n-M\tau$ .

Proposition 4.3.

Let $\mathbf{x}^{1}$ be a uniformly-spaced time series vector of length $n$ with TDE matrix $\mathbf{F}^{1}$ . Let $\mathbf{x}^{2}\subset\mathbf{x}^{1}$ be a time series vector where some of the elements are missing or unobserved. Denote the SSE matrix constructed from $\mathbf{x}^{2}$ as $\mathbf{F}^{2}$ . Define the extension $\widehat{\mathbf{F}}^{2}=\left\{\mathbf{F}^{2},{\mathbf{F}}^{2}_{k}\right\}$ , where ${\mathbf{F}}^{2}_{k}$ is a subset of $k=(n-M\tau)-\sum_{p=1}^{P}(n_{p}-M\tau_{p})$ elements from $\mathbf{F}^{2}$ . Then the bottleneck distance between $\mathbf{F}^{1}$ and $\mathbf{F}^{2}$ is bounded as: $\text{d}_{\text{B}}(\text{Dgm}(\mathbf{F}^{1}),\text{Dgm}(\mathbf{F}^{2}))\leq 2% \sup\limits_{1\leq i\leq n-M\tau}\lVert F^{1}(\mathbf{s}(t_{i}))-\widehat{F}^{% 2}(\mathbf{s}(t_{i}))\rVert_{2}.$

The choice of the $k$ subsets of embedding vectors $\mathbf{F}^{2}_{k}$ in Proposition 4.3 is arbitrary as any subset satisfies the bound. However, since they are chosen to match the subset $\{\mathbf{F}^{\prime}(\mathbf{s}_{t_{l}}):\sum_{p=1}^{P}(n_{p}-M\tau_{p})+1% \leq l\leq n-M\tau\}$ of $\mathbf{F}^{1}$ , the bound can be improved. The minimum bound can be attained by choosing a subset in $\mathbf{F}^{2}$ that has the smallest Euclidean distance to the subset $\{\mathbf{F}^{\prime}(\mathbf{s}_{t_{l}}):\sum_{p=1}^{P}(n_{p}-M\tau_{p})+1% \leq l\leq n-M\tau\}$ . This is summarized as a corollary below.

Corollary 4.4.

Let $\mathbf{x}^{1}$ be a uniformly spaced time series vector of length $n$ with TDE matrix $\mathbf{F}^{1}$ . Let $\mathbf{x}^{2}\subset\mathbf{x}^{1}$ be a time series vector where some of the elements are unobserved, and $\mathbf{F}^{2}$ its SSE matrix. Define the extension $\widehat{\mathbf{F}}^{2}=\left\{\mathbf{F}^{2},{\mathbf{F}}^{2}_{k}\right\}$ , where ${\mathbf{F}}^{2}_{k}$ is a random subset of $k=(n-M\tau)-\sum_{p=1}^{P}(n_{p}-M\tau_{p})$ elements from $\mathbf{F}^{2}$ . For some $F^{1}\in\mathbf{F}^{1}$ , define the set ${\mathbf{F}}^{2}_{k,\text{min}}$ as follows:

{\mathbf{F}}^{2}_{k,\text{min}}=\left\{F^{2}_{\text{min}}\in\mathbf{F}^{2}:% \lVert F^{2}_{\text{min}}-F^{1}\rVert_{2}\leq\lVert F^{2}-F^{1}\rVert_{2},% \forall F^{2}\in\mathbf{F}^{2},\text{ s.t. }{F}^{2}_{\text{min}}\neq{F}^{2}% \right\}.

(13)

That is, ${\mathbf{F}}^{2}_{k,\text{min}}$ is a subset of $k$ embedding vectors in $\mathbf{F}^{2}$ with minimum distance to some points in $\mathbf{F}^{1}$ . Let $\widehat{\mathbf{F}}^{2}_{\text{min}}=\left\{\mathbf{F}^{2},{\mathbf{F}}^{2}_{% k,\text{min}}\right\}$ , then it follows that

\sup\limits_{1\leq i\leq n}\lVert F^{1}(\mathbf{s}(t_{i}))-\widehat{F}^{2}_{% \text{min}}(\mathbf{s}(t_{i}))\rVert_{2}\leq\sup\limits_{1\leq i\leq n}\lVert F% ^{1}(\mathbf{s}(t_{i}))-\widehat{F}^{2}(\mathbf{s}(t_{i}))\rVert_{2},

(14)

where $F^{1}(\mathbf{s}(t_{i}))\in\mathbf{F}^{1}$ , $\widehat{F}^{2}(\mathbf{s}(t_{i}))\in\widehat{\mathbf{F}}^{2}$ , and $\widehat{F}^{2}_{\text{min}}(\mathbf{s}(t_{i}))\in\widehat{\mathbf{F}}^{2}_{% \text{min}}$ .

An immediate consequence of Proposition 4.3 and Corollary 4.4 is that the SSE matrix approximates the TDE for varying level of regularity. In particular, for a time series $\mathbf{x}=\{x(t):t\in\mathcal{T}\}$ , and $\mathcal{T}=\{t_{1},\cdots,t_{n}\}\subset\mathbb{N}$ , the sequence of embedding matrices for each $r$ where $1\leq r\leq t_{n}-t_{1}$ is finite. Hence for a fixed $n$ , the limiting persistence diagram as $r\xrightarrow[]{}1$ is close to the TDE’s persistence diagram. If $r=1$ , the SSE is exactly the same as the TDE, and the persistence diagrams would be identical. This reinforces the fact that the proposed reconstruction preserves the topological structures more accurately as the level of irregularity in the observed time series decreases. A more formal treatment of these observations is presented next.

4.4 Convergence Results

Throughout this section, it is assumed that the number of missing values increases at a slower rate than the sample size of the time series. Specifically, the missing values grows at a rate of $o(\log m)$ , for sample size $m$ . This assumption and others are formalized as follows.

Let $\mathbf{x}_{1},\mathbf{x}_{2},\cdots,$ be irregularly-spaced time series vectors where $|\mathbf{x}_{i}|<|\mathbf{x}_{j}|$ , $i<j$ . Denote by $\mathbf{F}_{m}$ , the SSE associated with $\mathbf{x}\in\{\mathbf{x}_{1},\mathbf{x}_{2},\cdots,\}$ , i.e.,

\mathbf{F}_{m}=\left\{F(\mathbf{s}(t_{1})),F(\mathbf{s}(t_{2})),\cdots,F(% \mathbf{s}(t_{m}))\right\}.

(15)

Note the correspondence between the subscript $m$ and the number of points in $\mathbf{F}_{m}$ . $\mathbf{F}_{m}$ depends on which time series is selected; however, indexing over this selection is not needed for the following results. Recall that $\mathbf{F}_{m}$ is a compact subset of $(\mathbb{R}^{M+1},\lVert\cdot\rVert_{2})$ . Let the space $(\mathbb{R}^{M+1},\lVert\cdot\rVert_{2})$ be endowed with the unknown probability measure $\vartheta$ such that the set of $m$ time points are randomly sampled according to $\vartheta$ . Let $\vartheta$ be supported on an embedding $\mathbf{F}_{\vartheta}$ , and let $\varphi$ be the associated density function. Consider the following set of assumptions.

A1.

The sample size increases such that $\mathbf{x}_{i}\subset\mathbf{x}_{j}$ whenever $i<j$ .
A2.

Let $\varepsilon_{m}(r)$ be a function of $m$ and the regularity score $r$ such that $\varepsilon_{m}(1)\xrightarrow{}0$ as $m\xrightarrow{}\infty$ .
A3.

For any point $F_{\vartheta}\in\mathbf{F}_{\vartheta}$ , $\vartheta(B(F_{\vartheta},\delta))\geq\min(\kappa\delta^{M+1},1)$ , where $B(F_{\vartheta},\delta)$ is a closed ball of radius $\delta>0$ around $F_{\vartheta}$ , with constant $\kappa>0$ .

A4.

It is possible to create joint distributions based on the marginals of $\mathbf{F}_{m}$ that satisfy

\sup_{\mathbf{F}_{m}}\left|\frac{\varphi\left(F(\mathbf{s}(t_{1})),\cdots,F(% \mathbf{s}(t_{m}))\right)-\varphi\left(F(\mathbf{s}(t_{1}))\right)\times\cdots% \times\varphi\left(F(\mathbf{s}(t_{m}))\right)}{\varphi\left(F(\mathbf{s}(t_{1% }))\right)\times\cdots\times\varphi\left(F(\mathbf{s}(t_{m}))\right)}\right|% \leq\eta_{m},

where $\eta_{m}$ is such that $\sum_{m=1}^{\infty}\frac{\eta_{m}}{m^{\beta}\log(m)}<\infty$ for any $\beta>1$ .

Assumption A4 is to address the possible lack of independence of the vectors in $\mathbf{F}_{m}$ . Under this assumption, the dependence can be controlled and the vectors in $\mathbf{F}_{m}$ are regarded as the so-called $\eta_{m}$ -almost independent samples, which allows for $\mathbf{F}_{m}$ to converge in Hausdorff distance to $\mathbf{F}_{\vartheta}$ (Aaron et al., 2017; Picado and Oliveira, 2020). The SSE matrix $\mathbf{F}_{m}$ can be regarded as an estimator of $\mathbf{F}_{\vartheta}$ and convergence results can be established in the context of assumptions A1-A4. These results are analogous to convergence results established on support estimation of $d$ -dimensional sets (Cuevas and Rodríguez-Casal, 2004), its generalization to metric spaces, and on the space of persistence diagrams (Mileyko et al., 2011; Chazal et al., 2014). The following result gives the rate of convergence in estimating $\mathbf{F}_{\vartheta}$ .

Theorem 4.5.

Let $\mathbf{x}_{1},\mathbf{x}_{2},\cdots,$ be a sequence of irregularly-spaced time series vectors satisfying assumption A1, and $\mathbf{F}_{m}=\left\{F(\mathbf{s}(t_{1})),F(\mathbf{s}(t_{2})),\cdots,F(% \mathbf{s}(t_{m}))\right\}\subset\mathbb{R}^{M+1}$ be the SSE associated with some $\mathbf{x}\in\{\mathbf{x}_{1},\mathbf{x}_{2},\cdots,\}$ , satisfying assumption A2. If the probability measure $\vartheta$ satisfies assumption A3 and A4, then with probability one,

\lim_{m\xrightarrow{}\infty}\sup\left(\varepsilon_{m}(r)\right)^{-\frac{1}{M+1% }}\text{d}_{H}(\mathbf{F}_{m},\mathbf{F}_{\vartheta})\leq K,

(16)

where $K$ is a constant depending on $\kappa$ and the embedding dimension $M+1$ .

From the stability relation in Equation (5), the Hausdorff metric can be replaced with the Gromov-Hausdorff metric and the results still holds. This also gives a similar convergence results on the space of persistence diagrams with respect to the bottleneck distance and is summarized as:

Corollary 4.6.

Let $\mathbf{x}_{1},\mathbf{x}_{2},\cdots,$ be a sequence of irregularly-spaced time series vectors satisfying assumption A1, and let $\mathbf{F}_{m}=\left\{F(\mathbf{s}(t_{1})),F(\mathbf{s}(t_{2})),\cdots,F(% \mathbf{s}(t_{m}))\right\}\subset\mathbb{R}^{M+1}$ be the SSE associated with some $\mathbf{x}\in\{\mathbf{x}_{1},\mathbf{x}_{2},\cdots,\}$ , satisfying assumption A2. If the probability measure $\vartheta$ satisfies assumption A3 and A4, then with probability one,

\lim_{m\xrightarrow{}\infty}\sup\left(\varepsilon_{m}(r)\right)^{-\frac{1}{M+1% }}\text{d}_{B}(\text{Dgm}(\mathbf{F}_{m}),\text{Dgm}(\mathbf{F}_{\vartheta}))% \leq K,

(17)

where $K$ is a a constant depending on $\kappa$ and the embedding dimension $M+1$ .

5 Numerical Studies

This section presents numerical studies that shows the performance of the proposed SSE method.

5.1 Reconstruction Accuracy

This simulation assesses the SSE method’s effectiveness in preserving the original state space geometry using the Hénon map as an illustrative example (Hénon, 2004). The Hénon map recursively maps a point $(h_{t},g_{t})\in\mathbb{R}^{2}$ as follows: $h_{t+1}=1-ah_{t}^{2}+g_{t}$ , $g_{t+1}=bh_{t}$ , with $a=1.4$ and $b=0.3$ , their classical values. The map is initialized at $(h_{0},g_{0})=(0,0)$ , and simulated with $500$ points with observations designated as missing with a given probability. Figure 4 shows the 2D Hénon map and the corresponding time series for one dimensions. The measurement function (see Section 3.2) extracts observations along the $h$ -dimension, hence $\{h_{t}\}$ are used to reconstruct the space. Observations along the $g$ -dimension could be used instead.

The correlation dimension is used to assess how well the geometry of the original state space is preserved in the reconstruction. Specifically, for a given $\varepsilon>0$ , it measures the probability that two random points in a space are within $\varepsilon$ -distance of each other. To compute the correlation dimension, the correlation sum is computed using the following:

\text{Corr}(\varepsilon)=\lim_{m\xrightarrow{}\infty}\frac{2}{m(m-1)}\sum_{i=1% }^{m}\sum_{j=i+1}^{m}\mathbbm{1}\left(\lVert F(\mathbf{s}(t_{i}))-F(\mathbf{s}% (t_{j}))\rVert_{2}\leq\varepsilon\right),

(18)

for some embedding map $\mathbf{F}=\{F(\mathbf{s}({t_{1}})),\cdots,F(\mathbf{s}({t_{m}}))\}$ . Then the correlation dimension is estimated as: $\lim_{\varepsilon\xrightarrow{}0}{\log(\text{Corr}(\varepsilon))}/{\log(% \varepsilon)}.$ If the reconstructed space preserves relevant geometrical invariants, its correlation dimension should match that of the true state space. Other accuracy measures include the box-counting dimension, Hausdorff dimension, and information dimension. However, the correlation dimension is more robust to sample size, making it less noisy with fewer samples (Grassberger and Procaccia, 1983b ).

The SSE method is compared to common statistical interpolation methods used to impute missing data. A range of methods were considered¹¹1The comparison methods considered are available in the the R package, imputeTS (Moritz and Bartz-Beielstein, 2017): linear, spline, and Stineman interpolation methods, Kalman Smoothing with a structural model and autoregressive integrated moving average model, a moving average method with exponential and linear weighting, seasonal decomposition (imputation by interpolation is done on the deseasonalized component), seasonal split (imputation by interpolation is done on each split), imputing with the previous observation (LOCF) or next observation (NOCB), imputing with the mean, median, mode, and by a random point in the dataset. , but only the best three methods are presented, which were implemented using the R package, imputeTS (Moritz and Bartz-Beielstein, 2017):
(1) Kalman Smoothing (KS): This fits a structural time series model via maximum likelihood, using the linear local trend as the structural class (see referenced package for more details).
(2) Last Observation Carried Forward (LOCF): This methods replaces each missing value with the most immediate prior observed value.
(3) Next Observation Carried Backward (NOCB): This is similar to the LOCF, but instead replaces each missing value with the most immediate next observed value.

The results are presented in terms of the correlation dimension, with standard errors generated by applying each method to 100 independently generated instances of the Hénon map. A noise model (with no missing values) served as a baseline, with observations from a normal distribution (mean zero) and standard deviation equal to the probability of observing a missing value.

Figure 5 shows example reconstructed spaces using the proposed method and two imputation methods (the NOCB result is nearly identical to the LOCF and is not shown) with $500$ samples and a $0.25$ missingness probability. Note that for the comparison methods, after imputation the TDE method is used to estimate the state space. Only the SSE method preserves the original geometry, while the imputation methods introduce extraneous features.

Figure 6 shows the correlation dimension versus missingness probability for the SSE method and the three imputation methods. The black dashed lines indicate the established empirical estimate for the Hénon map’s correlation dimension $(1.22\pm 0.04)$ (Grassberger and Procaccia, 1983a ; Sprott and Rowlands, 2001). The SSE method performs well up to a missingness probability of $0.6$ , staying within or close to the empirical bounds. Beyond $0.6$ , its comparable to the three imputation methods. However, the SSE method is more variable due to the fewer points used to compute the correlation dimension compared to the other methods (which always have 500 points).

5.2 Periodicity Quantification

The periodicity of a time series can be quantified based on the $H_{1}$ features in the persistence diagram. This relies on the idea that periodic patterns yields elliptic curves in the reconstructed state space, and the roundness of the curves is an indicator of the periodicity in the time series (Perea and Harer, 2015). The roundness of these ellipses can be quantified by examining the maximum persistence of their associated $H_{1}$ features. For a time series vector $\mathbf{x}$ with its embedding map $\mathbf{F}$ , its periodicity score $\text{ps}(\mathbf{x})$ can be defined as (Perea and Harer, 2015):

\text{ps}(\mathbf{x})=\max_{{(b,d)}\in\text{Dgm}(\mathbf{F})}(d-b)/{\sqrt{3}},

(19)

where $\text{Dgm}(\mathbf{F})$ is the persistence diagram, and $\max_{{(b,d)}\in\text{Dgm}(\mathbf{F})}(d-b)$ is restricted to the $H_{1}$ features. For this calculation, the embedding map $\mathbf{F}$ is pointwise-centered and scaled. The motivation for the periodicity score is that during the VR filtration for a dataset with a large enough sample size, a unit circle ( $H_{1}$ feature) dies when an inscribed equilateral triangle appears in the VR complex at filtration value $\sqrt{3}$ , hence the maximum periodicity score of one is realized when either the TDE or SSE spaces has a well-sampled circle; a $\text{ps}(\mathbf{x})$ closer to one indicates a stronger periodic signal in $\mathbf{x}$ .

5.2.1 Simulation

To evaluate this framework, two different set of signals were generated with sample sizes $n\in\{50,100,500,1000\}$ and missingness probabilities $\{0,0.1,0.2,0.3,0.4\}$ . The first set follows $f(t)=50\times\cos(\pi t/4-\lambda\pi)\times\sin({\pi t}/{2})+50$ with $\lambda\in(0,1)$ and $t\in[0,12\pi]$ , having a longest period of $8$ . The second set is a non-periodic signals drawn from a $N(10,2)$ . Figure 7 shows samples of both signals.

To construct the embedding from the time series, the time points are rescaled to integers and the step-size of $\tau=1$ . The periodicity score $\text{ps}(\mathbf{x})$ is then compared to those obtained from the Lomb-Scargle periodogram method for both uniformly-spaced and irregularly-spaced observations (Lomb, 1976; Scargle, 1982; Ruf, 1999), the sliding windows method (Perea and Harer, 2015), and the $\text{JTK}\_\text{Cycle}$ algorithm for uniformly-spaced samples (Hughes et al., 2010).

The results are summarized in Table 1 (periodic signal) and in Table 2 (non-periodic signal). Table 1 shows that all the methods rate the periodic signals as highly periodic with increasing sample size. The proposed SSE method consistently identifies a distinct $H_{1}$ across all sample sizes and missing observations despite noisy features in the persistence diagram. The JTK $\_$ Cycle and the Lomb-Scargle method requires specifying a period search range. The proposed SSE method has the added advantage that its periodicity score has a geometric interpretation (Perea and Harer, 2015).

Table 1: Results for the periodic signal summarized as p-values for

\text{JTK}\_\text{Cycle}

and Lomb-Scargle with estimated period in brackets, and as periodicity scores for Sliding Windows and SSE methods.

n	$\pi$	$M$	Sliding Windows	$\text{JTK}\_\text{Cycle}$	Lomb-Scargle	Proposed SSE Method
50	$0.00$	$2$	$0.74$	$0.00$ $(7.69)$	$0.00$ $(8.02)$	$0.74$
	$0.10$	$2$	$-$	$-$	$0.00$ $(8.03)$	$0.74$
	$0.20$	$2$	$-$	$-$	$0.00$ $(8.03)$	$0.70$
	$0.30$	$2$	$-$	$-$	$0.00$ $(8.04)$	$0.67$
	$0.40$	$2$	$-$	$-$	$0.00$ $(8.04)$	$0.44$
100	$0.00$	$6$	$0.53$	$0.00$ $(8.00)$	$0.00$ $(8.02)$	$0.53$
	$0.10$	$6$	$-$	$-$	$0.00$ $(8.02)$	$0.53$
	$0.20$	$6$	$-$	$-$	$0.00$ $(8.02)$	$0.53$
	$0.30$	$4$	$-$	$-$	$0.00$ $(8.20)$	$0.49$
	$0.40$	$3$	$-$	$-$	$0.00$ $(8.02)$	$0.43$
500	$0.00$	$8$	$0.93$	$0.00$ $(2.64)$	$0.00$ $(8.02)$	$0.93$
	$0.10$	$8$	$-$	$-$	$0.00$ $(8.02)$	$0.85$
	$0.20$	$6$	$-$	$-$	$0.00$ $(8.02)$	$0.71$
	$0.30$	$2$	$-$	$-$	$0.00$ $(8.02)$	$0.63$
	$0.40$	$2$	$-$	$-$	$0.00$ $(8.02)$	$0.60$
1000	$0.00$	$26$	$0.90$	$0.00$ $(1.28)$	$0.00$ $(8.02)$	$0.90$
	$0.10$	$15$	$-$	$-$	$0.00$ $(8.02)$	$0.74$
	$0.20$	$12$	$-$	$-$	$0.00$ $(8.02)$	$0.69$
	$0.30$	$6$	$-$	$-$	$0.00$ $(8.02)$	$0.44$
	$0.40$	$4$	$-$	$-$	$0.00$ $(8.01)$	$0.43$

Table 2: Results for the non-periodic signal are given as p-values for

\text{JTK}\_\text{Cycle}

and Lomb-Scargle with estimated period in brackets, and as periodicity scores for Sliding Windows and SSE methods.

n	$\pi$	$M$	Sliding Windows	$\text{JTK}\_\text{Cycle}$	Lomb-Scargle	Proposed SSE Method
50	$0.00$	$3$	$0.30$	$1.00$ $(13.29)$	$0.15$ $(2.05)$	$0.30$
	$0.10$	$3$	$-$	$-$	$0.16$ $(2.04)$	$0.32$
	$0.20$	$3$	$-$	$-$	$0.28$ $(2.04)$	$0.23$
	$0.30$	$3$	$-$	$-$	$0.17$ $(2.04)$	$0.25$
	$0.40$	$3$	$-$	$-$	$0.41$ $(32.87)$	$0.14$
100	$0.00$	$3$	$0.26$	$1.00$ $(12.88)$	$0.10$ $(1.00)$	$0.26$
	$0.10$	$3$	$-$	$-$	$0.08$ $(1.00)$	$0.22$
	$0.20$	$3$	$-$	$-$	$0.27$ $(1.00)$	$0.25$
	$0.30$	$3$	$-$	$-$	$0.55$ $(16.39)$	$0.32$
	$0.40$	$3$	$-$	$-$	$0.33$ $(16.39)$	$0.29$
500	$0.00$	$11$	$0.14$	$0.28$ $(0.45)$	$0.11$ $(0.15)$	$0.14$
	$0.10$	$9$	$-$	$-$	$0.23$ $(0.80)$	$0.16$
	$0.20$	$0$	$-$	$-$	$0.13$ $(0.20)$	$0.10$
	$0.30$	$7$	$-$	$-$	$0.29$ $(0.79)$	$0.18$
	$0.40$	$3$	$-$	$-$	$0.19$ $(0.45)$	$0.28$
1000	$0.00$	$3$	$0.13$	$1.00$ $(1.28)$	$0.87$ $(0.25)$	$0.13$
	$0.10$	$3$	$-$	$-$	$0.50$ $(0.40)$	$0.14$
	$0.20$	$3$	$-$	$-$	$0.55$ $(0.26)$	$0.15$
	$0.30$	$3$	$-$	$-$	$0.22$ $(0.26)$	$0.16$
	$0.40$	$3$	$-$	$-$	$0.66$ $(0.26)$	$0.20$

For the non-periodic signal, all the methods performed reasonably well across all samples and missingness mechanisms. The performance of the SSE method in the non-periodic setting is not surprising. This is because as more observations are missing, the sampled time points appear random, and the resulting time series looks more like random random noise than signal.

5.2.2 Real Data Analysis

Irregularly-spaced times series data are common in astronomy such as those discussed in VanderPlas, (2018) and in exoplanet detection methods (Zhao et al., 2020, 2022). In this section, we examine an asteroid dataset from the Lincoln Near-Earth Asteroid Research (LINEAR) survey, which tracks near-Earth asteroids. The data includes $280$ magnitude measurements (brightness) of LINEAR object ID 11375941 over five and a half years. Magnitude measurements are unitless; lower values indicate brighter objects. Further details on the data and preprocessing are in Sesar et al., (2011); Palaversa et al., (2013); VanderPlas, (2018).

Figure 8(a) shows the observed magnitude over time, revealing no visible periodic pattern due to irregular sampling. The TDE method is unsuitable for such data, but the proposed SSE method can construct a geometric representation. Using $M=2$ , and rescaling the time points to integers and taking $\tau=1$ , the SSE in Figure 8(b) reveals a circular geometric object, indicating high periodicity. The $H_{1}$ feature on the persistence diagram (Figure 8(c)) is at the point $(b,d)=(0.31,1.74)$ .

The periodicity score obtained using Equation (19) is $0.82$ , indicating high periodicity in he observed magnitude of LINEAR object ID 11375941. Using the Lomb-Scargle method, the optimal period was found to be $2.58$ with a p-value of $0.00$ . These results confirm the SSE method’s periodicity findings and highlight its utility in quantifying and visualizing periodicity. Additional astronomy applications are discussed in the supplementary material Section S2.4.

6 Conclusion

The fusion of TDE with TDA holds significant promise for discerning system dynamics and quantifying properties like periodicity in uniformly-spaced time series. This work introduces a novel subsequence embedding method for irregularly-spaced time-series data. Irregular spacing can obscure patterns and introduce noise (e.g., Figure 8(a)). While data imputation can create uniformly-spaced series, it often fails to accurately represent the true state space (e.g., Figure 5). The proposed SSE method addresses these challenges, preserving the topology of the reconstructed state space and mitigating spurious homological features introduced by irregular spacing.

In Section 5.2, we note the need for statistical inference on periodicity scores to determine if the most persistent $H_{1}$ feature is due to a real periodic signal or chance. Existing methods for significance testing of homology generators, such as those using kernel density estimators, allow constructing confidence sets for homology generators (Fasy et al., 2014; Xu et al., 2019). Extending this to homology generators based on direct filtration on the point-cloud space requires bounding the bottleneck-distance with the Hausdorff distance. Initial investigations produced wide confidence sets, indicating the need for a more tailored method, which remains a future research direction.

Finally, Algorithm 1 constructs subsequences with a fixed regularity score $r$ . Extending this to $r\pm\epsilon$ for small $\epsilon$ would increase the length of each constructed subsequence, and the number of points in the reconstructed space. Future work will explore the robustness of this method to different $\epsilon$ values and the trade-off between sample size and reconstruction accuracy.

References

Aaron et al., (2017) Aaron, C., Cholaquidis, A., and Cuevas, A. (2017). Detection of low dimensionality and data denoising via set estimation techniques. Electronic Journal of Statistics, 11(2):4596 – 4628.
Adams et al., (2017) Adams, H., Emerson, T., Kirby, M., Neville, R., Peterson, C., Shipman, P., Chepushtanova, S., Hanson, E., Motta, F., and Ziegelmeier, L. (2017). Persistence images: A stable vector representation of persistent homology. Journal of Machine Learning Research, 18(8):1–35.
Ali et al., (2007) Ali, S., Basharat, A., and Shah, M. (2007). Chaotic invariants for human action recognition. In 2007 IEEE 11th International Conference on Computer Vision, pages 1–8. IEEE.
Attouch et al., (1991) Attouch, H., Lucchetti, R., and Wets, R. J.-B. (1991). The topology of the $\rho$ -Hausdorff distance. Annali di Matematica pura ed applicata, 160(1):303–320.
Berry et al., (2020) Berry, E., Chen, Y.-C., Cisewski-Kehe, J., and Fasy, B. T. (2020). Functional summaries of persistence diagrams. Journal of Applied and Computational Topology, 4(2):211–262.
Boker et al., (2018) Boker, S. M., Tiberio, S. S., and Moulder, R. G. (2018). Robustness of time delay embedding to sampling interval misspecification. Continuous time modeling in the behavioral and related sciences, pages 239–258.
Brown and Knudson, (2009) Brown, K. A. and Knudson, K. P. (2009). Nonlinear statistics of human speech data. International Journal of Bifurcation and Chaos, 19(07):2307–2319.
Bubenik et al., (2015) Bubenik, P. et al. (2015). Statistical topological data analysis using persistence landscapes. J. Mach. Learn. Res., 16(1):77–102.
Cantelli, (1917) Cantelli, F. (1917). Sulla probabilità como limite della frequenza. Atti Reale Academia Nazionale dei Lincei.
Cao, (1997) Cao, L. (1997). Practical method for determining the minimum embedding dimension of a scalar time series. Physica D: Nonlinear Phenomena, 110(1-2):43–50.
Casdagli et al., (1991) Casdagli, M., Eubank, S., Farmer, J. D., and Gibson, J. (1991). State space reconstruction in the presence of noise. Physica D: Nonlinear Phenomena, 51(1-3):52–98.
Chazal et al., (2014) Chazal, F., Glisse, M., Labruère, C., and Michel, B. (2014). Convergence rates for persistence diagram estimation in topological data analysis. In International Conference on Machine Learning, pages 163–171. PMLR.
Chazal and Michel, (2021) Chazal, F. and Michel, B. (2021). An introduction to topological data analysis: fundamental and practical aspects for data scientists. Frontiers in artificial intelligence, 4:108.
Chung and Erdös, (1952) Chung, K. L. and Erdös, P. (1952). On the application of the Borel-Cantelli lemma. Transactions of the American Mathematical Society, 72(1):179–186.
Cuevas and Rodríguez-Casal, (2004) Cuevas, A. and Rodríguez-Casal, A. (2004). On boundary estimation. Advances in Applied Probability, 36(2):340–354.
Davis et al., (2017) Davis, A. B., Cisewski, J., Dumusque, X., Fischer, D. A., and Ford, E. B. (2017). Insights on the spectral signatures of stellar activity and planets from PCA. The Astrophysical Journal, 846(1):59.
Deyle and Sugihara, (2011) Deyle, E. R. and Sugihara, G. (2011). Generalized theorems for nonlinear state space reconstruction. Plos one, 6(3):e18295.
Dumusque, (2016) Dumusque, X. (2016). Radial velocity fitting challenge-I. Simulating the data set including realistic stellar radial-velocity signals. Astronomy & Astrophysics, 593:A5.
Dumusque, (2018) Dumusque, X. (2018). Measuring precise radial velocities on individual spectral lines-I. Validation of the method and application to mitigate stellar activity. Astronomy & Astrophysics, 620:A47.
Dumusque et al., (2014) Dumusque, X., Boisse, I., and Santos, N. (2014). SOAP 2.0: a tool to estimate the photometric and radial velocity variations induced by stellar spots and plages. The Astrophysical Journal, 796(2):132.
Dunsmuir and Robinson, (1981) Dunsmuir, W. and Robinson, P. (1981). Estimation of time series models in the presence of missing data. Journal of the American Statistical Association, 76(375):560–568.
Edelsbrunner and Harer, (2022) Edelsbrunner, H. and Harer, J. L. (2022). Computational topology: an introduction. American Mathematical Society.
Edelsbrunner et al., (2000) Edelsbrunner, H., Letscher, D., and Zomorodian, A. (2000). Topological persistence and simplification. In Proceedings 41st annual symposium on foundations of computer science, pages 454–463. IEEE.
Émile Borel, (1909) Émile Borel, M. (1909). Les probabilités dénombrables et leurs applications arithmétiques. Rendiconti del Circolo Matematico di Palermo (1884-1940), 27(1):247–271.
Emrani et al., (2014) Emrani, S., Gentimis, T., and Krim, H. (2014). Persistent homology of delay embeddings and its application to wheeze detection. IEEE Signal Processing Letters, 21(4):459–463.
Erickson, (1999) Erickson, J. (1999). Finding longest arithmetic progressions. University of Illinois at Urbana-Champaign.
Fasy et al., (2014) Fasy, B. T., Lecci, F., Rinaldo, A., Wasserman, L., Balakrishnan, S., and Singh, A. (2014). Confidence sets for persistence diagrams. The Annals of Statistics, 42:2301–2339.
Fischer et al., (2016) Fischer, D. A., Anglada-Escude, G., Arriagada, P., Baluev, R. V., Bean, J. L., Bouchy, F., Buchhave, L. A., Carroll, T., Chakraborty, A., Crepp, J. R., et al. (2016). State of the field: extreme precision radial velocities. Publications of the Astronomical Society of the Pacific, 128(964):066001.
Gholizadeh and Zadrozny, (2018) Gholizadeh, S. and Zadrozny, W. (2018). A short survey of topological data analysis in time series and systems analysis. arXiv preprint arXiv:1809.10745.
Glenn et al., (2024) Glenn, S., Cisewski-Kehe, J., Zhu, J., and Bement, W. M. (2024). Confidence regions for a persistence diagram of a single image with one or more loops. arXiv preprint arXiv:2405.01651.
(31) Grassberger, P. and Procaccia, I. (1983a). Characterization of strange attractors. Physical review letters, 50(5):346.
(32) Grassberger, P. and Procaccia, I. (1983b). Measuring the strangeness of strange attractors. Physica D: nonlinear phenomena, 9(1-2):189–208.
Hara and Ford, (2023) Hara, N. C. and Ford, E. B. (2023). Statistical methods for exoplanet detection with radial velocities. Annual Review of Statistics and Its Application, 10:623–649.
Harvey and Pierse, (1984) Harvey, A. C. and Pierse, R. G. (1984). Estimating missing observations in economic time series. Journal of the American statistical Association, 79(385):125–131.
Hatcher et al., (2002) Hatcher, A., Press, C. U., and of Mathematics, C. U. D. (2002). Algebraic Topology. Algebraic Topology. Cambridge University Press.
Hénon, (2004) Hénon, M. (2004). A two-dimensional mapping with a strange attractor. The theory of chaotic attractors, pages 94–102.
Hertz, (1992) Hertz, D. (1992). Simple bounds on the extreme eigenvalues of Toeplitz matrices. IEEE transactions on information theory, 38(1):175–176.
(38) Holzer, P. H., Cisewski-Kehe, J., Fischer, D., and Zhao, L. (2021a). A Hermite-Gaussian based exoplanet radial velocity estimation method. The Annals of Applied Statistics, 15(2):527–555.
(39) Holzer, P. H., Cisewski-Kehe, J., Zhao, L., Ford, E. B., Gilbertson, C., and Fischer, D. A. (2021b). A stellar activity F-statistic for exoplanet surveys (SAFE). The Astronomical Journal, 161(6):272.
Huélamo et al., (2008) Huélamo, N., Figueira, P., Bonfils, X., Santos, N., Pepe, F., Gillon, M., Azevedo, R., Barman, T., Fernández, M., Di Folco, E., et al. (2008). TW Hydrae: evidence of stellar spots instead of a Hot Jupiter. Astronomy & Astrophysics, 489(2):L9–L13.
Hughes et al., (2010) Hughes, M. E., Hogenesch, J. B., and Kornacker, K. (2010). Jtk_cycle: an efficient nonparametric algorithm for detecting rhythmic components in genome-scale data sets. Journal of biological rhythms, 25(5):372–380.
Huke and Broomhead, (2007) Huke, J. P. and Broomhead, D. S. (2007). Embedding theorems for non-uniformly sampled dynamical systems. Nonlinearity, 20(9):2205.
HuoLiu and YuanTang, (1998) HuoLiu, Q. and YuanTang, X. (1998). Iterative algorithm for nonuniform inverse fast Fourier transform (NU-IFFT). Electronics Letters, 34(20):1913–1914.
Johnson and Munch, (2022) Johnson, B. and Munch, S. B. (2022). An empirical dynamic modeling framework for missing or irregular samples. Ecological Modelling, 468:109948.
Jones et al., (2022) Jones, D. E., Stenning, D. C., Ford, E. B., Wolpert, R. L., Loredo, T. J., Gilbertson, C., and Dumusque, X. (2022). Improving exoplanet detection power: Multivariate Gaussian process models for stellar activity. The Annals of Applied Statistics, 16(2):652–679.
Kim et al., (1999) Kim, H., Eykholt, R., and Salas, J. (1999). Nonlinear dynamics, delay times, and embedding windows. Physica D: Nonlinear Phenomena, 127(1-2):48–60.
Lekscha and Donner, (2018) Lekscha, J. and Donner, R. V. (2018). Phase space reconstruction for non-uniformly sampled noisy time series. Chaos: An Interdisciplinary Journal of Nonlinear Science, 28(8):085702.
Lomb, (1976) Lomb, N. R. (1976). Least-squares frequency analysis of unequally spaced data. Astrophysics and space science, 39:447–462.
Mayor and Queloz, (1995) Mayor, M. and Queloz, D. (1995). A Jupiter-mass companion to a solar-type star. Nature, 378(6555):355–359.
Mileyko et al., (2011) Mileyko, Y., Mukherjee, S., and Harer, J. (2011). Probability measures on the space of persistence diagrams. Inverse Problems, 27(12):124007.
Moritz and Bartz-Beielstein, (2017) Moritz, S. and Bartz-Beielstein, T. (2017). imputeTS: time series missing value imputation in R. R J., 9(1):207.
Palaversa et al., (2013) Palaversa, L., Ivezić, Ž., Eyer, L., Ruždjak, D., Sudar, D., Galin, M., Kroflin, A., Mesarić, M., Munk, P., Vrbanec, D., et al. (2013). Exploring the variable sky with LINEAR. III. Classification of periodic light curves. The Astronomical Journal, 146(4):101.
Perea and Harer, (2015) Perea, J. A. and Harer, J. (2015). Sliding windows and persistence: An application of topological methods to signal analysis. Foundations of Computational Mathematics, 15:799–838.
Picado and Oliveira, (2020) Picado, N. and Oliveira, P. E. (2020). Denoising and interior detection problems. arXiv preprint arXiv:2010.16360.
Pun et al., (2022) Pun, C. S., Lee, S. X., and Xia, K. (2022). Persistent-homology-based machine learning: a survey and a comparative study. Artificial Intelligence Review, 55(7):5169–5213.
Rajpaul et al., (2015) Rajpaul, V., Aigrain, S., Osborne, M. A., Reece, S., and Roberts, S. (2015). A Gaussian process framework for modelling stellar activity signals in radial velocity data. Monthly Notices of the Royal Astronomical Society, 452(3):2269–2291.
Robinson and Turner, (2017) Robinson, A. and Turner, K. (2017). Hypothesis testing for topological data analysis. Journal of Applied and Computational Topology, 1:241–261.
Ruf, (1999) Ruf, T. (1999). The Lomb-Scargle periodogram in biological rhythm research: analysis of incomplete and unequally spaced time-series. Biological Rhythm Research, 30(2):178–201.
Sauer et al., (1991) Sauer, T., Yorke, J. A., and Casdagli, M. (1991). Embedology. Journal of statistical Physics, 65:579–616.
Scargle, (1982) Scargle, J. D. (1982). Studies in astronomical time series analysis. ii-statistical aspects of spectral analysis of unevenly spaced data. Astrophysical Journal, Part 1, vol. 263, Dec. 15, 1982, p. 835-853., 263:835–853.
Sesar et al., (2011) Sesar, B., Stuart, J. S., Ivezić, Ž., Morgan, D. P., Becker, A. C., and Woźniak, P. (2011). Exploring the variable sky with LINEAR. I. Photometric recalibration with the sloan digital sky survey. The Astronomical Journal, 142(6):190.
Sprott and Rowlands, (2001) Sprott, J. C. and Rowlands, G. (2001). Improved correlation dimension calculation. International Journal of Bifurcation and Chaos, 11(07):1865–1880.
Stark et al., (1997) Stark, J., Broomhead, D. S., Davies, M. E., and Huke, J. (1997). Takens embedding theorems for forced and stochastic systems. Nonlinear Analysis: Theory, Methods & Applications, 30(8):5303–5314.
Takens, (2006) Takens, F. (2006). Detecting strange attractors in turbulence. In Dynamical Systems and Turbulence, Warwick 1980: proceedings of a symposium held at the University of Warwick 1979/80, pages 366–381. Springer.
Tralie and Perea, (2018) Tralie, C. J. and Perea, J. A. (2018). (Quasi) Periodicity quantification in video data, using topology. SIAM Journal on Imaging Sciences, 11(2):1049–1077.
VanderPlas, (2018) VanderPlas, J. T. (2018). Understanding the Lomb-Scargle periodogram. The Astrophysical Journal Supplement Series, 236(1):16.
Vietoris, (1927) Vietoris, L. (1927). Über den höheren zusammenhang kompakter räume und eine klasse von zusammenhangstreuen abbildungen. Mathematische Annalen, 97(1):454–472.
Xu et al., (2019) Xu, X., Cisewski-Kehe, J., Green, S. B., and Nagai, D. (2019). Finding cosmic voids and filament loops using topological data analysis. Astronomy and Computing, 27:34–52.
Zhao et al., (2020) Zhao, L., Fischer, D. A., Ford, E. B., Henry, G. W., Roettenbacher, R. M., and Brewer, J. M. (2020). The EXPRES stellar-signals project. I. description of data. Research Notes of the AAS, 4(9):156.
Zhao et al., (2022) Zhao, L. L., Fischer, D. A., Ford, E. B., Wise, A., Cretignier, M., Aigrain, S., Barragan, O., Bedell, M., Buchhave, L. A., Camacho, J. D., et al. (2022). The EXPRES stellar signals project II. state of the field in disentangling photospheric velocities. The Astronomical Journal, 163(4):171.

Supplementary Material

S1 Stability and Convergence Results

S1.1 Proof of Proposition 4.1

Proof.

It suffices to bound the Hausdorff distance between $\mathbf{F}$ and $\mathbf{F}^{\prime}$ , then using the stability theory in persistence homology (Chazal and Michel, 2021), the bound on their persistence diagrams with respect to the Bottleneck distance can be established. The proof proceeds as follows.

There exists a subset $\mathbf{x}_{i}^{*}\subset\mathbf{x}^{*}$ that exactly equals $F^{*}(\mathbf{s}(t_{p,i}))$ (the $i$ -th row of the $p$ -th subsequence of the embedding matrix $\mathbf{F}^{*}$ ). This fact stems from the construction of $\mathbf{F}^{*}$ , whose rows are uniformly-spaced samples of $\mathbf{x}^{*}$ . The same guarantee holds for the pairs $(\mathbf{F},\mathbf{x})$ , and $(\mathbf{F}^{\prime},\mathbf{x}^{\prime})$ . The distance between the projection $\Pi_{\mathbf{\boldsymbol{\Phi}}}(\mathbf{x}_{i}^{*})$ and $F(\mathbf{s}(t_{p,i}))$ is given by

\lVert\Pi_{\mathbf{\boldsymbol{\Phi}}}(\mathbf{x}_{i}^{*})-F(\mathbf{s}(t_{p,i% }))\rVert_{2}=\lVert\Pi_{\mathbf{\boldsymbol{\Phi}}}\left(\mathbf{x}_{i}^{*}-F% (\mathbf{s}(t_{p,i}))\right)\rVert_{2},

(20)

where $\lVert\cdot\rVert_{2}$ denotes the $l^{2}$ -norm. Equation (20) is under the assumption that the choice of frequency threshold does not smooth out the peaks in the true signal $\mathbf{x}$ . Observe that each $F^{\prime}(\mathbf{s}(t_{p,i}))$ is isometric to $\Pi_{\boldsymbol{\Phi}}(\mathbf{x}^{*}_{i})$ , where $\mathbf{x}^{*}_{i}$ is a subset of length $M+1$ of the original noisy scalar time series. Then the Gromov-Hausdorff distance between $\mathbf{F}^{\prime}$ and $\mathbf{F}$ can be expressed as

\text{d}_{\text{GH}}\left(\mathbf{F}^{\prime},\mathbf{F}\right)=\text{d}_{% \text{GH}}\left(\widehat{\Pi}_{\boldsymbol{\Phi}}(\mathbf{x}^{*}),\mathbf{F}% \right),

(21)

where $\widehat{\Pi}_{\boldsymbol{\Phi}}(\mathbf{x}^{*})$ denotes embedding of the vector ${\Pi}_{\boldsymbol{\Phi}}(\mathbf{x}^{*})$ . Using the same isometric property, the Hausdorff distance between $\mathbf{F}^{\prime}$ and $\mathbf{F}$ can be expressed in terms of Equation $\eqref{eqn:init-dist}$ . This follows from the fact that

\displaystyle\lVert\Pi_{\mathbf{\boldsymbol{\Phi}}}\left(\mathbf{x}_{i}^{*}-F(% \mathbf{s}(t_{p,i}))\right)\rVert_{2}\begin{aligned} &=\lVert\Pi_{\mathbf{% \boldsymbol{\Phi}}}\left(F^{*}(\mathbf{s}(t_{p,i}))-F(\mathbf{s}(t_{p,i}))% \right)\rVert_{2}\\ &\leq\lVert F^{*}(\mathbf{s}(t_{p,i}))-F(\mathbf{s}(t_{p,i}))\rVert_{2}\lVert(% \boldsymbol{\Phi}^{H}\boldsymbol{\Phi})^{\dagger}\boldsymbol{\Phi}^{H}\rVert_{% 2}.\end{aligned}

(22)

The matrix $\boldsymbol{\Phi}^{H}\boldsymbol{\Phi}$ has the form:

\boldsymbol{\Phi}^{H}\boldsymbol{\Phi}=\begin{bmatrix}n&\sum_{k=1}^{n}e^{j2\pi% (w_{2}-w_{1})f_{k}}&\cdots&\sum_{k=1}^{n}e^{j2\pi(w_{n}-w_{1})f_{k}}\vspace{0.% 5cm}\\ \sum_{k=1}^{n}e^{-j2\pi(w_{2}-w_{1})f_{k}}&n&\cdots&\sum_{k=1}^{n}e^{j2\pi(w_{% n}-w_{2})f_{k}}\vspace{0.5cm}\\ \vdots&\vdots&\cdots&\vdots\vspace{0.5cm}\\ \sum_{k=1}^{n}e^{-j2\pi(w_{n}-w_{1})f_{k}}&\sum_{k=1}^{n}e^{-j2\pi(w_{n}-w_{2}% )f_{k}}&\cdots&n\end{bmatrix}.

(23)

The modulus of off-diagonal elements of $\boldsymbol{\Phi}^{H}\boldsymbol{\Phi}$ is bounded as

\left|\sum_{k=1}^{n}e^{j2\pi(w_{l_{1}}-w_{l_{2}})f_{k}}\right|\leq\sum_{k=1}^{% n}\left|e^{j2\pi(w_{l_{1}}-w_{l_{2}})f_{k}}\right|=n.

(24)

Observe that $\lVert(\boldsymbol{\Phi}^{H}\boldsymbol{\Phi})^{\dagger}\boldsymbol{\Phi}^{H}% \rVert_{2}\leq\lVert(\boldsymbol{\Phi}^{H}\boldsymbol{\Phi})^{\dagger}\rVert_{% 2}\lVert\boldsymbol{\Phi}^{H}\rVert_{2}$ . First we bound $\lVert\boldsymbol{\Phi}^{H}\rVert_{2}$ , by directly using the definition:

\lVert\boldsymbol{\Phi}^{H}\rVert_{2}=\sqrt{\lambda_{\text{max}}(\boldsymbol{% \Phi}\boldsymbol{\Phi}^{H})},

(25)

where $\lambda_{\text{max}}(\boldsymbol{\Phi}\boldsymbol{\Phi}^{H})$ is the maximum eigenvalue of $\boldsymbol{\Phi}\boldsymbol{\Phi}^{H}$ . Since $\boldsymbol{\Phi}\boldsymbol{\Phi}^{H}$ is Toeplitz, a bound on the maximum eigenvalue can be established as follows (Hertz, 1992). Let $\boldsymbol{\psi}=\left[\psi_{1},\psi_{2},\cdots,\psi_{n}\right]^{\top}$ be a vector such that $\psi_{1}=1$ and

\psi_{k}=2*\cos\left(\frac{\pi}{\lfloor(n-1)/(k-1)\rfloor+2}\right),\quad k=2,% \cdots,n.

(26)

Also, let $\boldsymbol{\zeta}=\left[|(\boldsymbol{\Phi}\boldsymbol{\Phi}^{H})_{1,1}|,|(% \boldsymbol{\Phi}\boldsymbol{\Phi}^{H})_{1,2}|,\cdots,|(\boldsymbol{\Phi}% \boldsymbol{\Phi}^{H})_{1,n}|\right]^{\top}$ , that is, the modulus of the terms in first row of $\boldsymbol{\Phi}\boldsymbol{\Phi}^{H}$ . Let $\lambda_{k}$ be the $k$ -th eigenvalue of $\boldsymbol{\Phi}\boldsymbol{\Phi}^{H}$ . Then it follows that (Hertz, 1992):

\max_{1\leq k\leq n}(\lambda_{k})\leq\boldsymbol{\zeta}^{T}\boldsymbol{\psi}.

(27)

Further observe that $\max_{1\leq k\leq n}(\psi_{k})=2$ , hence, together with the bound on the values of $\boldsymbol{\zeta}$ from Equation (24) it follows that

\displaystyle\boldsymbol{\zeta}^{T}\boldsymbol{\psi}\begin{aligned} &=n+\sum_{% k=2}^{n}|(\boldsymbol{\Phi}\boldsymbol{\Phi}^{H})_{1,k}|\psi_{k}&\leq\left(n+n% \sum_{k=2}^{n}2\right)=2n^{2}-n.\end{aligned}

(28)

Hence the norm $\lVert{{\boldsymbol{\Phi}}}^{H}\rVert_{2}\leq 2n^{2}-n$ . It now remains to bound the quantity $\lVert(\boldsymbol{\Phi}^{H}\boldsymbol{\Phi})^{\dagger}\rVert_{2}$ . By computing the singular value decomposition of $\lVert(\boldsymbol{\Phi}^{H}\boldsymbol{\Phi})^{\dagger}\rVert_{2}$ , it follows directly that

\lVert(\boldsymbol{\Phi}^{H}\boldsymbol{\Phi})^{\dagger}\rVert_{2}\leq\frac{1}% {\sigma^{2}_{\text{min}}(\boldsymbol{\Phi})},

(29)

where $\sigma^{2}_{\text{min}}(\boldsymbol{\Phi})>0$ is the smallest non-zero singular value of $\boldsymbol{\Phi}$ . Using the fact that

\sum_{k=1}^{n}\lambda_{k}(\boldsymbol{\Phi}^{H}\boldsymbol{\Phi})=\text{Tr}(% \boldsymbol{\Phi}^{H}\boldsymbol{\Phi})=n^{2},

(30)

where $\text{Tr}(\boldsymbol{\Phi}^{H}\boldsymbol{\Phi})$ is the matrix trace, it follows that the smallest non-zero eigenvalue is bounded as $0<\lambda_{\text{min}}(\boldsymbol{\Phi}^{H}\boldsymbol{\Phi})\leq n$ which implies that $\sigma^{2}_{\text{min}}(\boldsymbol{\Phi}^{H}\boldsymbol{\Phi})\leq n$ . For any such that $\sigma^{2}_{\text{min}}(\boldsymbol{\Phi}^{H}\boldsymbol{\Phi})$ , we can always find a $0<\gamma\leq 1$ such that $\sigma^{2}_{\text{min}}(\boldsymbol{\Phi}^{H}\boldsymbol{\Phi})>{\gamma n}$ . Hence the bound in Equation (29) can be extended to

\lVert(\boldsymbol{\Phi}^{H}\boldsymbol{\Phi})^{\dagger}\rVert_{2}\leq\frac{1}% {\sigma^{2}_{\text{min}}(\boldsymbol{\Phi})}\leq\frac{1}{\gamma n}.

(31)

Now using the bound in Equations (28) and (31), the bound in Equation (22) has the form

\lVert\Pi_{\mathbf{\boldsymbol{\Phi}}}\left(\mathbf{x}_{i}^{*}-F(\mathbf{s}(t_% {p,i}))\right)\rVert_{2}\leq\frac{2n-1}{\gamma}\lVert F^{*}(\mathbf{s}(t_{p,i}% ))-F(\mathbf{s}(t_{p,i}))\rVert_{2}

(32)

The bound on the Hausdorff distance between $\mathbf{F}^{\prime}$ and $\mathbf{F}$ is then expressed as

\displaystyle\text{d}_{\text{H}}(\widehat{\Pi}_{\boldsymbol{\Phi}}(\mathbf{x}^% {*}),\mathbf{F})\begin{aligned} &\leq\sup\limits_{i,p}\lVert\Pi_{\mathbf{% \boldsymbol{\Phi}}}\left(\mathbf{x}_{i}^{*}-F(\mathbf{s}(t_{p,i}))\right)% \rVert_{2}\\ &\leq\frac{2n-1}{\gamma}\sup\limits_{i,p}\lVert F^{*}(\mathbf{s}(t_{p,i}))-F(% \mathbf{s}(t_{p,i}))\rVert_{2},1\leq i\leq n_{p}-M\tau_{p},1\leq p\leq P.\end{aligned}

(33)

From the equality in Equation (21), it holds that

\text{d}_{\text{B}}(\text{Dgm}({\mathbf{F}}),\text{Dgm}({\mathbf{F}^{\prime}})% )\leq 2\text{d}_{\textbf{GH}}\left(\mathbf{F},\mathbf{F}^{\prime}\right)=2% \text{d}_{\textbf{GH}}\left(\widehat{\Pi}_{\boldsymbol{\Phi}}(\mathbf{x}^{*}),% \mathbf{F}\right).

(34)

The Gromov-Hausdorff distance is further bounded above by the Hausdorff distance. From Equations (33), the bound on the bottleneck distance between $\text{Dgm}({\mathbf{F}})$ and $\text{Dgm}({\mathbf{F}^{\prime}})$ is established as

\text{d}_{\text{B}}(\text{Dgm}({\mathbf{F}}),\text{Dgm}({\mathbf{F}^{\prime}})% )\leq\frac{4n-2}{\gamma}\left(\sup\limits_{i,p}||F^{*}(\mathbf{s}(t_{p,i}))-F(% \mathbf{s}(t_{p,i}))||_{2}\right),

(35)

where $1\leq i\leq n_{p}-M\tau_{p},\quad 1\leq p\leq P$ . If the observed time series is ‘noise-free’ such that $\mathbf{x}^{*}=\mathbf{x}$ , then $\sup\limits_{i,p}||F(\mathbf{s}(t_{p,i}))-F^{*}(\mathbf{s}(t_{p,i}))||_{2}=0,% \forall i,p$ , and $\text{d}_{\text{B}}(\text{Dgm}({\mathbf{F}}),\text{Dgm}({\mathbf{F}^{\prime}})% )=0$ . ∎

S1.2 Proof of Lemma 4.2

Proof.

By construction, $\mathbf{F}^{2}\subset\widehat{\mathbf{F}}^{2}$ , thus for any $F^{2}(s(t_{p,i_{1}}))\in\mathbf{F}^{2}$ , $\exists\widehat{F}^{2}(s(t_{p,i_{2}}))\in\widehat{\mathbf{F}}^{2}$ such that $F^{2}(s(t_{p,i_{1}}))=\widehat{F}^{2}(s(t_{p,i_{2}}))$ . Further observe that $|\mathbf{F}^{2}|_{u}=|\hat{\mathbf{F}}^{2}|_{u}$ , where $|.|_{u}$ is a measure of the cardinality of unique observations. Hence it follows that $\text{Dgm}(\mathbf{F}^{2})\equiv\text{Dgm}(\widehat{\mathbf{F}}^{2})$ , and the conclusion is a direct consequence of this equivalence. ∎

S1.3 Proof of Theorem 4.5

Proof.

By construction of the subsequence and assumption A2, the Hausdorff distance between $\mathbf{F}_{m}$ and $\mathbf{F}_{\vartheta}$ has the form

\text{d}_{\text{H}}(\mathbf{F}_{m},\mathbf{F}_{\vartheta})=\sup_{F_{\vartheta}% \in\mathbf{F}_{\vartheta}}\min_{1\leq i\leq m}\lVert F(s(t_{i}))-F_{\vartheta}% \rVert_{2}.

(36)

Let $\mathbf{F}_{0}\subset\mathbf{F}_{\vartheta}$ be a set of ball centers such that

\mathbf{F}_{\vartheta}\subset\bigcup_{F_{0}\in\mathbf{F}_{0}}B(F_{0},\delta),

(37)

i.e., the minimal covering of $\mathbf{F}_{\vartheta}$ consisting of balls of radius $\delta$ around $F_{0}\in\mathbf{F}_{0}$ . For any $F_{\vartheta}\in\mathbf{F}_{\vartheta}$ and $F_{0}\in\mathbf{F}_{0}$ , the following inequality holds:

\displaystyle\min_{1\leq i\leq m}\lVert F(s(t_{i}))-F_{\vartheta}\rVert_{2}% \begin{aligned} &\leq\lVert F(s(t_{j}))-F_{\vartheta}\rVert_{2}\\ &=\lVert F(s(t_{j}))-F_{0}+F_{0}-F_{\vartheta}\rVert_{2}\\ &\leq\lVert F(s(t_{j}))-F_{0}\rVert_{2}+\lVert F_{0}-F_{\vartheta}\rVert_{2},% \quad j=1,\cdots,m.\end{aligned}

(38)

Observe that $\lVert F_{0}-F_{\vartheta}\rVert_{2}$ is bounded by the radius $\delta$ , hence using Equation (37), it follows that

\min_{1\leq i\leq m}\lVert F(s(t_{i}))-F_{\vartheta}\rVert_{2}\leq\delta+\max_% {F_{0}\in\mathbf{F}_{0}}\min_{1\leq i\leq m}\lVert F(s(t_{i}))-F_{0}\rVert_{2}% \leq\varepsilon,

(39)

for some $\epsilon>0$ . Further, taking the supremum over all $F_{\vartheta}$ , the relation still holds:

\sup_{F_{\vartheta}\in\mathbf{F}_{\vartheta}}\min_{1\leq i\leq m}\lVert F(s(t_% {i}))-F_{\vartheta}\rVert_{2}\leq\delta+\max_{F_{0}\in\mathbf{F}_{0}}\min_{1% \leq i\leq m}\lVert F(s(t_{i}))-F_{0}\rVert_{2}\leq\varepsilon,

(40)

Then the probability that $\sup_{F_{\vartheta}\in\mathbf{F}_{\vartheta}}\min_{1\leq i\leq m}\lVert F(s(t_% {i}))-F_{\vartheta}\rVert_{2}$ exceeds $\varepsilon$ is bounded as

\displaystyle\text{Pr}\left(\sup_{F_{\vartheta}\in\mathbf{F}_{\vartheta}}\min_% {1\leq i\leq m}\lVert F(s(t_{i}))-F_{\vartheta}\rVert_{2}>\varepsilon\right)% \begin{aligned} &\leq\text{Pr}\left(\delta+\max_{F_{0}\in\mathbf{F}_{0}}\min_{% 1\leq i\leq m}\lVert F(s(t_{i}))-F_{0}\rVert_{2}>\varepsilon\right)\\ &=\text{Pr}\left(\max_{F_{0}\in\mathbf{F}_{0}}\min_{1\leq i\leq m}\lVert F(s(t% _{i}))-F_{0}\rVert_{2}>\varepsilon-\delta\right)\end{aligned}

(41)

From Equation (36), $\sup_{F_{\vartheta}\in\mathbf{F}_{\vartheta}}\min_{1\leq i\leq m}\lVert F(s(t_% {i}))-F_{\vartheta}\rVert_{2}=\text{d}_{\text{H}}(\mathbf{F}_{m},\mathbf{F}_{% \vartheta})$ , hence a bound on the probability of $\text{d}_{\text{H}}(\mathbf{F}_{m},\mathbf{F}_{\vartheta})$ exceeding $\varepsilon$ can be obtained as

\text{Pr}\left(\text{d}_{\text{H}}(\mathbf{F}_{m},\mathbf{F}_{\vartheta})>% \varepsilon\right)\leq\text{Pr}\left(\max_{F_{0}\in\mathbf{F}_{0}}\min_{1\leq i% \leq m}\lVert F(s(t_{i}))-F_{0}\rVert_{2}>\varepsilon-\delta\right).

(42)

Observe that $\mathbf{F}_{\vartheta}$ endowed with the Hausdorff metric is complete and separable (Attouch et al., 1991). Then for $\varepsilon$ small enough, the following bound holds (Cuevas and Rodríguez-Casal, 2004):

\text{Pr}\left(\max_{F_{0}\in\mathbf{F}_{0}}\min_{1\leq i\leq m}\lVert F(s(t_{% i}))-F_{0}\rVert_{2}>\varepsilon-\delta\right)\leq C\left(1-\kappa\omega(% \varepsilon-\delta)^{M+1}\right)^{m}.

(43)

The constant $C$ is the number of points in the covering of $\mathbf{F}_{\vartheta}$ , i.e., $|\mathbf{F}_{0}|$ , and $\omega$ is the Lebesgue measure of the unit ball in $\mathbb{R}^{M+1}$ . Note that since $0\leq\kappa\omega(\varepsilon-\delta)^{M+1}\leq 1$ , it follows that $\left(1-\kappa\omega(\varepsilon-\delta)^{M+1}\right)^{m}\leq e^{-m\kappa% \omega(\varepsilon-\delta)^{M+1}}$ . This allows for Equations (42) and (43) to be rewritten as

\text{Pr}\left(\text{d}_{\text{H}}(\mathbf{F}_{m},\mathbf{F}_{\vartheta})>% \varepsilon\right)\leq\text{Pr}\left(\max_{F_{0}\in\mathbf{F}_{0}}\min_{1\leq i% \leq m}\lVert F(s(t_{i}))-F_{0}\rVert_{2}>\varepsilon-\delta\right)\leq Ce^{-m% \kappa\omega(\varepsilon-\delta)^{M+1}}.

(44)

Choose some $K>\left(\frac{2}{\kappa\omega}\right)^{\frac{1}{M+1}}$ and let $\varepsilon_{m}(r)=\left(r-\frac{m-l}{m}\right)$ , where $l$ is the number of missing observations in the initial time series and $m\gg l$ , then for $m$ large enough, it follows that

\text{Pr}\left(\left(\varepsilon_{m}(r)\right)^{-\frac{1}{M+1}}\text{d}_{\text% {H}}(\mathbf{F}_{m},\mathbf{F}_{\vartheta})>K\right)\leq Ce^{-m\kappa\omega% \left((\varepsilon_{m}(r)\frac{2}{\kappa\omega})^{\frac{1}{M+1}}-\delta\right)% ^{M+1}}.

(45)

The above bound can be obtained by simply substituting $\left(\frac{2\varepsilon_{m}(r)}{\kappa\omega}\right)^{\frac{1}{M+1}}$ for $\varepsilon$ in Equation (44). Now consider the sum

\sum_{m}e^{-m\kappa\omega\left((\varepsilon_{m}(r)\frac{2}{\kappa\omega})^{% \frac{1}{M+1}}-\delta\right)^{M+1}},

(46)

and observe that it is convergent if $\varepsilon_{m}(r)\geq\left(\frac{\delta}{K}\right)^{M+1}$ . This condition can always be satisfied given the restriction $K>\left(\frac{2}{\kappa\omega}\right)^{\frac{1}{M+1}}$ and for an appropriate choice of $\kappa$ and $\delta$ . Then by the Borel-Cantelli lemma (Émile Borel, 1909; Cantelli, 1917; Chung and Erdös, 1952), since

\sum_{m}\text{Pr}\left(\left(\varepsilon_{m}(r)\right)^{-\frac{1}{M+1}}\text{d% }_{\text{H}}(\mathbf{F}_{m},\mathbf{F}_{\vartheta})>K\right)<\infty,

(47)

it follows that

\lim_{m\xrightarrow{}\infty}\sup\left(\varepsilon_{m}(r)\right)^{-\frac{1}{M+1% }}\text{d}_{\text{H}}(\mathbf{F}_{m},\mathbf{F}_{\vartheta})\leq K.

(48)

∎

S2 Discussion

The proposed SSE methodology generalizes TDEs to allow for more realistic time-series data that are not uniformly spaced. The SSE method and TDA analysis of the reconstructed state space can be used to detect periodic signals, but there is potential for this framework to provide more useful information. This discussion highlights this claim by demonstrating that different time-series signals with common frequency domain and periodicity properties can exhibit different structural shapes in their reconstructed state spaces. The reconstruction provides qualitative and quantitative information that may be used for downstream analysis. We begin with artificial time-series data, and then provide a motivating application related to the detection and characterization of exoplanets.

S2.1 Denoising and Stability

To evaluate the performance of denoising procedure presented in in this work, and Proposition $4.1$ , the time series in Figure 3a of the main article was replicated at varying noise levels and sample sizes. The probability that any value is unobserved at a given time point is fixed at $0.25$ . Four noise levels $\{0,0.25,0.5,2\}$ and five samples sizes $\{50,100,500,1000,5000\}$ were used in the simulations. For each noise level and sample size combination, the denoising method outlined in Section $4.2$ of the main article was performed and the bottleneck distance between the corresponding persistence diagrams and the theoretical upper-bound are computed. The upper bound computed does not include the multiplicative factor $\frac{2n-1}{\gamma}$ . Figure S1 shows a noisy time series and the outcome after denoising at various frequency thresholds.

For an appropriate choice of frequency threshold, which is generally application dependent, the true underlying signal can be satisfactorily recovered.

For each combination of the noise-level and sample size, the process is repeated 100 times and standard errors are obtained. The results are presented in Figure S2.

The bottleneck distance is bounded above by the error bound for all noise levels and sample sizes as expected. At the same noise level, smaller sample sizes tend to have larger bottleneck distances. This can partly be explained by the fact that the reconstructed space is more sparse (i.e., points in the space are more spread out since there are fewer points). The $H_{0}$ features are more likely to persist longer in such sparse settings. The reverse is true for the error bound in Proposition $4.1$ , which for the same noise level, is higher for larger sample sizes. This follows from the fact that larger sample sizes increases the chance of observing highly noisy terms. These results demonstrate the denoising procedure’s efficacy in controlling noise effects on the SSE’s topological features.

Proposition 4.1 establishes a conservative bound on the bottleneck distance between persistence diagrams of a noise-free and a denoised time series using Fourier methods. The factor $(2n-1)/\gamma$ reflects the poor-conditioning of the Fourier matrix in non-uniform domains. Empirical evidence suggests this bound could be improved in more restricted settings, a topic for future research.

S2.2 Topology-Informed Features

Traditional time series methods (e.g., Lomb, 1976), such as Fourier analysis and autocorrelation, focus primarily on frequency components and periodicity. The SSE method coupled with TDA through techniques like persistent homology allow for further exploration of the time series beyond the observed scalar values. For instance, consider the square wave in Figure S3-bottom-left, where its abrupt transitions challenges frequency-based analyses (Lomb, 1976). Its sharp edges and discontinuities result in significant spectral leakage when analyzed using Fourier methods (Lomb, 1976). However, its SSE captures these sharp transitions (Figure S4)-bottom-left, while its associated persistence diagram in Figure S5-bottom-left effectively distinguishes between high and low states. Specifically, the high and low states of the square wave result in exactly a single cluster for each homology dimension. The persistence diagrams were computed on point-wise centered and scaled embedding maps.

The sawtooth wave (Figure S3-top-left), sine wave (Figure S3-top-right), and the triangular wave (Figure S3-bottom-right) display distinct geometric structures in their embeddings (Figure S4). Note that their persistence diagrams (Figure S5) do not exhibit more significant $H_{1}$ features compared to the square wave.

S2.3 Shape-Agnostic Analysis

Different time-series signals with the same periodicity and frequency information can exhibit unique and additional characteristics when examined through SSE and TDA. In particular, the SSE and TDA analysis pipeline further reveals the presence and longevity of topological features, emphasizing the stability of any cyclic structure in a way that complements traditional time-series methods. For example, the time-series signals in Figure S3 can be embedded in a space that emphasizes the stability of their cyclical structure. If one chooses an embedding dimension $M+1$ and step-size $\tau$ such that $(M+1)\tau$ approximates the period of the time series, the cyclical structure can be well approximated. The four signals were each embedded in $\mathbb{R}^{20}$ , and a scatterplot of their first two principal components are shown in Figure S6. The circular (loops) shapes are indicative of the cyclical structures in the time series. Their corresponding persistence diagrams, computed on the pointwise-centered and scaled SSE matrices are shown in Figure S7 where the persistence of the $H_{1}$ feature can be viewed as quantifying the stability and longevity of these cyclical structures.

S2.4 Exoplanet Time-Series Data

Exoplanets are planets that orbit stars other than our sun. The radial velocity (RV) method is one of the most prolific methods for the detection of exoplanets, and the method used to discover the first exoplanet orbiting a sun-like star (Mayor and Queloz, 1995). The RV method is sometimes referred to as the “wobble method” because an exoplanet is detected through the motion, or the wobble, of its host star. More specifically, an exoplanet is detected by estimating the forward and backward velocity (i.e., the RV) of the star across time, and observing a particular periodic signature in the time series suggesting the RV of the star is induced by an exoplanet. A simulated example of the Keplerian model of the RV of star across time when an exoplanet is orbiting in a circular orbit (i.e., with an eccentricy of zero) is displayed as the red point and line in Figure S8. More background on statistical aspects of the RV method can be found in (Hara and Ford, 2023). Though the RV method has been successful at detecting exoplanets, challenges exist in detecting low-mass exoplanets, such as earth analogs, which, all else equal, produce smaller amplitude signals compared to more massive exoplanets.²²2The exoplanet detected in Mayor and Queloz, (1995) had an RV time series semi-amplitude around 60 m/s, and the RV signal for an earth-like exoplanet orbiting a sun-like star is around 10 cm/s.

Until the recent development of extreme precision radial velocity (EPRV) spectrographs, the detection of low-mass exoplanet RV signals was not generally feasible (Fischer et al., 2016). Estimation of planetary RV signals is especially challenging in the low-mass regime because activity and variability in the atmosphere of a star (e.g., the formation and evolution of star spots) can produce periodic variation imprinted on the spectra that can hide or mimic exoplanet signals (Huélamo et al., 2008; Dumusque, 2016; Davis et al., 2017). For example, the green triangle line in Figure S8 displays the RV times series induced by a single simulated star spot orbiting a star (discussed in more detail below). While the spot is behind the star, the RV is zero. However, when the spot rotates in view, it produces a sinusoid-like signal that is challenging to distinguish from an exoplanet signal in real RV observations. New statistical methods have been developed to address the challenge of detecting low-mass exoplanets in the presence of stellar activity (e.g., Rajpaul et al., 2015; Dumusque, 2018; Holzer et al., 2021a ; Holzer et al., 2021b ; Jones et al., 2022), but none of these methods fully or generally mitigate the issues (Zhao et al., 2022).

While we do not solve the issue with detecting low-mass exoplanets in the presence of stellar activity, ideas from TDE and TDA may prove to be fruitful. In particular, RV measurements of stars other than our sun are highly non-uniform,³³3Figure 3 of Zhao et al., (2022) displays examples of the non-uniformity of RV time-series observations. the proposed SSE method is a necessary foundation to use TDEs for exoplanet detection using the RV method.

To illustrate how TDEs and TDA may be used in this setting, we consider a simple simulated example of a planet RV time series, a star spot RV time series, and the combination of a planet and spot RV times series in Figure S8. The host star is set to have a stellar rotation period of 25.05 days, the planet RV time series has an orbital period of 4 days and is on a circular orbit resulting in a semi-amplitude of 0.87 m/s, and the single spot covers 0.05% of the observed surface area of the star and orbits at 30 degrees latitude generated using the Spot Oscillation And Planet (SOAP) 2.0 code (Dumusque et al., 2014), which induces a semi-amplitude of 0.58 m/s. Realistic RV data has noise, irregular and sparse sampling, more complicated stellar variability, and many other complexities; our purpose here is to illustrate how extensions of SSE and TDA may be helpful in this setting.

The SSE matrices for the three time series are equivalent to the TDE matrices since the observations are uniformly-spaced (i.e., the time difference between observations in each of the three time series is $0.2505$ days). To quantify the longevity and stability of any cyclical structures, the embedding window $((M+1)\tau)$ was chosen to approximate the period of the time series. For the planet RV time series, choosing $M=15$ and taking $\tau=0.2505$ approximates the period of $4$ days. Similarly, for the star spot RV time series, taking a $M=49$ with the same $\tau=0.2505$ approximates the period of $12.75$ days. For the combination of planet and spot RV times series, the embedding window was chosen to approximate the dominant period in the two time series, which is choosing $M=49$ and $\tau=0.2505$ . We note that the embedding window can also be chosen to approximate the less dominant period. This can sometimes result in negligible decrease in the persistence of the most persistent $H_{1}$ feature. The embedding matrix denoted $\mathbf{F}\subset\mathbb{R}^{M+1}$ for each time series is pointwise-centered and scaled. Figure S9 shows the plot of the two principal components of the embedding matrices. The planet’s TDE displays a clear circular pattern reflecting the periodic nature of the planetary signal in its RV time series. In contrast, the spot’s TDE, while still appearing somewhat circular, does not form a closed loop. The combined planet and spot RV time series has a TDE that appears to have several closely matched loops. These differences in structure in the TDEs are apparent in the corresponding persistence diagram, which is shown in Figure S10. The planet’s persistence diagram has the most persistent $H_{1}$ feature, while the spot’s persistence diagram has a less persistent, though still apparent, $H_{1}$ feature. The persistence diagram of the combined planet and spot has a persistent $H_{1}$ feature, but also has a cluster of low persistence $H_{1}$ features.

The long-term goal of this sort of RV analysis is to detect and characterize the planetary signal in the presence of stellar activity and other sources of (possibility periodic) variability. Unlike the simplified simulated data used here to illustrate concepts, real RV times series are highly non-uniform and noisy, and contain many complicated and periodic stellar variability signals, along with instrumental biases and other sources of uncertainty. Dominant periodic signals in RV time series can vary widely among planetary, stellar variability, instrumental, and other sources.

S2.5 Summary

Our methodology has implications for real-world applications, such as the detection of exoplanets in the presence of stellar variability. The proposed SSE method addresses challenges posed by non-uniformly sampled time-series data for TDEs. The SSE method preserves the topology of the state space, mitigating the artificial noisy structures introduced by irregular time points, or structural shifts introduced by imputation methods. Coupling the SSE with TDA further enhances its robustness, since TDA is robust to noise and local distortions in the geometry of the state spaces. This advancement ensures that the reconstructed state space accurately reflects the true dynamics of the system, even with non-uniform and noisy sampling. This is particularly relevant in real-world applications where data is often sampled at irregular intervals with an inherent noisy structure.

Future research will further explore the full potential of this methodology, integrating it with conventional statistical techniques, developing more efficient algorithms, and applying these methods to a broader range of real-world data to uncover new insights and structures not readily apparent on scalar time-series data.