Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

Independence Testing for Temporal Data

Cencheng Shen shenc@udel.edu
Department of Applied Economics and Statistics
University of Delaware
Jaewon Chung j1c@jhu.edu
Department of Biomedical Engineering
Johns Hopkins University
Ronak Mehta ronakdm@uw.edu
Department of Statistics
University of Washington
Ting Xu Ting.Xu@childmind.org
Child Mind Institute
Joshua T. Vogelstein jovo@jhu.edu
Department of Biomedical Engineering
Johns Hopkins University
Abstract

Temporal data are increasingly prevalent in modern data science. A fundamental question is whether two time series are related or not. Existing approaches often have limitations, such as relying on parametric assumptions, detecting only linear associations, and requiring multiple tests and corrections. While many non-parametric and universally consistent dependence measures have recently been proposed, directly applying them to temporal data can inflate the p-value and result in an invalid test. To address these challenges, this paper introduces the temporal dependence statistic with block permutation to test independence between temporal data. Under proper assumptions, the proposed procedure is asymptotically valid and universally consistent for testing independence between stationary time series, and capable of estimating the optimal dependence lag that maximizes the dependence. Moreover, it is compatible with a rich family of distance and kernel based dependence measures, eliminates the need for multiple testing, and exhibits excellent testing power in various simulation settings.

1 Introduction

Temporal data, often referred to as time series, finds wide applications across diverse domains, such as functional magnetic resonance imaging (fMRI) in neuroscience, dynamic social networks in sociology, financial indices, etc. In a broader context, temporal data can be seen as a type of structural data characterized by inherent underlying patterns. When dealing with temporal data, a fundamental problem is to determine the presence of a relationship between two jointly observed time series.

In the context of standard independent and identically distributed (i.i.d.) data, where observations (X1,Y1),(X2,Y2),,(Xn,Yn)subscript𝑋1subscript𝑌1subscript𝑋2subscript𝑌2subscript𝑋𝑛subscript𝑌𝑛(X_{1},Y_{1}),(X_{2},Y_{2}),\ldots,(X_{n},Y_{n})( italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) , ( italic_X start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) , … , ( italic_X start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) are drawn independently and identically from the joint distribution FXYsubscript𝐹𝑋𝑌F_{XY}italic_F start_POSTSUBSCRIPT italic_X italic_Y end_POSTSUBSCRIPT, the question simplifies to whether the underlying random variables X𝑋Xitalic_X and Y𝑌Yitalic_Y are independent, i.e., FXY=FXFYsubscript𝐹𝑋𝑌subscript𝐹𝑋subscript𝐹𝑌F_{XY}=F_{X}F_{Y}italic_F start_POSTSUBSCRIPT italic_X italic_Y end_POSTSUBSCRIPT = italic_F start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT. Many recent dependence measures have been proposed to tackle this problem, aiming to achieve valid and universally consistent independence testing. These methods include distance correlation (Szekely et al., 2007; Szekely & Rizzo, 2009; 2014), Hilbert-Schmidt independence criterion (Gretton et al., 2005; Gretton & Gyorfi, 2010; Gretton et al., 2012), multiscale graph correlation (Vogelstein et al., 2019; Shen et al., 2020; Lee et al., 2019), and many others (Heller et al., 2013; Zhu et al., 2017; Pan et al., 2020).

However, the standard testing framework is not applicable to structured data such as time series, because the i.i.d. assumption often does not hold. As a result, standard testing procedures like the permutation test are known to produce inflated p-values and are thus unsuitable for testing structured data (Guillot & Rousset, 2013; DiCiccio & Romano, 2017). Existing research on testing independence for temporal data is limited, often relying on linear measures such as autocorrelation and cross-correlation, which may overlook potential nonlinear relationships (Wang et al., 2021). A commonly made assumption is to consider the sample data as stationary, meaning that the joint distribution of (Xt,Ytl)subscript𝑋𝑡subscript𝑌𝑡𝑙(X_{t},Y_{t-l})( italic_X start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_t - italic_l end_POSTSUBSCRIPT ) depends only on the lag l𝑙litalic_l and not on any specific time index t𝑡titalic_t. Approaches for addressing the instantaneous time problem, where the goal is to detect whether Xtsubscript𝑋𝑡X_{t}italic_X start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT and Ytsubscript𝑌𝑡Y_{t}italic_Y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT are independent, have been explored in Chwialkowski & Gretton (2014). Moreover, Chwialkowski et al. (2014) investigates the problem of testing between Xtsubscript𝑋𝑡X_{t}italic_X start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT and Ytlsubscript𝑌𝑡𝑙Y_{t-l}italic_Y start_POSTSUBSCRIPT italic_t - italic_l end_POSTSUBSCRIPT for each lag l𝑙litalic_l separately, employing multiple testing techniques.

In this paper, we propose an aggregated temporal statistic and utilize a block permutation procedure to extend the scope of independence testing beyond the i.i.d. assumption. Given a standard dependence measure such as distance correlation, our method first calculates a set of cross dependence statistics. These statistics not only facilitate the estimation of the optimal dependence lag, but also enable the computation of the temporal dependence statistic as a weighted aggregation of all cross dependence statistics. Subsequently, we employ a block permutation procedure to derive a p-value for hypothesis testing. Under proper assumptions regarding the choice of the dependence measure, the joint distribution of the temporal data, and the parameters of the block permutation, we establish the asymptotic properties of the temporal dependence, and prove the asymptotic validity and universal consistency of our method. Notably, the proposed temporal dependence method is non-parametric and does not require multiple testing.

Numerically, we show that the proposed approach yields satisfactory testing power when applied to simulated time series with small sample sizes. It is compatible with various dependence measure choices, and numerically superior and more versatile than previously proposed time series testing procedures. Additionally, we present the results of two real-data experiments, utilizing the proposed method to analyze neural connectivity based on fMRI data, as well as uncovering interesting temporal dependencies between the general stock market and low-beta stocks.

2 Method

2.1 Hypothesis for Testing Temporal Dependence

Given the joint sample data {(X1,Y1),,(Xn,Yn)}subscript𝑋1subscript𝑌1subscript𝑋𝑛subscript𝑌𝑛\{(X_{1},Y_{1}),...,(X_{n},Y_{n})\}{ ( italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) , … , ( italic_X start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) }, let X={X1,,Xn}p×n𝑋subscript𝑋1subscript𝑋𝑛superscript𝑝𝑛\vec{X}=\{X_{1},\ldots,X_{n}\}\in\mathbb{R}^{p\times n}over→ start_ARG italic_X end_ARG = { italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_X start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT } ∈ blackboard_R start_POSTSUPERSCRIPT italic_p × italic_n end_POSTSUPERSCRIPT and Y={Y1,,Yn}q×n𝑌subscript𝑌1subscript𝑌𝑛superscript𝑞𝑛\vec{Y}=\{Y_{1},\ldots,Y_{n}\}\in\mathbb{R}^{q\times n}over→ start_ARG italic_Y end_ARG = { italic_Y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_Y start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT } ∈ blackboard_R start_POSTSUPERSCRIPT italic_q × italic_n end_POSTSUPERSCRIPT represent each individual sample data. Here, p𝑝pitalic_p and q𝑞qitalic_q denote the dimensions and are positive integers, and n𝑛nitalic_n is the sample size.

Suppose (X,Y)𝑋𝑌(\vec{X},\vec{Y})( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) is strictly stationary, meaning the distribution at any set of indices remains the same. We can represent the distributions of Xtsubscript𝑋𝑡X_{t}italic_X start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT and Ytsubscript𝑌𝑡Y_{t}italic_Y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT at any point t𝑡titalic_t as FXsubscript𝐹𝑋F_{X}italic_F start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT and FYsubscript𝐹𝑌F_{Y}italic_F start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT, and represent the distribution of (Xt,Ytl)subscript𝑋𝑡subscript𝑌𝑡𝑙(X_{t},Y_{t-l})( italic_X start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_t - italic_l end_POSTSUBSCRIPT ) as FXYlsubscript𝐹𝑋subscript𝑌𝑙F_{XY_{-l}}italic_F start_POSTSUBSCRIPT italic_X italic_Y start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT end_POSTSUBSCRIPT for each lag l0𝑙0l\geq 0italic_l ≥ 0.

We aim to test the following independence hypothesis between X𝑋\vec{X}over→ start_ARG italic_X end_ARG and Y𝑌\vec{Y}over→ start_ARG italic_Y end_ARG:

H0:FXYl:subscript𝐻0subscript𝐹𝑋subscript𝑌𝑙\displaystyle H_{0}:F_{XY_{-l}}italic_H start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT : italic_F start_POSTSUBSCRIPT italic_X italic_Y start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT end_POSTSUBSCRIPT =FXFY for each l{0,1,,L}absentsubscript𝐹𝑋subscript𝐹𝑌 for each 𝑙01𝐿\displaystyle=F_{X}F_{Y}\text{ for each }l\in\{0,1,...,L\}= italic_F start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT for each italic_l ∈ { 0 , 1 , … , italic_L }
HA:FXYl:subscript𝐻𝐴subscript𝐹𝑋subscript𝑌𝑙\displaystyle H_{A}:F_{XY_{-l}}italic_H start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT : italic_F start_POSTSUBSCRIPT italic_X italic_Y start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT end_POSTSUBSCRIPT FXFY for some l{0,1,,L},absentsubscript𝐹𝑋subscript𝐹𝑌 for some 𝑙01𝐿\displaystyle\neq F_{X}F_{Y}\text{ for some }l\in\{0,1,...,L\},≠ italic_F start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT for some italic_l ∈ { 0 , 1 , … , italic_L } ,

Here, L𝐿Litalic_L is a non-negative integer denoting the maximum lag under consideration. Essentially, the null hypothesis states that Xtsubscript𝑋𝑡X_{t}italic_X start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT is independent of present and past values of Ytlsubscript𝑌𝑡𝑙Y_{t-l}italic_Y start_POSTSUBSCRIPT italic_t - italic_l end_POSTSUBSCRIPT for all of l=0,,L𝑙0𝐿l=0,\ldots,Litalic_l = 0 , … , italic_L. In contrast, the alternative hypothesis suggests (X,Yl)𝑋subscript𝑌𝑙(\vec{X},\vec{Y}_{-l})( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) are dependent for at least one l𝑙litalic_l in the range of [0,L]0𝐿[0,L][ 0 , italic_L ].

This setting is, in fact, a generalization of the standard i.i.d. setting, where it was assumed that (X1,Y1),(X2,Y2),,(Xn,Yn)i.i.d.FXY(X_{1},Y_{1}),(X_{2},Y_{2}),\ldots,(X_{n},Y_{n})\stackrel{{\scriptstyle i.i.d.% }}{{\sim}}F_{XY}( italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) , ( italic_X start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) , … , ( italic_X start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) start_RELOP SUPERSCRIPTOP start_ARG ∼ end_ARG start_ARG italic_i . italic_i . italic_d . end_ARG end_RELOP italic_F start_POSTSUBSCRIPT italic_X italic_Y end_POSTSUBSCRIPT, and the null hypothesis simplifies to FXY=FXFYsubscript𝐹𝑋𝑌subscript𝐹𝑋subscript𝐹𝑌F_{XY}=F_{X}F_{Y}italic_F start_POSTSUBSCRIPT italic_X italic_Y end_POSTSUBSCRIPT = italic_F start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT because there is no possible dependence other than l=0𝑙0l=0italic_l = 0. Hence, our subsequent method and theory for testing two time series are also applicable when only one of them is time series or when both are standard i.i.d. data. Moreover, they are applicable to any general structured data that can be assumed stationary.

2.2 Main Algorithm

The proposed method consists of four steps: computation of the cross-lag dependence statistics, estimation of the optimal dependence lag, computation of temporal dependence statistic, and block permutation to obtain the p-value for testing purposes. Details regarding the choice of the dependence measure, block permutation, and computational complexity are discussed in the following subsections.

  • Input:

    Two jointly-sampled datasets represented as Xp×n𝑋superscript𝑝𝑛\vec{X}\in\mathbb{R}^{p\times n}over→ start_ARG italic_X end_ARG ∈ blackboard_R start_POSTSUPERSCRIPT italic_p × italic_n end_POSTSUPERSCRIPT and Yq×n𝑌superscript𝑞𝑛\vec{Y}\in\mathbb{R}^{q\times n}over→ start_ARG italic_Y end_ARG ∈ blackboard_R start_POSTSUPERSCRIPT italic_q × italic_n end_POSTSUPERSCRIPT, a given choice of sample dependence measure τn(,):p×n×q×n:subscript𝜏𝑛superscript𝑝𝑛superscript𝑞𝑛\tau_{n}(\cdot,\cdot):\mathbb{R}^{p\times n}\times\mathbb{R}^{q\times n}% \rightarrow\mathbb{R}italic_τ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( ⋅ , ⋅ ) : blackboard_R start_POSTSUPERSCRIPT italic_p × italic_n end_POSTSUPERSCRIPT × blackboard_R start_POSTSUPERSCRIPT italic_q × italic_n end_POSTSUPERSCRIPT → blackboard_R, and three positive integers: the lag limit L𝐿Litalic_L, the number of blocks B𝐵Bitalic_B, and the number of random permutations R𝑅Ritalic_R.

  • Step 1:

    Compute the set of cross dependence sample statistics {τn(X,Yl),l=0,,L}formulae-sequencesubscript𝜏𝑛𝑋subscript𝑌𝑙𝑙0𝐿\{\tau_{n}(\vec{X},\vec{Y}_{-l}),l=0,\ldots,L\}{ italic_τ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) , italic_l = 0 , … , italic_L }. Here, (X,Yl)𝑋subscript𝑌𝑙(\vec{X},\vec{Y}_{-l})( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) denotes the sample data with l𝑙litalic_l lags apart, which consists of (nl)𝑛𝑙(n-l)( italic_n - italic_l ) pairs of observations:

    (X,Yl)={(X1+l,Y1),(X2+l,Y2),,(Xn,Ynl)}.𝑋subscript𝑌𝑙subscript𝑋1𝑙subscript𝑌1subscript𝑋2𝑙subscript𝑌2subscript𝑋𝑛subscript𝑌𝑛𝑙\displaystyle(\vec{X},\vec{Y}_{-l})=\{(X_{1+l},Y_{1}),(X_{2+l},Y_{2}),\ldots,(% X_{n},Y_{n-l})\}.( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) = { ( italic_X start_POSTSUBSCRIPT 1 + italic_l end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) , ( italic_X start_POSTSUBSCRIPT 2 + italic_l end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) , … , ( italic_X start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_n - italic_l end_POSTSUBSCRIPT ) } .
  • Step 2:

    Estimate the optimal dependence lag:

    L^superscript^𝐿\displaystyle\hat{L}^{*}over^ start_ARG italic_L end_ARG start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT =argmaxl[0,L](nln)τn(X,Yl).absentsubscript𝑙0𝐿𝑛𝑙𝑛subscript𝜏𝑛𝑋subscript𝑌𝑙\displaystyle=\arg\max_{l\in[0,L]}\left(\frac{n-l}{n}\right)\cdot\tau_{n}(\vec% {X},\vec{Y}_{-l}).= roman_arg roman_max start_POSTSUBSCRIPT italic_l ∈ [ 0 , italic_L ] end_POSTSUBSCRIPT ( divide start_ARG italic_n - italic_l end_ARG start_ARG italic_n end_ARG ) ⋅ italic_τ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) .

    Here, the weight (nln)𝑛𝑙𝑛\left(\frac{n-l}{n}\right)( divide start_ARG italic_n - italic_l end_ARG start_ARG italic_n end_ARG ) simply weights each cross dependence statistic based on the number of observations it uses.

  • Step 3:

    Compute the temporal dependence sample statistic:

    Tn(X,Y)subscriptT𝑛𝑋𝑌\displaystyle\mathrm{T}_{n}(\vec{X},\vec{Y})roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) =l=0L(nln)τn(X,Yl).absentsuperscriptsubscript𝑙0𝐿𝑛𝑙𝑛subscript𝜏𝑛𝑋subscript𝑌𝑙\displaystyle=\sum_{l=0}^{L}\left(\frac{n-l}{n}\right)\cdot\tau_{n}(\vec{X},% \vec{Y}_{-l}).= ∑ start_POSTSUBSCRIPT italic_l = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT ( divide start_ARG italic_n - italic_l end_ARG start_ARG italic_n end_ARG ) ⋅ italic_τ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) .
  • Step 4:

    Compute the p-value using block permutation:

    p-val=r=1RI(Tn(X,Y)>Tn(X,YπB))/R,p-valsuperscriptsubscript𝑟1𝑅IsubscriptT𝑛𝑋𝑌subscriptT𝑛𝑋subscript𝑌subscript𝜋𝐵𝑅\displaystyle\mbox{p-val}=\sum_{r=1}^{R}\mbox{I}(\mathrm{T}_{n}(\vec{X},\vec{Y% })>\mathrm{T}_{n}(\vec{X},\vec{Y}_{\pi_{B}}))/R,p-val = ∑ start_POSTSUBSCRIPT italic_r = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_R end_POSTSUPERSCRIPT I ( roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) > roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT italic_π start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) ) / italic_R ,

    where I()I\mbox{I}(\cdot)I ( ⋅ ) is the 0-1 indicator function, and πBsubscript𝜋𝐵\pi_{B}italic_π start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT is a randomly generated block permutation for each r𝑟ritalic_r.

  • Output:

    The temporal dependence statistic TT\mathrm{T}roman_T, the corresponding p-value, and the estimated optimal dependence lag L^superscript^𝐿\hat{L}^{*}over^ start_ARG italic_L end_ARG start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT.

The null hypothesis is rejected if the p-value is less than a pre-specified Type 1 error level, such as 0.050.050.050.05.

2.3 Choice of Dependence Measure

While the algorithm can accommodate any dependence measure as the choice of τn(,)subscript𝜏𝑛\tau_{n}(\cdot,\cdot)italic_τ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( ⋅ , ⋅ ), it is essential for the chosen measure to be well-behaved and satisfy the required assumptions outlined in Section 3.1. This ensures consistency in detecting dependence between temporal data, both in terms of performance and subsequent theory. In our experiments, we employed distance correlation, Hilbert-Schmidt independence criterion, and multiscale graph correlation. All of these measures meet the necessary assumptions, and the resulting tests appear valid and consistent in our numerical experiments.

As the proposed temporal statistic is essentially an aggregation of the underlying dependence measure, its effectiveness in capturing dependence is contingent upon the choice of dependence measure. It is well known that each dependence measure has its own unique strengths. Therefore, our usage of distance and kernel based statistics in this paper should be viewed as an illustration of the validity and consistency properties of the proposed temporal test.

Some examples of other dependence measures include correlation coefficients (Fukumizu et al., 2007; Bießmann et al., 2010), Chatterjee’s rank correlation (Chatterjee, 2021; Shi et al., 2021; 2022), the HHG method (Heller et al., 2013; 2016), projection correlation (Zhu et al., 2017), ball covariance (Pan et al., 2020), as well as recent high-dimensional dependence statistics (Zhu et al., 2020; Huang & Huo, 2022; Shen & Dong, 2024; Xu et al., 2024; Zhou et al., 2024). All of these dependence measures can be directly incorporated into our temporal testing framework by simply modifying the cross-dependence statistics in Step 1. Such adaptations may offer better testing power for certain dependence structures.

For instance, using the correlation coefficient with block permutation will only detect linear associations in temporal data, while a universally consistent dependence measure can detect all possible dependencies with a sufficiently large sample size; dependence measures that are better at detecting nonlinear or high-dimensional dependencies in standard i.i.d. data will also perform better under such dependencies in the case of temporal data, requiring a smaller sample size to achieve perfect testing power; rank-based dependence measures can be more robust against data noise.

2.4 The Block Permutation Test

The standard permutation test is widely used for independence testing (Good, 2005). In a standard permutation, π()𝜋\pi(\cdot)italic_π ( ⋅ ) randomly permutes the indices 1,2,,n12𝑛1,2,\ldots,n1 , 2 , … , italic_n, resulting in Yπsubscript𝑌𝜋\vec{Y}_{\pi}over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT italic_π end_POSTSUBSCRIPT and X𝑋\vec{X}over→ start_ARG italic_X end_ARG that are mostly independent (except for a few indices that do not change position, which are asymptotically negligible as n𝑛nitalic_n increases). Given sufficiently many random permutations, this process allows the permuted test statistics to estimate the true null distribution.

However, the above is only true under the standard i.i.d. setting, and it no longer holds when there exists structural dependence within the sample sequence, such as when (Xt,Yt)subscript𝑋𝑡subscript𝑌𝑡(X_{t},Y_{t})( italic_X start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) are dependent with (Xt1,Yt1)subscript𝑋𝑡1subscript𝑌𝑡1(X_{t-1},Y_{t-1})( italic_X start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT ). Specifically, the permuted statistics would under-estimate the true null distribution, leading to an inflation of the testing power. This issue has been noted in Guillot & Rousset (2013); DiCiccio & Romano (2017), which can affect any dependence measure that relies on the standard permutation test.

To ensure validity of the test, we employ a block permutation procedure (Politis, 2003) denoted as πB()subscript𝜋𝐵\pi_{B}(\cdot)italic_π start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ( ⋅ ), where B𝐵Bitalic_B denotes the number of blocks. The construction of πB()subscript𝜋𝐵\pi_{B}(\cdot)italic_π start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ( ⋅ ) proceeds as follows:

We partition the index list into B𝐵Bitalic_B consecutive blocks. For j=1,,B𝑗1𝐵j=1,\ldots,Bitalic_j = 1 , … , italic_B, block j𝑗jitalic_j consists of indices

Bj=(nB(j1)+1,nB(j1)+2,,nBj1).subscript𝐵𝑗𝑛𝐵𝑗11𝑛𝐵𝑗12𝑛𝐵𝑗1\displaystyle B_{j}=(\lceil\frac{n}{B}\rceil*(j-1)+1,\lceil\frac{n}{B}\rceil*(% j-1)+2,\ldots,\lceil\frac{n}{B}\rceil*j-1).italic_B start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = ( ⌈ divide start_ARG italic_n end_ARG start_ARG italic_B end_ARG ⌉ ∗ ( italic_j - 1 ) + 1 , ⌈ divide start_ARG italic_n end_ARG start_ARG italic_B end_ARG ⌉ ∗ ( italic_j - 1 ) + 2 , … , ⌈ divide start_ARG italic_n end_ARG start_ARG italic_B end_ARG ⌉ ∗ italic_j - 1 ) .

Note that for the last block, the last few indices may exceed n𝑛nitalic_n, in which case the indices wrap around and restart from 1111.

As an example, consider a sample size of n=100𝑛100n=100italic_n = 100 and B=20𝐵20B=20italic_B = 20 blocks, with each block containing 5555 indices. Then the first block would be (Y1,Y2,,Y5)subscript𝑌1subscript𝑌2subscript𝑌5(Y_{1},Y_{2},...,Y_{5})( italic_Y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , … , italic_Y start_POSTSUBSCRIPT 5 end_POSTSUBSCRIPT ), the second block would be (Y6,Y7,,Y10)subscript𝑌6subscript𝑌7subscript𝑌10(Y_{6},Y_{7},...,Y_{10})( italic_Y start_POSTSUBSCRIPT 6 end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT 7 end_POSTSUBSCRIPT , … , italic_Y start_POSTSUBSCRIPT 10 end_POSTSUBSCRIPT ), etc. During the block permutation process, each block is shifted to another position. For instance, the first block might be permuted to the fourth block, resulting in πB(1)=16,πB(2)=17,πB(3)=18,πB(1)=19,πB(1)=20formulae-sequencesubscript𝜋𝐵116formulae-sequencesubscript𝜋𝐵217formulae-sequencesubscript𝜋𝐵318formulae-sequencesubscript𝜋𝐵119subscript𝜋𝐵120\pi_{B}(1)=16,\pi_{B}(2)=17,\pi_{B}(3)=18,\pi_{B}(1)=19,\pi_{B}(1)=20italic_π start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ( 1 ) = 16 , italic_π start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ( 2 ) = 17 , italic_π start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ( 3 ) = 18 , italic_π start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ( 1 ) = 19 , italic_π start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ( 1 ) = 20. This shuffling of blocks ensures a randomized distribution of data while maintaining the block structure.

2.5 Parameter Choice and Computational Complexity

The choice of the maximum lag, denoted as L𝐿Litalic_L, is typically determined based on subject matter considerations. For example, if the signal from one region of the brain can only influence another region within a range of 20202020 time steps, then setting L=20𝐿20L=20italic_L = 20 would be appropriate. Similarly, when collecting daily stock trading data for two stocks, choosing L=30𝐿30L=30italic_L = 30 indicates that we are examining the dependence structure within the past month.

As for the number of blocks, we used B=20𝐵20B=20italic_B = 20 in our experiments, which is sufficient for our purposes. For the number of permutation, we used R=1000𝑅1000R=1000italic_R = 1000 replicates. Assuming that the dependence measure can be computed in O(n2)𝑂superscript𝑛2O(n^{2})italic_O ( italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) time complexity (which is the case for distance correlation), the temporal independence test has a time complexity of O(n2RL)𝑂superscript𝑛2𝑅𝐿O(n^{2}RL)italic_O ( italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_R italic_L ).

3 Supporting Theory

In this section, we establish the asymptotic properties of the test statistics and the resulting tests, which include asymptotic convergence, validity, and consistency. We begin by outlining the necessary assumptions for the theoretical results, followed by detailed elaborations on each assumption. All theorem proofs can be found in the Appendix Section B.

3.1 Assumptions

  • The observed data {(Xt,Yt)}t=1nsuperscriptsubscriptsubscript𝑋𝑡subscript𝑌𝑡𝑡1𝑛\{(X_{t},Y_{t})\}_{t=1}^{n}{ ( italic_X start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) } start_POSTSUBSCRIPT italic_t = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT is strictly stationary, non-constant, and the underlying distribution FXYlsubscript𝐹𝑋subscript𝑌𝑙F_{XY_{-l}}italic_F start_POSTSUBSCRIPT italic_X italic_Y start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT end_POSTSUBSCRIPT has finite moments for any lag l0𝑙0l\geq 0italic_l ≥ 0.

  • There exists a maximum dependence lag M𝑀Mitalic_M such that for all lM𝑙𝑀l\geq Mitalic_l ≥ italic_M, the two time series are almost independent for large n𝑛nitalic_n, so are each time series within itself:

    sup|FXYlFXFY|supremumsubscript𝐹𝑋subscript𝑌𝑙subscript𝐹𝑋subscript𝐹𝑌\displaystyle\sup|F_{XY_{-l}}-F_{X}F_{Y}|roman_sup | italic_F start_POSTSUBSCRIPT italic_X italic_Y start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT end_POSTSUBSCRIPT - italic_F start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT | =O(1n),absent𝑂1𝑛\displaystyle=O(\frac{1}{n}),= italic_O ( divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ) ,
    sup|FXXlFXFX|supremumsubscript𝐹𝑋subscript𝑋𝑙subscript𝐹𝑋subscript𝐹𝑋\displaystyle\sup|F_{XX_{-l}}-F_{X}F_{X}|roman_sup | italic_F start_POSTSUBSCRIPT italic_X italic_X start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT end_POSTSUBSCRIPT - italic_F start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT | =O(1n),absent𝑂1𝑛\displaystyle=O(\frac{1}{n}),= italic_O ( divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ) ,
    sup|FYYlFYFY|supremumsubscript𝐹𝑌subscript𝑌𝑙subscript𝐹𝑌subscript𝐹𝑌\displaystyle\sup|F_{YY_{-l}}-F_{Y}F_{Y}|roman_sup | italic_F start_POSTSUBSCRIPT italic_Y italic_Y start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT end_POSTSUBSCRIPT - italic_F start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT | =O(1n).absent𝑂1𝑛\displaystyle=O(\frac{1}{n}).= italic_O ( divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ) .
  • The maximum dependence lag M𝑀Mitalic_M and the maximum lag under consideration L𝐿Litalic_L are non-negative integers that satisfies LM𝐿𝑀L\geq Mitalic_L ≥ italic_M and L=o(n)𝐿𝑜𝑛L=o(n)italic_L = italic_o ( italic_n ), i.e., they may increase together with n𝑛nitalic_n but at a slower pace.

  • As the sample size n𝑛nitalic_n increases, both the number of blocks B𝐵Bitalic_B and the number of observations per block nB𝑛𝐵\frac{n}{B}divide start_ARG italic_n end_ARG start_ARG italic_B end_ARG increase to infinity. Moreover, nBM𝑛𝐵𝑀\frac{n}{B}\geq Mdivide start_ARG italic_n end_ARG start_ARG italic_B end_ARG ≥ italic_M for sufficiently large n𝑛nitalic_n.

  • The sample dependence measure has the following form:

    τn(X,Y)subscript𝜏𝑛𝑋𝑌\displaystyle\tau_{n}(\vec{X},\vec{Y})italic_τ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) =i=1nj=1nγn(i,j)n2,absentsuperscriptsubscript𝑖1𝑛superscriptsubscript𝑗1𝑛subscript𝛾𝑛𝑖𝑗superscript𝑛2\displaystyle=\frac{\sum_{i=1}^{n}\sum_{j=1}^{n}\gamma_{n}(i,j)}{n^{2}},= divide start_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_γ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( italic_i , italic_j ) end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ,

    where each γn(i,j)subscript𝛾𝑛𝑖𝑗\gamma_{n}(i,j)italic_γ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( italic_i , italic_j ) is a function of (Xi,Xj,Yi,Yj)subscript𝑋𝑖subscript𝑋𝑗subscript𝑌𝑖subscript𝑌𝑗(X_{i},X_{j},Y_{i},Y_{j})( italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ), and remaining sample pairs may also be used but with a weight of O(1/n)𝑂1𝑛O(1/n)italic_O ( 1 / italic_n ).

  • In the standard i.i.d. setting where (X1,Y1),(X2,Y2),,(Xn,Yn)i.i.d.FXY(X_{1},Y_{1}),(X_{2},Y_{2}),\ldots,(X_{n},Y_{n})\stackrel{{\scriptstyle i.i.d.% }}{{\sim}}F_{XY}( italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) , ( italic_X start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) , … , ( italic_X start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) start_RELOP SUPERSCRIPTOP start_ARG ∼ end_ARG start_ARG italic_i . italic_i . italic_d . end_ARG end_RELOP italic_F start_POSTSUBSCRIPT italic_X italic_Y end_POSTSUBSCRIPT, there exists a population statistic τ(X,Y)𝜏𝑋𝑌\tau(X,Y)italic_τ ( italic_X , italic_Y ) defined solely based on the joint distribution FXYsubscript𝐹𝑋𝑌F_{XY}italic_F start_POSTSUBSCRIPT italic_X italic_Y end_POSTSUBSCRIPT. When ij𝑖𝑗i\neq jitalic_i ≠ italic_j, each term in the sample statistic satisfies:

    𝔼(γn(i,j))𝔼subscript𝛾𝑛𝑖𝑗\displaystyle\mathbb{E}(\gamma_{n}(i,j))blackboard_E ( italic_γ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( italic_i , italic_j ) ) =τ(X,Y)+o(1).absent𝜏𝑋𝑌𝑜1\displaystyle=\tau(X,Y)+o(1).= italic_τ ( italic_X , italic_Y ) + italic_o ( 1 ) .

    Moreover, the population statistic τ(X,Y)𝜏𝑋𝑌\tau(X,Y)italic_τ ( italic_X , italic_Y ) is non-negative and equals 00 if and only if X𝑋Xitalic_X and Y𝑌Yitalic_Y are independent, i.e., FXY=FXFYsubscript𝐹𝑋𝑌subscript𝐹𝑋subscript𝐹𝑌F_{XY}=F_{X}F_{Y}italic_F start_POSTSUBSCRIPT italic_X italic_Y end_POSTSUBSCRIPT = italic_F start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT.

The first assumption is a common one in time series research. The key distinction from the standard i.i.d. setting is that the samples are no longer independent, but remain identically distributed. For non-stationary data, there exist many common techniques to remove trends and process them into approximately stationary processes (Cleveland et al., 1990; Hastie et al., 2009; Enders, 2010; Shumway & Stoffer, 2010; Box et al., 2015). Some examples include differencing, where one computes the difference between consecutive observations; detrending via linear regression or polynomial fitting and subtracting the trend component from the original series; seasonal adjustment by decomposition; log / square root / Box-Cox transformation to stabilize variance; smoothing via moving averages to reduce noise and short-term fluctuations; filtering to remove specific frequencies from the data.

The second and third assumptions require that the time series exhibit independence for sufficiently large lags beyond M𝑀Mitalic_M, and that the maximum lag to be examined, L𝐿Litalic_L, must be no less than M𝑀Mitalic_M. Such an assumption shares similarity with the mixing property, where a stochastic process is mixing if its values at widely-separated times are asymptotically independent (Pham & Tran, 1985; McDonald et al., 2011; Ziemann & Tu, 2022). Hence, our results can also be considered approximately true for mixing time series.

The fourth assumption imposes a regularity condition on block permutation. In theory, choices for B𝐵Bitalic_B can be log(n)𝑙𝑜𝑔𝑛log(n)italic_l italic_o italic_g ( italic_n ) or n𝑛\sqrt{n}square-root start_ARG italic_n end_ARG, while a practical choice like B=20𝐵20B=20italic_B = 20 is sufficient for our simulations. This resembles the Bayes optimal condition for K-nearest-neighbor, where K𝐾Kitalic_K is required to increase to infinity but slower than n𝑛nitalic_n.

The remaining assumptions regarding the dependence measure are satisfied by a variety of distance and kernel measures that have been recently proposed. For example, distance covariance satisfies the two assumptions, with

γn(i,j)={d(Xi,Xj)μXiμXj+μX}{d(Yi,Yj)μYiμYj+μY}.subscript𝛾𝑛𝑖𝑗𝑑subscript𝑋𝑖subscript𝑋𝑗subscript𝜇subscript𝑋𝑖subscript𝜇subscript𝑋𝑗subscript𝜇𝑋𝑑subscript𝑌𝑖subscript𝑌𝑗subscript𝜇subscript𝑌𝑖subscript𝜇subscript𝑌𝑗subscript𝜇𝑌\displaystyle\gamma_{n}(i,j)=\{d(X_{i},X_{j})-\mu_{X_{i}}-\mu_{X_{j}}+\mu_{X}% \}\{d(Y_{i},Y_{j})-\mu_{Y_{i}}-\mu_{Y_{j}}+\mu_{Y}\}.italic_γ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( italic_i , italic_j ) = { italic_d ( italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) - italic_μ start_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT + italic_μ start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT } { italic_d ( italic_Y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) - italic_μ start_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT + italic_μ start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT } .

Here, d(,)𝑑d(\cdot,\cdot)italic_d ( ⋅ , ⋅ ) is the Euclidean distance, μXisubscript𝜇subscript𝑋𝑖\mu_{X_{i}}italic_μ start_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT denotes the mean of all distance pairs relative to Xisubscript𝑋𝑖X_{i}italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT within X𝑋\vec{X}over→ start_ARG italic_X end_ARG, and μXsubscript𝜇𝑋\mu_{X}italic_μ start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT is the mean of the whole pairwise distance matrices of X𝑋\vec{X}over→ start_ARG italic_X end_ARG. Furthermore, the population distance covariance is defined in terms of characteristic functions and equals 00 if and only if FXY=FXFYsubscript𝐹𝑋𝑌subscript𝐹𝑋subscript𝐹𝑌F_{XY}=F_{X}F_{Y}italic_F start_POSTSUBSCRIPT italic_X italic_Y end_POSTSUBSCRIPT = italic_F start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT in the standard i.i.d. settings. Indeed, many dependence measures that are universal consistent in the standard i.i.d. setting satisfy this assumption. For example, the Hilbert-Schmidt independence criterion utilizes the same formulation (Shen & Vogelstein, 2021; Sejdinovic et al., 2013) on the Gaussian kernel. Additionally, the unbiased distance covariance and distance correlation, as well as the multiscale graph correlation – a truncated version of distance correlation where large distance pairs may be unused – also satisfy this assumption.

3.2 Convergence of the Sample Statistics

We begin by proving the convergence of the sample cross dependence to the population cross dependence:

Theorem 1.

The cross dependence sample statistic satisfies:

𝔼(τn(X,Yl))τ(X,Yl)=o(1),𝔼subscript𝜏𝑛𝑋subscript𝑌𝑙𝜏𝑋subscript𝑌𝑙𝑜1\displaystyle\mathbb{E}(\tau_{n}(\vec{X},\vec{Y}_{-l}))-\tau(X,Y_{-l})=o(1),blackboard_E ( italic_τ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) ) - italic_τ ( italic_X , italic_Y start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) = italic_o ( 1 ) ,
Var(τn(X,Yl)))=O(1nl).\displaystyle\text{Var}(\tau_{n}(\vec{X},\vec{Y}_{-l})))=O(\frac{1}{n-l}).Var ( italic_τ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) ) ) = italic_O ( divide start_ARG 1 end_ARG start_ARG italic_n - italic_l end_ARG ) .

Therefore, for each l{0,,L}𝑙0𝐿l\in\{0,...,L\}italic_l ∈ { 0 , … , italic_L }, we have

τn(X,Yl)nτ(X,Yl)superscript𝑛subscript𝜏𝑛𝑋subscript𝑌𝑙𝜏𝑋subscript𝑌𝑙\displaystyle\tau_{n}(\vec{X},\vec{Y}_{-l})\stackrel{{\scriptstyle n% \rightarrow\infty}}{{\rightarrow}}\tau(X,Y_{-l})italic_τ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) start_RELOP SUPERSCRIPTOP start_ARG → end_ARG start_ARG italic_n → ∞ end_ARG end_RELOP italic_τ ( italic_X , italic_Y start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT )

in probability.

Theorem 1 shows that both the bias and variance of the cross dependence statistic diminish to 00 as the sample size n𝑛nitalic_n increases. Consequently, this guarantees that the aggregated temporal dependence statistic and the estimated optimal lag also converge to their corresponding population forms in probability.

Theorem 2.

The temporal dependence sample statistic satisfies:

Tn(X,Y)nl=0Lτ(X,Yl).superscript𝑛subscriptT𝑛𝑋𝑌superscriptsubscript𝑙0𝐿𝜏𝑋subscript𝑌𝑙\displaystyle\mathrm{T}_{n}(\vec{X},\vec{Y})\stackrel{{\scriptstyle n% \rightarrow\infty}}{{\rightarrow}}\sum_{l=0}^{L}\tau(X,Y_{-l}).roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) start_RELOP SUPERSCRIPTOP start_ARG → end_ARG start_ARG italic_n → ∞ end_ARG end_RELOP ∑ start_POSTSUBSCRIPT italic_l = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_τ ( italic_X , italic_Y start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) .

The estimated optimal dependence lag satisfies:

L^nargmaxl[0,L]τ(X,Yl).superscript𝑛superscript^𝐿subscript𝑙0𝐿𝜏𝑋subscript𝑌𝑙\displaystyle\hat{L}^{*}\stackrel{{\scriptstyle n\rightarrow\infty}}{{% \rightarrow}}\arg\max_{l\in[0,L]}\tau(X,Y_{-l}).over^ start_ARG italic_L end_ARG start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_RELOP SUPERSCRIPTOP start_ARG → end_ARG start_ARG italic_n → ∞ end_ARG end_RELOP roman_arg roman_max start_POSTSUBSCRIPT italic_l ∈ [ 0 , italic_L ] end_POSTSUBSCRIPT italic_τ ( italic_X , italic_Y start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) .

3.3 Validity and Consistency for Testing Temporal Independence

In this subsection we establish the validity and consistency of the method. Specifically, if X𝑋\vec{X}over→ start_ARG italic_X end_ARG and Y𝑌\vec{Y}over→ start_ARG italic_Y end_ARG are independent, the power of the test equals the Type 1 error level α𝛼\alphaitalic_α. Conversely, if X𝑋\vec{X}over→ start_ARG italic_X end_ARG and Y𝑌\vec{Y}over→ start_ARG italic_Y end_ARG are dependent, the power of the test converges to 1111, and the method can consistently detect any dependence.

Given Tn(X,Y)subscriptT𝑛𝑋𝑌\mathrm{T}_{n}(\vec{X},\vec{Y})roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) as the observed test statistic, let FTnB(z)subscript𝐹superscriptsubscript𝑇𝑛𝐵𝑧F_{T_{n}^{B}}(z)italic_F start_POSTSUBSCRIPT italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_B end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_z ) be the empirical distribution of the block-permuted statistics {Tn(X,YπB)}subscriptT𝑛𝑋subscript𝑌subscript𝜋𝐵\{\mathrm{T}_{n}(\vec{X},\vec{Y}_{\pi_{B}})\}{ roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT italic_π start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) }, and denote zn,αsubscript𝑧𝑛𝛼z_{n,\alpha}italic_z start_POSTSUBSCRIPT italic_n , italic_α end_POSTSUBSCRIPT as the critical value where:

FTnB(z)(zn,α)=1α.subscript𝐹superscriptsubscript𝑇𝑛𝐵𝑧subscript𝑧𝑛𝛼1𝛼\displaystyle F_{T_{n}^{B}}(z)(z_{n,\alpha})=1-\alpha.italic_F start_POSTSUBSCRIPT italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_B end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_z ) ( italic_z start_POSTSUBSCRIPT italic_n , italic_α end_POSTSUBSCRIPT ) = 1 - italic_α .

The following theorem establishes the asymptotic validity of our block permutation test:

Theorem 3 (Asymptotic Validity).

Under the null hypothesis that X𝑋\vec{X}over→ start_ARG italic_X end_ARG and Y𝑌\vec{Y}over→ start_ARG italic_Y end_ARG are independent for all lags l[0,L]𝑙0𝐿l\in[0,L]italic_l ∈ [ 0 , italic_L ], the test statistic satisfies:

Tn(X,Y)n0.superscript𝑛subscriptT𝑛𝑋𝑌0\displaystyle\mathrm{T}_{n}(\vec{X},\vec{Y})\stackrel{{\scriptstyle n% \rightarrow\infty}}{{\rightarrow}}0.roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) start_RELOP SUPERSCRIPTOP start_ARG → end_ARG start_ARG italic_n → ∞ end_ARG end_RELOP 0 .

Moreover, the block-permutation test is asymptotically valid, i.e.,

Prob(Tn(X,Y)zn,α)nα.superscript𝑛𝑃𝑟𝑜𝑏subscriptT𝑛𝑋𝑌subscript𝑧𝑛𝛼𝛼\displaystyle Prob(\mathrm{T}_{n}(\vec{X},\vec{Y})\geq z_{n,\alpha})\stackrel{% {\scriptstyle n\rightarrow\infty}}{{\rightarrow}}\alpha.italic_P italic_r italic_o italic_b ( roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) ≥ italic_z start_POSTSUBSCRIPT italic_n , italic_α end_POSTSUBSCRIPT ) start_RELOP SUPERSCRIPTOP start_ARG → end_ARG start_ARG italic_n → ∞ end_ARG end_RELOP italic_α .

The next theorem proves that the method is universally consistent against any alternative.

Theorem 4 (Testing Consistency).

Under the alternative hypothesis that X𝑋\vec{X}over→ start_ARG italic_X end_ARG and Y𝑌\vec{Y}over→ start_ARG italic_Y end_ARG are dependent for some lag l[0,L]𝑙0𝐿l\in[0,L]italic_l ∈ [ 0 , italic_L ], the test statistic satisfies

Tn(X,Y)nc>0.superscript𝑛subscriptT𝑛𝑋𝑌𝑐0\displaystyle\mathrm{T}_{n}(\vec{X},\vec{Y})\stackrel{{\scriptstyle n% \rightarrow\infty}}{{\rightarrow}}c>0.roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) start_RELOP SUPERSCRIPTOP start_ARG → end_ARG start_ARG italic_n → ∞ end_ARG end_RELOP italic_c > 0 .

Moreover, the block-permutation test is asymptotically consistent, i.e.,

Prob(Tn(X,Y)zn,α)n1.superscript𝑛𝑃𝑟𝑜𝑏subscriptT𝑛𝑋𝑌subscript𝑧𝑛𝛼1\displaystyle Prob(\mathrm{T}_{n}(\vec{X},\vec{Y})\geq z_{n,\alpha})\stackrel{% {\scriptstyle n\rightarrow\infty}}{{\rightarrow}}1.italic_P italic_r italic_o italic_b ( roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) ≥ italic_z start_POSTSUBSCRIPT italic_n , italic_α end_POSTSUBSCRIPT ) start_RELOP SUPERSCRIPTOP start_ARG → end_ARG start_ARG italic_n → ∞ end_ARG end_RELOP 1 .

4 Simulations

We estimated the testing power of the proposed approach through simulations on various temporal dependence structures. Specifically, we considered three different implementations of the proposed temporal dependence statistic, which utilized distance correlation (DCorr), Hilbert-Schmidt independence criterion (HSIC), and multiscale graph correlation (MGC). For comparison, we included ShiftHSIC (Chwialkowski & Gretton, 2014), WildHSIC (Chwialkowski et al., 2014), and the widely recognized Ljung-Box test (Ljung & Box, 1978) using traditional cross-correlations. Each simulation was repeated 300300300300 times, with 1000100010001000 permutations and a Type 1 error level of α=0.05𝛼0.05\alpha=0.05italic_α = 0.05 used to compute the p𝑝pitalic_p-values. The testing power is measured by how often the p-value is lower than 0.050.050.050.05 out of the 300300300300 Monte-Carlo simulations. Analysis of ShiftHSIC and WildHSIC was performed using MATLAB code111https://github.com/kacperChwialkowski/HSIC/ and wildBootstrap222https://github.com/kacperChwialkowski/wildBootstrap.

4.1 Testing Power Evaluation

Independence

First, we check the validity of the tests by generating two independent, stationary autoregressive time series with a lag of one:

[XtYt]=[ϕ00ϕ][Xt1Yt1]+[ϵtηt].matrixsubscript𝑋𝑡subscript𝑌𝑡matrixitalic-ϕ00italic-ϕmatrixsubscript𝑋𝑡1subscript𝑌𝑡1matrixsubscriptitalic-ϵ𝑡subscript𝜂𝑡\begin{bmatrix}X_{t}\\ Y_{t}\end{bmatrix}=\begin{bmatrix}\phi&0\\ 0&\phi\end{bmatrix}\begin{bmatrix}X_{t-1}\\ Y_{t-1}\end{bmatrix}+\begin{bmatrix}\epsilon_{t}\\ \eta_{t}\end{bmatrix}.[ start_ARG start_ROW start_CELL italic_X start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_Y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] = [ start_ARG start_ROW start_CELL italic_ϕ end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL italic_ϕ end_CELL end_ROW end_ARG ] [ start_ARG start_ROW start_CELL italic_X start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_Y start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] + [ start_ARG start_ROW start_CELL italic_ϵ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_η start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] .

Here, (ϵt,ηt)subscriptitalic-ϵ𝑡subscript𝜂𝑡(\epsilon_{t},\eta_{t})( italic_ϵ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_η start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) are standard normal noise terms. As shown in Figure 1, the proposed methods maintain a testing power close to α=0.05𝛼0.05\alpha=0.05italic_α = 0.05 across varying n𝑛nitalic_n and ϕitalic-ϕ\phiitalic_ϕ, regardless of the statistic used.

Refer to caption
Figure 1: This figure illustrates the validity of the tests using two independent time series. In the left panel, the testing power is computed as the sample size increases, with an AR coefficient of ϕ=0.5italic-ϕ0.5\phi=0.5italic_ϕ = 0.5. The right panel keeps the sample size at n=1200𝑛1200n=1200italic_n = 1200 while varying the AR coefficient ϕitalic-ϕ\phiitalic_ϕ, with the noise variance appropriately adjusted by (1ϕ2)1superscriptitalic-ϕ2(1-\phi^{2})( 1 - italic_ϕ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ), based on the same simulation as in Chwialkowski & Gretton (2014). The dashed black line represents the significance level α=0.05𝛼0.05\alpha=0.05italic_α = 0.05.
Linear Dependence

Next, we assess our methods’ ability to capture linear relationships in the following simulation:

[XtYt]=[0ϕϕ0][Xt1Yt1]+[ϵtηt].matrixsubscript𝑋𝑡subscript𝑌𝑡matrix0italic-ϕitalic-ϕ0matrixsubscript𝑋𝑡1subscript𝑌𝑡1matrixsubscriptitalic-ϵ𝑡subscript𝜂𝑡\begin{bmatrix}X_{t}\\ Y_{t}\end{bmatrix}=\begin{bmatrix}0&\phi\\ \phi&0\end{bmatrix}\begin{bmatrix}X_{t-1}\\ Y_{t-1}\end{bmatrix}+\begin{bmatrix}\epsilon_{t}\\ \eta_{t}\end{bmatrix}.[ start_ARG start_ROW start_CELL italic_X start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_Y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] = [ start_ARG start_ROW start_CELL 0 end_CELL start_CELL italic_ϕ end_CELL end_ROW start_ROW start_CELL italic_ϕ end_CELL start_CELL 0 end_CELL end_ROW end_ARG ] [ start_ARG start_ROW start_CELL italic_X start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_Y start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] + [ start_ARG start_ROW start_CELL italic_ϵ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_η start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] .

As this represents a straightforward linear relationship, the Ljung-Box test, based on auto-correlation, is expected to perform best. This is indeed the case in the left panel of Figure 2. Our proposed methods using DCorr, MGC, and HSIC follow closely, quickly converging to perfect power around n=100𝑛100n=100italic_n = 100. In contrast, the other competitors do not perform well in this scenario. This is not surprising, as the ShiftHSIC method is designed to detect whether Xtsubscript𝑋𝑡X_{t}italic_X start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT and Ytsubscript𝑌𝑡Y_{t}italic_Y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT are dependent at lag 00, whereas the linear dependence here is of lag 1111. The WildHSIC method used a wild bootstrap method to estimate the null distribution, which can be inaccurate at small sample size.

Nonlinear Dependence

The next simulation considers a nonlinear dependent model:

[XtYt]=[ϵtYt1ηt].matrixsubscript𝑋𝑡subscript𝑌𝑡matrixsubscriptitalic-ϵ𝑡subscript𝑌𝑡1subscript𝜂𝑡\begin{bmatrix}X_{t}\\ Y_{t}\end{bmatrix}=\begin{bmatrix}\epsilon_{t}Y_{t-1}\\ \eta_{t}\end{bmatrix}.[ start_ARG start_ROW start_CELL italic_X start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_Y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] = [ start_ARG start_ROW start_CELL italic_ϵ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_η start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] .

In the right panel of Figure 2, our proposed methods utilizing DCorr, MGC, and HSIC demonstrate superior performance compared to other competing methods. Notably, the HSIC and MGC implementations exhibit better finite-sample power, as these two dependence measures are better at identifying nonlinear relationships than DCorr. In contrast, all other tests fail to detect dependence in this scenario.

Refer to caption
Figure 2: The testing power for linear (left panel) and nonlinear (right panel) simulations based on 300300300300 replicates.
Extinct Gaussian

This simulation uses the same extinct Gaussian process from Chwialkowski & Gretton (2014), where

[XtYt]=[ϕ00ϕ][Xt1Yt1]+[ϵtηt],matrixsubscript𝑋𝑡subscript𝑌𝑡matrixitalic-ϕ00italic-ϕmatrixsubscript𝑋𝑡1subscript𝑌𝑡1matrixsubscriptitalic-ϵ𝑡subscript𝜂𝑡\begin{bmatrix}X_{t}\\ Y_{t}\end{bmatrix}=\begin{bmatrix}\phi&0\\ 0&\phi\end{bmatrix}\begin{bmatrix}X_{t-1}\\ Y_{t-1}\end{bmatrix}+\begin{bmatrix}\epsilon_{t}\\ \eta_{t}\end{bmatrix},[ start_ARG start_ROW start_CELL italic_X start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_Y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] = [ start_ARG start_ROW start_CELL italic_ϕ end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL italic_ϕ end_CELL end_ROW end_ARG ] [ start_ARG start_ROW start_CELL italic_X start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_Y start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] + [ start_ARG start_ROW start_CELL italic_ϵ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_η start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] ,

and we set n=1200𝑛1200n=1200italic_n = 1200. Here, the (ϵt,ηt)subscriptitalic-ϵ𝑡subscript𝜂𝑡(\epsilon_{t},\eta_{t})( italic_ϵ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_η start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) pair are dependent and drawn from an Extinct Gaussian distribution with two additional parameters: e𝑒eitalic_e (extinction rate) and r𝑟ritalic_r (radius). Both variables are initially drawn from independent standard normal, and U𝑈Uitalic_U is sampled from standard uniform. If either ϵt2+ηt2>rsuperscriptsubscriptitalic-ϵ𝑡2superscriptsubscript𝜂𝑡2𝑟\epsilon_{t}^{2}+\eta_{t}^{2}>ritalic_ϵ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_η start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT > italic_r or U>e𝑈𝑒U>eitalic_U > italic_e holds, then (ϵt,ηt)subscriptitalic-ϵ𝑡subscript𝜂𝑡(\epsilon_{t},\eta_{t})( italic_ϵ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_η start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) are returned; otherwise, they are discarded and the process is repeated. In this process, the dependence between ϵtsubscriptitalic-ϵ𝑡\epsilon_{t}italic_ϵ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT and ηtsubscript𝜂𝑡\eta_{t}italic_η start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT increases with extinction rate e𝑒eitalic_e. Therefore, we expect power to increase with the extinction rate, which is indeed the case as shown in Figure 3. While all methods, except Ljung-Box, are consistent and eventually achieve perfect power, our proposed method using MGC stands out as the best performer.

Refer to caption
Figure 3: The testing power for the extinct gaussian simulation based on 300300300300 replicates.

4.2 Optimal Dependence Lag Estimation

In this subsection, we evaluate the method’s performance in estimating the optimal dependence lag in both linear and nonlinear settings. The linear setting is

[XtYt]=[0ϕ1ϕ10][Xt1Yt1]+[0ϕ3ϕ30][Xt3Yt3]+[ϵtηt],matrixsubscript𝑋𝑡subscript𝑌𝑡matrix0subscriptitalic-ϕ1subscriptitalic-ϕ10matrixsubscript𝑋𝑡1subscript𝑌𝑡1matrix0subscriptitalic-ϕ3subscriptitalic-ϕ30matrixsubscript𝑋𝑡3subscript𝑌𝑡3matrixsubscriptitalic-ϵ𝑡subscript𝜂𝑡\begin{bmatrix}X_{t}\\ Y_{t}\end{bmatrix}=\begin{bmatrix}0&\phi_{1}\\ \phi_{1}&0\end{bmatrix}\begin{bmatrix}X_{t-1}\\ Y_{t-1}\end{bmatrix}+\begin{bmatrix}0&\phi_{3}\\ \phi_{3}&0\end{bmatrix}\begin{bmatrix}X_{t-3}\\ Y_{t-3}\end{bmatrix}+\begin{bmatrix}\epsilon_{t}\\ \eta_{t}\end{bmatrix},[ start_ARG start_ROW start_CELL italic_X start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_Y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] = [ start_ARG start_ROW start_CELL 0 end_CELL start_CELL italic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_CELL start_CELL 0 end_CELL end_ROW end_ARG ] [ start_ARG start_ROW start_CELL italic_X start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_Y start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] + [ start_ARG start_ROW start_CELL 0 end_CELL start_CELL italic_ϕ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_ϕ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_CELL start_CELL 0 end_CELL end_ROW end_ARG ] [ start_ARG start_ROW start_CELL italic_X start_POSTSUBSCRIPT italic_t - 3 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_Y start_POSTSUBSCRIPT italic_t - 3 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] + [ start_ARG start_ROW start_CELL italic_ϵ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_η start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] ,

where we set ϕ3=0.8>ϕ1=0.1subscriptitalic-ϕ30.8subscriptitalic-ϕ10.1\phi_{3}=0.8>\phi_{1}=0.1italic_ϕ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT = 0.8 > italic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = 0.1 such that the true optimal dependence lag equals 3333. The nonlinear simulation is

[XtYt]=[ϵtYt3ηt].matrixsubscript𝑋𝑡subscript𝑌𝑡matrixsubscriptitalic-ϵ𝑡subscript𝑌𝑡3subscript𝜂𝑡\begin{bmatrix}X_{t}\\ Y_{t}\end{bmatrix}=\begin{bmatrix}\epsilon_{t}Y_{t-3}\\ \eta_{t}\end{bmatrix}.[ start_ARG start_ROW start_CELL italic_X start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_Y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] = [ start_ARG start_ROW start_CELL italic_ϵ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT italic_t - 3 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_η start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] .

In both simulations, the true optimal dependence lag equals 3333. Figure 4 shows that the proposed method using either DCorr or MGC consistently estimates the optimal dependence lag as the sample size increases, and MGC outperforms DCorr in the nonlinear setting.

Refer to caption
Refer to caption
Figure 4: This figure displays the performance of our proposed method using both MGC and DCorr for estimating the optimal dependence lag L^superscript^𝐿\hat{L}^{*}over^ start_ARG italic_L end_ARG start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT in linear and nonlinear relationships. The colored bar above lag l𝑙litalic_l shows the empirical frequency of L^=jsuperscript^𝐿𝑗\hat{L}^{*}=jover^ start_ARG italic_L end_ARG start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = italic_j, with red representing MGC and purple representing DCorr. The probability is estimated based on 100100100100 trials. The first row shows DCorr estimation performance at sample sizes n=15,30,60𝑛153060n=15,30,60italic_n = 15 , 30 , 60 for linear relationships, while the second row shows the MGC performance on the same data. The third row displays DCorr estimation for nonlinear relationships, and the last row presents the same for MGC.

4.3 Multivariate Simulations

In this subsection, we revisit the testing power and dependence lag estimation in both linear and nonlinear settings, maintaining a fixed sample size of n=100𝑛100n=100italic_n = 100 and increasing the dimensionality p𝑝pitalic_p, to evaluate performance for multivariate data.

For testing power evaluation, we use the following multivariate linear setting:

[XtYt]=[0ϕDϕD0][Xt1Yt1]+[ϵtηt],matrixsubscript𝑋𝑡subscript𝑌𝑡matrix0italic-ϕ𝐷italic-ϕ𝐷0matrixsubscript𝑋𝑡1subscript𝑌𝑡1matrixsubscriptitalic-ϵ𝑡subscript𝜂𝑡\begin{bmatrix}X_{t}\\ Y_{t}\end{bmatrix}=\begin{bmatrix}0&\phi D\\ \phi D&0\end{bmatrix}\begin{bmatrix}X_{t-1}\\ Y_{t-1}\end{bmatrix}+\begin{bmatrix}\epsilon_{t}\\ \eta_{t}\end{bmatrix},[ start_ARG start_ROW start_CELL italic_X start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_Y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] = [ start_ARG start_ROW start_CELL 0 end_CELL start_CELL italic_ϕ italic_D end_CELL end_ROW start_ROW start_CELL italic_ϕ italic_D end_CELL start_CELL 0 end_CELL end_ROW end_ARG ] [ start_ARG start_ROW start_CELL italic_X start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_Y start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] + [ start_ARG start_ROW start_CELL italic_ϵ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_η start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] ,

where ϕ=0.65italic-ϕ0.65\phi=0.65italic_ϕ = 0.65, Dp×p𝐷superscript𝑝𝑝D\in\mathbb{R}^{p\times p}italic_D ∈ blackboard_R start_POSTSUPERSCRIPT italic_p × italic_p end_POSTSUPERSCRIPT is a diagonal matrix where the elements are Dii=1/isubscript𝐷𝑖𝑖1𝑖D_{ii}=1/iitalic_D start_POSTSUBSCRIPT italic_i italic_i end_POSTSUBSCRIPT = 1 / italic_i, and ϵt,ηtsubscriptitalic-ϵ𝑡subscript𝜂𝑡\epsilon_{t},\eta_{t}italic_ϵ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_η start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT are standard normal of dimension p𝑝pitalic_p. In a similar manner, we use the following multivariate nonlinear setting:

[XtYt]=[D(ϵtYt1)ηt],matrixsubscript𝑋𝑡subscript𝑌𝑡matrix𝐷direct-productsubscriptitalic-ϵ𝑡subscript𝑌𝑡1subscript𝜂𝑡\begin{bmatrix}X_{t}\\ Y_{t}\end{bmatrix}=\begin{bmatrix}D(\epsilon_{t}\odot Y_{t-1})\\ \eta_{t}\end{bmatrix},[ start_ARG start_ROW start_CELL italic_X start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_Y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] = [ start_ARG start_ROW start_CELL italic_D ( italic_ϵ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ⊙ italic_Y start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT ) end_CELL end_ROW start_ROW start_CELL italic_η start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] ,

where direct-product\odot denotes element-wise multiplication. We intentionally design the matrix D𝐷Ditalic_D as a decaying weight, reflecting a meaningful multivariate simulation where additional dimensions contain weaker dependence signals.

Figure 5 illustrates the testing power as dimensionality increases. At a fixed sample size, all testing powers gradually decrease as p𝑝pitalic_p increases. The proposed method using any of MGC, DCorr, or HSIC maintains relatively stable power with slow degradation in the case of linear dependence. The same trend is observed for nonlinear dependence, although the degradation is faster, with the MGC statistic performing the best. It is worth emphasizing that due to the consistent property, if we fix p𝑝pitalic_p and let n𝑛nitalic_n increase, the testing power for our method shall increase to 1111.

Refer to caption
Figure 5: This figure shows the testing power for multivariate simulations, with a constant sample size of n=100𝑛100n=100italic_n = 100 while increasing the dimensionality.

Similarly, we extend the optimal lag estimation into the following two multivariate settings:

[XtYt]matrixsubscript𝑋𝑡subscript𝑌𝑡\displaystyle\begin{bmatrix}X_{t}\\ Y_{t}\end{bmatrix}[ start_ARG start_ROW start_CELL italic_X start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_Y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] =[0ϕ1Dϕ1D0][Xt1Yt1]+[0ϕ3Dϕ3D0][Xt3Yt3]+[ϵtηt].absentmatrix0subscriptitalic-ϕ1𝐷subscriptitalic-ϕ1𝐷0matrixsubscript𝑋𝑡1subscript𝑌𝑡1matrix0subscriptitalic-ϕ3𝐷subscriptitalic-ϕ3𝐷0matrixsubscript𝑋𝑡3subscript𝑌𝑡3matrixsubscriptitalic-ϵ𝑡subscript𝜂𝑡\displaystyle=\begin{bmatrix}0&\phi_{1}D\\ \phi_{1}D&0\end{bmatrix}\begin{bmatrix}X_{t-1}\\ Y_{t-1}\end{bmatrix}+\begin{bmatrix}0&\phi_{3}D\\ \phi_{3}D&0\end{bmatrix}\begin{bmatrix}X_{t-3}\\ Y_{t-3}\end{bmatrix}+\begin{bmatrix}\epsilon_{t}\\ \eta_{t}\end{bmatrix}.= [ start_ARG start_ROW start_CELL 0 end_CELL start_CELL italic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_D end_CELL end_ROW start_ROW start_CELL italic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_D end_CELL start_CELL 0 end_CELL end_ROW end_ARG ] [ start_ARG start_ROW start_CELL italic_X start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_Y start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] + [ start_ARG start_ROW start_CELL 0 end_CELL start_CELL italic_ϕ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT italic_D end_CELL end_ROW start_ROW start_CELL italic_ϕ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT italic_D end_CELL start_CELL 0 end_CELL end_ROW end_ARG ] [ start_ARG start_ROW start_CELL italic_X start_POSTSUBSCRIPT italic_t - 3 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_Y start_POSTSUBSCRIPT italic_t - 3 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] + [ start_ARG start_ROW start_CELL italic_ϵ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_η start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] .
[XtYt]matrixsubscript𝑋𝑡subscript𝑌𝑡\displaystyle\begin{bmatrix}X_{t}\\ Y_{t}\end{bmatrix}[ start_ARG start_ROW start_CELL italic_X start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_Y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] =[D(ϵtYt3)ηt],absentmatrix𝐷direct-productsubscriptitalic-ϵ𝑡subscript𝑌𝑡3subscript𝜂𝑡\displaystyle=\begin{bmatrix}D(\epsilon_{t}\odot Y_{t-3})\\ \eta_{t}\end{bmatrix},= [ start_ARG start_ROW start_CELL italic_D ( italic_ϵ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ⊙ italic_Y start_POSTSUBSCRIPT italic_t - 3 end_POSTSUBSCRIPT ) end_CELL end_ROW start_ROW start_CELL italic_η start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] ,

where ϕ1=0.1subscriptitalic-ϕ10.1\phi_{1}=0.1italic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = 0.1 and ϕ3=0.65subscriptitalic-ϕ30.65\phi_{3}=0.65italic_ϕ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT = 0.65. These settings are similar to those in Section 4.2, with the addition of increasing dimension and D𝐷Ditalic_D. The true optimal lag remains 3333. The estimation accuracy, as shown in Figure 6, demonstrates successful detection at small p𝑝pitalic_p, with accuracy gradually degrading as p𝑝pitalic_p increases.

Refer to caption
Figure 6: This figure shows the accuracy of estimating the true optimal lag in multivariate simulations, maintaining a constant sample size of n=100𝑛100n=100italic_n = 100 while increasing the dimensionality. Note that in the left panel, the lines representing DCorr  and HSIC  overlap due to their almost identical performance.

5 Real Data

5.1 Analyzing Connectivity in the Human Brain

This study is based on data from an individual (Subject ID: 100307) of the Human Connectome Project (HCP), which can be downloaded online333https://www.humanconnectome.org/study/hcp-young-adult/data-releases. The human cortex is parcellated into 180 parcels per hemisphere using the HCP multi-modal parcellation atlas (Glasser et al., 2016). For this study, 22 parcels were selected as regions of interest (ROIs), representing various locations across the cortex. These parcels are denoted as X(1),,X(22)superscript𝑋1superscript𝑋22X^{(1)},\dots,X^{(22)}italic_X start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT , … , italic_X start_POSTSUPERSCRIPT ( 22 ) end_POSTSUPERSCRIPT. Each parcel consists of a contiguous set of vertices whose fMRI signal is projected on the cortical surface. Averaging the vertices within a parcel yields a univariate time series X(u)=(X1(u),,Xn(u))superscript𝑋𝑢subscriptsuperscript𝑋𝑢1subscriptsuperscript𝑋𝑢𝑛X^{(u)}=(X^{(u)}_{1},\dots,X^{(u)}_{n})italic_X start_POSTSUPERSCRIPT ( italic_u ) end_POSTSUPERSCRIPT = ( italic_X start_POSTSUPERSCRIPT ( italic_u ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_X start_POSTSUPERSCRIPT ( italic_u ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ), where n=1200𝑛1200n=1200italic_n = 1200 in this particular case. The selected ROIs, their parcel number in the HCP multi-modal parcellation (Glasser et al., 2016), and assigned network are listed in Table 1.

ROI ID Network Shorthand Parcel Key Parcel Name
18 Default Mode Network DMN 150 PGi
19 Default Mode Network DMN 65 p32pr
20 Default Mode Network DMN 161 32pd
21 Default Mode Network DMN 132 TE1a
22 Default Mode Network DMN 71 9p
6 Dorsal Attention Network dAtt 96 6a
7 Dorsal Attention Network dAtt 117 API
8 Dorsal Attention Network dAtt 50 MIP
9 Dorsal Attention Network dAtt 143 PGp
10 Ventral Attention Network vAtt 109 MI
11 Ventral Attention Network vAtt 148 PF
12 Ventral Attention Network vAtt 60 p32pr
13 Ventral Attention Network vAtt 38 23c
1 Visual Network Visual 1 V1
2 Visual Network Visual 23 MT
3 Visual Network Visual 18 FFC
16 FrontoParietal Network FP 83 p9-46v
17 FrontoParietal Network FP 149 PFm
14 Limbic Network Limbic 135 TF
15 Limbic Network Limbic 93 OFC
4 Somatomotor Network SM 53 3a
5 Somatomotor Network SM 24 A1
Table 1: This table displays the parcellation information for the parcels used in our analysis. They are listed based on the numeric order they appear in Figure 7.

As the temporal dependence method using MGC performed well in our simulations, we simply use the MGC implementation in this analysis. In the left panel of Figure 7, we present the optimal dependence lag for each interdependency, ranging up to L=10𝐿10L=10italic_L = 10. Meanwhile, the right panel of Figure 7 displays the log-scale p𝑝pitalic_p-values of temporal dependence for each pair of parcels. Generally, we observe strong relationships with small lags within the same region, such as an optimal lag of usually 00 within the "DMN" region with significant p-values. In contrast, inter-region dependencies are less significant and typically exist at longer lags.

Refer to caption
Refer to caption
Figure 7: This displays the results of applying the temporal dependence method using MGC to resting-state fMRI data.

5.2 Discovering Temporal Dependence Structure of Low-Beta Stocks

In this experiment, we apply the proposed methodology to analyze the financial market and uncover interesting nonlinear dependencies between low-beta stocks and the S&P 500. In the US financial market, it is well-known that almost all stocks are linearly related to the broad market. A commonly used statistic to measure this association is the beta value, which quantifies a stock’s volatility relative to the market (S&P 500). Beta is defined as the covariance between an individual stock and the general market, divided by the variance of the general market, and it utilizes the rate of return per month rather than the stock price. A beta less than 1 suggests that the stock is less volatile than the market, while a beta greater than 1 indicates higher volatility.

Low-beta stocks are an interesting concept in investing, defined by having a relatively small beta value, typically around 0.50.50.50.5 or less. These stocks tend to have lower correlations with market movements compared to high-beta stocks and are often associated with companies that have stable earnings, strong cash flows, and less uncertainty about their future prospects. Therefore, low-beta stocks can play a valuable role in a well-diversified investment portfolio by providing stability, reducing risk, and potentially enhancing long-term performance. Moreover, with the right strategy, stocks with low volatility can generate high risk-adjusted returns (Blitz & Vliet, 2007; Frazzini & Pedersen, 2014).

We collected weekly closing stock prices from January 1, 2014, to May 1, 2014, using Yahoo Finance data, for the S&P 500 ETF (the benchmark) and 10 individual stocks, as shown in Figure 8. In addition to NVDA, AAPL, and MSFT, which are mega cap stocks and were included for comparison purposes, the remaining stocks are commonly found in low-beta portfolios. Given the high volatility of daily stock prices, we chose to collect the closing price per week, and process each stock’s weekly price into rates of return to make the data resemble a stationary sequence.

As the stock market is highly related and linear relationships are dominant among stocks, we chose to use Pearson correlation and distance correlation as our choice of dependence measures. For each choice, we computed the cross-dependence measures between each individual stock and the S&P 500 from lag 00 to 4444. Subsequently, we computed the aggregated temporal statistic, followed by optimal lag estimation and p-value computation using block permutations. The sample size is n=538𝑛538n=538italic_n = 538, and the number of blocks is 20202020. Therefore, our aim is to test the existence of temporal dependence between each individual stock and the general market, from concurrent testing to a lag of up to 1111 month.

For each individual stock, the aggregated test always yielded a significant result, with a p-value of 00 and an optimal lag of 00 in every case. This indicates that all individual stocks are dependent on the general market, with the strongest dependence observed at concurrent (lag 0) intervals. This outcome is not surprising, as even a beta of 0.30.30.30.3 readily implies a significant linear relationship at lag 00.

Next, we examine the cross-lag dependence structure and their individual p-values in block-permutation, as reported in Figure 9. We first consider the Pearson correlation: the lag 0 concurrent testing statistic is very similar to the beta value of each stock, all of which have significant p-values of 0. On the other hand, all other lags have insignificant p-values, suggesting there is no linear relationship beyond lag 0. Namely, the S&P 500 price this week has no linear effect on itself next week or any individual stock next week, e.g., a high return of +2%percent2+2\%+ 2 % this week does not always imply another +2%percent2+2\%+ 2 % next week.

Inspecting the distance correlation measure yields interesting new insights: the general market, the three mega-cap stocks, and a few low-beta stocks are actually dependent on the general market at lag 1 and beyond. Since the Pearson correlation indicates the lack of a linear relationship, this dependence must be nonlinear, which is weakened as the lag increases. This insight aligns with empirical experience: for example, a high return of +2%percent2+2\%+ 2 % this week may indicate that the next week will be volatile, potentially resulting in another week of high return or a significant pullback from the highs. Such dependence constitutes a nonlinear association, while the linear association could be almost 0. This type of volatility is often utilized in option trading and holds promise for future applications.

Finally, Figure 9 also reveal that some low-beta stocks, such as T, JNJ, WMT, and LLY, exhibit independence from the general market beyond the concurrent lag 0. This insight suggests that these stocks could be ideal candidates for a portfolio that offers temporal independence, rather than just a lack of linear relationship, from the general market.

Refer to caption
Figure 8: This figure shows the selected stocks, their current beta, and their rate of return over 10 years on a weekly basis.
Refer to caption
Figure 9: This figure shows the cross-correlation between each individual stock and the S&P 500, using Pearson correlation at the top and distance correlation at the bottom, for lags from 0 to 4, along with the corresponding p-values. Significant correlations, where larger values are better, are marked in green, while significant p-values, where smaller values are better, are marked in red.

6 Conclusion

This paper introduces a new independence testing procedure for temporal data. The method combined the strengths of nonparametric dependence measures, the specialized cross-lag statistic for time series, and the block permutation procedure. As a result, it provides an asymptotically valid and universally consistent approach with outstanding numerical performance. While the exposition of this manuscript is focused on time series data, this work marks an important step in extending independence testing to structural data beyond the realm of standard i.i.d. data, making them more attractive and broadly applicable.

There are several avenues for future research that warrant exploration. Firstly, although we have demonstrated the asymptotic validity of the block permutation test, its computational efficiency remains a challenge when dealing with large sample sizes. Recent studies (Zhang et al., 2018; Shen et al., 2022) have investigated faster testing procedures by approximating the null distribution of distance and kernel correlations under the standard i.i.d. setting. Extending such approaches to structural data could significantly enhance computational scalability.

Secondly, dependence measures are commonly employed in dimension reduction techniques, such as screening (Fan & Lv, 2008; Li et al., 2012), especially in high-dimensional data settings. However, little attention has been given to the temporal domain. While it is straightforward to utilize dependence measures for dimension reduction in multivariate time series, delving into their theoretical properties and their relationships with other standard tools, such as independence component analysis, could provide valuable insights.

Thirdly, causal inference in time series data is an important task (Haufe et al., 2010; Winkler et al., 2016). While it is widely recognized that correlation does not imply causality, recent research has demonstrated the utility of dependence and conditional dependence tests in causal inference (Cai et al., 2022; Laumann et al., 2023). Therefore, extending this framework to encompass conditional independence and causal inference may significantly advance the understanding of causal inference in time series data.

Acknowledgment

This work was supported by the National Institutes of Health award RF1MH128696 and RO1MH120482, the National Science Foundation award DMS-1921310 and DMS-2113099, and the Defense Advanced Research Projects Agency (DARPA) Lifelong Learning Machines program through contract FA8650-18-2-7834. The authors would like to thank Sambit Panda, Hayden Helm, Benjamin Pedigo, and Bijan Varjavand for their help and discussions in preparation of the paper. The authors also extend thanks to the action editor for the expert handling of the manuscript and to the anonymous reviewers for their valuable suggestions to improve the paper.

References

  • Bießmann et al. (2010) F. Bießmann, F. C. Meinecke, A. Gretton, A. Rauch, G. Rainer, N. K. Logothetis, and K. Müller. Temporal kernel CCA and its application in multimodal neuronal data analysis. Machine Learning, 79:5–27, 2010.
  • Blitz & Vliet (2007) D. Blitz and P. Vliet. The volatility effect: Lower risk without lower return. The Journal of Portfolio Management, 34(1):12–17, 2007.
  • Box et al. (2015) G. E. P. Box, G. M. Jenkins, G. C. Reinsel, and G. M. Ljung. Time Series Analysis: Forecasting and Control. John Wiley & Sons, 2015.
  • Cai et al. (2022) Z. Cai, R. Li, and Y. Zhang. A distribution free conditional independence test with applications to causal discovery. Journal of Machine Learning Research, 23:1–41, 2022.
  • Chatterjee (2021) S. Chatterjee. A new coefficient of correlation. Journal of the American Statistical Association, 116(536):2009–2022, 2021.
  • Chwialkowski & Gretton (2014) K. Chwialkowski and A. Gretton. A kernel independence test for random processes. In 31st International Conference on Machine Learning, pp. 1422–1430, 2014.
  • Chwialkowski et al. (2014) K. Chwialkowski, D. Sejdinovic, and A. Gretton. A wild bootstrap for degenerate kernel tests. In Advances in neural information processing systems, pp. 3608–3616, 2014.
  • Cleveland et al. (1990) R. B. Cleveland, W. S. Cleveland, J. E. McRae, and I. Terpenning. Stl: A seasonal-trend decomposition procedure based on loess. Journal of Official Statistics, 6(1):3–73, 1990.
  • DiCiccio & Romano (2017) C. J. DiCiccio and J. P. Romano. Robust permutation tests for correlation and regression coefficients. Journal of the American Statistical Association, 112(519):1211–1220, 2017.
  • Enders (2010) W. Enders. Applied Econometric Time Series. John Wiley & Sons, 2010.
  • Fan & Lv (2008) J. Fan and J. Lv. Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B, 70(5):849–911, 2008.
  • Frazzini & Pedersen (2014) A. Frazzini and L. Pedersen. Betting against beta. Journal of Financial Economics, 111(1):1–25, January 2014.
  • Fukumizu et al. (2007) K. Fukumizu, F. R. Bach, and A. Gretton. Statistical consistency of kernel canonical correlation analysis. Journal of Machine Learning Research, 8:361–383, 2007.
  • Glasser et al. (2016) M. Glasser, T. Coalson, E. Robinson, C. Hacker, J. Harwell, E. Yacoub, K. Uğurbil, J. Andersson, C. Beckmann, M. Jenkinson, S. Smith, and D. Van Essen. A multi-modal parcellation of human cerebral cortex. Nature, 536:171–178, 2016.
  • Good (2005) P. Good. Permutation, Parametric, and Bootstrap Tests of Hypotheses. Springer, 2005.
  • Gretton & Gyorfi (2010) A. Gretton and L. Gyorfi. Consistent nonparametric tests of independence. Journal of Machine Learning Research, 11:1391–1423, 2010.
  • Gretton et al. (2005) A. Gretton, R. Herbrich, A. Smola, O. Bousquet, and B. Scholkopf. Kernel methods for measuring independence. Journal of Machine Learning Research, 6:2075–2129, 2005.
  • Gretton et al. (2012) A. Gretton, K. Borgwardt, M. Rasch, B. Scholkopf, and A. Smola. A kernel two-sample test. Journal of Machine Learning Research, 13:723–773, 2012.
  • Guillot & Rousset (2013) G. Guillot and F. Rousset. Dismantling the mantel tests. Methods in Ecology and Evolution, 4(4):336–344, 2013.
  • Hastie et al. (2009) T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Science & Business Media, 2009.
  • Haufe et al. (2010) S. Haufe, K. Müller, G. Nolte, and N. Krämer. Sparse causal discovery in multivariate time series. In Proceedings of Workshop on Causality: Objectives and Assessment at NIPS 2008, volume 6, pp.  97–106, 2010.
  • Heller et al. (2013) R. Heller, Y. Heller, and M. Gorfine. A consistent multivariate test of association based on ranks of distances. Biometrika, 100(2):503–510, 2013.
  • Heller et al. (2016) R. Heller, Y. Heller, S. Kaufman, B. Brill, and M. Gorfine. Consistent distribution-free k𝑘kitalic_k-sample and independence tests for univariate random variables. Journal of Machine Learning Research, 17(29):1–54, 2016.
  • Huang & Huo (2022) C. Huang and X. Huo. A statistically and numerically efficient independence test based on random projections and distance covariance. Frontiers in Applied Mathematics and Statistics, 7:779841, 2022.
  • Laumann et al. (2023) F. Laumann, J. Kügelgen, J. Park, B. Schölkopf, and M. Barahona. Kernel-based independence tests for causal structure learning on functional data. Entropy, 25(12):1597, 2023.
  • Lee et al. (2019) Y. Lee, C. Shen, C. E. Priebe, and J. T. Vogelstein. Network dependence testing via diffusion maps and distance-based correlations. Biometrika, 106(4):857–873, 2019.
  • Li et al. (2012) R. Li, W. Zhong, and L. Zhu. Feature screening via distance correlation learning. Journal of American Statistical Association, 107:1129–1139, 2012.
  • Ljung & Box (1978) G. M. Ljung and G. E. P. Box. On a measure of a lack of fit in time series models. Biometrika, 65(2):297–303, 1978.
  • McDonald et al. (2011) D. McDonald, C. Shalizi, and M. Schervish. Estimating beta-mixing coefficients. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, volume 15 of Proceedings of Machine Learning Research, pp.  516–524, 2011.
  • Pan et al. (2020) W. Pan, X. Wang, H. Zhang, H. Zhu, and J. Zhu. Ball covariance: A generic measure of dependence in banach space. Journal of the American Statistical Association, 115(529):307–317, 2020.
  • Pham & Tran (1985) T. D. Pham and L. T. Tran. Some mixing properties of time series models. Stochastic Processes and their Applications, 19(2):297–303, 1985.
  • Politis (2003) D. Politis. The impact of bootstrap methods on time series analysis. Statistical Science, 18(2):219–230, 2003.
  • Sejdinovic et al. (2013) D. Sejdinovic, B. Sriperumbudur, A. Gretton, and K. Fukumizu. Equivalence of distance-based and rkhs-based statistics in hypothesis testing. Annals of Statistics, 41(5):2263–2291, 2013.
  • Shen & Dong (2024) C. Shen and Y. Dong. High-dimensional independence testing via maximum and average distance correlations. arXiv preprint arXiv:2001.01095, 2024.
  • Shen & Vogelstein (2021) C. Shen and J. T. Vogelstein. The exact equivalence of distance and kernel methods in hypothesis testing. AStA Advances in Statistical Analysis, 105(3):385–403, 2021.
  • Shen et al. (2020) C. Shen, C. E. Priebe, and J. T. Vogelstein. From distance correlation to multiscale graph correlation. Journal of the American Statistical Association, 115(529):280–291, 2020.
  • Shen et al. (2022) C. Shen, S. Panda, and J. T. Vogelstein. The chi-square test of distance correlation. Journal of Computational and Graphical Statistics, 31(1):254–262, 2022.
  • Shi et al. (2021) H. Shi, M. Drton, and F. Han. On azadkia-chatterjee’s conditional dependence coefficient, 2021.
  • Shi et al. (2022) H. Shi, M. Drton, and F. Han. On the power of chatterjee’s rank correlation. Biometrika, 109(2):317–333, 2022.
  • Shumway & Stoffer (2010) R. H. Shumway and D. S. Stoffer. Time Series Analysis and Its Applications: With R Examples. Springer Science & Business Media, 2010.
  • Szekely & Rizzo (2009) G. Szekely and M. Rizzo. Brownian distance covariance. Annals of Applied Statistics, 3(4):1233–1303, 2009.
  • Szekely & Rizzo (2014) G. Szekely and M. Rizzo. Partial distance correlation with methods for dissimilarities. Annals of Statistics, 42(6):2382–2412, 2014.
  • Szekely et al. (2007) G. Szekely, M. Rizzo, and N. Bakirov. Measuring and testing independence by correlation of distances. Annals of Statistics, 35(6):2769–2794, 2007.
  • Vogelstein et al. (2019) J. T. Vogelstein, Q. Wang, E. Bridgeford, C. E. Priebe, M. Maggioni, and C. Shen. Discovering and deciphering relationships across disparate data modalities. eLife, 8:e41690, 2019.
  • Wang et al. (2021) G. Wang, W. Li, and K. Zhu. New HSIC-based tests for independence between two stationary multivariate time series. Statistica Sinica, 31(1):269–300, 2021.
  • Winkler et al. (2016) I. Winkler, D. Panknin, D. Bartz, K. Mülle, and S. Haufe. Validity of time reversal for testing granger causality. IEEE Transactions on Signal Processing, 64(11):2746–2760, 2016.
  • Xu et al. (2024) K. Xu, Y. Zhou, L. Zhu, and R. Li. Reducing multivariate independence testing to two bivariate means comparisons. arXiv preprint arXiv:2402.16053, 2024.
  • Zhang et al. (2018) Q. Zhang, S. Filippi, A. Gretton, and D. Sejdinovic. Large-scale kernel methods for independence testing. Statistics and Computing, 28(1):113–130, 2018.
  • Zhou et al. (2024) Y. Zhou, K. Xu, L. Zhu, and R. Li. Rank-based indices for testing independence between two high-dimensional vectors. The Annals of Statistics, 52(1):184–206, 2024.
  • Zhu et al. (2020) C. Zhu, S. Yao, X. Zhang, and X. Shao. Distance-based and rkhs-based dependence metrics in high dimension. The Annals of Statistics, 48(6):3366–3394, 2020.
  • Zhu et al. (2017) L. Zhu, K. Xu, R. Li, and W. Zhong. Projection correlation between two random vectors. Biometrika, 104(4):829–843, 2017.
  • Ziemann & Tu (2022) I. Ziemann and S. Tu. Learning with little mixing. In Advances in Neural Information Processing Systems, volume 35, pp.  4626–4637, 2022.

APPENDIX

Appendix A Assumptions

We begin by revisiting the theoretical assumptions listed in the main paper:

  • The observed data {(Xt,Yt)}t=1nsuperscriptsubscriptsubscript𝑋𝑡subscript𝑌𝑡𝑡1𝑛\{(X_{t},Y_{t})\}_{t=1}^{n}{ ( italic_X start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) } start_POSTSUBSCRIPT italic_t = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT is strictly stationary, non-constant, and the underlying distribution FXYlsubscript𝐹𝑋subscript𝑌𝑙F_{XY_{-l}}italic_F start_POSTSUBSCRIPT italic_X italic_Y start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT end_POSTSUBSCRIPT has finite moments for any lag l0𝑙0l\geq 0italic_l ≥ 0.

  • There exists a maximum dependence lag M𝑀Mitalic_M such that for all lM𝑙𝑀l\geq Mitalic_l ≥ italic_M, the two time series are almost independent for large n𝑛nitalic_n, so are each time series within itself:

    sup|FXYlFXFY|supremumsubscript𝐹𝑋subscript𝑌𝑙subscript𝐹𝑋subscript𝐹𝑌\displaystyle\sup|F_{XY_{-l}}-F_{X}F_{Y}|roman_sup | italic_F start_POSTSUBSCRIPT italic_X italic_Y start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT end_POSTSUBSCRIPT - italic_F start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT | =O(1n),absent𝑂1𝑛\displaystyle=O(\frac{1}{n}),= italic_O ( divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ) ,
    sup|FXXlFXFX|supremumsubscript𝐹𝑋subscript𝑋𝑙subscript𝐹𝑋subscript𝐹𝑋\displaystyle\sup|F_{XX_{-l}}-F_{X}F_{X}|roman_sup | italic_F start_POSTSUBSCRIPT italic_X italic_X start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT end_POSTSUBSCRIPT - italic_F start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT | =O(1n),absent𝑂1𝑛\displaystyle=O(\frac{1}{n}),= italic_O ( divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ) ,
    sup|FYYlFYFY|supremumsubscript𝐹𝑌subscript𝑌𝑙subscript𝐹𝑌subscript𝐹𝑌\displaystyle\sup|F_{YY_{-l}}-F_{Y}F_{Y}|roman_sup | italic_F start_POSTSUBSCRIPT italic_Y italic_Y start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT end_POSTSUBSCRIPT - italic_F start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT | =O(1n).absent𝑂1𝑛\displaystyle=O(\frac{1}{n}).= italic_O ( divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ) .
  • The maximum dependence lag M𝑀Mitalic_M and the maximum lag under consideration L𝐿Litalic_L are non-negative integers that satisfies LM𝐿𝑀L\geq Mitalic_L ≥ italic_M and L=o(n)𝐿𝑜𝑛L=o(n)italic_L = italic_o ( italic_n ), i.e., they may increase together with n𝑛nitalic_n but at a slower pace.

  • As the sample size n𝑛nitalic_n increases, both the number of blocks B𝐵Bitalic_B and the number of observations per block nB𝑛𝐵\frac{n}{B}divide start_ARG italic_n end_ARG start_ARG italic_B end_ARG increase to infinity. Moreover, nBM𝑛𝐵𝑀\frac{n}{B}\geq Mdivide start_ARG italic_n end_ARG start_ARG italic_B end_ARG ≥ italic_M for sufficiently large n𝑛nitalic_n.

  • The sample dependence measure has the following form:

    τn(X,Y)subscript𝜏𝑛𝑋𝑌\displaystyle\tau_{n}(\vec{X},\vec{Y})italic_τ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) =i=1nj=1nγn(i,j)n2,absentsuperscriptsubscript𝑖1𝑛superscriptsubscript𝑗1𝑛subscript𝛾𝑛𝑖𝑗superscript𝑛2\displaystyle=\frac{\sum_{i=1}^{n}\sum_{j=1}^{n}\gamma_{n}(i,j)}{n^{2}},= divide start_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_γ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( italic_i , italic_j ) end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ,

    where each γn(i,j)subscript𝛾𝑛𝑖𝑗\gamma_{n}(i,j)italic_γ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( italic_i , italic_j ) is a function of (Xi,Xj,Yi,Yj)subscript𝑋𝑖subscript𝑋𝑗subscript𝑌𝑖subscript𝑌𝑗(X_{i},X_{j},Y_{i},Y_{j})( italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ), and remaining sample pairs may also be used but with a weight of O(1/n)𝑂1𝑛O(1/n)italic_O ( 1 / italic_n ).

  • In the standard i.i.d. setting where (X1,Y1),(X2,Y2),,(Xn,Yn)i.i.d.FXY(X_{1},Y_{1}),(X_{2},Y_{2}),\ldots,(X_{n},Y_{n})\stackrel{{\scriptstyle i.i.d.% }}{{\sim}}F_{XY}( italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) , ( italic_X start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) , … , ( italic_X start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) start_RELOP SUPERSCRIPTOP start_ARG ∼ end_ARG start_ARG italic_i . italic_i . italic_d . end_ARG end_RELOP italic_F start_POSTSUBSCRIPT italic_X italic_Y end_POSTSUBSCRIPT, there exists a population statistic τ(X,Y)𝜏𝑋𝑌\tau(X,Y)italic_τ ( italic_X , italic_Y ) defined solely based on the joint distribution FXYsubscript𝐹𝑋𝑌F_{XY}italic_F start_POSTSUBSCRIPT italic_X italic_Y end_POSTSUBSCRIPT. Each term in the sample statistic satisfies:

    𝔼(γn(i,j))𝔼subscript𝛾𝑛𝑖𝑗\displaystyle\mathbb{E}(\gamma_{n}(i,j))blackboard_E ( italic_γ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( italic_i , italic_j ) ) =τ(X,Y)+o(1).absent𝜏𝑋𝑌𝑜1\displaystyle=\tau(X,Y)+o(1).= italic_τ ( italic_X , italic_Y ) + italic_o ( 1 ) .

    Moreover, the population statistic τ(X,Y)𝜏𝑋𝑌\tau(X,Y)italic_τ ( italic_X , italic_Y ) is non-negative and equals 00 if and only if X𝑋Xitalic_X and Y𝑌Yitalic_Y are independent, i.e., FXY=FXFYsubscript𝐹𝑋𝑌subscript𝐹𝑋subscript𝐹𝑌F_{XY}=F_{X}F_{Y}italic_F start_POSTSUBSCRIPT italic_X italic_Y end_POSTSUBSCRIPT = italic_F start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT.

Appendix B Theorem Proofs

Theorem 1.

The cross dependence sample statistic satisfies:

EE(τn(X,Yl))τ(X,Yl)=o(1),𝐸𝐸subscript𝜏𝑛𝑋subscript𝑌𝑙𝜏𝑋subscript𝑌𝑙𝑜1\displaystyle EE(\tau_{n}(\vec{X},\vec{Y}_{-l}))-\tau(X,Y_{-l})=o(1),italic_E italic_E ( italic_τ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) ) - italic_τ ( italic_X , italic_Y start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) = italic_o ( 1 ) ,
Var(τn(X,Yl))=O(1nl).Varsubscript𝜏𝑛𝑋subscript𝑌𝑙𝑂1𝑛𝑙\displaystyle\text{Var}(\tau_{n}(\vec{X},\vec{Y}_{-l}))=O(\frac{1}{n-l}).Var ( italic_τ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) ) = italic_O ( divide start_ARG 1 end_ARG start_ARG italic_n - italic_l end_ARG ) .

Therefore, for each l{0,,L}𝑙0𝐿l\in\{0,...,L\}italic_l ∈ { 0 , … , italic_L }, we have

τn(X,Yl)nτ(X,Yl)superscript𝑛subscript𝜏𝑛𝑋subscript𝑌𝑙𝜏𝑋subscript𝑌𝑙\displaystyle\tau_{n}(\vec{X},\vec{Y}_{-l})\stackrel{{\scriptstyle n% \rightarrow\infty}}{{\rightarrow}}\tau(X,Y_{-l})italic_τ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) start_RELOP SUPERSCRIPTOP start_ARG → end_ARG start_ARG italic_n → ∞ end_ARG end_RELOP italic_τ ( italic_X , italic_Y start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT )

in probability.

Proof.

First, applying the assumptions on the dependence measure to the cross dependence statistics yields:

τn(X,Yl)subscript𝜏𝑛𝑋subscript𝑌𝑙\displaystyle\tau_{n}(\vec{X},\vec{Y}_{-l})italic_τ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) =i=l+1nj=l+1nγnl(i,j)(nl)2,absentsuperscriptsubscript𝑖𝑙1𝑛superscriptsubscript𝑗𝑙1𝑛subscript𝛾𝑛𝑙𝑖𝑗superscript𝑛𝑙2\displaystyle=\frac{\sum_{i=l+1}^{n}\sum_{j=l+1}^{n}\gamma_{n-l}(i,j)}{(n-l)^{% 2}},= divide start_ARG ∑ start_POSTSUBSCRIPT italic_i = italic_l + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_j = italic_l + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_γ start_POSTSUBSCRIPT italic_n - italic_l end_POSTSUBSCRIPT ( italic_i , italic_j ) end_ARG start_ARG ( italic_n - italic_l ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ,
𝔼(γnl(i,j))𝔼subscript𝛾𝑛𝑙𝑖𝑗\displaystyle\mathbb{E}(\gamma_{n-l}(i,j))blackboard_E ( italic_γ start_POSTSUBSCRIPT italic_n - italic_l end_POSTSUBSCRIPT ( italic_i , italic_j ) ) =τ(X,Yl)+o(1).absent𝜏𝑋subscript𝑌𝑙𝑜1\displaystyle=\tau(X,Y_{-l})+o(1).= italic_τ ( italic_X , italic_Y start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) + italic_o ( 1 ) .

Here, each γnl(i,j)subscript𝛾𝑛𝑙𝑖𝑗\gamma_{n-l}(i,j)italic_γ start_POSTSUBSCRIPT italic_n - italic_l end_POSTSUBSCRIPT ( italic_i , italic_j ) is a function of (Xi,Xj,Yil,Yjl)subscript𝑋𝑖subscript𝑋𝑗subscript𝑌𝑖𝑙subscript𝑌𝑗𝑙(X_{i},X_{j},Y_{i-l},Y_{j-l})( italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_i - italic_l end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_j - italic_l end_POSTSUBSCRIPT ), and remaining sample pairs like (Xu,Xv,Yw,Yz)subscript𝑋𝑢subscript𝑋𝑣subscript𝑌𝑤subscript𝑌𝑧(X_{u},X_{v},Y_{w},Y_{z})( italic_X start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_v end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_z end_POSTSUBSCRIPT ) may also be used but with a weight of O(1/n)𝑂1𝑛O(1/n)italic_O ( 1 / italic_n ).

As expectations are additive, it immediately follows that

𝔼(τn(X,Yl))𝔼subscript𝜏𝑛𝑋subscript𝑌𝑙\displaystyle\mathbb{E}(\tau_{n}(\vec{X},\vec{Y}_{-l}))blackboard_E ( italic_τ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) ) =i=l+1nj=l+1n𝔼(γnl(i,j))(nl)2absentsuperscriptsubscript𝑖𝑙1𝑛superscriptsubscript𝑗𝑙1𝑛𝔼subscript𝛾𝑛𝑙𝑖𝑗superscript𝑛𝑙2\displaystyle=\frac{\sum_{i=l+1}^{n}\sum_{j=l+1}^{n}\mathbb{E}(\gamma_{n-l}(i,% j))}{(n-l)^{2}}= divide start_ARG ∑ start_POSTSUBSCRIPT italic_i = italic_l + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_j = italic_l + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT blackboard_E ( italic_γ start_POSTSUBSCRIPT italic_n - italic_l end_POSTSUBSCRIPT ( italic_i , italic_j ) ) end_ARG start_ARG ( italic_n - italic_l ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG
=i=l+1nj=l+1n{τ(X,Yl)+o(1)}(nl)2absentsuperscriptsubscript𝑖𝑙1𝑛superscriptsubscript𝑗𝑙1𝑛𝜏𝑋subscript𝑌𝑙𝑜1superscript𝑛𝑙2\displaystyle=\frac{\sum_{i=l+1}^{n}\sum_{j=l+1}^{n}\{\tau(X,Y_{-l})+o(1)\}}{(% n-l)^{2}}= divide start_ARG ∑ start_POSTSUBSCRIPT italic_i = italic_l + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_j = italic_l + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT { italic_τ ( italic_X , italic_Y start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) + italic_o ( 1 ) } end_ARG start_ARG ( italic_n - italic_l ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG
=τ(X,Yl)+o(1).absent𝜏𝑋subscript𝑌𝑙𝑜1\displaystyle=\tau(X,Y_{-l})+o(1).= italic_τ ( italic_X , italic_Y start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) + italic_o ( 1 ) .

Next, the variance equals

Var(τn(X,Yl))Varsubscript𝜏𝑛𝑋subscript𝑌𝑙\displaystyle\text{Var}(\tau_{n}(\vec{X},\vec{Y}_{-l}))Var ( italic_τ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) ) =Cov(i=l+1nj=l+1nγnl(i,j),u=l+1nv=l+1nγnl(u,v))(nl)4.absent𝐶𝑜𝑣superscriptsubscript𝑖𝑙1𝑛superscriptsubscript𝑗𝑙1𝑛subscript𝛾𝑛𝑙𝑖𝑗superscriptsubscript𝑢𝑙1𝑛superscriptsubscript𝑣𝑙1𝑛subscript𝛾𝑛𝑙𝑢𝑣superscript𝑛𝑙4\displaystyle=\frac{Cov(\sum_{i=l+1}^{n}\sum_{j=l+1}^{n}\gamma_{n-l}(i,j),\sum% _{u=l+1}^{n}\sum_{v=l+1}^{n}\gamma_{n-l}(u,v))}{(n-l)^{4}}.= divide start_ARG italic_C italic_o italic_v ( ∑ start_POSTSUBSCRIPT italic_i = italic_l + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_j = italic_l + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_γ start_POSTSUBSCRIPT italic_n - italic_l end_POSTSUBSCRIPT ( italic_i , italic_j ) , ∑ start_POSTSUBSCRIPT italic_u = italic_l + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_v = italic_l + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_γ start_POSTSUBSCRIPT italic_n - italic_l end_POSTSUBSCRIPT ( italic_u , italic_v ) ) end_ARG start_ARG ( italic_n - italic_l ) start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT end_ARG .

Therefore, it suffices to consider each covariance term Cov(γnl(i,j),γnl(u,v))𝐶𝑜𝑣subscript𝛾𝑛𝑙𝑖𝑗subscript𝛾𝑛𝑙𝑢𝑣Cov(\gamma_{n-l}(i,j),\gamma_{n-l}(u,v))italic_C italic_o italic_v ( italic_γ start_POSTSUBSCRIPT italic_n - italic_l end_POSTSUBSCRIPT ( italic_i , italic_j ) , italic_γ start_POSTSUBSCRIPT italic_n - italic_l end_POSTSUBSCRIPT ( italic_u , italic_v ) ), and there are (nl)4superscript𝑛𝑙4(n-l)^{4}( italic_n - italic_l ) start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT such terms.

When both |ui|>M𝑢𝑖𝑀|u-i|>M| italic_u - italic_i | > italic_M and |vj|>M𝑣𝑗𝑀|v-j|>M| italic_v - italic_j | > italic_M, the maximum dependent lag possible, we have

Cov(γnl(i,j),γnl(u,v))=O(1nl).𝐶𝑜𝑣subscript𝛾𝑛𝑙𝑖𝑗subscript𝛾𝑛𝑙𝑢𝑣𝑂1𝑛𝑙\displaystyle Cov(\gamma_{n-l}(i,j),\gamma_{n-l}(u,v))=O(\frac{1}{n-l}).italic_C italic_o italic_v ( italic_γ start_POSTSUBSCRIPT italic_n - italic_l end_POSTSUBSCRIPT ( italic_i , italic_j ) , italic_γ start_POSTSUBSCRIPT italic_n - italic_l end_POSTSUBSCRIPT ( italic_u , italic_v ) ) = italic_O ( divide start_ARG 1 end_ARG start_ARG italic_n - italic_l end_ARG ) .

Otherwise it is

Cov(γnl(i,j),γnl(u,v))=O(1).𝐶𝑜𝑣subscript𝛾𝑛𝑙𝑖𝑗subscript𝛾𝑛𝑙𝑢𝑣𝑂1\displaystyle Cov(\gamma_{n-l}(i,j),\gamma_{n-l}(u,v))=O(1).italic_C italic_o italic_v ( italic_γ start_POSTSUBSCRIPT italic_n - italic_l end_POSTSUBSCRIPT ( italic_i , italic_j ) , italic_γ start_POSTSUBSCRIPT italic_n - italic_l end_POSTSUBSCRIPT ( italic_u , italic_v ) ) = italic_O ( 1 ) .

There are a total of O((nl)2(nM)2)𝑂superscript𝑛𝑙2superscript𝑛𝑀2O((n-l)^{2}(n-M)^{2})italic_O ( ( italic_n - italic_l ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_n - italic_M ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) covariance terms of magnitude O(1nl)𝑂1𝑛𝑙O(\frac{1}{n-l})italic_O ( divide start_ARG 1 end_ARG start_ARG italic_n - italic_l end_ARG ), while the remaining O((nl)3)𝑂superscript𝑛𝑙3O((n-l)^{3})italic_O ( ( italic_n - italic_l ) start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT ) covariance terms are of magnitude O(1)𝑂1O(1)italic_O ( 1 ). Consequently, as M=o(n)𝑀𝑜𝑛M=o(n)italic_M = italic_o ( italic_n ), we have

Var(τn(X,Yl))Varsubscript𝜏𝑛𝑋subscript𝑌𝑙\displaystyle\text{Var}(\tau_{n}(\vec{X},\vec{Y}_{-l}))Var ( italic_τ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) ) =O((nl)2(nM)2)O(1nl)+O((nl)3)O(1)(nl)4absent𝑂superscript𝑛𝑙2superscript𝑛𝑀2𝑂1𝑛𝑙𝑂superscript𝑛𝑙3𝑂1superscript𝑛𝑙4\displaystyle=\frac{O((n-l)^{2}(n-M)^{2})*O(\frac{1}{n-l})+O((n-l)^{3})*O(1)}{% (n-l)^{4}}= divide start_ARG italic_O ( ( italic_n - italic_l ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_n - italic_M ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) ∗ italic_O ( divide start_ARG 1 end_ARG start_ARG italic_n - italic_l end_ARG ) + italic_O ( ( italic_n - italic_l ) start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT ) ∗ italic_O ( 1 ) end_ARG start_ARG ( italic_n - italic_l ) start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT end_ARG
=O(1nl),absent𝑂1𝑛𝑙\displaystyle=O(\frac{1}{n-l}),= italic_O ( divide start_ARG 1 end_ARG start_ARG italic_n - italic_l end_ARG ) ,

which converges to 00 as n𝑛nitalic_n increases.

With the expectation converging to the population statistic and the variance approaching 00, we can conclude that

τn(X,Yl)nτ(X,Yl)superscript𝑛subscript𝜏𝑛𝑋subscript𝑌𝑙𝜏𝑋subscript𝑌𝑙\displaystyle\tau_{n}(\vec{X},\vec{Y}_{-l})\stackrel{{\scriptstyle n% \rightarrow\infty}}{{\rightarrow}}\tau(X,Y_{-l})italic_τ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) start_RELOP SUPERSCRIPTOP start_ARG → end_ARG start_ARG italic_n → ∞ end_ARG end_RELOP italic_τ ( italic_X , italic_Y start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT )

in probability. ∎

Theorem 2.

The temporal dependence sample statistic satisfies:

Tn(X,Y)nl=0Lτ(X,Yl).superscript𝑛subscriptT𝑛𝑋𝑌superscriptsubscript𝑙0𝐿𝜏𝑋subscript𝑌𝑙\displaystyle\mathrm{T}_{n}(\vec{X},\vec{Y})\stackrel{{\scriptstyle n% \rightarrow\infty}}{{\rightarrow}}\sum_{l=0}^{L}\tau(X,Y_{-l}).roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) start_RELOP SUPERSCRIPTOP start_ARG → end_ARG start_ARG italic_n → ∞ end_ARG end_RELOP ∑ start_POSTSUBSCRIPT italic_l = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_τ ( italic_X , italic_Y start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) .

The estimated optimal dependence lag satisfies:

L^nargmaxl[0,L]τ(X,Yl).superscript𝑛superscript^𝐿subscript𝑙0𝐿𝜏𝑋subscript𝑌𝑙\displaystyle\hat{L}^{*}\stackrel{{\scriptstyle n\rightarrow\infty}}{{% \rightarrow}}\arg\max_{l\in[0,L]}\tau(X,Y_{-l}).over^ start_ARG italic_L end_ARG start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_RELOP SUPERSCRIPTOP start_ARG → end_ARG start_ARG italic_n → ∞ end_ARG end_RELOP roman_arg roman_max start_POSTSUBSCRIPT italic_l ∈ [ 0 , italic_L ] end_POSTSUBSCRIPT italic_τ ( italic_X , italic_Y start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) .
Proof.

By Theorem 1, each τn(X,Yl)subscript𝜏𝑛𝑋subscript𝑌𝑙\tau_{n}(\vec{X},\vec{Y}_{-l})italic_τ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) converges to τ(X,Yl)𝜏𝑋subscript𝑌𝑙\tau(X,Y_{-l})italic_τ ( italic_X , italic_Y start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) with a variance of O(1nl)𝑂1𝑛𝑙O(\frac{1}{n-l})italic_O ( divide start_ARG 1 end_ARG start_ARG italic_n - italic_l end_ARG ).

Recall the definition of the temporal dependence statistic Tn(X,Y)subscriptT𝑛𝑋𝑌\mathrm{T}_{n}(\vec{X},\vec{Y})roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) as

Tn(X,Y)subscriptT𝑛𝑋𝑌\displaystyle\mathrm{T}_{n}(\vec{X},\vec{Y})roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) =l=0L(nln)τn(X,Yl).absentsuperscriptsubscript𝑙0𝐿𝑛𝑙𝑛subscript𝜏𝑛𝑋subscript𝑌𝑙\displaystyle=\sum_{l=0}^{L}\left(\frac{n-l}{n}\right)\cdot\tau_{n}(\vec{X},% \vec{Y}_{-l}).= ∑ start_POSTSUBSCRIPT italic_l = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT ( divide start_ARG italic_n - italic_l end_ARG start_ARG italic_n end_ARG ) ⋅ italic_τ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) .

Then the expectation satisfies

𝔼(Tn(X,Y))𝔼subscriptT𝑛𝑋𝑌\displaystyle\mathbb{E}(\mathrm{T}_{n}(\vec{X},\vec{Y}))blackboard_E ( roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) ) =l=0L(nln)τ(X,Yl)+o(L)absentsuperscriptsubscript𝑙0𝐿𝑛𝑙𝑛𝜏𝑋subscript𝑌𝑙𝑜𝐿\displaystyle=\sum_{l=0}^{L}\left(\frac{n-l}{n}\right)\cdot\tau(X,Y_{-l})+o(L)= ∑ start_POSTSUBSCRIPT italic_l = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT ( divide start_ARG italic_n - italic_l end_ARG start_ARG italic_n end_ARG ) ⋅ italic_τ ( italic_X , italic_Y start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) + italic_o ( italic_L )
nl=0Lτn(X,Yl)superscript𝑛absentsuperscriptsubscript𝑙0𝐿subscript𝜏𝑛𝑋subscript𝑌𝑙\displaystyle\stackrel{{\scriptstyle n\rightarrow\infty}}{{\rightarrow}}\sum_{% l=0}^{L}\tau_{n}(\vec{X},\vec{Y}_{-l})start_RELOP SUPERSCRIPTOP start_ARG → end_ARG start_ARG italic_n → ∞ end_ARG end_RELOP ∑ start_POSTSUBSCRIPT italic_l = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_τ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT )

by noting that L𝐿Litalic_L is fixed and the weight nln𝑛𝑙𝑛\frac{n-l}{n}divide start_ARG italic_n - italic_l end_ARG start_ARG italic_n end_ARG converges to 1111. Moreover, the variance satisfies

Var(Tn(X,Y))=O(L2nL),𝑉𝑎𝑟subscriptT𝑛𝑋𝑌𝑂superscript𝐿2𝑛𝐿\displaystyle Var(\mathrm{T}_{n}(\vec{X},\vec{Y}))=O(\frac{L^{2}}{n-L}),italic_V italic_a italic_r ( roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) ) = italic_O ( divide start_ARG italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_n - italic_L end_ARG ) ,

which also converges to 00. Consequently,

Tn(X,Y)nl=0Lτ(X,Yl)superscript𝑛subscriptT𝑛𝑋𝑌superscriptsubscript𝑙0𝐿𝜏𝑋subscript𝑌𝑙\displaystyle\mathrm{T}_{n}(\vec{X},\vec{Y})\stackrel{{\scriptstyle n% \rightarrow\infty}}{{\rightarrow}}\sum_{l=0}^{L}\tau(X,Y_{-l})roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) start_RELOP SUPERSCRIPTOP start_ARG → end_ARG start_ARG italic_n → ∞ end_ARG end_RELOP ∑ start_POSTSUBSCRIPT italic_l = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_τ ( italic_X , italic_Y start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT )

in probability.

Similarly, the estimated optimal dependence lag satisfies

L^superscript^𝐿\displaystyle\hat{L}^{*}over^ start_ARG italic_L end_ARG start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT =argmaxl[0,L](nln)τn(X,Yl)absentsubscript𝑙0𝐿𝑛𝑙𝑛subscript𝜏𝑛𝑋subscript𝑌𝑙\displaystyle=\arg\max_{l\in[0,L]}\left(\frac{n-l}{n}\right)\cdot\tau_{n}(\vec% {X},\vec{Y}_{-l})= roman_arg roman_max start_POSTSUBSCRIPT italic_l ∈ [ 0 , italic_L ] end_POSTSUBSCRIPT ( divide start_ARG italic_n - italic_l end_ARG start_ARG italic_n end_ARG ) ⋅ italic_τ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT )
nargmaxl[0,L]τ(X,Yl)superscript𝑛absentsubscript𝑙0𝐿𝜏𝑋subscript𝑌𝑙\displaystyle\stackrel{{\scriptstyle n\rightarrow\infty}}{{\rightarrow}}\arg% \max_{l\in[0,L]}\tau(X,Y_{-l})start_RELOP SUPERSCRIPTOP start_ARG → end_ARG start_ARG italic_n → ∞ end_ARG end_RELOP roman_arg roman_max start_POSTSUBSCRIPT italic_l ∈ [ 0 , italic_L ] end_POSTSUBSCRIPT italic_τ ( italic_X , italic_Y start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT )

in probability. ∎

Theorem 3 (Asymptotic Validity).

Under the null hypothesis that X𝑋\vec{X}over→ start_ARG italic_X end_ARG and Y𝑌\vec{Y}over→ start_ARG italic_Y end_ARG are independent for all lags l[0,L]𝑙0𝐿l\in[0,L]italic_l ∈ [ 0 , italic_L ], the test statistic satisfies:

Tn(X,Y)n0.superscript𝑛subscriptT𝑛𝑋𝑌0\displaystyle\mathrm{T}_{n}(\vec{X},\vec{Y})\stackrel{{\scriptstyle n% \rightarrow\infty}}{{\rightarrow}}0.roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) start_RELOP SUPERSCRIPTOP start_ARG → end_ARG start_ARG italic_n → ∞ end_ARG end_RELOP 0 .

Moreover, the block-permutation test is asymptotically valid, i.e.,

Prob(Tn(X,Y)zn,α)nα.superscript𝑛𝑃𝑟𝑜𝑏subscriptT𝑛𝑋𝑌subscript𝑧𝑛𝛼𝛼\displaystyle Prob(\mathrm{T}_{n}(\vec{X},\vec{Y})\geq z_{n,\alpha})\stackrel{% {\scriptstyle n\rightarrow\infty}}{{\rightarrow}}\alpha.italic_P italic_r italic_o italic_b ( roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) ≥ italic_z start_POSTSUBSCRIPT italic_n , italic_α end_POSTSUBSCRIPT ) start_RELOP SUPERSCRIPTOP start_ARG → end_ARG start_ARG italic_n → ∞ end_ARG end_RELOP italic_α .
Proof.

By Theorem 2, we have

Tn(X,Y)nl=0Lτ(X,Yl).superscript𝑛subscriptT𝑛𝑋𝑌superscriptsubscript𝑙0𝐿𝜏𝑋subscript𝑌𝑙\displaystyle\mathrm{T}_{n}(\vec{X},\vec{Y})\stackrel{{\scriptstyle n% \rightarrow\infty}}{{\rightarrow}}\sum_{l=0}^{L}\tau(X,Y_{-l}).roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) start_RELOP SUPERSCRIPTOP start_ARG → end_ARG start_ARG italic_n → ∞ end_ARG end_RELOP ∑ start_POSTSUBSCRIPT italic_l = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_τ ( italic_X , italic_Y start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) .

From the assumption of the population measure, when Xtsubscript𝑋𝑡X_{t}italic_X start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT and Ytsubscript𝑌𝑡Y_{t}italic_Y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT are independent for all lags l[0,L]𝑙0𝐿l\in[0,L]italic_l ∈ [ 0 , italic_L ], we must have

τ(X,Yl)=0𝜏𝑋subscript𝑌𝑙0\displaystyle\tau(X,Y_{-l})=0italic_τ ( italic_X , italic_Y start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) = 0

for all l[0,L]𝑙0𝐿l\in[0,L]italic_l ∈ [ 0 , italic_L ]. As a result,

Tn(X,Y)n0superscript𝑛subscriptT𝑛𝑋𝑌0\displaystyle\mathrm{T}_{n}(\vec{X},\vec{Y})\stackrel{{\scriptstyle n% \rightarrow\infty}}{{\rightarrow}}0roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) start_RELOP SUPERSCRIPTOP start_ARG → end_ARG start_ARG italic_n → ∞ end_ARG end_RELOP 0

To establish the asymptotic validity of the block permutation test, it suffices to prove that when X𝑋\vec{X}over→ start_ARG italic_X end_ARG and Y𝑌\vec{Y}over→ start_ARG italic_Y end_ARG are independent, we have:

sup|FTn(X,Y)FTnb|n0.superscript𝑛supsubscript𝐹subscript𝑇𝑛𝑋𝑌subscript𝐹superscriptsubscript𝑇𝑛𝑏0\displaystyle\mbox{sup}|F_{T_{n}(\vec{X},\vec{Y})}-F_{T_{n}^{b}}|\stackrel{{% \scriptstyle n\rightarrow\infty}}{{\rightarrow}}0.sup | italic_F start_POSTSUBSCRIPT italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) end_POSTSUBSCRIPT - italic_F start_POSTSUBSCRIPT italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_b end_POSTSUPERSCRIPT end_POSTSUBSCRIPT | start_RELOP SUPERSCRIPTOP start_ARG → end_ARG start_ARG italic_n → ∞ end_ARG end_RELOP 0 .

In other words, if the true null distribution and the permuted distribution is asymptotically the same, then it follows that under the null hypothesis:

Prob(Tn(X,Y)zn,α)nα.superscript𝑛𝑃𝑟𝑜𝑏subscriptT𝑛𝑋𝑌subscript𝑧𝑛𝛼𝛼\displaystyle Prob(\mathrm{T}_{n}(\vec{X},\vec{Y})\geq z_{n,\alpha})\stackrel{% {\scriptstyle n\rightarrow\infty}}{{\rightarrow}}\alpha.italic_P italic_r italic_o italic_b ( roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) ≥ italic_z start_POSTSUBSCRIPT italic_n , italic_α end_POSTSUBSCRIPT ) start_RELOP SUPERSCRIPTOP start_ARG → end_ARG start_ARG italic_n → ∞ end_ARG end_RELOP italic_α .

Here, Tn(X,Y)subscriptT𝑛𝑋𝑌\mathrm{T}_{n}(\vec{X},\vec{Y})roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) is a function of (Xi,Xj,Yu,Yv)subscript𝑋𝑖subscript𝑋𝑗subscript𝑌𝑢subscript𝑌𝑣(X_{i},X_{j},Y_{u},Y_{v})( italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_v end_POSTSUBSCRIPT ) for i,j,u,v=1,2,,nformulae-sequence𝑖𝑗𝑢𝑣12𝑛i,j,u,v=1,2,\ldots,nitalic_i , italic_j , italic_u , italic_v = 1 , 2 , … , italic_n, and the permuted statistic Tn(X,YπB)subscriptT𝑛𝑋subscript𝑌subscript𝜋𝐵\mathrm{T}_{n}(\vec{X},\vec{Y}_{\pi_{B}})roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT italic_π start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) is the same function but on (Xi,Xj,Yu,Yv)subscript𝑋𝑖subscript𝑋𝑗subscript𝑌superscript𝑢subscript𝑌superscript𝑣(X_{i},X_{j},Y_{u^{\prime}},Y_{v^{\prime}})( italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_u start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_v start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ), where usuperscript𝑢u^{\prime}italic_u start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT and vsuperscript𝑣v^{\prime}italic_v start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT represent the permuted indices of u𝑢uitalic_u and v𝑣vitalic_v. Therefore, it suffices to prove that under the null hypothesis, the distribution of (Xi,Xj,Yu,Yv)subscript𝑋𝑖subscript𝑋𝑗subscript𝑌superscript𝑢subscript𝑌superscript𝑣(X_{i},X_{j},Y_{u^{\prime}},Y_{v^{\prime}})( italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_u start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_v start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ) converges to the distribution of (Xi,Xj,Yu,Yv)subscript𝑋𝑖subscript𝑋𝑗subscript𝑌𝑢subscript𝑌𝑣(X_{i},X_{j},Y_{u},Y_{v})( italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_v end_POSTSUBSCRIPT ) for sufficiently large n𝑛nitalic_n. Note that under the standard i.i.d. setting, these two distributions are identical under the null hypothesis where X𝑋Xitalic_X and Y𝑌Yitalic_Y are independent.

We first consider the case where both u𝑢uitalic_u and v𝑣vitalic_v belong to the same block. In this case, usuperscript𝑢u^{\prime}italic_u start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT and vsuperscript𝑣v^{\prime}italic_v start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT will also be in the same block and differ by the same lag difference. Furthermore, due to the stationary assumption, FYu,Yv=FYu,Yvsubscript𝐹subscript𝑌𝑢subscript𝑌𝑣subscript𝐹subscript𝑌superscript𝑢subscript𝑌superscript𝑣F_{Y_{u},Y_{v}}=F_{Y_{u^{\prime}},Y_{v^{\prime}}}italic_F start_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_v end_POSTSUBSCRIPT end_POSTSUBSCRIPT = italic_F start_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT italic_u start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_v start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT. Now, as we are examining the null distribution where X𝑋\vec{X}over→ start_ARG italic_X end_ARG and Y𝑌\vec{Y}over→ start_ARG italic_Y end_ARG are independent, it follows that

FXi,Xj,Yu,Yv=FXi,XjFYu,Yv=FXi,XjFYu,Yv=FXi,Xj,Yu,Yv.subscript𝐹subscript𝑋𝑖subscript𝑋𝑗subscript𝑌𝑢subscript𝑌𝑣subscript𝐹subscript𝑋𝑖subscript𝑋𝑗subscript𝐹subscript𝑌𝑢subscript𝑌𝑣subscript𝐹subscript𝑋𝑖subscript𝑋𝑗subscript𝐹subscript𝑌superscript𝑢subscript𝑌superscript𝑣subscript𝐹subscript𝑋𝑖subscript𝑋𝑗subscript𝑌superscript𝑢subscript𝑌superscript𝑣\displaystyle F_{X_{i},X_{j},Y_{u},Y_{v}}=F_{X_{i},X_{j}}F_{Y_{u},Y_{v}}=F_{X_% {i},X_{j}}F_{Y_{u^{\prime}},Y_{v^{\prime}}}=F_{X_{i},X_{j},Y_{u^{\prime}},Y_{v% ^{\prime}}}.italic_F start_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_v end_POSTSUBSCRIPT end_POSTSUBSCRIPT = italic_F start_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_v end_POSTSUBSCRIPT end_POSTSUBSCRIPT = italic_F start_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT italic_u start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_v start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT = italic_F start_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_u start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_v start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT .

Namely, (Xi,Xj,Yu,Yv)subscript𝑋𝑖subscript𝑋𝑗subscript𝑌𝑢subscript𝑌𝑣(X_{i},X_{j},Y_{u},Y_{v})( italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_v end_POSTSUBSCRIPT ) and (Xi,Xj,Yu,Yv)subscript𝑋𝑖subscript𝑋𝑗subscript𝑌superscript𝑢subscript𝑌superscript𝑣(X_{i},X_{j},Y_{u^{\prime}},Y_{v^{\prime}})( italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_u start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_v start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ) are identically distributed in this case.

Next we examine the case where u𝑢uitalic_u and v𝑣vitalic_v belong to different blocks. Given our assumption of a maximum dependence lag M𝑀Mitalic_M, if |uv|>M𝑢𝑣𝑀|u-v|>M| italic_u - italic_v | > italic_M and |uv|>Msuperscript𝑢superscript𝑣𝑀|u^{\prime}-v^{\prime}|>M| italic_u start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT - italic_v start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | > italic_M for the permuted indices, we can establish the following:

sup|FXi,Xj,Yu,YvFXi,Xj,Yu,Yv|supsubscript𝐹subscript𝑋𝑖subscript𝑋𝑗subscript𝑌𝑢subscript𝑌𝑣subscript𝐹subscript𝑋𝑖subscript𝑋𝑗subscript𝑌superscript𝑢subscript𝑌superscript𝑣\displaystyle\mbox{sup}|F_{X_{i},X_{j},Y_{u},Y_{v}}-F_{X_{i},X_{j},Y_{u^{% \prime}},Y_{v^{\prime}}}|sup | italic_F start_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_v end_POSTSUBSCRIPT end_POSTSUBSCRIPT - italic_F start_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_u start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_v start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT |
=\displaystyle== sup|FXi,XjFYu,YvFXi,XjFYu,Yv|supsubscript𝐹subscript𝑋𝑖subscript𝑋𝑗subscript𝐹subscript𝑌𝑢subscript𝑌𝑣subscript𝐹subscript𝑋𝑖subscript𝑋𝑗subscript𝐹subscript𝑌superscript𝑢subscript𝑌superscript𝑣\displaystyle\mbox{sup}|F_{X_{i},X_{j}}F_{Y_{u},Y_{v}}-F_{X_{i},X_{j}}F_{Y_{u^% {\prime}},Y_{v^{\prime}}}|sup | italic_F start_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_v end_POSTSUBSCRIPT end_POSTSUBSCRIPT - italic_F start_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT italic_u start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_v start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT |
\displaystyle\leq sup|FXi,Xj(FYu,YvFYuFYv)|+sup|FXi,Xj(FYuFYvFYuFYv)|supsubscript𝐹subscript𝑋𝑖subscript𝑋𝑗subscript𝐹subscript𝑌𝑢subscript𝑌𝑣subscript𝐹subscript𝑌𝑢subscript𝐹subscript𝑌𝑣supsubscript𝐹subscript𝑋𝑖subscript𝑋𝑗subscript𝐹subscript𝑌𝑢subscript𝐹subscript𝑌𝑣subscript𝐹subscript𝑌superscript𝑢subscript𝐹subscript𝑌superscript𝑣\displaystyle\mbox{sup}|F_{X_{i},X_{j}}(F_{Y_{u},Y_{v}}-F_{Y_{u}}F_{Y_{v}})|+% \mbox{sup}|F_{X_{i},X_{j}}(F_{Y_{u}}F_{Y_{v}}-F_{Y_{u^{\prime}}}F_{Y_{v^{% \prime}}})|sup | italic_F start_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_F start_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_v end_POSTSUBSCRIPT end_POSTSUBSCRIPT - italic_F start_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT italic_v end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) | + sup | italic_F start_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_F start_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT italic_v end_POSTSUBSCRIPT end_POSTSUBSCRIPT - italic_F start_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT italic_u start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT italic_v start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) |
+sup|FXi,Xj(FYuFYvFYu,Yv)|supsubscript𝐹subscript𝑋𝑖subscript𝑋𝑗subscript𝐹subscript𝑌superscript𝑢subscript𝐹subscript𝑌superscript𝑣subscript𝐹subscript𝑌superscript𝑢subscript𝑌superscript𝑣\displaystyle+\mbox{sup}|F_{X_{i},X_{j}}(F_{Y_{u^{\prime}}}F_{Y_{v^{\prime}}}-% F_{Y_{u^{\prime}},Y_{v^{\prime}}})|+ sup | italic_F start_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_F start_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT italic_u start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT italic_v start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT - italic_F start_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT italic_u start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_v start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) |
=o(1)absent𝑜1\displaystyle=o(1)= italic_o ( 1 )

Here, the first and third terms are o(1)𝑜1o(1)italic_o ( 1 ) as per our maximum dependence lag assumption, while the second term is exactly 00 because the marginals within the brackets remain identical before and after permutation. Consequently, in this case, (Xi,Xj,Yu,Yv)subscript𝑋𝑖subscript𝑋𝑗subscript𝑌𝑢subscript𝑌𝑣(X_{i},X_{j},Y_{u},Y_{v})( italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_v end_POSTSUBSCRIPT ) is asymptotically equivalent in distribution to (Xi,Xj,Yu,Yv)subscript𝑋𝑖subscript𝑋𝑗subscript𝑌superscript𝑢subscript𝑌superscript𝑣(X_{i},X_{j},Y_{u^{\prime}},Y_{v^{\prime}})( italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_u start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_v start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ).

Finally, in the case where u𝑢uitalic_u and v𝑣vitalic_v belong to different blocks, there are two additional possibilities: either |uv|M𝑢𝑣𝑀|u-v|\leq M| italic_u - italic_v | ≤ italic_M or |uv|M𝑢𝑣𝑀|u-v|\leq M| italic_u - italic_v | ≤ italic_M. In either case, we no longer have exact distribution equivalence nor asymptotic equivalence. The number of instances where (Xi,Xj,Yu,Yv)subscript𝑋𝑖subscript𝑋𝑗subscript𝑌𝑢subscript𝑌𝑣(X_{i},X_{j},Y_{u},Y_{v})( italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_v end_POSTSUBSCRIPT ) does not match (Xi,Xj,Yu,Yv)subscript𝑋𝑖subscript𝑋𝑗subscript𝑌superscript𝑢subscript𝑌superscript𝑣(X_{i},X_{j},Y_{u^{\prime}},Y_{v^{\prime}})( italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_u start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_v start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ) in distribution is at most O(MB)𝑂𝑀𝐵O(MB)italic_O ( italic_M italic_B ), which equals o(n2)𝑜superscript𝑛2o(n^{2})italic_o ( italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) by our assumption on M𝑀Mitalic_M and B𝐵Bitalic_B.

Therefore, taking all the above arguments together, as the sample size n𝑛nitalic_n goes to infinity, we have:

Prob(sup|FXi,Xj,Yu,YvFXi,Xj,Yu,Yv|0)1𝑃𝑟𝑜𝑏supsubscript𝐹subscript𝑋𝑖subscript𝑋𝑗subscript𝑌𝑢subscript𝑌𝑣subscript𝐹subscript𝑋𝑖subscript𝑋𝑗subscript𝑌superscript𝑢subscript𝑌superscript𝑣01\displaystyle Prob(\mbox{sup}|F_{X_{i},X_{j},Y_{u},Y_{v}}-F_{X_{i},X_{j},Y_{u^% {\prime}},Y_{v^{\prime}}}|\rightarrow 0)\rightarrow 1italic_P italic_r italic_o italic_b ( sup | italic_F start_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_v end_POSTSUBSCRIPT end_POSTSUBSCRIPT - italic_F start_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_u start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_v start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT | → 0 ) → 1

for any random block permutation πBsubscript𝜋𝐵\pi_{B}italic_π start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT satisfying our assumption. As the result,

Prob(sup|FTn(X,Y)FTn(X,YπB)|0)1.𝑃𝑟𝑜𝑏supsubscript𝐹subscriptT𝑛𝑋𝑌subscript𝐹subscriptT𝑛𝑋subscript𝑌subscript𝜋𝐵01\displaystyle Prob(\mbox{sup}|F_{\mathrm{T}_{n}(\vec{X},\vec{Y})}-F_{\mathrm{T% }_{n}(\vec{X},\vec{Y}_{\pi_{B}})}|\rightarrow 0)\rightarrow 1.italic_P italic_r italic_o italic_b ( sup | italic_F start_POSTSUBSCRIPT roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) end_POSTSUBSCRIPT - italic_F start_POSTSUBSCRIPT roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT italic_π start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT | → 0 ) → 1 .

Namely, the sample statistic and the block-permuted statistic have asymptotically the same distribution, and the test is asymptotically valid. ∎

Theorem 4 (Testing Consistency).

Under the alternative hypothesis that X𝑋\vec{X}over→ start_ARG italic_X end_ARG and Y𝑌\vec{Y}over→ start_ARG italic_Y end_ARG are dependent for some lag l[0,L]𝑙0𝐿l\in[0,L]italic_l ∈ [ 0 , italic_L ], the test statistic satisfies

Tn(X,Y)nc>0.superscript𝑛subscriptT𝑛𝑋𝑌𝑐0\displaystyle\mathrm{T}_{n}(\vec{X},\vec{Y})\stackrel{{\scriptstyle n% \rightarrow\infty}}{{\rightarrow}}c>0.roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) start_RELOP SUPERSCRIPTOP start_ARG → end_ARG start_ARG italic_n → ∞ end_ARG end_RELOP italic_c > 0 .

Moreover, the block-permutation test is asymptotically consistent, i.e.,

Prob(Tn(X,Y)zn,α)n1.superscript𝑛𝑃𝑟𝑜𝑏subscriptT𝑛𝑋𝑌subscript𝑧𝑛𝛼1\displaystyle Prob(\mathrm{T}_{n}(\vec{X},\vec{Y})\geq z_{n,\alpha})\stackrel{% {\scriptstyle n\rightarrow\infty}}{{\rightarrow}}1.italic_P italic_r italic_o italic_b ( roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) ≥ italic_z start_POSTSUBSCRIPT italic_n , italic_α end_POSTSUBSCRIPT ) start_RELOP SUPERSCRIPTOP start_ARG → end_ARG start_ARG italic_n → ∞ end_ARG end_RELOP 1 .
Proof.

From the assumption on the dependence measure, when there exists at least one lag l𝑙litalic_l such that the two time series are dependent, we must have:

τ(X,Yl)=cl>0.𝜏𝑋subscript𝑌𝑙subscript𝑐𝑙0\displaystyle\tau(X,Y_{-l})=c_{-l}>0.italic_τ ( italic_X , italic_Y start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT ) = italic_c start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT > 0 .

As all other cross dependence sample statistics are non-negative, it follows that

Tn(X,Y)nl=0Lcl>0superscript𝑛subscriptT𝑛𝑋𝑌superscriptsubscript𝑙0𝐿subscript𝑐𝑙0\displaystyle\mathrm{T}_{n}(\vec{X},\vec{Y})\stackrel{{\scriptstyle n% \rightarrow\infty}}{{\rightarrow}}\sum_{l=0}^{L}c_{-l}>0roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) start_RELOP SUPERSCRIPTOP start_ARG → end_ARG start_ARG italic_n → ∞ end_ARG end_RELOP ∑ start_POSTSUBSCRIPT italic_l = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_c start_POSTSUBSCRIPT - italic_l end_POSTSUBSCRIPT > 0

To prove consistency under the permutation test, it suffices to show that at any type 1111 error level α𝛼\alphaitalic_α, when the two time series are dependent for some lag, the p-value of sample dependence measure is less than α𝛼\alphaitalic_α as the sample size approaches infinity. Note that Theorem 8 in Shen et al. (2020) proved consistency of standard permutation test between two i.i.d. sample data, and the follow-on proof has similar steps but with significant adjustment for the block permutation procedure.

In the block permutation test, the p-value can be expressed as follows:

Prob(Tn(X,YπB)>Tn(X,Y))𝑃𝑟𝑜𝑏subscriptT𝑛𝑋subscript𝑌subscript𝜋𝐵subscriptT𝑛𝑋𝑌\displaystyle Prob(\mathrm{T}_{n}(\vec{X},\vec{Y}_{\pi_{B}})>\mathrm{T}_{n}(% \vec{X},\vec{Y}))italic_P italic_r italic_o italic_b ( roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT italic_π start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) > roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) )
=\displaystyle== w=0BProb(Tn(X,YπB)>Tn(X,Y)|πB is a partial derangement of size w)superscriptsubscript𝑤0𝐵𝑃𝑟𝑜𝑏subscriptT𝑛𝑋subscript𝑌subscript𝜋𝐵conditionalsubscriptT𝑛𝑋𝑌subscript𝜋𝐵 is a partial derangement of size w\displaystyle\ \sum_{w=0}^{B}Prob(\mathrm{T}_{n}(\vec{X},\vec{Y}_{\pi_{B}})>% \mathrm{T}_{n}(\vec{X},\vec{Y})|\pi_{B}\mbox{ is a partial derangement of size% $w$})∑ start_POSTSUBSCRIPT italic_w = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_B end_POSTSUPERSCRIPT italic_P italic_r italic_o italic_b ( roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT italic_π start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) > roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) | italic_π start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT is a partial derangement of size italic_w )
×Prob(partial derangement of size w).absent𝑃𝑟𝑜𝑏partial derangement of size w\displaystyle\times Prob(\mbox{partial derangement of size $w$}).× italic_P italic_r italic_o italic_b ( partial derangement of size italic_w ) .

This expression conditions on the block permutation being a partial derangement of size w[0,B]𝑤0𝐵w\in[0,B]italic_w ∈ [ 0 , italic_B ], where w=0𝑤0w=0italic_w = 0 implies that πBsubscript𝜋𝐵\pi_{B}italic_π start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT is a derangement where no two blocks remain in their original positions, and w=B𝑤𝐵w=Bitalic_w = italic_B means that πBsubscript𝜋𝐵\pi_{B}italic_π start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT does not permute any blocks.

As B𝐵B\rightarrow\inftyitalic_B → ∞, from the basic property of derangement444https://en.wikipedia.org/wiki/Rencontres_numbers we have

Prob(partial derangement of size w)e1/w!.𝑃𝑟𝑜𝑏partial derangement of size wsuperscript𝑒1𝑤\displaystyle Prob(\mbox{partial derangement of size $w$})\rightarrow e^{-1}/w!.italic_P italic_r italic_o italic_b ( partial derangement of size italic_w ) → italic_e start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT / italic_w ! .

Because Tn(X,Y)c>0subscriptT𝑛𝑋𝑌𝑐0\mathrm{T}_{n}(\vec{X},\vec{Y})\rightarrow c>0roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG ) → italic_c > 0 under dependence, it suffices to prove that for any c>0𝑐0c>0italic_c > 0,

limne1w=0BProb(Tn(X,YπB)>c| partial derangement of size w)/w!0.subscript𝑛superscript𝑒1superscriptsubscript𝑤0𝐵𝑃𝑟𝑜𝑏subscriptT𝑛𝑋subscript𝑌subscript𝜋𝐵conditional𝑐 partial derangement of size w𝑤0\displaystyle\lim_{n\rightarrow\infty}e^{-1}\sum_{w=0}^{B}Prob(\mathrm{T}_{n}(% \vec{X},\vec{Y}_{\pi_{B}})>c|\mbox{ partial derangement of size $w$})/w!% \rightarrow 0.roman_lim start_POSTSUBSCRIPT italic_n → ∞ end_POSTSUBSCRIPT italic_e start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_w = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_B end_POSTSUPERSCRIPT italic_P italic_r italic_o italic_b ( roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT italic_π start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) > italic_c | partial derangement of size italic_w ) / italic_w ! → 0 . (1)

We decompose the above summations into two different cases. The first case is when w𝑤witalic_w is of fixed size, then X𝑋\vec{X}over→ start_ARG italic_X end_ARG and YπBsubscript𝑌subscript𝜋𝐵\vec{Y}_{\pi_{B}}over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT italic_π start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT end_POSTSUBSCRIPT are asymptotically independent. This is because, for fixed w𝑤witalic_w, the number of observations that are not moved is fixed and asymptotically goes to 00, and all remaining blocks are shifted to different positions. By the maximum dependence lag M𝑀Mitalic_M, which is o(n)𝑜𝑛o(n)italic_o ( italic_n ), and the number of samples per block being larger than M𝑀Mitalic_M, the block permutation makes all other observation pairs asymptotically independent. Therefore, given i,j𝑖𝑗i,jitalic_i , italic_j, and i,jsuperscript𝑖superscript𝑗i^{\prime},j^{\prime}italic_i start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_j start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT being their block-permuted indices, we must have

sup|FXi,Xj,Yi,YjFXi,XjFYi,Yj|=o(1)supsubscript𝐹subscript𝑋𝑖subscript𝑋𝑗subscript𝑌superscript𝑖subscript𝑌superscript𝑗subscript𝐹subscript𝑋𝑖subscript𝑋𝑗subscript𝐹subscript𝑌superscript𝑖subscript𝑌superscript𝑗𝑜1\displaystyle\mbox{sup}|F_{X_{i},X_{j},Y_{i^{\prime}},Y_{j^{\prime}}}-F_{X_{i}% ,X_{j}}F_{Y_{i^{\prime}},Y_{j^{\prime}}}|=o(1)sup | italic_F start_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_i start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_j start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT - italic_F start_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT italic_i start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_j start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT | = italic_o ( 1 )

so long |ii|>Msuperscript𝑖𝑖𝑀|i^{\prime}-i|>M| italic_i start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT - italic_i | > italic_M and |jj|>Msuperscript𝑗𝑗𝑀|j^{\prime}-j|>M| italic_j start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT - italic_j | > italic_M, which asymptotically holds for all blocks who moved the block position. Therefore, when w𝑤witalic_w is a fixed size, X𝑋\vec{X}over→ start_ARG italic_X end_ARG and YπBsubscript𝑌subscript𝜋𝐵\vec{Y}_{\pi_{B}}over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT italic_π start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT end_POSTSUBSCRIPT are asymptotically independent, and we have

Tn(X,YπB)0subscriptT𝑛𝑋subscript𝑌subscript𝜋𝐵0\displaystyle\mathrm{T}_{n}(\vec{X},\vec{Y}_{\pi_{B}})\rightarrow 0roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT italic_π start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) → 0

as the sample statistic converges to the population, and the population statistic equals 00 under independence.

The other case is the remaining partial derangements πBsubscript𝜋𝐵\pi_{B}italic_π start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT of increasing size w𝑤witalic_w, but these partial derangements occur with probability converging to 00. Formally, for any α>0𝛼0\alpha>0italic_α > 0, there exists B1subscript𝐵1B_{1}italic_B start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT such that

e1w=B1+1+1/w!<α/2.superscript𝑒1superscriptsubscript𝑤subscript𝐵111𝑤𝛼2\displaystyle e^{-1}\sum_{w=B_{1}+1}^{+\infty}1/w!<\alpha/2.italic_e start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_w = italic_B start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT + ∞ end_POSTSUPERSCRIPT 1 / italic_w ! < italic_α / 2 .

This is because w=0B1/w!superscriptsubscript𝑤0𝐵1𝑤\sum\limits_{w=0}^{B}1/w!∑ start_POSTSUBSCRIPT italic_w = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_B end_POSTSUPERSCRIPT 1 / italic_w ! is bounded above and converges to e𝑒eitalic_e. Then back to the first case, we can find B2>B1subscript𝐵2subscript𝐵1B_{2}>B_{1}italic_B start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT > italic_B start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT such that for any wB1𝑤subscript𝐵1w\leq B_{1}italic_w ≤ italic_B start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and all B>B2𝐵subscript𝐵2B>B_{2}italic_B > italic_B start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT,

Prob(Tn(X,YπB)>c| partial derangement of size w)<α/2.𝑃𝑟𝑜𝑏subscriptT𝑛𝑋subscript𝑌subscript𝜋𝐵conditional𝑐 partial derangement of size w𝛼2\displaystyle Prob(\mathrm{T}_{n}(\vec{X},\vec{Y}_{\pi_{B}})>c|\mbox{ partial % derangement of size $w$})<\alpha/2.italic_P italic_r italic_o italic_b ( roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT italic_π start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) > italic_c | partial derangement of size italic_w ) < italic_α / 2 .

Therefore, for all B>B2𝐵subscript𝐵2B>B_{2}italic_B > italic_B start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT:

e1w=0BProb(Tn(X,YπB)>c| partial derangement of size w)/w!superscript𝑒1superscriptsubscript𝑤0𝐵𝑃𝑟𝑜𝑏subscriptT𝑛𝑋subscript𝑌subscript𝜋𝐵conditional𝑐 partial derangement of size w𝑤\displaystyle e^{-1}\sum_{w=0}^{B}Prob(\mathrm{T}_{n}(\vec{X},\vec{Y}_{\pi_{B}% })>c|\mbox{ partial derangement of size $w$})/w!italic_e start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_w = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_B end_POSTSUPERSCRIPT italic_P italic_r italic_o italic_b ( roman_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( over→ start_ARG italic_X end_ARG , over→ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT italic_π start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) > italic_c | partial derangement of size italic_w ) / italic_w !
<\displaystyle<< e1w=0B1α/2w!+e1w=B1+1B1/w!superscript𝑒1superscriptsubscript𝑤0subscript𝐵1𝛼2𝑤superscript𝑒1superscriptsubscript𝑤subscript𝐵11𝐵1𝑤\displaystyle\ e^{-1}\sum_{w=0}^{B_{1}}\alpha/2w!+e^{-1}\sum_{w=B_{1}+1}^{B}1/w!italic_e start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_w = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_B start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT italic_α / 2 italic_w ! + italic_e start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_w = italic_B start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_B end_POSTSUPERSCRIPT 1 / italic_w !
<\displaystyle<< α.𝛼\displaystyle\ \alpha.italic_α .

Thus the convergence in Equation 1 holds.

In conclusion, at any type 1111 error level α>0𝛼0\alpha>0italic_α > 0, the p-value of the temporal dependence sample statistic under the block permutation test will eventually be less than α𝛼\alphaitalic_α as n𝑛nitalic_n increases. Therefore, the proposed test is consistent against all dependencies with finite second moments, and its testing power converges to 1111 when the time series X𝑋\vec{X}over→ start_ARG italic_X end_ARG and Y𝑌\vec{Y}over→ start_ARG italic_Y end_ARG are dependent. ∎