Multiple Time Series
7.2 Cross-correlation
Given two time series $x_t$ and $y_t$ we can delay $x_t$ by $T$ samples and then calculate the cross-covariance between the pair of signals. That is

$$\sigma_{xy}(T) = \frac{1}{N-1} \sum_{t=1}^{N} (x_{t-T} - \mu_x)(y_t - \mu_y) \quad (7.1)$$

where $\mu_x$ and $\mu_y$ are the means of each time series and there are $N$ samples in each. The function $\sigma_{xy}(T)$ is the cross-covariance function. The cross-correlation is a normalised version

$$r_{xy}(T) = \frac{\sigma_{xy}(T)}{\sqrt{\sigma_{xx}(0)\,\sigma_{yy}(0)}} \quad (7.2)$$

where we note that $\sigma_{xx}(0) = \sigma_x^2$ and $\sigma_{yy}(0) = \sigma_y^2$ are the variances of each signal. Note that

$$r_{xy}(0) = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \quad (7.3)$$

which is the correlation between the two variables. Therefore, unlike the autocorrelation, $r_{xy}(0)$ is not, generally, equal to 1. Figure 7.1 shows two time series and their cross-correlation.
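As a concrete sketch, the estimator of Equations 7.1 and 7.2 can be written in a few lines of NumPy. This is a minimal illustration, not part of the notes: the function name and the test signals are our own, and the sum runs only over the samples where both shifted series are defined.

```python
import numpy as np

def cross_correlation(x, y, T):
    """Estimate r_xy(T), Eqs. 7.1-7.2: delay x by T samples and
    correlate with y over the region where both are defined."""
    N = len(x)
    x = x - x.mean()
    y = y - y.mean()
    if T >= 0:
        prod = x[:N - T] * y[T:]      # x_{t-T} * y_t
    else:
        prod = x[-T:] * y[:N + T]
    cov = prod.sum() / (N - 1)        # cross-covariance, Eq. 7.1
    return cov / np.sqrt(np.var(x, ddof=1) * np.var(y, ddof=1))

# y is a delayed, slightly noisy copy of x, so r_xy(T) should peak at the delay
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = np.roll(x, 5) + 0.1 * rng.standard_normal(500)
lags = np.arange(-20, 21)
r = np.array([cross_correlation(x, y, T) for T in lags])
print(int(lags[np.argmax(r)]))  # -> 5
```

Note that even at its peak the cross-correlation stays below 1, because of the added noise.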
7.2.1 Cross-correlation is asymmetric
First, we recap why the autocorrelation is a symmetric function. The autocovariance, for a zero-mean signal, is given by

$$\sigma_{xx}(T) = \frac{1}{N-1} \sum_{t=1}^{N} x_{t-T}\, x_t \quad (7.4)$$

This can be written in the shorthand notation

$$\sigma_{xx}(T) = \langle x_{t-T}\, x_t \rangle \quad (7.5)$$

where the angled brackets denote the average value or expectation. Now, for negative lags

$$\sigma_{xx}(-T) = \langle x_{t+T}\, x_t \rangle \quad (7.6)$$

Subtracting $T$ from the time index (this will make no difference to the expectation) gives

$$\sigma_{xx}(-T) = \langle x_t\, x_{t-T} \rangle \quad (7.7)$$

which is identical to $\sigma_{xx}(T)$, as the ordering of variables makes no difference to the expected value. Hence, the autocorrelation is a symmetric function.
The cross-correlation is a normalised cross-covariance which, assuming zero-mean signals, is given by

$$\sigma_{xy}(T) = \langle x_{t-T}\, y_t \rangle \quad (7.8)$$

and for negative lags

$$\sigma_{xy}(-T) = \langle x_{t+T}\, y_t \rangle \quad (7.9)$$

Subtracting $T$ from the time index now gives

$$\sigma_{xy}(-T) = \langle x_t\, y_{t-T} \rangle \quad (7.10)$$

which is different to $\sigma_{xy}(T)$. To see this more clearly we can subtract $T$ once more from the time index to give

$$\sigma_{xy}(-T) = \langle x_{t-T}\, y_{t-2T} \rangle \quad (7.11)$$

Hence, the cross-covariance, and therefore the cross-correlation, is an asymmetric function.
To summarise: moving signal A right (forward in time) and multiplying with signal
B is not the same as moving signal A left and multiplying with signal B; unless signal
A equals signal B.
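This asymmetry is easy to verify numerically. The sketch below (our own construction, not from the notes) checks that $\langle x_{t-T}\, x_t \rangle$ is symmetric in $T$, while $\langle x_{t-T}\, y_t \rangle$ is not when $y$ is a delayed copy of $x$:

```python
import numpy as np

def corr_at_lag(a, b, T):
    """<a_{t-T} b_t> over the overlap, normalised (signals assumed zero mean)."""
    N = len(a)
    prod = a[:N - T] * b[T:] if T >= 0 else a[-T:] * b[:N + T]
    return prod.mean() / (a.std() * b.std())

rng = np.random.default_rng(1)
x = rng.standard_normal(1000)
y = np.roll(x, 10)                    # y is x delayed by 10 samples

# Autocovariance: the same products are averaged, so it is exactly symmetric
print(corr_at_lag(x, x, 7) == corr_at_lag(x, x, -7))         # -> True
# Cross-covariance: large at T = +10, near zero at T = -10
print(corr_at_lag(x, y, 10) - corr_at_lag(x, y, -10) > 0.5)  # -> True
```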
7.2.2 Windowing
When calculating cross-correlations there are fewer data points at larger lags than
at shorter lags. The resulting estimates are commensurately less accurate. To take
account of this the estimates at long lags can be smoothed using various window
operators. See lecture 5.
Figure 7.2: Cross-correlation function $r_{xy}(T)$ for the data in Figure 7.1. A lag of $T$ denotes the top series, $x$, lagging the bottom series, $y$. Notice the big positive correlation at a lag of 25. Can you see from Figure 7.1 why this should occur?
7.2.3 Time-Delay Estimation
If we suspect that one signal is a, possibly noisy, time-delayed version of another signal then the peak in the cross-correlation will identify the delay. For example, Figure 7.1 suggests that the top signal lags the bottom by a delay of 25 samples. Given that the sample rate is 125Hz this corresponds to a delay of 0.2 seconds.
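A sketch of this procedure in NumPy (the test signal, noise level, and seed are our own choices; `np.correlate` in `"full"` mode returns the cross-correlation at every possible lag):

```python
import numpy as np

fs = 125                                  # sample rate (Hz), as in the example
rng = np.random.default_rng(2)
y = rng.standard_normal(4 * fs)           # "bottom" signal
x = np.roll(y, 25) + 0.2 * rng.standard_normal(4 * fs)  # noisy copy, delayed 25 samples

# Full cross-correlation; lag k corresponds to x shifted k samples relative to y
xc = np.correlate(x - x.mean(), y - y.mean(), mode="full")
lags = np.arange(-len(y) + 1, len(y))
delay = int(lags[np.argmax(xc)])
print(delay, delay / fs)                  # -> 25 0.2
```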
7.3.2 Example
Given two time series and a MAR(3) model, for example, the MAR predictions are

$$\hat{x}(t) = \tilde{x}(t) A \quad (7.24)$$

$$\hat{x}(t) = [x(t-1),\ x(t-2),\ x(t-3)] \begin{bmatrix} a(1) \\ a(2) \\ a(3) \end{bmatrix}$$

$$\begin{bmatrix} \hat{x}_1(t) & \hat{x}_2(t) \end{bmatrix} = \begin{bmatrix} x_1(t-1) & x_2(t-1) & x_1(t-2) & x_2(t-2) & x_1(t-3) & x_2(t-3) \end{bmatrix} \begin{bmatrix} \hat{a}_{11}(1) & \hat{a}_{12}(1) \\ \hat{a}_{21}(1) & \hat{a}_{22}(1) \\ \hat{a}_{11}(2) & \hat{a}_{12}(2) \\ \hat{a}_{21}(2) & \hat{a}_{22}(2) \\ \hat{a}_{11}(3) & \hat{a}_{12}(3) \\ \hat{a}_{21}(3) & \hat{a}_{22}(3) \end{bmatrix} \quad (7.25)$$
1 The MDL criterion is identical to the negative value of the Bayesian Information Criterion (BIC), i.e. MDL(p) = −BIC(p), and Neumaier and Schneider refer to this measure as BIC.
Figure 7.3: Signals x1 (t) (top) and x2 (t) (bottom) and predictions from MAR(3)
model.
Applying an MAR(3) model to our data set gave the following estimates for the AR coefficients, $a_p$, and noise covariance $C$, which were estimated from equations 7.17 and 7.20

$$a_1 = \begin{bmatrix} 1.2813 & 0.2394 \\ 0.0018 & 1.0816 \end{bmatrix}$$

$$a_2 = \begin{bmatrix} 0.7453 & 0.2822 \\ 0.0974 & 0.6044 \end{bmatrix}$$

$$a_3 = \begin{bmatrix} 0.3259 & 0.0576 \\ 0.0764 & 0.2699 \end{bmatrix}$$

$$C = \begin{bmatrix} 0.0714 & 0.0054 \\ 0.0054 & 0.0798 \end{bmatrix}$$
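The prediction of Equation 7.24 is just a vector-matrix product with the stacked coefficient matrix. A minimal sketch (the lagged values fed in below are made up for illustration, and the coefficient entries are taken as printed above):

```python
import numpy as np

# Estimated MAR(3) coefficient matrices (values as printed above)
a1 = np.array([[1.2813, 0.2394], [0.0018, 1.0816]])
a2 = np.array([[0.7453, 0.2822], [0.0974, 0.6044]])
a3 = np.array([[0.3259, 0.0576], [0.0764, 0.2699]])
A = np.vstack([a1, a2, a3])              # stacked 6 x 2 matrix of Eq. 7.25

def mar3_predict(x_lag1, x_lag2, x_lag3):
    """One-step prediction, Eq. 7.24: x_hat(t) = x_tilde(t) A, where
    x_tilde(t) = [x(t-1), x(t-2), x(t-3)] is a 1 x 6 row vector."""
    x_tilde = np.concatenate([x_lag1, x_lag2, x_lag3])
    return x_tilde @ A                   # 1 x 2 prediction [x1_hat, x2_hat]

x_hat = mar3_predict(np.array([0.5, -0.1]),   # x(t-1), made-up values
                     np.array([0.4, -0.2]),   # x(t-2)
                     np.array([0.3, -0.3]))   # x(t-3)
print(x_hat.shape)  # -> (2,)
```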
By noting that

$$\exp(-i\omega n) = \exp(-i\omega l)\exp(i\omega k) \quad (7.29)$$

where $k = l - n$ we can see that the CSD splits into the product of two integrals

$$P_{12}(\omega) = X_1(\omega)\, X_2(-\omega) \quad (7.30)$$

where

$$X_1(\omega) = \sum_{l=-\infty}^{\infty} x_1(l) \exp(-i\omega l) \quad (7.31)$$

$$X_2(-\omega) = \sum_{k=-\infty}^{\infty} x_2(k) \exp(+i\omega k)$$

For real signals $X_2^*(\omega) = X_2(-\omega)$, where $*$ denotes the complex conjugate. Hence, the cross spectral density is given by

$$P_{12}(\omega) = X_1(\omega)\, X_2^*(\omega) \quad (7.32)$$
This means that the CSD can be evaluated in one of two ways: (i) by first estimating the cross-covariance and Fourier transforming, or (ii) by taking the Fourier transforms of each signal and multiplying (after taking the conjugate of one of them). A number of algorithms exist which enhance the spectral estimation ability of each method. These algorithms are basically extensions of the algorithms for PSD estimation; for example, for type (i) methods we can perform Blackman-Tukey windowing of the cross-covariance function, and for type (ii) methods we can employ Welch's algorithm for averaging modified periodograms before multiplying the transforms. See Carter [8] for more details.
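The equivalence of the two routes can be checked directly. The sketch below is our own: it uses circular shifts so that the two sums match exactly, and compares the Fourier transform of a circular cross-covariance against the product of transforms in Equation 7.32.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 256
x1 = rng.standard_normal(N)
x2 = rng.standard_normal(N)

# Route (ii): product of Fourier transforms, Eq. 7.32
csd_ii = np.fft.fft(x1) * np.conj(np.fft.fft(x2))

# Route (i): Fourier transform of the (circular, unnormalised) cross-covariance
r12 = np.array([np.sum(np.roll(x1, -n) * x2) for n in range(N)])  # sum_t x1(t+n) x2(t)
csd_i = np.fft.fft(r12)

print(np.allclose(csd_i, csd_ii))  # -> True
```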
where $C$ is the residual covariance matrix and $H$ denotes the Hermitian transpose. This is formed by taking the complex conjugate of each matrix element and then applying the usual transpose operator. Just as $A^{-T}$ denotes the transpose of the inverse, so $A^{-H}$ denotes the Hermitian transpose of the inverse. Once the PSD matrix has been calculated, we can calculate the coherences of interest using equation 7.35.
7.5 Example
To illustrate the estimation of coherence we generated two signals. The rst, x, being
a 10Hz sine wave with additive Gaussian noise of standard deviation 0:3 and the
second y being equal to the rst but with more additive noise of the same standard
deviation. Five seconds of data were generated at a sample rate of 128Hz. We
then calculated the coherence using (a) Welch's modied periodogram method with
N = 128 samples per segment and a 50% overlap between segments and smoothing
via a Hanning window and (b) an MAR(8) model. Ideally, we should see a coherence
near to 1 at 10Hz and zero elsewhere. However, the coherence is highly non-zero at
other frequencies. This is because due to the noise component of the signal there
is power (and some cross-power) at all frequencies. As coherence is a ratio of cross-
power to power it will have a high variance unless the number of data samples is
large.
You should therefore be careful when interpreting coherence values. Preferably you
should perform a signicance test, either based on an assumption of Gaussian signals
[8] or using a Monte-Carlo method [38]. See also the text by Bloomeld [4].
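The Welch-based part of this experiment can be reproduced with `scipy.signal.coherence`. This is a sketch under the setup described above; the random seed and exact noise draws are our own, so the numbers will differ slightly from the figure.

```python
import numpy as np
from scipy.signal import coherence

fs = 128                              # sample rate (Hz), as in the example
t = np.arange(5 * fs) / fs            # five seconds of data
rng = np.random.default_rng(4)
x = np.sin(2 * np.pi * 10 * t) + 0.3 * rng.standard_normal(t.size)
y = x + 0.3 * rng.standard_normal(t.size)

# Welch estimate: 128-sample Hann-windowed segments, 50% overlap (scipy's default)
f, Cxy = coherence(x, y, fs=fs, nperseg=128, window="hann")
print(Cxy[np.argmin(np.abs(f - 10))] > 0.9)   # -> True: near 1 at 10 Hz
print(np.median(Cxy[f > 20]))                 # noticeably non-zero away from 10 Hz
```

The second value illustrates the warning above: the shared noise component keeps the estimated coherence well away from zero at frequencies with no signal.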
7.6 Partial Coherence
There is a direct analogy to partial correlation. Given a target signal $y$ and other signals $x_1, x_2, \ldots, x_m$ we can calculate the `error' at a given frequency after including $k = 1..m$ variables, $E_m(f)$. The partial coherence is

$$k_m(f) = \frac{E_{m-1}(f) - E_m(f)}{E_{m-1}(f)} \quad (7.42)$$
See Carter [8] for more details.