Article

Exploring Dynamic Structures in Matrix-Valued Time Series via Principal Component Analysis

by Lynne Billard 1, Ahlame Douzal-Chouakria 2 and S. Yaser Samadi 1,3,*

1 Department of Statistics, University of Georgia, Athens, GA 30602, USA
2 Centre National de Recherche Scientifique—Laboratoire d’Informatique de Grenoble, Université Grenoble Alpes, 38401 Saint-Martin-d’Hères, France
3 School of Mathematical and Statistical Sciences, Southern Illinois University, Carbondale, IL 62901, USA
* Author to whom correspondence should be addressed.
Axioms 2023, 12(6), 570; https://doi.org/10.3390/axioms12060570
Submission received: 16 May 2023 / Revised: 3 June 2023 / Accepted: 5 June 2023 / Published: 8 June 2023
(This article belongs to the Special Issue Statistical Modeling of Modern Multivariate Data)

Abstract

Time-series data are widespread and have inspired numerous research works in machine learning and data analysis fields for the classification and clustering of temporal data. While there are several clustering methods for univariate time series and a few for multivariate series, most methods are based on distance and/or dissimilarity measures that do not fully utilize the time-dependency information inherent to time-series data. To highlight the main dynamic structure of a set of multivariate time series, this study extends the use of standard variance–covariance matrices in principal component analysis to cross-autocorrelation matrices at time lags $k = 1, 2, \ldots$. This results in “principal component time series”. Simulations and a sign language dataset are used to demonstrate the effectiveness of the proposed method and its benefits in exploring the main structural features of multiple time series.

1. Introduction

Multivariate time-series data are becoming increasingly common in many fields, as they provide a rich source of information about the behavior of complex systems. For example, in meteorology, data collected from multiple weather stations can be combined to form a multivariate time series that captures changes in temperature, humidity, wind speed, and other variables over time. Similarly, in economics, time-series data can be used to track changes in stock prices, exchange rates, and other financial metrics, e.g., [1,2,3,4], among others.
Matrix-valued time-series data are a special type of multivariate time series where the measurements at each time point are themselves matrices. These types of data are particularly useful in situations where the underlying system being studied has a complex structure, such as in neuroscience, where brain activity can be measured using electroencephalography (EEG) or functional magnetic resonance imaging (fMRI), e.g., [5,6].
The challenge in analyzing matrix-valued time-series data lies in identifying meaningful patterns or clusters in the data and exploring the dynamic structure of the system over time. This requires developing new statistical methods and algorithms that can handle the complex nature of the data. However, the potential benefits of analyzing matrix-valued time-series data are significant, as they can provide insights into the underlying mechanisms driving complex systems and inform the development of more effective interventions or treatments. Therefore, our focus is on the growing problem of identifying patterns or clusters and exploring dynamic structures in matrix-valued time series.
When dealing with multiple time series arranged in rows and columns of a matrix, the focus often centers on identifying similarities between the different series. For example, in a collection of stock market series, one interest may be in determining whether certain stocks, such as mining stocks, behave similarly but differently from other stocks, such as pharmaceuticals. Similarly, identifying similarities between measurements of J rows of different stimuli in EEGs or fMRIs may suggest particular disease patterns. In the case of time series recordings of J sensors on different bridges as trains travel across them, identifying new patterns may alert engineers to possible structural issues. Meteorological data measuring J different weather conditions (temperature, pressure, moisture, and so on) may show patterns across S different cities. Traffic streams within the same network may behave dependently but differently from those in other networks. Similarly, communication networks may exhibit different patterns in phone usage across S age-groups and/or regions. The list of examples is endless.
By incorporating both the between-column (or between-row) and within-column (or within-row) dependencies, the cross-autocorrelation/autocovariance functions can help to identify any underlying trends or patterns that may be missed by traditional methods. This can lead to more accurate predictions and a better understanding of the dynamics of the system under study. Various techniques have been developed to address this challenge, including multivariate time series models, machine learning algorithms, and network analysis approaches. The choice of method depends on the specific application and the nature of the data, as well as the desired level of complexity and interpretability.
By extending the principal component analysis (PCA) technique in both the row-wise and column-wise directions for $J \times S$-dimensional matrix-valued time-series data, we can gain deeper insights and improve our classification capabilities.
In the row-wise approach, we focus on classifying the column series. Each column series represents a distinct entity or category, and we want to understand the variations and patterns within each series. To achieve this, we compute the PCAs for each observation of each column series based on the row cross-autocovariance functions. This allows us to identify the key components or factors that contribute to the variability within each column series. By examining the resulting PCAs, we can distinguish and classify the column series based on their behavior and similarities.
On the other hand, in the column-wise approach, our objective is to classify the row variables. Each row variable represents a specific aspect or measurement, and we want to explore the patterns and relationships among these variables. In this case, the PCAs are computed for each observation of each row series, utilizing the column cross-autocovariance functions. By analyzing the PCAs, we can uncover the underlying structure and associations among the row variables, aiding us in their classification and understanding.
These extended PCA techniques offer flexibility in capturing the distinct characteristics of either the column series or the row variables. By considering the impact of either the S column series or the J row series, we can gain a comprehensive understanding of the data and enhance our ability to classify and interpret them effectively.
Although very little work yet exists for the clustering of sets of multivariate time series, there is an extensive literature on clustering for univariate time series, with a nice detailed summary in [7], dating back to the seminal work of [8]. Most of these methods are based on differing distance or dissimilarity measures between observations of two series $(s, s')$ at the same time $t$, $d(X_{st}, X_{s't})$, $s \neq s'$. Many subsequent authors, such as [9,10,11], took the Euclidean distance $D = \sum_t [d(X_{st}, X_{s't})]^2$ as their distance function, or variations thereof. However, usually, these distances are functions of specific times, and do not incorporate time dependencies of the data. Some authors assumed the time series satisfied specific autoregressive models, and then calculated distance measures between model parameters, e.g., [12,13,14], while [15,16] used distances between forecasts of autoregressive structures. Owsley et al. [17] reformulated their data as hidden Markov models; Refs. [18,19,20] used time-warping approaches, among others.
The more difficult case of multivariate time series clustering is not yet well developed. A two-step procedure was established in [21]. The first step converted the J-dimensional multivariate time series into a single J = 1 univariate time series by a time-stripping process. This was followed by a second step involving univariate clustering based on time-warping and Kullback–Leibler distance measures. Košmelj and Zabkar [22] extended the compound interest weighted Euclidean distances of [8] to J > 1 variables.
These approaches relate to “time domain” descriptions of time-series data, and this is to be our focus herein. However, we note that some researchers have worked on “spectral domain” formulations. One approach involved calculating the Euclidean distance of cepstral coefficients, which are derived from the inverse transform of the short-time logarithmic amplitude spectrum ([23]). For multivariate series, different methods were employed: Refs. [24,25] utilized J-divergences and symmetric Chernoff information divergences based on spectral matrices, while [26] analyzed spectra of detrended residuals. Spectral impacts, as $J \to \infty$, were investigated by [27,28]. Alternatively, researchers such as [29,30,31,32] adopted a functional analysis approach in their studies.
Typically, once dissimilarity measures were suitably defined, clustering and/or partitioning algorithms, such as k-medoids (introduced by [33]), Ward’s method ([34]), agglomerative hierarchical partitioning ([35]), and fuzzy c-means ([36]), all well-established methodologies, were applied. As pointed out by Liao, most clustering used the data directly (as in, for example, the Euclidean distances D), with the occasional exception, assuming all series followed the same model (e.g., [37]) or were generated from known models (e.g., [38]).
Despite the extensive research and well-established literature on univariate and vector autoregressive models ([4,39,40,41,42,43], among others), the exploration of matrix-valued time series has received comparatively less attention. Walden and Serroukh [44] introduced the concept of matrix-valued time series as part of their wavelet analysis for signal and image processing. Wang and West [45] further expanded on this idea by developing a Bayesian analysis approach for dynamic matrix-variate normal graphical models. In addition, Refs. [46,47] proposed and formalized a matrix autoregressive model of order p, known as MAR(p).
While these contributions all helped to advance the subject, a major limiting feature is the fact that the dissimilarity or related measures usually do not adequately retain the time-dependency information inherent to time-series data. However, a fundamental feature of time series is the time dependence between observations, from one time $t$ to a later time $t + k$. These dependencies are measured through the autocovariance functions, defined by $\gamma_k = E[(X_t - \mu)(X_{t+k} - \mu)]$ between observations $X_t$ and $X_{t+k}$, for lag $k = 0, 1, 2, \ldots$, with $\mu = E(X_t)$ (or, equivalently, the autocorrelation functions given by $\rho_k = \gamma_k / \gamma_0$).
Our approach is to utilize these time dependencies by combining the concepts of autocovariance functions for a single univariate time series with those for the cross-covariance functions for a single multivariate time series to produce cross-autocovariance functions for multiple multivariate time series; see Section 2. Then, we extend the ideas for standard (non-time series) data principal component analyses to an exploratory analysis of multiple sets of multivariate time series so as to identify patterns or clusters in the data; see Section 3. Then, in Section 4, we conduct a simulation study to illustrate the methodology; and a dataset that motivated this work is analyzed in Section 5, demonstrating the effectiveness of the procedure. We close with some concluding remarks in Section 6.

2. Cross-Autocorrelation Functions

Let $\mathbf{X}_t = \{(X_{js})_t \equiv X_{jts},\ j = 1, \ldots, J,\ s = 1, \ldots, S\}$ be a $J \times S$-dimensional matrix-valued time series process of length $T$. We assume that the data are transformed appropriately so that each series is a stationary time series. Autocovariance (or autocorrelation) functions measure time dependencies across the observations $k$ units apart for a given variable $X_{js}$. These functions provide insights into the persistence of the time-series data and help to identify any seasonality or trends present in the data. Cross-covariance functions measure the dependencies across rows of the matrix for a given column $s$ (or across columns for a given row $j$), which can help to identify relationships between different series. A major challenge in the analysis of matrix-valued time-series data is to identify and incorporate both the between-column (or between-row) and within-column (or within-row) dependencies simultaneously. To address this challenge, the aim is to find a function that can effectively capture both types of dependencies. This function should take into account the complex interplay between the different variables and provide a comprehensive understanding of the relationships and patterns present in the data.
An example of a measure that can be used to assess the relationship between different variables in a time series is the sample cross-autocovariance function at lag $k$. For the row variables, it is denoted by $\hat{\gamma}_{jj'}(k)$, where $j$ and $j'$ represent two different row matrix variables, defined by

$$\hat{\gamma}_{jj'}(k) = \frac{1}{TS} \sum_{s=1}^{S} \sum_{t=1}^{T-k} \left(X_{stj} - \bar{X}_j\right)\left(X_{s,t+k,j'} - \bar{X}_{j'}\right), \quad k = 0, 1, \ldots, \qquad (1)$$

where

$$\bar{X}_j = \frac{1}{TS} \sum_{s=1}^{S} \sum_{t=1}^{T} X_{stj}, \quad j, j' = 1, \ldots, J; \qquad (2)$$

and hence the sample cross-autocorrelation function at lag $k$, $\hat{\rho}_{jj'}(k)$, $j, j' = 1, \ldots, J$, is given by

$$\hat{\rho}_{jj'}(k) = \hat{\gamma}_{jj'}(k) / \{\hat{\gamma}_{jj}(0)\, \hat{\gamma}_{j'j'}(0)\}^{1/2}, \quad k = 0, 1, \ldots \qquad (3)$$

Similarly, the sample cross-autocovariance and cross-autocorrelation functions at lag $k$ for the column variables can be defined as $\hat{\gamma}_{ss'}(k)$ and $\hat{\rho}_{ss'}(k)$, respectively, where $s, s' = 1, \ldots, S$.
The terms $\hat{\gamma}_{jj'}(k)$ represent elements in the $J \times J$ sample row autocovariance function matrix $\hat{\boldsymbol{\Gamma}}_r(k)$, while $\hat{\rho}_{jj'}(k)$ represent elements in the $J \times J$ sample row autocorrelation function matrix $\hat{\boldsymbol{\rho}}_r(k)$, $k = 0, 1, \ldots$. Similarly, the terms $\hat{\gamma}_{ss'}(k)$ are elements in the $S \times S$ sample column autocovariance function matrix $\hat{\boldsymbol{\Gamma}}_c(k)$, while $\hat{\rho}_{ss'}(k)$ represent elements in the $S \times S$ sample column autocorrelation function matrix $\hat{\boldsymbol{\rho}}_c(k)$.
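To make the computation concrete, the following is a minimal Python/NumPy sketch of Equations (1)–(3). It is our own illustrative code (not from the paper); the array layout `X[t, s, j]` for $X_{stj}$ and the function names are assumptions.

```python
import numpy as np

def row_cross_autocovariance(X, k):
    """Sample row cross-autocovariance matrix with entries gamma_hat_{jj'}(k), Eq. (1).
    X is an array of shape (T, S, J) holding the matrix time series X_{stj}."""
    T, S, J = X.shape
    Xbar = X.reshape(T * S, J).mean(axis=0)      # overall means X_bar_j, Eq. (2)
    Xc = X - Xbar                                # centred observations
    # sum over s and over t = 1..T-k of (X_{stj} - X_bar_j)(X_{s,t+k,j'} - X_bar_{j'})
    return np.einsum('tsj,tsi->ji', Xc[:T - k], Xc[k:]) / (T * S)

def row_cross_autocorrelation(X, k):
    """Sample row cross-autocorrelation matrix with entries rho_hat_{jj'}(k), Eq. (3)."""
    G0 = row_cross_autocovariance(X, 0)
    Gk = row_cross_autocovariance(X, k)
    d = np.sqrt(np.diag(G0))                     # {gamma_hat_{jj}(0)}^{1/2}
    return Gk / np.outer(d, d)
```

The column-wise quantities $\hat{\gamma}_{ss'}(k)$ and $\hat{\rho}_{ss'}(k)$ follow by exchanging the roles of $s$ and $j$ (e.g., by passing `X.transpose(0, 2, 1)` to the same functions).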
When J = 1 and S = 1 , Equation (1) reduces to the well-known sample autocovariance function at lag k for a single univariate time series. Typically, plots of the sample autocorrelation functions across lags decay as the lags increase, allowing for the identification of possible models. If these plots fail to decay, non-stationarity is indicated, thus alerting the analyst to the need to suitably transform the data as appropriate. For a detailed treatment, refer to any of the many introductory texts on time series, such as [39,40,48,49].
When $J > 1$ and $S = 1$ (or when $J = 1$ and $S > 1$), the matrix-valued time series reduces to a vector-valued time series. In this case, Equations (1) and (2) become

$$\hat{\gamma}_{jj'}(k) = \frac{1}{T} \sum_{t=1}^{T-k} \left(X_{tj} - \bar{X}_j\right)\left(X_{t+k,j'} - \bar{X}_{j'}\right), \quad k = 0, 1, \ldots, \qquad \bar{X}_j = \frac{1}{T} \sum_{t=1}^{T} X_{tj}. \qquad (4)$$

This autocovariance function in Equation (4) dates back to [50]. Jones [51] showed that it is non-symmetric: for variables $j \neq j'$, $\rho_{jj'}(k) \neq \rho_{j'j}(k)$, except when $k = 0$. However, the autocorrelation matrix satisfies $\boldsymbol{\rho}(k) = \boldsymbol{\rho}'(-k)$, $k = 0, 1, \ldots$.
The sample means $\bar{X}_j$ in Equation (2) are based on the column time series and all times. Alternatively, these could be calculated separately for each series in the matrix time series, so that the functions of Equations (1)–(3) become
$$\hat{\gamma}^{(W)}_{jj'}(k) = \frac{1}{TS} \sum_{s=1}^{S} \sum_{t=1}^{T-k} \left(X_{stj} - \bar{X}_{sj}\right)\left(X_{s,t+k,j'} - \bar{X}_{sj'}\right), \quad k = 0, 1, \ldots, \qquad (5)$$

where

$$\bar{X}_{sj} = \frac{1}{T} \sum_{t=1}^{T} X_{stj}, \quad j = 1, \ldots, J, \; s = 1, \ldots, S, \qquad (6)$$

and

$$\hat{\rho}^{(W)}_{jj'}(k) = \hat{\gamma}^{(W)}_{jj'}(k) / \{\hat{\gamma}^{(W)}_{jj}(0)\, \hat{\gamma}^{(W)}_{j'j'}(0)\}^{1/2}, \quad k = 0, 1, \ldots \qquad (7)$$

In contrast to the sample autocovariance functions in Equation (1), which reflect the total variation between the observations across all column series, the autocovariance functions in Equation (5) reflect within-series variations. Therefore, Equations (5) and (7) are referred to as within (W) autocovariance and autocorrelation functions, respectively.
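The within (W) versions of Equations (5)–(7) differ from the total versions only in the centering; a brief sketch under the same assumed `X[t, s, j]` layout as above:

```python
import numpy as np

def within_cross_autocovariance(X, k):
    """Within (W) cross-autocovariance matrix of Eq. (5): each column series
    is centred at its own mean X_bar_{sj} of Eq. (6)."""
    T, S, J = X.shape
    Xc = X - X.mean(axis=0, keepdims=True)       # per-series means, shape (1, S, J)
    return np.einsum('tsj,tsi->ji', Xc[:T - k], Xc[k:]) / (T * S)
```

The within cross-autocorrelations of Equation (7) are then obtained by normalizing with the lag-zero diagonal, exactly as in the total-variation case.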
Alternative methods exist for calculating autocovariance (and hence autocorrelation) functions. One such method involves adjusting the sample means for lag $k$, replacing $\bar{X}_j$ of Equation (2) and $\bar{X}_{sj}$ of Equation (6) with $\bar{X}_j(k)$ and $\bar{X}_{sj}(k)$, respectively, where the lag-adjusted means are given by

$$\bar{X}_j(k) = \frac{1}{(T-k)S} \sum_{s=1}^{S} \sum_{t=1}^{T-k} X_{stj}, \qquad \bar{X}_{sj}(k) = \frac{1}{T-k} \sum_{t=1}^{T-k} X_{stj}, \quad j = 1, \ldots, J. \qquad (8)$$

Then, the total autocovariance function in Equation (1) is replaced by

$$\hat{\gamma}^{(Tk)}_{jj'}(k) = \frac{1}{TS} \sum_{s=1}^{S} \sum_{t=1}^{T-k} \left(X_{stj} - \bar{X}_j(k)\right)\left(X_{s,t+k,j'} - \bar{X}_{j'}(k)\right), \quad k = 0, 1, \ldots; \qquad (9)$$

and similarly, the within-autocovariance function in Equation (5) is replaced by

$$\hat{\gamma}^{(Wk)}_{jj'}(k) = \frac{1}{TS} \sum_{s=1}^{S} \sum_{t=1}^{T-k} \left(X_{stj} - \bar{X}_{sj}(k)\right)\left(X_{s,t+k,j'} - \bar{X}_{sj'}(k)\right), \quad k = 0, 1, \ldots \qquad (10)$$

Likewise, the autocorrelation functions of Equations (3) and (7) are adjusted accordingly. For $j, j' = 1, \ldots, J$, $k = 0, 1, \ldots$, it can be shown that

$$\hat{\gamma}^{(Tk)}_{jj'}(k) = \hat{\gamma}^{(Wk)}_{jj'}(k) + \hat{\gamma}^{(Bk)}_{jj'}(k), \qquad (11)$$

where

$$\hat{\gamma}^{(Bk)}_{jj'}(k) = \frac{T-k}{TS} \sum_{s=1}^{S} \left(\bar{X}_{sj}(k) - \bar{X}_j(k)\right)\left(\bar{X}_{sj'}(k) - \bar{X}_{j'}(k)\right). \qquad (12)$$

The between ($Bk$) autocovariances, $\hat{\gamma}^{(Bk)}_{jj'}(k)$, estimated from Equation (12), measure the variations between the average values per series and, as such, lose the dependencies over time inherent to any time series.
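For the lag-adjusted quantities, a sketch of the between ($Bk$) matrix of Equation (12), built from the means of Equation (8), might read as follows (again illustrative Python/NumPy under the same assumed layout):

```python
import numpy as np

def between_cross_autocovariance(X, k):
    """Between (Bk) cross-autocovariance matrix of Eq. (12), using the
    lag-adjusted means X_bar_{sj}(k) and X_bar_j(k) of Eq. (8)."""
    T, S, J = X.shape
    Xk = X[:T - k]                               # observations t = 1, ..., T-k
    m_sj = Xk.mean(axis=0)                       # X_bar_{sj}(k), shape (S, J)
    m_j = Xk.reshape(-1, J).mean(axis=0)         # X_bar_j(k), shape (J,)
    d = m_sj - m_j                               # per-series mean deviations
    return (T - k) / (T * S) * d.T @ d
```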
If it is known a priori that some of the $S$ column time series cluster into (sub)classes, then the within-variation functions used in Equations (5) and (7) can be refined to accommodate this. Suppose there are $H$ classes $\{C_1, \ldots, C_H\}$, with class $C_h$ containing the $m_h$ series $\{s_1, \ldots, s_{m_h}\}$, where $\cup_h C_h$ is the complete set of $S$ column series and $C_h \cap C_{h'} = \emptyset$, $h \neq h'$, $h, h' = 1, \ldots, H$. Then, the sample class (C) cross-autocovariance function matrix, $\boldsymbol{\Gamma}^{(C)}(k)$, has elements estimated by the following equation, for $k = 0, 1, \ldots$, $j, j' = 1, \ldots, J$:

$$\hat{\gamma}^{(C)}_{jj'}(k) = \frac{1}{TS} \sum_{h=1}^{H} \sum_{s=s_1}^{s_{m_h}} \sum_{t=1}^{T-k} \left(X_{stj} - \bar{X}_{hj}\right)\left(X_{s,t+k,j'} - \bar{X}_{hj'}\right), \qquad (13)$$

with the sample mean of all $T_h = T \times m_h$ observations in class $C_h$ as

$$\bar{X}_{hj} = \frac{1}{T_h} \sum_{s=s_1}^{s_{m_h}} \sum_{t=1}^{T} X_{stj}, \quad j = 1, \ldots, J, \; h = 1, \ldots, H; \qquad (14)$$

and hence, the sample class cross-autocorrelation matrix, $\hat{\boldsymbol{\rho}}^{(C)}(k)$, has elements

$$\hat{\rho}^{(C)}_{jj'}(k) = \hat{\gamma}^{(C)}_{jj'}(k) / \{\hat{\gamma}^{(C)}_{jj}(0)\, \hat{\gamma}^{(C)}_{j'j'}(0)\}^{1/2}, \quad k = 0, 1, \ldots \qquad (15)$$

These class cross-autocorrelation functions, known as “class”, or more technically “within-class”, functions, since they capture variations within prior-defined classes, can be useful in a supervised learning setting. When using a class-autocovariance approach, the between-autocovariances (of Equation (12)) may also be informative ([52]). It is worth noting that other suitable formulations for cross-autocorrelation functions are also possible.
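When class labels are available, the class (C) functions of Equations (13)–(15) again change only the centering; a short sketch (the `labels` array giving each column series' prior class is an assumption):

```python
import numpy as np

def class_cross_autocovariance(X, k, labels):
    """Class (C) cross-autocovariance matrix of Eq. (13); labels[s] gives the
    prior class of column series s, and class means follow Eq. (14)."""
    T, S, J = X.shape
    labels = np.asarray(labels)
    Xc = X.astype(float)
    for h in np.unique(labels):
        cols = np.flatnonzero(labels == h)
        class_mean = X[:, cols, :].reshape(-1, J).mean(axis=0)   # X_bar_{hj}, Eq. (14)
        Xc[:, cols, :] = X[:, cols, :] - class_mean
    return np.einsum('tsj,tsi->ji', Xc[:T - k], Xc[k:]) / (T * S)
```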
The divisor $T$ in Equations (1), (4), (5), (9), (10), (12) and (13) is used instead of the intuitive $(T - k)$ (which would give unbiasedness), in order to ensure that the resulting sample cross-autocorrelation matrices are non-negative definite. Although this leads to a biased estimator, it is asymptotically unbiased as $T \to \infty$.
Weight functions can be incorporated into all these autocovariance and cross-autocovariance functions, extending to the corresponding autocorrelation and cross-autocorrelation functions as well. The formulas of Equations (1), (4) and (5) are written under the assumption that each time observation in a time series carries the same weight $\omega_t \equiv 1/T$, and that each column series is equally weighted at $\omega_s \equiv 1/S$. In general, Equation (1) can be expressed as

$$\hat{\gamma}_{jj'}(k) = \sum_{s=1}^{S} \sum_{t=1}^{T-k} \omega_t\, \omega_s \left(X_{stj} - \bar{X}_j\right)\left(X_{s,t+k,j'} - \bar{X}_{j'}\right), \quad k = 0, 1, \ldots, \qquad (16)$$

where the sample mean, weighted by time and column series, becomes

$$\bar{X}_j = \sum_{s=1}^{S} \sum_{t=1}^{T} \omega_t\, \omega_s X_{stj}, \quad j = 1, \ldots, J. \qquad (17)$$

The other cross/autocovariance functions are adjusted in a similar manner.
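The weighted form of Equations (16) and (17) is a direct generalization of the earlier sketch; with uniform weights it reproduces Equation (1). The default arguments below are our own choice for illustration:

```python
import numpy as np

def weighted_cross_autocovariance(X, k, w_t=None, w_s=None):
    """Weighted cross-autocovariance of Eq. (16), with time weights w_t
    (length T) and column-series weights w_s (length S)."""
    T, S, J = X.shape
    w_t = np.full(T, 1.0 / T) if w_t is None else np.asarray(w_t)
    w_s = np.full(S, 1.0 / S) if w_s is None else np.asarray(w_s)
    w = np.outer(w_t, w_s)                               # omega_t * omega_s, shape (T, S)
    Xbar = np.einsum('ts,tsj->j', w, X)                  # weighted means, Eq. (17)
    Xc = X - Xbar
    return np.einsum('ts,tsj,tsi->ji', w[:T - k], Xc[:T - k], Xc[k:])
```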
For example, in the application described in Section 5, we analyze a matrix time series with dimensions $(J = 8) \times (S = 24)$. The sample means $\bar{X}_j$, $j = 1, \ldots, J$, calculated in Equation (2), are determined across all $S = 24$ column series, regardless of “word” or “speaker”. In contrast, the sample means $\bar{X}_{sj}$, $j = 1, \ldots, J$, $s = 1, \ldots, S$, computed in Equation (6), are specific to each column series, i.e., each word spoken by a particular speaker. Hence, the variations (e.g., the cross-autocovariance functions) are necessarily larger for the so-called total cross-autocovariance functions than for the within-cross-autocovariance functions (and likewise for the other functions defined herein). On the other hand, in a scenario where we conduct a supervised learning study and possess a priori knowledge that the words “know” and “yes” (column series $s$ = 1–6, 19–24) form one group, while “maybe” (column series $s$ = 7–12) and “shop” (column series $s$ = 13–18) follow separate distinct patterns (as illustrated in Section 5), we can adjust the sample means $\bar{X}_{rj}$, $j = 1, \ldots, J$, $r = 1, \ldots, R$, as indicated in Equation (14). These means are calculated for all observations within each of these $R = 3$ predefined clusters.

3. Principal Component Analysis

3.1. Principal Components for Time Series

The use of principal components is a valuable and informative exploratory data-analysis technique. While it is not necessarily an optimal clustering method, as some information in the data may be contained in later higher-order principal components, it is a useful technique for distinguishing between observations that exhibit similar behavior. Principal component analysis (PCA) is a well-established methodology for reducing the dimensionality to Q < J . This enables easier visualization and interpretation of the complete dataset, including the identification of variables that contribute more or less to the variations in the data.
Let us first consider a $p$-dimensional vector time series $\mathbf{X}_t = (X_{t1}, \ldots, X_{tp})'$, $t = 1, \ldots, T$. To compute the principal components, the first step is to calculate the $p \times p$ variance–covariance matrix $\mathbf{V}$ (say), based on the $p$-dimensional observations $\{X_{tj}, t = 1, \ldots, T, j = 1, \ldots, p\}$. Then, for each observation $t = 1, \ldots, T$, the $\nu$th principal component is computed as follows:

$$PC_\nu(t) = w_{\nu 1} X_{t1} + \cdots + w_{\nu p} X_{tp}, \quad \nu = 1, \ldots, p, \qquad (18)$$

where $\mathbf{w}_\nu = (w_{\nu 1}, \ldots, w_{\nu p})'$ is the $\nu$th eigenvector of $\mathbf{V}$, and the total variance is $\sigma^2 = \sum_{\nu=1}^{p} \lambda_\nu$, where $\lambda_\nu$ represents the $\nu$th eigenvalue of $\mathbf{V}$, with $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p \geq 0$. When observations are standardized, the covariance matrix $\mathbf{V}$ is replaced by the corresponding correlation matrix $\mathbf{R}$ and $\sum_{\nu=1}^{p} \lambda_\nu = p$. For a detailed description of this methodology for standard data, see any of the numerous texts on multivariate analysis, e.g., [53,54] for an applied approach, and [55] for theoretical details. Canonical correlation analysis (CCA) was employed by [56] to assess the temporal relationships between variables and establish the covariance matrix $\mathbf{V}$ in Equation (18). This study builds upon the previous work conducted by [57], where PCA was employed to analyze multiple sets of multivariate time series.
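As a point of reference, the standard computation behind Equation (18) on a $T \times p$ data matrix can be sketched as follows (illustrative Python/NumPy; the standardized, correlation-matrix version is shown):

```python
import numpy as np

def pca_scores(X):
    """Principal components of Eq. (18) for a T x p data matrix X.
    Returns eigenvalues (descending), eigenvectors (columns), and the
    T x p matrix of scores PC_nu(t)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize each variable
    R = np.corrcoef(Z, rowvar=False)             # correlation matrix R
    eigvals, eigvecs = np.linalg.eigh(R)         # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]            # lambda_1 >= ... >= lambda_p
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                         # PC_nu(t) = w_nu' X_t
    return eigvals, eigvecs, scores
```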
For matrix-valued time-series data of dimension J × S , the PCA technique can be extended in two directions: row wise and column wise. In each case, instead of using a standard variance–covariance matrix, we utilize a matrix whose elements are cross-autocovariance functions. When the data are standardized, the elements represent cross-autocorrelation functions. This extension allows us to consider the impact of the S column series in the row-wise approach and the impact of the J row series in the column-wise approach. If the objective is to classify the column series, the row-wise approach is appropriate. Here, we compute the PCAs for each observation of each column series based on the row cross-autocovariance functions. On the other hand, if the goal is to classify the row variables, the column-wise PCAs should be considered. In the column-wise scenario, we calculate the PCAs for each observation of each row series, taking into account the cross-autocovariance functions among the columns. By extending the PCA technique in these two directions, we can effectively classify either the column series or the row variables, depending on the specific objectives of the analysis. In this discussion, we focus on describing the row-wise approach, but it is important to note that the column-wise approach can be extended in a similar manner.
Given that the cross-autocorrelation functions are each a function of a time lag $k$, there is a distinct cross-autocorrelation matrix for each lag value. Consequently, in the illustrations presented in Section 4 and Section 5, we replace the conventional non-time-series correlation matrix $\mathbf{R}$ with the time-dependent cross-autocorrelation matrix $\hat{\boldsymbol{\rho}}(k)$. The elements of this matrix, represented as $\hat{\rho}_{jj'}(k)$ and obtained from Equation (3), capture the cross-autocorrelation values between different variables at lag $k$. Additionally, alternative cross-autocorrelation matrices discussed in Section 2 can also be utilized. By employing these time-dependent cross-autocorrelation matrices, we can proceed to compute the corresponding eigenvalues and eigenvectors. These eigenvalues and eigenvectors provide valuable insights into the underlying structure and dynamics of the data, enabling us to identify the principal components and the extent to which each component contributes to the overall variability.
Now, let us consider a matrix-valued time series $\mathbf{X}_t = (X_{jts})$ of size $J \times S$. Suppose the objective is to classify the column time-series variables. To achieve this, we employ the row-wise approach as described. Initially, we calculate the $J \times J$ row-wise cross-autocovariance (or cross-autocorrelation) matrix at lag $k$, denoted as $\mathbf{V}_r(k)$ (or $\boldsymbol{\rho}_r(k)$), based on the $J$ row time-series variables. The elements of $\mathbf{V}_r(k)$ and $\boldsymbol{\rho}_r(k)$ correspond to the cross-autocovariance ($\hat{\gamma}_{jj'}(k)$) and cross-autocorrelation ($\hat{\rho}_{jj'}(k)$) functions, respectively, obtained using Equations (1) and (3). Next, by extending the computation of principal components from Equation (18), we obtain the principal component analysis (PCA) results for each time observation ($t = 1, \ldots, T$) of each column series ($s = 1, \ldots, S$) based on the row-wise cross-autocovariance functions. The PCA is computed as follows:

$$PC_\nu(s, t) = w_{\nu 1} X_{st1} + \cdots + w_{\nu J} X_{stJ}, \quad \nu = 1, \ldots, J, \qquad (19)$$

where $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_J)$ and $\mathbf{w}_\nu = (w_{\nu 1}, \ldots, w_{\nu J})$ represent the eigenvalues and eigenvectors, respectively, of the matrix $\hat{\boldsymbol{\rho}}_r(k)$. The $\nu$th principal component of the observation $\mathbf{X}_{st} = \{X_{stj}, j = 1, \ldots, J\}$ is denoted as $PC_\nu(s, t)$, with $s = 1, \ldots, S$ and $t = 1, \ldots, T$. Note that when the original data are time series, the resulting principal components are also “time series” data. Consequently, instead of the two-dimensional plot typically used for non-time-series multivariate data ($PC_{\nu_1} \times PC_{\nu_2}$), we now have a three-dimensional plot ($PC_{\nu_1} \times PC_{\nu_2} \times$ time). This is demonstrated in Section 5, illustrating the discussed application.
Since cross-autocorrelation matrices are generally non-symmetric, they may not always be positive definite. In situations where negative eigenvalues are encountered, it is advisable to employ the approach proposed by [58,59]. This approach involves setting any negative eigenvalues to zero, which helps ensure that the resulting matrices are valid and suitable for further analysis. By applying this adjustment, potential issues arising from non-positive definiteness can be effectively addressed. For more detailed information, additional insights can be found in the work of Samadi et al. [56].
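Putting the pieces together, the row-wise procedure of Equation (19) can be sketched as below (our own illustrative Python/NumPy, reusing `row_cross_autocorrelation` from the sketch in Section 2). Symmetrizing the lagged matrix before the eigendecomposition and standardizing the data are our assumptions, since these implementation details are not spelled out above; the clipping of negative eigenvalues follows the adjustment just described.

```python
import numpy as np

def matrix_pca_row_wise(X, k):
    """Row-wise principal component time series of Eq. (19).
    X : (T, S, J) array; returns eigenvalues, eigenvectors, and PC with
    PC[t, s, nu] = PC_nu(s, t)."""
    rho = row_cross_autocorrelation(X, k)          # rho_hat_r(k), Eqs. (1)-(3)
    rho_sym = (rho + rho.T) / 2                    # symmetrize (assumption)
    eigvals, eigvecs = np.linalg.eigh(rho_sym)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    eigvals = np.clip(eigvals, 0.0, None)          # set any negative eigenvalues to zero
    T, S, J = X.shape
    flat = X.reshape(T * S, J)
    Z = ((flat - flat.mean(axis=0)) / flat.std(axis=0)).reshape(T, S, J)
    PC = Z @ eigvecs                               # Eq. (19): PC_nu(s, t) = w_nu' X_st
    return eigvals, eigvecs, PC
```

A three-dimensional $PC_1 \times PC_2 \times$ time display then amounts to plotting `PC[:, s, 0]` and `PC[:, s, 1]` against $t$ for each column series $s$.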
In high-dimensional settings, the challenges arise from the growing dimensions of matrix time series, characterized by the potential growth rate of J and S. This phenomenon is commonly known as the “curse of dimensionality”, and poses unique challenges, such as sparsity in covariance or correlation matrices. To address these challenges, alternative techniques, such as sparse PCA ([60,61,62], among others), can be extended to accommodate high-dimensional matrix-valued time series.

3.2. Interpretation—Correlation Circles

The interpretation aids commonly used in standard principal component analyses can be directly applied in this context. For instance, the proportion of the total variance explained by the $\nu$th principal component is given by $\lambda_\nu / \sum_{\nu=1}^{J} \lambda_\nu$. Therefore, it is still useful to examine the cumulative sum of these eigenvalues ($\lambda_\nu$) and consider the scree plot to determine the number of principal components ($Q$) that should be analyzed. Similarly, the practice of plotting two principal components (e.g., $PC_1$ versus $PC_2$) to identify potential patterns in the data remains valid. However, due to the inclusion of the time component, these plots would now be three-dimensional, allowing for a comprehensive visualization of the data structure. An example of such a three-dimensional plot can be observed in Section 5, illustrating the discussed application.
Correlations between a specific variable $X_j$ and a particular principal component $PC_\nu$ can be computed to create correlation circles. These circles show visually how the variables are correlated with the principal components. We will discuss three possibilities for constructing these correlation circles; a computational sketch covering all three follows at the end of this subsection. The following descriptions assume that the variance–covariance matrix utilized corresponds to the total cross-autocorrelation functions as defined in Equation (3), denoted as $\hat{\boldsymbol{\rho}}(k)$. However, it is important to note that the adaptation of these techniques to other cross-autocorrelation matrices, such as the within-cross-autocorrelation or class-cross-autocorrelation functions, can be easily accomplished.
(i) Pearson product moment correlation

The Pearson product moment covariance function between $X_j$ and $PC_\nu$ is defined as

$$Cov_1(X_j, PC_\nu) = \frac{1}{TS} \sum_{s=1}^{S} \sum_{t=1}^{T} \left(X_{stj} - \bar{X}_j\right)\left(PC_\nu(s, t) - \overline{PC}_\nu\right), \qquad (20)$$

where $\bar{X}_j$, $j = 1, \ldots, J$, is as defined in Equation (2) and

$$\overline{PC}_\nu = \frac{1}{TS} \sum_{s=1}^{S} \sum_{t=1}^{T} PC_\nu(s, t), \quad \nu = 1, \ldots, J, \qquad (21)$$

where the principal components $PC_\nu(s, t)$ are calculated from Equation (19). Hence, the Pearson correlation function between $X_j$ and $PC_\nu$ is

$$Corr_1(X_j, PC_\nu) = Cov_1(X_j, PC_\nu) / (\hat{\sigma}_j \hat{\sigma}_\nu), \qquad (22)$$

where $\hat{\sigma}_j^2$, $\hat{\sigma}_\nu^2$ are the sample variances of $X_j$ and $PC_\nu$, respectively, given by

$$\hat{\sigma}_j^2 = \frac{1}{TS} \sum_{s=1}^{S} \sum_{t=1}^{T} \left(X_{stj} - \bar{X}_j\right)^2, \quad j = 1, \ldots, J, \qquad (23)$$

$$\hat{\sigma}_\nu^2 = \frac{1}{TS} \sum_{s=1}^{S} \sum_{t=1}^{T} \left(PC_\nu(s, t) - \overline{PC}_\nu\right)^2, \quad \nu = 1, \ldots, J. \qquad (24)$$
(ii) Moran correlation

In the context of spatial neighbors, Ref. [63] introduced an index to measure spatial correlations. The Moran correlation is commonly employed to measure the correlation between variables while considering a predefined spatial, geographical, or temporal structure. Extending this concept to time series and focusing the correlation measure on lagged observations (i.e., neighbors), we refer to it as Moran’s correlation measure. To define the Moran covariance function, we consider, for $j = 1, \ldots, J$, $\nu = 1, \ldots, J$,

$$Cov_2(X_j, PC_\nu) = \frac{1}{TS} \sum_{s=1}^{S} \sum_{(i, i')} \left(X_{sij} - X_{si'j}\right)\left(PC_\nu(s, i) - PC_\nu(s, i')\right), \qquad (25)$$

where $(i, i')$ represents the set of neighboring observations. For a lag of $k$, $i' = i + k$, meaning that the neighbors are observations $X_{stj}$ (and similarly for the principal components $PC_\nu(s, t)$) separated by $k$ units.

Then, the Moran correlation between the variable $X_j$ and principal component $PC_\nu$ is calculated as

$$Corr_2(X_j, PC_\nu) = \frac{Cov_2(X_j, PC_\nu)}{\{Cov_2(X_j, X_j)\, Cov_2(PC_\nu, PC_\nu)\}^{1/2}}, \qquad (26)$$

where

$$Cov_2(X_j, X_j) = \frac{1}{TS} \sum_{s=1}^{S} \sum_{(i, i')} \left(X_{sij} - X_{si'j}\right)^2, \quad j = 1, \ldots, J, \qquad (27)$$

$$Cov_2(PC_\nu, PC_\nu) = \frac{1}{TS} \sum_{s=1}^{S} \sum_{(i, i')} \left(PC_\nu(s, i) - PC_\nu(s, i')\right)^2, \quad \nu = 1, \ldots, J. \qquad (28)$$
(iii) Loadings-based dependence

In a standard principal component analysis of non-time-series data, the correlation or measure of loadings-based dependence between $X_j$ and $PC_\nu$ is given by

$$Corr_3(X_j, PC_\nu) = w_{\nu j} \{\lambda_\nu / \sigma_j^2\}^{1/2}, \quad j = 1, \ldots, J, \; \nu = 1, \ldots, J. \qquad (29)$$

In this equation, $\sigma_j^2$ is estimated using $\hat{\sigma}_j^2$ as defined in Equation (23). The expression of Equation (29) provides the contribution or the weight of each original variable to the calculation of the principal component. This weight can be interpreted as a measure of dependence. The same definition applies directly to time-series data, although the eigenvectors and eigenvalues indirectly incorporate the time structure through the variance–covariance matrix $\hat{\boldsymbol{\Gamma}}_r(k)$.
The choice of correlation circle depends on the specific characteristics and objectives of the analysis. The Pearson correlation is suitable for capturing linear relationships, but it may not be effective in capturing non-linear dependencies. The Moran correlation is appropriate for spatially dependent data, and it can detect clustering and spatial heterogeneity in the data. The loadings-based dependence captures the dependence between variables based on their contribution to the principal components and provides a comprehensive view of the underlying structure and variability of the data. Selecting the most appropriate correlation measure depends on the nature of the data, the presence of spatial patterns, and the type of relationships that the analysis aims to capture.
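The three measures of Equations (20)–(29) can be computed directly from the data array and the principal component array. The following is an illustrative Python/NumPy sketch; the shapes and names are ours, with `PC` as returned by the row-wise PCA sketch above and `eigvals`, `eigvecs` from the same eigendecomposition.

```python
import numpy as np

def pearson_corr(X, PC):
    """Corr_1(X_j, PC_nu) of Eq. (22); entry (j, nu) of the returned J x J matrix."""
    T, S, J = X.shape
    Xc = X.reshape(T * S, J) - X.reshape(T * S, J).mean(axis=0)    # X_{stj} - X_bar_j
    Pc = PC.reshape(T * S, J) - PC.reshape(T * S, J).mean(axis=0)  # PC_nu(s,t) - PC_bar_nu, Eq. (21)
    cov1 = Xc.T @ Pc / (T * S)                    # Eq. (20)
    sj = np.sqrt((Xc ** 2).mean(axis=0))          # sigma_hat_j, Eq. (23)
    sv = np.sqrt((Pc ** 2).mean(axis=0))          # sigma_hat_nu, Eq. (24)
    return cov1 / np.outer(sj, sv)

def moran_corr(X, PC, k=1):
    """Corr_2(X_j, PC_nu) of Eq. (26), with neighbours i' = i + k."""
    T, S, J = X.shape
    dX = (X[:T - k] - X[k:]).reshape(-1, J)       # X_{sij} - X_{s,i+k,j}
    dP = (PC[:T - k] - PC[k:]).reshape(-1, J)     # PC_nu(s,i) - PC_nu(s,i+k)
    cov2 = dX.T @ dP / (T * S)                    # Eq. (25)
    vX = (dX ** 2).sum(axis=0) / (T * S)          # Eq. (27)
    vP = (dP ** 2).sum(axis=0) / (T * S)          # Eq. (28)
    return cov2 / np.sqrt(np.outer(vX, vP))

def loadings_dependence(eigvals, eigvecs, X):
    """Corr_3(X_j, PC_nu) = w_{nu j} (lambda_nu / sigma_j^2)^{1/2}, Eq. (29)."""
    T, S, J = X.shape
    sigma2 = X.reshape(T * S, J).var(axis=0)      # sigma_j^2 estimated as in Eq. (23)
    return eigvecs * np.sqrt(eigvals)[None, :] / np.sqrt(sigma2)[:, None]
```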

4. Simulation Studies

In this section, we aim to assess and evaluate the performance of the proposed methodology by conducting two simulation studies. In the first example, we consider a simulated dataset consisting of matrix-valued time series with a dimension of $(J = 5) \times (S = 11)$. The dataset in this example is generated based on four distinct classes denoted as $C_1, \ldots, C_4$ ($H = 4$). Each class is characterized by a matrix-valued time series of dimension $(J = 5) \times s_h$, $h = 1, \ldots, 4$, and the total number of series is $S = \sum_{h=1}^{4} s_h = 11$. To generate the data for each class, we employ a vector autoregressive moving average (VARMA) model. Specifically, we apply the vec(·) transformation to the matrix variate, which follows a VARMA($p, q$) process. The column series $s = 1, 2, 3$ are generated using a VARMA($1, 0$) process, column series $s = 4, 5, 6$ are generated using a VARMA($0, 1$) process, column series $s = 7, 8, 9$ are generated using a VARMA($1, 1$) process, and column series $s = 10, 11$ are generated using a VARMA($2, 0$) process. By simulating the data according to these specified models, we can effectively create distinct classes with varying temporal dependencies. This allows us to evaluate the performance of our methodology under different scenarios and assess its ability to accurately classify the matrix-valued time-series data.
To generate coefficient matrices for each class of series, we employ a procedure that involves generating random eigenvalues and random orthogonal matrices. This process ensures the uniqueness of the coefficient matrices within each class. Specifically, the eigenvalues for each class are generated from a distinctive distribution, allowing some differentiation between the classes. Additionally, we impose certain mild constraints to ensure the stability of the time series. We employ $\mathbf{A} = \mathbf{P}\mathbf{D}\mathbf{P}'$ to generate the coefficient matrix $\mathbf{A}$ (say), where $\mathbf{D}$ is a diagonal matrix consisting of the eigenvalues, and $\mathbf{P}$ is a randomly generated orthogonal matrix. By applying this equation, we create coefficient matrices that exhibit the desired eigenvalue structure. Furthermore, each class of the series possesses a distinct covariance matrix. We generate these covariance matrices randomly, ensuring that they are positive definite. As a result, all $H = 4$ classes of time series have unique covariance matrices that allow for distinct patterns and structures within each class of time series.
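A sketch of the $\mathbf{A} = \mathbf{P}\mathbf{D}\mathbf{P}'$ construction described above (illustrative Python/NumPy; the particular eigenvalue ranges are our assumption, chosen only so that all eigenvalues lie inside the unit circle and the simulated series are stable):

```python
import numpy as np

def random_stable_coefficient(q, rng, low=0.2, high=0.8):
    """Generate A = P D P' with random eigenvalues in (low, high) and a
    random orthogonal P, giving a stable coefficient matrix."""
    D = np.diag(rng.uniform(low, high, size=q))        # class-specific eigenvalues
    Q, _ = np.linalg.qr(rng.standard_normal((q, q)))   # random orthogonal matrix P
    return Q @ D @ Q.T

# e.g., one coefficient matrix per class, each drawn from its own eigenvalue range
rng = np.random.default_rng(1)
A_class1 = random_stable_coefficient(5, rng, 0.1, 0.4)
A_class2 = random_stable_coefficient(5, rng, 0.5, 0.8)
```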
Plots of each series based on the row variables are presented in Figure 1. In these plots, the series is visually distinguished using line types and colors, which correspond to the specific model processes employed during the simulation. This coding scheme allows for clear identification and comparison of the different series within the generated data.
The proposed methodology was applied by first calculating the cross-autocorrelation matrices for lag $k = 1$ based on the total variations defined in Equation (3), and then the principal components were determined over time using Equation (19). The plots of the projections of the resultant $PC_1 \times PC_2 \times$ time points onto the $PC_1 \times PC_2$ space are shown in Figure 2a. The series are line-type, color, and symbol coded according to the model process used in the simulation. The groups of time series corresponding to the different models are clearly identified. For example, column series $s = 1, 2, 3$, VARMA(1, 0), are in black with the symbol ∘ and solid lines (‘lty = 1’); column series $s = 4, 5, 6$, VARMA(0, 1), are in red with Δ and dashed lines (‘lty = 4’); column series $s = 7, 8, 9$, VARMA(1, 1), are in green with ∗ and dotted lines (‘lty = 3’); and column series $s = 10, 11$, VARMA(2, 0), are in blue with □ and dot-dash-dot lines (‘lty = 8’). Additionally, it is evident from Figure 2 that column series $s = 1, \ldots, 6$ exhibit noticeable differences compared to column series $s = 7, \ldots, 11$.
These results align with the information presented in Figure 1, where the VARMA(1, 0) and VARMA(0, 1) processes appear more similar compared to the VARMA(1, 1) and VARMA(2, 0) processes, with the latter two also exhibiting some degree of similarity. Furthermore, when comparing $PC_1$ with the other $PC_\nu$, $\nu \geq 2$, similar patterns emerge. Notably, the patterns observed in the $PC_1 \times PC_3$ space (depicted in Figure 2b) are particularly distinct, indicating that the first two principal components adequately capture the essential characteristics of the data.
In the second example, we provide a demonstration of the foregoing theory using another simulated dataset. This dataset comprises matrix-valued time series with a dimension of $(J = 5) \times (S = 12)$ and exhibits more complex structures. Similar to the first example, the dataset consists of four distinct classes ($H = 4$) denoted as $C_1, \ldots, C_4$. Each class is characterized by a matrix-valued time series of dimension $(J = 5) \times s_h$, $h = 1, \ldots, 4$, with $\sum_{h=1}^{4} s_h = 12$. To generate the dataset, we employ the vec(·) operation to convert the matrix-variate series into vector variables for each class of series. Particularly, the column series $s = 1, 2, 3$ are generated by a classical VAR(1) model, while the column series $s = 4, 5, 6$ are generated by a threshold VAR process of order one, denoted as TVAR($1, 1$) (refer to Equation (30)). Furthermore, the column series $s = 7, 8, 9$ are generated by a vector error correction model (VECM) process with $\boldsymbol{\Psi} = \mathbf{0}$ (refer to Equation (31)) and are labeled as VECM1. Additionally, the column series $s = 10, 11, 12$ are also generated using a VECM process but with a different structure ($\boldsymbol{\Psi} \neq \mathbf{0}$ in Equation (31)), and are labeled as VECM2. The key question we seek to address is whether the proposed methodology can effectively separate and cluster the series based solely on the available data for these $S$ column time series.
The TVAR($1, 1$) model describes a $q$-dimensional time-series process $\mathbf{X}_t$ according to the following equation:

$$\mathbf{X}_t = \begin{cases} \boldsymbol{\mu}_1 + \boldsymbol{\Phi}_1 \mathbf{X}_{t-1} + \boldsymbol{\varepsilon}_{t1}, & \text{if } z_t \leq \gamma, \\ \boldsymbol{\mu}_2 + \boldsymbol{\Phi}_2 \mathbf{X}_{t-1} + \boldsymbol{\varepsilon}_{t2}, & \text{if } z_t > \gamma, \end{cases} \qquad (30)$$

where $z_t$ is the threshold variable and $\gamma$ is the threshold value. Moreover, $\boldsymbol{\Phi}_i$, $i = 1, 2$, are coefficient matrices of dimension $q \times q$, and $\boldsymbol{\varepsilon}_{ti} \sim N_q(\mathbf{0}, \boldsymbol{\Sigma}_i)$, $i = 1, 2$. See [64] for the details.

On the other hand, the VECM process characterizes a $q$-dimensional time-series process $\mathbf{X}_t$ as follows:

$$\Delta \mathbf{X}_t = \boldsymbol{\mu} + \boldsymbol{\Pi} \mathbf{X}_{t-1} + \boldsymbol{\Psi} \Delta \mathbf{X}_{t-1} + \boldsymbol{\varepsilon}_t, \qquad (31)$$

where $\boldsymbol{\Pi}$ and $\boldsymbol{\Psi}$ are coefficient matrices of dimension $q \times q$, and $\Delta \mathbf{X}_t = \mathbf{X}_t - \mathbf{X}_{t-1}$. Moreover, the $\boldsymbol{\varepsilon}_t$ are generated from $N_q(\mathbf{0}, \mathbf{I})$. See [65].
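For completeness, the two generating mechanisms of Equations (30) and (31) can be simulated as follows (illustrative Python/NumPy; taking the threshold variable $z_t$ to be the first lagged component and using a burn-in period are our assumptions):

```python
import numpy as np

def simulate_tvar1(T, mu1, Phi1, Sig1, mu2, Phi2, Sig2, gamma, rng, burn=100):
    """Simulate the TVAR(1,1) process of Eq. (30)."""
    q = len(mu1)
    X = np.zeros((T + burn, q))
    for t in range(1, T + burn):
        z = X[t - 1, 0]                               # threshold variable z_t (assumption)
        if z <= gamma:
            X[t] = mu1 + Phi1 @ X[t - 1] + rng.multivariate_normal(np.zeros(q), Sig1)
        else:
            X[t] = mu2 + Phi2 @ X[t - 1] + rng.multivariate_normal(np.zeros(q), Sig2)
    return X[burn:]

def simulate_vecm(T, mu, Pi, Psi, rng, burn=100):
    """Simulate the VECM of Eq. (31): Delta X_t = mu + Pi X_{t-1} + Psi Delta X_{t-1} + eps_t."""
    q = len(mu)
    X = np.zeros((T + burn + 1, q))
    dX = np.zeros(q)
    for t in range(1, T + burn + 1):
        dX = mu + Pi @ X[t - 1] + Psi @ dX + rng.standard_normal(q)   # eps_t ~ N_q(0, I)
        X[t] = X[t - 1] + dX
    return X[burn + 1:]
```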
Figure 3 displays the plot of all column series by the row variables. The series are differentiated by line type, color, and symbol, indicating the model process used in the simulation. The black solid lines (‘lty = 1’) represent column series s = 1–3 generated by the VAR(1) model, the red dashed lines (‘lty = 4’) correspond to column series s = 4–6, generated by the TVAR( 1 , 1 ) model, the green dotted lines (‘lty = 3’) represent column series s = 7–9, generated by the VECM1 model, and finally the blue dot-dash-dot lines (‘lty = 8’) depict the column series s = 10–12, generated by the VECM2 model. Based solely on these plots, it is evident that classifying all four groups of series is not possible. For instance, when focusing on the plots based on the row variable X 1 in Figure 3, only the column series with the blue colors appear distinct from the others. Similarly, for the plots based on X 2 , the red series seems different from the remaining series. Although there is a better sense of differentiation between the column series with different colors when observing the plots based on X 3 , there is still some overlap between the blue, black, and red series. Furthermore, there are no discernible clues for distinguishing different series when considering the plots based on X 4 and X 5 . However, our proposed PCA methodology successfully classifies all four types of series into distinct clusters as demonstrated in Figure 4.
The methodology involves calculating the cross-autocorrelation matrices for lag k = 1 based on the total variations of Equation (3) and then deriving the principal components over time using Equation (19). Figure 4a displays the projections of the resulting P C 1 × P C 2 × time points onto the P C 1 × P C 2 space, color-coded identically to the time series depicted in Figure 3. Groups of time series corresponding to the different models are clearly identified. Additionally, Figure 4b presents the corresponding plots projected onto the P C 1 × P C 3 space, further highlighting the distinct clusters of the different series.
Table 1 presents a summary of the total and within variations at lag one for the second simulated dataset. Specifically, Table 1a displays the eigenvalues along with the percentage of variation explained by each principal component $PC_\nu$, $\nu = 1, \ldots, 5$, as well as the cumulative variation percentage of the simulated data. From the table, we observe that the first two principal components account for 73.6% of the total variation, while the first three components explain 91.8% of the total variation. It is worth noting that the sum $\sum_\nu \lambda_\nu$ equals 4.64, which is less than the dimensionality $J = 5$. This is due to the calculation of the cross-autocorrelation matrix using lagged values.
In both examples, data were simulated for time series of length T = 100 . Similar results were obtained for T = 50 and T = 500 (plots not shown), which further validate the robustness of the methodology across varying time series lengths. The consistency of the results across different lengths confirms the applicability of the methodology for both short-term and long-term time-series analysis.

5. Illustration

5.1. Auslan Sign Language Data

The methodology is applied to a matrix-valued time-series dataset capturing Auslan (Australian Sign Language) gestures. The dataset consists of measurements recorded from a fitted glove (representing the row variables) as different words are signed (representing the column variables). The complete dataset, including additional words, series, and variables not utilized in our analysis, can be accessed on the webpage (https://archive.ics.uci.edu/ml/datasets/Australian+Sign+Language+signs, accessed on 20 April 2023) or from the UCI KDD Archive. Our analysis focuses on a matrix-valued time series of dimension $(J = 8) \times (S = 24)$ and of length $T = 68$. The row variables include $X_1 = X$, $X_2 = Y$, and $X_3 = Z$ (the position in three-dimensional space $(X, Y, Z)$ as the word is signed), $X_4$ = roll (of the palm as it moved), $X_5$ = thumb bend, $X_6$ = forefinger bend, $X_7$ = index finger bend, and $X_8$ = ring finger bend (with the bends in $X_5, \ldots, X_8$ going from straight to fully bent), for four different words ($H = 4$): 1 = “know” (where indicated in the relevant figures, this word is represented by the black symbol ∘ and a solid line (‘lty = 1’)), 2 = “maybe” (red Δ and dashed line (‘lty = 4’)), 3 = “shop” (green ∗ and dotted line (‘lty = 3’)), and 4 = “yes” (blue □ and dot-dash-dot line (‘lty = 8’)). Each word was spoken once by each of the six “speakers”, resulting in a total of $S = 24$ column time series; see Figure 5.
The objective of our analysis was to explore whether different speakers exhibited similar expressions for a given word. For instance, we investigated if there were similarities in the gestures used by different speakers when saying the word “shop”, as depicted by the green set of six series in the lower-right portion of Figure 6. Conversely, we also examined instances where there were noticeable differences across speakers, which could pose challenges for comprehending the intended meaning of the gestures. The application of the proposed row-wise matrix PCA approach is suitable for addressing these research objectives. More complete descriptive details of the variables and the experiment can be found in [66,67].
Figure 5 presents plots of each variable over time, with the color-coded representation indicating the word spoken. A quick examination reveals potential differences and noticeable variations across the series as the spoken word changes. Specifically, the plots for X 1 and X 2 , which represent the glove position in the ( X , Y ) plane, exhibit discernible differences. Conversely, certain features, such as X 1 and X 2 , demonstrate similar behaviors for the same word, while others, such as X 4 , exhibit dissimilar patterns, even for the same word. In addition, we observe that some variables, including X 3 , X 4 , and X 5 , exhibit the presence of outlier values. However, the subsequent analysis conducted in Section 5.2 reveals that these specific variables do not significantly contribute to the principal component results, thereby minimizing their impact on the overall findings.
Our analysis delved into the similarities and differences in gesture expressions among speakers in Auslan using the row-wise PCA approach of the matrix-valued time-series data. By applying the proposed methodology, we aimed to gain insights into the patterns and variations in sign language gestures, contributing to a better understanding of communication dynamics in sign language contexts.

5.2. Analysis Based on Total Variations

Consider first the principal component analysis, which uses the cross-autocorrelation matrix at lag k, incorporating the cross-autocovariance based on the total variations given in Equation (3). Since k = 0 corresponds to no lag-time dependencies, in this analysis, we specifically focus on the first-order time dependence, where k = 1 . The resulting row cross-autocorrelation matrix, denoted as ρ ^ r ( 1 ) , is presented in Table 2. It is important to note that, unlike the case when k = 0 , the matrix does not contain 1’s along the diagonal. This absence of 1’s in the diagonal stems from the fact that these represent lagged cross-autocorrelations.
By calculating the eigenvectors of ρ ^ r ( 1 ) and substituting them into Equation (19), we obtain the principal components P C ν ( s , t ) , ν = 1 , , J , for each column series s = 1 , , S , evaluated at each time t = 1 , , T . Figure 6 displays a plot of the first and second principal components over time, revealing the temporal dependencies. However, these plots partially obscure the underlying patterns. To provide a clearer view of the distinct patterns between series, Figure 7 depicts the first and second principal components for all series and times in a 2-dimensional P C 1 × P C 2 plane, disregarding the time dimension. This projection effectively highlights the unique patterns present in the principal component time series, revealing both the differences and similarities across the series.
The first observation from these plots is that all six speakers tend to produce similar patterns for a given word, indicating coherence in the results. Therefore, we conclude that series associated with the same word represent consistent speech movements. Furthermore, certain words tend to follow similar patterns. As depicted in Figure 6 and Figure 7, there is some overlap between the patterns of “know” and “yes”. On the other hand, the patterns of “shop” and “maybe” exhibit distinct characteristics.
Table 3a gives the eigenvalues together with the percent of the variation explained by each principal component $PC_\nu$, $\nu = 1, \ldots, 8$, and the cumulative variation percentage. Therefore, we see that 78.6% of the variation is explained by the first four principal components, and 86.4% by the first five. Here, $\sum_\nu \lambda_\nu = 7.52 < J = 8$, since the cross-autocorrelation matrix is calculated using lagged values.
The Pearson correlation functions $Corr_1(X_j, PC_\nu)$ between the variable $X_j$, $j = 1, \ldots, J$, and the principal component $PC_\nu$, $\nu = 1, \ldots, 8$, based on the Pearson product moments given in Equation (22), are shown in Table 4a. The corresponding correlation circle, for $PC_1 \times PC_2$, is plotted in Figure 8. From these values, we see that the variables $X_6$, $X_1$, and $X_8$ (with correlations 0.799, 0.653, and −0.700, respectively) contribute the most to the $PC_1$ values, while the variables $X_2$, $X_3$, and $X_4$ (with correlations 0.028, −0.040, and −0.004, respectively) contribute very little. That is, the position of the hand along the X-axis in space ($X_1$) and the amount of bend in the forefinger and ring finger are important variables. It is the variable $X_2$ (the Y-position in space) that dominates the values of $PC_2$, with a correlation coefficient of 0.855. Notice that the Z-position in space ($X_3$) is not at all important (except marginally for the sixth principal component); likewise, the variable $X_4$ (palm roll) contributes very little except for its contribution to the third principal component. Collectively, it is the hand position in the X–Y plane in space and the movement of the forefinger ($X_6$) and the ring finger ($X_8$) that are the most important variables.
The Moran correlations C o r r 2 ( X j , P C ν ) , j , ν = 1 , , 8 , calculated from Equation (26) are shown in Table 4b; the so-called loadings-based dependence measures calculated from Equation (29) are in Table 4c. Comparisons with Pearson’s correlations show that the four variables ( X 1 , X 6 , X 7 , X 8 ) are the most important determining factors. In addition, the variable X 2 is strongly positive with the second principal component for all three measures.
Finally, plots of $PC_{\nu_1}$ by $PC_{\nu_2}$ over time, for $(\nu_1, \nu_2) \neq (1, 2)$, can also be produced. These provide no further substantial information than those discussed thus far for $\nu_1 = 1$, $\nu_2 = 2$, with one exception. The exception is the plot of $PC_1$ versus $PC_3$ against time, shown in Figure 9. A feature of this plot is the separation of all four words. In particular, the words “know” and “yes” no longer have overlapping patterns but are well separated. Further, we note from Table 4 that the principal component $PC_3$ is relatively highly correlated with the variables $X_4$ = roll of the palm and $X_5$ = thumb bend for all three correlation functions. From the $X_4$ and $X_5$ plots in Figure 5, we observe that these values do not vary very much within each series, but that the series for “know” and “yes” are distinct (especially for $X_5$, being at the top and bottom of the plots, respectively). However, since $PC_3$ explained only 16.8% of the overall variation (see Table 3a), too much importance should not be placed on this; rather, it is an artifact of the nature of the role played by the roll of the palm and the thumb bend in this particular dataset. Nevertheless, distinctions between all four words are in fact revealed by these three principal components.

5.3. Within Variations and $k \neq 1$

5.3.1. Principal Components Based on within Variations

Results can be obtained for the principal components when using the cross-autocorrelations based on the within variations of Equation (7). Again, take lag $k = 1$. The resulting eigenvalues are shown in Table 3b, along with the percent of the overall variation explained by a given principal component and the corresponding cumulative percent variation. Thus, the first four and first five principal components explain 70.5% and 79.9% of the variation, respectively; these are lower than the respective 78.6% and 86.4% obtained for the analysis based on the total variations. Additionally, we observe that $\sum_\nu \lambda_\nu = 6.71$, which is less than the 7.52 value obtained when all the variations were used in the analysis. These lower variations simply reflect the fact that not all the variations in the data are being used, whereas all this information is used in the total variation results.
Since the eigenvalues differ, so do the principal component values differ from those discussed in Section 5.2. The plots of the first two principal components over time can be determined and are shown in Figure 10, while the projection onto the P C 1 × P C 2 plane is shown in Figure 11; these can be compared with Figure 6 and Figure 7, respectively. In this case, we observe that the word “maybe” stands alone, as before. The other three words are similar to each other relative to P C 1 but differ relative to P C 2 . However, when compared with Figure 6 and Figure 7 based on total variations, the word patterns are not as sharply defined. This highlights the importance of an analysis based on the total variations rather than one on the within variations alone, which are just partial variations.
The correlations between principal components and variables for the analysis based on the within variations are shown in Table 5. In this case, we observe that it is the X 2 variable (and to a lesser extent, X 1 ) that is highly correlated with the first principal component P C 1 . Recall, from Table 4 and the discussion in Section 5.2, that this variable X 2 has little impact on P C 1 but considerable influence on P C 2 when using the total variations. The within variations for P C 2 correlations suggest that either X 1 or X 7 , depending on the type of correlation circle considered (see Table 5), have some influence, though the results are not overly strong.
In general, the within-variation results provide a different perspective, as they do not utilize all the variations present in the data. Consequently, these results are not as conclusive as those based on the total variations inherent in the data. However, they still offer valuable additional insights. In the case of these data, the within-variation analysis confirms the importance of X 2 as a significant variable. This finding supports the overall conclusion that the variables X 1 , X 2 , X 6 , and X 8 play the most crucial role in determining the separation of the spoken words, while the other variables are comparatively less pertinent.

5.3.2. Lag $k \neq 1$

By changing the lag values from $k = 0$ to $k = 1$ and beyond, the eigenvalues of the corresponding cross-autocorrelation matrices exhibit varying magnitudes, both for the total and the within variation matrices. As $k$ increases, the cumulative variation explained by the resulting principal components also increases more rapidly. For example, the first five principal components based on the total variations of Equation (3) accounted for 86.07%, 86.43%, 87.40%, and 89.27% of the overall variation for $k = 0, 1, 2, 3$, respectively; those based on the within variations of Equation (7) accounted for 79.17%, 79.88%, 82.92%, and 85.42% for $k = 0, 1, 2, 3$, respectively. The overall variance $\sigma^2$ decreased as $k$ increased, with $\sigma^2$ taking the values $8\,(= J)$, 7.52, 6.88, and 6.50 when using the total variations, for $k = 0, 1, 2, 3$, respectively.
Consequently, the actual principal component values over time differ for different values of k. Nevertheless, the fundamental patterns observed in the analysis based on the total variations with k = 1 (Section 5.2) remained consistent. Had higher values of k, say k = 2, produced different clusterings, this would have indicated that the time series contained second-order time dependencies, which would then be an essential aspect to capture in any analysis.

6. Conclusions

In this study, we focused on exploring and uncovering the underlying structure of matrix-valued time-series data. Specifically, we extended the use of the standard variance–covariance matrices of non-time-series principal component analysis to cross-autocorrelation matrices at time lags k = 1, 2, …, for matrix time series. This extension captures the time dependencies present in matrix-valued time-series data through cross-autocorrelation functions. We introduced row-wise and column-wise methodologies that retain these time dependencies; they are particularly valuable when the objective is to cluster the column or row variables of a matrix time series. By employing these methodologies, we obtained principal component time series, enabling us to visualize and compare the data using principal component plots that incorporate the additional dimension of time. Our approach is non-parametric, eliminating the need for explicit hypotheses or an a priori model, such as an autoregressive model or a probabilistic structure, for the time-series distributions. This allows for greater flexibility and broader applicability in analyzing diverse types of time-series data.
While our focus in this work was on comparing column (or row) series using cross-autocorrelation functions at the same lag values, it is possible that some series may exhibit similarities at different lag values. For instance, column series s1 at lag k may resemble series s2 at lag k′, where k ≠ k′. In such cases, the entities γ̂jj′(k) in Equation (3) can be replaced by cross-autocovariances γ̂jj′(k, k′) between variable Xj at lag k and variable Xj′ at lag k′. This flexibility allows for further exploration and analysis of complex time-series relationships.
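As a hedged illustration of what such a replacement could look like, the sketch below computes a sample cross-autocovariance between one variable shifted by lag k and another shifted by lag k′; the exact estimator and normalization intended for use in Equation (3) may differ, so this is illustrative only.

    import numpy as np

    def cross_lag_cov(x_j, x_jp, k, kp):
        # Illustrative sample cross-autocovariance between X_j at lag k and
        # X_j' at lag k', i.e. a stand-in for gamma_hat_{jj'}(k, k').
        x_j = np.asarray(x_j, dtype=float)
        x_jp = np.asarray(x_jp, dtype=float)
        T = len(x_j)
        m = max(k, kp)
        a = x_j[k:k + T - m] - x_j.mean()
        b = x_jp[kp:kp + T - m] - x_jp.mean()
        return float(np.sum(a * b) / T)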
In summary, our findings demonstrate the efficacy of the proposed methodology in uncovering the main structure of matrix-valued time-series data. By leveraging the strengths of principal component analysis and cross-autocorrelation functions, we can gain valuable insights into the underlying patterns and relationships within complex time-series data.

Author Contributions

All authors contributed equally. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data that support the findings of this study are available on request from the corresponding author. The data set can be found at https://archive.ics.uci.edu/ml/datasets/Australian+Sign+Language+signs (accessed on 20 April 2023).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Lin, J.; Michailidis, G. Regularized estimation and testing for high-dimensional multi-block vector-autoregressive models. J. Mach. Learn. Res. 2017, 18, 4188–4236.
2. Mills, T.C. The Econometric Modelling of Financial Time Series; Cambridge University Press: Cambridge, UK, 1993.
3. Samadi, S.Y.; Billard, L. Analysis of dependent data aggregated into intervals. J. Multivar. Anal. 2021, 186, 104817.
4. Samadi, S.Y.; Herath, H.M.W.B. Reduced-rank envelope vector autoregressive models. 2023; preprint.
5. Seth, A.K.; Barrett, A.B.; Barnett, L. Granger causality analysis in neuroscience and neuroimaging. J. Neurosci. 2015, 35, 3293–3297.
6. Zhou, H.; Li, L.; Zhu, H. Tensor regression with applications in neuroimaging data analysis. J. Am. Stat. Assoc. 2013, 108, 540–552.
7. Liao, T.W. Clustering of time series—A survey. Pattern Recognit. 2005, 38, 1857–1874.
8. Košmelj, K.; Batagelj, V. Cross-sectional approach for clustering time varying data. J. Classif. 1990, 7, 99–109.
9. Goutte, C.; Toft, P.; Rostrup, E. On clustering fMRI time series. Neuroimage 1999, 9, 298–310.
10. Policker, S.; Geva, A.B. Nonstationary time series analysis by temporal clustering. IEEE Trans. Syst. Man Cybern. B Cybern. 2000, 30, 339–343.
11. Prekopcsák, Z.; Lemire, D. Time series classification by class-specific Mahalanobis distance measures. Adv. Data Anal. Classif. 2012, 6, 185–200.
12. Harrison, L.; Penny, W.D.; Friston, K. Multivariate autoregressive modeling of fMRI time series. Neuroimage 2003, 19, 1477–1491.
13. Maharaj, E.A. Clusters of time series. J. Classif. 2000, 17, 297–314.
14. Piccolo, D. A distance measure for classifying ARIMA models. J. Time Ser. Anal. 1990, 11, 153–164.
15. Alonso, A.M.; Berrendero, J.R.; Hernández, A.; Justel, A. Time series clustering based on forecast densities. Comput. Stat. Data Anal. 2006, 51, 762–776.
16. Vilar, J.A.; Alonso, A.M.; Vilar, J.M. Non-linear time series clustering based on non-parametric forecast densities. Comput. Stat. Data Anal. 2010, 54, 2850–2865.
17. Owsley, L.M.D.; Atlas, L.E.; Bernard, G.D. Self-organizing feature maps and hidden Markov models for machine-tool monitoring. IEEE Trans. Signal Process. 1997, 45, 2787–2798.
18. Douzal-Chouakria, A.; Nagabhushan, P. Adaptive dissimilarity index for measuring time series proximity. Adv. Data Anal. Classif. 2007, 1, 5–21.
19. Jeong, Y.; Jeong, M.; Omitaomu, O. Weighted dynamic time warping for time series classification. Pattern Recognit. 2011, 44, 2231–2240.
20. Yu, D.; Yu, X.; Hu, Q.; Liu, J.; Wu, A. Dynamic time warping constraint learning for large margin nearest neighbor classification. Inf. Sci. 2011, 181, 2787–2796.
21. Liao, T.W. A clustering procedure for exploratory mining of vector time series. Pattern Recognit. 2007, 40, 2550–2562.
22. Košmelj, K.; Zabkar, V. A methodology for identifying time-trend patterns: An application to the advertising expenditure of 28 European countries in the 1994–2004 period. In Lecture Notes in Computer Science, KI: Advances in Artificial Intelligence; Furbach, U., Ed.; Springer: Berlin/Heidelberg, Germany, 2008; pp. 92–106.
23. Kalpakis, K.; Gada, D.; Puttagunta, V. Distance measures for effective clustering of ARIMA time-series. Data Min. 2001, 1, 273–280.
24. Kakizawa, Y.; Shumway, R.H.; Taniguchi, N. Discrimination and clustering for multivariate time series. J. Am. Stat. Assoc. 1998, 93, 328–340.
25. Shumway, R.H. Time-frequency clustering and discriminant analysis. Stat. Probab. Lett. 2003, 63, 307–314.
26. Vilar, J.M.; Vilar, J.A.; Pértega, S. Classifying time series data: A nonparametric approach. J. Classif. 2009, 26, 3–28.
27. Forni, M.; Hallin, M.; Lippi, M.; Reichlin, L. The generalized dynamic factor model: Identification and estimation. Rev. Econ. Stat. 2000, 82, 540–554.
28. Forni, M.; Lippi, M. The generalized factor model: Representation theory. Econom. Theory 2001, 17, 1113–1141.
29. Garcia-Escudero, L.A.; Gordaliza, A. A proposal for robust curve clustering. J. Classif. 2005, 22, 185–201.
30. Hebrail, G.; Hugueney, B.; Lechevallier, Y.; Rossi, F. Exploratory analysis of functional data via clustering and optimal segmentation. Neurocomputing 2010, 73, 1125–1141.
31. Huzurbazar, S.; Humphrey, N.F. Functional clustering of time series: An insight into length scales in subglacial water flow. Water Resour. Res. 2008, 44, W11420.
32. Serban, N.; Wasserman, L. CATS: Clustering after transformation and smoothing. J. Am. Stat. Assoc. 2005, 100, 990–999.
33. MacQueen, J.B. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics; Le Cam, L., Neyman, J., Eds.; University of California Press: Oakland, CA, USA, 1967; pp. 281–297.
34. Ward, J.H. Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 1963, 58, 236–244.
35. Kaufman, L.; Rousseeuw, P.J. Finding Groups in Data: An Introduction to Cluster Analysis; Wiley: New York, NY, USA, 1990.
36. Dunn, J. A fuzzy relative of the ISODATA process and its use in detecting compact, well separated clusters. J. Cybern. 1974, 3, 32–57.
37. Beran, J.; Mazzola, G. Visualizing the relationship between time series by hierarchical smoothing models. J. Comput. Graph. Stat. 1999, 8, 213–228.
38. Wismüller, A.; Lange, O.; Dersch, D.R.; Leinsinger, G.L.; Hahn, K.; Pütz, B.; Auer, D. Cluster analysis of biomedical image time series. Int. J. Comput. Vis. 2002, 46, 103–128.
39. Box, G.E.P.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control, 4th ed.; Holden-Day: San Francisco, CA, USA, 2015.
40. Brockwell, P.J.; Davis, R.A. Time Series: Theory and Methods; Springer: New York, NY, USA, 1991.
41. Park, J.H.; Samadi, S.Y. Heteroscedastic modelling via the autoregressive conditional variance subspace. Can. J. Stat. 2014, 42, 423–435.
42. Park, J.H.; Samadi, S.Y. Dimension reduction for the conditional mean and variance functions in time series. Scand. J. Stat. 2020, 47, 134–155.
43. Samadi, S.Y.; Hajebi, M.; Farnoosh, R. A semiparametric approach for modelling multivariate nonlinear time series. Can. J. Stat. 2019, 47, 668–687.
44. Walden, A.; Serroukh, A. Wavelet analysis of matrix-valued time series. Proc. Math. Phys. Eng. Sci. 2002, 458, 157–179.
45. Wang, H.; West, M. Bayesian analysis of matrix normal graphical models. Biometrika 2009, 96, 821–834.
46. Samadi, S.Y. Matrix Time Series Analysis. Ph.D. Dissertation, University of Georgia, Athens, GA, USA, 2014.
47. Samadi, S.Y.; Billard, L. Matrix time series models. 2023; preprint.
48. Cryer, J.D.; Chan, K.-S. Time Series Analysis; Springer: New York, NY, USA, 2008.
49. Shumway, R.H.; Stoffer, D.S. Time Series Analysis and Its Applications; Springer: New York, NY, USA, 2011.
50. Whittle, P. On the fitting of multivariate autoregressions, and the approximate canonical factorization of a spectral density matrix. Biometrika 1963, 50, 129–134.
51. Jones, R.H. Prediction of multivariate time series. J. Appl. Meteorol. 1964, 3, 285–289.
52. Webb, A. Statistical Pattern Recognition; Hodder Headline Group: London, UK, 1999.
53. Johnson, R.A.; Wichern, D.W. Applied Multivariate Statistical Analysis, 7th ed.; Prentice Hall: Hoboken, NJ, USA, 2007.
54. Jolliffe, I.T. Principal Component Analysis; Springer: New York, NY, USA, 1986.
55. Anderson, T.W. An Introduction to Multivariate Statistical Analysis, 2nd ed.; John Wiley: New York, NY, USA, 1984.
56. Samadi, S.Y.; Billard, L.; Meshkani, M.R.; Khodadadi, A. Canonical correlation for principal components of time series. Comput. Stat. 2017, 32, 1191–1212.
57. Billard, L.; Douzal-Chouakria, A.; Samadi, S.Y. An exploratory analysis of multiple multivariate time series. In Proceedings of the 1st International Workshop on Advanced Analytics and Learning on Temporal Data (AALTD 2015), Porto, Portugal, 11 September 2015; Volume 3, pp. 1–8.
58. Jäckel, P. Monte Carlo Methods in Finance; John Wiley: New York, NY, USA, 2002.
59. Rousseeuw, P.; Molenberghs, G. Transformation of non positive semidefinite correlation matrices. Commun. Stat. Theory Methods 1993, 22, 965–984.
60. Cai, T.T.; Ma, Z.; Wu, Y. Sparse PCA: Optimal rates and adaptive estimation. Ann. Stat. 2013, 41, 3074–3110.
61. Guan, Y.; Dy, J.G. Sparse probabilistic principal component analysis. J. Mach. Learn. Res. 2009, 5, 185–192.
62. Lu, M.; Huang, J.Z.; Qian, X. Sparse exponential family principal component analysis. Pattern Recognit. 2016, 60, 681–691.
63. Moran, P.A.P. The interpretation of statistical maps. J. R. Stat. Soc. A 1948, 10, 243–251.
64. Lo, C.M.; Zivot, E. Threshold cointegration and nonlinear adjustment to the law of one price. Macroecon. Dyn. 2001, 5, 533–576.
65. Engle, R.F.; Granger, C.W.J. Co-integration and error correction: Representation, estimation and testing. Econometrica 1987, 55, 251–276.
66. Kadous, M.W. Recognition of Australian Sign Language Using Instrumented Gloves. Bachelor's Thesis, University of New South Wales, Sydney, Australia, 1995; 136p.
67. Kadous, M.W. Learning comprehensible descriptions and multivariate time series. In Proceedings of the Sixteenth International Conference on Machine Learning, Bled, Slovenia, 27–30 June 1999; Bratko, I., Dzeroski, S., Eds.; Morgan Kaufmann Publishers: San Francisco, CA, USA, 1999; pp. 454–463.
Figure 1. The first simulated matrix time-series data: row variables X1, …, X5 by time, for all S = 11 column series.
Figure 2. The first simulated data: (a) PC1 × PC2, (b) PC1 × PC3, All Series, All Times, Lag k = 1.
Figure 3. The second simulated matrix time-series data: row variables X1, …, X5 by time, for all S = 12 column series.
Figure 4. The second simulated data: (a) PC1 × PC2, (b) PC1 × PC3, All Series, All Times, Lag k = 1.
Figure 5. Auslan data: X1, …, X8 by Time, All Series.
Figure 6. Auslan data: PC1 × PC2 over Time, All Series, Lag k = 1 (based on total variations).
Figure 7. Auslan data: PC1 × PC2, All Series, All Times, Lag k = 1 (based on total variations).
Figure 8. Auslan data: Pearson circle for PC1 × PC2, k = 1.
Figure 9. Auslan data: PC1 × PC3 by Time, k = 1 (based on total variations).
Figure 10. Auslan data: PC1 × PC2 over Time, All Series, k = 1 (based on within variations).
Figure 11. Auslan data: PC1 × PC2, All Series, All Times, k = 1 (based on within variations).
Table 1. Second simulated data—eigenvalues and percent variations.

     (a) Total Variations—lag k = 1          (b) Within Variations—lag k = 1
 ν   λν        Percent    Cumulative     ν   λν        Percent    Cumulative
               Variation  Variation                    Variation  Variation
 1   2.31907   0.49939    0.49939        1   1.96330   0.48941    0.48941
 2   1.10152   0.23720    0.73659        2   0.79967   0.19934    0.68876
 3   0.84684   0.18236    0.91895        3   0.63151   0.15742    0.84619
 4   0.28645   0.06168    0.98064        4   0.35461   0.08839    0.93459
 5   0.08990   0.01935    1.00000        5   0.26238   0.06540    1.00000
     Σ = 4.64  Σ = 1                         Σ = 4.01  Σ = 1
Table 2. Auslan data—cross-autocorrelations ρ_r^(T)(k), k = 1.

Cross-autocorrelations ρ_jj′^(T)(1)
 Xj      X1       X2       X3       X4       X5       X6       X7       X8
 X1     0.986   −0.299   −0.010   −0.071   −0.410    0.414    0.205   −0.274
 X2    −0.330    0.977    0.071    0.244    0.065    0.215    0.418    0.111
 X3     0.015    0.042    0.849   −0.108    0.238   −0.072    0.253    0.017
 X4    −0.101    0.247   −0.132    0.820   −0.184    0.028    0.039    0.131
 X5    −0.413    0.065    0.238   −0.184    0.984   −0.355    0.011    0.092
 X6     0.403    0.217   −0.071    0.022   −0.353    0.945    0.376   −0.494
 X7     0.202    0.421    0.250    0.038    0.012    0.383    0.978   −0.033
 X8    −0.273    0.111    0.016    0.128    0.092   −0.498   −0.032    0.979
Table 3. Auslan data—eigenvalues and percent variations.

     (a) Total Variations—lag k = 1          (b) Within Variations—lag k = 1
 ν   λν        Percent    Cumulative     ν   λν        Percent    Cumulative
               Variation  Variation                    Variation  Variation
 1   2.11118   0.28087    0.28087        1   1.92391   0.28656    0.28656
 2   1.62876   0.21669    0.49756        2   1.07780   0.16053    0.44709
 3   1.26361   0.16811    0.66567        3   0.95661   0.14248    0.58957
 4   0.90345   0.12019    0.78587        4   0.77506   0.11544    0.70501
 5   0.58972   0.07846    0.86432        5   0.63006   0.09384    0.79885
 6   0.48044   0.06392    0.92824        6   0.54432   0.08107    0.87992
 7   0.29035   0.03863    0.96687        7   0.44756   0.06666    0.94659
 8   0.24902   0.03313    1.00000        8   0.35862   0.05341    1.00000
     Σ = 7.52  Σ = 1                         Σ = 6.71  Σ = 1
Table 4. Auslan data—correlations Corr(Xj, PCν), j = 1, …, 8, ν = 1, …, 8, for lag k = 1—using total variations.

 Xj     PC1      PC2      PC3      PC4      PC5      PC6      PC7      PC8
(a) Pearson Correlation Circle
 X1     0.653   −0.235   −0.161   −0.216   −0.121    0.083   −0.323   −0.177
 X2     0.028    0.855    0.273    0.069    0.448   −0.087    0.194   −0.534
 X3    −0.040    0.190   −0.295   −0.082    0.103    0.249   −0.123   −0.032
 X4    −0.004    0.203    0.487   −0.124   −0.139   −0.095    0.070   −0.061
 X5    −0.560    0.211   −0.584    0.287    0.196    0.740    0.320    0.186
 X6     0.799    0.230   −0.139    0.254   −0.082   −0.078   −0.360   −0.358
 X7     0.456    0.784   −0.251   −0.275    0.616    0.513   −0.674   −0.325
 X8    −0.700    0.202    0.589   −0.807    0.690   −0.003    0.416    0.670
(b) Moran Correlation Circle
 X1     0.273   −0.263   −0.121   −0.247   −0.023    0.153    0.195   −0.097
 X2     0.144    0.716    0.211    0.384    0.268    0.011    0.352   −0.323
 X3    −0.383   −0.507   −0.001   −0.413   −0.449   −0.469   −0.062   −0.064
 X4    −0.033    0.220    0.797   −0.081   −0.768    0.509    0.009    0.098
 X5     0.076    0.394   −0.229    0.347    0.286    0.578    0.160    0.161
 X6     0.587    0.403   −0.040    0.368    0.311    0.282    0.210    0.522
 X7     0.487    0.735   −0.243   −0.147    0.436    0.635   −0.699    0.000
 X8    −0.656    0.103    0.416   −0.818    0.347   −0.048    0.314    0.617
(c) Loadings-Based Dependence Circle
 X1     0.742   −0.310   −0.129   −0.416    0.026    0.210    0.265   −0.169
 X2    −0.009    0.863    0.249    0.190    0.084   −0.161    0.232   −0.221
 X3    −0.078    0.290   −0.567   −0.332   −0.465   −0.349    0.057    0.046
 X4     0.002    0.271    0.641   −0.056   −0.499    0.247    0.002    0.037
 X5    −0.594    0.240   −0.538    0.293    0.050    0.419    0.175    0.066
 X6     0.824    0.204   −0.010    0.224    0.064   −0.035    0.135    0.354
 X7     0.401    0.706   −0.211   −0.272    0.201    0.225   −0.313   −0.021
 X8    −0.602    0.180    0.344   −0.609    0.265    0.002    0.126    0.195
Table 5. Correlations Corr(Xj, PCν), j = 1, …, 8, ν = 1, …, 8, for lag k = 1—using within variations.

 Xj     PC1      PC2      PC3      PC4      PC5      PC6      PC7      PC8
(a) Pearson Correlation Circle
 X1     0.540    0.620    0.065    0.397    0.232    0.175   −0.307    0.126
 X2    −0.860   −0.002    0.222   −0.211    0.416    0.406   −0.127   −0.849
 X3    −0.083    0.090   −0.172   −0.349    0.270   −0.143   −0.030   −0.115
 X4    −0.266   −0.082    0.305    0.338    0.145   −0.030    0.070   −0.062
 X5    −0.323   −0.258   −0.741   −0.746   −0.057   −0.523    0.167   −0.384
 X6     0.019    0.466    0.019    0.329    0.350    0.636   −0.712   −0.129
 X7    −0.451    0.771   −0.142   −0.407    0.983    0.263   −0.349   −0.287
 X8    −0.474    0.124    0.519   −0.472   −0.086   −0.695    0.930    0.011
(b) Moran Correlation Circle
 X1     0.390    0.268    0.039    0.103   −0.006   −0.151    0.065   −0.243
 X2    −0.743    0.035   −0.011    0.011    0.000    0.433   −0.225   −0.796
 X3     0.570   −0.484    0.502    0.048    0.047   −0.345    0.303    0.356
 X4    −0.302   −0.342    0.146    0.889    0.471   −0.671    0.017    0.267
 X5    −0.435    0.313   −0.626   −0.197   −0.034   −0.001   −0.187   −0.347
 X6    −0.515    0.614   −0.217   −0.064   −0.197    0.145   −0.426   −0.127
 X7    −0.410    0.670   −0.480   −0.299    0.765    0.166   −0.312    0.072
 X8    −0.299    0.284    0.661   −0.268   −0.102   −0.568    0.901    0.132
(c) Loadings-Based Dependence Circle
 X1     0.740    0.529    0.069    0.230    0.062   −0.126    0.064   −0.378
 X2    −0.802   −0.211    0.250    0.036    0.023    0.267   −0.060   −0.398
 X3     0.245   −0.373    0.513   −0.405    0.335   −0.334   −0.188   −0.033
 X4    −0.370   −0.224    0.063    0.615    0.302   −0.376    0.002    0.086
 X5    −0.308   −0.029   −0.654   −0.321   −0.070   −0.329    0.047   −0.180
 X6    −0.451    0.566    0.158   −0.021   −0.281   −0.051   −0.334    0.098
 X7    −0.240    0.421   −0.191   −0.203    0.577    0.039   −0.139    0.075
 X8    −0.425    0.258    0.365   −0.185   −0.075   −0.302    0.522    0.028