Statistical Papers
https://doi.org/10.1007/s00362-023-01451-y
REGULAR ARTICLE
Identification of canonical models for vectors of time series:
a subspace approach
Alfredo Garcia-Hiernaux1 · Jose Casals2
· Miguel Jerez1
Received: 16 November 2021 / Revised: 1 March 2023
© The Author(s) 2023
Abstract
We propose a new method to specify linear models for vectors of time series with
some convenient properties. First, it provides a unified modeling approach for single
and multiple time series, as the same decisions are required in both cases. Second, it
is scalable, meaning that it provides a quick preliminary model, which can be refined
in subsequent modeling phases if required. Third, it is optionally automatic, because
the specification depends on a few key parameters which can be determined either
automatically or by human decision. And last, it is parsimonious, as it allows one
to choose and impose a canonical structure by a novel estimation procedure. Several
examples with simulated and real data illustrate its application in practice.
Keywords System identification · Canonical models · Kronecker indices · Subspace
methods · State-space models
1 Introduction
This paper proposes a new method to specify linear models for vectors of time series
which draws on ideas from two different sources: the subspace methods, originated in
the system identification literature, and the time series analysis approach, built on the
seminal work of Box and Jenkins (1970).
Since the pioneering work of Ho and Kalman (1966), the term "system identification" has referred to the use of statistical and algebraic tools to fit a State-Space (SS) model to
Jose Casals and Miguel Jerez have contributed equally to this work.
Miguel Jerez
mjerezme@ucm.es
Alfredo Garcia-Hiernaux
agarciah@ucm.es
Jose Casals
jmcasalsc@gmail.com
1 DANAE and ICAE, Universidad Complutense de Madrid, Madrid, Spain
2 ICAE, Universidad Complutense de Madrid, Madrid, Spain
a dataset without a priori restrictions. Much of the recent work in this area follows
the subspace methods approach: a family of system identification procedures that rely
heavily on canonical correlation and Least Squares (LS) methods. For a survey see,
e.g., Qin (2006).
On the other hand, modern time series analysis is critically influenced by the classical Box and Jenkins (1970) book and subsequent works. The philosophical contribution of this methodology can be synthesized in three basic principles: (i) "let the data speak," meaning that the specification of reduced-form models should be based on the observable properties of the sample; (ii) "parameterize with parsimony," meaning that models with few parameters should be preferred, all other aspects being equal; and (iii) "all models are wrong, but some are useful," meaning that all specifications must be considered tentative and subject to a structured process of critical revision and diagnostic testing. We subscribe to these principles and have implemented them in the foundations of our approach.
On these bases, our method is structured in three sequential steps:
Step-1 Determine the dynamic order of the system (or McMillan degree), defined as
the minimum number of dynamic components required to realize the system
output. We do this by using subspace methods to estimate a sequence of SS
models over a predetermined range of orders, and then choosing the best
one by means of Likelihood Ratio (LR) tests and information criteria (ICs).
This simple decision, similar to the VAR order selection, is enough to fit a
preliminary “Step-1 model” to the dataset, which might be overparameterized
but useful to cover simple needs.
Step-2 Determine the dynamic order of each individual time series; these orders are also known as Kronecker indices (KIs). We do this by applying a new subspace algorithm, named SUBEST1, which estimates the parameters of the Luenberger Canonical Form corresponding to a given vector of KIs. We use this algorithm
to estimate the models corresponding to all the combinations of KIs which
add up to the previously determined system order. The model with the best ICs
(“Step-2 model”) is therefore a canonical representation for the time series,
and will often be more parsimonious than the Step-1 model. In the univariate
case this step is not required, because the system order coincides with the KI
for the single series and the Step-1 model is, therefore, canonical.
Step-3 The third and final step consists in refining the models obtained in previous
steps, for example by computing Maximum-Likelihood Estimates (MLE) or
by pruning non-significant parameters. Such a “Step-3 model” might be more
parsimonious than the previously fitted models but, on the other hand, requires
more effort and human decision-making.
The computational procedures employed to implement the Step-2, as well as the
distinction between Steps 1, 2 and 3, are original contributions to the literature on
canonical modeling, although closely related to the works of Hannan and Deistler
(1988), Deistler et al. (1995), Bauer and Deistler (1999) and Bauer (2005c).
In comparison with mainstream econometric procedures, our proposal has three
specific advantages, as it:
1. ...provides a unified modeling approach for single and multiple time series, as the
same model-building process is followed in both cases,
2. ...is optionally automatic, meaning that each model is characterized by a few key
integer parameters to be chosen following a clearly structured process, based on
the use of a limited set of sample statistics; so the modeling decisions can be
adopted automatically or by a human analyst,
3. ...is scalable, meaning that: (i) it provides a possibly crude but statistically adequate Step-1 model, which may be enough to satisfy simple analytical needs; (ii) this model can be made parsimonious if required, by enforcing a canonical structure in Step-2; and (iii) additional efficiency can be achieved in Step-3.
The structure of the paper is as follows. Section 2 presents the notation and reviews
some ideas on VARMA and SS systems. It also introduces the subspace methods and their asymptotic properties. Section 3 outlines the methodology proposed in the
relatively simple case of stationary and non-seasonal time series and Sect. 4 extends
the previous method to nonstationary and/or seasonal series. We do this by adding
to the previous schema two specification phases to determine the number of unit
roots and cointegrating relations, and to identify the dynamic orders of the seasonal
and non-seasonal sub-systems. Sections 5 and 6 illustrate the practical application
of this approach to single and multiple time series, respectively, with benchmark
datasets taken from the literature. Last, Sect. 7 compares the method proposed with
alternative methods, provides some concluding remarks and indicates how to obtain
a free MATLAB code which implements all the algorithms required by our proposal.
2 Notation and previous results
In this Section we present the notation employed and the main previous results upon
which our work is based. We begin by defining VARMA and SS systems. Then, we
introduce the subspace methods and finally discuss the relation between canonical
VARMA models (named echelon) and their equivalent SS representation.
2.1 VARMA models
Much work in applied time series analysis is based on the linear dynamic model:
$$\bar{F}(B)\, z_t = \bar{L}(B)\, a_t, \qquad (1)$$
where $z_t \in \mathbb{R}^m$ is observed for $t = 1, \dots, T$; $a_t \in \mathbb{R}^m$ is an innovation vector such that $a_t \sim \mathrm{iid}(0, \Sigma_a)$; $B$ denotes the backshift operator, such that for any sequence $\omega_t$: $B^i \omega_t = \omega_{t-i}$, $i = 0, \pm 1, \pm 2, \dots$; and, finally, the factors $\bar{F}(B)$ and $\bar{L}(B)$ in (1) are given by:
$$\bar{F}(B) = \sum_{j=0}^{p} \bar{F}_j B^j, \qquad \bar{L}(B) = \sum_{j=0}^{q} \bar{L}_j B^j. \qquad (2)$$
Model (1–2) is assumed to be left coprime, with the roots of $\bar{F}(B)$ and $\bar{L}(B)$ greater than unity in modulus, so $z_t$ is stationary and invertible.
If we multiply the left and right-hand side of Eq. (1) by any arbitrary non-defective
matrix we would obtain an observationally equivalent model for zt . Therefore, left
coprimeness is not enough to identify the model. Achieving identifiability requires
additional constraints. For example, normalising $\bar{F}_0 = \bar{L}_0 = I$ yields the standard VARMA($p,q$) introduced by Quenouille (1957). That is because, for a given transfer function, several VARMA($p,q$) models exist, even under left coprimeness. For more details, the equivalence classes of VARMA systems are characterized in Hannan and Deistler (1988), Theorem 2.2.1.
An interesting alternative to a standard VARMA is the VARMA echelon, or VARMA$_E$, representation. System (1–2) is in echelon form if, denoting by $\bar{F}_{kl}(B)$ and $\bar{L}_{kl}(B)$ the $kl$-th elements of $\bar{F}(B)$ and $\bar{L}(B)$, respectively, the polynomial factors in
(1) are uniquely defined by:
$$\bar{F}_{kk}(B) = 1 + \sum_{i=1}^{n_k} \bar{F}_{kk}(i) B^i, \quad \text{for } k = 1, \dots, m, \qquad (3a)$$
$$\bar{F}_{kl}(B) = \sum_{i=n_k - n_{kl} + 1}^{n_k} \bar{F}_{kl}(i) B^i, \quad \text{for } k \neq l, \qquad (3b)$$
$$\bar{L}_{kl}(B) = \bar{L}_{kl}(0) + \sum_{i=1}^{n_k} \bar{L}_{kl}(i) B^i, \ \text{with } \bar{L}_{kl}(0) = \bar{F}_{kl}(0), \ \text{for } k, l = 1, \dots, m, \qquad (3c)$$
where the multi-index $\alpha = (n_1, n_2, \dots, n_m)$, previously named KI, represents the dynamic order of each series, and:
$$n_{kl} = \begin{cases} \min(n_k + 1,\, n_l) & \text{for } l \le k \\ \min(n_k,\, n_l) & \text{for } l > k \end{cases} \qquad k, l = 1, 2, \dots, m. \qquad (4)$$
The KIs uniquely define an echelon canonical VARMA form by means of Eqs.
(3a–4); see Hannan and Deistler (1988), Theorem 2.5.1. Broadly speaking, these
indices determine the zeros, ones and free parameters structure of the echelon form.
By canonical, we mean a unique representation (commonly with the minimum number of parameters) among the output-equivalent VARMA class. Another property of the KIs is that $\sum_{k=1}^{m} n_k = |\alpha| = n$, meaning that the echelon form distributes the dynamic order between the different time series so that their sum equals the McMillan degree,
denoted by n. Specific details about echelon forms can be found in Hannan and Deistler
(1988), Chapter 2, and Lütkepohl (2005), Definition 12.2.
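The degree structure in Eqs. (3a–4) is mechanical enough to compute directly. The following minimal sketch (in Python, with a hypothetical helper name) builds the $n_{kl}$ matrix of Eq. (4) from a KI vector $\alpha$:

```python
import numpy as np

def kronecker_degrees(alpha):
    """Compute the matrix of degrees n_kl of Eq. (4) from a KI vector
    alpha = (n_1, ..., n_m). These degrees fix the zero/free-parameter
    structure of the echelon polynomials in (3a)-(3c)."""
    m = len(alpha)
    n_kl = np.zeros((m, m), dtype=int)
    for k in range(m):
        for l in range(m):
            if l <= k:
                n_kl[k, l] = min(alpha[k] + 1, alpha[l])
            else:
                n_kl[k, l] = min(alpha[k], alpha[l])
    return n_kl

# KI vector of Example 1: alpha = (1, 0), so |alpha| = n = 1
print(kronecker_degrees((1, 0)))  # [[1 0]
                                  #  [1 0]]
```

For $\alpha = (1, 0)$, the entry $n_{21} = 1$ allows one free coefficient in $\bar{F}_{21}(B)$ at lag zero, which corresponds to the 0.50 parameter appearing in $F_0$ of model (6) in Example 1.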
To clarify the exposition above, the following example compares two equivalent
VARMA representations of the same transfer function with α = (1, 0). We will use
this model in different examples throughout this paper.
Example 1: A VARMA(1,1) model vs. its equivalent VARMA$_E$
Consider the left coprime standard VARMA(1,1) model:
$$(I + \bar{F}_1 B)\, z_t = (I + \bar{L}_1 B)\, a_t, \quad a_t \sim \mathrm{iid}\; N(0, \Sigma_a), \qquad (5)$$
with,
$$\bar{F}_1 = \begin{bmatrix} -0.50 & 0 \\ 0.25 & 0 \end{bmatrix}, \quad \bar{L}_1 = \begin{bmatrix} -0.30 & 0.20 \\ 0.15 & -0.10 \end{bmatrix}, \quad \Sigma_a = \begin{bmatrix} 1.00 & 0.40 \\ 0.40 & 1.00 \end{bmatrix}.$$
As $\alpha = (1, 0)$, its equivalent VARMA$_E$ form is defined as:
$$(F_0 + F_1 B)\, z_t = (L_0 + L_1 B)\, a_t, \qquad (6)$$
where,
$$F_0 = L_0 = \begin{bmatrix} 1 & 0 \\ 0.50 & 1 \end{bmatrix}, \quad F_1 = \begin{bmatrix} -0.50 & 0 \\ 0 & 0 \end{bmatrix}, \quad L_1 = \begin{bmatrix} -0.30 & 0.20 \\ 0 & 0 \end{bmatrix},$$
and $\Sigma_a$ coincides with the one in (5), because both models are observationally equivalent and, therefore, have the same innovations.
This example motivates the following remarks:
Remark 1 It is easy to see the output equivalence of models (5) and (6), as pre-multiplying both sides of (6) by $F_0^{-1}$ yields (5).1
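A quick numerical check of this output equivalence, assuming the coefficient values of models (5) and (6) as printed above:

```python
import numpy as np

# Lag-1 coefficients of the standard VARMA(1,1) model (5)
F1_bar = np.array([[-0.50, 0.00], [0.25, 0.00]])
L1_bar = np.array([[-0.30, 0.20], [0.15, -0.10]])

# Coefficients of the echelon form (6), with F0 = L0
F0 = np.array([[1.00, 0.00], [0.50, 1.00]])
F1 = np.array([[-0.50, 0.00], [0.00, 0.00]])
L1 = np.array([[-0.30, 0.20], [0.00, 0.00]])

# Pre-multiplying (6) by the inverse of the unimodular F0
# must recover the lag-1 matrices of (5)
F0_inv = np.linalg.inv(F0)
print(np.allclose(F0_inv @ F1, F1_bar))  # True
print(np.allclose(F0_inv @ L1, L1_bar))  # True
```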
Remark 2 The standard VARMA specification (5) has six coefficients while its equivalent echelon form (6) only has four, parsimony being one of the advantages of canonical versus non-canonical formulations.
Remark 3 As α = (1, 0) in (6), each scalar in the vector corresponds to the highest
dynamic order for the corresponding series. Therefore, the order of the whole system
is n = 1.
Remark 4 When all the KIs are equal, and only in this case, the standard VARMA model (1–2) has an echelon structure and, accordingly, is a canonical representation. Otherwise, the echelon form has fewer parameters than the equivalent standard VARMA which, accordingly, is not canonical.
Remark 5 Models with different KI vectors are not nested in general, so they cannot
be compared using LR tests. Their relative adequacy is often assessed by means of
ICs, see, e.g., Hannan and Kavalieris (1984a); Poskitt (1992); Lütkepohl and Poskitt
(1996).
1 Note that $F_0$ is unimodular. For a general definition of output-equivalent VARMA systems using unimodular matrices, see Hannan and Deistler (1988), Theorem 2.2.1.
As the above remarks point out, when the KIs are different, one can generally obtain
the equivalent VARMA representation for any echelon form by pre-multiplying it by
an appropriate unimodular transformation. Nonetheless, Gevers (1985) shows that, in
general, the standard VARMA representation (1–2) implies that the McMillan degree
is m × max{ p, q} and, therefore, only transfer functions whose system order is a
multiple of m can be adequately represented that way. Additional details on the KIs
nesting structure can be found in Hannan and Deistler (1988), Section 2.6. These facts
are fundamental for our paper, and motivate Sects. 3.1 and 3.2.
2.2 State-Space systems
The linear time-invariant SS representation employed in this paper is:
$$x_{t+1} = \Phi x_t + K a_t, \qquad (7a)$$
$$z_t = H x_t + a_t, \qquad (7b)$$
where $z_t$ and $a_t$ are as in (1–2). We assume that the SS system is:
A1. ...strictly minimum-phase, that is, $|\lambda_i(\Phi - KH)| < 1$, $\forall i = 1, \dots, n$, where $\lambda_i(\cdot)$ denotes the $i$-th eigenvalue,
A2. ...minimal, which implies that the pair $(\Phi, H)$ is observable and $(\Phi, K)$ is controllable, and
A3. ...stable, so: $|\lambda_i(\Phi)| < 1$, $\forall i = 1, \dots, n$.
In comparison with mainstream alternatives, model (7a–7b) has a unique error term affecting both the state and observation equations. This model is sometimes known as the Prediction Error or Innovations Form (IF).2 We will use the latter denomination throughout the paper. The importance of the IF lies in the fact that
1. ...it is a general representation, because any SS system with different unobserved
inputs in (7a–7b) can be written in an equivalent IF (see, Hannan and Deistler 1988,
Chapter 1). Additionally, Casals et al. (1999) present an algorithm to compute the
matrices of an IF equivalent to any SS model by solving an exactly determined
system of equations, and
2. ...it has clear computational advantages for likelihood computation and signal
extraction, because its states can be estimated with an uncertainty that converges
to zero under nonrestrictive assumptions, see Casals et al. (2013).
Most of these results are the core of Chapters 1 and 2 in Hannan and Deistler (1988),
and have also been empirically discussed in Casals et al. (2016), Chapters 2, 5 and 9.
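To fix ideas, the IF (7a–7b) can be simulated in a few lines. This is a minimal sketch with hypothetical parameter values, chosen only to satisfy assumptions A1–A3, and with Gaussian innovations:

```python
import numpy as np

def simulate_if(Phi, K, H, T, rng):
    """Simulate the innovations form (7a-7b):
    x_{t+1} = Phi x_t + K a_t,  z_t = H x_t + a_t,
    with a_t ~ iid N(0, I) in this sketch."""
    n, m = K.shape
    x = np.zeros(n)
    Z = np.empty((T, m))
    for t in range(T):
        a = rng.standard_normal(m)
        Z[t] = H @ x + a          # observation equation (7b)
        x = Phi @ x + K @ a       # state transition (7a)
    return Z

# Hypothetical first-order system (n = 1, m = 2) satisfying A1-A3
Phi = np.array([[0.5]])
K = np.array([[0.25, 0.26]])
H = np.array([[0.64], [-0.29]])
Z = simulate_if(Phi, K, H, T=200, rng=np.random.default_rng(0))
print(Z.shape)  # (200, 2)
```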
2.3 Subspace methods
Subspace methods are used to estimate the parameters in the IF (7a–7b) by applying
generalized LS to a rank-deficient system, see e.g. Qin (2006); Favoreel et al. (2000).
2 More precisely, by IF we refer to the innovations steady-state form, where $K$ and $\Sigma_a$ are invariant.
To do this, we need to write the model (7a–7b) in subspace form, by substituting $a_t$ from (7b) into (7a) and solving the resulting recursion. This yields:
$$x_t = (\Phi - KH)^{t-1} x_1 + \sum_{j=1}^{t-1} (\Phi - KH)^{t-1-j} K z_j, \qquad (8)$$
where the states at time $t$ depend on their initial value, $x_1$, as well as the past values of the output.
On the other hand, solving the recursion in (7a) and substituting the resulting states into the observation Eq. (7b) gives:
$$z_t = H \Phi^{t-1} x_1 + H \sum_{j=1}^{t-1} \Phi^{t-1-j} K a_j + a_t. \qquad (9)$$
Equations (8) and (9) can be written in matrix form as:
$$X_f = L_x^i X_p + L_z Z_p, \qquad (10a)$$
$$Z_f = O_i X_f + V_i A_f, \qquad (10b)$$
where $i$ is a user-defined index which represents the number of past ($p$) and future ($f$) block rows. For simplicity, we will assume hereafter that the scalars $p$ and $f$ are equal to $i$.3 For now, the only requisite for this value is that $i > n$. This dimensioning implies that the $L$ matrices in (10a) are:
$$L_x := (\Phi - KH) \in \mathbb{R}^{n \times n}, \qquad (11)$$
$$L_z := \begin{bmatrix} L_x^{i-1} K & \dots & L_x K & K \end{bmatrix} \in \mathbb{R}^{n \times im}. \qquad (12)$$
Second, given these previous choices, $Z_f$ and $Z_p$ are block-Hankel matrices whose columns are $[z_t^{\top}, z_{t+1}^{\top}, \dots, z_{t+f-1}^{\top}]^{\top}$ and $[z_{t-p}^{\top}, z_{t-p+1}^{\top}, \dots, z_{t-1}^{\top}]^{\top}$, respectively, each column being characterized by $t = p+1, \dots, T-f+1$. Therefore, the dimension of both matrices is $im \times (T - 2i + 1)$. The block-Hankel matrix $A_f$ has the same general structure as $Z_f$, with $a_t$ instead of $z_t$, while $X_p := [x_1, x_2, \dots, x_{T-p-f+1}]$ and $X_f := [x_{p+1}, x_{p+2}, \dots, x_{T-f+1}]$, which have $n$ rows (from assumption A2) and the same number of columns as $Z_f$, are the sequences of past and future states, respectively.
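The construction of $Z_p$ and $Z_f$ is a pure data-rearrangement step, which can be sketched as follows (hypothetical helper name; the indexing convention follows the definitions above with $p = f = i$):

```python
import numpy as np

def block_hankel(Z, start, i):
    """Stack i block rows of the (T x m) sample Z, beginning at row
    `start`; column c collects z_{start+c}, ..., z_{start+c+i-1}."""
    T, m = Z.shape
    N = T - 2 * i + 1                       # number of columns
    return np.vstack([Z[start + j: start + j + N].T for j in range(i)])

rng = np.random.default_rng(0)
T, m, i = 50, 2, 4                          # with p = f = i, as in the paper
Z = rng.standard_normal((T, m))
Zp = block_hankel(Z, 0, i)                  # past block rows z_{t-p}..z_{t-1}
Zf = block_hankel(Z, i, i)                  # future block rows z_t..z_{t+f-1}
print(Zp.shape, Zf.shape)  # (8, 43) (8, 43)
```

Both matrices have $im = 8$ rows and $T - 2i + 1 = 43$ columns, matching the dimensions stated above.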
Finally, matrices $O_i$ and $V_i$ are known functions of the original parameter matrices $\Phi$, $K$ and $H$, see Section 2 of Garcia-Hiernaux et al. (2010), such that:
$$O_i := \begin{bmatrix} H^{\top} & (H\Phi)^{\top} & (H\Phi^2)^{\top} & \dots & (H\Phi^{i-1})^{\top} \end{bmatrix}^{\top} \in \mathbb{R}^{im \times n}, \qquad (13)$$
3 Throughout the paper we fix $i$ by rounding $\log(T)$ to the nearest integer. There is not much literature on optimal values for $p$ and $f$ in short samples. For instance, in terms of forecasting, Garcia-Hiernaux (2011) shows that one obtains better results (in RMSE terms) by combining the forecasts provided by models with different values of $i$, rather than using a single value. Also, as subspace methods are not iterative, one can estimate at a reasonable computational cost all the models for a given sequence of $i$ values, and then choose the value that generates the best in-sample fit.
$$V_i := \begin{bmatrix} I_m & 0 & 0 & \dots & 0 \\ HK & I_m & 0 & \dots & 0 \\ H\Phi K & HK & I_m & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ H\Phi^{i-2}K & H\Phi^{i-3}K & H\Phi^{i-4}K & \dots & I_m \end{bmatrix} \in \mathbb{R}^{im \times im}. \qquad (14)$$
From assumption A1 and for sufficiently large $i$, $L_x^i$ tends to the null matrix. Consequently, substituting Eq. (10a) in (10b) gives:
$$Z_f \simeq \Theta_i Z_p + V_i A_f, \qquad (15)$$
where $\Theta_i := O_i L_z$.
Equation (15) is the basis of subspace methods. It is a regression model which can be consistently estimated by LS, because the columns in $V_i A_f$ are uncorrelated with the regressor $Z_p$ (from assumption A3).
Note that, for sufficiently large $i$, the matrix $\Theta_i$ has a reduced rank, $n$, which is less than its dimension, $im \times im$. So, estimating (15) requires reduced-rank LS. The estimation problem can be written as:
$$\min_{\Theta_i} \left\| W_1 \left( Z_f - \Theta_i Z_p \right) \right\|_F^2, \quad \text{such that } \operatorname{rank}(\Theta_i) = n, \qquad (16)$$
where $\| \cdot \|_F$ is the Frobenius norm. This problem is solved by computing:
$$\hat{\Theta}_i = Z_f Z_p^{\top} \left( Z_p Z_p^{\top} \right)^{-1}, \qquad (17)$$
and then applying an SVD (denoted as $\stackrel{svd}{=}$) to a weighted version of $\hat{\Theta}_i$, to deal with the rank reduction. Van Overschee and De Moor (1995) proved that several subspace methods are equivalent to computing:
$$W_1 \hat{\Theta}_i W_2 \stackrel{svd}{=} U_n S_n V_n^{\top} + \hat{E}_n, \qquad (18)$$
where $\hat{E}_n$ denotes the approximation error arising from the use of a system order $n$, and $W_1$, $W_2$ are two nonsingular weighting matrices, whose composition depends on the specific algorithm employed. While $W_2 = (Z_p Z_p^{\top})^{\frac{1}{2}}$ is common to most subspace methods (only for systems without observable inputs), there are differences for $W_1$; see Table 1.
Table 1 shows that the MOESP and N4SID subspace algorithms use the identity as weighting matrix. In contrast, when CVA is used, the singular values obtained from the SVD coincide with the canonical correlations between $Z_f$ and $Z_p$. Alternatively, the weighting matrix used in this paper, $W_1 = (Z_f \Pi^{\perp}_{Z_p} Z_f^{\top})^{-\frac{1}{2}}$, is an estimate of the inverse of the square root of the error covariance matrix, as $\Pi^{\perp}_{Z_p}$ is the projection onto the orthogonal complement of $Z_p$. As we will discuss later (see Sect. 2.3.3), adequately choosing these weightings is crucial for some asymptotic results.
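Equations (17)–(18), with the weightings used in this paper, can be sketched as follows. The helper names are hypothetical, and a symmetric eigendecomposition stands in for the matrix square root:

```python
import numpy as np

def sqrtm_psd(A):
    """Symmetric square root of a positive semi-definite matrix."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

def reduced_rank_step(Zf, Zp, n):
    """Eqs. (17)-(18): unrestricted LS estimate of Theta_i, then an SVD
    of W1 @ Theta @ W2 truncated at order n, with the weightings of
    this paper: W1 = (Zf Pi_perp Zf')^{-1/2}, W2 = (Zp Zp')^{1/2}."""
    ZpZp = Zp @ Zp.T
    Theta = Zf @ Zp.T @ np.linalg.inv(ZpZp)                   # Eq. (17)
    Pi_perp = np.eye(Zp.shape[1]) - Zp.T @ np.linalg.inv(ZpZp) @ Zp
    W1 = np.linalg.inv(sqrtm_psd(Zf @ Pi_perp @ Zf.T))
    W2 = sqrtm_psd(ZpZp)
    U, s, Vt = np.linalg.svd(W1 @ Theta @ W2)                 # Eq. (18)
    return U[:, :n], s[:n], Vt[:n]

rng = np.random.default_rng(1)
Zp = rng.standard_normal((6, 200))          # im x N past block-Hankel
Zf = rng.standard_normal((6, 200))          # im x N future block-Hankel
Un, sn, Vnt = reduced_rank_step(Zf, Zp, n=1)
print(Un.shape, sn.shape, Vnt.shape)  # (6, 1) (1,) (1, 6)
```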
Table 1 Weighting matrix $W_1$ in Eq. (18) in different subspace algorithms

Algorithm:   CVA                                   MOESP   N4SID   This paper
$W_1$:       $(Z_f Z_f^{\top})^{-\frac{1}{2}}$     $I$     $I$     $(Z_f \Pi^{\perp}_{Z_p} Z_f^{\top})^{-\frac{1}{2}}$

where $\Pi^{\perp}_{Z_p} = I - Z_p^{\top} (Z_p Z_p^{\top})^{-1} Z_p$
From Eq. (18), matrices in (7a–7b) are usually estimated following two main
approaches. The first one uses the extended observability matrix, sometimes known as
shift invariance approach, as it uses the shift invariance property of Oi . The second
uses the estimates of the states sequence, and is usually termed state approach.
2.3.1 Shift invariance approach
The extended observability matrix in (13) can be estimated from (18) as $\hat{O}_i = W_1^{-1} U_n S_n^{1/2}$, where we use that $\Theta_i := O_i L_z$. Note that if a different factorization is used, the resulting estimates would be observationally equivalent.
From Arun and Kung (1990) and Verhaegen (1994), we can obtain the matrix $\Phi$ in (7a–7b) as:
$$\hat{\Phi} = \overline{O}_i^{\dagger}\, \underline{O}_i, \qquad (19)$$
where $\overline{O}_i$ and $\underline{O}_i$ result from removing the last and first $m$ rows, respectively, from $\hat{O}_i$, and $A^{\dagger}$ denotes the Moore–Penrose pseudo-inverse of $A$ (see, e.g., Golub and Van Loan 1996). An estimate of $H$ is obtained from the first $m$ rows of $\hat{O}_i$.
Last, from the SVD (18) we also get:
$$\hat{L}_z = S_n^{1/2} V_n^{\top} W_2^{-1}, \qquad (20)$$
and $\hat{K}$ can be easily obtained using (12).4
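The shift invariance logic of Eq. (19) can be illustrated on a noise-free observability matrix built from known parameters; with an exact $O_i$ the recovery is exact (example values only, hypothetical helper name):

```python
import numpy as np

def shift_invariance(O_i, m):
    """Eq. (19): recover Phi (and H) from an extended observability
    matrix O_i = [H; H Phi; ...; H Phi^{i-1}] via its shift structure."""
    O_up = O_i[:-m]                       # O_i without its last m rows
    O_down = O_i[m:]                      # O_i without its first m rows
    Phi = np.linalg.pinv(O_up) @ O_down   # Moore-Penrose pseudo-inverse
    H = O_i[:m]                           # first block row
    return Phi, H

# Noise-free check with example parameter values
Phi_true = np.array([[0.8, 0.2], [-0.4, -0.5]])
H_true = np.eye(2)
i = 4
O_i = np.vstack([H_true @ np.linalg.matrix_power(Phi_true, j) for j in range(i)])
Phi_hat, H_hat = shift_invariance(O_i, m=2)
print(np.allclose(Phi_hat, Phi_true), np.allclose(H_hat, H_true))  # True True
```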
2.3.2 State approach
This second approach is based on the estimation of the state sequence, see e.g., Larimore (1990); Van Overschee and De Moor (1994). These algorithms estimate the model parameters by solving a simple set of LS problems.
From $\hat{L}_z$ obtained in (20), we can compute the state sequence as:
$$\hat{X}_f = \hat{L}_z Z_p, \qquad (21)$$
and similarly, $\hat{X}_{f^+} = \hat{L}_z Z_{p^+}$, where $Z_{p^+}$ is like $Z_p$, but adding a '+1' to all the subscripts.5 Once the state sequence is approximated, the estimates $(\hat{H}, \hat{\Phi}, \hat{K})$ are
4 One can alternatively solve the Riccati equation to obtain an estimate of $K$.
5 In these two expressions we use that $L_x^i$ tends to the null matrix for large $i$.
obtained as follows:
$$\hat{H} = Z_{f_1} \hat{X}_f^{\top} \left( \hat{X}_f \hat{X}_f^{\top} \right)^{-1}, \qquad (22)$$
where $Z_{f_1}$ denotes the first block row of $Z_f$. Building on these estimates, we compute the residuals $\hat{A}_{f_1} = Z_{f_1} - \hat{H} \hat{X}_f$ and, finally, get:
$$\begin{bmatrix} \hat{\Phi} & \hat{K} \end{bmatrix} = \hat{X}_{f^+} \begin{bmatrix} \hat{X}_f \\ \hat{A}_{f_1} \end{bmatrix}^{\top} \left( \begin{bmatrix} \hat{X}_f \\ \hat{A}_{f_1} \end{bmatrix} \begin{bmatrix} \hat{X}_f \\ \hat{A}_{f_1} \end{bmatrix}^{\top} \right)^{-1}. \qquad (23)$$
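The two LS problems (22)–(23) reduce to a few matrix products. A sketch with hypothetical inputs (in practice $\hat{X}_f$ and $\hat{X}_{f^+}$ would come from Eq. (21)):

```python
import numpy as np

def state_ls(Zf1, Xf, Xf_plus):
    """Eqs. (22)-(23): LS regressions on the estimated state sequence.
    Zf1 is the first block row of Z_f (m x N); Xf, Xf_plus are the
    state sequences (n x N), Xf_plus shifted one period ahead."""
    H = Zf1 @ Xf.T @ np.linalg.inv(Xf @ Xf.T)        # Eq. (22)
    Af1 = Zf1 - H @ Xf                               # residuals
    W = np.vstack([Xf, Af1])                         # stacked regressors
    PhiK = Xf_plus @ W.T @ np.linalg.inv(W @ W.T)    # Eq. (23)
    n = Xf.shape[0]
    return H, PhiK[:, :n], PhiK[:, n:]               # H, Phi, K

rng = np.random.default_rng(2)
n, m, N = 2, 2, 300
Xf = rng.standard_normal((n, N))                     # hypothetical states
Xf_plus = rng.standard_normal((n, N))
Zf1 = rng.standard_normal((m, N))
H_hat, Phi_hat, K_hat = state_ls(Zf1, Xf, Xf_plus)
print(H_hat.shape, Phi_hat.shape, K_hat.shape)  # (2, 2) (2, 2) (2, 2)
```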
2.3.3 Asymptotic properties
This section briefly discusses the asymptotic properties of subspace methods.
The estimates resulting from the main subspace methods are known to be strongly
consistent and asymptotically normal under mild assumptions. Bauer (2005a) surveys
the literature on this subject and gives conditions for consistency and asymptotic
normality for many subspace methods. This is done for the system (7a–7b) and also
when observable inputs are included.
An effort has also been made to study the efficiency of these estimators, e.g., by
comparing their asymptotic variances. In contrast with consistency and asymptotic
normality, here we find some differences among the methods. Bauer and Jansson
(2000) and Bauer and Ljung (2002) provide numerical approximations of the asymptotic variance of the estimators for several algorithms. The most relevant result in this line is presented by Bauer (2005b), who proves that the CVA weighting matrices (see Table 1), together with the state approach (see Sect. 2.3.2), lead to estimates that are asymptotically equivalent to (pseudo) MLE,6 this being the only asymptotic efficiency result currently available in the literature.7
The algorithm suggested in this paper, named SUBEST1 (see Sect. 3.2 for details), follows a two-stage approach with the weighting matrix $W_2$ given in Table 1. So, Bauer's (2005b) theorem cannot be directly applied, and we do not have a formal proof of its asymptotic efficiency. For this reason, we conduct a simulation where the mean-square errors of SUBEST1 and ML (Casals et al. 1999) estimates are computed. Additionally, the results obtained in Deistler et al. (1995) for other subspace methods and the same data generating process (DGP) are included for comparison. The model
(7a–7b) considered in the simulation is:
$$\Phi = \begin{bmatrix} 0.8 & 0.2 \\ -0.4 & -0.5 \end{bmatrix}, \quad H = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad K = \begin{bmatrix} 1.5 & 0 \\ -0.2 & -0.8 \end{bmatrix}, \qquad (24)$$
with $a_t \sim \mathrm{iid}\; N(0, I)$. Following Deistler et al. (1995), the exercise is performed for
T = 2000, 4000, 8000, 16000 and R = 500 replications for each sample size, so that
the results are comparable. We consider two measures for the quality of SUBEST1
6 As in Bauer (2005b), pseudo means that the Gaussian likelihood is maximized even if the true noise
distribution is unknown.
7 No equivalent result has been obtained for systems with exogenous variables.
Table 2 Simulation results for model (24)

Algorithm        T = 2000   T = 4000   T = 8000   T = 16000
M(SUBEST1)       0.8594     0.8247     0.8104     0.8138
M(MLE)           0.8307     0.8088     0.8034     0.8116
M(ECH)           1.0040     0.8900     0.9084     0.9376
M(CVA)           0.8929     0.7745     0.7796     0.8234
CRB              0.8195     0.8195     0.8195     0.8195
D(SUBEST1)       0.0424     0.0234     0.0119     0.0050
D(ECH)           0.1418     0.1381     0.1460     0.1190
D(CVA)           0.0224     0.0115     0.0073     0.0038

The row M(SUBEST1) shows the average of $T$ times the mean-square error of the SUBEST1 estimates for all the parameters of the model, see (25). The row D(SUBEST1) displays the average of $T$ times the mean-square difference between the SUBEST1 and MLE estimates for all the parameters of the model, see (26). CRB is the average of the Cramer-Rao Bounds obtained from the inverse of the Fisher information matrix
estimates:
$$M(\text{SUBEST1}) = \frac{T}{R} \sum_{i=1}^{R} \left( \hat{\theta}_i^{\text{SUBEST1}} - \theta_i \right)^2, \qquad (25)$$
and
$$D(\text{SUBEST1}) = \frac{T}{R} \sum_{i=1}^{R} \left( \hat{\theta}_i^{\text{SUBEST1}} - \hat{\theta}_i^{\text{ML}} \right)^2, \qquad (26)$$
where $\hat{\theta}_i^{\text{SUBEST1}}$, $\hat{\theta}_i^{\text{ML}}$ denote the SUBEST1 and ML estimates, while $\theta_i$ denotes the true values given in (24). The quantity $M(.)$ is $T$ times the sample mean-squared error of the estimate. For asymptotically efficient methods, $M(.)$ converges to the Cramer-Rao Bound (CRB), which is a lower bound for the asymptotic variance of the parameter estimates. On the other hand, $D(.)$ is $T$ times the mean-squared difference between the corresponding method and ML estimates. For an asymptotically efficient method, $D(.) \to 0$ as $T \to \infty$.
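The $M(.)$ and $D(.)$ statistics of Eqs. (25)–(26) are straightforward to compute from a set of replicated estimates; a minimal sketch with hypothetical toy numbers:

```python
import numpy as np

def m_stat(theta_hat, theta_true, T):
    """M(.) of Eq. (25): T times the sample mean-squared error
    over the R replicated estimates in theta_hat."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    return T * np.mean((theta_hat - theta_true) ** 2)

def d_stat(theta_hat, theta_ml, T):
    """D(.) of Eq. (26): T times the mean-squared difference between
    a method's estimates and the ML estimates, replication by replication."""
    a = np.asarray(theta_hat, dtype=float)
    b = np.asarray(theta_ml, dtype=float)
    return T * np.mean((a - b) ** 2)

# Toy illustration: R = 4 replications of a single parameter, T = 100
est = [0.48, 0.52, 0.51, 0.49]    # hypothetical SUBEST1 estimates
ml = [0.49, 0.51, 0.50, 0.50]     # hypothetical ML estimates
print(m_stat(est, 0.50, T=100), d_stat(est, ml, T=100))
```

For an asymptotically efficient method the second statistic would shrink toward zero as $T$ grows, which is the pattern shown by the D(SUBEST1) row of Table 2.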
The results of the simulation are summarized in Table 2 and Fig. 1. Table 2 shows
the average of M(.) and D(.) statistics for all the parameters in (24), for SUBEST1,
the echelon subspace estimator (ECH) and CVA, computed for the same example by
Deistler et al. (1995), and, finally, the corresponding CRB. The results suggest that
SUBEST1, contrary to ECH, yields estimates that are as asymptotically efficient as
those obtained by ML or CVA; see Table 2. Figure 1 evidences that M(SUBEST1)
→ CRB and D(SUBEST1) → 0, as T → ∞. This conclusion was confirmed by
many other simulations (see the online Appendix at https://doi.org/10.100/s00362023-01451-4), while no counterexamples have been found. This result supports the
use of SUBEST1 for the modeling method presented in the rest of the paper.
Fig. 1 Results of the simulation of model (24). Values for ECH and CVA come from Peternell (1995); see
Table 2 for more detail
2.4 Relationships between SS and VARMAE representations
As with VARMA models, the IF of a given dynamic system is not unique. Note that, for any nonsingular matrix $T$, applying the transformation $x_t^* = T^{-1} x_t$, $\Phi^* = T^{-1} \Phi T$, $K^* = T^{-1} K$, $H^* = H T$ to any IF yields an alternative output-equivalent representation.
A canonical IF representation is characterized by two elements: a specific structure of the transition matrix $\Phi$ and a unique transformation matrix $T$. The main property
of canonical representations is that they realize the system output as a function of a
unique parameter set and, therefore, are exactly identified.8
Hannan and Deistler (1988) show the equivalence between the VARMA$_E$ and the echelon IF by analyzing the structure of the linear dependence relations between the rows of the Hankel matrix $\mathcal{H} = OC$, where $O$ is the infinite version of (13) and $C := [K, \Phi K, \Phi^2 K, \dots]$; see Hannan and Deistler (1988), Theorems 2.5.1 and 2.5.2.
In this paper we use the SS Luenberger Canonical Form (LCF).9 This representation is convenient because it is easy to obtain from it the equivalent VARMA$_E$ representation. In particular, Proposition 2 in Casals et al. (2012) proves that if the coefficients of an IF process are known, then: (1) the KI vector can be derived from the observability matrix (13), and (2) the VARMA$_E$ representation can be obtained from a change of coordinates that transforms any IF into its corresponding LCF. We will use this result in Sect. 3.2.
3 Modeling stationary datasets
This section details the main steps of our modeling strategy, which consist in: (i)
determining the system order, (ii) estimating the KIs compatible with the previous
decision, and (iii) refining the resulting model.
3.1 Step-1: Estimating the system order and a preliminary model
In our approach, the first step consists in determining the minimum dynamic order (or
McMillan degree) required to realize zt .
The following remarks and examples may help to clarify this concept:
1. A model with n = 0 would be static, as its output could be realized by 0 dynamic
components;
2. ...when n = 3 the observable time series can be realized as a linear combination
of three first-order dynamic components;
3. ...the order of a standard VARMA( p,q) process is m × max{ p, q}; and,
4. ...the output of an $n$-order system can also be realized by systems of order $n+1$, $n+2$, ... Hence, a unique characterization of the McMillan degree requires the dimension of the system to be minimal (see assumption A2).
The literature typically determines n by estimating a sequence of SS models over a
predetermined search space for the McMillan degree, n = 0, 1, 2... These models can
be efficiently estimated using subspace techniques, see Favoreel et al. (2000); Bauer
(2005a), and compared with each other by means of: (a) LR tests, see Tsay (1989a);
Tiao and Tsay (1989), (b) information criteria, see Akaike (1974); Schwarz (1978);
Hannan and Quinn (1979), or (c) canonical correlation criteria, see Bauer (2001);
Garcia-Hiernaux et al. (2012).
8 Note, however, that there can be different canonical forms for the same system, each of them being exactly identified. Canonical forms present in most cases the minimum number of parameters required to realize the system output.
9 Luenberger (1967) proposed the first canonical form for multivariate processes.
Table 3 Output from NID when applied to $z_t$ in model (5)

Unit roots: 0

Results for different criteria:
n    AIC    SBC    HQ     SVC2   NIDC   χ²(5%)
0    6.33   6.41   6.36   1.80   1.80   –
1    5.54   5.72   5.61   0.56   0.56   0.00
2    5.52   5.81   5.64   0.68   0.67   0.40
3    5.59   5.98   5.74   0.84   0.83   0.74

Estimated dynamic order:
     2      1      1      1      1      1
We determine the McMillan degree automatically by following the modal strategy suggested by Garcia-Hiernaux et al. (2012), which consists in determining $n$ as the modal value of the orders chosen by a selection of the methods mentioned above. We refer to this procedure as NID (N-IDentification), and the following example illustrates its application.
Example 2: Estimating the Step-1 model
Consider again the VARMA(1,1) model (5) used in Example 1. We simulate 100 observations of this process and apply to them the previously mentioned NID algorithm, obtaining the results displayed in Table 3, where the acronyms AIC, SBC and HQ denote the Akaike (1974), Schwarz (1978) and Hannan and Quinn (1979) information criteria, respectively; SVC2 and NIDC denote the canonical correlation-based criteria proposed by Garcia-Hiernaux et al. (2012); and, last, χ²(5%) denotes the p-value of the LR test for the significance of the canonical correlations proposed by Tsay (1989a). The 5% value in parentheses denotes that automatic testing will reject the null if the p-value is smaller than 5%. Last, the orders automatically chosen by each criterion are displayed in the last row of Table 3. Note that the correct order (n = 1) was the modal choice, and the only criterion that overestimates n is the AIC.
After determining the system order, it is easy to apply any of the subspace methods described in Sect. 2.3 to estimate the parameters of an $n$-order IF (7a–7b). In this case, we use a shift invariance approach with $W_1 = (Z_f \Pi^{\perp}_{Z_p} Z_f^{\top})^{-\frac{1}{2}}$ and $i = 5$.10 We will call this a "Step-1 model", and the estimated parameter matrices for this sample are:
$$\hat{\Phi} = .519, \quad \hat{K} = \begin{bmatrix} .251 & .258 \end{bmatrix}, \quad \hat{H} = \begin{bmatrix} .640 \\ -.292 \end{bmatrix}, \quad \hat{\Sigma}_a = \begin{bmatrix} .924 & .376 \\ .376 & 1.001 \end{bmatrix}.$$
10 Unless otherwise indicated, we generally fix $i = p = f$ to the integer closest to $\log(T)$.
Chapter 9 in Casals et al. (2016) describes a simple numerical procedure to write an IF as an equivalent balanced VARMA($n$,$n$) model.11 Applying this procedure to the previous IF returns the standard VARMA(1,1) model:
$$\hat{\bar{F}}_0 = \hat{\bar{L}}_0 = I, \quad \hat{\bar{F}}_1 = \begin{bmatrix} -.519 & 0 \\ .237 & 0 \end{bmatrix}, \quad \hat{\bar{L}}_1 = \begin{bmatrix} -.236 & .165 \\ .120 & -.075 \end{bmatrix},$$
whose values are very close to the true coefficients in (5).
In the univariate case, the Step-1 model is a canonical representation which can only be refined by pruning non-significant parameters, see Sect. 3.3. In the multivariate case, ensuring a canonical representation requires further refining. We discuss this issue in the next Section. However, even if this Step-1 model may be somewhat overparameterized, it might be sufficient to achieve the objectives of the model-building exercise. In this case, the modeling process could stop here.
3.2 Step-2: Estimating the KIs and their corresponding canonical model
We already mentioned that IF and VARMA representations are interchangeable. In particular, Casals et al. (2012) show how to obtain the VARMA_E representation of any IF when the matrices of the system and the KI vector, α, are known.
However, the equivalence issue is still unsolved when the IF coefficients are not known, but estimated from a finite sample. This happens because the estimated observability matrix does not present the linear dependence structure implied by α, and then the similar transformation proposed in Casals et al. (2012) does not lead to the LCF and, hence, to the equivalent VARMA_E.
To solve this issue, we devised SUBEST1, a novel subspace algorithm that estimates
the LCF model corresponding to any given KI vector α. The estimated parameters
resulting from SUBEST1 coincide with those in a VARMA E form. The algorithm
follows a two-stage procedure. Initial estimates of the system matrices (using a shift
invariance approach, see Sect. 2.3.1) are used to obtain a transformation matrix T̂ .
We then use this matrix to transform the states and finally apply a state approach (see
Sect. 2.3.2) to estimate the echelon form. As far as we know, this procedure is new in
the literature and is the main contribution of the paper.
To describe the steps of SUBEST1, we will use the subindex notation {i:j}, meaning that any matrix A_{i:j} is the partition of A that includes the rows from i to j.
Step-1. Assume that n is known or consistently estimated using the procedures presented in Sect. 3.1. Estimate Φ, K and H using the subspace methods presented in Sect. 2.3.1 and the system order n. We denote these estimates by Φ̂, K̂ and Ĥ.
Step-2. For a given KI vector α, apply Proposition 2 in Casals et al. (2012) to obtain the transformation matrix T such that Φ* = T⁻¹ΦT and H* = HT present the LCF structure when α, Φ and H are known. Note that, even if the α imposed as a constraint is the true KI, this does not lead to the ones/zeros structure of the LCF, as T is computed with Ĥ and Φ̂, and not with the true (unknown) matrices of the system.

¹¹ The term "balanced" refers to the fact that the resulting VARMA model has the same autoregressive and moving average orders, both equal to the system order.
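The role of T can be checked numerically: a similarity transformation changes the state basis but not the input–output behavior of the system, since the Markov parameters HΦʲK are invariant (self-contained check with randomly generated matrices, not the example's estimates):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2
Phi = 0.3 * rng.normal(size=(n, n))           # hypothetical system matrices
K = rng.normal(size=(n, m))
H = rng.normal(size=(m, n))
T = np.eye(n) + 0.1 * rng.normal(size=(n, n))  # an invertible transformation

Phi_s = np.linalg.inv(T) @ Phi @ T             # transformed system
K_s = np.linalg.inv(T) @ K
H_s = H @ T

# The Markov parameters H Phi^j K do not depend on the state basis
for j in range(5):
    assert np.allclose(H @ np.linalg.matrix_power(Phi, j) @ K,
                       H_s @ np.linalg.matrix_power(Phi_s, j) @ K_s)
```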
Step-3. Obtain the states X_f and X_{f+} from the model in Step-1, as explained in Sect. 2.3.2, and denote these estimates by X̂_f and X̂_{f+}, respectively. Then we compute the sequences of states, X*_f and X*_{f+}, that correspond to the LCF by applying the previous similar transformation T as X̂*_f = T⁻¹ X̂_f and X̂*_{f+} = T⁻¹ X̂_{f+}.
Step-4. The LCF is defined by the matrices Φ* and H* as:

Φ* = [ F₁           Q₁   0    · · ·  0
       F₂           0    Q₂   · · ·  0
       ...          ...  ...  . . .  ...
       F_{nmax−1}   0    0    · · ·  Q_{nmax−1}
       F_{nmax}     0    0    · · ·  0 ]    and    H* = [ F₀  0  · · ·  0 ],        (27)

where Φ* is a companion-type matrix such that each F_j block (j = 1, ..., n_max) has a number of rows equal to the number of KIs greater than or equal to j, m̄ = Σ_{k=1}^m min{n_k, 1} columns and some null elements. In fact, the (k,l)-th element of F_j will be nonzero only if j ∈ [n_k − n_kl + 1, n_k], where n_kl was defined in (4). Each Q_k block is a zeros/ones matrix, with as many columns as the number of KIs greater than or equal to k. If the endogenous variables are sorted according to their corresponding KI, the structure of Q_k will be Q_k = [I_{k+1} 0]⊺, where I_{k+1} is an identity matrix with the same number of rows as F_{k+1}.
About H*, F₀ is an m × m̄ matrix, such that the rows corresponding to components with a nonzero KI can be organized in an m̄ × m̄ lower triangular matrix with ones in the main diagonal.
Therefore, Φ* and H* can be written as:

Φ* = [ Φ*_{1:m̄}  Φ*_{m̄+1:n} ],   H* = [ H̄₁* + I  0
                                          H₂*      0 ],        (28)

where H̄₁* = H₁* − I, and the matrices Φ*_{1:m̄}, H̄₁* and H₂* include parameters to be estimated and zeros. On the other hand, the matrix Φ*_{m̄+1:n} is invariant and is only made up of ones and zeros. Therefore, estimating Φ* and H* boils down to estimating some elements in their m̄ first columns, i.e., some elements in Φ*_{1:m̄}, H̄₁* and H₂*.
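The block sizes in (27)–(28) are pure bookkeeping on the KI vector, which can be made explicit (a sketch of the layout only; the zero/one pattern inside each F_j additionally depends on the n_kl of Eq. (4), which is not reproduced here):

```python
def lcf_layout(alpha):
    """Block layout of Phi* in (27): F_j has as many rows as there are
    KIs >= j, and the free parameters sit in the first m_bar columns."""
    n = sum(alpha)                    # system order |alpha|
    n_max = max(alpha)
    rows_F = [sum(1 for nk in alpha if nk >= j) for j in range(1, n_max + 1)]
    m_bar = sum(1 for nk in alpha if nk >= 1)
    assert sum(rows_F) == n           # the blocks stack up to an n x n matrix
    return n, n_max, m_bar, rows_F

print(lcf_layout((2, 1, 1)))  # (4, 2, 3, [3, 1])
```

For the trivariate example with α = (2, 1, 1) used below, F₁ has three rows and F₂ one, and only the first three columns of Φ* carry free parameters.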
Step-5. Building on the estimate of the states obtained in Step-3, and the LCF structure of the matrices Φ* and H* determined in Step-4, estimate the LCF corresponding to the given KI vector α. To this end, formulate the SS model:

X*_{f+} = [ Φ*_{1:m̄}  Φ*_{m̄+1:n} ] [ X*_{f,1:m̄}
                                      X*_{f,m̄+1:n} ] + K* A_{f₁},        (29a)

Z_{f₁} = [ H̄₁* + I  0
           H₂*      0 ] [ X*_{f,1:m̄}
                          X*_{f,m̄+1:n} ] + A_{f₁},        (29b)

where Z_{f₁} and A_{f₁} are defined in Eq. (23). Rearranging terms in (29a–29b) yields:

X̃*_{f+} = Φ*_{1:m̄} X*_{f,1:m̄} + K* A_{f₁},        (30a)

Z̃_{f₁} = [ H̄₁*
           H₂* ] X*_{f,1:m̄} + A_{f₁},        (30b)

where X̃*_{f+} = X*_{f+} − Φ*_{m̄+1:n} X*_{f,m̄+1:n} and Z̃_{f₁} = Z_{f₁} − [I 0]⊺ X*_{f,1:m̄}. Notice that estimating Φ*_{1:m̄}, H̄₁*, H₂* and K* comes down to applying Eqs. (22–23), but substituting Z_{f₁}, X̂_{f+} and X̂_f by, respectively, Z̃_{f₁}, X̂*_{f+} and X̂*_f, where the last two matrices were obtained in Step-3.
Therefore, the estimates of Φ* and H* given in (28), obtained as the LS solution of (30a–30b), keep the LCF structure.¹²
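The LS step can be sketched as follows (a simplified, unweighted sketch of the regressions behind Eqs. (22–23) on a simulated one-state system; the paper's estimator additionally applies the LCF restrictions and weighting):

```python
import numpy as np

def ls_system_matrices(Xf, Xf_plus, Zf):
    """Unweighted LS regressions of future states and outputs on the
    current states (columns index time)."""
    Phi_hat = Xf_plus @ np.linalg.pinv(Xf)
    H_hat = Zf @ np.linalg.pinv(Xf)
    return Phi_hat, H_hat

# Sanity check on a simulated one-state system x_{t+1} = 0.5 x_t + e_t
rng = np.random.default_rng(1)
T = 2000
x = np.zeros(T + 1)
for t in range(T):
    x[t + 1] = 0.5 * x[t] + rng.normal()
h = np.array([[0.6], [-0.3]])        # hypothetical observation matrix
Xf, Xf_plus = x[None, :T], x[None, 1:]
Zf = h @ Xf                          # noiseless outputs for the check
Phi_hat, H_hat = ls_system_matrices(Xf, Xf_plus, Zf)
```

Here Phi_hat recovers 0.5 up to sampling error, and H_hat recovers h exactly because the check uses noiseless outputs.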
In practice, α is not known. However, it can be empirically determined bearing in
mind that |α| = n, and that models with different α are not nested in general, see
Example 1, Remarks 3 and 5. These remarks motivate our proposal, which consists in
estimating the models corresponding to all the combinations of KIs which add up to
the system order, already specified in the previous phase, and choosing the model that
optimizes a given information criterion. This idea can be formalized by the following
minimization problem:

    min_{(n₁,...,n_m)}  log|Σ̂(n₁, . . . , n_m)| + C(T) d(n₁, . . . , n_m)
    s.t.  |α| = n̂,        (31)

where Σ̂(n₁, . . . , n_m) is an estimate of the error covariance matrix, C(T) is a penalty function that determines which information criterion is to be optimized, and d(n₁, . . . , n_m) is the number of parameters of the corresponding echelon form. The usual alternatives are considered for the penalty term C(T), i.e., 2/T (AIC), log(T)/T (SBC), and 2 log(log(T))/T (HQ).
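The constraint |α| = n̂ keeps the search space in (31) small; enumerating the candidate KI vectors is straightforward (illustrative Python):

```python
from itertools import product

def ki_candidates(n, m):
    """All KI vectors (n_1, ..., n_m) satisfying the constraint |alpha| = n."""
    return [a for a in product(range(n + 1), repeat=m) if sum(a) == n]

cands = ki_candidates(4, 3)
print(len(cands))  # 15 canonical models to estimate and score
```

With n = 4 and m = 3 only 15 canonical models need to be estimated and compared, one per admissible α.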
Some authors have previously developed procedures to estimate α and its associated echelon form. For instance, Hannan and Rissanen (1982), Hannan and Kavalieris (1984a), Tsay (1989b), Poskitt (1992, 2016), Nsiri and Roy (1992) and Lütkepohl (2005), among others, deal with the problem using a VARMA representation. To the best of our knowledge, only Peternell (1995) and Peternell et al. (1996) propose subspace methods for this purpose. However, most of these methods are not statistically efficient in general, see
12 As previously mentioned, the state approach using the CVA weighting scheme in the stationary case leads
to statistically efficient estimates. Obviously, this property holds if correct linear restrictions are imposed.
SUBEST1 uses restrictions that are only asymptotically valid, and it is not easy to discuss analytically
what happens in this case. Simulations suggest that asymptotic efficiency remains, although it seems that
SUBEST1 does not return estimates as close to pseudo MLE as the CVA estimator does, in finite samples.
We thank an anonymous referee for this observation.
Table 4 KI estimates using the algorithm in Sect. 3.2

α:        (1,2,1)  (2,1,1)  (2,2,0)  (3,1,0)  (2,2,1)  (3,2,0)  (2,1,2)
T = 150    0.16    98.02     1.18     0.24     0.12     0.14     0.14
T = 300    0.00    99.50     0.20     0.10     0.12     0.02     0.06
T = 500    0.00    99.78     0.08     0.02     0.10     0.00     0.02

Relative values are multiplied by 100 to facilitate the comparison. In the subspace algorithm, p = f are chosen as the integer closest to log(T)
Deistler et al. (1995), and some are computationally expensive or difficult to automate (Nsiri and Roy 1992). If SUBEST1 is asymptotically as efficient as MLE, as conjectured in Sect. 2.3.3 on the basis of simulation results, this would make it an attractive procedure, combining computational simplicity with statistical optimality.
On this basis, the methodology proposed here returns both the KIs resulting from the optimization problem (31) and the estimates of the associated LCF. It is structured in three steps:
1. Determine the system order, n, by means of the NID procedure.
2. Compute all the KI vectors which add up to n, and estimate their corresponding canonical models using SUBEST1.
3. Select the KIs associated with the model achieving the best ICs.
To illustrate the performance of the algorithm, we simulated 5000 replications of a trivariate system with α = (2, 1, 1), which is treated as unknown. Table 4 shows the relative frequencies with which different α were estimated. Note that the procedure performs well even for relatively small samples and, as expected, its performance improves as T increases. Moreover, the relative number of cases in which the system order is misspecified (last three α vectors in Table 4) is very small, so the remarkable reduction in the number of models to be estimated by constraining |α| = n does not have a significant cost although, obviously, this will depend on the DGP. Additionally, Table 5 presents the DGP and the average estimates of its parameters for different sample sizes.
3.3 Step-3: Refining the Step-2 model
The third and final step consists in refining the canonical model obtained in Step-1, in
the univariate case, or in Step-2, in the multivariate case, by different procedures such
as:
1. Applying NID to the model residuals. If the model is statistically adequate, the
order identified should be n = 0. A higher McMillan degree would imply that
the residuals have a dynamic structure, i.e., they are autocorrelated, so the model
should be re-specified.
2. Estimating the models by Gaussian ML for improved efficiency, if the normality
assumption holds.
3. Excluding non-significant parameters to achieve further parsimony. Estimation
and testing in this phase can be done by ML methods or iterative subspace-based
techniques (see, Garcia-Hiernaux et al. 2009).
Table 5 Estimate of an echelon form using the algorithm in Sect. 3.2. Average values of the estimates obtained when α̂ = (2, 1, 1)

DGP:
F₀ = Q₀ = [ 1.0 0 0; 0.4 1.0 0; −0.6 0 1.0 ],
F₁ = [ −0.7 0 0; 0.2 −0.5 −0.9; 0.4 0.3 −0.2 ],
F₂ = [ 0.3 −0.2 0.5; 0 0 0; 0 0 0 ],
Q₁ = [ −0.2 0.4 0.7; 0.6 −0.3 −0.4; 0.3 1.0 −0.8 ],
Q₂ = [ 0.3 0.5 −0.8; 0 0 0; 0 0 0 ]

Average estimates, T = 150:
F̂₀ = Q̂₀ = [ 1.0 0 0; 0.40 1.0 0; −0.56 0 1.0 ],
F̂₁ = [ −0.67 0 0; 0.18 −0.47 −0.89; 0.30 0.44 −0.09 ],
F̂₂ = [ 0.20 −0.07 0.60; 0 0 0; 0 0 0 ],
Q̂₁ = [ −0.21 0.39 0.70; 0.55 −0.34 −0.33; 0.24 0.97 −0.45 ],
Q̂₂ = [ 0.23 0.43 −0.48; 0 0 0; 0 0 0 ]

Average estimates, T = 300:
F̂₀ = Q̂₀ = [ 1.0 0 0; 0.40 1.0 0; −0.58 0 1.0 ],
F̂₁ = [ −0.69 0 0; 0.20 −0.49 −0.90; 0.37 0.34 −0.17 ],
F̂₂ = [ 0.28 −0.17 0.52; 0 0 0; 0 0 0 ],
Q̂₁ = [ −0.20 0.40 0.69; 0.59 −0.30 −0.38; 0.30 0.95 −0.66 ],
Q̂₂ = [ 0.30 0.44 −0.68; 0 0 0; 0 0 0 ]

Average estimates, T = 500:
F̂₀ = Q̂₀ = [ 1.0 0 0; 0.40 1.0 0; −0.59 0 1.0 ],
F̂₁ = [ −0.69 0 0; 0.20 −0.50 −0.90; 0.39 0.32 −0.20 ],
F̂₂ = [ 0.29 −0.19 0.50; 0 0 0; 0 0 0 ],
Q̂₁ = [ −0.20 0.40 0.70; 0.60 −0.30 −0.39; 0.30 0.96 −0.72 ],
Q̂₂ = [ 0.31 0.46 −0.74; 0 0 0; 0 0 0 ]
Note that Step-3 seeks parsimony. While this is a desirable feature, it is somewhat offset by the complex procedures and modeling decisions required.
We illustrate Steps 2 and 3 with the same data used in Example 2.
Example 3: Estimating and refining a canonical representation
We applied the procedure described in Sect. 3.2 to determine α in the simulated data
already used in Example 2, with the constraint that the system order is n = 1. Again
we set i = log(T) = 5 in SUBEST1, which returns SBC values of 5.90 and 6.02 for α = (1, 0) and α = (0, 1), respectively. So, the final model for α = (1, 0), in VARMA_E form and estimated by Gaussian ML, is:
F̂₀ = L̂₀ = [ 1.0           0
              0.464 (.153)  1.0 ],   F̂₁ = [ −0.510 (.111)  0
                                             0              0 ],

L̂₁ = [ −0.272 (.112)  0.170 (.074)
        0              0            ],   Σ̂_a = [ 0.921  0.377
                                                  0.377  0.998 ],
very similar to (6). The standard errors are in parentheses. The residual diagnostics
did not reveal any symptom of misspecification.
4 Nonstationary and seasonal processes
We will now discuss the extension of the previous analysis to nonstationary and seasonal variables. Instead of modeling the nonstationary process directly (as in Bauer 2005c), we follow the traditional strategy of identifying the integration order of each variable (or the cointegration rank and vectors) and then applying the corresponding transformation to induce stationarity.
4.1 Nonstationarity
Assume now that the system output, yt ∈ Rm , is nonstationary. The methods described
in Sect. 3 require stationarity, so we will first proceed to determine how to transform
yt into a stationary vector zt .
The literature provides many tools to this end, ranging from graphical assessment
(Box et al. 2015), to univariate unit root testing (Dickey and Fuller 1981; Kwiatkowski et al. 1992), and cointegration testing (Johansen 1991). Subspace approaches have
also been applied to determine the stationary transformation. In this case, the methods
build on analyzing the canonical correlation coefficients (σ j , j = 1, . . . , i) between the
past and future information subspaces (Z p and Z f , respectively, see Sect. 2.3). Bauer
and Wagner (2002) and Bauer (2005c) extend the stationary theory to analyze I(1) and
cointegrated variables within this framework. In a later work, Garcia-Hiernaux et al.
(2010) discuss nonstationarity and cointegration analysis in the same line. The main
idea in these papers is that the estimated σ j corresponding to unit roots converge to
their true values (unity) much faster than the other canonical correlation coefficients.
This property, known as superconsistency, allows one to distinguish both kinds of
correlations.
Here we use the latter method because it is simple and can be implemented within the aforementioned NID procedure. Specifically, Garcia-Hiernaux et al. (2010) propose the following unit root criterion (URC) to identify the σ_j corresponding to unit roots:

URC = 1 − σ̂²_j − G(T, i, d) ≤ 0.        (32)

This criterion concludes that σ_j = 1 when the distance between σ̂²_j and unity is smaller than (or equal to) the penalty function G(T, i, d), which depends on the sample size, T, the row dimension of Z_p, i, and the number of unit roots we wish to evaluate, d. This threshold is computed through Monte Carlo simulations.
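As a sketch, the decision rule in (32) simply counts how many squared canonical correlations lie within the threshold of unity (the correlations and the value of G below are hypothetical; in practice G(T, i, d) comes from the Monte Carlo simulations mentioned above):

```python
import numpy as np

def urc_unit_roots(sigma_hat, G):
    """Number of canonical correlations flagged as unit roots by (32):
    sigma_j is taken as a unit root when 1 - sigma_j^2 - G <= 0."""
    sigma_hat = np.asarray(sigma_hat, dtype=float)
    return int(np.sum(1.0 - sigma_hat**2 - G <= 0.0))

d_hat = urc_unit_roots([0.9925, 0.9565, 0.60], G=0.10)
print(d_hat)  # 2
```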
In the univariate case, m = 1, we apply the URC to determine d, so that z_t = ∇^d y_t is stationary. We could then proceed as described in Sect. 3 with z_t.
In the multivariate case, m > 1, the procedure becomes more complex, as potential cointegrating relations come into play. Particularly, when y_t is made up of m I(1) processes, the cointegrating rank is c = m − d, where d is the number of unit roots of the multivariate process. On this basis, we determine the cointegrating rank using the following algorithm:
1. Check that the series are I(1), applying the URC to every single series.
2. Estimate the number of unit roots of the multivariate process, d̂, applying the URC to the multivariate process.
3. Calculate the cointegrating rank as the difference between the system dimension and the number of unit roots estimated in the previous step, i.e., ĉ = m − d̂.
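In code, the three steps above reduce to a handful of lines (sketch; the unit-root counts would come from the URC):

```python
def cointegrating_rank(series_unit_roots, d_hat, m):
    """Steps 1-3: check every series is I(1), then c_hat = m - d_hat."""
    if not all(d == 1 for d in series_unit_roots):
        raise ValueError("the procedure assumes all series are I(1)")
    return m - d_hat

print(cointegrating_rank([1, 1, 1], d_hat=3, m=3))  # 0: no cointegration
print(cointegrating_rank([1, 1, 1], d_hat=1, m=3))  # 2 cointegrating relations
```

The first call reproduces the flour-price outcome of Sect. 6.1, where three unit roots in a trivariate system imply ĉ = 0.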
When the procedure above suggests a cointegrating relation (ĉ > 0), Garcia-Hiernaux et al. (2010) show how to estimate the cointegrating matrix of y_t. This matrix represents the linear combinations of y_t which are stationary and, after applying this transformation, we proceed as described in Sect. 3.
4.2 Seasonality
The methodology in Sect. 3 can be extended to model the multiplicative seasonal structure of a process by using systems in cascade; see Chapter 7 in Casals et al. (2016). However, as in the previous section, the adequate transformation to induce stationarity must be decided first. We do not consider seasonal cointegration here, so we will address this problem in the univariate case.
Assume that a given series, yt , has a seasonal dynamic structure of period s, meaning
that there are s observations per seasonal cycle. Following Box et al. (2015), we denote
the regular difference operator by ∇ = 1 − B, and the seasonal difference operator
by ∇s = 1 − B s . In seasonal processes, we would first compute the seasonal sum of
yt , defined as:
w_t = (1 + B + B² + · · · + B^{s−1}) y_t,        (33)
and then apply NID to detect unit roots at the zero frequency in w_t, ∇w_t, ∇²w_t, ... If a single unit root in w_t is detected, the stationary transformation would be the seasonal
difference:
z_t = ∇(1 + B + B² + · · · + B^{s−1}) y_t = ∇_s y_t.        (34)
Similarly, detecting two unit roots in w_t would imply that the stationary transformation is:

z_t = ∇²(1 + B + B² + · · · + B^{s−1}) y_t = ∇∇_s y_t,        (35)
and so on. After the stationary transformation is identified and applied, one must
run the NID procedure to determine, first, regular stationary structures of order n =
0, 1, 2, . . . , and then seasonal structures in the range n = 0, s, 2s, 3s, . . .
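The operator identities (33)–(35) are easy to verify numerically (self-contained check on an arbitrary simulated series):

```python
import numpy as np

def seasonal_sum(y, s):
    """w_t = (1 + B + ... + B^{s-1}) y_t, as in (33); the first s-1 values are lost."""
    return np.convolve(y, np.ones(s), mode="valid")

def diff(y, lag=1):
    """Regular (lag=1) or seasonal (lag=s) difference operator."""
    return y[lag:] - y[:-lag]

# Identity (34): one regular difference of the seasonal sum equals
# the seasonal difference of the original series.
rng = np.random.default_rng(2)
y = rng.normal(size=120).cumsum()
s = 12
lhs = diff(seasonal_sum(y, s), 1)   # nabla w_t
rhs = diff(y, s)                    # nabla_s y_t
assert np.allclose(lhs, rhs)
```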
After fitting univariate models to each series in a dataset, one can cope with a vector of seasonal time series by: (a) filtering out the seasonal structure of each series using, e.g., the exact decomposition in Casals et al. (2002), and then (b) modeling the seasonally adjusted vector using the approach described in Sects. 3 and 4.1. After doing this, one could also estimate a final model combining the regular specification with a seasonal factor, given by seasonal AR, MA and transformation matrices containing in their main diagonals the corresponding factors of the univariate seasonal models.
Finally, as mentioned above, we do not address the case of seasonal cointegration. Recently, Bauer and Buschmeier (2021) went further in this direction, proving that CVA, see Sect. 2.3, provides consistent estimators for long-run and short-run dynamics, even in the presence of seasonal unit roots. They consider multivariate systems and, accordingly, propose new tests for the cointegrating rank at seasonal frequencies. Their simulations show that CVA performs better than some alternatives in specific cases (small samples, large system dimension).
This shows the flexibility of subspace methods to model vectors of time series, but
a deeper discussion of this issue is beyond the scope of this paper.
5 Univariate modeling examples
We will now illustrate the performance of the methodology proposed in the previous sections when applied to build univariate models. Section 5.1 presents a step-by-step analysis of the famous Series G of International Airline Passengers (Box et al. 2015), which is a popular benchmark for practical cases emphasizing nonstationarity and seasonality. Section 5.2 applies the modal automatic strategy described in Sect. 3.1 to nine benchmark series commonly used in the literature.
5.1 Airline passenger series
There is a consensus in the literature that the monthly (log) Airline Passenger series,
z_t = log(y_t), is adequately represented by an IMA(1,1) × IMA(1,1)₁₂ model. We will now show that our methodology leads to this model.
Following the steps described in Sect. 4.2, we first compute the seasonal sum, w_t = (1 + B + ... + B^{s−1}) z_t, where s = 12, and apply to it the procedure for unit root detection, finding: (a) at least one unit root in w_t, (b) at least one unit root in ∇w_t (Table 6), and (c) zero unit roots in ∇²w_t (Table 7). These decisions seem quite
Table 6 Output from NID when applied to ∇w_t

Unit roots: 1

n | AIC   | SBC   | HQ    | SVC₂  | NIDC  | χ²(5%)
0 | −6.13 | −6.10 | −6.12 | 10.74 | 10.74 | –
1 | −8.44 | −8.38 | −8.42 | 0.15  | 0.13  | 0.00
2 | −8.41 | −8.30 | −8.37 | 0.22  | 0.17  | 0.15
3 | −8.38 | −8.23 | −8.32 | 0.23  | 0.16  | 0.06
Estimated dynamic order: 1 | 1 | 1 | 1 | 1 | 1
Table 7 Output from NID when applied to ∇²w_t

Unit roots: 0

n | AIC   | SBC   | HQ    | SVC₂ | NIDC | χ²(5%)
0 | −8.29 | −8.27 | −8.28 | 0.24 | 0.24 | –
1 | −8.43 | −8.36 | −8.40 | 0.16 | 0.14 | 0.00
2 | −8.40 | −8.29 | −8.36 | 0.21 | 0.17 | 0.07
3 | −8.38 | −8.22 | −8.31 | 0.24 | 0.17 | 0.20
Estimated dynamic order: 1 | 1 | 1 | 1 | 1 | 1
Table 8 Determination of the seasonal subsystem order for ∇∇₁₂z_t with NID

n | AIC   | SBC   | HQ    | SVC₂ | NIDC | χ²(5%)
0 | −3.32 | −3.30 | −3.32 | 0.39 | 0.30 | –
1 | −3.57 | −3.50 | −3.58 | 0.11 | 0.09 | 0.01
2 | −3.54 | −3.43 | −3.56 | 0.17 | 0.13 | 0.34
3 | −3.51 | −3.36 | −3.53 | 0.24 | 0.17 | 0.34
Estimated dynamic order: 1 | 1 | 1 | 1 | 1 | 1
reliable, as the first canonical correlation coefficients of w_t and ∇w_t are .9925 and .9565, respectively, and so the Unit Root Criterion presented in (32) clearly indicates the existence of a unit root in each case.
Therefore, an adequate stationary transformation is ∇²w_t = ∇∇₁₂z_t, which coincides with the standard result in the literature. Note that Table 7 also determines the order of the regular system, which is set at n̂ = 1.
The next step consists in determining the order of the seasonal subsystem. To this
end, we apply NID to ∇∇₁₂z_t, setting the length of the seasonal cycle to s = 12. By doing so, we obtain the results shown in Table 8. Accordingly, the dynamic order of
Table 9 Output from NID when applied to â₂t in Eq. (37)

Unit roots: 0

n | AIC   | SBC   | HQ    | SVC₂ | NIDC | χ²(5%)
0 | −3.77 | −3.74 | −3.76 | 0.12 | 0.12 | –
1 | −3.76 | −3.69 | −3.73 | 0.14 | 0.12 | 0.36
2 | −3.73 | −3.62 | −3.68 | 0.18 | 0.13 | 0.33
3 | −3.71 | −3.56 | −3.65 | 0.23 | 0.17 | 0.43
Estimated dynamic order: 0 | 0 | 0 | 0 | 1 | 0
the seasonal subsystem is set to n̂_s = 1 (see Garcia-Hiernaux et al. 2012 for the use of NID with seasonal series).
After this preliminary specification analysis, we apply SUBEST1 to the stationary
transformed series, setting the orders of the regular and seasonal subsystems to the
previously determined values: n̂ = 1 and n̂ s = 1. This yields the balanced model:
(1 − 0.06B)(1 − 0.08B¹²) ∇∇₁₂z_t = (1 − 0.43B)(1 − 0.58B¹²) â₁t,        (36)
  (0.21)     (0.17)                  (0.21)     (0.16)

where the figures in parentheses are the parameter standard errors, obtained by computing the corresponding information matrix. The estimate of the percent standard deviation of the error is σ̂_{a1} = 3.68%.
To achieve the highest efficiency with Gaussian data and to exclude some non-significant parameters, one can compute ML estimates using the previous values as initial conditions for the iterative optimization.¹³ In this case, this provides the following result:

∇∇₁₂z_t = (1 − 0.40B)(1 − 0.56B¹²) â₂t,   σ̂_a = 3.67%,        (37)
            (0.08)     (0.08)
which essentially coincides with the results in the literature.
Finally, one can check for autocorrelation by applying NID to the residuals of this
tentative model. If we do so setting the frequencies at s = 1 and s = 12 to check for
additional regular or seasonal dynamic structures, we obtain Tables 9 and 10, whose
results show no evidence of misspecification.
5.2 Semi-automatic identification for real and simulated time series
Now we apply our methodology to nine benchmark time series taken from the literature, using the modal strategy described in Sect. 3.1. Table 11 describes these series, while Tables 12 and 13 summarize the main results. Note that:
13 By the highest efficiency we mean that, even if the simulations in Sect. 2.3.3 suggest that SUBEST1 is asymptotically efficient, they also show that ML estimates have a smaller variance in short samples.
Table 10 Determination of the seasonal subsystem order for â₂t in Eq. (37)

n | AIC   | SBC   | HQ    | SVC₂ | NIDC | χ²(5%)
0 | −3.77 | −3.74 | −3.76 | 0.04 | 0.04 | –
1 | −3.74 | −3.67 | −3.71 | 0.11 | 0.09 | 0.29
2 | −3.72 | −3.61 | −3.68 | 0.16 | 0.12 | 0.66
3 | −3.69 | −3.54 | −3.63 | 0.23 | 0.17 | 0.76
Estimated dynamic order: 0 | 0 | 0 | 0 | 0 | 0
1. The stationary transformation was correctly determined in all the cases.
2. When working with seasonal series (series B, G and H) the final results coincide essentially with the models suggested by the literature, with two minor issues:
(a) Seasonality in series G is weak, so it was not detected until we applied NID to the residuals of the initial regular model;
(b) In the model for series H, the MLE of the seasonal MA parameter converged to unity, which implies a deterministic seasonality.
3. All the models in the column "NID+SUBEST1+ML" in Table 13 coincide with the specifications proposed by the literature.
Regarding the performance of the modal strategy, there was a perfect agreement between all the criteria employed except for series D, where SBC suggests a first-order process, while the remaining procedures suggest the "consensus" second-order specification.
6 Multivariate modeling examples
Now we will show two examples of multivariate modeling, chosen because of the
different cointegration properties of the final models. The first one is not cointegrated,
while the second has one nonstationary common factor.
6.1 Flour price indices in three cities
The first exercise is based on three monthly series of flour price indexes in Buffalo,
Minneapolis and Kansas City, recorded from August 1972 to November 1980, which
will be denoted by yt . This dataset is adequate to test the “Law of one price”, as flour
is a quite homogeneous product and these cities are close enough to allow for product
flows.
Figure 2 displays the profile of these series. Note that they show a high degree of comovement, so they could be cointegrated. Despite this graphical evidence,
previous analyses reject the null of cointegration for this dataset; see Tiao and Tsay
(1989); Lütkepohl and Poskitt (1996). This makes it a good benchmark to test for false
positives in cointegration testing.
Table 11 Datasets employed to estimate the models in Tables 12 and 13

Series | T   | Description
A      | 45  | Daily average number of defects per truck, taken from Burr (1976)
B⋆     | 150 | Quarterly simulated series, taken from Wei (2006)
C      | 35  | Simulated series n. 2, taken from Wold (1964)
D⋆     | 200 | Series W4. Monthly unemployed young women in the U.S. from Jan 1961 to Dec 1985, taken from Wei (2006)
E      | 226 | Series C, Chemical Process Temperature Readings, taken from Box and Jenkins (1970); Box et al. (2015)
F      | 197 | Series A, Chemical Process Concentration Readings, taken from Box and Jenkins (1970); Box et al. (2015)
G      | 178 | Monthly employers in food production in Wisconsin, Jan 1961–Oct 1975, taken from O'Donovan (1983)
H      | 32  | Quarterly Production of Beer in the US, 1975-1Q to 1982-4Q, taken from Wei (2006)
I⋆     | 36  | Simulated series, taken from Makridakis et al. (1983)

⋆ In Tables 12 and 13 we present the DGP for series B and D, and the author's specification and estimates for series I
Table 12 Step-1 univariate models, resulting from the successive application of NID and SUBEST1

A (T = 45). Literature: (1 − 0.43B)(Z_t − 1.79) = a_t, σ̂²_a = 0.21. NID + SUBEST1: (1 − 0.61B)(Z_t − 3.94) = (1 − 0.24B) a_t, σ̂²_a = 0.21.
B (T = 150). Literature: ∇∇₄Z_t = (1 − 0.80B)(1 − 0.60B⁴) a_t, σ̂²_a = 1.00. NID + SUBEST1: (1 + 0.22B)(1 + 0.21B⁴) ∇∇₄Z_t = (1 − 0.56B)(1 − 0.49B⁴) a_t, σ̂²_a = 0.97.
C (T = 35). Literature: ∇Z_t = (1 − 0.51B) a_t, σ̂²_a = 1397.27. NID + SUBEST1: (1 − 0.00B) ∇Z_t = (1 − 0.51B) a_t, σ̂²_a = 1380.35.
D (T = 200). Literature: (1 − 0.70B + 0.49B²) Z_t = a_t, σ̂²_a = 0.60. NID + SUBEST1: (1 − 0.84B + 0.51B²) Z_t = (1 − 0.16B + 0.05B²) a_t, σ̂²_a = 0.66.
E (T = 226). Literature: (1 − 0.82B) ∇Z_t = a_t, σ̂²_a = 0.02. NID + SUBEST1: (1 − 0.82B) ∇Z_t = (1 − 0.02B) a_t, σ̂²_a = 0.02.
F (T = 197). Literature: (1 − 0.92B) Z_t = 1.45 + (1 − 0.58B) a_t, σ̂²_a = 0.10. NID + SUBEST1: (1 − 0.88B) Z_t = 2.08 + (1 − 0.50B) a_t, σ̂²_a = 0.10.
G (T = 178). Literature: (1 − 0.79B) ∇₁₂Z_t = (1 − 0.76B¹²) a_t, σ̂²_a = 1.24. NID + SUBEST1: (1 − 0.56B)(1 − 0.05B¹²) ∇₁₂Z_t = −2.11 + (1 − 0.18B)(1 − 0.35B¹²) a_t, σ̂²_a = 1.54.
H (T = 32). Literature: ∇₄Z_t = 1.49 + (1 − 0.87B⁴) a_t, σ̂²_a = 2.39. NID + SUBEST1: (1 − 0.14B⁴) ∇₄Z_t = 1.21 + (1 − 0.71B⁴) a_t, σ̂²_a = 2.03.
I (T = 36). Literature: Z_t = 51.03 + a_t, σ̂²_a = 818.81. NID + SUBEST1: Z_t = 47.31 + a_t, σ̂²_a = 790.54.
Table 13 Univariate models, resulting from the successive application of NID, SUBEST1 and ML

A (T = 45). Literature: (1 − 0.43B)(Z_t − 1.79) = a_t, σ̂²_a = 0.21. NID + SUBEST1 + ML: (1 − 0.42B)(Z_t − 1.81) = a_t, σ̂²_a = 0.21.
B (T = 150). Literature: ∇∇₄Z_t = (1 − 0.80B)(1 − 0.60B⁴) a_t, σ̂²_a = 1.00. NID + SUBEST1 + ML: ∇∇₄Z_t = (1 − 0.75B)(1 − 0.63B⁴) a_t, σ̂²_a = 0.91.
C (T = 35). Literature: ∇Z_t = (1 − 0.51B) a_t, σ̂²_a = 1397.27. NID + SUBEST1 + ML: ∇Z_t = (1 − 0.51B) a_t, σ̂²_a = 1393.12.
D (T = 200). Literature: (1 − 0.70B + 0.49B²) Z_t = a_t, σ̂²_a = 0.60. NID + SUBEST1 + ML: (1 − 0.72B + 0.41B²) Z_t = a_t, σ̂²_a = 0.66.
E (T = 226). Literature: (1 − 0.82B) ∇Z_t = a_t, σ̂²_a = 0.02. NID + SUBEST1 + ML: (1 − 0.82B) ∇Z_t = a_t, σ̂²_a = 0.02.
F (T = 197). Literature: (1 − 0.92B) Z_t = 1.45 + (1 − 0.58B) a_t, σ̂²_a = 0.10. NID + SUBEST1 + ML: (1 − 0.92B) Z_t = 1.45 + (1 − 0.60B) a_t, σ̂²_a = 0.10.
G (T = 178). Literature: (1 − 0.79B) ∇₁₂Z_t = (1 − 0.76B¹²) a_t, σ̂²_a = 1.24. NID + SUBEST1 + ML: (1 − 0.77B) ∇₁₂Z_t = (1 − 0.61B¹²) a_t, σ̂²_a = 1.31.
H (T = 32). Literature: ∇₄Z_t = 1.49 + (1 − 0.87B⁴) a_t, σ̂²_a = 2.39. NID + SUBEST1 + ML: ∇₄Z_t = 1.55 + ∇₄ a_t, σ̂²_a = 1.64.
I (T = 36). Literature: Z_t = 51.03 + a_t, σ̂²_a = 818.81. NID + SUBEST1 + ML: Z_t = 51.03 + a_t, σ̂²_a = 818.86.
Fig. 2 Logs of flour price indexes in Buffalo, Minneapolis and Kansas City. Monthly data from August
1972 to November 1980
Table 14 Output from NID when applied to log(y_t)

Unit roots: 3. Cointegration relationships: 0

n | AIC   | SBC   | HQ    | SVC₂  | NIDC  | χ²(5%)
0 | −1.18 | −1.03 | −1.12 | 56.21 | 56.21 | –
1 | −5.63 | −5.32 | −5.50 | 8.15  | 8.08  | 0.00
2 | −7.60 | −7.13 | −7.41 | 4.12  | 3.97  | 0.00
3 | −9.02 | −8.39 | −8.77 | 1.18  | 0.97  | 0.00
4 | −8.90 | −8.12 | −8.59 | 1.32  | 1.04  | 0.99
Estimated dynamic order: 3 | 3 | 3 | 3 | 3 | 3
Applying NID to each individual series concludes that they all are I(1). Again, the first canonical correlation is close to unity in the three series, so this decision seems solid. To check for cointegration, we apply NID to the vector log(y_t), detecting again three unit roots for the multivariate process. Here the first three canonical correlations are .99, .94 and .88, respectively. Although the unit root criterion presented in (32) indicates three unit roots, note that the last one is a borderline case. In any case, cointegration is rejected, see Table 14.
Table 14 shows no evidence of any dynamic structure beyond the unit roots. This could be due to the fact that nonstationary components often dominate other weaker dynamic components, whose detection requires a previous stationarity-inducing transformation. For this reason, we apply again the NID algorithm to the stationary
transformation ∇ log(y_t), obtaining the results in Table 15. In this case there is a tie, as three criteria detect a first-order structure while the other three conclude that the order is zero. When this happens, we prefer choosing the higher dynamic order, because an overfitting error can be easily solved by pruning insignificant parameters. Therefore, we choose n̂ = 1.

Table 15 Output from NID when applied to ∇ log(y_t)

Unit roots: 0

n | AIC    | SBC    | HQ     | SVC₂ | NIDC | χ²(5%)
0 | −13.61 | −13.45 | −13.55 | 0.67 | 0.67 | –
1 | −13.83 | −13.52 | −13.71 | 0.80 | 0.73 | 0.09
2 | −13.75 | −13.28 | −13.56 | 0.98 | 0.84 | 0.39
3 | −13.68 | −13.05 | −13.43 | 1.13 | 0.92 | 0.92
4 | −13.59 | −12.81 | −13.28 | 1.34 | 1.05 | 0.99
Estimated dynamic order: 1 | 1 | 1 | 0 | 0 | 0
After deciding the system order, SUBEST1 (with i = 3) provides estimates for α
and its corresponding VARMA E . In this case, AIC, SBC and HQ coincide in choosing
α̂ = (1, 0, 0). Accordingly, the estimated model is:
F̂(B) ∇ log(y_t) = L̂(B) â_t,        (38)

with:
F̂(B) = [ 1 + 0.39B   0   0
          −0.71       1   0
          −0.45       0   1 ],

L̂(B) = [ 1 − 0.82B   1.08B   0.09B
          −0.71       1       0
          −0.45       0       1 ],

Σ̂_a(%) = [ 0.21   −      −
            0.22   0.25   −
            0.22   0.24   0.28 ],        (39)

where the noise covariance estimates are presented in percentage. We then estimate
the previous structure by ML, obtaining:
         ⎡ 1 + 0.41B (0.17)   0   0 ⎤            ⎡ 1 − 0.94B (0.46)   1.32B (0.37)   0 ⎤
F̂(B) =  ⎢ −0.73 (0.11)       1   0 ⎥ ,  L̂(B) = ⎢ −0.73 (0.11)       1              0 ⎥ ,
         ⎣ −0.49 (0.20)       0   1 ⎦            ⎣ −0.49 (0.20)       0              1 ⎦

           ⎡ 0.21 (0.03)   −             −           ⎤
Σ̂a (%) =  ⎢ 0.22 (0.03)   0.25 (0.03)   −           ⎥ ,   (40)
           ⎣ 0.21 (0.03)   0.24 (0.04)   0.28 (0.04) ⎦

with standard errors in parentheses; elements without standard errors are fixed by the
canonical structure.
Table 16 Output from NID when applied to ât ⋆

Unit roots: 0

Results for different criteria:

n   AIC      SBC      HQ       SVC2   NIDC   χ²(5%)
0   −13.96   −13.80   −13.90   0.57   0.57   –
1   −13.87   −13.56   −13.75   0.75   0.68   0.91
2   −13.76   −13.29   −13.57   0.87   0.72   1.00
3   −13.68   −13.05   −13.42   1.09   0.88   0.99
4   −13.60   −12.82   −13.29   1.35   1.07   0.99

Estimated dynamic order:   0   0   0   0   0   0

⋆ ât are the residuals from model (40)
In model (40), the (1,3)-element of L̂(B) was found to be non-significant and was therefore pruned.
Last, Table 16 summarizes the output of NID when applied to the residuals of model
(40), which does not show any symptom of misspecification. The Q matrix of Ljung
and Box (1978), computed with 7 degrees of freedom, confirms this conclusion:
       ⎡ 3.5   3.3   3.2 ⎤
Q(7) = ⎢ 4.6   4.3   4.9 ⎥ .   (41)
       ⎣ 6.9   5.6   4.0 ⎦
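For reference, the Q statistic of Ljung and Box (1978) used above can be computed from the residual autocorrelations as sketched below. This is a generic sketch of the standard formula, not the authors' code:

```python
import numpy as np

def ljung_box_q(resid, k_max):
    """Ljung-Box (1978) portmanteau statistic on the first k_max sample
    autocorrelations of a residual series: Q = T(T+2) * sum_k r_k^2 / (T-k)."""
    e = np.asarray(resid, dtype=float)
    e = e - e.mean()
    t = len(e)
    denom = np.sum(e ** 2)
    q = 0.0
    for k in range(1, k_max + 1):
        r_k = np.sum(e[k:] * e[:-k]) / denom   # lag-k sample autocorrelation
        q += r_k ** 2 / (t - k)
    return t * (t + 2) * q
```

Under the null of white noise, Q(7) is approximately χ² with 7 degrees of freedom (fewer when applied to residuals of an estimated model), whose 5% critical value is 14.07; all the entries of (41) fall well below it.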
The VARMA echelon model (40) implies that the three series share a first-order
stationary AR component, and receive the effect of the innovations in Buffalo and
Minneapolis at t − 1. This structure is represented by seven free parameters, so it is
more parsimonious than, e.g., the nine-parameter VAR(1) specification proposed by
Tiao and Tsay (1989).
On the other hand, note that the error covariance matrix in Equation (40) implies
that the disturbances in this model are highly correlated. This fact would explain the
comovement observed in Fig. 2 and suggests that, in this case, the “Law of one price”
could be described, not by common trends, but by common shocks affecting prices in
different locations.
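The "common shocks" interpretation can be illustrated numerically: innovations drawn with the covariance structure of Σ̂a in (40) are already strongly correlated, so the series comove with no common trend involved. The sketch below is our own illustration, not part of the paper's analysis:

```python
import numpy as np

# Innovation covariance from model (40), in %, with the lower triangle
# completed by symmetry
sigma_a = np.array([
    [0.21, 0.22, 0.21],
    [0.22, 0.25, 0.24],
    [0.21, 0.24, 0.28],
])

# The implied innovation correlations are close to one, e.g. corr(1,2):
corr_12 = sigma_a[0, 1] / np.sqrt(sigma_a[0, 0] * sigma_a[1, 1])  # about 0.96

# Draw correlated shocks via a Cholesky factor and check the comovement
rng = np.random.default_rng(1)
shocks = rng.standard_normal((2000, 3)) @ np.linalg.cholesky(sigma_a).T
sample_corr = np.corrcoef(shocks, rowvar=False)
```

Even without any dynamics, the pairwise sample correlations of these shocks come out near unity, which is the comovement mechanism suggested in the text.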
6.2 US interest rates
We now model four monthly short-term US interest rates, from August 1985 to
January 2003, previously analyzed by Martin-Manjon and Treadway (1997). These are
the Fed's Target Rate (TOt ), the effective rate (TEt ), and the secondary market yields
for 3-month (T3mt ) and 6-month (T6mt ) T-bills. Figure 3 shows the profile of these
series.
We determine the integration order for each individual variable by applying NID.
The unit root criterion clearly indicates that each individual series is I(1). The second
step in the analysis consists in determining the dynamic order for the vector of variables
Fig. 3 Short-term interest rates. Monthly data from August 1985 to January 2003
Table 17 Output from NID when applied to Z1t ⋆

Unit roots: 1
Cointegration relationships: 3

Results for different criteria:

n   AIC     SBC     HQ      SVC2     NIDC     χ²(5%)
0   4.82    4.98    4.88    112.71   112.71   –
1   0.23    0.52    0.35    4.12     4.06     0.00
2   −1.22   −0.80   −1.05   2.35     2.23     0.00
3   −2.00   −1.46   −1.78   1.21     1.02     0.00
4   −2.18   −1.51   −1.90   1.26     1.01     0.00

Estimated dynamic order:   4   4   4   3   4   4

⋆ Z1t = (TEt  T3mt  T6mt  TOt )⊺
by applying NID again. In this case, the procedure detects one unit root and, therefore,
three cointegrating relationships, see Table 17.14
NID also provides estimates for the coefficients in the cointegrating equations, so
the stationarity-inducing transformation of the data is given by:
     ⎡ ∇       0   0   0 ⎤ ⎡ TOt  ⎤
zt = ⎢ −1.01   1   0   0 ⎥ ⎢ TEt  ⎥   (42)
     ⎢ −0.92   0   1   0 ⎥ ⎢ T3mt ⎥
     ⎣ −0.93   0   0   1 ⎦ ⎣ T6mt ⎦
Therefore, cointegration operates between pairs of variables and the stationary
combinations are close to the spreads between each rate and the Fed’s target. Note
14 The numerical result of applying URC in (32) to the first four canonical correlations is
−.127, .055, .265, .649, so we conclude that there is only one unit root in the multivariate process.
Table 18 Output from NID when applied to Z1t ⋆

Unit roots: 0

Results for different criteria:

n   AIC     SBC     HQ      SVC2   NIDC   χ²(5%)
2   −4.04   −3.63   −3.87   2.41   2.39   0.00
3   −4.64   −4.10   −4.42   1.63   1.60   0.00
4   −4.88   −4.21   −4.61   1.52   1.47   0.00
5   −4.81   −4.01   −4.49   1.58   1.52   0.31

Estimated dynamic order:   4   4   4   4   4   4

⋆ Z1t ≃ (∇TOt  TEt − TOt  T3mt − TOt  T6mt − TOt )⊺
also that the structure of Equation (42) has a clear interpretation in terms of economic
controllability, as setting the value of the Target Rate TOt implies choosing the
long-run mean value of TEt , T3mt and T6mt . Applying NID to the stationary-transformed
vector, zt , detects four stationary dynamic components and no additional
unit roots, see Table 18.
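The stationarity-inducing transformation (42) is easy to apply in practice. The sketch below is our own illustration, with hypothetical data and the coefficient estimates quoted above; it forms the three spreads and differences the target rate:

```python
import numpy as np

# Columns assumed ordered as (TO, TE, T3m, T6m); coefficients from Eq. (42)
COINT = np.array([
    [-1.01, 1.0, 0.0, 0.0],   # TE  - 1.01*TO
    [-0.92, 0.0, 1.0, 0.0],   # T3m - 0.92*TO
    [-0.93, 0.0, 0.0, 1.0],   # T6m - 0.93*TO
])

def stationary_transform(y):
    """Map levels y (T x 4) to z_t = (del TO, spreads) as in Eq. (42);
    the first observation is lost to the difference operator."""
    spreads = y @ COINT.T              # (T, 3) cointegrating combinations
    d_to = np.diff(y[:, 0])            # (T-1,) differenced target rate
    return np.column_stack([d_to, spreads[1:]])

# Hypothetical data: TO a random walk, the other rates tied to it
rng = np.random.default_rng(2)
to = np.cumsum(rng.standard_normal(300))
y = np.column_stack([to,
                     1.01 * to + 0.1 * rng.standard_normal(300),
                     0.92 * to + 0.1 * rng.standard_normal(300),
                     0.93 * to + 0.1 * rng.standard_normal(300)])
z = stationary_transform(y)            # (299, 4); every column is stationary
```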
In this situation, the next steps in the analysis would be: (a) determining the KI
vector α, (b) estimating the final model, and (c) performing diagnostic checks. We
do not include these results here because they do not add much to what was shown in
previous examples.
7 Discussion
The model-building approach proposed in this paper has substantial advantages in
comparison with its main alternatives.
7.1 Comparison with the VAR methodology
The procedure to fit a Step-1 model described in Sect. 3.1 is similar to standard VAR
model-building, see Lütkepohl (2005), as it relies on LS methods, explores a succession
of linear models with increasing orders, and compares the resulting models in
terms of both LR tests and ICs. However, there are three important differences.
First, in VAR modeling the exploration of possible dynamic orders is not systematic,
because it skips some intermediate dynamic orders. To see this, consider a vector of
three time series. A standard VAR analysis would consider vector autoregressions of
orders 0, 1, 2, ..., whose corresponding dynamic orders would be 0, 1×3 = 3, 2×3 = 6, ...
In comparison, our method explores the whole sequence of nonnegative integer orders
(0, 1, 2, ...) up to a user-specified upper limit. Because of this, standard VAR modeling
is prone to over- or under-specifying the system order.
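The point about skipped orders is purely arithmetic, as the following sketch shows for the three-series case:

```python
# For m = 3 series, a standard VAR(p) search only reaches system orders that
# are multiples of m, while the method discussed here scans every integer
# up to a user-specified limit.
m, n_max = 3, 9
var_reachable = [p * m for p in range(n_max // m + 1)]   # orders a VAR can hit
full_grid = list(range(n_max + 1))                       # orders explored here
skipped = sorted(set(full_grid) - set(var_reachable))    # orders a VAR skips
```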
Second, the VAR approach considers only autoregressive structures, while Step-1
models have both AR and MA terms. Accordingly, if the DGP has some moving-average
structure, our approach should provide a more parsimonious representation,
even if we do not proceed to Step-2.
Third, while a Step-1 model is comparable to a VAR in terms of the modeling
decisions required (just choosing a scalar order), our approach provides a way to achieve
further parsimony by enforcing a canonical structure (i.e., fitting a Step-2 model)
without much additional complexity. The VAR methodology does not provide such
an option.
7.2 Comparison with VARMA specification methods
In comparison with alternative VARMA specification procedures, such as those in Jenkins
and Alavi (1981), Tiao and Box (1981) and Reinsel (1997), our methodology:
1. ...provides a unified modeling procedure for single and multiple time series because
the same basic decisions are required in both cases,
2. ...is scalable, meaning that the analyst may choose between a quick nonparsimonious Step-1 model and the more efficient Step-2 and Step-3 models,
3. ...is easy to implement in automatic mode, and
4. ...offers a flexible approach to the parsimony principle, which can be enforced in
the latter stages of the analysis by adopting an efficient canonical structure (Step-2)
and pruning non-significant parameters (Step-3).
On the other hand, standard VARMA specification methods: (a) do not enforce
coherence with univariate model-building, (b) are difficult to implement in automatic
mode, (c) are not scalable, as they do not provide cost-efficient relaxations of the
parsimony requirement, and, in contradiction with this, (d) may yield an overparameterized
model, as they do not ensure that the final model is canonical.
7.3 Comparison with other methods using the Kronecker indices
The closest benchmarks for our proposal are the alternative methods to determine the
KIs and, hence, to specify a VARMA E . This literature is rich and includes, among
many works, those of Hannan and Kavalieris (1984b), Poskitt (1992), Lütkepohl and
Poskitt (1996), Tsay (1989b), Li and Tsay (1998) and Poskitt (2016).
The common feature of all these methods is that they directly identify the KIs
α = (n1 , n2 , . . . , nm ), while we first determine the system order, n, and only then
estimate and compare all the models with a sum of KIs equal to n. Our approach has
several advantages:
1. Determining the system order is useful in itself, as it allows one to specify a crude
but fast Step-1 model which may be adequate in some cases, see Sect. 3.1, thus
avoiding the need to determine α.
2. The dynamic order can be obtained with both ICs and LR tests, because lower-order
specifications are nested in higher-order ones. On the other hand, models
with different α are non-nested in general, so they can only be compared with ICs.
3. Limiting the search space to the KIs compatible with a previously estimated system order provides a faster and more robust performance than searching in an
unbounded space.
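Fixing n first sharply bounds the search: the candidate KI vectors are just the ways of writing n as a sum of m nonnegative parts, which can be enumerated directly. The sketch below shows the bookkeeping only, not the estimation and comparison step:

```python
from itertools import product

def ki_candidates(n, m):
    """All Kronecker-index vectors (n_1, ..., n_m) of nonnegative integers
    summing to the previously estimated system order n."""
    return [a for a in product(range(n + 1), repeat=m) if sum(a) == n]

# For n = 1 and m = 3 only three candidates remain: (1,0,0), (0,1,0), (0,0,1);
# for n = 4 and m = 3 there are C(4+3-1, 3-1) = 15 candidates to estimate
# and compare, instead of an unbounded search over all KI vectors.
```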
Besides these advantages, our proposal adds extensions for nonstationary time
series, cointegration and seasonality, not treated by other VARMA E approaches.
7.4 Comparison with the SCM approach
Tiao and Tsay (1989) propose a structured method to specify a canonical model. They
build on the so-called “Scalar Component” (SC), which is the basis of the family of
“Scalar Component Models” (SCM). These concepts are analogous, but not equivalent,
to those of KIs and VARMA E . In a nutshell, they build on a canonical correlation
analysis to find a contemporaneous linear transformation for the vector process that:
(a) reveals simplifying structures, (b) achieves parametric parsimony, and (c) identifies
alternative “exchangeable” parameterizations.
Athanasopoulos et al. (2012) compare comprehensively the SC and KI-based
approaches at theoretical, experimental, and empirical levels. They conclude that:
1. VARMA E and SC models are connected in theory.
2. The greatest advantage of echelon forms is that their construction can be automated,
which is impossible to achieve within the SC identification process.
3. In their Monte Carlo experiments, both procedures work well in identifying some
common VARMA DGPs.
4. SC VARMA models are more flexible because their maximum autoregressive order
does not have to be the same as the order of the moving average component, while
these orders are constrained to be equal for a VARMA E form and, perhaps for this
reason,
5. ...the empirical out-of-sample forecast evaluation shows that SC VARMA models
provide better forecasts than VARMA E models which, in turn, provide better
forecasts than VAR models.
These conclusions are fully consistent with our own experience. In the framework
of this paper, we add significance tests in Step-3 to relax the constraint on the
autoregressive and moving-average orders implied by the echelon form, so our Step-3 and
SC models could potentially have the same forecasting performance. Confirming this
possibility would require an in-depth forecasting power comparison, which exceeds
the scope of this paper.
On the other hand, the methodology proposed here is scalable and includes extensions for nonstationary time series, cointegration and seasonality, not treated by the
SCM approach.
7.5 Software implementation
An important disadvantage of our method in comparison with some alternatives is
that the software required is not commonly available. To address this shortcoming,
we implemented all the computational procedures described in this paper in the E4
functions NID and SUBEST1.
E4 is a MATLAB toolbox for time series modeling, which can be downloaded
at: www.ucm.es/e-4. The source code for all the functions in the toolbox is freely
provided under the terms of the GNU General Public License. This site also includes
a complete user manual and other reference materials.
Supplementary Information The online version contains supplementary material available at https://doi.
org/10.1007/s00362-023-01451-y.
Acknowledgements The authors thank two anonymous referees for their valuable suggestions, which significantly improved this work. The first draft of this work was written while Miguel Jerez was visiting UC
Davis. Special thanks are due to the Department of Economics for providing ideal working conditions.
We also received useful feedback from attendees of the research seminars at UC Davis, Universidad
del Pais Vasco, Universidad Autonoma de Madrid and the 36th Annual International Symposium on
Forecasting. Last, but not least, support from Instituto Complutense de Analisis Economico (ICAE), Programa
Becas Complutenses Del Amo and from Programa de ayudas a grupos de investigacion Santander-UCM
(Econometrics in State Space research group, ref. 940223) is gratefully acknowledged.
Funding Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence,
and indicate if changes were made. The images or other third party material in this article are included
in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If
material is not included in the article’s Creative Commons licence and your intended use is not permitted
by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the
copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References
Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19(6):716–
723
Arun K, Kung S (1990) Balanced approximation of stochastic systems. SIAM J Matrix Anal Appl 11(1):42–
68
Athanasopoulos G, Poskitt D, Vahid F (2012) Two canonical VARMA forms: scalar component models
vis-à-vis the echelon form. Econom Rev 31(1):60–83
Bauer D (2001) Order estimation for subspace methods. Automatica 37:1561–1573
Bauer D (2005) Asymptotic properties of subspace estimators. Automatica 41:359–376
Bauer D (2005) Comparing the CCA subspace method to pseudo maximum likelihood methods in the case
of no exogenous inputs. J Time Ser Anal 26(5):631–668
Bauer D (2005) Estimating linear dynamical systems using subspace methods. Econom Theory 21:181–211
Bauer D, Buschmeier R (2021) Asymptotic properties of estimators for seasonally cointegrated state space
models obtained using the CVA subspace method. Entropy 23:436
Bauer D, Deistler M (1999) Balanced canonical forms for system identification. IEEE Trans Autom Control
44(6):1118–1131
Bauer D, Jansson M (2000) Analysis of the asymptotic properties of the MOESP type of subspace algorithms.
Automatica 36:497–509
Bauer D, Ljung L (2002) Some facts about the choice of the weighting matrices in Larimore type of subspace
algorithms. Automatica 38:763–773
Bauer D, Wagner M (2002) Estimating cointegrated systems using subspace algorithms. J Econom 111:47–
84
Box G, Jenkins G (1970) Time series analysis, forecasting and control. Holden-Day, San Francisco
Box G, Jenkins G, Reinsel G, Ljung G (2015) Time series analysis: forecasting and control. Wiley Series
in Probability and Statistics. Wiley, New York
Burr I (1976) Statistical quality control methods. Marcel Dekker, New York
Casals J, Garcia-Hiernaux A, Jerez M (2012) From general State-Space to VARMAX models. Math Comput
Simul 80(5):924–936
Casals J, Garcia-Hiernaux A, Jerez M, Sotoca S, Trindade A (2016) State-space methods for time series
analysis: theory, applications and software. Chapman-Hall/CRC Press, New York
Casals J, Jerez M, Sotoca S (2002) An exact multivariate model-based structural decomposition. J Am Stat
Assoc 97(458):553–564
Casals J, Sotoca S, Jerez M (1999) A fast and stable method to compute the likelihood of time invariant
state space models. Econ Lett 65(3):329–337
Casals J, Sotoca S, Jerez M (2013) Single versus multiple-source error models for signal extraction. J Stat
Comput Simul 85(5):1053–1069
Deistler M, Peternell K, Scherrer W (1995) Consistency and relative efficiency of subspace methods.
Automatica 31(12):1865–1875
Dickey D, Fuller W (1981) Likelihood ratio statistics for autoregressive time series with a unit root.
Econometrica 49(4):1057–1072
Favoreel W, De Moor B, Van Overschee P (2000) Subspace state space system identification for industrial
processes. J Process Control 10:149–155
Garcia-Hiernaux A (2011) Forecasting linear dynamical systems using subspace methods. J Time Ser Anal
32(5):462–468
Garcia-Hiernaux A, Jerez M, Casals J (2009) Fast estimation methods for time series models in state-space
form. J Stat Comput Simul 79(2):121–134
Garcia-Hiernaux A, Jerez M, Casals J (2010) Unit roots and cointegration modeling through a family of
flexible information criteria. J Stat Comput Simul 80(2):173–189
Garcia-Hiernaux A, Jerez M, Casals J (2012) Estimating the system order by subspace methods. Comput
Stat 27(3):411–425
Gevers MR (1985) ARMA models, their Kronecker indices and their McMillan degree. Int J Control
6(43):1745–1761
Golub G, Van Loan C (1996) Matrix computations. John Hopkins University Press, Baltimore
Hannan EJ, Deistler M (1988) The statistical theory of linear systems. Wiley, New York
Hannan EJ, Kavalieris L (1984) A method for autoregressive-moving average estimation. Biometrika
71:273–80
Hannan EJ, Kavalieris L (1984) Multivariate linear time series models. Adv Appl Probab 16:492–561
Hannan EJ, Quinn B (1979) The determination of the order of an autoregression. J R Stat Soc Ser B
41(2):713–723
Hannan EJ, Rissanen J (1982) Recursive estimation of mixed autoregressive-moving average order.
Biometrika 69:81–94
Ho B, Kalman R (1966) Effective construction of linear state-variable models from input-output functions.
Regelungstechnik 14:545–548
Jenkins G, Alavi A (1981) Some aspects of modelling and forecasting multivariate time series. J Time Ser
Anal 1(2):1–47
Johansen S (1991) Estimation and hypothesis testing of cointegration vectors in Gaussian vector
autoregressive models. Econometrica 59:1551–1580
Kwiatkowski D, Phillips P, Schmidt P, Shin Y (1992) Testing the null hypothesis of stationarity against
the alternative of a unit root: how sure are we that economic time series have a unit root? J Econom
54(1–3):159–178
Larimore WE (1990) Canonical variate analysis in identification, filtering and adaptive control. In:
Proceedings of the 29th conference on decision and control, Hawaii, pp. 596–604
Li H, Tsay RS (1998) A unified approach to identifying multivariate time series models. J Am Stat Assoc
93(442):770–782
Ljung G, Box G (1978) On a measure of lack of fit in time series models. Biometrika 65(2):297–303
Luenberger DG (1967) Canonical forms for linear multivariate systems. IEEE Trans Autom Control 12:290–
293
Lütkepohl H (2005) New introduction to multiple time series analysis. Springer-Verlag, Berlin
Lütkepohl H, Poskitt DS (1996) Specification of echelon form VARMA models. J Bus Econ Stat 14(1):69–79
Makridakis SG, Wheelwright SC, McGee VE (1983) Forecasting: methods and applications. Wiley, New
York
Martin-Manjon R, Treadway A (1997) The Fed controls only one of the two interest rates in the US economy.
ICAE working Paper 9716
Nsiri N, Roy R (1992) On the identification of ARMA echelon-form models. Can J Stat 20:369–386
O’Donovan TM (1983) Short-term forecasting. Wiley, New York
Peternell K (1995) Identification of linear dynamical systems by subspace and realization-based algorithms
(Unpublished doctoral dissertation). TU Wien
Peternell K, Scherrer W, Deistler M (1996) Statistical analysis of novel subspace identification methods.
Signal Process 52:161–177
Poskitt DS (1992) Identification of echelon canonical forms for vector linear processes using least squares.
Ann Stat 20:195–215
Poskitt DS (2016) Vector autoregressive moving average identification for macroeconomic modeling: a new
methodology. J Econom 192:468–484
Qin SJ (2006) An overview of subspace identification. Comput Chem Eng 30:1502–1513
Quenouille MH (1957) The analysis of multiple time series. Griffin, London
Reinsel CG (1997) Elements of multivariate time series analysis, 2nd edn. Springer-Verlag, New York
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464
Tiao GC, Box GEP (1981) Modeling multiple time series with applications. J Am Stat Assoc 76:802–816
Tiao GC, Tsay RS (1989) Model specification in multivariate time series. J R Stat Soc Ser B 51(2):157–213
Tsay RS (1989) Identifying multivariate time series models. J Time Ser Anal 10(4):357–372
Tsay RS (1989) Parsimonious parametrization of vector autoregressive moving average models. J Bus Econ
Stat 7(3):327–341
Van Overschee P, De Moor B (1994) N4SID: subspace algorithms for the identification of combined
deterministic-stochastic systems. Automatica 30(1):75–93
Van Overschee P, De Moor B (1995) A unifying theorem for three subspace system identification algorithms.
Automatica 31(12):1853–1864
Verhaegen M (1994) Identification of the deterministic part of MIMO state space models given in innovations
form from input-output data. Automatica 30(1):61–74
Wei W (2006) Time series analysis univariate and multivariate methods, 2nd edn. Addison Wesley, New
York
Wold HOA (1964) A study in the analysis of stationary time series. Almqvist and Wiksell, Uppsala
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.