Statistical Papers
https://doi.org/10.1007/s00362-023-01451-y
REGULAR ARTICLE
Identification of canonical models for vectors of time series:
a subspace approach
Alfredo Garcia-Hiernaux1 · Jose Casals2
· Miguel Jerez1
Received: 16 November 2021 / Revised: 1 March 2023
© The Author(s) 2023
Abstract
We propose a new method to specify linear models for vectors of time series with
some convenient properties. First, it provides a unified modeling approach for single
and multiple time series, as the same decisions are required in both cases. Second, it
is scalable, meaning that it provides a quick preliminary model, which can be refined
in subsequent modeling phases if required. Third, it is optionally automatic, because
the specification depends on a few key parameters which can be determined either
automatically or by human decision. And last, it is parsimonious, as it allows one
to choose and impose a canonical structure by a novel estimation procedure. Several
examples with simulated and real data illustrate its application in practice.
Keywords System identification · Canonical models · Kronecker indices · Subspace
methods · State-space models
1 Introduction
This paper proposes a new method to specify linear models for vectors of time series
which draws on ideas from two different sources: the subspace methods, originated in
the system identification literature, and the time series analysis approach, built on the
seminal work of Box and Jenkins (1970).
Since the pioneering work of Ho and Kalman (1966), the term "system identification" has referred to the use of statistical and algebraic tools to fit a State-Space (SS) model to
Jose Casals and Miguel Jerez have contributed equally to this work.
Miguel Jerez
mjerezme@ucm.es
Alfredo Garcia-Hiernaux
agarciah@ucm.es
Jose Casals
jmcasalsc@gmail.com
1 DANAE and ICAE, Universidad Complutense de Madrid, Madrid, Spain
2 ICAE, Universidad Complutense de Madrid, Madrid, Spain
a dataset without a priori restrictions. Much of the recent work in this area follows
the subspace methods approach: a family of system identification procedures that rely
heavily on canonical correlation and Least Squares (LS) methods. For a survey see,
e.g., Qin (2006).
On the other hand, modern time series analysis is critically influenced by the classical Box and Jenkins (1970) book and subsequent works. The philosophical contribution of this methodology can be synthesized in three basic principles: (i) "let the data speak," meaning that the specification of reduced-form models should be based on the observable properties of the sample; (ii) "parameterize with parsimony," meaning that models with few parameters should be preferred, all other aspects being equal; and (iii) "all models are wrong, but some are useful," meaning that all specifications must be considered tentative and subject to a structured process of critical revision and diagnostic testing. We subscribe to these principles and have implemented them in the foundations of our approach.
On these bases, our method is structured in three sequential steps:
Step-1 Determine the dynamic order of the system (or McMillan degree), defined as
the minimum number of dynamic components required to realize the system
output. We do this by using subspace methods to estimate a sequence of SS
models over a predetermined range of orders, and then choosing the best
one by means of Likelihood Ratio (LR) tests and information criteria (ICs).
This simple decision, similar to the VAR order selection, is enough to fit a
preliminary “Step-1 model” to the dataset, which might be overparameterized
but useful to cover simple needs.
Step-2 Determine the dynamic order of each individual time series; these orders are also known as Kronecker indices (KIs). We do this by applying a new subspace algorithm, named SUBEST1, which estimates the parameters of the Luenberger Canonical Form corresponding to a given vector of KIs. We use this algorithm
to estimate the models corresponding to all the combinations of KIs which
add up to the previously determined system order. The model with the best ICs
(“Step-2 model”) is therefore a canonical representation for the time series,
and will often be more parsimonious than the Step-1 model. In the univariate
case this step is not required, because the system order coincides with the KI
for the single series and the Step-1 model is, therefore, canonical.
Step-3 The third and final step consists in refining the models obtained in previous
steps, for example by computing Maximum-Likelihood Estimates (MLE) or
by pruning non-significant parameters. Such a “Step-3 model” might be more
parsimonious than the previously fitted models but, on the other hand, requires
more effort and human decision-making.
The computational procedures employed to implement the Step-2, as well as the
distinction between Steps 1, 2 and 3, are original contributions to the literature on
canonical modeling, although closely related to the works of Hannan and Deistler
(1988), Deistler et al. (1995), Bauer and Deistler (1999) and Bauer (2005c).
In comparison with mainstream econometric procedures, our proposal has three
specific advantages, as it:
1. ...provides a unified modeling approach for single and multiple time series, as the
same model-building process is followed in both cases,
2. ...is optionally automatic, meaning that each model is characterized by a few key
integer parameters to be chosen following a clearly structured process, based on
the use of a limited set of sample statistics; so the modeling decisions can be
adopted automatically or by a human analyst,
3. ...is scalable, meaning that: (i) it provides a possibly crude but statistically adequate Step-1 model, which may be enough to satisfy simple analytical needs; (ii) this model can be made parsimonious if required, by enforcing a canonical structure in Step-2; and (iii) additional efficiency can be achieved in Step-3.
The structure of the paper is as follows. Section 2 presents the notation and reviews
some ideas on VARMA and SS systems. It also introduces the subspace methods and their asymptotic properties. Section 3 outlines the methodology proposed in the
relatively simple case of stationary and non-seasonal time series and Sect. 4 extends
the previous method to nonstationary and/or seasonal series. We do this by adding
to the previous schema two specification phases to determine the number of unit
roots and cointegrating relations, and to identify the dynamic orders of the seasonal
and non-seasonal sub-systems. Sections 5 and 6 illustrate the practical application
of this approach to single and multiple time series, respectively, with benchmark
datasets taken from the literature. Last, Sect. 7 compares the method proposed with
alternative methods, provides some concluding remarks and indicates how to obtain
a free MATLAB code which implements all the algorithms required by our proposal.
2 Notation and previous results
In this Section we present the notation employed and the main previous results upon
which our work is based. We begin by defining VARMA and SS systems. Then, we
introduce the subspace methods and finally discuss the relation between canonical
VARMA models (named echelon) and their equivalent SS representation.
2.1 VARMA models
Much work in applied time series analysis is based on the linear dynamic model:
$$\bar{F}(B)\, z_t = \bar{L}(B)\, a_t, \qquad (1)$$
where $z_t \in \mathbb{R}^m$ is observed for $t = 1, \dots, T$; $a_t \in \mathbb{R}^m$ is an innovation vector such that $a_t \sim \mathrm{iid}(0, \Sigma_a)$; $B$ denotes the backshift operator, such that for any sequence $\omega_t$: $B^i \omega_t = \omega_{t-i}$, $i = 0, \pm 1, \pm 2, \dots$; and, finally, the factors $\bar{F}(B)$ and $\bar{L}(B)$ in (1) are given by:
$$\bar{F}(B) = \sum_{j=0}^{p} \bar{F}_j B^j, \qquad \bar{L}(B) = \sum_{j=0}^{q} \bar{L}_j B^j. \qquad (2)$$
Model (1–2) is assumed to be left coprime, with the roots of $\bar{F}(B)$ and $\bar{L}(B)$ greater than unity in modulus, so $z_t$ is stationary and invertible.
If we multiply the left and right-hand side of Eq. (1) by any arbitrary non-defective
matrix we would obtain an observationally equivalent model for zt . Therefore, left
coprimeness is not enough to identify the model. Achieving identifiability requires
additional constraints. For example, normalising $\bar{F}_0 = \bar{L}_0 = I$ yields the standard VARMA($p,q$) introduced by Quenouille (1957). That is because, for a given transfer function, several VARMA($p,q$) models exist, even under left coprimeness. For more details, the equivalence classes of VARMA systems are characterized in Hannan and Deistler (1988), Theorem 2.2.1.
An interesting alternative to a standard VARMA is the VARMA echelon, or VARMA$_E$, representation. System (1–2) is in echelon form if, denoting by $\bar{F}_{kl}(B)$ and $\bar{L}_{kl}(B)$ the $kl$-th elements of $\bar{F}(B)$ and $\bar{L}(B)$, respectively, the polynomial factors in
(1) are uniquely defined by:
$$\bar{F}_{kk}(B) = 1 + \sum_{i=1}^{n_k} \bar{F}_{kk}(i) B^i, \quad \text{for } k = 1, \dots, m, \qquad (3a)$$
$$\bar{F}_{kl}(B) = \sum_{i=n_k - n_{kl} + 1}^{n_k} \bar{F}_{kl}(i) B^i, \quad \text{for } k \neq l, \qquad (3b)$$
$$\bar{L}_{kl}(B) = \bar{L}_{kl}(0) + \sum_{i=1}^{n_k} \bar{L}_{kl}(i) B^i, \ \text{with } \bar{L}_{kl}(0) = \bar{F}_{kl}(0), \ \text{for } k, l = 1, \dots, m, \qquad (3c)$$
where the multi-index $\alpha = (n_1, n_2, \dots, n_m)$, previously named KI, represents the dynamic order of each series, and:
$$n_{kl} = \begin{cases} \min(n_k + 1,\, n_l) & \text{for } l \le k \\ \min(n_k,\, n_l) & \text{for } l > k \end{cases} \qquad k, l = 1, 2, \dots, m. \qquad (4)$$
The KIs uniquely define an echelon canonical VARMA form by means of Eqs.
(3a–4); see Hannan and Deistler (1988), Theorem 2.5.1. Broadly speaking, these
indices determine the zeros, ones and free parameters structure of the echelon form.
By canonical, we mean a unique representation (commonly with the minimum number of parameters) among the output-equivalent VARMA class. Another property of the KIs is that $\sum_{k=1}^{m} n_k = |\alpha| = n$, meaning that the echelon form distributes the dynamic order between the different time series so that their sum equals the McMillan degree,
denoted by n. Specific details about echelon forms can be found in Hannan and Deistler
(1988), Chapter 2, and Lütkepohl (2005), Definition 12.2.
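The degree structure in Eqs. (3a–4) is mechanical enough to compute directly. The following minimal sketch (in Python, with a hypothetical helper name) builds the $n_{kl}$ matrix of Eq. (4) from a KI vector $\alpha$:

```python
import numpy as np

def kronecker_degrees(alpha):
    """Compute the matrix of degrees n_kl of Eq. (4) from a KI vector
    alpha = (n_1, ..., n_m). These degrees fix the zero/free-parameter
    structure of the echelon polynomials in (3a)-(3c)."""
    m = len(alpha)
    n_kl = np.zeros((m, m), dtype=int)
    for k in range(m):
        for l in range(m):
            if l <= k:
                n_kl[k, l] = min(alpha[k] + 1, alpha[l])
            else:
                n_kl[k, l] = min(alpha[k], alpha[l])
    return n_kl

# KI vector of Example 1: alpha = (1, 0), so |alpha| = n = 1
print(kronecker_degrees((1, 0)))  # [[1 0]
                                  #  [1 0]]
```

For $\alpha = (1, 0)$, the entry $n_{21} = 1$ allows one free coefficient in $\bar{F}_{21}(B)$ at lag zero, which corresponds to the 0.50 parameter appearing in $F_0$ of model (6) in Example 1.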
To clarify the exposition above, the following example compares two equivalent
VARMA representations of the same transfer function with α = (1, 0). We will use
this model in different examples throughout this paper.
Example 1: A VARMA(1,1) model vs. its equivalent VARMA$_E$
Consider the left coprime standard VARMA(1,1) model:
$$(I + \bar{F}_1 B)\, z_t = (I + \bar{L}_1 B)\, a_t, \quad a_t \sim \mathrm{iid}\; N(0, \Sigma_a), \qquad (5)$$
with,
$$\bar{F}_1 = \begin{bmatrix} -0.50 & 0 \\ 0.25 & 0 \end{bmatrix}, \quad \bar{L}_1 = \begin{bmatrix} -0.30 & 0.20 \\ 0.15 & -0.10 \end{bmatrix}, \quad \Sigma_a = \begin{bmatrix} 1.00 & 0.40 \\ 0.40 & 1.00 \end{bmatrix}.$$
As $\alpha = (1, 0)$, its equivalent VARMA$_E$ form is defined as:
$$(F_0 + F_1 B)\, z_t = (L_0 + L_1 B)\, a_t, \qquad (6)$$
where,
$$F_0 = L_0 = \begin{bmatrix} 1 & 0 \\ 0.50 & 1 \end{bmatrix}, \quad F_1 = \begin{bmatrix} -0.50 & 0 \\ 0 & 0 \end{bmatrix}, \quad L_1 = \begin{bmatrix} -0.30 & 0.20 \\ 0 & 0 \end{bmatrix},$$
and $\Sigma_a$ coincides with the one in (5), because both models are observationally equivalent and, therefore, have the same innovations.
This example motivates the following remarks:
Remark 1 It is easy to see the output equivalence of models (5) and (6), as pre-multiplying both sides of (6) by $F_0^{-1}$ yields (5).1
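A quick numerical check of this output equivalence, assuming the coefficient values of models (5) and (6) as printed above:

```python
import numpy as np

# Lag-1 coefficients of the standard VARMA(1,1) model (5)
F1_bar = np.array([[-0.50, 0.00], [0.25, 0.00]])
L1_bar = np.array([[-0.30, 0.20], [0.15, -0.10]])

# Coefficients of the echelon form (6), with F0 = L0
F0 = np.array([[1.00, 0.00], [0.50, 1.00]])
F1 = np.array([[-0.50, 0.00], [0.00, 0.00]])
L1 = np.array([[-0.30, 0.20], [0.00, 0.00]])

# Pre-multiplying (6) by the inverse of the unimodular F0
# must recover the lag-1 matrices of (5)
F0_inv = np.linalg.inv(F0)
print(np.allclose(F0_inv @ F1, F1_bar))  # True
print(np.allclose(F0_inv @ L1, L1_bar))  # True
```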
Remark 2 The standard VARMA specification (5) has six coefficients while its equivalent echelon form (6) only has four, parsimony being one of the advantages of canonical versus non-canonical formulations.
Remark 3 As α = (1, 0) in (6), each scalar in the vector corresponds to the highest
dynamic order for the corresponding series. Therefore, the order of the whole system
is n = 1.
Remark 4 When all the KIs are equal, and only in this case, the standard VARMA model (1–2) has an echelon structure and, accordingly, is a canonical representation. Otherwise, the echelon form has fewer parameters than the equivalent standard VARMA which, accordingly, is not canonical.
Remark 5 Models with different KI vectors are not nested in general, so they cannot
be compared using LR tests. Their relative adequacy is often assessed by means of
ICs, see, e.g., Hannan and Kavalieris (1984a); Poskitt (1992); Lütkepohl and Poskitt
(1996).
1 Note that $F_0$ is unimodular. For a general definition of output-equivalent VARMA systems using unimodular matrices, see Hannan and Deistler (1988), Theorem 2.2.1.
As the above remarks point out, when the KIs are different, one can generally obtain
the equivalent VARMA representation for any echelon form by pre-multiplying it by
an appropriate unimodular transformation. Nonetheless, Gevers (1985) shows that, in
general, the standard VARMA representation (1–2) implies that the McMillan degree
is m × max{ p, q} and, therefore, only transfer functions whose system order is a
multiple of m can be adequately represented that way. Additional details on the KIs
nesting structure can be found in Hannan and Deistler (1988), Section 2.6. These facts
are fundamental for our paper, and motivate Sects. 3.1 and 3.2.
2.2 State-Space systems
The linear time-invariant SS representation employed in this paper is:
$$x_{t+1} = \Phi x_t + K a_t, \qquad (7a)$$
$$z_t = H x_t + a_t, \qquad (7b)$$
where $z_t$ and $a_t$ are as in (1–2). We assume that the SS system is:
A1. ...strictly minimum-phase, that is, $|\lambda_i(\Phi - KH)| < 1$, $\forall i = 1, \dots, n$, where $\lambda_i(\cdot)$ denotes the $i$-th eigenvalue,
A2. ...minimal, which implies that the pair $(\Phi, H)$ is observable and $(\Phi, K)$ is controllable, and
A3. ...stable, so: $|\lambda_i(\Phi)| < 1$, $\forall i = 1, \dots, n$.
In comparison with mainstream alternatives, model (7a–7b) has a unique error term affecting both the state and observation equations. This model is sometimes known as the Prediction Error or Innovations Form (IF).2 We will use the latter denomination throughout the paper. The importance of the IF lies in the fact that
1. ...it is a general representation, because any SS system with different unobserved
inputs in (7a–7b) can be written in an equivalent IF (see, Hannan and Deistler 1988,
Chapter 1). Additionally, Casals et al. (1999) present an algorithm to compute the
matrices of an IF equivalent to any SS model by solving an exactly determined
system of equations, and
2. ...it has clear computational advantages for likelihood computation and signal
extraction, because its states can be estimated with an uncertainty that converges
to zero under nonrestrictive assumptions, see Casals et al. (2013).
Most of these results are the core of Chapters 1 and 2 in Hannan and Deistler (1988),
and have also been empirically discussed in Casals et al. (2016), Chapters 2, 5 and 9.
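To fix ideas, the IF (7a–7b) can be simulated in a few lines. This is a minimal sketch with hypothetical parameter values, chosen only to satisfy assumptions A1–A3, and with Gaussian innovations:

```python
import numpy as np

def simulate_if(Phi, K, H, T, rng):
    """Simulate the innovations form (7a-7b):
    x_{t+1} = Phi x_t + K a_t,  z_t = H x_t + a_t,
    with a_t ~ iid N(0, I) in this sketch."""
    n, m = K.shape
    x = np.zeros(n)
    Z = np.empty((T, m))
    for t in range(T):
        a = rng.standard_normal(m)
        Z[t] = H @ x + a          # observation equation (7b)
        x = Phi @ x + K @ a       # state transition (7a)
    return Z

# Hypothetical first-order system (n = 1, m = 2) satisfying A1-A3
Phi = np.array([[0.5]])
K = np.array([[0.25, 0.26]])
H = np.array([[0.64], [-0.29]])
Z = simulate_if(Phi, K, H, T=200, rng=np.random.default_rng(0))
print(Z.shape)  # (200, 2)
```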
2.3 Subspace methods
Subspace methods are used to estimate the parameters in the IF (7a–7b) by applying
generalized LS to a rank-deficient system, see e.g. Qin (2006); Favoreel et al. (2000).
2 More precisely, by IF we refer to the innovations steady-state form, where $K$ and $\Sigma_a$ are invariant.
To do this, we need to write the model (7a–7b) in subspace form, by substituting $a_t$ from (7b) into (7a) and solving the resulting recursion. This yields:
$$x_t = (\Phi - KH)^{t-1} x_1 + \sum_{j=1}^{t-1} (\Phi - KH)^{t-1-j} K z_j, \qquad (8)$$
where the states at time $t$ depend on their initial value, $x_1$, as well as the past values of the output.
On the other hand, solving the recursion in (7a) and substituting the resulting states into the observation Eq. (7b) gives:
$$z_t = H \Phi^{t-1} x_1 + H \sum_{j=1}^{t-1} \Phi^{t-1-j} K a_j + a_t. \qquad (9)$$
Equations (8) and (9) can be written in matrix form as:
$$X_f = L_x^i X_p + L_z Z_p, \qquad (10a)$$
$$Z_f = O_i X_f + V_i A_f, \qquad (10b)$$
where $i$ is a user-defined index which represents the number of past ($p$) and future ($f$) block rows. For simplicity, we will assume hereafter that the scalars $p$ and $f$ are equal to $i$.3 For now, the only requisite for this value is that $i > n$. This dimensioning implies that the $L$ matrices in (10a) are:
$$L_x := (\Phi - KH) \in \mathbb{R}^{n \times n}, \qquad (11)$$
$$L_z := \begin{bmatrix} L_x^{i-1} K & \dots & L_x K & K \end{bmatrix} \in \mathbb{R}^{n \times im}. \qquad (12)$$
Second, given these previous choices, $Z_f$ and $Z_p$ are block-Hankel matrices whose columns are $[z_t^{\top}, z_{t+1}^{\top}, \dots, z_{t+f-1}^{\top}]^{\top}$ and $[z_{t-p}^{\top}, z_{t-p+1}^{\top}, \dots, z_{t-1}^{\top}]^{\top}$, respectively, each column being characterized by $t = p+1, \dots, T-f+1$. Therefore, the dimension of both matrices is $im \times (T - 2i + 1)$. The block-Hankel matrix $A_f$ has the same general structure as $Z_f$, with $a_t$ instead of $z_t$, while $X_p := [x_1, x_2, \dots, x_{T-p-f+1}]$ and $X_f := [x_{p+1}, x_{p+2}, \dots, x_{T-f+1}]$, which have $n$ rows (from assumption A2) and the same number of columns as $Z_f$, are the sequences of past and future states, respectively.
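The construction of $Z_p$ and $Z_f$ is a pure data-rearrangement step, which can be sketched as follows (hypothetical helper name; the indexing convention follows the definitions above with $p = f = i$):

```python
import numpy as np

def block_hankel(Z, start, i):
    """Stack i block rows of the (T x m) sample Z, beginning at row
    `start`; column c collects z_{start+c}, ..., z_{start+c+i-1}."""
    T, m = Z.shape
    N = T - 2 * i + 1                       # number of columns
    return np.vstack([Z[start + j: start + j + N].T for j in range(i)])

rng = np.random.default_rng(0)
T, m, i = 50, 2, 4                          # with p = f = i, as in the paper
Z = rng.standard_normal((T, m))
Zp = block_hankel(Z, 0, i)                  # past block rows z_{t-p}..z_{t-1}
Zf = block_hankel(Z, i, i)                  # future block rows z_t..z_{t+f-1}
print(Zp.shape, Zf.shape)  # (8, 43) (8, 43)
```

Both matrices have $im = 8$ rows and $T - 2i + 1 = 43$ columns, matching the dimensions stated above.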
Finally, matrices $O_i$ and $V_i$ are known functions of the original parameter matrices $\Phi$, $K$ and $H$, see Section 2 of Garcia-Hiernaux et al. (2010), such that:
$$O_i := \begin{bmatrix} H^{\top} & (H\Phi)^{\top} & (H\Phi^2)^{\top} & \dots & (H\Phi^{i-1})^{\top} \end{bmatrix}^{\top} \in \mathbb{R}^{im \times n}, \qquad (13)$$
3 Throughout the paper we fix $i$ by rounding $\log(T)$ to the nearest integer. There is not much literature on optimal values for $p$ and $f$ in short samples. For instance, in terms of forecasting, Garcia-Hiernaux (2011) shows that one obtains better results (in RMSE terms) by combining the forecasts provided by models with different values of $i$, rather than using a single value. Also, as subspace methods are not iterative, one can estimate at a reasonable computational cost all the models for a given sequence of $i$ values, and then choose the value that generates the best in-sample fit.
$$V_i := \begin{bmatrix} I_m & 0 & 0 & \dots & 0 \\ HK & I_m & 0 & \dots & 0 \\ H\Phi K & HK & I_m & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ H\Phi^{i-2}K & H\Phi^{i-3}K & H\Phi^{i-4}K & \dots & I_m \end{bmatrix} \in \mathbb{R}^{im \times im}. \qquad (14)$$
From assumption A1 and for sufficiently large $i$, $L_x^i$ tends to the null matrix. Consequently, substituting Eq. (10a) in (10b) gives:
$$Z_f \simeq \Theta_i Z_p + V_i A_f, \qquad (15)$$
where $\Theta_i := O_i L_z$.
Equation (15) is the basis of subspace methods. It is a regression model which can be consistently estimated by LS, because the columns in $V_i A_f$ are uncorrelated with the regressor $Z_p$ (from assumption A3).
Note that, for sufficiently large $i$, the matrix $\Theta_i$ has a reduced rank, $n$, which is less than its dimension, $im \times im$. So, estimating (15) requires reduced-rank LS. The estimation problem can be written as:
$$\min_{\Theta_i} \left\| W_1 \left( Z_f - \Theta_i Z_p \right) \right\|_F^2, \quad \text{such that } \operatorname{rank}(\Theta_i) = n, \qquad (16)$$
where $\| \cdot \|_F$ is the Frobenius norm. This problem is solved by computing:
$$\hat{\Theta}_i = Z_f Z_p^{\top} \left( Z_p Z_p^{\top} \right)^{-1}, \qquad (17)$$
and then applying an SVD (denoted as $\stackrel{svd}{=}$) to a weighted version of $\hat{\Theta}_i$, to deal with the rank reduction. Van Overschee and De Moor (1995) proved that several subspace methods are equivalent to computing:
$$W_1 \hat{\Theta}_i W_2 \stackrel{svd}{=} U_n S_n V_n^{\top} + \hat{E}_n, \qquad (18)$$
where $\hat{E}_n$ denotes the approximation error arising from the use of a system order $n$, and $W_1$, $W_2$ are two nonsingular weighting matrices, whose composition depends on the specific algorithm employed. While $W_2 = (Z_p Z_p^{\top})^{\frac{1}{2}}$ is common to most subspace methods (only for systems without observable inputs), there are differences for $W_1$; see Table 1.
Table 1 shows that the MOESP and N4SID subspace algorithms use the identity as weighting matrix. In contrast, when CVA is used, the singular values obtained from the SVD coincide with the canonical correlations between $Z_f$ and $Z_p$. Alternatively, the weighting matrix used in this paper, $W_1 = (Z_f \Pi^{\perp}_{Z_p} Z_f^{\top})^{-\frac{1}{2}}$, is an estimate of the inverse of the square root of the error covariance matrix, as $\Pi^{\perp}_{Z_p}$ is the projection onto the orthogonal complement of $Z_p$. As we will discuss later (see Sect. 2.3.3), adequately choosing these weightings is crucial for some asymptotic results.
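Equations (17)–(18), with the weightings used in this paper, can be sketched as follows. The helper names are hypothetical, and a symmetric eigendecomposition stands in for the matrix square root:

```python
import numpy as np

def sqrtm_psd(A):
    """Symmetric square root of a positive semi-definite matrix."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

def reduced_rank_step(Zf, Zp, n):
    """Eqs. (17)-(18): unrestricted LS estimate of Theta_i, then an SVD
    of W1 @ Theta @ W2 truncated at order n, with the weightings of
    this paper: W1 = (Zf Pi_perp Zf')^{-1/2}, W2 = (Zp Zp')^{1/2}."""
    ZpZp = Zp @ Zp.T
    Theta = Zf @ Zp.T @ np.linalg.inv(ZpZp)                   # Eq. (17)
    Pi_perp = np.eye(Zp.shape[1]) - Zp.T @ np.linalg.inv(ZpZp) @ Zp
    W1 = np.linalg.inv(sqrtm_psd(Zf @ Pi_perp @ Zf.T))
    W2 = sqrtm_psd(ZpZp)
    U, s, Vt = np.linalg.svd(W1 @ Theta @ W2)                 # Eq. (18)
    return U[:, :n], s[:n], Vt[:n]

rng = np.random.default_rng(1)
Zp = rng.standard_normal((6, 200))          # im x N past block-Hankel
Zf = rng.standard_normal((6, 200))          # im x N future block-Hankel
Un, sn, Vnt = reduced_rank_step(Zf, Zp, n=1)
print(Un.shape, sn.shape, Vnt.shape)  # (6, 1) (1,) (1, 6)
```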
Table 1 Weighting matrix $W_1$ in Eq. (18) in different subspace algorithms

Algorithm:   CVA                                   MOESP   N4SID   This paper
$W_1$:       $(Z_f Z_f^{\top})^{-\frac{1}{2}}$     $I$     $I$     $(Z_f \Pi^{\perp}_{Z_p} Z_f^{\top})^{-\frac{1}{2}}$

where $\Pi^{\perp}_{Z_p} = I - Z_p^{\top} (Z_p Z_p^{\top})^{-1} Z_p$
From Eq. (18), matrices in (7a–7b) are usually estimated following two main
approaches. The first one uses the extended observability matrix, sometimes known as
shift invariance approach, as it uses the shift invariance property of Oi . The second
uses the estimates of the states sequence, and is usually termed state approach.
2.3.1 Shift invariance approach
The extended observability matrix in (13) can be estimated from (18) as $\hat{O}_i = W_1^{-1} U_n S_n^{1/2}$, where we use that $\Theta_i := O_i L_z$. Note that if a different factorization is used, the resulting estimates would be observationally equivalent.
From Arun and Kung (1990) and Verhaegen (1994), we can obtain the matrix $\Phi$ in (7a–7b) as:
$$\hat{\Phi} = \overline{O}_i^{\dagger}\, \underline{O}_i, \qquad (19)$$
where $\overline{O}_i$ and $\underline{O}_i$ result from removing the last and first $m$ rows, respectively, from $\hat{O}_i$, and $A^{\dagger}$ denotes the Moore–Penrose pseudo-inverse of $A$ (see, e.g., Golub and Van Loan 1996). An estimate of $H$ is obtained from the first $m$ rows of $\hat{O}_i$.
Last, from the SVD (18) we also get:
$$\hat{L}_z = S_n^{1/2} V_n^{\top} W_2^{-1}, \qquad (20)$$
and $\hat{K}$ can be easily obtained using (12).4
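The shift invariance logic of Eq. (19) can be illustrated on a noise-free observability matrix built from known parameters; with an exact $O_i$ the recovery is exact (example values only, hypothetical helper name):

```python
import numpy as np

def shift_invariance(O_i, m):
    """Eq. (19): recover Phi (and H) from an extended observability
    matrix O_i = [H; H Phi; ...; H Phi^{i-1}] via its shift structure."""
    O_up = O_i[:-m]                       # O_i without its last m rows
    O_down = O_i[m:]                      # O_i without its first m rows
    Phi = np.linalg.pinv(O_up) @ O_down   # Moore-Penrose pseudo-inverse
    H = O_i[:m]                           # first block row
    return Phi, H

# Noise-free check with example parameter values
Phi_true = np.array([[0.8, 0.2], [-0.4, -0.5]])
H_true = np.eye(2)
i = 4
O_i = np.vstack([H_true @ np.linalg.matrix_power(Phi_true, j) for j in range(i)])
Phi_hat, H_hat = shift_invariance(O_i, m=2)
print(np.allclose(Phi_hat, Phi_true), np.allclose(H_hat, H_true))  # True True
```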
2.3.2 State approach
This second approach is based on the estimation of the state sequence, see e.g., Larimore (1990); Van Overschee and De Moor (1994). These algorithms estimate the model parameters by solving a simple set of LS problems.
From $\hat{L}_z$ obtained in (20), we can compute the state sequence as:
$$\hat{X}_f = \hat{L}_z Z_p, \qquad (21)$$
and similarly, $\hat{X}_{f^+} = \hat{L}_z Z_{p^+}$, where $Z_{p^+}$ is like $Z_p$, but adding a '+1' to all the subscripts.5 Once the state sequence is approximated, the estimates $(\hat{H}, \hat{\Phi}, \hat{K})$ are
4 One can alternatively solve the Riccati equation to obtain an estimate of $K$.
5 In these two expressions we use that $L_x^i$ tends to the null matrix for large $i$.
obtained as follows:
$$\hat{H} = Z_{f_1} \hat{X}_f^{\top} \left( \hat{X}_f \hat{X}_f^{\top} \right)^{-1}, \qquad (22)$$
where $Z_{f_1}$ denotes the first block row of $Z_f$. Building on these estimates, we compute the residuals $\hat{A}_{f_1} = Z_{f_1} - \hat{H} \hat{X}_f$ and, finally, get:
$$\begin{bmatrix} \hat{\Phi} & \hat{K} \end{bmatrix} = \hat{X}_{f^+} \begin{bmatrix} \hat{X}_f \\ \hat{A}_{f_1} \end{bmatrix}^{\top} \left( \begin{bmatrix} \hat{X}_f \\ \hat{A}_{f_1} \end{bmatrix} \begin{bmatrix} \hat{X}_f \\ \hat{A}_{f_1} \end{bmatrix}^{\top} \right)^{-1}. \qquad (23)$$
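The two LS problems (22)–(23) reduce to a few matrix products. A sketch with hypothetical inputs (in practice $\hat{X}_f$ and $\hat{X}_{f^+}$ would come from Eq. (21)):

```python
import numpy as np

def state_ls(Zf1, Xf, Xf_plus):
    """Eqs. (22)-(23): LS regressions on the estimated state sequence.
    Zf1 is the first block row of Z_f (m x N); Xf, Xf_plus are the
    state sequences (n x N), Xf_plus shifted one period ahead."""
    H = Zf1 @ Xf.T @ np.linalg.inv(Xf @ Xf.T)        # Eq. (22)
    Af1 = Zf1 - H @ Xf                               # residuals
    W = np.vstack([Xf, Af1])                         # stacked regressors
    PhiK = Xf_plus @ W.T @ np.linalg.inv(W @ W.T)    # Eq. (23)
    n = Xf.shape[0]
    return H, PhiK[:, :n], PhiK[:, n:]               # H, Phi, K

rng = np.random.default_rng(2)
n, m, N = 2, 2, 300
Xf = rng.standard_normal((n, N))                     # hypothetical states
Xf_plus = rng.standard_normal((n, N))
Zf1 = rng.standard_normal((m, N))
H_hat, Phi_hat, K_hat = state_ls(Zf1, Xf, Xf_plus)
print(H_hat.shape, Phi_hat.shape, K_hat.shape)  # (2, 2) (2, 2) (2, 2)
```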
2.3.3 Asymptotic properties
This section briefly discusses the asymptotic properties of subspace methods.
The estimates resulting from the main subspace methods are known to be strongly
consistent and asymptotically normal under mild assumptions. Bauer (2005a) surveys
the literature on this subject and gives conditions for consistency and asymptotic
normality for many subspace methods. This is done for the system (7a–7b) and also
when observable inputs are included.
An effort has also been made to study the efficiency of these estimators, e.g., by
comparing their asymptotic variances. In contrast with consistency and asymptotic
normality, here we find some differences among the methods. Bauer and Jansson
(2000) and Bauer and Ljung (2002) provide numerical approximations of the asymptotic variance of the estimators for several algorithms. The most relevant result in this line is presented by Bauer (2005b), who proves that the CVA weighting matrices (see Table 1), together with the state approach (see Sect. 2.3.2), lead to estimates that are asymptotically equivalent to (pseudo) MLE,6 this being the only asymptotic efficiency result currently available in the literature.7
The algorithm suggested in this paper, named SUBEST1 (see Sect. 3.2 for details), follows a two-stage approach with the weighting matrix $W_2$ given in Table 1. So, Bauer's (2005b) theorem cannot be directly applied, and we do not have a formal proof of its asymptotic efficiency. For this reason, we conduct a simulation where the mean-square errors of SUBEST1 and ML (Casals et al. 1999) estimates are computed. Additionally, the results obtained in Deistler et al. (1995) for other subspace methods and the same data generating process (DGP) are included for comparison. The model
(7a–7b) considered in the simulation is:
$$\Phi = \begin{bmatrix} 0.8 & 0.2 \\ -0.4 & -0.5 \end{bmatrix}, \quad H = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad K = \begin{bmatrix} 1.5 & 0 \\ -0.2 & -0.8 \end{bmatrix}, \qquad (24)$$
with $a_t \sim \mathrm{iid}\; N(0, I)$. Following Deistler et al. (1995), the exercise is performed for
T = 2000, 4000, 8000, 16000 and R = 500 replications for each sample size, so that
the results are comparable. We consider two measures for the quality of SUBEST1
6 As in Bauer (2005b), pseudo means that the Gaussian likelihood is maximized even if the true noise
distribution is unknown.
7 No equivalent result has been obtained for systems with exogenous variables.
Table 2 Simulation results for model (24)

Algorithm        T = 2000   T = 4000   T = 8000   T = 16000
M(SUBEST1)       0.8594     0.8247     0.8104     0.8138
M(MLE)           0.8307     0.8088     0.8034     0.8116
M(ECH)           1.0040     0.8900     0.9084     0.9376
M(CVA)           0.8929     0.7745     0.7796     0.8234
CRB              0.8195     0.8195     0.8195     0.8195
D(SUBEST1)       0.0424     0.0234     0.0119     0.0050
D(ECH)           0.1418     0.1381     0.1460     0.1190
D(CVA)           0.0224     0.0115     0.0073     0.0038

The row M(SUBEST1) shows the average of $T$ times the mean-square error of the SUBEST1 estimates for all the parameters of the model, see (25). The row D(SUBEST1) displays the average of $T$ times the mean-square difference between the SUBEST1 and MLE estimates for all the parameters of the model, see (26). CRB is the average of the Cramer-Rao Bounds obtained from the inverse of the Fisher information matrix
estimates:
$$M(\text{SUBEST1}) = \frac{T}{R} \sum_{i=1}^{R} \left( \hat{\theta}_i^{\text{SUBEST1}} - \theta_i \right)^2, \qquad (25)$$
and
$$D(\text{SUBEST1}) = \frac{T}{R} \sum_{i=1}^{R} \left( \hat{\theta}_i^{\text{SUBEST1}} - \hat{\theta}_i^{\text{ML}} \right)^2, \qquad (26)$$
where $\hat{\theta}_i^{\text{SUBEST1}}$, $\hat{\theta}_i^{\text{ML}}$ denote the SUBEST1 and ML estimates, while $\theta_i$ denotes the true values given in (24). The quantity $M(.)$ is $T$ times the sample mean-squared error of the estimate. For asymptotically efficient methods, $M(.)$ converges to the Cramer-Rao Bound (CRB), which is a lower bound for the asymptotic variance of the parameter estimates. On the other hand, $D(.)$ is $T$ times the mean-squared difference between the corresponding method and ML estimates. For an asymptotically efficient method, $D(.) \to 0$ as $T \to \infty$.
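The $M(.)$ and $D(.)$ statistics of Eqs. (25)–(26) are straightforward to compute from a set of replicated estimates; a minimal sketch with hypothetical toy numbers:

```python
import numpy as np

def m_stat(theta_hat, theta_true, T):
    """M(.) of Eq. (25): T times the sample mean-squared error
    over the R replicated estimates in theta_hat."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    return T * np.mean((theta_hat - theta_true) ** 2)

def d_stat(theta_hat, theta_ml, T):
    """D(.) of Eq. (26): T times the mean-squared difference between
    a method's estimates and the ML estimates, replication by replication."""
    a = np.asarray(theta_hat, dtype=float)
    b = np.asarray(theta_ml, dtype=float)
    return T * np.mean((a - b) ** 2)

# Toy illustration: R = 4 replications of a single parameter, T = 100
est = [0.48, 0.52, 0.51, 0.49]    # hypothetical SUBEST1 estimates
ml = [0.49, 0.51, 0.50, 0.50]     # hypothetical ML estimates
print(m_stat(est, 0.50, T=100), d_stat(est, ml, T=100))
```

For an asymptotically efficient method the second statistic would shrink toward zero as $T$ grows, which is the pattern shown by the D(SUBEST1) row of Table 2.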
The results of the simulation are summarized in Table 2 and Fig. 1. Table 2 shows
the average of M(.) and D(.) statistics for all the parameters in (24), for SUBEST1,
the echelon subspace estimator (ECH) and CVA, computed for the same example by
Deistler et al. (1995), and, finally, the corresponding CRB. The results suggest that
SUBEST1, contrary to ECH, yields estimates that are as asymptotically efficient as
those obtained by ML or CVA; see Table 2. Figure 1 evidences that M(SUBEST1)
→ CRB and D(SUBEST1) → 0, as T → ∞. This conclusion was confirmed by
many other simulations (see the online Appendix at https://doi.org/10.100/s00362023-01451-4), while no counterexamples have been found. This result supports the
use of SUBEST1 for the modeling method presented in the rest of the paper.
Fig. 1 Results of the simulation of model (24). Values for ECH and CVA come from Peternell (1995); see
Table 2 for more detail
2.4 Relationships between SS and VARMAE representations
As with VARMA models, the IF of a given dynamic system is not unique. Note that, for any nonsingular matrix $T$, applying the transformation $x_t^* = T^{-1} x_t$, $\Phi^* = T^{-1} \Phi T$, $K^* = T^{-1} K$, $H^* = H T$ to any IF yields an alternative output-equivalent representation.
A canonical IF representation is characterized by two elements: a specific structure of the transition matrix $\Phi$ and a unique transformation matrix $T$. The main property
of canonical representations is that they realize the system output as a function of a
unique parameter set and, therefore, are exactly identified.8
Hannan and Deistler (1988) show the equivalence between the VARMA$_E$ and the echelon IF by analyzing the structure of the linear dependence relations between the rows of the Hankel matrix $\mathcal{H} = OC$, where $O$ is the infinite version of (13) and $C := [K, \Phi K, \Phi^2 K, \dots]$; see Hannan and Deistler (1988), Theorems 2.5.1 and 2.5.2.
In this paper we use the SS Luenberger Canonical Form (LCF).9 This representation is convenient because it is easy to obtain from it the equivalent VARMA$_E$ representation. In particular, Proposition 2 in Casals et al. (2012) proves that if the coefficients of an IF process are known, then: (1) the KI vector can be derived from the observability matrix (13), and (2) the VARMA$_E$ representation can be obtained from a change of coordinates that transforms any IF into its corresponding LCF. We will use this result in Sect. 3.2.
3 Modeling stationary datasets
This section details the main steps of our modeling strategy, which consist in: (i)
determining the system order, (ii) estimating the KIs compatible with the previous
decision, and (iii) refining the resulting model.
3.1 Step-1: Estimating the system order and a preliminary model
In our approach, the first step consists in determining the minimum dynamic order (or
McMillan degree) required to realize zt .
The following remarks and examples may help to clarify this concept:
1. A model with n = 0 would be static, as its output could be realized by 0 dynamic
components;
2. ...when n = 3 the observable time series can be realized as a linear combination
of three first-order dynamic components;
3. ...the order of a standard VARMA( p,q) process is m × max{ p, q}; and,
4. ...the output of an $n$-order system can also be realized by systems of order $n+1$, $n+2$, ... Hence, a unique characterization of the McMillan degree requires the dimension of the system to be minimal (see assumption A2).
The literature typically determines n by estimating a sequence of SS models over a
predetermined search space for the McMillan degree, n = 0, 1, 2... These models can
be efficiently estimated using subspace techniques, see Favoreel et al. (2000); Bauer
(2005a), and compared with each other by means of: (a) LR tests, see Tsay (1989a);
Tiao and Tsay (1989), (b) information criteria, see Akaike (1974); Schwarz (1978);
Hannan and Quinn (1979), or (c) canonical correlation criteria, see Bauer (2001);
Garcia-Hiernaux et al. (2012).
8 Note, however, that there can be different canonical forms for the same system, each of them being exactly identified. Canonical forms present in most cases the minimum number of parameters required to realize the system output.
9 Luenberger (1967) proposed the first canonical form for multivariate processes.
Table 3 Output from NID when applied to $z_t$ in model (5)

Unit roots: 0

Results for different criteria:
n    AIC    SBC    HQ     SVC2   NIDC   χ²(5%)
0    6.33   6.41   6.36   1.80   1.80   –
1    5.54   5.72   5.61   0.56   0.56   0.00
2    5.52   5.81   5.64   0.68   0.67   0.40
3    5.59   5.98   5.74   0.84   0.83   0.74

Estimated dynamic order:
     2      1      1      1      1      1
We determine the McMillan degree automatically by following the modal strategy suggested by Garcia-Hiernaux et al. (2012), which consists in determining $n$ as the modal value of the orders chosen by a selection of the methods mentioned above. We refer to this procedure as NID (N-IDentification), and the following example illustrates its application.
Example 2: Estimating the Step-1 model
Consider again the VARMA(1,1) model (5) used in Example 1. We simulate 100 observations of this process and apply to them the previously mentioned NID algorithm, obtaining the results displayed in Table 3, where the acronyms AIC, SBC and HQ denote the Akaike (1974), Schwarz (1978) and Hannan and Quinn (1979) information criteria, respectively; SVC2 and NIDC denote the canonical correlation-based criteria proposed by Garcia-Hiernaux et al. (2012); and, last, χ²(5%) denotes the p-value of the LR test for the significance of the canonical correlations proposed by Tsay (1989a). The 5% value in parentheses denotes that automatic testing will reject the null if the p-value is smaller than 5%. Last, the orders automatically chosen by each criterion are displayed in the last row of Table 3. Note that the correct order (n = 1) was the modal choice, and the only criterion that overestimates n is the AIC.
After determining the system order, it is easy to apply any of the subspace methods described in Sect. 2.3 to estimate the parameters of an $n$-order IF (7a–7b). In this case, we use a shift invariance approach with $W_1 = (Z_f \Pi^{\perp}_{Z_p} Z_f^{\top})^{-\frac{1}{2}}$ and $i = 5$.10 We will call this a "Step-1 model", and the estimated parameter matrices for this sample are:
$$\hat{\Phi} = .519, \quad \hat{K} = \begin{bmatrix} .251 & .258 \end{bmatrix}, \quad \hat{H} = \begin{bmatrix} .640 \\ -.292 \end{bmatrix}, \quad \hat{\Sigma}_a = \begin{bmatrix} .924 & .376 \\ .376 & 1.001 \end{bmatrix}.$$
10 Unless otherwise indicated, we generally fix $i = p = f$ to the integer closest to $\log(T)$.
Chapter 9 in Casals et al. (2016) describes a simple numerical procedure to write an IF as an equivalent balanced VARMA($n$,$n$) model.11 Applying this procedure to the previous IF returns the standard VARMA(1,1) model:
$$\hat{\bar{F}}_0 = \hat{\bar{L}}_0 = I, \quad \hat{\bar{F}}_1 = \begin{bmatrix} -.519 & 0 \\ .237 & 0 \end{bmatrix}, \quad \hat{\bar{L}}_1 = \begin{bmatrix} -.236 & .165 \\ .120 & -.075 \end{bmatrix},$$
whose values are very close to the true coefficients in (5).
In the univariate case, the Step-1 model is a canonical representation which can only be refined by pruning non-significant parameters, see Sect. 3.3. In the multivariate case, ensuring a canonical representation requires further refining. We discuss this issue in the next Section. However, even if this Step-1 model may be somewhat overparameterized, it might be sufficient to achieve the objectives of the model-building exercise. In this case, the modeling process could stop here.
3.2 Step-2: Estimating the KIs and their corresponding canonical model
We already mentioned that IF and VARMA representations are interchangeable. In particular, Casals et al. (2012) show how to obtain the VARMA_E representation of any IF when the matrices of the system and the KI vector, α, are known.
However, the equivalence issue is still unsolved when the IF coefficients are not known, but estimated from a finite sample. This happens because the estimated observability matrix does not present the linear dependence structure implied by α, and then the similar transformation proposed in Casals et al. (2012) does not lead to the LCF and, hence, to the equivalent VARMA_E.
To solve this issue, we devised SUBEST1, a novel subspace algorithm that estimates
the LCF model corresponding to any given KI vector α. The estimated parameters
resulting from SUBEST1 coincide with those in a VARMA E form. The algorithm
follows a two-stage procedure. Initial estimates of the system matrices (using a shift
invariance approach, see Sect. 2.3.1) are used to obtain a transformation matrix T̂ .
We then use this matrix to transform the states and finally apply a state approach (see
Sect. 2.3.2) to estimate the echelon form. As far as we know, this procedure is new in
the literature and is the main contribution of the paper.
To describe the steps of SUBEST1, we will use the subindex notation {i:j}, meaning that any matrix A_{i:j} is the partition of A that includes the rows from i to j.
Step-1. Assume that n is known or consistently estimated using the procedures presented in Sect. 3.1. Estimate Φ, K and H using the subspace methods presented in Sect. 2.3.1 and the system order n. We denote these estimates by Φ̂, K̂ and Ĥ.
Step-2. For a given KI vector α, apply Proposition 2 in Casals et al. (2012) to obtain the transformation matrix T such that Φ* = T⁻¹ΦT and H* = HT present the LCF structure when α, Φ and H are known. Note that, even if the α imposed as a constraint is the true KI, this does not lead to the ones/zeros structure of the LCF, as T is computed with Ĥ and Φ̂, and not with the true (unknown) matrices of the system.

¹¹ The term "balanced" refers to the fact that the resulting VARMA model has the same autoregressive and moving average orders, both equal to the system order.
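The role of T can be checked numerically: a similarity transformation changes the state basis but not the input–output behavior of the system, since the Markov parameters HΦʲK are invariant (self-contained check with randomly generated matrices, not the example's estimates):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2
Phi = 0.3 * rng.normal(size=(n, n))           # hypothetical system matrices
K = rng.normal(size=(n, m))
H = rng.normal(size=(m, n))
T = np.eye(n) + 0.1 * rng.normal(size=(n, n))  # an invertible transformation

Phi_s = np.linalg.inv(T) @ Phi @ T             # transformed system
K_s = np.linalg.inv(T) @ K
H_s = H @ T

# The Markov parameters H Phi^j K do not depend on the state basis
for j in range(5):
    assert np.allclose(H @ np.linalg.matrix_power(Phi, j) @ K,
                       H_s @ np.linalg.matrix_power(Phi_s, j) @ K_s)
```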
Step-3. Obtain the states X_f and X_{f+} from the model in Step-1, as explained in Sect. 2.3.2, and denote these estimates by X̂_f and X̂_{f+}, respectively. Then we compute the sequences of states, X*_f and X*_{f+}, that correspond to the LCF by applying the previous similar transformation T as X̂*_f = T⁻¹ X̂_f and X̂*_{f+} = T⁻¹ X̂_{f+}.
Step-4. The LCF is defined by the matrices Φ* and H* as:

Φ* = [ F₁           Q₁   0    · · ·  0
       F₂           0    Q₂   · · ·  0
       ...          ...  ...  . . .  ...
       F_{nmax−1}   0    0    · · ·  Q_{nmax−1}
       F_{nmax}     0    0    · · ·  0 ]    and    H* = [ F₀  0  · · ·  0 ],        (27)

where Φ* is a companion-type matrix such that each F_j block (j = 1, ..., n_max) has a number of rows equal to the number of KIs greater than or equal to j, m̄ = Σ_{k=1}^m min{n_k, 1} columns and some null elements. In fact, the (k,l)-th element of F_j will be nonzero only if j ∈ [n_k − n_kl + 1, n_k], where n_kl was defined in (4). Each Q_k block is a zeros/ones matrix, with as many columns as the number of KIs greater than or equal to k. If the endogenous variables are sorted according to their corresponding KI, the structure of Q_k will be Q_k = [I_{k+1} 0]⊺, where I_{k+1} is an identity matrix with the same number of rows as F_{k+1}.
About H*, F₀ is an m × m̄ matrix, such that the rows corresponding to components with a nonzero KI can be organized in an m̄ × m̄ lower triangular matrix with ones in the main diagonal.
Therefore, Φ* and H* can be written as:

Φ* = [ Φ*_{1:m̄}  Φ*_{m̄+1:n} ],   H* = [ H̄₁* + I  0
                                          H₂*      0 ],        (28)

where H̄₁* = H₁* − I, and the matrices Φ*_{1:m̄}, H̄₁* and H₂* include parameters to be estimated and zeros. On the other hand, the matrix Φ*_{m̄+1:n} is invariant and is only made up of ones and zeros. Therefore, estimating Φ* and H* boils down to estimating some elements in their m̄ first columns, i.e., some elements in Φ*_{1:m̄}, H̄₁* and H₂*.
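The block sizes in (27)–(28) are pure bookkeeping on the KI vector, which can be made explicit (a sketch of the layout only; the zero/one pattern inside each F_j additionally depends on the n_kl of Eq. (4), which is not reproduced here):

```python
def lcf_layout(alpha):
    """Block layout of Phi* in (27): F_j has as many rows as there are
    KIs >= j, and the free parameters sit in the first m_bar columns."""
    n = sum(alpha)                    # system order |alpha|
    n_max = max(alpha)
    rows_F = [sum(1 for nk in alpha if nk >= j) for j in range(1, n_max + 1)]
    m_bar = sum(1 for nk in alpha if nk >= 1)
    assert sum(rows_F) == n           # the blocks stack up to an n x n matrix
    return n, n_max, m_bar, rows_F

print(lcf_layout((2, 1, 1)))  # (4, 2, 3, [3, 1])
```

For the trivariate example with α = (2, 1, 1) used below, F₁ has three rows and F₂ one, and only the first three columns of Φ* carry free parameters.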
Step-5. Building on the estimate of the states obtained in Step-3, and the LCF structure of the matrices Φ* and H* determined in Step-4, estimate the LCF corresponding to the given KI vector α. To this end, formulate the SS model:

X*_{f+} = [ Φ*_{1:m̄}  Φ*_{m̄+1:n} ] [ X*_{f,1:m̄}
                                      X*_{f,m̄+1:n} ] + K* A_{f₁},        (29a)

Z_{f₁} = [ H̄₁* + I  0
           H₂*      0 ] [ X*_{f,1:m̄}
                          X*_{f,m̄+1:n} ] + A_{f₁},        (29b)

where Z_{f₁} and A_{f₁} are defined in Eq. (23). Rearranging terms in (29a–29b) yields:

X̃*_{f+} = Φ*_{1:m̄} X*_{f,1:m̄} + K* A_{f₁},        (30a)

Z̃_{f₁} = [ H̄₁*
           H₂* ] X*_{f,1:m̄} + A_{f₁},        (30b)

where X̃*_{f+} = X*_{f+} − Φ*_{m̄+1:n} X*_{f,m̄+1:n} and Z̃_{f₁} = Z_{f₁} − [I 0]⊺ X*_{f,1:m̄}. Notice that estimating Φ*_{1:m̄}, H̄₁*, H₂* and K* comes down to applying Eqs. (22–23), but substituting Z_{f₁}, X̂_{f+} and X̂_f by, respectively, Z̃_{f₁}, X̂*_{f+} and X̂*_f, where the last two matrices were obtained in Step-3.
Therefore, the estimates of Φ* and H* given in (28), obtained as the LS solution of (30a–30b), keep the LCF structure.¹²
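The LS step can be sketched as follows (a simplified, unweighted sketch of the regressions behind Eqs. (22–23) on a simulated one-state system; the paper's estimator additionally applies the LCF restrictions and weighting):

```python
import numpy as np

def ls_system_matrices(Xf, Xf_plus, Zf):
    """Unweighted LS regressions of future states and outputs on the
    current states (columns index time)."""
    Phi_hat = Xf_plus @ np.linalg.pinv(Xf)
    H_hat = Zf @ np.linalg.pinv(Xf)
    return Phi_hat, H_hat

# Sanity check on a simulated one-state system x_{t+1} = 0.5 x_t + e_t
rng = np.random.default_rng(1)
T = 2000
x = np.zeros(T + 1)
for t in range(T):
    x[t + 1] = 0.5 * x[t] + rng.normal()
h = np.array([[0.6], [-0.3]])        # hypothetical observation matrix
Xf, Xf_plus = x[None, :T], x[None, 1:]
Zf = h @ Xf                          # noiseless outputs for the check
Phi_hat, H_hat = ls_system_matrices(Xf, Xf_plus, Zf)
```

Here Phi_hat recovers 0.5 up to sampling error, and H_hat recovers h exactly because the check uses noiseless outputs.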
In practice, α is not known. However, it can be empirically determined bearing in
mind that |α| = n, and that models with different α are not nested in general, see
Example 1, Remarks 3 and 5. These remarks motivate our proposal, which consists in
estimating the models corresponding to all the combinations of KIs which add up to
the system order, already specified in the previous phase, and choosing the model that
optimizes a given information criterion. This idea can be formalized by the following
minimization problem:

    min_{(n₁,...,n_m)}  log|Σ̂(n₁, . . . , n_m)| + C(T) d(n₁, . . . , n_m)
    s.t.  |α| = n̂,        (31)

where Σ̂(n₁, . . . , n_m) is an estimate of the error covariance matrix, C(T) is a penalty function that determines which information criterion is to be optimized, and d(n₁, . . . , n_m) is the number of parameters of the corresponding echelon form. The usual alternatives are considered for the penalty term C(T), i.e., 2/T (AIC), log(T)/T (SBC), and 2 log(log(T))/T (HQ).
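The constraint |α| = n̂ keeps the search space in (31) small; enumerating the candidate KI vectors is straightforward (illustrative Python):

```python
from itertools import product

def ki_candidates(n, m):
    """All KI vectors (n_1, ..., n_m) satisfying the constraint |alpha| = n."""
    return [a for a in product(range(n + 1), repeat=m) if sum(a) == n]

cands = ki_candidates(4, 3)
print(len(cands))  # 15 canonical models to estimate and score
```

With n = 4 and m = 3 only 15 canonical models need to be estimated and compared, one per admissible α.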
Some authors have previously developed procedures to estimate α and its associated echelon form. For instance, Hannan and Rissanen (1982), Hannan and Kavalieris (1984a), Tsay (1989b), Poskitt (1992, 2016), Nsiri and Roy (1992) and Lütkepohl (2005), among others, deal with the problem using a VARMA representation. To the best of our knowledge, only Peternell (1995) and Peternell et al. (1996) propose subspace methods for this purpose. However, most of these methods are not statistically efficient in general, see
12 As previously mentioned, the state approach using the CVA weighting scheme in the stationary case leads
to statistically efficient estimates. Obviously, this property holds if correct linear restrictions are imposed.
SUBEST1 uses restrictions that are only asymptotically valid, and it is not easy to discuss analytically
what happens in this case. Simulations suggest that asymptotic efficiency remains, although it seems that
SUBEST1 does not return estimates as close to pseudo MLE as the CVA estimator does, in finite samples.
We thank an anonymous referee for this observation.
Table 4 KI estimates using the algorithm in Sect. 3.2

α:        (1,2,1)  (2,1,1)  (2,2,0)  (3,1,0)  (2,2,1)  (3,2,0)  (2,1,2)
T = 150    0.16    98.02     1.18     0.24     0.12     0.14     0.14
T = 300    0.00    99.50     0.20     0.10     0.12     0.02     0.06
T = 500    0.00    99.78     0.08     0.02     0.10     0.00     0.02

Relative values are multiplied by 100 to facilitate the comparison. In the subspace algorithm, p = f are chosen as the integer closest to log(T)
Deistler et al. (1995), and some are computationally expensive or difficult to automate (Nsiri and Roy 1992). If SUBEST1 is asymptotically as efficient as MLE, as conjectured in Sect. 2.3.3 on the basis of simulation results, this would make it an attractive procedure, combining computational simplicity with statistical optimality.
On this basis, the methodology proposed here returns both the KIs resulting from the optimization problem (31) and the estimates of the associated LCF. It is structured in three steps:
1. Determine the system order, n, by means of the NID procedure.
2. Compute all the KI vectors which add up to n, and estimate their corresponding canonical models using SUBEST1.
3. Select the KIs associated with the model achieving the best ICs.
To illustrate the performance of the algorithm, we simulated 5000 replications of a trivariate system with α = (2, 1, 1), which is treated as unknown. Table 4 shows the relative frequencies with which different α were estimated. Note that the procedure performs well even for relatively small samples and, as expected, its performance improves as T increases. Moreover, the relative number of cases in which the system order is misspecified (last three α vectors in Table 4) is very small, so the remarkable reduction in the number of models to be estimated by constraining |α| = n does not have a significant cost although, obviously, this will depend on the DGP. Additionally, Table 5 presents the DGP and the average estimates of its parameters for different sample sizes.
3.3 Step-3: Refining the Step-2 model
The third and final step consists in refining the canonical model obtained in Step-1, in
the univariate case, or in Step-2, in the multivariate case, by different procedures such
as:
1. Applying NID to the model residuals. If the model is statistically adequate, the
order identified should be n = 0. A higher McMillan degree would imply that
the residuals have a dynamic structure, i.e., they are autocorrelated, so the model
should be re-specified.
2. Estimating the models by Gaussian ML for improved efficiency, if the normality
assumption holds.
3. Excluding non-significant parameters to achieve further parsimony. Estimation
and testing in this phase can be done by ML methods or iterative subspace-based
techniques (see, Garcia-Hiernaux et al. 2009).
Table 5 Estimate of an echelon form using the algorithm in Sect. 3.2. Average values of the estimates obtained when α̂ = (2, 1, 1)

DGP:
F₀ = Q₀ = [ 1.0 0 0; 0.4 1.0 0; −0.6 0 1.0 ],
F₁ = [ −0.7 0 0; 0.2 −0.5 −0.9; 0.4 0.3 −0.2 ],
F₂ = [ 0.3 −0.2 0.5; 0 0 0; 0 0 0 ],
Q₁ = [ −0.2 0.4 0.7; 0.6 −0.3 −0.4; 0.3 1.0 −0.8 ],
Q₂ = [ 0.3 0.5 −0.8; 0 0 0; 0 0 0 ]

Average estimates, T = 150:
F̂₀ = Q̂₀ = [ 1.0 0 0; 0.40 1.0 0; −0.56 0 1.0 ],
F̂₁ = [ −0.67 0 0; 0.18 −0.47 −0.89; 0.30 0.44 −0.09 ],
F̂₂ = [ 0.20 −0.07 0.60; 0 0 0; 0 0 0 ],
Q̂₁ = [ −0.21 0.39 0.70; 0.55 −0.34 −0.33; 0.24 0.97 −0.45 ],
Q̂₂ = [ 0.23 0.43 −0.48; 0 0 0; 0 0 0 ]

Average estimates, T = 300:
F̂₀ = Q̂₀ = [ 1.0 0 0; 0.40 1.0 0; −0.58 0 1.0 ],
F̂₁ = [ −0.69 0 0; 0.20 −0.49 −0.90; 0.37 0.34 −0.17 ],
F̂₂ = [ 0.28 −0.17 0.52; 0 0 0; 0 0 0 ],
Q̂₁ = [ −0.20 0.40 0.69; 0.59 −0.30 −0.38; 0.30 0.95 −0.66 ],
Q̂₂ = [ 0.30 0.44 −0.68; 0 0 0; 0 0 0 ]

Average estimates, T = 500:
F̂₀ = Q̂₀ = [ 1.0 0 0; 0.40 1.0 0; −0.59 0 1.0 ],
F̂₁ = [ −0.69 0 0; 0.20 −0.50 −0.90; 0.39 0.32 −0.20 ],
F̂₂ = [ 0.29 −0.19 0.50; 0 0 0; 0 0 0 ],
Q̂₁ = [ −0.20 0.40 0.70; 0.60 −0.30 −0.39; 0.30 0.96 −0.72 ],
Q̂₂ = [ 0.31 0.46 −0.74; 0 0 0; 0 0 0 ]
Note that Step-3 seeks parsimony. While this is a desirable feature, it is somewhat offset by the complex procedures and modeling decisions required.
We illustrate Steps 2 and 3 with the same data used in Example 2.
Example 3: Estimating and refining a canonical representation
We applied the procedure described in Sect. 3.2 to determine α in the simulated data
already used in Example 2, with the constraint that the system order is n = 1. Again
we set i = log(T) = 5 in SUBEST1, which returns SBC values of 5.90 and 6.02 for α = (1, 0) and α = (0, 1), respectively. So, the final model for α = (1, 0), in VARMA_E form and estimated by Gaussian ML, is:
F̂₀ = L̂₀ = [ 1.0           0
              0.464 (.153)  1.0 ],   F̂₁ = [ −0.510 (.111)  0
                                             0              0 ],

L̂₁ = [ −0.272 (.112)  0.170 (.074)
        0              0            ],   Σ̂_a = [ 0.921  0.377
                                                  0.377  0.998 ],
very similar to (6). The standard errors are in parentheses. The residual diagnostics
did not reveal any symptom of misspecification.
4 Nonstationary and seasonal processes
We will now discuss the extension of the previous analysis to nonstationary and seasonal variables. Instead of modeling the nonstationary process directly (as in Bauer 2005c), we follow the traditional strategy of identifying the integration order of each variable (or the cointegration rank and vectors) and then applying the corresponding transformation to induce stationarity.
4.1 Nonstationarity
Assume now that the system output, yt ∈ Rm , is nonstationary. The methods described
in Sect. 3 require stationarity, so we will first proceed to determine how to transform
yt into a stationary vector zt .
The literature provides many tools to this end, ranging from graphical assessment
(Box et al. 2015), to univariate unit root testing (Dickey and Fuller 1981; Kwiatkowski et al. 1992), and cointegration testing (Johansen 1991). Subspace approaches have
also been applied to determine the stationary transformation. In this case, the methods
build on analyzing the canonical correlation coefficients (σ j , j = 1, . . . , i) between the
past and future information subspaces (Z p and Z f , respectively, see Sect. 2.3). Bauer
and Wagner (2002) and Bauer (2005c) extend the stationary theory to analyze I(1) and
cointegrated variables within this framework. In a later work, Garcia-Hiernaux et al.
(2010) discuss nonstationarity and cointegration analysis in the same line. The main
idea in these papers is that the estimated σ j corresponding to unit roots converge to
their true values (unity) much faster than the other canonical correlation coefficients.
This property, known as superconsistency, allows one to distinguish both kinds of
correlations.
Here we use the latter method because it is simple and can be implemented within the aforementioned NID procedure. Specifically, Garcia-Hiernaux et al. (2010) propose the following unit root criterion (URC) to identify the σ_j corresponding to unit roots:

URC = 1 − σ̂²_j − G(T, i, d) ≤ 0.        (32)

This criterion concludes that σ_j = 1 when the distance between σ̂²_j and unity is smaller than (or equal to) the penalty function G(T, i, d), which depends on the sample size, T, the row dimension of Z_p, i, and the number of unit roots we wish to evaluate, d. This threshold is computed through Monte Carlo simulations.
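As a sketch, the decision rule in (32) simply counts how many squared canonical correlations lie within the threshold of unity (the correlations and the value of G below are hypothetical; in practice G(T, i, d) comes from the Monte Carlo simulations mentioned above):

```python
import numpy as np

def urc_unit_roots(sigma_hat, G):
    """Number of canonical correlations flagged as unit roots by (32):
    sigma_j is taken as a unit root when 1 - sigma_j^2 - G <= 0."""
    sigma_hat = np.asarray(sigma_hat, dtype=float)
    return int(np.sum(1.0 - sigma_hat**2 - G <= 0.0))

d_hat = urc_unit_roots([0.9925, 0.9565, 0.60], G=0.10)
print(d_hat)  # 2
```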
In the univariate case, m = 1, we apply the URC to determine d, so that z_t = ∇^d y_t is stationary. We could then proceed as described in Sect. 3 with z_t.
In the multivariate case, m > 1, the procedure becomes more complex, as potential cointegrating relations come into play. Particularly, when y_t is made up of m I(1) processes, the cointegrating rank is c = m − d, where d is the number of unit roots of the multivariate process. On this basis, we determine the cointegrating rank using the following algorithm:
1. Check that the series are I(1), applying the URC to every single series.
2. Estimate the number of unit roots of the multivariate process, d̂, applying the URC to the multivariate process.
3. Calculate the cointegrating rank as the difference between the system dimension and the number of unit roots estimated in the previous step, i.e., ĉ = m − d̂.
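In code, the three steps above reduce to a handful of lines (sketch; the unit-root counts would come from the URC):

```python
def cointegrating_rank(series_unit_roots, d_hat, m):
    """Steps 1-3: check every series is I(1), then c_hat = m - d_hat."""
    if not all(d == 1 for d in series_unit_roots):
        raise ValueError("the procedure assumes all series are I(1)")
    return m - d_hat

print(cointegrating_rank([1, 1, 1], d_hat=3, m=3))  # 0: no cointegration
print(cointegrating_rank([1, 1, 1], d_hat=1, m=3))  # 2 cointegrating relations
```

The first call reproduces the flour-price outcome of Sect. 6.1, where three unit roots in a trivariate system imply ĉ = 0.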
When the procedure above suggests a cointegrating relation (ĉ > 0), Garcia-Hiernaux et al. (2010) show how to estimate the cointegrating matrix of y_t. This matrix represents the linear combinations of y_t which are stationary and, after applying this transformation, we proceed as described in Sect. 3.
4.2 Seasonality
The methodology in Sect. 3 can be extended to model the multiplicative seasonal structure of a process by using systems in cascade; see Chapter 7 in Casals et al. (2016). However, as in the previous section, the adequate transformation to induce stationarity must be decided first. We do not consider seasonal cointegration here, so we will address this problem in the univariate case.
Assume that a given series, yt , has a seasonal dynamic structure of period s, meaning
that there are s observations per seasonal cycle. Following Box et al. (2015), we denote
the regular difference operator by ∇ = 1 − B, and the seasonal difference operator
by ∇s = 1 − B s . In seasonal processes, we would first compute the seasonal sum of
yt , defined as:
w_t = (1 + B + B² + · · · + B^{s−1}) y_t,        (33)
and then apply NID to detect unit roots at the zero frequency in w_t, ∇w_t, ∇²w_t, ... If a single unit root in w_t is detected, the stationary transformation would be the seasonal
difference:
z_t = ∇(1 + B + B² + · · · + B^{s−1}) y_t = ∇_s y_t.        (34)
Similarly, detecting two unit roots in w_t would imply that the stationary transformation is:

z_t = ∇²(1 + B + B² + · · · + B^{s−1}) y_t = ∇∇_s y_t,        (35)
and so on. After the stationary transformation is identified and applied, one must
run the NID procedure to determine, first, regular stationary structures of order n =
0, 1, 2, . . . , and then seasonal structures in the range n = 0, s, 2s, 3s, . . .
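The operator identities (33)–(35) are easy to verify numerically (self-contained check on an arbitrary simulated series):

```python
import numpy as np

def seasonal_sum(y, s):
    """w_t = (1 + B + ... + B^{s-1}) y_t, as in (33); the first s-1 values are lost."""
    return np.convolve(y, np.ones(s), mode="valid")

def diff(y, lag=1):
    """Regular (lag=1) or seasonal (lag=s) difference operator."""
    return y[lag:] - y[:-lag]

# Identity (34): one regular difference of the seasonal sum equals
# the seasonal difference of the original series.
rng = np.random.default_rng(2)
y = rng.normal(size=120).cumsum()
s = 12
lhs = diff(seasonal_sum(y, s), 1)   # nabla w_t
rhs = diff(y, s)                    # nabla_s y_t
assert np.allclose(lhs, rhs)
```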
After fitting univariate models to each series in a dataset, one can cope with a vector of seasonal time series by: (a) filtering out the seasonal structure of each series using, e.g., the exact decomposition in Casals et al. (2002), and then (b) modeling the seasonally adjusted vector using the approach described in Sects. 3 and 4.1. After doing this, one could also estimate a final model combining the regular specification with a seasonal factor, given by seasonal AR, MA and transformation matrices containing in their main diagonals the corresponding factors of the univariate seasonal models.
Finally, as mentioned above, we do not address the case of seasonal cointegration. Recently, Bauer and Buschmeier (2021) went further in this direction, proving that CVA, see Sect. 2.3, provides consistent estimators for long-run and short-run dynamics, even in the presence of seasonal unit roots. They consider multivariate systems and, accordingly, propose new tests for the cointegrating rank at seasonal frequencies. Their simulations show that CVA performs better than some alternatives in specific cases (small samples, large system dimension).
This shows the flexibility of subspace methods to model vectors of time series, but
a deeper discussion of this issue is beyond the scope of this paper.
5 Univariate modeling examples
We will now illustrate the performance of the methodology proposed in the previous sections when applied to build univariate models. Section 5.1 presents a step-by-step analysis of the famous Series G of International Airline Passengers (Box et al. 2015), which is a popular benchmark for practical cases emphasizing nonstationarity and seasonality. Section 5.2 applies the modal automatic strategy described in Sect. 3.1 to nine benchmark series commonly used in the literature.
5.1 Airline passenger series
There is a consensus in the literature that the monthly (log) Airline Passenger series,
z_t = log(y_t), is adequately represented by an IMA(1,1) × IMA(1,1)₁₂ model. We will now show that our methodology leads to this model.
Following the steps described in Sect. 4.2, we first compute the seasonal sum, w_t = (1 + B + ... + B^{s−1}) z_t, where s = 12, and apply to it the procedure for unit root detection, finding: (a) at least one unit root in w_t, (b) at least one unit root in ∇w_t (Table 6), and (c) zero unit roots in ∇²w_t (Table 7). These decisions seem quite
Table 6 Output from NID when applied to ∇w_t

Unit roots: 1

n | AIC   | SBC   | HQ    | SVC₂  | NIDC  | χ²(5%)
0 | −6.13 | −6.10 | −6.12 | 10.74 | 10.74 | –
1 | −8.44 | −8.38 | −8.42 | 0.15  | 0.13  | 0.00
2 | −8.41 | −8.30 | −8.37 | 0.22  | 0.17  | 0.15
3 | −8.38 | −8.23 | −8.32 | 0.23  | 0.16  | 0.06
Estimated dynamic order: 1 | 1 | 1 | 1 | 1 | 1
Table 7 Output from NID when applied to ∇²w_t

Unit roots: 0

n | AIC   | SBC   | HQ    | SVC₂ | NIDC | χ²(5%)
0 | −8.29 | −8.27 | −8.28 | 0.24 | 0.24 | –
1 | −8.43 | −8.36 | −8.40 | 0.16 | 0.14 | 0.00
2 | −8.40 | −8.29 | −8.36 | 0.21 | 0.17 | 0.07
3 | −8.38 | −8.22 | −8.31 | 0.24 | 0.17 | 0.20
Estimated dynamic order: 1 | 1 | 1 | 1 | 1 | 1
Table 8 Determination of the seasonal subsystem order for ∇∇₁₂z_t with NID

n | AIC   | SBC   | HQ    | SVC₂ | NIDC | χ²(5%)
0 | −3.32 | −3.30 | −3.32 | 0.39 | 0.30 | –
1 | −3.57 | −3.50 | −3.58 | 0.11 | 0.09 | 0.01
2 | −3.54 | −3.43 | −3.56 | 0.17 | 0.13 | 0.34
3 | −3.51 | −3.36 | −3.53 | 0.24 | 0.17 | 0.34
Estimated dynamic order: 1 | 1 | 1 | 1 | 1 | 1
reliable, as the first canonical correlation coefficients of w_t and ∇w_t are .9925 and .9565, respectively, and so the Unit Root Criterion presented in (32) clearly indicates the existence of a unit root in each case.
Therefore, an adequate stationary transformation is ∇²w_t = ∇∇₁₂z_t, which coincides with the standard result in the literature. Note that Table 7 also determines the order of the regular system, which is set at n̂ = 1.
The next step consists in determining the order of the seasonal subsystem. To this
end, we apply NID to ∇∇₁₂z_t, setting the length of the seasonal cycle to s = 12. By doing so, we obtain the results shown in Table 8. Accordingly, the dynamic order of
Table 9 Output from NID when applied to â₂t in Eq. (37)

Unit roots: 0

n | AIC   | SBC   | HQ    | SVC₂ | NIDC | χ²(5%)
0 | −3.77 | −3.74 | −3.76 | 0.12 | 0.12 | –
1 | −3.76 | −3.69 | −3.73 | 0.14 | 0.12 | 0.36
2 | −3.73 | −3.62 | −3.68 | 0.18 | 0.13 | 0.33
3 | −3.71 | −3.56 | −3.65 | 0.23 | 0.17 | 0.43
Estimated dynamic order: 0 | 0 | 0 | 0 | 1 | 0
the seasonal subsystem is set to n̂_s = 1 (see Garcia-Hiernaux et al. 2012 for the use of NID with seasonal series).
After this preliminary specification analysis, we apply SUBEST1 to the stationary
transformed series, setting the orders of the regular and seasonal subsystems to the
previously determined values: n̂ = 1 and n̂ s = 1. This yields the balanced model:
(1 − 0.06B)(1 − 0.08B¹²) ∇∇₁₂z_t = (1 − 0.43B)(1 − 0.58B¹²) â₁t,        (36)
  (0.21)     (0.17)                  (0.21)     (0.16)

where the figures in parentheses are the parameter standard errors, obtained by computing the corresponding information matrix. The estimate of the percent standard deviation of the error is σ̂_{a1} = 3.68%.
To achieve the highest efficiency with Gaussian data and to exclude some non-significant parameters, one can compute ML estimates using the previous values as initial conditions for the iterative optimization.¹³ In this case, this provides the following result:

∇∇₁₂z_t = (1 − 0.40B)(1 − 0.56B¹²) â₂t,   σ̂_a = 3.67%,        (37)
            (0.08)     (0.08)
which essentially coincides with the results in the literature.
Finally, one can check for autocorrelation by applying NID to the residuals of this
tentative model. If we do so setting the frequencies at s = 1 and s = 12 to check for
additional regular or seasonal dynamic structures, we obtain Tables 9 and 10, whose
results show no evidence of misspecification.
5.2 Semi-automatic identification for real and simulated time series
Now we apply our methodology to nine benchmark time series taken from the literature, using the modal strategy described in Sect. 3.1. Table 11 describes these series, while Tables 12 and 13 summarize the main results. Note that:
13 By the highest efficiency we mean that, even if the simulations in Sect. 2.3.3 suggest that SUBEST1 is asymptotically efficient, they also show that ML estimates have a smaller variance in short samples.
Table 10 Determination of the seasonal subsystem order for â₂t in Eq. (37)

n | AIC   | SBC   | HQ    | SVC₂ | NIDC | χ²(5%)
0 | −3.77 | −3.74 | −3.76 | 0.04 | 0.04 | –
1 | −3.74 | −3.67 | −3.71 | 0.11 | 0.09 | 0.29
2 | −3.72 | −3.61 | −3.68 | 0.16 | 0.12 | 0.66
3 | −3.69 | −3.54 | −3.63 | 0.23 | 0.17 | 0.76
Estimated dynamic order: 0 | 0 | 0 | 0 | 0 | 0
1. The stationary transformation was correctly determined in all the cases.
2. When working with seasonal series (series B, G and H) the final results coincide essentially with the models suggested by the literature, with two minor issues:
(a) Seasonality in series G is weak, so it was not detected until we applied NID to the residuals of the initial regular model;
(b) In the model for series H, the MLE of the seasonal MA parameter converged to unity, which implies a deterministic seasonality.
3. All the models in the column "NID+SUBEST1+ML" in Table 13 coincide with the specifications proposed by the literature.
Regarding the performance of the modal strategy, there was a perfect agreement between all the criteria employed except for series D, where SBC suggests a first-order process, while the remaining procedures suggest the "consensus" second-order specification.
6 Multivariate modeling examples
Now we will show two examples of multivariate modeling, chosen because of the
different cointegration properties of the final models. The first one is not cointegrated,
while the second has one nonstationary common factor.
6.1 Flour price indices in three cities
The first exercise is based on three monthly series of flour price indexes in Buffalo,
Minneapolis and Kansas City, recorded from August 1972 to November 1980, which
will be denoted by yt . This dataset is adequate to test the “Law of one price”, as flour
is a quite homogeneous product and these cities are close enough to allow for product
flows.
Figure 2 displays the profile of these series. Note that they show a high degree of comovement, so they could be cointegrated. Despite this graphical evidence,
previous analyses reject the null of cointegration for this dataset; see Tiao and Tsay
(1989); Lütkepohl and Poskitt (1996). This makes it a good benchmark to test for false
positives in cointegration testing.
Table 11 Datasets employed to estimate the models in Tables 12 and 13

Series | T   | Description
A      | 45  | Daily average number of defects per truck, taken from Burr (1976)
B⋆     | 150 | Quarterly simulated series, taken from Wei (2006)
C      | 35  | Simulated series n. 2, taken from Wold (1964)
D⋆     | 200 | Series W4. Monthly unemployed young women in the U.S. from Jan 1961 to Dec 1985, taken from Wei (2006)
E      | 226 | Series C, Chemical Process Temperature Readings, taken from Box and Jenkins (1970); Box et al. (2015)
F      | 197 | Series A, Chemical Process Concentration Readings, taken from Box and Jenkins (1970); Box et al. (2015)
G      | 178 | Monthly employers in food production in Wisconsin, Jan 1961–Oct 1975, taken from O'Donovan (1983)
H      | 32  | Quarterly Production of Beer in the US, 1975-1Q to 1982-4Q, taken from Wei (2006)
I⋆     | 36  | Simulated series, taken from Makridakis et al. (1983)

⋆ In Tables 12 and 13 we present the DGP for series B and D, and the author's specification and estimates for series I
Table 12 Step-1 univariate models, resulting from the successive application of NID and SUBEST1

A (T = 45). Literature: (1 − 0.43B)(Z_t − 1.79) = a_t, σ̂²_a = 0.21. NID + SUBEST1: (1 − 0.61B)(Z_t − 3.94) = (1 − 0.24B) a_t, σ̂²_a = 0.21.
B (T = 150). Literature: ∇∇₄Z_t = (1 − 0.80B)(1 − 0.60B⁴) a_t, σ̂²_a = 1.00. NID + SUBEST1: (1 + 0.22B)(1 + 0.21B⁴) ∇∇₄Z_t = (1 − 0.56B)(1 − 0.49B⁴) a_t, σ̂²_a = 0.97.
C (T = 35). Literature: ∇Z_t = (1 − 0.51B) a_t, σ̂²_a = 1397.27. NID + SUBEST1: (1 − 0.00B) ∇Z_t = (1 − 0.51B) a_t, σ̂²_a = 1380.35.
D (T = 200). Literature: (1 − 0.70B + 0.49B²) Z_t = a_t, σ̂²_a = 0.60. NID + SUBEST1: (1 − 0.84B + 0.51B²) Z_t = (1 − 0.16B + 0.05B²) a_t, σ̂²_a = 0.66.
E (T = 226). Literature: (1 − 0.82B) ∇Z_t = a_t, σ̂²_a = 0.02. NID + SUBEST1: (1 − 0.82B) ∇Z_t = (1 − 0.02B) a_t, σ̂²_a = 0.02.
F (T = 197). Literature: (1 − 0.92B) Z_t = 1.45 + (1 − 0.58B) a_t, σ̂²_a = 0.10. NID + SUBEST1: (1 − 0.88B) Z_t = 2.08 + (1 − 0.50B) a_t, σ̂²_a = 0.10.
G (T = 178). Literature: (1 − 0.79B) ∇₁₂Z_t = (1 − 0.76B¹²) a_t, σ̂²_a = 1.24. NID + SUBEST1: (1 − 0.56B)(1 − 0.05B¹²) ∇₁₂Z_t = −2.11 + (1 − 0.18B)(1 − 0.35B¹²) a_t, σ̂²_a = 1.54.
H (T = 32). Literature: ∇₄Z_t = 1.49 + (1 − 0.87B⁴) a_t, σ̂²_a = 2.39. NID + SUBEST1: (1 − 0.14B⁴) ∇₄Z_t = 1.21 + (1 − 0.71B⁴) a_t, σ̂²_a = 2.03.
I (T = 36). Literature: Z_t = 51.03 + a_t, σ̂²_a = 818.81. NID + SUBEST1: Z_t = 47.31 + a_t, σ̂²_a = 790.54.
Table 13 Univariate models, resulting from the successive application of NID, SUBEST1 and ML

A (T = 45). Literature: (1 − 0.43B)(Z_t − 1.79) = a_t, σ̂²_a = 0.21. NID + SUBEST1 + ML: (1 − 0.42B)(Z_t − 1.81) = a_t, σ̂²_a = 0.21.
B (T = 150). Literature: ∇∇₄Z_t = (1 − 0.80B)(1 − 0.60B⁴) a_t, σ̂²_a = 1.00. NID + SUBEST1 + ML: ∇∇₄Z_t = (1 − 0.75B)(1 − 0.63B⁴) a_t, σ̂²_a = 0.91.
C (T = 35). Literature: ∇Z_t = (1 − 0.51B) a_t, σ̂²_a = 1397.27. NID + SUBEST1 + ML: ∇Z_t = (1 − 0.51B) a_t, σ̂²_a = 1393.12.
D (T = 200). Literature: (1 − 0.70B + 0.49B²) Z_t = a_t, σ̂²_a = 0.60. NID + SUBEST1 + ML: (1 − 0.72B + 0.41B²) Z_t = a_t, σ̂²_a = 0.66.
E (T = 226). Literature: (1 − 0.82B) ∇Z_t = a_t, σ̂²_a = 0.02. NID + SUBEST1 + ML: (1 − 0.82B) ∇Z_t = a_t, σ̂²_a = 0.02.
F (T = 197). Literature: (1 − 0.92B) Z_t = 1.45 + (1 − 0.58B) a_t, σ̂²_a = 0.10. NID + SUBEST1 + ML: (1 − 0.92B) Z_t = 1.45 + (1 − 0.60B) a_t, σ̂²_a = 0.10.
G (T = 178). Literature: (1 − 0.79B) ∇₁₂Z_t = (1 − 0.76B¹²) a_t, σ̂²_a = 1.24. NID + SUBEST1 + ML: (1 − 0.77B) ∇₁₂Z_t = (1 − 0.61B¹²) a_t, σ̂²_a = 1.31.
H (T = 32). Literature: ∇₄Z_t = 1.49 + (1 − 0.87B⁴) a_t, σ̂²_a = 2.39. NID + SUBEST1 + ML: ∇₄Z_t = 1.55 + ∇₄ a_t, σ̂²_a = 1.64.
I (T = 36). Literature: Z_t = 51.03 + a_t, σ̂²_a = 818.81. NID + SUBEST1 + ML: Z_t = 51.03 + a_t, σ̂²_a = 818.86.
Fig. 2 Logs of flour price indexes in Buffalo, Minneapolis and Kansas City. Monthly data from August
1972 to November 1980
Table 14 Output from NID when applied to log(y_t)

Unit roots: 3. Cointegration relationships: 0

n | AIC   | SBC   | HQ    | SVC₂  | NIDC  | χ²(5%)
0 | −1.18 | −1.03 | −1.12 | 56.21 | 56.21 | –
1 | −5.63 | −5.32 | −5.50 | 8.15  | 8.08  | 0.00
2 | −7.60 | −7.13 | −7.41 | 4.12  | 3.97  | 0.00
3 | −9.02 | −8.39 | −8.77 | 1.18  | 0.97  | 0.00
4 | −8.90 | −8.12 | −8.59 | 1.32  | 1.04  | 0.99
Estimated dynamic order: 3 | 3 | 3 | 3 | 3 | 3
Applying NID to each individual series concludes that they all are I(1). Again, the first canonical correlation is close to unity in the three series, so this decision seems solid. To check for cointegration, we apply NID to the vector log(y_t), detecting again three unit roots for the multivariate process. Here the first three canonical correlations are .99, .94 and .88, respectively. Although the unit root criterion presented in (32) indicates three unit roots, note that the last one is a borderline case. In any case, cointegration is rejected, see Table 14.
Table 14 shows no evidence of any dynamic structure beyond the unit roots. This could be due to the fact that nonstationary components often dominate other weaker dynamic components, whose detection requires a previous stationarity-inducing transformation. For this reason, we apply again the NID algorithm to the stationary
transformation ∇ log(y_t), obtaining the results in Table 15. In this case there is a tie, as three criteria detect a first-order structure while the other three conclude that the order is zero. When this happens, we prefer choosing the higher dynamic order, because an overfitting error can be easily solved by pruning insignificant parameters. Therefore, we choose n̂ = 1.

Table 15 Output from NID when applied to ∇ log(y_t)

Unit roots: 0

n | AIC    | SBC    | HQ     | SVC₂ | NIDC | χ²(5%)
0 | −13.61 | −13.45 | −13.55 | 0.67 | 0.67 | –
1 | −13.83 | −13.52 | −13.71 | 0.80 | 0.73 | 0.09
2 | −13.75 | −13.28 | −13.56 | 0.98 | 0.84 | 0.39
3 | −13.68 | −13.05 | −13.43 | 1.13 | 0.92 | 0.92
4 | −13.59 | −12.81 | −13.28 | 1.34 | 1.05 | 0.99
Estimated dynamic order: 1 | 1 | 1 | 0 | 0 | 0
After deciding the system order, SUBEST1 (with i = 3) provides estimates for α
and its corresponding VARMA E . In this case, AIC, SBC and HQ coincide in choosing
α̂ = (1, 0, 0). Accordingly, the estimated model is:
F̂(B) ∇ log(y_t) = L̂(B) â_t,        (38)

with:
F̂(B) = [ 1 + 0.39B   0   0
          −0.71       1   0
          −0.45       0   1 ],

L̂(B) = [ 1 − 0.82B   1.08B   0.09B
          −0.71       1       0
          −0.45       0       1 ],

Σ̂_a(%) = [ 0.21   −      −
            0.22   0.25   −
            0.22   0.24   0.28 ],        (39)

where the noise covariance estimates are presented in percentage. We then estimate
the previous structure by ML, obtaining:
         ⎡ 1 + 0.41B (0.17)   0   0 ⎤            ⎡ 1 − 0.94B (0.46)   1.32B (0.37)   0 ⎤
F̂(B) =  ⎢ −0.73 (0.11)       1   0 ⎥ ,  L̂(B) = ⎢ −0.73 (0.11)       1              0 ⎥ ,
         ⎣ −0.49 (0.20)       0   1 ⎦            ⎣ −0.49 (0.20)       0              1 ⎦

           ⎡ 0.21 (0.03)   −             −           ⎤
Σ̂a (%) =  ⎢ 0.22 (0.03)   0.25 (0.03)   −           ⎥ ,   (40)
           ⎣ 0.21 (0.03)   0.24 (0.04)   0.28 (0.04) ⎦

with standard errors in parentheses; elements without standard errors are fixed by the
canonical structure.
Table 16 Output from NID when applied to ât ⋆

Unit roots: 0

Results for different criteria:

n   AIC      SBC      HQ       SVC2   NIDC   χ²(5%)
0   −13.96   −13.80   −13.90   0.57   0.57   –
1   −13.87   −13.56   −13.75   0.75   0.68   0.91
2   −13.76   −13.29   −13.57   0.87   0.72   1.00
3   −13.68   −13.05   −13.42   1.09   0.88   0.99
4   −13.60   −12.82   −13.29   1.35   1.07   0.99

Estimated dynamic order:   0   0   0   0   0   0

⋆ ât are the residuals from model (40)
In model (40), the (1,3)-element of L̂(B) was found to be non-significant and was therefore pruned.
Last, Table 16 summarizes the output of NID when applied to the residuals of model
(40), which does not show any symptom of misspecification. The Q matrix of Ljung
and Box (1978), computed with 7 degrees of freedom, confirms this conclusion:
       ⎡ 3.5   3.3   3.2 ⎤
Q(7) = ⎢ 4.6   4.3   4.9 ⎥ .   (41)
       ⎣ 6.9   5.6   4.0 ⎦
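For reference, the Q statistic of Ljung and Box (1978) used above can be computed from the residual autocorrelations as sketched below. This is a generic sketch of the standard formula, not the authors' code:

```python
import numpy as np

def ljung_box_q(resid, k_max):
    """Ljung-Box (1978) portmanteau statistic on the first k_max sample
    autocorrelations of a residual series: Q = T(T+2) * sum_k r_k^2 / (T-k)."""
    e = np.asarray(resid, dtype=float)
    e = e - e.mean()
    t = len(e)
    denom = np.sum(e ** 2)
    q = 0.0
    for k in range(1, k_max + 1):
        r_k = np.sum(e[k:] * e[:-k]) / denom   # lag-k sample autocorrelation
        q += r_k ** 2 / (t - k)
    return t * (t + 2) * q
```

Under the null of white noise, Q(7) is approximately χ² with 7 degrees of freedom (fewer when applied to residuals of an estimated model), whose 5% critical value is 14.07; all the entries of (41) fall well below it.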
The VARMA echelon model (40) implies that the three series share a first-order
stationary AR component, and receive the effect of the innovations in Buffalo and
Minneapolis at t − 1. This structure is represented by seven free parameters, so it is
more parsimonious than, e.g., the nine-parameter VAR(1) specification proposed by
Tiao and Tsay (1989).
On the other hand, note that the error covariance matrix in Equation (40) implies
that the disturbances in this model are highly correlated. This fact would explain the
comovement observed in Fig. 2 and suggests that, in this case, the “Law of one price”
could be described, not by common trends, but by common shocks affecting prices in
different locations.
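The "common shocks" interpretation can be illustrated numerically: innovations drawn with the covariance structure of Σ̂a in (40) are already strongly correlated, so the series comove with no common trend involved. The sketch below is our own illustration, not part of the paper's analysis:

```python
import numpy as np

# Innovation covariance from model (40), in %, with the lower triangle
# completed by symmetry
sigma_a = np.array([
    [0.21, 0.22, 0.21],
    [0.22, 0.25, 0.24],
    [0.21, 0.24, 0.28],
])

# The implied innovation correlations are close to one, e.g. corr(1,2):
corr_12 = sigma_a[0, 1] / np.sqrt(sigma_a[0, 0] * sigma_a[1, 1])  # about 0.96

# Draw correlated shocks via a Cholesky factor and check the comovement
rng = np.random.default_rng(1)
shocks = rng.standard_normal((2000, 3)) @ np.linalg.cholesky(sigma_a).T
sample_corr = np.corrcoef(shocks, rowvar=False)
```

Even without any dynamics, the pairwise sample correlations of these shocks come out near unity, which is the comovement mechanism suggested in the text.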
6.2 US interest rates
We now model four monthly short-term US interest rates, from August 1985 to
January 2003, previously analyzed by Martin-Manjon and Treadway (1997). These are
the Fed's Target Rate (TOt ), the effective rate (TEt ), and the secondary market yields
for 3-month (T3mt ) and 6-month (T6mt ) T-bills. Figure 3 shows the profile of these
series.
We determine the integration order for each individual variable by applying NID.
The unit root criterion clearly indicates that each individual series is I(1). The second
step in the analysis consists in determining the dynamic order for the vector of variables
Fig. 3 Short-term interest rates. Monthly data from August 1985 to January 2003
Table 17 Output from NID when applied to Z1t ⋆

Unit roots: 1
Cointegration relationships: 3

Results for different criteria:

n   AIC     SBC     HQ      SVC2     NIDC     χ²(5%)
0   4.82    4.98    4.88    112.71   112.71   –
1   0.23    0.52    0.35    4.12     4.06     0.00
2   −1.22   −0.80   −1.05   2.35     2.23     0.00
3   −2.00   −1.46   −1.78   1.21     1.02     0.00
4   −2.18   −1.51   −1.90   1.26     1.01     0.00

Estimated dynamic order:   4   4   4   3   4   4

⋆ Z1t = (TEt  T3mt  T6mt  TOt )⊺
by applying NID again. In this case, the procedure detects one unit root and, therefore,
three cointegrating relationships, see Table 17.14
NID also provides estimates for the coefficients in the cointegrating equations, so
the stationarity-inducing transformation of the data is given by:
     ⎡ ∇       0   0   0 ⎤ ⎡ TOt  ⎤
zt = ⎢ −1.01   1   0   0 ⎥ ⎢ TEt  ⎥   (42)
     ⎢ −0.92   0   1   0 ⎥ ⎢ T3mt ⎥
     ⎣ −0.93   0   0   1 ⎦ ⎣ T6mt ⎦
Therefore, cointegration operates between pairs of variables and the stationary
combinations are close to the spreads between each rate and the Fed’s target. Note
14 The numerical result of applying URC in (32) to the first four canonical correlations is
−.127, .055, .265, .649, so we conclude that there is only one unit root in the multivariate process.
Table 18 Output from NID when applied to Z1t ⋆

Unit roots: 0

Results for different criteria:

n   AIC     SBC     HQ      SVC2   NIDC   χ²(5%)
2   −4.04   −3.63   −3.87   2.41   2.39   0.00
3   −4.64   −4.10   −4.42   1.63   1.60   0.00
4   −4.88   −4.21   −4.61   1.52   1.47   0.00
5   −4.81   −4.01   −4.49   1.58   1.52   0.31

Estimated dynamic order:   4   4   4   4   4   4

⋆ Z1t ≃ (∇TOt  TEt − TOt  T3mt − TOt  T6mt − TOt )⊺
also that the structure of Equation (42) has a clear interpretation in terms of economic
controllability, as setting the value of the Target Rate TOt implies choosing the
long-run mean value of TEt , T3mt and T6mt . Applying NID to the stationary-transformed
vector, zt , detects four stationary dynamic components and no additional
unit roots, see Table 18.
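The stationarity-inducing transformation (42) is easy to apply in practice. The sketch below is our own illustration, with hypothetical data and the coefficient estimates quoted above; it forms the three spreads and differences the target rate:

```python
import numpy as np

# Columns assumed ordered as (TO, TE, T3m, T6m); coefficients from Eq. (42)
COINT = np.array([
    [-1.01, 1.0, 0.0, 0.0],   # TE  - 1.01*TO
    [-0.92, 0.0, 1.0, 0.0],   # T3m - 0.92*TO
    [-0.93, 0.0, 0.0, 1.0],   # T6m - 0.93*TO
])

def stationary_transform(y):
    """Map levels y (T x 4) to z_t = (del TO, spreads) as in Eq. (42);
    the first observation is lost to the difference operator."""
    spreads = y @ COINT.T              # (T, 3) cointegrating combinations
    d_to = np.diff(y[:, 0])            # (T-1,) differenced target rate
    return np.column_stack([d_to, spreads[1:]])

# Hypothetical data: TO a random walk, the other rates tied to it
rng = np.random.default_rng(2)
to = np.cumsum(rng.standard_normal(300))
y = np.column_stack([to,
                     1.01 * to + 0.1 * rng.standard_normal(300),
                     0.92 * to + 0.1 * rng.standard_normal(300),
                     0.93 * to + 0.1 * rng.standard_normal(300)])
z = stationary_transform(y)            # (299, 4); every column is stationary
```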
In this situation, the next steps in the analysis would be: (a) determining the KI
vector α, (b) estimating the final model, and (c) performing diagnostic checks. We
do not include these results here because they do not add much to what was shown in
previous examples.
7 Discussion
The model-building approach proposed in this paper has substantial advantages in
comparison with its main alternatives.
7.1 Comparison with the VAR methodology
The procedure to fit a Step-1 model described in Sect. 3.1 is similar to standard VAR
model-building, see Lütkepohl (2005), as it relies on LS methods, explores a succession
of linear models with increasing orders, and compares the resulting models in
terms of both LR tests and ICs. However, there are three important differences.
First, in VAR modeling the exploration of possible dynamic orders is not systematic,
because it skips some intermediate dynamic orders. To see this, consider a vector of
three time series. A standard VAR analysis would consider vector autoregressions of
orders 0, 1, 2, ..., whose corresponding dynamic orders would be 0, 1×3 = 3, 2×3 = 6, ...
In comparison, our method explores the whole sequence of nonnegative integer orders
(0, 1, 2, ...) up to a user-specified upper limit. Because of this, standard VAR modeling
is prone to over- or under-specifying the system order.
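The point about skipped orders is purely arithmetic, as the following sketch shows for the three-series case:

```python
# For m = 3 series, a standard VAR(p) search only reaches system orders that
# are multiples of m, while the method discussed here scans every integer
# up to a user-specified limit.
m, n_max = 3, 9
var_reachable = [p * m for p in range(n_max // m + 1)]   # orders a VAR can hit
full_grid = list(range(n_max + 1))                       # orders explored here
skipped = sorted(set(full_grid) - set(var_reachable))    # orders a VAR skips
```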
Second, the VAR approach considers only autoregressive structures, while Step-1
models have both AR and MA terms. Accordingly, if the DGP has some moving-average
structure, our approach should provide a more parsimonious representation,
even if we do not proceed to Step-2.
Third, while a Step-1 model is comparable to a VAR in terms of the modeling
decisions required (just choosing a scalar order), our approach provides a way to achieve
further parsimony by enforcing a canonical structure (i.e., fitting a Step-2 model)
without much additional complexity. The VAR methodology does not provide such
an option.
7.2 Comparison with VARMA specification methods
In comparison with alternative VARMA specification procedures, such as those in Jenkins
and Alavi (1981), Tiao and Box (1981) and Reinsel (1997), our methodology:
1. ...provides a unified modeling procedure for single and multiple time series because
the same basic decisions are required in both cases,
2. ...is scalable, meaning that the analyst may choose between a quick nonparsimonious Step-1 model and the more efficient Step-2 and Step-3 models,
3. ...is easy to implement in automatic mode, and
4. ...offers a flexible approach to the parsimony principle, which can be enforced in
the latter stages of the analysis by adopting an efficient canonical structure (Step-2)
and pruning non-significant parameters (Step-3).
On the other hand, standard VARMA specification methods: (a) do not enforce
coherence with univariate model-building, (b) are difficult to implement in automatic
mode, (c) are not scalable, as they do not provide cost-efficient relaxations of the
parsimony requirement, and, in contradiction with this, (d) may yield an overparameterized
model, as they do not ensure that the final model is canonical.
7.3 Comparison with other methods using the Kronecker indices
The closest benchmarks for our proposal are the alternative methods to determine the
KIs and, hence, to specify a VARMA E . This literature is rich and includes, among
many works, those of Hannan and Kavalieris (1984b), Poskitt (1992), Lütkepohl and
Poskitt (1996), Tsay (1989b), Li and Tsay (1998) and Poskitt (2016).
The common feature of all these methods is that they directly identify the KIs
α = (n1 , n2 , . . . , nm ), while we first determine the system order, n, and only then
estimate and compare all the models with a sum of KIs equal to n. Our approach has
several advantages:
1. Determining the system order is useful in itself, as it allows one to specify a crude
but fast Step-1 model which may be adequate in some cases, see Sect. 3.1, thus
avoiding the need to determine α.
2. The dynamic order can be obtained with both ICs and LR tests, because lower-order
specifications are nested in higher-order ones. On the other hand, models
with different α are non-nested in general, so they can only be compared with ICs.
3. Limiting the search space to the KIs compatible with a previously estimated system order provides a faster and more robust performance than searching in an
unbounded space.
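Fixing n first sharply bounds the search: the candidate KI vectors are just the ways of writing n as a sum of m nonnegative parts, which can be enumerated directly. The sketch below shows the bookkeeping only, not the estimation and comparison step:

```python
from itertools import product

def ki_candidates(n, m):
    """All Kronecker-index vectors (n_1, ..., n_m) of nonnegative integers
    summing to the previously estimated system order n."""
    return [a for a in product(range(n + 1), repeat=m) if sum(a) == n]

# For n = 1 and m = 3 only three candidates remain: (1,0,0), (0,1,0), (0,0,1);
# for n = 4 and m = 3 there are C(4+3-1, 3-1) = 15 candidates to estimate
# and compare, instead of an unbounded search over all KI vectors.
```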
Besides these advantages, our proposal adds extensions for nonstationary time
series, cointegration and seasonality, not treated by other VARMA E approaches.
7.4 Comparison with the SCM approach
Tiao and Tsay (1989) propose a structured method to specify a canonical model. They
build on the so-called “Scalar Component” (SC), which is the basis of the family of
“Scalar Component Models” (SCM). These concepts are analogous, but not equivalent,
to those of KIs and VARMA E . In a nutshell, they build on a canonical correlation
analysis to find a contemporaneous linear transformation for the vector process that:
(a) reveals simplifying structures, (b) achieves parametric parsimony, and (c) identifies
alternative “exchangeable” parameterizations.
Athanasopoulos et al. (2012) compare comprehensively the SC and KI-based
approaches at theoretical, experimental, and empirical levels. They conclude that:
1. VARMA E and SC models are connected in theory.
2. The greatest advantage of echelon forms is that their construction can be automated,
which is impossible to achieve within the SC identification process.
3. In their Monte Carlo experiments, both procedures work well in identifying some
common VARMA DGPs.
4. SC VARMA models are more flexible because their maximum autoregressive order
does not have to be the same as the order of the moving average component, while
these orders are constrained to be equal for a VARMA E form and, perhaps for this
reason,
5. ...the empirical out-of-sample forecast evaluation shows that SC VARMA models
provide better forecasts than VARMA E models which, in turn, provide better
forecasts than VAR models.
These conclusions are fully consistent with our own experience. In the framework
of this paper, we add significance tests in Step-3 to relax the constraint on the
autoregressive and moving-average orders implied by the echelon form, so our Step-3 and
SC models could potentially have the same forecasting performance. Confirming this
possibility would require an in-depth forecasting power comparison, which exceeds
the scope of this paper.
On the other hand, the methodology proposed here is scalable and includes extensions for nonstationary time series, cointegration and seasonality, not treated by the
SCM approach.
7.5 Software implementation
An important disadvantage of our method in comparison with some alternatives is
that the software required is not commonly available. To address this shortcoming,
we implemented all the computational procedures described in this paper in the E4
functions NID and SUBEST1.
E4 is a MATLAB toolbox for time series modeling, which can be downloaded
at: www.ucm.es/e-4. The source code for all the functions in the toolbox is freely
provided under the terms of the GNU General Public License. This site also includes
a complete user manual and other reference materials.
Supplementary Information The online version contains supplementary material available at https://doi.
org/10.1007/s00362-023-01451-y.
Acknowledgements The authors thank two anonymous referees for their valuable suggestions, which significantly improved this work. The first draft of this work was written while Miguel Jerez was visiting UC
Davis. Special thanks are due to the Department of Economics for providing ideal working conditions.
We also received useful feedback from attendees of the research seminars at UC Davis, Universidad
del Pais Vasco, Universidad Autonoma de Madrid and the 36th Annual International Symposium on
Forecasting. Last, but not least, support from Instituto Complutense de Analisis Economico (ICAE), Programa
Becas Complutenses Del Amo and from Programa de ayudas a grupos de investigacion Santander-UCM
(Econometrics in State Space research group, ref. 940223) is gratefully acknowledged.
Funding Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence,
and indicate if changes were made. The images or other third party material in this article are included
in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If
material is not included in the article’s Creative Commons licence and your intended use is not permitted
by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the
copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References
Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19(6):716–
723
Arun K, Kung S (1990) Balanced approximation of stochastic systems. SIAM J Matrix Anal Appl 11(1):42–
68
Athanasopoulos G, Poskitt D, Vahid F (2012) Two canonical VARMA forms: scalar component models
vis-à-vis the echelon form. Econom Rev 31(1):60–83
Bauer D (2001) Order estimation for subspace methods. Automatica 37:1561–1573
Bauer D (2005) Asymptotic properties of subspace estimators. Automatica 41:359–376
Bauer D (2005) Comparing the CCA subspace method to pseudo maximum likelihood methods in the case
of no exogenous inputs. J Time Ser Anal 26(5):631–668
Bauer D (2005) Estimating linear dynamical systems using subspace methods. Econom Theory 21:181–211
Bauer D, Buschmeier R (2021) Asymptotic properties of estimators for seasonally cointegrated state space
models obtained using the CVA subspace method. Entropy 23:436
Bauer D, Deistler M (1999) Balanced canonical forms for system identification. IEEE Trans Autom Control
44(6):1118–1131
Bauer D, Jansson M (2000) Analysis of the asymptotic properties of the MOESP type of subspace algorithms.
Automatica 36:497–509
Bauer D, Ljung L (2002) Some facts about the choice of the weighting matrices in Larimore type of subspace
algorithms. Automatica 38:763–773
Bauer D, Wagner M (2002) Estimating cointegrated systems using subspace algorithms. J Econom 111:47–
84
Box G, Jenkins G (1970) Time series analysis, forecasting and control. Holden-Day, San Francisco
Box G, Jenkins G, Reinsel G, Ljung G (2015) Time series analysis: forecasting and control. Wiley Series
in Probability and Statistics. Wiley, New York
Burr I (1976) Statistical quality control methods. Marcel Dekker, New York
Casals J, Garcia-Hiernaux A, Jerez M (2012) From general State-Space to VARMAX models. Math Comput
Simul 80(5):924–936
Casals J, Garcia-Hiernaux A, Jerez M, Sotoca S, Trindade A (2016) State-space methods for time series
analysis: theory, applications and software. Chapman-Hall/CRC Press, New York
Casals J, Jerez M, Sotoca S (2002) An exact multivariate model-based structural decomposition. J Am Stat
Assoc 97(458):553–564
Casals J, Sotoca S, Jerez M (1999) A fast and stable method to compute the likelihood of time invariant
state space models. Econ Lett 65(3):329–337
Casals J, Sotoca S, Jerez M (2013) Single versus multiple-source error models for signal extraction. J Stat
Comput Simul 85(5):1053–1069
Deistler M, Peternell K, Scherrer W (1995) Consistency and relative efficiency of subspace methods.
Automatica 31(12):1865–1875
Dickey D, Fuller W (1981) Likelihood ratio statistics for autoregressive time series with a unit root.
Econometrica 49(4):1057–1072
Favoreel W, De Moor B, Van Overschee P (2000) Subspace state space system identification for industrial
processes. J Process Control 10:149–155
Garcia-Hiernaux A (2011) Forecasting linear dynamical systems using subspace methods. J Time Ser Anal
32(5):462–468
Garcia-Hiernaux A, Jerez M, Casals J (2009) Fast estimation methods for time series models in state-space
form. J Stat Comput Simul 79(2):121–134
Garcia-Hiernaux A, Jerez M, Casals J (2010) Unit roots and cointegration modeling through a family of
flexible information criteria. J Stat Comput Simul 80(2):173–189
Garcia-Hiernaux A, Jerez M, Casals J (2012) Estimating the system order by subspace methods. Comput
Stat 27(3):411–425
Gevers MR (1985) ARMA models, their Kronecker indices and their McMillan degree. Int J Control
6(43):1745–1761
Golub G, Van Loan C (1996) Matrix computations. John Hopkins University Press, Baltimore
Hannan EJ, Deistler M (1988) The statistical theory of linear systems. Wiley, New York
Hannan EJ, Kavalieris L (1984) A method for autoregressive-moving average estimation. Biometrika
71:273–80
Hannan EJ, Kavalieris L (1984) Multivariate linear time series models. Adv Appl Probab 16:492–561
Hannan EJ, Quinn B (1979) The determination of the order of an autoregression. J R Stat Soc Ser B
41(2):713–723
Hannan EJ, Rissanen J (1982) Recursive estimation of mixed autoregressive-moving average order.
Biometrika 69:81–94
Ho B, Kalman R (1966) Effective construction of linear state-variable models from input-output functions.
Regelungstechnik 14:545–548
Jenkins G, Alavi A (1981) Some aspects of modelling and forecasting multivariate time series. J Time Ser
Anal 1(2):1–47
Johansen S (1991) Estimation and hypothesis testing of cointegration vectors in Gaussian vector
autoregressive models. Econometrica 59:1551–1580
Kwiatkowski D, Phillips P, Schmidt P, Shin Y (1992) Testing the null hypothesis of stationarity against
the alternative of a unit root: how sure are we that economic time series have a unit root? J Econom
54(1–3):159–178
Larimore WE (1990) Canonical variate analysis in identification, filtering and adaptive control. In:
Proceedings of the 29th conference on decision and control, Hawaii, pp. 596–604
Li H, Tsay RS (1998) A unified approach to identifying multivariate time series models. J Am Stat Assoc
93(442):770–782
Ljung G, Box G (1978) On a measure of lack of fit in time series models. Biometrika 65(2):297–303
Luenberger DG (1967) Canonical forms for linear multivariate systems. IEEE Trans Autom Control 12:290–
293
Lütkepohl H (2005) New introduction to multiple time series analysis. Springer-Verlag, Berlin
Lütkepohl H, Poskitt DS (1996) Specification of echelon form VARMA models. J Bus Econ Stat 14(1):69–79
Makridakis SG, Wheelwright SC, McGee VE (1983) Forecasting: methods and applications. Wiley, New
York
Martin-Manjon R, Treadway A (1997) The Fed controls only one of the two interest rates in the US economy.
ICAE working Paper 9716
Nsiri N, Roy R (1992) On the identification of ARMA echelon-form models. Can J Stat 20:369–386
O’Donovan TM (1983) Short-term forecasting. Wiley, New York
Peternell K (1995) Identification of linear dynamical systems by subspace and realization-based algorithms
(Unpublished doctoral dissertation). TU Wien
Peternell K, Scherrer W, Deistler M (1996) Statistical analysis of novel subspace identification methods.
Signal Process 52:161–177
Poskitt DS (1992) Identification of echelon canonical forms for vector linear processes using least squares.
Ann Stat 20:195–215
Poskitt DS (2016) Vector autoregressive moving average identification for macroeconomic modeling: a new
methodology. J Econom 192:468–484
Qin SJ (2006) An overview of subspace identification. Comput Chem Eng 30:1502–1513
Quenouille MH (1957) The analysis of multiple time series. Griffin, London
Reinsel CG (1997) Elements of multivariate time series analysis, 2nd edn. Springer-Verlag, New York
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464
Tiao GC, Box GEP (1981) Modeling multiple time series with applications. J Am Stat Assoc 76:802–816
Tiao GC, Tsay RS (1989) Model specification in multivariate time series. J R Stat Soc Ser B 51(2):157–213
Tsay RS (1989) Identifying multivariate time series models. J Time Ser Anal 10(4):357–372
Tsay RS (1989) Parsimonious parametrization of vector autoregressive moving average models. J Bus Econ
Stat 7(3):327–341
Van Overschee P, De Moor B (1994) N4SID: subspace algorithms for the identification of combined
deterministic-stochastic systems. Automatica 30(1):75–93
Van Overschee P, De Moor B (1995) A unifying theorem for three subspace system identification algorithms.
Automatica 31(12):1853–1864
Verhaegen M (1994) Identification of the deterministic part of MIMO state space models given in innovations
form from input-output data. Automatica 30(1):61–74
Wei W (2006) Time series analysis univariate and multivariate methods, 2nd edn. Addison Wesley, New
York
Wold HOA (1964) A study in the analysis of stationary time series. Almqvist and Wiksell, Uppsala
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.