SUBSET ARMA MODEL IDENTIFICATION USING GENETIC ALGORITHMS

By Carlo Gaetan, University of Padua

First version received July 1998

Abstract. Subset models are often useful in the analysis of stationary time series. Although subset autoregressive models have received a lot of attention, the same attention has not been given to subset autoregressive moving-average (ARMA) models, as their identification can be computationally cumbersome. In this paper we propose to overcome this disadvantage by employing a genetic algorithm. After encoding each ARMA model as a binary string, the iterative algorithm attempts to mimic the natural evolution of the population of such strings by allowing strings to reproduce, creating new models that compete for survival in the next population. The success of the proposed procedure is illustrated by showing its efficiency in identifying the true model for simulated data. An application to real data is also considered.

Keywords. Subset models; identification; genetic algorithm; BIC.

1. INTRODUCTION

Subset models are often useful in time series analysis, especially when the data exhibit some form of periodic behaviour with a range of different natural periods, for instance days, weeks, months and years. Recently, a class of subset autoregressive (SAR) models has been proposed by several researchers, such as McClave (1975), Penm and Terrell (1982), Haggan and Oyetunji (1984), Yu and Lin (1991) and Zhang and Terrell (1997). A zero-mean stationary stochastic process $\{X_t\}$ is said to be a SAR model of order $k = (k_1, \ldots, k_p)$, with $k_1 < \cdots < k_p$, if it satisfies the stochastic difference equation

$$X_t - \phi_{k_1} X_{t-k_1} - \cdots - \phi_{k_p} X_{t-k_p} = W_t \qquad (1)$$

where $\{W_t\}$ is a white noise sequence with zero mean and variance $\sigma^2 > 0$. This model is a special case of the autoregressive (AR) model of order $k_p$ with zero coefficients for some intermediate lags. As AR models are the simplest to estimate amongst linear time series models and may easily be used for forecasting purposes, selection procedures for SAR models have received a lot of attention in the literature.

Usually the selection of the optimal subset model is based upon the Kullback–Leibler information. For a linear time series model, this can be approximated by the penalized function

$$\log \hat{\sigma}^2_m + m \frac{C(T)}{T}$$

where $m$ is the number of parameters in the linear stochastic difference equation (1), $\hat{\sigma}^2_m$ is the conditional maximum likelihood estimate of the residual variance $\sigma^2$ and $C(T)$ is a function of the number $T$ of observations. The Akaike information criterion (AIC) (Akaike, 1973), the Bayesian information criterion (BIC) (Schwarz, 1978) and the Hannan–Quinn criterion (Hannan and Quinn, 1979) correspond to the choices $C(T) = 2$, $C(T) = \log T$ and $C(T) = 2 \log \log T$, respectively. In the case of full AR models, the small sample properties of the criteria are discussed by Hurvich and Tsai (1989), where it appears that among the consistent criteria the BIC selects the true model more frequently. In principle, without considering the computational effort, one can find the best SAR model of a specified size $m$ by comparing $\hat{\sigma}^2_m$ among the SAR models with all possible lag combinations and selecting the one with smallest $\hat{\sigma}^2_m$.
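For concreteness, the penalized criteria above can be evaluated as in the following minimal Python sketch; this is an editorial illustration only, and the function name and numerical values are ours, not taken from the paper.

```python
# Hypothetical illustration: the penalized criterion log(sigma2_hat) + m*C(T)/T
# for the AIC, BIC and Hannan-Quinn choices of C(T).
import math

def penalized_criterion(sigma2_hat, m, T, penalty="BIC"):
    """Return log(sigma2_hat) + m * C(T) / T."""
    C = {"AIC": 2.0,
         "BIC": math.log(T),
         "HQ": 2.0 * math.log(math.log(T))}[penalty]
    return math.log(sigma2_hat) + m * C / T

# Two hypothetical subset models on T = 300 observations: the second fits slightly
# better but uses two more parameters, so the BIC still prefers the first here.
print(penalized_criterion(sigma2_hat=1.05, m=3, T=300))
print(penalized_criterion(sigma2_hat=1.02, m=5, T=300))
```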
The penalized function is then used to identify the optimal SAR from the previously selected SAR models by letting $m$ vary. McClave (1975) proposed an efficient algorithm to search for the optimal SAR model without having to evaluate all possible lag combinations for a specified size. However, this means that if several models of size $m$ have very similar $\hat{\sigma}^2_m$, only one of them, the one with the smallest $\hat{\sigma}^2_m$, will be considered in the final comparison procedure. On the other hand, the diagnostic checking of different models with criterion values near the optimum can be advisable. Therefore, Haggan and Oyetunji (1984) developed an algorithm to evaluate all possible models efficiently. Since the number of possible models increases exponentially, the Haggan–Oyetunji method becomes less efficient when $k_p$ becomes large. In addition, the unknown parameter $k_p$ should be carefully selected before modelling. Yu and Lin (1991) modified the Haggan–Oyetunji method to overcome these disadvantages. Their proposal is to use an iterative model-building approach. First, a tentative model is chosen with a lag that has the most significant absolute value in the inverse correlation function. Then, this first choice is refined using the cross-correlation function (CCF) between the residuals and the observed series to choose an additional lag that has the most significant CCF value. More recently, Zhang and Terrell (1997) introduced a new statistic, the projection modulus, and developed a new algorithm for selecting the optimal SAR that is considerably more efficient than the McClave algorithm, in the sense that fewer SAR models need to be included to find the optimal one. Moreover, it is more accurate than the Yu–Lin method in the sense that the true model is identified with higher probability.

SAR models have received a lot of attention in the literature, but similar attention has not been given to subset selection for autoregressive moving-average (ARMA) models. Nevertheless, these models play an important role in time series analysis; see, for example, the so-called airline model (Box et al., 1994)

$$X_t - X_{t-1} - X_{t-12} + X_{t-13} = W_t - \theta W_{t-1} - \Theta W_{t-12} + \theta \Theta W_{t-13}.$$

Selection of subset ARMA models is more complicated because the search space of possible models is larger than in the previous case. It is easy to see, for example, that the Yu–Lin and Zhang–Terrell search procedures are infeasible. In addition, (conditional) maximum likelihood (ML) estimation methods are onerous to apply.

In this paper we develop a selection procedure for subset ARMA models based on innovation regression methods and stochastic binary search algorithms. In particular, we shall consider the genetic algorithms introduced by Holland (1975) in order to perform the selection. There are several reasons for the choice of a genetic algorithm. First, the space of solutions (all possible models) is finite and discrete but large. Second, each model can be coded quite easily in terms of binary strings. In addition, the final population of models generated by this algorithm allows us to consider many interesting models.

The plan of the paper is as follows. In Section 2 we describe the selection procedure, which is a slight modification of the Hannan–Rissanen (1982) method. In Section 3 the genetic algorithm is introduced and in Section 4 we discuss particular features of the genetic algorithm suitable for the selection problem.
Some numerical examples with synthetic and real data are presented in Section 5. Conclusions are drawn in Section 6.

2. SELECTION PROCEDURE

Denote $k = (k_1, \ldots, k_p)$ and $i = (i_1, \ldots, i_q)$, with $k_1 < \cdots < k_p$ and $i_1 < \cdots < i_q$, and consider a zero-mean stationary stochastic process $\{X_t\}$ generated by the ARMA$(k, i)$ model

$$X_t - \phi_{k_1} X_{t-k_1} - \cdots - \phi_{k_p} X_{t-k_p} = W_t - \theta_{i_1} W_{t-i_1} - \cdots - \theta_{i_q} W_{t-i_q}.$$

The model is assumed to be invertible. Innovation regression methods for ARMA modelling (Hannan and Rissanen, 1982) basically consist of two steps that determine the orders as well as estimate the parameters. The first step is to fit an AR model with high order to the observations. The second step is to apply the least squares (LS) method to the observations and the estimated innovations of the AR model in order to estimate the ARMA parameters. Because the optimization problems in these steps are quadratic, they are computationally cheaper than the ML method. We shall use the steps of the Hannan–Rissanen procedure as follows.

1. Fit an AR($n$) model

$$X_t = \sum_{l=1}^{n} \alpha_l X_{t-l} + W_t$$

using the Yule–Walker equations. Hannan and Kavalieris (1984) recommended choosing the order $n$ using the AIC, and the upper bound $(\ln T)^a$, $0 < a < 1$, is suggested for $n$. Let $\tilde{\alpha}_1, \ldots, \tilde{\alpha}_n$ be the estimates of $\alpha_1, \ldots, \alpha_n$. Then calculate the estimated innovations

$$\tilde{Z}_t = X_t - \sum_{l=1}^{n} \tilde{\alpha}_l X_{t-l} \qquad t = n + 1, \ldots, T.$$

2. Let $P$ and $Q$ be sufficiently large so that they are greater than the true maximum order for $k$ and $i$, respectively. For each $k$ and $i$ such that $k_p \leq P$ and $i_q \leq Q$ calculate the LS estimates $\bar{\phi} = (\bar{\phi}_{k_1}, \ldots, \bar{\phi}_{k_p})'$ and $\bar{\theta} = (\bar{\theta}_{i_1}, \ldots, \bar{\theta}_{i_q})'$ minimizing

$$\sigma^2(\phi, \theta) = \frac{1}{T} \sum_{t=t_0}^{T} \left( X_t - \sum_{j=1}^{p} \phi_{k_j} X_{t-k_j} + \sum_{l=1}^{q} \theta_{i_l} \tilde{Z}_{t-i_l} \right)^2$$

where $t_0 = \max(n + k_p, n + i_q)$. Choose $k^*$ and $i^*$ minimizing

$$\mathrm{BIC}(k, i) = \ln \sigma^2(\bar{\phi}, \bar{\theta}) + (p + q) \frac{\ln T}{T}.$$

Note that if $P > n$ and $Q > 0$ the design matrix in the LS procedure may not have full rank, so a sensible upper bound for $P$ is $n$. We can reduce the computations in the second stage substantially if we set $t_0 = \max(n + P, n + Q)$; for large $T$ there will be no notable effect on the BIC values. In this case, if the number $2^{P+Q}$ of possible subset models is of order $10^7$, the global optimum is obtainable using branch-and-bound techniques (Miller, 1990, p. 63). However, for reasonably large $P$ and $Q$ an exhaustive search for the optimum becomes impractical, so a random search algorithm can be very helpful, as we shall see in the following sections.
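To make the two stages concrete, the following Python sketch (ours, not the author's implementation) fits the long AR model by the Yule–Walker equations, forms the estimated innovations, and then computes the LS fit and BIC value for a given pair of lag sets. The use of numpy and the names fit_long_ar and subset_bic are our own choices.

```python
import numpy as np

def fit_long_ar(x, n):
    """Stage 1: Yule-Walker estimates of an AR(n) and the estimated innovations."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    # biased sample autocovariances of the (assumed zero-mean) series
    gamma = np.array([np.dot(x[:T - h], x[h:]) / T for h in range(n + 1)])
    R = np.array([[gamma[abs(i - j)] for j in range(n)] for i in range(n)])
    alpha = np.linalg.solve(R, gamma[1:])          # Yule-Walker equations
    z = np.full(T, np.nan)                         # innovations defined for t >= n
    for t in range(n, T):
        z[t] = x[t] - np.dot(alpha, x[t - n:t][::-1])
    return alpha, z

def subset_bic(x, z, ar_lags, ma_lags, n):
    """Stage 2: least squares fit of the subset ARMA model and its BIC value."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    t0 = n + max(ar_lags + ma_lags)                # starting point of the sum
    rows = np.arange(t0, T)
    # regress X_t on its own lagged values and on minus the lagged innovations,
    # so the coefficients come out as (phi, theta) in the convention of Section 2
    X = np.column_stack([x[rows - k] for k in ar_lags] +
                        [-z[rows - i] for i in ma_lags])
    y = x[rows]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = np.sum((y - X @ coef) ** 2) / T       # the 1/T normalization of the text
    m = len(ar_lags) + len(ma_lags)
    return np.log(sigma2) + m * np.log(T) / T, coef

# Illustrative use on a zero-mean series x with a stage 1 order n = 10:
#   alpha, z = fit_long_ar(x, n=10)
#   bic, coef = subset_bic(x, z, ar_lags=[1, 3, 12], ma_lags=[8], n=10)
```

A routine of this kind would be evaluated once per candidate model, i.e. once per chromosome in each generation of the genetic algorithm described next.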
3. GENETIC ALGORITHMS

Genetic algorithms (GAs) are randomized global search techniques that are based on the principles of natural selection to find the maximum or other high values of an objective function $g$. GAs differ from traditional search techniques in that they use a chromosomal representation of the problem space, gather and evaluate information on many points in the search space simultaneously, and use recombination operators to preserve discovered information. Holland's schema theorem (see, for example, Goldberg, 1989, p. 28) suggests that the genetic structure of relatively good solutions can be combined to find the genetic structure of better solutions. Therefore, while GAs are randomized and probabilistic, they do not perform a directionless search. Statistical applications of GAs have been discussed by Chatterjee et al. (1996) and cover optimization problems where the usual mathematical requirements, e.g. continuity, differentiability and convexity, do not apply. Relevant areas of application are variable selection problems in regression analysis (Leardi et al., 1992) and outlier detection (Crawford and Wainwright, 1995; Baragona et al., 1997).

The simplest GA starts with a population of size $N$ of binary encoded strings, or chromosomes, of length $L$. Each of the $L$ positions represents a gene and each gene's value is randomly chosen to be zero or one. The string represents one solution in the search space. A positive fitness function $f$ is calculated as a monotone increasing function of $g$ for each chromosome in the current generation. Then $N$ parents for the next generation are selected with replacement, with the probability $p_j$ of choosing the $j$th chromosome in the current population being proportional to its fitness $f_j$, i.e.

$$p_j = \frac{f_j}{\sum_{i=1}^{N} f_i}. \qquad (2)$$

These parents are considered in pairs. For each pair, with probability $p_c$ a crossover operator randomly selects a point between 1 and $L$, and two children are created by exchanging the parents' genetic structure after the crossover point. If no crossover takes place, the two children formed are exact copies of their respective parents. Afterwards a mutation operator is applied to each child's genes: this operator flips the bits with probability $p_m$. Then the children replace their parents in the population. The breeding cycle is iterated for a fixed number of generations or until some convergence criterion is met. On termination, the string in the final population with the best value of $g$ can be returned as the solution to the optimization problem. The theoretical basis for convergence to global optimum strings is given by Holland's schema theorem (but see Jennison and Sheehan (1995) for a critical discussion of this theorem). With a binary representation, schemata are strings based on a ternary alphabet: 0, 1 and * (do not care). However, since a good solution may be lost by the algorithm, a more efficient strategy is to save the best solution found at any iteration and return this as the solution. Variations of the simple GA and detailed explanations of its mechanics are given by Goldberg (1989).

The construction of a GA requires specification of the string representation and the population size $N$, choice of fitness function, implementation of crossover and mutation operators, and a methodology for creating the initial population. In the next section we discuss these issues in tackling the identification problem for subset ARMA models.

4. APPLYING A GA TO THE IDENTIFICATION PROBLEM

Implementation of the GA, as described in Section 3, first requires a string representation. Each chromosome $z = (z_1, \ldots, z_{P+Q})$ represents an ARMA model. Its length is equal to $P + Q$, i.e. to the sum of the maximum orders allowed in stage 2. In this binary encoded string, the first $P$ genes correspond to the AR parameters and the last $Q$ to the MA ones, and a gene is 1 if the corresponding parameter is not equal to zero. For instance, if $P = Q = 5$, the chromosome $z = (0, 1, 0, 0, 0, 1, 0, 0, 1, 0)$ corresponds to the ARMA model

$$X_t - \phi_2 X_{t-2} = W_t - \theta_1 W_{t-1} - \theta_4 W_{t-4}.$$
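The encoding just described and the genetic operators of Section 3 can be sketched as follows; this is our illustrative Python fragment, not the author's code, and the names decode, crossover and mutate are ours. Decoding the example chromosome above recovers the AR lag 2 and the MA lags 1 and 4.

```python
# Illustrative sketch of the binary encoding of Section 4 and of the one-point
# crossover and bit-flip mutation operators of Section 3 (names are ours).
import random

def decode(chrom, P, Q):
    """Map a 0/1 string of length P + Q to the AR and MA lags it switches on."""
    ar_lags = [j + 1 for j in range(P) if chrom[j] == 1]
    ma_lags = [j + 1 for j in range(Q) if chrom[P + j] == 1]
    return ar_lags, ma_lags

def crossover(parent1, parent2, pc=0.8):
    """With probability pc exchange the genes after a random cut point."""
    if random.random() < pc:
        cut = random.randint(1, len(parent1) - 1)
        return parent1[:cut] + parent2[cut:], parent2[:cut] + parent1[cut:]
    return parent1[:], parent2[:]        # no crossover: copies of the parents

def mutate(chrom, pm=0.05):
    """Flip each gene independently with probability pm."""
    return [1 - g if random.random() < pm else g for g in chrom]

# The example chromosome of Section 4 with P = Q = 5:
z = [0, 1, 0, 0, 0, 1, 0, 0, 1, 0]
print(decode(z, P=5, Q=5))               # ([2], [1, 4])
```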
After deciding on encoding, the second decision to make in using a GA is how to perform selection. From a general point of view, the function $f$ must provide sufficient selectivity to ensure that the algorithm prefers superior solutions to the extent that it eventually reaches an optimal or near-optimal answer. On the other hand, selectivity must not be so strong that populations polarize at an early stage, in which case the potential advantages of maintaining a diverse collection of ARMA models are lost. Since the BIC may take positive values, the simple choice $f = -\mathrm{BIC}$ is unsuitable as it can provide negative fitness. In the case of a negative value of $f$, Goldberg (1989, p. 76) suggests using an adaptive form of the fitness function, namely

$$f_j = M + 1 - \mathrm{BIC}_j \qquad (3)$$

where $\mathrm{BIC}_j$ is the BIC value for the $j$th chromosome in the current population, $M$ is the maximum value of the BIC in the current population and the constant 1 is added to ensure that any chromosome in the current population has a non-zero probability of selection. However, this choice is also not completely satisfactory. In fact, our experiments have shown that the fitness function (3) can maintain the diversity in the population for more than 100 generations but does not provide a high enough level of selectivity in the later stages to move towards the best solutions.

One can try to avoid this problem by using an 'elitist selection' (ES) that retains a number $r$ of the best chromosomes in each generation and replaces randomly chosen chromosomes with them. No specific rule for the choice of $r$ is given in the literature; we have found that $r = 2$ is sufficient.

Another selection scheme that we have experimented with is the 'Boltzmann selection' (BS) (Goldberg, 1990), in which a continuously varying 'temperature' $\delta > 0$ controls the rate of selection according to a preset schedule. In this scheme the fitness function is given by

$$f_j = \exp\left( -\frac{\mathrm{BIC}_j}{\delta} \right). \qquad (4)$$

A high starting temperature means that every chromosome has a reasonable probability of reproduction. While the GA is running, the temperature is gradually lowered, allowing the GA to narrow in more closely on the best part of the search space while maintaining the appropriate degree of diversity. The classic schedule employs a constant temperature decrement of the form $\delta_k = \delta_0 \alpha^k$, with $\delta_0 > 0$ and $0.85 < \alpha < 0.99$. Given these guidelines, after experimenting with several values we set $\alpha = 0.95$ and $\delta_0 = 1$.

Results showed little dependence on the choice of population size for values of $M$ in the range 20–40, and on the choice of the crossover probability for values of $p_c$ in the range 0.5–0.9. Smaller values of $p_c$ caused a lack of ability to explore the entire space of solutions and premature convergence. Low values for the probability of mutation $p_m$ are usually recommended in order to avoid the disruption of good schemata; we chose $p_m = 0.05$. Finally, according to the parsimony principle, we started from an initial population with few parameters on average, so each string of the initial population is chosen according to $N$ independent trials from the Bernoulli distribution with probability 0.3.
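The Boltzmann scheme and the sparse initial population can be sketched as follows; this is an illustrative Python fragment of our own, not the paper's code, and the shift by the minimum BIC inside the weights is a numerical safeguard that cancels after normalization.

```python
# Illustrative sketch of Boltzmann selection (equation (4)) with the geometric
# temperature schedule delta_k = delta0 * alpha**k, and of an initial population
# whose genes are independent Bernoulli(0.3) draws.  Names are ours.
import math
import random

def boltzmann_probabilities(bic_values, delta):
    """Selection probabilities from f_j = exp(-BIC_j / delta)."""
    b_min = min(bic_values)                  # shift for numerical stability;
    f = [math.exp(-(b - b_min) / delta) for b in bic_values]  # cancels on normalizing
    total = sum(f)
    return [fj / total for fj in f]

def select_parents(population, bic_values, delta):
    """Sample parents with replacement, proportionally to their fitness."""
    probs = boltzmann_probabilities(bic_values, delta)
    return random.choices(population, weights=probs, k=len(population))

def initial_population(size, length, p_one=0.3):
    """Chromosomes drawn gene by gene from independent Bernoulli(p_one) trials."""
    return [[1 if random.random() < p_one else 0 for _ in range(length)]
            for _ in range(size)]

# Temperature schedule used in the experiments: delta0 = 1 and alpha = 0.95.
delta0, alpha = 1.0, 0.95
temperatures = [delta0 * alpha ** k for k in range(100)]
```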
5. NUMERICAL EXAMPLES

5.1. Synthetic data

The subset selection algorithm was applied to 100 independent simulations, for each of the sample sizes $T = 100$, 300, 600, 1000, of the ARMA models shown in Table I with $W_t \sim \mathrm{N}(0, 1)$. For each model we set the maximum orders $P$ and $Q$. Since, in the first step, the selected order $n$ of the AR model could be less than $P$, especially for small $T$, the chromosome length is set equal to $\min(P, n) + Q$. The parameters of the GA are reported in Table II. We used the GA in a kind of deterministic fashion, i.e. we fixed the number of iterations equal to 100 and, before each GA run, the seed of the pseudo-random generator was set to the same value, regardless of the selection procedure. Thus for each of the 100 simulations the procedure employed the same pseudo-random numbers and performed 3000 regressions. As an illustrative example we shall consider one of these simulations for Model C with $T = 300$. Figure 1 shows the minimum, the maximum and the mean values of the BIC evaluated in each population during 100 iterations with BS.

TABLE I
Design of Simulation Experiments

Model   AR                                                       MA
A       $\phi_1 = 0.8$, $\phi_3 = 0.4$, $\phi_{12} = 0.2$        –
B       –                                                        $\theta_3 = -0.3$, $\theta_9 = 0.6$
C       $\phi_1 = 0.8$, $\phi_3 = -0.4$, $\phi_{12} = 0.2$       $\theta_8 = -0.8$

TABLE II
Parameters of the GA

Model   P    Q    M    $p_c$   $p_m$   $\delta_k$   r
A       14   5    30   0.8     0.05    $0.95^k$     2
B       3    11   30   0.8     0.05    $0.95^k$     2
C       14   8    30   0.8     0.05    $0.95^k$     2

Figure 1. Genetic algorithm iterations.

The outcomes of these simulations are summarized in Table III. Four statistics, $s_1$, $s_2$, $s_3$, $s_4$, are used to summarize the results: $s_1$ is the proportion of simulations in which the true model is detected; $s_2$ is the proportion of simulations in which non-existent lags were erroneously included; $s_3$ is the proportion of simulations in which true lags were erroneously excluded; $s_4$ is the proportion of simulations with both erroneous exclusions and erroneous inclusions of lags. The results indicate that no substantial difference exists between BS and ES. Note that when $T = 100$ the selection procedure performs very badly. The result is not surprising, especially for Model A, because the condition $P \leq n$ is seldom attained. In other cases the procedure seems to work well. To illustrate the efficiency of the GA with BS in identifying the true model, we show in Table IV the mean and the standard deviation of the ML estimates for Model C when $T = 300$. The results indicate that the mean values are close to the true values used to generate the synthetic series.

TABLE III
Results from Simulation Experiments

                      BS                            ES
Model      $s_1$  $s_2$  $s_3$  $s_4$    $s_1$  $s_2$  $s_3$  $s_4$
T = 100
  A           5      1     93      1        3      2     94      1
  B          32      9     58      1       30      9     60      1
  C          12      5     83      0       13      6     81      0
T = 300
  A          72     18     10      0       69     17     14      0
  B          76     18      6      0       76     18      6      0
  C          55     31     14      0       53     32     15      0
T = 600
  A          87     12      1      0       88     11      1      0
  B          85     14      1      0       85     14      1      0
  C          77     22      1      0       67     29      4      0
T = 1000
  A          85     14      1      0       84     15      1      0
  B          88     12      0      0       88     12      0      0
  C          81     18      1      0       72     28      0      0

TABLE IV
Mean Values of the Estimated Parameters for Model C and T = 300 (Standard Deviation in Parentheses)

$\hat{\phi}_1$ = 0.7900 (0.0415)   $\hat{\phi}_3$ = -0.4040 (0.0464)   $\hat{\phi}_{12}$ = 0.1789 (0.0800)   $\hat{\theta}_8$ = -0.8134 (0.0950)

The mean values of the remaining estimates $\hat{\phi}_i$, $i = 1, \ldots, 14$, and $\hat{\theta}_i$, $i = 1, \ldots, 8$, are all smaller than 0.011 in absolute value.

5.2. Real data

We consider the monthly sales of red wine by Australian winemakers from January 1980 through October 1991 (Brockwell and Davis, 1996). Figure 2 shows the sample autocorrelations and partial autocorrelations after taking natural logarithms and differencing at lag 12. Inspecting the standard errors of the coefficient estimators, Brockwell and Davis (BD) suggest two different ARMA models for the transformed data (see Table V).
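Before the model search, the series is transformed as just described; a minimal sketch of that preprocessing is given below. The code and the array name wine are our own, and the final demeaning, which matches the zero-mean assumption of Section 2, is our addition rather than a step stated in the paper.

```python
# Illustrative preprocessing of the wine series: natural logarithms followed by
# differencing at lag 12; the demeaning is our own addition.
import numpy as np

def transform(series, lag=12):
    """Log-transform, difference at the seasonal lag and centre the result."""
    y = np.log(np.asarray(series, dtype=float))
    d = y[lag:] - y[:-lag]
    return d - d.mean()

# wine is assumed to hold the monthly sales figures as a one-dimensional array:
#   x = transform(wine)
```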
We ran the GA using BS for 100 iterations with parameters $P = 14$, $Q = 14$, $M = 20$, $p_c = 0.8$, $p_m = 0.05$, $\delta_k = 0.95^k$ and $r = 2$. Note that in this case the number of possible subset models is of order $10^8$.

Figure 2. Sample autocorrelation and partial autocorrelation functions of transformed wine data.

Table V shows the best model found in these iterations. The sample autocorrelation and partial autocorrelation functions of the residuals (Figure 3) indicate that an adequate and parsimonious model was obtained.

TABLE V
Wine Data Models

Model              Non-zero coefficients                                                                              BIC              AIC
BD: AR(12)         $\phi_1$, $\phi_5$, $\phi_8$, $\phi_{11}$, $\phi_{12}$                                             -4.0823          -4.1997
BD: MA(13)         $\theta_1$, $\theta_2$, $\theta_5$, $\theta_8$, $\theta_{10}$, $\theta_{12}$, $\theta_{13}$        Not invertible   -4.4128
GA: ARMA(8, 12)    $\phi_1$, $\phi_5$, $\phi_8$, $\theta_{12}$                                                        -4.2644          -4.3583

Figure 3. Sample autocorrelation and partial autocorrelation functions of residuals.

6. CONCLUDING REMARKS

As a general comment we should say that, although we are able to report a successful application of the GA to the subset selection of ARMA models, further developments remain to be explored. This paper deals primarily with univariate ARMA models, but the procedure would be very useful in the case of vector ARMAX models; the author is currently investigating extensions in this direction. Secondly, another general purpose optimization algorithm is available in the literature, the simulated annealing algorithm, and an interesting study would be to compare simulated annealing (Kirkpatrick et al., 1983) with GAs. Finally, a natural question concerns the statistical properties of the final population of the GA.

REFERENCES

Akaike, H. (1973) Information theory and an extension of the maximum likelihood principle. In 2nd International Symposium on Information Theory (eds B. Petrov and F. Csaki). Budapest: Akadémiai Kiadó, pp. 267–81.
Baragona, B., Battaglia, F. and Calzini, C. (1997) Genetic algorithms for outlier identification in time series. Technical Report, Dipartimento di Sociologia, Università 'La Sapienza', Rome.
Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (1994) Time Series Analysis, Forecasting and Control, Revised Edition. San Francisco, CA: Holden-Day.
Brockwell, P. J. and Davis, R. A. (1996) Introduction to Time Series and Forecasting. New York: Springer.
Chatterjee, S., Laudato, M. and Lynch, L. A. (1996) Genetic algorithms and their statistical applications: an introduction. Comput. Stat. Data Anal. 22, 633–51.
Crawford, K. D. and Wainwright, R. L. (1995) Applying genetic algorithms to outlier detection. In Proceedings of the Sixth International Conference on Genetic Algorithms (ed. L. J. Eshelman). San Mateo, CA: Morgan Kaufmann, pp. 546–50.
Goldberg, D. E. (1989) Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley.
Goldberg, D. E. (1990) A note on Boltzmann tournament selection for genetic algorithms and population-oriented simulated annealing. Complex Syst. 4, 445–60.
Haggan, V. and Oyetunji, O. B. (1984) On the selection of subset autoregressive time series models. J. Time Ser. Anal. 5, 103–13.
Hannan, E. J. and Kavalieris, L. (1984) A method for autoregressive-moving average estimation. Biometrika 72, 273–80.
Hannan, E. J. and Quinn, B. G. (1979) The determination of the order of an autoregression. J. R. Stat. Soc., Ser. B 41, 190–95.
Hannan, E. J. and Rissanen, J. (1982) Recursive estimation of mixed autoregressive-moving average order. Biometrika 69, 81–94. Correction: 70 (1983), 303.
Holland, J. H. (1975) Adaptation in Natural and Artificial Systems. Ann Arbor, MI: University of Michigan Press.
Hurvich, C. M. and Tsai, C. L. (1989) Regression and time series model selection in small samples. Biometrika 76, 297–307.
Jennison, C. and Sheehan, N. (1995) Theoretical and empirical properties of the genetic algorithm as a numerical optimizer. J. Comput. Gr. Stat. 4, 296–318.
Kirkpatrick, S., Gelatt, C. D. and Vecchi, M. P. (1983) Optimization by simulated annealing. Science 220, 671–80.
Leardi, R., Boggia, R. and Terrile, M. (1992) Genetic algorithm as a strategy for feature selection. J. Chemomet. 6, 267–81.
McClave, J. (1975) Subset autoregression. Technometrics 17, 213–19.
Miller, A. J. (1990) Subset Selection in Regression. London: Chapman and Hall.
Penm, J. H. W. and Terrell, R. D. (1982) On the recursive fitting of subset autoregressions. J. Time Ser. Anal. 3, 43–59.
Schwarz, G. (1978) Estimating the dimension of a model. Ann. Stat. 6, 461–64.
Yu, G. and Lin, Y. (1991) A methodology for selecting subset autoregressive time series models. J. Time Ser. Anal. 12, 363–73.
Zhang, X. and Terrell, R. D. (1997) Projection modulus: a new direction for selecting subset autoregressive models. J. Time Ser. Anal. 18, 195–212.