Article

Consistent Model Selection Procedure for Random Coefficient INAR Models

School of Statistics, Southwestern University of Finance and Economics, Chengdu 611130, China
*
Author to whom correspondence should be addressed.
Entropy 2023, 25(8), 1220; https://doi.org/10.3390/e25081220
Submission received: 14 June 2023 / Revised: 13 August 2023 / Accepted: 14 August 2023 / Published: 16 August 2023

Abstract

In the realm of time series data analysis, information criteria constructed on the basis of likelihood functions serve as crucial instruments for determining the appropriate lag order. However, the intricate structure of random coefficient integer-valued time series models, which are founded on thinning operators, complicates the establishment of likelihood functions. Consequently, employing information criteria such as AIC and BIC for model selection becomes problematic. This study introduces an innovative methodology that formulates a penalized criterion by utilizing the estimation equation within conditional least squares estimation, effectively addressing the aforementioned challenge. Initially, the asymptotic properties of the penalized criterion are derived, followed by a numerical simulation study and a comparative analysis. The findings from both theoretical examinations and simulation investigations reveal that this novel approach consistently selects variables under relatively relaxed conditions. Lastly, the applications of this method to infectious disease data and seismic frequency data produce satisfactory outcomes.

1. Introduction

Integer-valued time series are ubiquitous in scientific research and everyday life, encompassing examples such as the daily count of hospitalized patients admitted to hospitals and the frequency of crimes committed daily or monthly. Consequently, integer-valued time series have increasingly garnered attention from scholars. However, traditional continuous-valued time series models fail to capture the integer-valued characteristics, only approximating integer-valued data through continuous-valued time series models. This approximation may result in model misspecification issues, complicating statistical inference. As a result, the modeling and analysis of integer-valued time series data have become a growing area of focus in academia. Among the variety of integer-valued time series modeling methods, thinning operator models have gained favor due to their resemblance to autoregressive moving average (ARMA) models found in traditional continuous-valued time series theory. Thinning operator models substitute the multiplication in ARMA models with the binomial thinning operator introduced by Steutel and Van Harn [1]:
$$\phi \circ Y_i = \sum_{i=1}^{Y_i} B_i \tag{1}$$
In this equation, $Y_i$ represents a count sequence, while $\{B_i\}$ denotes a series of Bernoulli random variables independent of $\{Y_i\}$. The probability mass function satisfies $P(B_i = 1) = 1 - P(B_i = 0) = \phi$ with $\phi \in [0, 1)$. Building on this foundation, Al-Osh and Alzaid [2] developed the first-order integer-valued autoregressive (INAR(1)) model for $t \in \mathbb{N}^{+}$:
$$Y_t = \phi \circ Y_{t-1} + Z_t \tag{2}$$
where $Z_t$ is regarded as the innovation term entering the model at period $t$, with its marginal distribution being a Poisson distribution with an expected value of $\lambda$. Consequently, model (2) is called the Poisson INAR(1) model. Later, Du and Li [3] introduced the INAR(p) model and provided conditions for ensuring the stationarity and ergodicity of the INAR(p) process. The incorporation of additional lag terms increased the model's flexibility. Subsequently, Joe [4] and Zheng, Basawa, and Datta [5] developed the random coefficient thinning operator model (RCINAR(1)) by allowing the parameter $\phi$ in the INAR(1) model to follow a specific random distribution. Zheng, Basawa, and Datta [6] extended the RCINAR(1) model to the p-th order integer-valued autoregressive model, known as the RCINAR(p) model. Zhang, Wang, and Zhu [7] established a data-driven empirical likelihood interval estimation for the RCINAR(p) model using the empirical likelihood (EL) estimation method. By employing the geometric thinning operator (also referred to as the negative binomial thinning operator) proposed by Ristić, Bakouch, and Nastić [8], Tian, Wang, and Cui [9] constructed an INAR(1) model capable of describing seasonal effects. Lu [10] investigated the prediction problem of the thinning operator model using the Taylor expansion. For further discussions on thinning operator models, readers can consult the textbook by Weiß [11].
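To make the binomial thinning operator of Equation (1) and the Poisson INAR(1) recursion of model (2) concrete, the following is a minimal simulation sketch. The function names, the burn-in length, and the parameter values are illustrative assumptions rather than anything prescribed by the paper; the thinning step relies on the fact that a sum of $y$ independent Bernoulli($\phi$) variables is Binomial($y$, $\phi$).

```python
import numpy as np

def binomial_thinning(phi, y, rng):
    """Return phi ∘ y, i.e. the sum of y i.i.d. Bernoulli(phi) variables."""
    return rng.binomial(y, phi)

def simulate_poisson_inar1(phi, lam, T, rng, burn_in=200):
    """Simulate Y_t = phi ∘ Y_{t-1} + Z_t with Z_t ~ Poisson(lam).

    burn_in is a hypothetical choice used to reduce the effect of the start value.
    """
    n = T + burn_in
    y = np.zeros(n, dtype=np.int64)
    y[0] = rng.poisson(lam / (1.0 - phi))  # start near the stationary mean
    for t in range(1, n):
        y[t] = binomial_thinning(phi, y[t - 1], rng) + rng.poisson(lam)
    return y[burn_in:]

rng = np.random.default_rng(0)
y = simulate_poisson_inar1(phi=0.4, lam=2.0, T=5000, rng=rng)
print(y[:10])
print(y.mean())  # should be close to the stationary mean lam / (1 - phi)
```

The sample mean can be compared with the stationary mean $\lambda / (1 - \phi)$ of the INAR(1) process as a quick sanity check on the simulated path.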
In general, researchers engaged in statistical analysis, particularly during the initial stages of time series data investigation, frequently encounter the challenge of model selection. Current model selection techniques can be broadly categorized into three groups: The first group relies on sample autocorrelation (ACF) and partial autocorrelation (PACF) functions for model selection, as exemplified by Latour [12]; the second group, which is the most prevalent method for variable selection, comprises a series of information criteria founded on maximum likelihood estimation. Akaike [13] introduced the Akaike Information Criterion (AIC) by performing an unbiased estimation of the expected log-likelihood function, while Schwarz [14] established the Bayesian Information Criterion (BIC) by employing a Laplace expansion for the posterior estimation of the expected log-likelihood function. Ding, Tarokh, and Yang [15] devised a novel information criterion for autoregressive time series models by connecting AIC and BIC. Furthermore, given that empirical likelihood estimation can substantially circumvent issues stemming from model misspecification and maintain certain maximum likelihood estimation features, researchers have started to investigate data-driven information criteria based on empirical likelihood estimation. Variyath, Chen, and Abraham [16] formulated the Empirical Akaike Information Criterion (EAIC) and the Empirical Bayesian Information Criterion (EBIC) by drawing on the principles of AIC and BIC with empirical likelihood estimation. They demonstrated that EBIC possesses consistency in variable selection. Chen, Wang, Wu, and Li [17] addressed potential computational convergence problems in empirical likelihood estimation by incorporating external estimators (typically moment estimators) into the empirical likelihood function, thereby developing a robust and consistent information criterion. 
For additional discussions on information criteria, readers may consult the textbook by Konishi and Kitagawa [18] and the review article by Ding, Tarokh, and Hong [19].
In the specific domain of integer-valued time series analysis, our objective is to determine which lagged variables of $Y_t$ ought to be incorporated into the model. Extensive research has been conducted on model selection for integer-valued autoregressive conditional heteroskedasticity (INARCH) models, which allow for relatively straightforward likelihood function establishment. Notable examples include Weiß and Feld [20], who provided comprehensive numerical simulations for integer-valued time series model selection using information criteria, and Diop and Kengne [21], who introduced consistent model selection methods for INARCH models based on quasi-maximum likelihood estimation. However, the process becomes more challenging when dealing with higher-order and random coefficient INAR(p) models constructed using thinning operators. The complexity of the likelihood functions and the substantial computational requirements make it difficult to establish and utilize information criteria. Consequently, Zheng, Basawa, and Datta [6] proposed estimating the model based on its conditional moments rather than relying on likelihood functions. While this approach facilitates the estimation of unknown parameters for researchers, it creates complications for variable selection. To overcome this hurdle, Wang, Wang, and Yang [22] implemented penalty functions and pseudo-quasi-maximum likelihood estimation (PQML) for variable selection, demonstrating the robustness of their method even when faced with contaminated data. Drawing inspiration from these preceding studies, this paper endeavors to establish a novel model selection method akin to information criteria founded upon the estimating equations in conditional least squares (CLS) estimation. Furthermore, we attempt to demonstrate the consistency of this innovative model selection method in addressing variable selection problems within integer-valued time series.
This approach circumvents the need for complex probability distribution assumptions while preserving effective variable selection capabilities.
The organization of this paper is as follows: In Section 2, we revisit the RCINAR(p) model, introduce the proposed information criterion, and outline its asymptotic properties. In Section 3, we carry out numerical simulation studies on variable selection utilizing this information criterion. In Section 4, we endeavor to apply this information criterion for variable selection in real data sets. Lastly, in Section 5, we engage in a discussion and offer concluding remarks.

2. RCINAR Model and Model Selection Procedure

In this section, we discuss the ergodic stationary RCINAR model and its associated model selection methods.

2.1. RCINAR(p) Model and Its Estimation

The INAR(p) model with constant coefficients, as introduced by Du and Li [3], is formulated as follows:
$$Y_t = \phi_1 \circ Y_{t-1} + \cdots + \phi_p \circ Y_{t-p} + Z_t \tag{3}$$
In this expression, given the vector $(Y_{t-1}, \ldots, Y_{t-p})$, the elements $\phi_1 \circ Y_{t-1}, \ldots, \phi_p \circ Y_{t-p}$ are deemed to be mutually conditionally independent. This conditional independence ensures that the autocorrelation function of the INAR(p) model is congruent with that of its continuous-valued autoregressive (AR(p)) counterpart. Moreover, Du and Li [3] substantiated that, under these model settings, the stationarity condition for the INAR(p) model necessitates that the roots of the polynomial $h(z) = 1 - \phi_1 z - \cdots - \phi_p z^p = 0$ are located outside the unit circle. This implies that the INAR(p) model attains stationarity when the sum $\sum_{i=1}^{p} \phi_i$ is less than 1. Building upon these foundational insights, Zheng, Basawa, and Datta [6] extended the INAR(p) model under the constant coefficient assumption, giving rise to the Random Coefficient Integer-valued Autoregressive (RCINAR(p)) model.
Let $\{Y_t\}_{t=1}^{T}$ represent a non-negative integer-valued sequence. The RCINAR(p) model is defined by the following equation:
$$Y_t = \phi_1^{(t)} \circ Y_{t-1} + \cdots + \phi_p^{(t)} \circ Y_{t-p} + Z_t \tag{4}$$
where "$\circ$" denotes the thinning operator defined in Equation (1). Let $\theta_0 = (\phi_{10}, \ldots, \phi_{p0}, \lambda_0)^{\top}$ be the true parameter vector of this data-generating process, with $\theta_0 \in \Theta$, where $\Theta$ is a compact subset of $\mathbb{R}^{p+1}$. $\theta = (\phi_1, \ldots, \phi_p, \lambda)^{\top}$ represents the $(p+1)$-dimensional parameter vector to be estimated. Here, $\{\phi_j^{(t)}\}$ are sequences of independent and identically distributed random variables defined on $[0, 1)$ with a mean of $\phi_j$ and probability density function $f_{\phi_j}(\phi) \ge 0$, $\phi \in [0, 1)$, with $\sum_{j=1}^{p} \phi_j < 1$.
Moreover, we do not assume a specific parametric distribution for $\{Z_t\}$, only requiring that $\{Z_t\}$ be an independent and identically distributed non-negative integer-valued random variable sequence with a mean of $\lambda$ and a probability mass function $f_Z(z) \ge 0$, $z \in \mathbb{N}$. In this context, we consider the semiparametric INAR model as described by Drost, Van den Akker, and Werker [23].
Remark 1.
As can be discerned from the preceding discussion, the INAR(p) model (3) represents a special case of the RCINAR(p) model (4). That is, when each $\phi_j^{(t)}$ is a constant coefficient, the RCINAR(p) model reduces to the INAR(p) model. As demonstrated by Zheng, Basawa, and Datta [6], the statistical methods employed in the study of the RCINAR(p) model can also be directly applied to the INAR(p) model. Consequently, in order to cater to a wider range of application scenarios, the academic community tends to prioritize the study of the RCINAR model while investigating thinning operator models. For instance, Kang and Lee [24] investigated the problem of change-point detection in the RCINAR model by leveraging the Cumulative Sum (CUSUM) test. Similarly, Zhang, Wang, and Zhu [7] proposed an interval estimation method for the RCINAR model based on empirical likelihood estimation. Awale, Balakrishna, and Ramanathan [25], on the other hand, constructed a locally most powerful-type test devised specifically for examining structural changes within the framework of the RCINAR model. Therefore, this paper will center its research on the RCINAR model.
To estimate the RCINAR(p) model and establish model selection criteria, we draw inspiration from the assumptions delineated by Zhang, Wang, and Zhu [7]. These assumptions are as follows:
(A1) $\{Y_t\}$ constitutes an ergodic and strictly stationary RCINAR(p) process.
(A2) There exists $\delta > 0$ such that $E|Y_t|^{4+\delta} < \infty$.
Derived from Equation (4), the one-step-ahead transition probability is as follows:
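$$P(Y_t = i \mid Y_{t-1} = i_1, \ldots, Y_{t-p} = i_p) = \sum_{k=0}^{\min\left(i, \sum_{j=1}^{p} i_j\right)} f_Z(i-k) \sum_{\substack{k_1 + \cdots + k_p = k \\ 0 \le k_j \le i_j}} \prod_{j=1}^{p} \binom{i_j}{k_j} \times \int_{0 \le \phi_1^{(t)}, \ldots, \phi_p^{(t)} < 1} \prod_{j=1}^{p} \left(\phi_j^{(t)}\right)^{k_j} \left(1 - \phi_j^{(t)}\right)^{i_j - k_j} \, dP\left(\phi_1^{(t)}, \ldots, \phi_p^{(t)}\right)$$
Here, $P(\phi_1^{(t)}, \ldots, \phi_p^{(t)})$ represents the joint distribution function of $(\phi_1^{(t)}, \ldots, \phi_p^{(t)})$. Utilizing this one-step-ahead transition probability function, we can construct the likelihood function:
$$L = P\left(Y_p = i_1^{(p+1)}, \ldots, Y_1 = i_p^{(p+1)}\right) \prod_{t=p+1}^{T} P\left(Y_t = i^{(t)} \mid Y_{t-1} = i_1^{(t)}, \ldots, Y_{t-p} = i_p^{(t)}\right)$$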
The likelihood function L for model (4) is notably complex, involving numerous multivariate numerical integrations within statistical computations, which demand substantial computational resources. Consequently, Zheng, Basawa, and Datta [6] advocated for estimating the model based on its conditional moments rather than employing the likelihood function. This preference also underlies the prevalent use of conditional least squares (CLS) estimation in the study of RCINAR(p) models within the scholarly community. In the subsequent section, we offer a concise introduction to the CLS estimation methodology for the RCINAR(p) model.
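We can obtain the first-order conditional moment of model (4) as follows:
$$E(Y_t \mid \mathcal{F}_{t-1}) = \sum_{j=1}^{p} \phi_j Y_{t-j} + \lambda$$
where $\mathcal{F}_{t-1} = \sigma(Y_{t-1}, Y_{t-2}, \ldots)$. This derivation allows us to compute the conditional least squares (CLS) estimation. Let
$$S(\theta) = \sum_{t=p+1}^{T} \left(Y_t - \sum_{j=1}^{p} \phi_j Y_{t-j} - \lambda\right)^2$$
represent the conditional least squares (CLS) objective function. The CLS estimator is then given by:
$$\hat{\theta} = \arg\min_{\theta} S(\theta)$$
Let
$$S_t(\theta) = \left(Y_t - \sum_{j=1}^{p} \phi_j Y_{t-j} - \lambda\right)^2$$
Then the estimating equations are:
$$\Psi_t(\theta) = -\frac{1}{2} \frac{\partial S_t(\theta)}{\partial \theta} = \left(\psi_t^{(1)}(\theta), \psi_t^{(2)}(\theta), \ldots, \psi_t^{(p+1)}(\theta)\right)^{\top}, \qquad \sum_{t=p+1}^{T} \Psi_t(\theta) = 0$$
where
$$\psi_t^{(s)}(\theta) = \left(Y_t - \sum_{j=1}^{p} \phi_j Y_{t-j} - \lambda\right) Y_{t-s}, \quad 1 \le s \le p$$
$$\psi_t^{(p+1)}(\theta) = Y_t - \sum_{j=1}^{p} \phi_j Y_{t-j} - \lambda$$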
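Because the conditional mean is linear in $\theta$, minimizing $S(\theta)$ amounts to an ordinary least squares regression of $Y_t$ on $(Y_{t-1}, \ldots, Y_{t-p}, 1)$. The sketch below illustrates this under that observation; the helper names `cls_estimate` and `estimating_function` are hypothetical, and the short INAR(1) simulation is only there to provide data for the example.

```python
import numpy as np

def cls_estimate(y, p):
    """CLS estimate of theta = (phi_1, ..., phi_p, lambda) via least squares."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    # Each row holds (Y_{t-1}, ..., Y_{t-p}, 1) for t = p+1, ..., T.
    X = np.column_stack([y[p - j:T - j] for j in range(1, p + 1)] + [np.ones(T - p)])
    theta_hat, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return theta_hat, X

def estimating_function(theta, y, X, p):
    """Rows are Psi_t(theta) = residual_t * (Y_{t-1}, ..., Y_{t-p}, 1)."""
    resid = np.asarray(y, dtype=float)[p:] - X @ theta
    return resid[:, None] * X

# Illustrative data: a Poisson INAR(1) path with phi = 0.4 and lambda = 2.
rng = np.random.default_rng(0)
T = 2000
y = np.zeros(T, dtype=np.int64)
y[0] = 3
for t in range(1, T):
    y[t] = rng.binomial(y[t - 1], 0.4) + rng.poisson(2.0)

theta_hat, X = cls_estimate(y, p=1)
psi = estimating_function(theta_hat, y, X, p=1)
print(theta_hat)         # estimates of (phi_1, lambda)
print(psi.sum(axis=0))   # numerically zero at the CLS estimate
```

The second printed vector checks that the summed estimating function vanishes at the CLS estimate, which is the normal-equation form of the estimating equations above.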
For the estimating equation Ψ t θ , we introduce an additional assumption:
(A3) $\Psi_t(\theta)$ is identifiable, that is, $E\,\Psi_t(\theta_0) = 0$, and if $\theta \neq \theta_0$ is in the neighborhood of $\theta_0$, then $E\,\Psi_t(\theta)$ exists and $\|E\,\Psi_t(\theta)\| > 0$.
Assumption (A3) is the identifiability assumption, which further implies that the model (4) is identifiable if only the currently specified model satisfies $E\,\Psi_t(\theta_0) = 0$. Based on these assumptions above, the following lemma can be deduced:
Lemma 1.
Based on assumptions (A1) to (A3), the subsequent conclusions are valid:
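(i) $E\left(\Psi_t(\theta_0) \Psi_t^{\top}(\theta_0)\right)$ constitutes a positive definite matrix.
(ii) $\dfrac{\partial^2 \Psi_t(\theta)}{\partial \theta \, \partial \theta^{\top}}$ remains continuous within the neighborhood of $\theta_0$.
(iii) Both $\dfrac{\partial \Psi_t(\theta)}{\partial \theta}$ and $\dfrac{\partial^2 \Psi_t(\theta)}{\partial \theta \, \partial \theta^{\top}}$ possess upper bounds in the neighborhood of $\theta_0$.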
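Moreover, Zheng, Basawa, and Datta [6] established that $\hat{\theta}_{CLS}$ is a consistent estimator with an asymptotic distribution:
$$\sqrt{T}\left(\hat{\theta} - \theta_0\right) \xrightarrow{d} N\left(0, V^{-1}(\theta_0) W(\theta_0) V^{-1}(\theta_0)\right)$$
where:
$$W(\theta_0) = E\left(\Psi_t(\theta_0) \Psi_t^{\top}(\theta_0)\right)$$
$$V(\theta_0) = E\left(\frac{\partial E(Y_t \mid Y_{t-1})}{\partial \theta} \cdot \frac{\partial E(Y_t \mid Y_{t-1})}{\partial \theta^{\top}}\right) - E\left(u_t(\theta_0) \frac{\partial^2 E(Y_t \mid Y_{t-1})}{\partial \theta \, \partial \theta^{\top}}\right)$$
$$u_t(\theta_0) = Y_t - E(Y_t \mid Y_{t-1})$$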

2.2. Model Selection Procedure

For the data-generating process defined by Equation (4), we establish the following settings:
  • A model $m$ is a subset of $M = \{1, 2, \ldots, p, p+1\}$, with its dimension denoted as $|m|$. Consequently, $p+1$ represents the maximum model dimension we consider, noted as the full model, while the minimum model dimension we consider is $1$, corresponding to an independent and identically distributed non-negative integer-valued random variable sequence. Let the true model be $m_0$.
  • $\theta_m$ is the parameter vector associated with model $m$, which can be extended to the $(p+1)$-dimensional vector $\tilde{\theta}_m = (\theta_j)_{1 \le j \le p+1}$, where $\theta_j = (\theta_m)_j$ if $j \in m$ and $\theta_j = 0$ if $j \notin m$. For instance, if the considered model $m$ is $Y_t = \phi_1^{(t)} \circ Y_{t-1} + \phi_3^{(t)} \circ Y_{t-3} + Z_t$, then $m = \{1, 3, p+1\}$, $\theta_m = (\phi_1, \phi_3, \lambda)^{\top}$, and it can be extended to the $(p+1)$-dimensional vector $\tilde{\theta}_m = (\phi_1, 0, \phi_3, 0, \ldots, 0, \lambda)^{\top}$.
  • Let $\Theta(m)$ be the compact parameter space of model $m$. The set $\tilde{\Theta}(m) = \{(\theta_j)_{1 \le j \le p+1} \in \mathbb{R}^{p+1} : \theta_j = 0 \text{ if } j \notin m\}$ constitutes a compact subset of $\mathbb{R}^{p+1}$, and all possible $\tilde{\theta}(m)$ values, when restricted to the $|m|$-dimensional vector $\theta_m$, are interior points of its corresponding compact subset $\Theta(m)$. Furthermore, we denote $\tilde{\theta} = \theta_M$ as the parameter vector to be estimated in $\tilde{\Theta}(M) = \Theta(M)$, i.e., the parameter vector of the full model $M$.
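For model $m$, we partition $\tilde{\theta}$ into two components, i.e., $\tilde{\theta} = \left(\tilde{\theta}^{(1)}(m), \tilde{\theta}^{(2)}(m)\right)$, where $\tilde{\theta}^{(1)}(m) = (\theta_j, j \in m) = \theta_m$ and $\tilde{\theta}^{(2)}(m) = (\theta_j, j \notin m)$. Correspondingly, it is evident that if the model $m$ is correctly specified, denoted as $m = m_0$, then $\tilde{\theta}^{(2)}(m_0) = 0$ and $\tilde{\theta}_0 = \left(\tilde{\theta}^{(1)}(m_0), \tilde{\theta}^{(2)}(m_0)\right) = (\theta_{m_0}, 0)$. We can then divide the estimating equation $\Psi_t(\tilde{\theta})$ into two parts:
$$\Psi_t(\tilde{\theta}) = \begin{pmatrix} \Psi_{1t}(\tilde{\theta}) \\ \Psi_{2t}(\tilde{\theta}) \end{pmatrix}$$
where
$$\Psi_{1t}(\tilde{\theta}) = -\frac{1}{2} \frac{\partial S_t(\tilde{\theta})}{\partial \tilde{\theta}^{(1)}(m)}$$
$$\Psi_{2t}(\tilde{\theta}) = -\frac{1}{2} \frac{\partial S_t(\tilde{\theta})}{\partial \tilde{\theta}^{(2)}(m)}$$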
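Let $\hat{\theta}_{CLS}^{m} = \left(\hat{\theta}_{1,CLS}^{m}, 0\right)$, i.e., $\hat{\theta}_{1,CLS}^{m}$ is the solution to $\sum_{t=p+1}^{T} \Psi_{1t}\left(\tilde{\theta}^{(1)}(m), 0\right) = 0$, where $\tilde{\theta}^{(2)}(m)$ is constrained to be $0$. Therefore, $\hat{\theta}_{1,CLS}^{m}$ represents the CLS estimator of model $m$. Define the function:
$$H(\tilde{\theta}) = \left(\sum_{t=p+1}^{T} \Psi_t(\tilde{\theta})\right)^{\top} \left(\sum_{t=p+1}^{T} \Psi_t(\tilde{\theta}) \Psi_t^{\top}(\tilde{\theta})\right)^{-1} \left(\sum_{t=p+1}^{T} \Psi_t(\tilde{\theta})\right)$$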
We can then derive the following lemma:
Lemma 2.
Given assumptions (A1)–(A3), as $T \to \infty$:
$$H(\tilde{\theta}_0) \xrightarrow{d} \chi^2_{p+1}$$
Because the proof of this lemma closely resembles the proof of Theorem 1 in Zhang, Wang, and Zhu [7], we omit the details. It is important to note that when $m = M$, $\hat{\theta}_{CLS}^{M}$ is the solution to the estimating equation $\sum_{t=p+1}^{T} \Psi_t(\tilde{\theta}) = 0$, and in this case, $H(\hat{\theta}_{CLS}^{M}) = 0$. Furthermore, Lemma 2 suggests that $H(\tilde{\theta}_0) = O_p(1)$.
Definition 1.
We propose the following penalized criteria:
$$H(\hat{\theta}_{CLS}^{m}) + P_T \cdot |m| \tag{6}$$
where the penalty term $P_T$ is an increasing sequence with $P_T \to \infty$ that satisfies $P_T = O(T^{1/2})$ and $\log(T) / P_T = O(1)$.
Remark 2.
Intuitively, in this penalized criterion, $H(\hat{\theta}_{CLS}^{m})$ serves as a measure of the model's fit to the data. If it can be demonstrated that the divergence rate of $H(\hat{\theta}_{CLS}^{m})$ is slower when $m_0 \subseteq m$ compared to the divergence rate of $H(\hat{\theta}_{CLS}^{m_1})$ when $m_0 \not\subseteq m_1$, then a smaller $H(\hat{\theta}_{CLS}^{m})$ would suggest a superior fit of model $m$ to the data. However, upon closer examination, it becomes evident that if we merely adopt model $M$, then $H(\hat{\theta}_{CLS}^{M}) = 0$. Consequently, it is necessary to introduce a penalty term, $P_T \cdot |m|$, to constrain the number of lagged variables incorporated by model $m$. By striking a balance between the degree of data fitting $H(\hat{\theta}_{CLS}^{m})$ and the number of lagged variables $P_T \cdot |m|$, Theorems 1–3 substantiate the ability to select the appropriate model.
Under the correct model specification, the following theorem can be derived:
Theorem 1.
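Given assumptions (A1) and (A2), under the correct model specification
$$\hat{\theta}_{1,CLS}^{m} - \tilde{\theta}_0^{(1)}(m) = -\left(E \frac{\partial \Psi_t(\tilde{\theta}_0)}{\partial \tilde{\theta}^{(1)}(m)}\right)^{-1} \frac{1}{T} \sum_{t=p+1}^{T} \Psi_{1t}(\tilde{\theta}_0) + o_p\left(T^{-1/2}\right)$$
and $H(\hat{\theta}_{CLS}^{m})$ converges in distribution to $\sum_{j=1}^{p+1} \Lambda_j \chi^2(1)$, where $\Lambda_j$ is the eigenvalue of the matrix $\Sigma_{11}^{1/2} \Sigma^{\top} \Sigma_{11}^{-1} \Sigma \, \Sigma_{11}^{1/2}$, where
$$\Sigma = I - E\left(\frac{\partial \Psi_t(\tilde{\theta}_0)}{\partial \tilde{\theta}}\right) \begin{pmatrix} \left(E \dfrac{\partial \Psi_t(\tilde{\theta}_0)}{\partial \tilde{\theta}^{(1)}(m)}\right)^{-1} & 0 \\ 0 & 0 \end{pmatrix}$$
$$\Sigma_{11} = E\left(\Psi_t(\tilde{\theta}_0) \Psi_t^{\top}(\tilde{\theta}_0)\right)$$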
Theorem 1 establishes the asymptotic distribution of $\hat{\theta}_{1,CLS}^{m}$ and $H(\hat{\theta}_{CLS}^{m})$ under the correct model specification, which serves as a crucial component in the derivation for the consistency of our penalized criteria (6). In the following, we discuss the performance of $H(\hat{\theta}_{CLS}^{m})$ when the model specification $m$ is incorrect.
Theorem 2.
Given assumptions (A1)–(A3), for any $\tilde{\theta}_1 \neq \tilde{\theta}_0$ in the neighborhood of $\tilde{\theta}_0$, we have:
$$T^{-1/2} H(\tilde{\theta}_1) \to \infty$$
Theorem 2 and assumption (A3) ensure that if the model $m$ is misspecified, $H(\hat{\theta}_{CLS}^{m})$ will diverge to positive infinity at a rate of at least $T^{1/2}$. Combining Theorems 1 and 2, we can present the primary conclusion of this paper. When the model is specified as $m$, we have the following theorem.
Theorem 3.
Given assumptions (A1)–(A3), we have:
$$P\left(\min\left\{H(\hat{\theta}_{CLS}^{m}) + P_T \cdot |m| : m \neq m_0\right\} > H(\hat{\theta}_{CLS}^{m_0}) + P_T \cdot |m_0|\right) \to 1$$
From the proof of Theorem 3 and Lemma A.1, we can observe that the divergence rate of $P_T$ needs to be at least as fast as $\log(T)$. In practical applications, we may use settings such as $P_T = T^{1/5}$. In such settings, although $\log(T) / P_T \to 0$, in finite samples we can have $P_T < \log(T)$. In fact, in the interval $[4, 332{,}106]$, $T^{1/5} < \log(T)$, which may result in the performance of $P_T = T^{1/5}$ not being as effective as $P_T = \log(T)$ in finite samples. Nevertheless, such penalty term settings still hold value, and we will discuss this situation in the numerical simulation section.
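The finite-sample ordering of the two penalties can be checked numerically. The short sketch below searches for the integer sample sizes at which $T^{1/5} < \log(T)$; the helper name `light_penalty` and the search brackets are illustrative assumptions, and the natural logarithm is assumed for $\log$.

```python
import math

def light_penalty(T):
    """True when T**(1/5) < log(T), i.e. the T^{1/5} penalty is lighter than log(T)."""
    return T ** 0.2 < math.log(T)

# Smallest integer T >= 2 at which T^{1/5} falls below log(T).
lower = next(T for T in range(2, 100) if light_penalty(T))

# Largest such integer, by bisection on a bracket where the comparison flips:
# light_penalty(1_000) is True and light_penalty(10_000_000) is False.
lo, hi = 1_000, 10_000_000
while hi - lo > 1:
    mid = (lo + hi) // 2
    if light_penalty(mid):
        lo = mid
    else:
        hi = mid
upper = lo

print(lower, upper)
```

The bisection is valid because $\log(T) - T^{1/5}$ is decreasing for $T > 3125$, so the comparison changes sign only once on the chosen bracket.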
Theorem 3 provides the consistency of the penalized criteria (6) for model selection. It becomes evident that Theorem 3 holds under very relaxed assumptions and relies solely on the CLS estimation, which can be rapidly completed in any statistical software, and the estimating equation constructed by first-order conditional moments, which is easy to derive. This makes the penalized criteria (6) highly suitable for use in INAR models, particularly in RCINAR models. Now let $\hat{m}$ be the model selected by the criterion (6):
$$\hat{m} = \arg\min_{m \subseteq M} \left\{ H(\hat{\theta}_{CLS}^{m}) + P_T \cdot |m| \right\}$$
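The selection rule can be sketched as an exhaustive search over lag subsets. The code below is an illustrative implementation under stated assumptions, not the authors' code: the function names are hypothetical, the intercept $\lambda$ is kept in every candidate model (matching the statement that the smallest model is an i.i.d. sequence), and the test data are a fixed-coefficient INAR path with lags 1 and 3.

```python
import itertools
import numpy as np

def design_matrix(y, p):
    """Return X with rows (Y_{t-1}, ..., Y_{t-p}, 1) and the target vector Y_t."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    X = np.column_stack([y[p - j:T - j] for j in range(1, p + 1)] + [np.ones(T - p)])
    return X, y[p:]

def h_statistic(theta_ext, X, target):
    """H(theta) built from the full-model estimating function Psi_t(theta)."""
    psi = (target - X @ theta_ext)[:, None] * X
    s = psi.sum(axis=0)
    return float(s @ np.linalg.solve(psi.T @ psi, s))

def select_model(y, p, penalty):
    """Minimize H(theta_hat^m) + penalty * |m| over all lag subsets (intercept kept)."""
    X, target = design_matrix(y, p)
    best = None
    for k in range(p + 1):
        for lags in itertools.combinations(range(1, p + 1), k):
            cols = [j - 1 for j in lags] + [p]   # selected lags plus the intercept
            theta_ext = np.zeros(p + 1)          # excluded coefficients stay at 0
            theta_ext[cols] = np.linalg.lstsq(X[:, cols], target, rcond=None)[0]
            crit = h_statistic(theta_ext, X, target) + penalty * len(cols)
            if best is None or crit < best[0]:
                best = (crit, lags)
    return best[1]

# Illustrative data: Y_t = 0.4 ∘ Y_{t-1} + 0.2 ∘ Y_{t-3} + Z_t, Z_t ~ Poisson(2).
rng = np.random.default_rng(1)
T, burn = 3000, 200
y = np.full(T + burn, 5, dtype=np.int64)
for t in range(3, T + burn):
    y[t] = rng.binomial(y[t - 1], 0.4) + rng.binomial(y[t - 3], 0.2) + rng.poisson(2.0)
y = y[burn:]

selected = select_model(y, p=3, penalty=len(y) ** (1 / 3))
X_full, target_full = design_matrix(y, 3)
theta_full = np.linalg.lstsq(X_full, target_full, rcond=None)[0]
h_full = h_statistic(theta_full, X_full, target_full)
print(selected)   # tuple of selected lags
print(h_full)     # numerically zero for the full model
```

The last line illustrates why a penalty is needed: without it, the full model always attains the minimum value $H = 0$.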
We now present the asymptotic properties of the selected model:
Theorem 4.
Given assumptions (A1)–(A3), we have:
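$$\sqrt{T}\left(\hat{\theta}_{CLS}^{\hat{m}} - \tilde{\theta}_0\right) \xrightarrow{d} N\left(0, V^{-1}(\tilde{\theta}_0) W(\tilde{\theta}_0) V^{-1}(\tilde{\theta}_0)\right)$$
where:
$$W(\tilde{\theta}_0) = E\left(\Psi_t(\tilde{\theta}_0) \Psi_t^{\top}(\tilde{\theta}_0)\right)$$
$$V(\tilde{\theta}_0) = E\left(\frac{\partial E(Y_t \mid Y_{t-1})}{\partial \tilde{\theta}} \cdot \frac{\partial E(Y_t \mid Y_{t-1})}{\partial \tilde{\theta}^{\top}}\right) - E\left(u_t(\tilde{\theta}_0) \frac{\partial^2 E(Y_t \mid Y_{t-1})}{\partial \tilde{\theta} \, \partial \tilde{\theta}^{\top}}\right)$$
$$u_t(\tilde{\theta}_0) = Y_t - E(Y_t \mid Y_{t-1})$$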
Remark 3.
From the inference process in this section, we can see that the estimating equation used in constructing the penalized criteria (6) actually utilizes the information of $E(Y_t \mid \mathcal{F}_{t-1})$, where $\mathcal{F}_{t-1} = \sigma(Y_{t-1}, Y_{t-2}, \ldots)$, and does not involve the information of thinning operators. Therefore, the penalized criteria (6) can be applied to models with the same linear form conditional expectations, such as INARCH models and continuous-valued AR models. The likelihood functions of INARCH and AR models can be established with relative ease, enabling us to compare the efficacy of the penalized criteria (6) with that of AIC and BIC across both models.

3. Numerical Simulations

In this section, we first conduct a simulation study to evaluate the performance of the penalized criteria proposed in this paper for INAR models. Secondly, to compare the proposed penalized criteria with the traditional likelihood-based AIC and BIC, we apply these criteria to INARCH models and AR models. Finally, by utilizing innovation terms of different random distributions, we carry out a simulation study on the robustness of the penalized criteria proposed in this paper.

3.1. Performance of the Penalized Criteria in INAR Models

In this subsection, we consider the true data-generating process to be:
$$Y_t = \phi_1^{(t)} \circ Y_{t-1} + \phi_3^{(t)} \circ Y_{t-3} + Z_t \tag{7}$$
where the mean of $\phi_1^{(t)}$ is 0.4, the mean of $\phi_3^{(t)}$ is 0.2, and $\lambda = 2$, i.e., $\tilde{\theta}_0 = (0.4, 0, 0.2, 2)^{\top}$. By applying the penalized criteria (6), we attempt to select the true model from all RCINAR models up to the third order. In Table 1 below:
  • $i.i.d.$ represents an i.i.d. Poisson random variable sequence,
  • $y_{t-1}$ represents the model $Y_t = \phi_1^{(t)} \circ Y_{t-1} + Z_t$,
  • $y_{t-2}$ represents the model $Y_t = \phi_2^{(t)} \circ Y_{t-2} + Z_t$,
  • $y_{t-3}$ represents the model $Y_t = \phi_3^{(t)} \circ Y_{t-3} + Z_t$,
  • $y_{t-1}, y_{t-2}$ represents the model $Y_t = \phi_1^{(t)} \circ Y_{t-1} + \phi_2^{(t)} \circ Y_{t-2} + Z_t$,
  • $y_{t-1}, y_{t-3}$ represents the model $Y_t = \phi_1^{(t)} \circ Y_{t-1} + \phi_3^{(t)} \circ Y_{t-3} + Z_t$,
  • $y_{t-2}, y_{t-3}$ represents the model $Y_t = \phi_2^{(t)} \circ Y_{t-2} + \phi_3^{(t)} \circ Y_{t-3} + Z_t$,
  • $y_{t-1}, y_{t-2}, y_{t-3}$ represents the model $Y_t = \phi_1^{(t)} \circ Y_{t-1} + \phi_2^{(t)} \circ Y_{t-2} + \phi_3^{(t)} \circ Y_{t-3} + Z_t$.
In addition, “Coef” denotes the random distribution of the coefficient. In this subsection, we focus on the performance of penalized criteria in INAR models. The true model, i.e., $(y_{t-1}, y_{t-3})$, is highlighted in Table 1. We compare three different penalty term settings, $P_T = \log(T)$, $P_T = T^{1/3}$, and $P_T = T^{1/5}$, and consider three different distributions for $\phi_1^{(t)}$ and $\phi_3^{(t)}$:
(i) Fixed coefficients, i.e., $\phi_1^{(t)} = 0.4$, $\phi_3^{(t)} = 0.2$, regardless of $t$;
(ii) $\phi_1^{(t)}$ follows a uniform distribution on the interval $[0, 0.8]$, $\phi_3^{(t)}$ follows a uniform distribution on the interval $[0, 0.4]$;
(iii) $\phi_1^{(t)}$ follows a beta distribution with a mean of 0.4, $\phi_3^{(t)}$ follows a beta distribution with a mean of 0.2. In this scenario, we fix the parameter vector $(a, b)$ for the beta distribution with $a = 4$ and control the parameter $b$ to achieve different means.
We consider sample sizes T = 100, 200, 300, 500, 1000, and for each sample size T and parameter setting, we perform 1000 independent repeated experiments.
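The three coefficient settings can be sketched in code as follows. Since a Beta($a$, $b$) distribution has mean $a / (a + b)$, fixing $a = 4$ gives $b = a(1 - \mu)/\mu$ for a target mean $\mu$. The function names, the burn-in length, and the use of Poisson innovations with mean $\lambda = 2$ are assumptions made for this illustration.

```python
import numpy as np

def beta_b_for_mean(mean, a=4.0):
    """Second Beta parameter b such that Beta(a, b) has the requested mean a / (a + b)."""
    return a * (1.0 - mean) / mean

def simulate_rcinar13(T, lam, draw_phi1, draw_phi3, rng, burn_in=200):
    """Simulate Y_t = phi_1^(t) ∘ Y_{t-1} + phi_3^(t) ∘ Y_{t-3} + Z_t.

    draw_phi1 and draw_phi3 return a fresh coefficient in [0, 1) at each time point;
    Poisson(lam) innovations are an assumption of this sketch.
    """
    n = T + burn_in
    y = np.zeros(n, dtype=np.int64)
    for t in range(3, n):
        y[t] = (rng.binomial(y[t - 1], draw_phi1(rng))
                + rng.binomial(y[t - 3], draw_phi3(rng))
                + rng.poisson(lam))
    return y[burn_in:]

rng = np.random.default_rng(2)
b1, b3 = beta_b_for_mean(0.4), beta_b_for_mean(0.2)
settings = {
    "Fixed": (lambda r: 0.4, lambda r: 0.2),
    "Uniform": (lambda r: r.uniform(0.0, 0.8), lambda r: r.uniform(0.0, 0.4)),
    "Beta": (lambda r: r.beta(4.0, b1), lambda r: r.beta(4.0, b3)),
}
paths = {name: simulate_rcinar13(500, 2.0, d1, d3, rng)
         for name, (d1, d3) in settings.items()}
print(b1, b3)
print({name: float(path.mean()) for name, path in paths.items()})
```

Repeating such draws 1000 times per sample size and applying the selection rule to each path would mirror the experimental design described above.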
As shown in Table 1, for the three penalty terms, the accuracy of model selection using the penalized criteria (6) increases with the sample size $T$, consistent with the asymptotic conclusion described in Theorem 3. However, when the sample size is large, we find that the accuracy of $P_T = T^{1/5}$ is slightly worse than that of $P_T = T^{1/3}$ and $P_T = \log(T)$. This is because, although asymptotically
$$\frac{T^{1/5}}{\log(T)} \to \infty,$$
in the interval $[4, 332{,}106]$ we have $T^{1/5} < \log(T)$, which may cause the performance of $P_T = T^{1/5}$ in larger finite samples to be not as good as $P_T = \log(T)$. Nonetheless, the penalty term setting $P_T = T^{1/5}$ is not entirely without merit. As shown in Table 1, when the sample size is small, i.e., $T \le 500$, the performance of $P_T = T^{1/5}$ is better.
Table 1. Frequency of model selection for INAR model of order 2 by the penalized criterion (6).
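Data-generating process: $Y_t = \phi_1^{(t)} \circ Y_{t-1} + \phi_3^{(t)} \circ Y_{t-3} + Z_t$ with $\phi_1 = 0.4$, $\phi_3 = 0.2$. The columns from "i.i.d." onward are the models to be selected; $(y_{t-1}, y_{t-3})$ is the true model.

| $T$ | Coef | $P_T$ | i.i.d. | $y_{t-1}$ | $y_{t-2}$ | $y_{t-3}$ | $y_{t-1}, y_{t-2}$ | $y_{t-1}, y_{t-3}$ (true) | $y_{t-2}, y_{t-3}$ | $y_{t-1}, y_{t-2}, y_{t-3}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 100 | Fixed | $T^{1/3}$ | 0.054 | 0.601 | 0.003 | 0.019 | 0.015 | 0.296 | 0.002 | 0.01 |
| | | $\log(T)$ | 0.05 | 0.599 | 0.003 | 0.018 | 0.016 | 0.301 | 0.002 | 0.011 |
| | | $T^{1/5}$ | 0.006 | 0.361 | 0.001 | 0.012 | 0.047 | 0.493 | 0.001 | 0.079 |
| | Uniform | $T^{1/3}$ | 0.059 | 0.596 | 0.005 | 0.037 | 0.015 | 0.275 | 0 | 0.013 |
| | | $\log(T)$ | 0.055 | 0.592 | 0.005 | 0.037 | 0.015 | 0.283 | 0 | 0.013 |
| | | $T^{1/5}$ | 0.01 | 0.366 | 0.001 | 0.018 | 0.049 | 0.475 | 0.004 | 0.077 |
| | Beta | $T^{1/3}$ | 0.072 | 0.585 | 0.002 | 0.029 | 0.026 | 0.281 | 0.002 | 0.003 |
| | | $\log(T)$ | 0.069 | 0.582 | 0.002 | 0.029 | 0.027 | 0.285 | 0.002 | 0.004 |
| | | $T^{1/5}$ | 0.013 | 0.369 | 0.002 | 0.017 | 0.056 | 0.479 | 0.003 | 0.061 |
| 200 | Fixed | $T^{1/3}$ | 0 | 0.368 | 0 | 0.002 | 0.016 | 0.607 | 0 | 0.007 |
| | | $\log(T)$ | 0 | 0.326 | 0 | 0.002 | 0.017 | 0.644 | 0 | 0.011 |
| | | $T^{1/5}$ | 0 | 0.126 | 0 | 0 | 0.03 | 0.781 | 0 | 0.063 |
| | Uniform | $T^{1/3}$ | 0.001 | 0.429 | 0 | 0.002 | 0.016 | 0.545 | 0 | 0.007 |
| | | $\log(T)$ | 0 | 0.37 | 0 | 0.001 | 0.02 | 0.594 | 0 | 0.015 |
| | | $T^{1/5}$ | 0 | 0.159 | 0 | 0 | 0.032 | 0.721 | 0 | 0.088 |
| | Beta | $T^{1/3}$ | 0.002 | 0.363 | 0 | 0.001 | 0.021 | 0.602 | 0 | 0.011 |
| | | $\log(T)$ | 0.002 | 0.314 | 0 | 0 | 0.025 | 0.645 | 0 | 0.014 |
| | | $T^{1/5}$ | 0 | 0.122 | 0 | 0 | 0.029 | 0.768 | 0 | 0.081 |
| 300 | Fixed | $T^{1/3}$ | 0 | 0.183 | 0 | 0 | 0.008 | 0.802 | 0 | 0.007 |
| | | $\log(T)$ | 0 | 0.132 | 0 | 0 | 0.007 | 0.845 | 0 | 0.016 |
| | | $T^{1/5}$ | 0 | 0.037 | 0 | 0 | 0.009 | 0.88 | 0 | 0.074 |
| | Uniform | $T^{1/3}$ | 0 | 0.252 | 0 | 0 | 0.01 | 0.725 | 0 | 0.013 |
| | | $\log(T)$ | 0 | 0.176 | 0 | 0 | 0.015 | 0.79 | 0 | 0.019 |
| | | $T^{1/5}$ | 0 | 0.06 | 0 | 0 | 0.02 | 0.842 | 0 | 0.078 |
| | Beta | $T^{1/3}$ | 0 | 0.218 | 0 | 0 | 0.012 | 0.766 | 0 | 0.004 |
| | | $\log(T)$ | 0 | 0.15 | 0 | 0 | 0.016 | 0.825 | 0 | 0.009 |
| | | $T^{1/5}$ | 0 | 0.06 | 0 | 0 | 0.021 | 0.859 | 0 | 0.06 |
| 500 | Fixed | $T^{1/3}$ | 0 | 0.04 | 0 | 0 | 0.002 | 0.955 | 0 | 0.003 |
| | | $\log(T)$ | 0 | 0.014 | 0 | 0 | 0.003 | 0.974 | 0 | 0.009 |
| | | $T^{1/5}$ | 0 | 0.002 | 0 | 0 | 0.002 | 0.95 | 0 | 0.046 |
| | Uniform | $T^{1/3}$ | 0 | 0.062 | 0 | 0 | 0.004 | 0.932 | 0 | 0.002 |
| | | $\log(T)$ | 0 | 0.03 | 0 | 0 | 0.007 | 0.955 | 0 | 0.008 |
| | | $T^{1/5}$ | 0 | 0.007 | 0 | 0 | 0.006 | 0.919 | 0 | 0.068 |
| | Beta | $T^{1/3}$ | 0 | 0.046 | 0 | 0 | 0.003 | 0.936 | 0 | 0.015 |
| | | $\log(T)$ | 0 | 0.026 | 0 | 0 | 0.003 | 0.96 | 0 | 0.011 |
| | | $T^{1/5}$ | 0 | 0.005 | 0 | 0 | 0.003 | 0.932 | 0 | 0.06 |
| 1000 | Fixed | $T^{1/3}$ | 0 | 0 | 0 | 0 | 0 | 0.999 | 0 | 0.001 |
| | | $\log(T)$ | 0 | 0 | 0 | 0 | 0 | 0.989 | 0 | 0.011 |
| | | $T^{1/5}$ | 0 | 0 | 0 | 0 | 0 | 0.964 | 0 | 0.036 |
| | Uniform | $T^{1/3}$ | 0 | 0 | 0 | 0 | 0 | 0.997 | 0 | 0.003 |
| | | $\log(T)$ | 0 | 0 | 0 | 0 | 0 | 0.99 | 0 | 0.01 |
| | | $T^{1/5}$ | 0 | 0 | 0 | 0 | 0 | 0.952 | 0 | 0.048 |
| | Beta | $T^{1/3}$ | 0 | 0 | 0 | 0 | 0 | 0.998 | 0 | 0.002 |
| | | $\log(T)$ | 0 | 0 | 0 | 0 | 0 | 0.992 | 0 | 0.008 |
| | | $T^{1/5}$ | 0 | 0 | 0 | 0 | 0 | 0.94 | 0 | 0.06 |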
To investigate the performance of the three penalty terms under varying sample sizes and coefficient mean settings, we continue to consider model (7), where $\phi_1^{(t)}$ follows a beta distribution with a mean of 0.4, and $\phi_3^{(t)}$ follows a beta distribution with a mean of $\phi_3$. In Figure 1, we report the impact of sample size on the accuracy of the penalized criteria using the three penalty terms under different $\phi_3$ settings. In Figure 1 and Figure 2, the red line represents $P_T = T^{1/3}$, the black line represents $P_T = \log(T)$, and the blue line represents $P_T = T^{1/5}$, and the vertical axis of both figures represents the frequency of the penalized criteria (6) selecting the correct model. It can be observed that when $\phi_3$ is small or the sample size is small, the performance of $P_T = T^{1/5}$ is superior. However, as $\phi_3$ gradually moves further from 0 and the sample size increases, the performance of $P_T = T^{1/5}$ becomes slightly worse than $P_T = \log(T)$ and $P_T = T^{1/3}$.
In Figure 2, we report the frequency of selecting the model $Y_t = \phi_1^{(t)} \circ Y_{t-1} + \phi_3^{(t)} \circ Y_{t-3} + Z_t$ using the penalized criteria (6) as $\phi_3$ gradually varies from 0 to 0.4 under different sample size conditions. It should be noted that when $\phi_3 = 0$, $Y_t = \phi_1^{(t)} \circ Y_{t-1} + \phi_3^{(t)} \circ Y_{t-3} + Z_t$ represents an incorrect model setting and the correct model setting, in this case, should be $Y_t = \phi_1^{(t)} \circ Y_{t-1} + Z_t$. As shown in Figure 2, when the sample size is small, particularly when the sample size is 100, the performance of $P_T = T^{1/5}$ is notably improved compared to $P_T = \log(T)$ and $P_T = T^{1/3}$. As the sample size increases, this advantage gradually diminishes, but the penalty term setting $P_T = T^{1/5}$ still maintains an advantage when $\phi_3$ is relatively close to 0.
Based on the numerical simulation results presented in this subsection, we can offer recommendations for applying the penalized criteria (6): when the sample size is small, or some coefficients in the true model are relatively close to 0, we can employ the penalty term setting $P_T = T^{1/5}$. In other cases, the performance of the penalty term settings $P_T = T^{1/3}$ and $P_T = \log(T)$ is comparable and slightly better than $P_T = T^{1/5}$. Furthermore, we also conducted a simulation study on lag variable selection for the data-generating process:
$$Y_t = \phi_2^{(t)} \circ Y_{t-2} + Z_t$$
where the mean of $\phi_2^{(t)}$ is 0.3. The results can be found in Table A1 in Appendix A.

3.2. Performance of Penalized Criteria in INARCH Models and AR Models

As stated in Remark 3 of Section 2, we can apply the penalized criteria (6) to both INARCH and AR models. Because the likelihood functions for these two models can be easily established, we can compare the performance of the penalized criteria (6) with that of AIC and BIC for both these models.

3.2.1. INARCH Model

In this subsection, we consider the true data-generating process as follows:
$$Y_t \mid \mathcal{F}_{t-1} \sim \mathrm{Poisson}(\lambda_t)$$
$$\lambda_t = \phi_0 + \phi_1 Y_{t-1} + \phi_3 Y_{t-3}$$
where $\phi_1 = 0.4$, $\phi_3 = 0.2$, and $\phi_0 = 2$. Fokianos, Rahbek, and Tjøstheim [26] proposed this model and derived the conditions for its stationarity and ergodicity. By applying the penalized criterion (6) alongside AIC and BIC, we attempt to select the true model from all INARCH models up to the third order. In Table 2:
  • i.i.d. represents an i.i.d. Poisson random variable sequence,
  • $y_{t-1}$ represents the model $Y_t \mid \mathcal{F}_{t-1} \sim \mathrm{Poisson}(\lambda_t)$, $\lambda_t = \phi_0 + \phi_1 Y_{t-1}$,
  • $y_{t-2}$ represents the model $Y_t \mid \mathcal{F}_{t-1} \sim \mathrm{Poisson}(\lambda_t)$, $\lambda_t = \phi_0 + \phi_2 Y_{t-2}$,
  • $y_{t-3}$ represents the model $Y_t \mid \mathcal{F}_{t-1} \sim \mathrm{Poisson}(\lambda_t)$, $\lambda_t = \phi_0 + \phi_3 Y_{t-3}$,
  • $y_{t-1}, y_{t-2}$ represents the model $Y_t \mid \mathcal{F}_{t-1} \sim \mathrm{Poisson}(\lambda_t)$, $\lambda_t = \phi_0 + \phi_1 Y_{t-1} + \phi_2 Y_{t-2}$,
  • $y_{t-1}, y_{t-3}$ represents the model $Y_t \mid \mathcal{F}_{t-1} \sim \mathrm{Poisson}(\lambda_t)$, $\lambda_t = \phi_0 + \phi_1 Y_{t-1} + \phi_3 Y_{t-3}$,
  • $y_{t-2}, y_{t-3}$ represents the model $Y_t \mid \mathcal{F}_{t-1} \sim \mathrm{Poisson}(\lambda_t)$, $\lambda_t = \phi_0 + \phi_2 Y_{t-2} + \phi_3 Y_{t-3}$,
  • $y_{t-1}, y_{t-2}, y_{t-3}$ represents the model $Y_t \mid \mathcal{F}_{t-1} \sim \mathrm{Poisson}(\lambda_t)$, $\lambda_t = \phi_0 + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \phi_3 Y_{t-3}$.
“Criterion” denotes the model selection criterion used, and $H + P_T \cdot m$ denotes the penalized criterion (6). Furthermore, we have bolded the true model $y_{t-1}, y_{t-3}$. We consider sample sizes T = 100, 200, 300, 500, 1000, and for each sample size T and parameter setting, we conduct 1000 independent repeated experiments.
From Table 2, we can observe that, as in the INAR case, $P_T = T^{1/5}$ is slightly less accurate than $P_T = T^{1/3}$ and $P_T = \log(T)$ in larger samples, but in smaller samples, i.e., $T < 500$, $P_T = T^{1/5}$ performs better. In addition, the accuracy of the penalized criterion proposed in this paper is roughly equivalent to that of BIC when $P_T = T^{1/3}$ or $P_T = \log(T)$; when $P_T = T^{1/5}$, its accuracy is roughly equivalent to that of AIC in small samples and far better than that of AIC in large samples.
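The data-generating process of this subsection can be simulated along the following lines. This is a minimal sketch rather than the authors' simulation code: the function name `simulate_inarch`, the burn-in length, and the starting values are assumptions made for illustration, and only the parameter values $\phi_0 = 2$, $\phi_1 = 0.4$, $\phi_3 = 0.2$ come from the text.

```python
import numpy as np


def simulate_inarch(T, phi0=2.0, phi=(0.4, 0.0, 0.2), burn_in=200, seed=0):
    """Simulate Y_t | F_{t-1} ~ Poisson(phi0 + sum_i phi[i-1] * Y_{t-i}).

    The burn-in length and the initial values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    p = len(phi)
    n = T + burn_in
    y = np.zeros(n + p, dtype=np.int64)
    y[:p] = rng.poisson(phi0, size=p)  # arbitrary starting values
    for t in range(p, n + p):
        # Conditional mean built from the intercept and the lagged counts.
        lam = phi0 + sum(phi[i] * y[t - 1 - i] for i in range(p))
        y[t] = rng.poisson(lam)
    return y[-T:]  # drop the burn-in period


y = simulate_inarch(500)
print(y[:10], y.mean())
```

With these parameters the stationary mean is $\phi_0 / (1 - \phi_1 - \phi_3) = 5$, which gives a quick sanity check on a long simulated path.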
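Table 2. Frequency of model selection for INARCH model of order 2 by the penalized criterion (6).
Data-generating process: $Y_t \mid \mathcal{F}_{t-1} \sim \mathrm{Poisson}(\lambda_t)$, $\lambda_t = \phi_0 + \phi_1 Y_{t-1} + \phi_3 Y_{t-3}$, with $\phi_1 = 0.4$, $\phi_3 = 0.2$. The last eight columns are the models to be selected.

| T | Criterion | $P_T$ | i.i.d. | $y_{t-1}$ | $y_{t-2}$ | $y_{t-3}$ | $y_{t-1}, y_{t-2}$ | **$y_{t-1}, y_{t-3}$** | $y_{t-2}, y_{t-3}$ | $y_{t-1}, y_{t-2}, y_{t-3}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 100 | $H + P_T \cdot m$ | $T^{1/3}$ | 0.059 | 0.576 | 0.001 | 0.019 | 0.021 | 0.309 | 0.001 | 0.014 |
| | | $\log(T)$ | 0.057 | 0.568 | 0.001 | 0.017 | 0.021 | 0.32 | 0.001 | 0.015 |
| | | $T^{1/5}$ | 0.012 | 0.337 | 0 | 0.007 | 0.054 | 0.509 | 0.003 | 0.078 |
| | AIC | | 0.003 | 0.27 | 0 | 0.006 | 0.054 | 0.559 | 0.001 | 0.107 |
| | BIC | | 0.032 | 0.551 | 0.002 | 0.021 | 0.022 | 0.361 | 0.001 | 0.01 |
| 200 | $H + P_T \cdot m$ | $T^{1/3}$ | 0 | 0.406 | 0 | 0.002 | 0.013 | 0.574 | 0 | 0.005 |
| | | $\log(T)$ | 0 | 0.359 | 0 | 0.001 | 0.017 | 0.612 | 0 | 0.011 |
| | | $T^{1/5}$ | 0 | 0.138 | 0 | 0 | 0.028 | 0.756 | 0 | 0.078 |
| | AIC | | 0 | 0.068 | 0 | 0 | 0.024 | 0.774 | 0 | 0.134 |
| | BIC | | 0 | 0.296 | 0 | 0.001 | 0.019 | 0.673 | 0 | 0.011 |
| 300 | $H + P_T \cdot m$ | $T^{1/3}$ | 0 | 0.22 | 0 | 0 | 0.07 | 0.767 | 0 | 0.006 |
| | | $\log(T)$ | 0 | 0.153 | 0 | 0 | 0.011 | 0.826 | 0 | 0.01 |
| | | $T^{1/5}$ | 0 | 0.041 | 0 | 0 | 0.01 | 0.874 | 0 | 0.075 |
| | AIC | | 0 | 0.016 | 0 | 0 | 0.006 | 0.832 | 0 | 0.146 |
| | BIC | | 0 | 0.127 | 0 | 0 | 0.008 | 0.855 | 0 | 0.01 |
| 500 | $H + P_T \cdot m$ | $T^{1/3}$ | 0 | 0.035 | 0 | 0 | 0.03 | 0.958 | 0 | 0.004 |
| | | $\log(T)$ | 0 | 0.017 | 0 | 0 | 0.002 | 0.971 | 0 | 0.01 |
| | | $T^{1/5}$ | 0 | 0.001 | 0 | 0 | 0.002 | 0.934 | 0 | 0.063 |
| | AIC | | 0 | 0 | 0 | 0 | 0.001 | 0.841 | 0 | 0.158 |
| | BIC | | 0 | 0.012 | 0 | 0 | 0.004 | 0.976 | 0 | 0.008 |
| 1000 | $H + P_T \cdot m$ | $T^{1/3}$ | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| | | $\log(T)$ | 0 | 0 | 0 | 0 | 0 | 0.991 | 0 | 0.009 |
| | | $T^{1/5}$ | 0 | 0 | 0 | 0 | 0 | 0.956 | 0 | 0.044 |
| | AIC | | 0 | 0 | 0 | 0 | 0 | 0.848 | 0 | 0.152 |
| | BIC | | 0 | 0 | 0 | 0 | 0 | 0.995 | 0 | 0.005 |
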
Additionally, we provide a simulation study on lag variable selection for the data-generating process:
$$Y_t \mid \mathcal{F}_{t-1} \sim \mathrm{Poisson}(\lambda_t)$$
$$\lambda_t = \phi_0 + \phi_1 Y_{t-1}.$$
The results can be found in Table A2 in Appendix A.

3.2.2. AR Model

In this subsection, we consider the true data-generating process as follows:
$$Y_t = \phi_0 + \phi_1 Y_{t-1} + \phi_3 Y_{t-3} + Z_t$$
where $\phi_1 = 0.4$, $\phi_3 = 0.2$, $\phi_0 = 1$, and $Z_t$ follows a normal distribution with a mean of 0 and a standard deviation of 2. By applying the penalized criterion (6) alongside AIC and BIC, we attempt to select the true model from all AR models up to the third order. In Table 3:
  • i.i.d. represents an i.i.d. normal random variable sequence,
  • $y_{t-1}$ represents the model $Y_t = \phi_0 + \phi_1 Y_{t-1} + Z_t$,
  • $y_{t-2}$ represents the model $Y_t = \phi_0 + \phi_2 Y_{t-2} + Z_t$,
  • $y_{t-3}$ represents the model $Y_t = \phi_0 + \phi_3 Y_{t-3} + Z_t$,
  • $y_{t-1}, y_{t-2}$ represents the model $Y_t = \phi_0 + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + Z_t$,
  • $y_{t-1}, y_{t-3}$ represents the model $Y_t = \phi_0 + \phi_1 Y_{t-1} + \phi_3 Y_{t-3} + Z_t$,
  • $y_{t-2}, y_{t-3}$ represents the model $Y_t = \phi_0 + \phi_2 Y_{t-2} + \phi_3 Y_{t-3} + Z_t$,
  • $y_{t-1}, y_{t-2}, y_{t-3}$ represents the model $Y_t = \phi_0 + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \phi_3 Y_{t-3} + Z_t$.
“Criterion” denotes the model selection criterion used, and $H + P_T \cdot m$ denotes the penalized criterion (6). We use boldface to highlight the true model:
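Table 3. Frequency of model selection for AR model of order 1 by the penalized criterion (6).
Data-generating process: $Y_t = \phi_0 + \phi_1 Y_{t-1} + \phi_3 Y_{t-3} + Z_t$, with $\phi_1 = 0.4$, $\phi_3 = 0.2$. The last eight columns are the models to be selected.

| T | Criterion | $P_T$ | i.i.d. | $y_{t-1}$ | $y_{t-2}$ | $y_{t-3}$ | $y_{t-1}, y_{t-2}$ | **$y_{t-1}, y_{t-3}$** | $y_{t-2}, y_{t-3}$ | $y_{t-1}, y_{t-2}, y_{t-3}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 100 | $H + P_T \cdot m$ | $T^{1/3}$ | 0.048 | 0.577 | 0.002 | 0.02 | 0.018 | 0.324 | 0.001 | 0.01 |
| | | $\log(T)$ | 0.048 | 0.564 | 0.002 | 0.02 | 0.018 | 0.336 | 0.001 | 0.011 |
| | | $T^{1/5}$ | 0.009 | 0.34 | 0.002 | 0.012 | 0.044 | 0.524 | 0.001 | 0.068 |
| | AIC | | 0.004 | 0.255 | 0.002 | 0.005 | 0.059 | 0.578 | 0.002 | 0.095 |
| | BIC | | 0.032 | 0.564 | 0.001 | 0.016 | 0.018 | 0.352 | 0.001 | 0.016 |
| 200 | $H + P_T \cdot m$ | $T^{1/3}$ | 0.001 | 0.323 | 0 | 0.001 | 0.011 | 0.654 | 0 | 0.01 |
| | | $\log(T)$ | 0.001 | 0.279 | 0 | 0 | 0.014 | 0.693 | 0 | 0.013 |
| | | $T^{1/5}$ | 0 | 0.11 | 0 | 0 | 0.025 | 0.79 | 0 | 0.075 |
| | AIC | | 0 | 0.062 | 0 | 0 | 0.021 | 0.771 | 0 | 0.146 |
| | BIC | | 0.001 | 0.261 | 0 | 0 | 0.013 | 0.713 | 0 | 0.012 |
| 300 | $H + P_T \cdot m$ | $T^{1/3}$ | 0 | 0.167 | 0 | 0 | 0.004 | 0.825 | 0 | 0.004 |
| | | $\log(T)$ | 0 | 0.116 | 0 | 0 | 0.004 | 0.874 | 0 | 0.006 |
| | | $T^{1/5}$ | 0 | 0.042 | 0 | 0 | 0.007 | 0.893 | 0 | 0.058 |
| | AIC | | 0 | 0.017 | 0 | 0 | 0.007 | 0.824 | 0 | 0.152 |
| | BIC | | 0 | 0.107 | 0 | 0 | 0.005 | 0.88 | 0 | 0.008 |
| 500 | $H + P_T \cdot m$ | $T^{1/3}$ | 0 | 0.034 | 0 | 0 | 0.003 | 0.959 | 0 | 0.004 |
| | | $\log(T)$ | 0 | 0.013 | 0 | 0 | 0.004 | 0.975 | 0 | 0.008 |
| | | $T^{1/5}$ | 0 | 0.002 | 0 | 0 | 0.001 | 0.937 | 0 | 0.06 |
| | AIC | | 0 | 0 | 0 | 0 | 0 | 0.837 | 0 | 0.163 |
| | BIC | | 0 | 0.011 | 0 | 0 | 0.003 | 0.977 | 0 | 0.009 |
| 1000 | $H + P_T \cdot m$ | $T^{1/3}$ | 0 | 0 | 0 | 0 | 0 | 0.996 | 0 | 0.004 |
| | | $\log(T)$ | 0 | 0 | 0 | 0 | 0 | 0.989 | 0 | 0.011 |
| | | $T^{1/5}$ | 0 | 0 | 0 | 0 | 0 | 0.951 | 0 | 0.046 |
| | AIC | | 0 | 0 | 0 | 0 | 0 | 0.846 | 0 | 0.154 |
| | BIC | | 0 | 0 | 0 | 0 | 0 | 0.988 | 0 | 0.012 |
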
From Table 3, we can observe that, as in the INAR case, $P_T = T^{1/5}$ is slightly less accurate than $P_T = T^{1/3}$ and $P_T = \log(T)$ in larger samples, but in smaller samples, i.e., $T < 500$, $P_T = T^{1/5}$ performs better. The comparison of the penalized criterion proposed in this paper with AIC and BIC in the AR model is analogous to that in the INARCH model; thus, further elaboration is not required.
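For reference, the likelihood-based side of this comparison can be sketched as follows. The code simulates the AR data-generating process of this subsection and selects among the eight candidate lag subsets by Gaussian AIC or BIC. It is an illustrative sketch only: the function names `simulate_ar` and `select_lags`, the burn-in length, and the use of a common estimation sample for all candidates are assumptions, not details taken from the paper.

```python
from itertools import combinations

import numpy as np


def simulate_ar(T, phi0=1.0, phi=(0.4, 0.0, 0.2), sigma=2.0, burn_in=200, seed=0):
    """Simulate Y_t = phi0 + sum_i phi[i-1] * Y_{t-i} + Z_t with Z_t ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    p = len(phi)
    y = np.zeros(T + burn_in + p)
    for t in range(p, len(y)):
        y[t] = phi0 + sum(phi[i] * y[t - 1 - i] for i in range(p)) + rng.normal(0.0, sigma)
    return y[-T:]  # drop the burn-in period


def select_lags(y, max_lag=3, criterion="bic"):
    """Return the lag subset of {1, ..., max_lag} minimizing Gaussian AIC or BIC."""
    n = len(y) - max_lag
    target = y[max_lag:]
    best = None
    for k in range(max_lag + 1):
        for lags in combinations(range(1, max_lag + 1), k):
            # Design matrix: intercept plus the selected lagged values.
            columns = [np.ones(n)] + [y[max_lag - l: len(y) - l] for l in lags]
            X = np.column_stack(columns)
            beta = np.linalg.lstsq(X, target, rcond=None)[0]
            rss = np.sum((target - X @ beta) ** 2)
            n_par = len(lags) + 2  # intercept, lag coefficients, innovation variance
            loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
            penalty = 2 * n_par if criterion == "aic" else np.log(n) * n_par
            score = -2 * loglik + penalty
            if best is None or score < best[0]:
                best = (score, lags)
    return best[1]


y = simulate_ar(500)
print(select_lags(y, criterion="aic"), select_lags(y, criterion="bic"))
```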

3.3. Robustness of Variable Selection Procedure

In this section, we investigate the robustness of the penalized criterion (6) to different distributions of the innovation term $Z_t$ in model (7). Specifically, we consider $Z_t$ following a Poisson distribution, a geometric distribution with a mean of 2, and a uniform distribution over $\{0, 1, 2, 3, 4\}$. In Table 4, “$Z_t$” denotes the distribution of the innovation term, and “Geom” denotes the geometric distribution.
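The setting of this robustness study can be mimicked with the following minimal sketch, which simulates a random coefficient INAR process with binomial thinning under the three innovation distributions. Several details are assumptions made for illustration and are not stated in this section: the beta distribution is parameterized through a hypothetical `concentration` value, the Poisson innovation mean is set to 2, and the names `simulate_rcinar` and `INNOVATIONS` are invented here.

```python
import numpy as np


def simulate_rcinar(T, coef_means, draw_innovation, concentration=10.0,
                    burn_in=200, seed=0):
    """Simulate Y_t = sum_i phi_{it} o Y_{t-i} + Z_t with binomial thinning.

    Each phi_{it} is drawn afresh at every t from a beta distribution with the
    given mean; `concentration` (alpha + beta) is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    p = len(coef_means)
    y = np.zeros(T + burn_in + p, dtype=np.int64)
    for t in range(p, len(y)):
        total = draw_innovation(rng)
        for i, mean in enumerate(coef_means):
            if mean > 0:
                phi = rng.beta(mean * concentration, (1 - mean) * concentration)
                # Binomial thinning: each of the Y_{t-i} units survives w.p. phi.
                total += rng.binomial(y[t - 1 - i], phi)
        y[t] = total
    return y[-T:]  # drop the burn-in period


INNOVATIONS = {
    "Poisson": lambda rng: rng.poisson(2.0),       # mean 2 is an assumption
    "Geom": lambda rng: rng.geometric(1 / 3) - 1,  # support {0, 1, ...}, mean 2
    "Uniform": lambda rng: rng.integers(0, 5),     # uniform over {0, 1, 2, 3, 4}
}

for name, draw in INNOVATIONS.items():
    y = simulate_rcinar(500, (0.4, 0.0, 0.2), draw)
    print(name, y.mean())
```

All three innovation distributions in this sketch have mean 2, so the simulated series share the same stationary mean and differ only in higher-order features of $Z_t$.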
Table 4 shows that the penalized criterion (6) remains robust across the various distributions of the innovation term $Z_t$. This finding suggests that the criterion proposed in this paper can effectively select the correct lag order even when the innovation term follows different distributions. We use boldface to highlight the true model:
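Table 4. Frequency of model selection of INAR model by the penalized criterion (6) with $Z_t$ misspecification.
Data-generating process: $Y_t = \phi_1^{(t)} \circ Y_{t-1} + \phi_3^{(t)} \circ Y_{t-3} + Z_t$, with $\phi_1 = 0.4$, $\phi_3 = 0.2$. The last eight columns are the models to be selected.

| T | $Z_t$ | $P_T$ | i.i.d. | $y_{t-1}$ | $y_{t-2}$ | $y_{t-3}$ | $y_{t-1}, y_{t-2}$ | **$y_{t-1}, y_{t-3}$** | $y_{t-2}, y_{t-3}$ | $y_{t-1}, y_{t-2}, y_{t-3}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 100 | Poisson | $T^{1/3}$ | 0.045 | 0.587 | 0.003 | 0.018 | 0.018 | 0.319 | 0 | 0.01 |
| | | $\log(T)$ | 0.043 | 0.584 | 0.003 | 0.019 | 0.018 | 0.323 | 0 | 0.01 |
| | | $T^{1/5}$ | 0.005 | 0.349 | 0.002 | 0.008 | 0.053 | 0.514 | 0.003 | 0.066 |
| | Uniform | $T^{1/3}$ | 0.048 | 0.564 | 0.001 | 0.02 | 0.018 | 0.336 | 0.001 | 0.048 |
| | | $\log(T)$ | 0.043 | 0.559 | 0 | 0.019 | 0.02 | 0.345 | 0.001 | 0.043 |
| | | $T^{1/5}$ | 0.014 | 0.332 | 0 | 0.004 | 0.047 | 0.519 | 0 | 0.014 |
| | Geom | $T^{1/3}$ | 0.07 | 0.575 | 0.004 | 0.032 | 0.023 | 0.285 | 0.001 | 0.02 |
| | | $\log(T)$ | 0.067 | 0.57 | 0.004 | 0.032 | 0.024 | 0.292 | 0.001 | 0.02 |
| | | $T^{1/5}$ | 0.009 | 0.327 | 0.002 | 0.011 | 0.052 | 0.511 | 0.002 | 0.086 |
| 200 | Poisson | $T^{1/3}$ | 0 | 0.37 | 0 | 0.001 | 0.008 | 0.612 | 0 | 0.008 |
| | | $\log(T)$ | 0 | 0.319 | 0 | 0.002 | 0.012 | 0.655 | 0 | 0.012 |
| | | $T^{1/5}$ | 0 | 0.109 | 0 | 0.001 | 0.02 | 0.795 | 0 | 0.075 |
| | Uniform | $T^{1/3}$ | 0 | 0.343 | 0 | 0 | 0.012 | 0.636 | 0 | 0.009 |
| | | $\log(T)$ | 0 | 0.29 | 0 | 0 | 0.016 | 0.681 | 0 | 0.013 |
| | | $T^{1/5}$ | 0 | 0.13 | 0 | 0 | 0.026 | 0.776 | 0 | 0.068 |
| | Geom | $T^{1/3}$ | 0.005 | 0.358 | 0 | 0 | 0.018 | 0.603 | 0 | 0.016 |
| | | $\log(T)$ | 0.004 | 0.312 | 0 | 0 | 0.02 | 0.643 | 0 | 0.021 |
| | | $T^{1/5}$ | 0 | 0.108 | 0 | 0 | 0.034 | 0.752 | 0 | 0.106 |
| 300 | Poisson | $T^{1/3}$ | 0 | 0.193 | 0 | 0 | 0.003 | 0.801 | 0 | 0.003 |
| | | $\log(T)$ | 0 | 0.138 | 0 | 0 | 0.004 | 0.852 | 0 | 0.006 |
| | | $T^{1/5}$ | 0 | 0.044 | 0 | 0 | 0.005 | 0.878 | 0 | 0.073 |
| | Uniform | $T^{1/3}$ | 0 | 0.184 | 0 | 0 | 0.01 | 0.802 | 0 | 0.004 |
| | | $\log(T)$ | 0 | 0.122 | 0 | 0 | 0.012 | 0.851 | 0 | 0.015 |
| | | $T^{1/5}$ | 0 | 0.03 | 0 | 0 | 0.011 | 0.885 | 0 | 0.074 |
| | Geom | $T^{1/3}$ | 0 | 0.188 | 0 | 0 | 0.012 | 0.796 | 0 | 0.004 |
| | | $\log(T)$ | 0 | 0.133 | 0 | 0 | 0.015 | 0.834 | 0 | 0.018 |
| | | $T^{1/5}$ | 0 | 0.027 | 0 | 0 | 0.013 | 0.88 | 0 | 0.08 |
| 500 | Poisson | $T^{1/3}$ | 0 | 0.027 | 0 | 0 | 0.005 | 0.962 | 0 | 0.006 |
| | | $\log(T)$ | 0 | 0.008 | 0 | 0 | 0.005 | 0.975 | 0 | 0.012 |
| | | $T^{1/5}$ | 0 | 0.002 | 0 | 0 | 0.003 | 0.923 | 0 | 0.072 |
| | Uniform | $T^{1/3}$ | 0 | 0.04 | 0 | 0 | 0.004 | 0.95 | 0 | 0.006 |
| | | $\log(T)$ | 0 | 0.02 | 0 | 0 | 0.002 | 0.964 | 0 | 0.014 |
| | | $T^{1/5}$ | 0 | 0.003 | 0 | 0 | 0 | 0.926 | 0 | 0.071 |
| | Geom | $T^{1/3}$ | 0 | 0.035 | 0 | 0 | 0.008 | 0.954 | 0 | 0.003 |
| | | $\log(T)$ | 0 | 0.018 | 0 | 0 | 0.007 | 0.964 | 0 | 0.011 |
| | | $T^{1/5}$ | 0 | 0.002 | 0 | 0 | 0.003 | 0.928 | 0 | 0.067 |
| 1000 | Poisson | $T^{1/3}$ | 0 | 0 | 0 | 0 | 0 | 0.998 | 0 | 0.002 |
| | | $\log(T)$ | 0 | 0 | 0 | 0 | 0 | 0.989 | 0 | 0.011 |
| | | $T^{1/5}$ | 0 | 0 | 0 | 0 | 0 | 0.952 | 0 | 0.048 |
| | Uniform | $T^{1/3}$ | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| | | $\log(T)$ | 0 | 0 | 0 | 0 | 0 | 0.995 | 0 | 0.05 |
| | | $T^{1/5}$ | 0 | 0 | 0 | 0 | 0 | 0.944 | 0 | 0.056 |
| | Geom | $T^{1/3}$ | 0 | 0 | 0 | 0 | 0 | 0.994 | 0 | 0.006 |
| | | $\log(T)$ | 0 | 0 | 0 | 0 | 0 | 0.982 | 0 | 0.018 |
| | | $T^{1/5}$ | 0 | 0 | 0 | 0 | 0 | 0.947 | 0 | 0.053 |
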
Furthermore, we compare the performance of the penalized criterion proposed in this paper with that of AIC and BIC when the innovation term $Z_t$ in AR model (8) follows a uniform distribution over [−2, 2], while $Z_t$ is assumed to follow a normal distribution with mean 0 and unknown variance $\sigma_Z^2$. Table A3 in Appendix A shows that, regardless of the distribution of the innovation term, when the conditional mean is correctly specified, the performance and robustness of the penalized criterion proposed in this paper are generally equivalent to those of AIC and BIC.

4. Real Data Application

4.1. COVID-19 Infection Data

The investigation of data related to infectious diseases constitutes a crucial application of integer-valued time series models within the public health domain. In May 2020, the Ministry of Health in Cyprus released a national epidemic surveillance report that presented time series data on the number of infections during the initial phase of the COVID-19 outbreak. Research on these data can help public health researchers uncover the intrinsic mechanisms governing epidemic propagation. Owing to the incubation period associated with the coronavirus, individuals who contract the virus typically report their infection status to governmental statistical departments after a lapse of several days. As a result, it is important to examine lag variable selection for this time series dataset; see Figure 3.
Based on the ACF plot, it can be inferred that the data may stem from an autoregressive data-generating process. The PACF plot suggests that selecting either the model
$$Y_t = \phi_{1t} \circ Y_{t-1} + Z_t$$
or
$$Y_t = \phi_{1t} \circ Y_{t-1} + \phi_{3t} \circ Y_{t-3} + Z_t$$
is reasonable, as the partial autocorrelation at lag three does not significantly exceed the critical value. Consequently, we employ the model selection procedure (6) for variable selection; the results are provided in Table 5.
Given that the penalized criterion (6) favors the model
$$Y_t = \phi_{1t} \circ Y_{t-1} + \phi_{3t} \circ Y_{t-3} + Z_t$$
under all three penalty settings, we adopt this model. The estimated results for this model are:
$$Y_t = \underset{(0.1081)}{0.5736} \circ Y_{t-1} + \underset{(0.1092)}{0.2933} \circ Y_{t-3} + Z_t$$
where the mean of $Z_t$ is 1.8567, and the non-negative random variables $\phi_{1t}$ and $\phi_{3t}$ have expected values of 0.5736 and 0.2933, respectively. This finding suggests that during the initial stages of the outbreak in Cyprus, the number of infections on a given day may have been influenced by the number of infections one day and three days prior.

4.2. Seismic Frequency Data

The study of earthquake frequency is another significant application of integer-valued time series models. Zucchini, MacDonald, and Langrock [27] provide annual counts of global earthquakes of magnitude seven or above for the period from 1900 to 2007. These data offer a promising basis for investigating the mechanisms behind the mutual interactions among earthquakes, for example, whether the interplay is mediated by crustal stress dynamics or by other channels; see Figure 4.
Based on the ACF, we hypothesize that the underlying data-generating process can be suitably modeled by an autoregressive process, and the PACF suggests a first-order autoregressive model. To substantiate this conjecture, we apply the penalized criterion (6); the results are provided in Table 6.
Given that the penalized criterion (6) favors the model
$$Y_t = \phi_{1t} \circ Y_{t-1} + Z_t$$
under all three penalty settings, we adopt this model. The estimated results for this model are as follows:
$$Y_t = \underset{(0.0812)}{0.5799} \circ Y_{t-1} + Z_t$$
In this model, the mean of $Z_t$ is estimated as 2.1014. The estimates indicate that each earthquake of magnitude seven or higher in the preceding year induces a number of similar-intensity earthquakes in the subsequent year that is a discrete random variable with an expected value of 0.5799. At the same time, the number of major earthquakes occurring independently each year is approximately two. These results support the existence of a year-on-year time-varying dependency mechanism in the frequency of major seismic disasters.
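As a quick plausibility check on the two fitted models, one can compute the stationary mean they imply. Taking expectations on both sides of the model equation under stationarity, and assuming the random coefficients are independent of the past, gives $E(Y_t) = E(Z_t) / (1 - \sum_i E(\phi_{it}))$. The short sketch below applies this to the estimates reported in Sections 4.1 and 4.2; the function name `stationary_mean` is introduced here for illustration, and the calculation is not part of the original analysis.

```python
def stationary_mean(innovation_mean, coef_means):
    """Implied stationary mean E(Y_t) = E(Z_t) / (1 - sum of coefficient means)."""
    total = sum(coef_means)
    if not 0 <= total < 1:
        raise ValueError("the coefficient means must sum to a value in [0, 1)")
    return innovation_mean / (1 - total)


# Estimates reported for the COVID-19 data (lags 1 and 3).
print(round(stationary_mean(1.8567, [0.5736, 0.2933]), 2))
# Estimates reported for the seismic frequency data (lag 1).
print(round(stationary_mean(2.1014, [0.5799]), 2))
```

Under these assumptions, the fitted models imply roughly 14 infections per day for the COVID-19 series and roughly 5 major earthquakes per year for the seismic series, which can be compared against the sample means of the two datasets.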

5. Discussion and Conclusions

In this paper, we propose a model selection criterion based on an estimation equation established in Conditional Least Squares estimation. This penalized method does not rely on detailed distributional assumptions for the data-generating process. It circumvents the complex likelihood function construction in Random Coefficient Integer-Valued Autoregressive models and can consistently select the correct variables under relatively mild assumptions.
In our numerical simulations, we compared the impact of three penalty term settings on the performance of the penalized criterion. We found that the impact of these penalty terms varies as some coefficients in the RCINAR model move farther away from 0 or as the sample size increases. Moreover, we applied the model selection method proposed in this paper to both the INARCH and traditional continuous-valued AR models. In both scenarios, where likelihood functions can be easily constructed, the proposed model selection criterion and the traditional likelihood-based information criteria, AIC and BIC, exhibit similar model selection efficiency. Specifically, under the settings $P_T = T^{1/3}$ and $P_T = \log(T)$, the accuracy of the proposed model selection method is similar to that of BIC. With $P_T = T^{1/5}$, the proposed method performs similarly to AIC in smaller samples and outperforms AIC in larger samples.
In the future, model selection methods based on estimation equations have considerable potential for development. In this discussion section, we briefly introduce three aspects:
(1) Distinguishing between different thinning operators or innovation terms with varying distributions: The criterion (6) provided in this paper is primarily used for lag variable selection but lacks the ability to differentiate between various thinning operators and distinct distributions of innovation terms. It is well known that INAR models can describe scenarios such as zero inflation, variance inflation, and extreme values by flexibly selecting thinning operators and innovation terms. Therefore, if a model selection criterion can distinguish between different thinning operators and varying distributions of innovation terms, it will have a more extensive application scope.
(2) Incorporating higher-order conditional moments from the data-generating process into the information criterion. From the form of the $H(\tilde{\theta})$ function:
$$H(\tilde{\theta}) = \left(\sum_{t=p+1}^{T} \Psi_t(\tilde{\theta})\right)^{\top} \left(\sum_{t=p+1}^{T} \Psi_t(\tilde{\theta}) \Psi_t(\tilde{\theta})^{\top}\right)^{-1} \left(\sum_{t=p+1}^{T} \Psi_t(\tilde{\theta})\right)$$
it is evident that criterion (6) only contains the mean structure information of the model and lacks the ability to describe higher-order moment information. Since many variants of the INAR model exhibit differences in higher-order moments, incorporating higher-order moment information into the model selection criterion would enable criterion (6) to perform model selection within a broader context.
(3) Detecting change points. In the field of time series data research, the change point detection problem has a long history. Specifically, within the integer-valued time series domain, the change point problem refers to the existence of positive integers $\tau_1, \tau_2, \ldots, \tau_m$, such that:
$$Y_t = \begin{cases} \phi_1 \circ Y_{t-1} + Z_{t,1}, & 0 < t \le \tau_1 \\ \phi_2 \circ Y_{t-1} + Z_{t,2}, & \tau_1 < t \le \tau_2 \\ \quad \vdots \\ \phi_m \circ Y_{t-1} + Z_{t,m}, & \tau_m < t \le T \end{cases}$$
For continuous-valued time series models, Chen and Gupta [28] introduced a method for change point detection using AIC and BIC. Since parameter changes are prominently reflected in the mean structure of INAR models, it is likely feasible to perform change point detection using the criterion (6) based on the estimation equations.
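To illustrate how the quadratic form $H$ and the penalized criterion (6) discussed above might be evaluated in practice, the following Python sketch enumerates all lag subsets up to order $p$. It rests on assumptions that are not restated in this section: the estimating function is taken to be the conditional least squares score $\Psi_t(\theta) = (Y_t - \sum_i \phi_i Y_{t-i} - \lambda)(Y_{t-1}, \ldots, Y_{t-p}, 1)^{\top}$, the restricted estimate sets the coefficients of excluded lags to zero, and the model size is counted as the number of included lags. The names `penalized_criterion` and `select_model` are hypothetical.

```python
from itertools import combinations

import numpy as np


def penalized_criterion(y, lags, p, penalty):
    """Evaluate H + P_T * (number of lags) for one candidate lag subset.

    Assumes Psi_t(theta) = residual_t * (Y_{t-1}, ..., Y_{t-p}, 1)'.
    """
    y = np.asarray(y, dtype=float)
    target = y[p:]
    n = len(target)
    # Full design: lags 1..p plus a constant for the innovation mean.
    X_full = np.column_stack(
        [y[p - l: len(y) - l] for l in range(1, p + 1)] + [np.ones(n)]
    )
    cols = [l - 1 for l in lags] + [p]
    theta = np.zeros(p + 1)
    # Restricted CLS: least squares on the included lags and the constant.
    theta[cols] = np.linalg.lstsq(X_full[:, cols], target, rcond=None)[0]
    resid = target - X_full @ theta
    psi = X_full * resid[:, None]  # row t holds Psi_t(theta)'
    s = psi.sum(axis=0)
    H = s @ np.linalg.solve(psi.T @ psi, s)
    return H + penalty * len(lags)


def select_model(y, p=3, penalty=None):
    """Return the lag subset minimizing the criterion, and all scores."""
    if penalty is None:
        penalty = len(y) ** (1 / 3)  # one of the three settings studied
    candidates = [c for k in range(p + 1)
                  for c in combinations(range(1, p + 1), k)]
    scores = {c: penalized_criterion(y, c, p, penalty) for c in candidates}
    return min(scores, key=scores.get), scores


rng = np.random.default_rng(0)
y = np.zeros(1003)
for t in range(3, len(y)):
    y[t] = rng.poisson(2.0 + 0.4 * y[t - 1] + 0.2 * y[t - 3])
best, scores = select_model(y)
print(best)
```

In this sketch the full model always has $H = 0$ up to rounding, because its estimating equations are solved exactly; the criterion therefore trades the growth of $H$ under omitted lags against the penalty on model size.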

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/e25081220/s1.

Author Contributions

Conceptualization, K.Y. and T.T.; methodology, T.T.; software, T.T.; validation, K.Y. and T.T.; formal analysis, T.T.; investigation, T.T.; resources, K.Y.; data curation, K.Y.; writing—original draft preparation, T.T.; writing—review and editing, K.Y.; visualization, T.T.; supervision, K.Y.; project administration, K.Y.; funding acquisition, K.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Social Science Fund of China (No. 18BTJ039).

Data Availability Statement

The data has been uploaded as a Supplementary File of this paper. Interested readers are also encouraged to request the relevant data and code from the authors directly through e-mail.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A.1. Proofs

Proof of Lemma 1.
(1) From Lemma 1 in Zhang, Wang, and Zhu [7], it can be proved immediately.
(2) Due to the construction of $\Psi_t(\theta)$, we have $\frac{\partial^2 \Psi_t(\theta)}{\partial \theta \, \partial \theta^{\top}} = 0$, which proves the statement.
(3) Because the construction of $\Psi_t(\theta)$ ensures that $\frac{\partial \Psi_t(\theta)}{\partial \theta}$ is constant with respect to the parameter vector $\theta$ and depends only on $\{Y_t\}$, the conclusion holds under Assumption (A1). □
Proof of Theorem 1.
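When model $m$ is correctly specified, by Taylor expansion, we have:
$$\sum_{t=p+1}^{T} \Psi_t(\hat{\theta}_{CLS}^{m}) = \sum_{t=p+1}^{T} \Psi_t(\tilde{\theta}_0) + \sum_{t=p+1}^{T} \frac{\partial \Psi_t(\tilde{\theta}_0)}{\partial \tilde{\theta}} (\hat{\theta}_{CLS}^{m} - \tilde{\theta}_0) + o_p(T^{-1/2})$$
where, according to Equation (5), $o_p(\|\hat{\theta}_{CLS}^{m} - \tilde{\theta}_0\|) = o_p(T^{-1/2})$.
Since $\hat{\theta}_{2,CLS}^{m} = \tilde{\theta}_2^{m} = 0$, then:
$$0 = \frac{1}{T} \sum_{t=p+1}^{T} \Psi_{1t}(\hat{\theta}_{CLS}^{m}) = \frac{1}{T} \sum_{t=p+1}^{T} \Psi_{1t}(\tilde{\theta}_0) + \frac{1}{T} \sum_{t=p+1}^{T} \frac{\partial \Psi_t(\tilde{\theta}_0)}{\partial \tilde{\theta}_1^{m}} (\hat{\theta}_{1,CLS}^{m} - \tilde{\theta}_{1,0}^{m}) + o_p(T^{-1/2})$$
Thus, we have:
$$\hat{\theta}_{1,CLS}^{m} - \tilde{\theta}_{1,0}^{m} = -\left(\frac{1}{T} \sum_{t=p+1}^{T} \frac{\partial \Psi_t(\tilde{\theta}_0)}{\partial \tilde{\theta}_1^{m}}\right)^{-1} \frac{1}{T} \sum_{t=p+1}^{T} \Psi_{1t}(\tilde{\theta}_0) + o_p(T^{-1/2}) = -\left(E \frac{\partial \Psi_t(\tilde{\theta}_0)}{\partial \tilde{\theta}_1^{m}}\right)^{-1} \frac{1}{T} \sum_{t=p+1}^{T} \Psi_{1t}(\tilde{\theta}_0) + o_p(T^{-1/2})$$
Therefore:
$$\hat{\theta}_{CLS}^{m} - \tilde{\theta}_0 = \begin{pmatrix} \hat{\theta}_{1,CLS}^{m} - \tilde{\theta}_{1,0}^{m} \\ 0 \end{pmatrix} = \begin{pmatrix} -\left(E \frac{\partial \Psi_t(\tilde{\theta}_0)}{\partial \tilde{\theta}_1^{m}}\right)^{-1} \frac{1}{T} \sum_{t=p+1}^{T} \Psi_{1t}(\tilde{\theta}_0) + o_p(T^{-1/2}) \\ 0 \end{pmatrix}$$
Hence:
$$\frac{1}{T} \sum_{t=p+1}^{T} \Psi_t(\hat{\theta}_{CLS}^{m}) = \left(I - E\left[\frac{\partial \Psi_t(\tilde{\theta}_0)}{\partial \tilde{\theta}}\right] \begin{pmatrix} \left(E \frac{\partial \Psi_t(\tilde{\theta}_0)}{\partial \tilde{\theta}_1^{m}}\right)^{-1} & 0 \\ 0 & 0 \end{pmatrix}\right) \frac{1}{T} \sum_{t=p+1}^{T} \Psi_t(\tilde{\theta}_0) + o_p(T^{-1/2}) \triangleq \Sigma \, \frac{1}{T} \sum_{t=p+1}^{T} \Psi_t(\tilde{\theta}_0) + o_p(T^{-1/2})$$
where $I$ is the identity matrix. Let $\Sigma_{11} = E\left[\Psi_t(\tilde{\theta}_0) \Psi_t(\tilde{\theta}_0)^{\top}\right]$; then, by Lemma 1, we have:
$$H(\hat{\theta}_{CLS}^{m}) = \left(\sum_{t=p+1}^{T} \Psi_t(\hat{\theta}_{CLS}^{m})\right)^{\top} \left(\sum_{t=p+1}^{T} \Psi_t(\hat{\theta}_{CLS}^{m}) \Psi_t(\hat{\theta}_{CLS}^{m})^{\top}\right)^{-1} \left(\sum_{t=p+1}^{T} \Psi_t(\hat{\theta}_{CLS}^{m})\right)$$
$$= \left(T^{-1/2} \sum_{t=p+1}^{T} \Sigma_{11}^{-1/2} \Psi_t(\tilde{\theta}_0)\right)^{\top} \Sigma_{11}^{1/2} \Sigma^{\top} \Sigma_{11}^{-1} \Sigma \Sigma_{11}^{1/2} \left(T^{-1/2} \sum_{t=p+1}^{T} \Sigma_{11}^{-1/2} \Psi_t(\tilde{\theta}_0)\right) + O_p(1)$$
Let $\Omega = \Sigma_{11}^{1/2} \Sigma^{\top} \Sigma_{11}^{-1} \Sigma \Sigma_{11}^{1/2} = \left(\Sigma_{11}^{-1/2} \Sigma \Sigma_{11}^{1/2}\right)^{\top} \left(\Sigma_{11}^{-1/2} \Sigma \Sigma_{11}^{1/2}\right)$, which implies that $\Omega$ is a positive semi-definite matrix. Consequently, according to Theorem 2.1.6 of Johnston [29], we have $\Omega = U^{\top} \Lambda U$, where $U$ is an orthogonal matrix, $\Lambda$ is a diagonal matrix, and the diagonal elements of $\Lambda$ are the eigenvalues of $\Omega$, denoted as $\Lambda_1, \ldots, \Lambda_{p+1}$. Thus,
$$H(\hat{\theta}_{CLS}^{m}) = \sum_{j=1}^{p+1} \Lambda_j \left(\left[T^{-1/2} \sum_{t=p+1}^{T} \Sigma_{11}^{-1/2} \Psi_t(\tilde{\theta}_0)\right]_j\right)^2 + O_p(1)$$
where $[J]_j$ denotes the $j$-th element of vector $J$. Therefore, we obtain:
$$H(\hat{\theta}_{CLS}^{m}) \xrightarrow{d} \sum_{j=1}^{p+1} \Lambda_j \chi^2(1)$$
Thus, $\Lambda_j$ is an eigenvalue of the matrix $\Sigma_{11}^{1/2} \Sigma^{\top} \Sigma_{11}^{-1} \Sigma \Sigma_{11}^{1/2}$, and $\sum_{j=1}^{p+1} \Lambda_j = \mathrm{trace}(\Omega) = \mathrm{trace}\left(\Sigma_{11}^{1/2} \Sigma^{\top} \Sigma_{11}^{-1} \Sigma \Sigma_{11}^{1/2}\right)$. □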
Proof of Theorem 2.
Due to assumptions (A1) and (A2), following steps similar to those in Lemma 1 of Zhang, Wang, and Zhu [7], we know that $E\|\Psi_t(\tilde{\theta})\|^{2 + \delta/2}$ is bounded above.
For any $\tilde{\theta}_1$ in the neighborhood of $\tilde{\theta} \neq \tilde{\theta}_0$, by applying the Markov inequality, we have:
$$\sum_{i=1}^{\infty} P\left(\|\Psi_t(\tilde{\theta}_1)\|^2 > i\right) \le \sum_{i=1}^{\infty} \frac{E\|\Psi_t(\tilde{\theta}_1)\|^{2 + \delta/2}}{i^{1 + \delta/4}} < \infty$$
By applying the Borel–Cantelli lemma, we can always find a sufficiently large natural number $N$ such that for any $i > N$, $\|\Psi_t(\tilde{\theta}_1)\| \le i^{1/2}$ holds with probability 1. This further implies that $\max_{1 \le t \le T} \|\Psi_t(\tilde{\theta}_1)\| = o_p(T^{1/2})$. Therefore:
H θ ~ 1 = t = p + 1 T Ψ t θ ~ 1 t = p + 1 T Ψ t θ ~ 1 Ψ t θ ~ 1 1 t = p + 1 T Ψ t θ ~ 1 t = p + 1 T Ψ t θ ~ 1 max t Ψ t θ ~ 1 1 t = p + 1 T Ψ t θ ~ 1 1 t = p + 1 T Ψ t θ ~ 1
where $\mathbf{1} = (1, 1, \ldots, 1)^{\top}$. Due to assumption (A3), $\left\|\sum_{t=p+1}^{T} \Psi_t(\tilde{\theta}_1)\right\| = O_p(T)$, and we can deduce that:
$$T^{-1/2} H(\tilde{\theta}_1) \to \infty$$
 □
Lemma A1.
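$H(m_0) = o_p(\log(T))$.
Given:
$$H(m_0) = \left(\sum_{t=p+1}^{T} \Psi_t(\hat{\theta}_{CLS}^{m_0})\right)^{\top} \left(\sum_{t=p+1}^{T} \Psi_t(\hat{\theta}_{CLS}^{m_0}) \Psi_t(\hat{\theta}_{CLS}^{m_0})^{\top}\right)^{-1} \left(\sum_{t=p+1}^{T} \Psi_t(\hat{\theta}_{CLS}^{m_0})\right)$$
and
$$\sum_{t=p+1}^{T} \Psi_t(\hat{\theta}_{CLS}^{m_0}) = \sum_{t=p+1}^{T} \Psi_t(\tilde{\theta}_0) + \sum_{t=p+1}^{T} \frac{\partial \Psi_t(\tilde{\theta}_0)}{\partial \tilde{\theta}_1^{m}} (\hat{\theta}_{1,CLS}^{m} - \tilde{\theta}_{1,0}^{m}) + o_p(1) \triangleq Q_t(\tilde{\theta}_0)$$
$$\frac{1}{T} \sum_{t=p+1}^{T} \Psi_t(\hat{\theta}_{CLS}^{m_0}) \Psi_t(\hat{\theta}_{CLS}^{m_0})^{\top} = \Sigma_{11} + o_p(1)$$
We can deduce that:
$$H(m_0) = \frac{1}{T} Q_t(\tilde{\theta}_0)^{\top} \Sigma_{11}^{-1} Q_t(\tilde{\theta}_0) + o_p(1)$$
Let $V = \mathrm{Cov}\left(\frac{1}{\sqrt{T}} \sum_{t=p+1}^{T} \Psi_t(\tilde{\theta}_0) + E\left[\frac{\partial \Psi_t(\tilde{\theta}_0)}{\partial \tilde{\theta}_1^{m}}\right] (\hat{\theta}_{1,CLS}^{m} - \tilde{\theta}_{1,0}^{m})\right)$. Notice that:
$$P\left(H(m_0) \ge \log(T)\right) \le \frac{E\left[\frac{1}{T} Q_t(\tilde{\theta}_0)^{\top} \Sigma_{11}^{-1} Q_t(\tilde{\theta}_0)\right]}{\log(T)} = \frac{\mathrm{trace}\left(\Sigma_{11}^{-1} V\right)}{\log(T)}$$
Using Lemma A2 below, we can conclude that $H(m_0) = o_p(\log(T))$.
Lemma A2.
$\mathrm{trace}\left(\Sigma_{11}^{-1} V\right) = O_p(1)$.
Because both $\Sigma_{11}^{-1}$ and $V$ are positive semi-definite matrices:
$$0 \le \mathrm{trace}\left(\Sigma_{11}^{-1} V\right) \le \mathrm{trace}\left(\Sigma_{11}^{-1}\right) \mathrm{trace}(V)$$
Moreover, $\Sigma_{11}^{-1} = O_p(1)$, $\hat{\theta}_{1,CLS}^{m} - \tilde{\theta}_{1,0}^{m} = O_p(T^{-1/2})$, and, according to Lemma 1, $\frac{1}{\sqrt{T}} \sum_{t=p+1}^{T} \Psi_t(\tilde{\theta}_0) = O_p(1)$. Therefore, the proof is complete.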
Proof of Theorem 3.
We divide the proof of this theorem into two parts:
(1) If $m_0 \not\subset m$, by applying Theorem 1 and Lemma A1 to $m_0$, and Theorem 2 to $m$, we know that for $P_T = O(T^{1/2})$ and $\log(T)/P_T \to 0$:
$$H(\hat{\theta}_{CLS}^{m}) + P_T \cdot m - H(\hat{\theta}_{CLS}^{m_0}) - P_T \cdot m_0 \to \infty$$
(2) If $m_0 \subset m$, then by applying Theorem 1 and Lemma A1 to both $m_0$ and $m$, we know that for $P_T \to \infty$:
$$H(\hat{\theta}_{CLS}^{m}) + P_T \cdot m - H(\hat{\theta}_{CLS}^{m_0}) - P_T \cdot m_0 = P_T (m - m_0) + o_p(\log(T))$$
Therefore,
$$P\left(\min\left\{H(\hat{\theta}_{CLS}^{m}) + P_T \cdot m : m \neq m_0\right\} > H(\hat{\theta}_{CLS}^{m_0}) + P_T \cdot m_0\right) \to 1$$
 □
Proof of Theorem 4.
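Following the steps in Diop and Kengne [21], we proceed as follows.
For $x = (x_i)_{1 \le i \le p+1}$, $x_i \in \mathbb{R}$, define:
$$F_T(x) = P\left(\bigcap_{1 \le i \le p+1} \left\{\sqrt{T} \left(\hat{\theta}_{CLS}^{\hat{m}} - \tilde{\theta}_0\right)_i \le x_i\right\}\right)$$
Then we have:
$$F_T(x) = P\left(\bigcap_{1 \le i \le p+1} \left\{\sqrt{T} \left(\hat{\theta}_{CLS}^{\hat{m}} - \tilde{\theta}_0\right)_i \le x_i\right\} \,\Big|\, \hat{m} = m_0\right) P(\hat{m} = m_0) + P\left(\bigcap_{1 \le i \le p+1} \left\{\sqrt{T} \left(\hat{\theta}_{CLS}^{\hat{m}} - \tilde{\theta}_0\right)_i \le x_i\right\} \,\Big|\, \hat{m} \neq m_0\right) P(\hat{m} \neq m_0)$$
According to Theorem 3, as $T \to \infty$:
$$P(\hat{m} = m_0) \to 1, \quad P(\hat{m} \neq m_0) \to 0$$
Therefore:
$$P\left(\bigcap_{1 \le i \le p+1} \left\{\sqrt{T} \left(\hat{\theta}_{CLS}^{\hat{m}} - \tilde{\theta}_0\right)_i \le x_i\right\} \,\Big|\, \hat{m} \neq m_0\right) P(\hat{m} \neq m_0) \to 0$$
Hence:
$$F_T(x) = P\left(\bigcap_{i \in m_0} \left\{\sqrt{T} \left(\hat{\theta}_{CLS}^{\hat{m}} - \tilde{\theta}_0\right)_i \le x_i\right\} \cap \bigcap_{i \notin m_0} \left\{\sqrt{T} \left(\hat{\theta}_{CLS}^{\hat{m}} - \tilde{\theta}_0\right)_i \le x_i\right\}\right)$$
Since $\tilde{\theta}^{m_0} \in \tilde{\Theta}(m_0)$, $\left(\hat{\theta}_{CLS}^{m_0}\right)_{i, i \notin m_0} = \left(\tilde{\theta}\right)_{i, i \notin m_0} = 0$, and $(x_i)_{i \notin m_0}$ is a set of real numbers, i.e., $|x_i|_{i \notin m_0} < \infty$, then by Lemma 1:
$$P\left(\bigcap_{i \in m_0} \left\{\sqrt{T} \left(\hat{\theta}_{CLS}^{\hat{m}} - \tilde{\theta}_0\right)_i \le x_i\right\} \cap \bigcap_{i \notin m_0} \left\{\sqrt{T} \left(\hat{\theta}_{CLS}^{\hat{m}} - \tilde{\theta}_0\right)_i \le x_i\right\}\right) = P\left(\bigcap_{i \in m_0} \left\{\sqrt{T} \left(\hat{\theta}_{CLS}^{\hat{m}} - \tilde{\theta}_0\right)_i \le x_i\right\}\right) + o_p(1) \to P\left(\Sigma_{\tilde{\theta}_0}^{1/2} Z \le (x_i)_{i \in m_0}\right)$$
where $\Sigma_{\tilde{\theta}_0} = V^{-1}(\tilde{\theta}_0) W(\tilde{\theta}_0) V^{-1}(\tilde{\theta}_0)$, and $Z$ is a standard normal random vector of dimension $|m_0|$. □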

Appendix A.2. Complementary Tables

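Table A1. Frequency of model selection for INAR model of order 1 by the penalized criterion (6).
Data-generating process: $Y_t = \phi_2^{(t)} \circ Y_{t-2} + Z_t$, with $\phi_2 = 0.3$. The last eight columns are the models to be selected.

| T | Coef | $P_T$ | i.i.d. | $y_{t-1}$ | $y_{t-2}$ | $y_{t-3}$ | $y_{t-1}, y_{t-2}$ | $y_{t-1}, y_{t-3}$ | $y_{t-2}, y_{t-3}$ | $y_{t-1}, y_{t-2}, y_{t-3}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 100 | Fixed | $T^{1/3}$ | 0.283 | 0.025 | 0.647 | 0.006 | 0.016 | 0.001 | 0.022 | 0 |
| | | $\log(T)$ | 0.281 | 0.025 | 0.648 | 0.006 | 0.016 | 0.001 | 0.023 | 0 |
| | | $T^{1/5}$ | 0.08 | 0.025 | 0.691 | 0.015 | 0.088 | 0.008 | 0.085 | 0.008 |
| | Uniform | $T^{1/3}$ | 0.304 | 0.033 | 0.618 | 0.01 | 0.011 | 0 | 0.023 | 0.001 |
| | | $\log(T)$ | 0.297 | 0.034 | 0.623 | 0.011 | 0.011 | 0 | 0.023 | 0.001 |
| | | $T^{1/5}$ | 0.104 | 0.038 | 0.652 | 0.015 | 0.087 | 0.003 | 0.085 | 0.016 |
| | Beta | $T^{1/3}$ | 0.287 | 0.029 | 0.64 | 0.011 | 0.011 | 0.001 | 0.021 | 0 |
| | | $\log(T)$ | 0.281 | 0.029 | 0.645 | 0.011 | 0.011 | 0.001 | 0.022 | 0 |
| | | $T^{1/5}$ | 0.118 | 0.032 | 0.666 | 0.01 | 0.076 | 0.005 | 0.081 | 0.012 |
| 200 | Fixed | $T^{1/3}$ | 0.058 | 0.004 | 0.902 | 0.001 | 0.017 | 0 | 0.018 | 0 |
| | | $\log(T)$ | 0.045 | 0.003 | 0.902 | 0.001 | 0.021 | 0 | 0.024 | 0.001 |
| | | $T^{1/5}$ | 0.007 | 0 | 0.804 | 0.003 | 0.081 | 0 | 0.092 | 0.013 |
| | Uniform | $T^{1/3}$ | 0.091 | 0.006 | 0.882 | 0.002 | 0.006 | 0 | 0.013 | 0 |
| | | $\log(T)$ | 0.071 | 0.007 | 0.894 | 0.002 | 0.009 | 0 | 0.013 | 0 |
| | | $T^{1/5}$ | 0.017 | 0.002 | 0.809 | 0 | 0.074 | 0 | 0.09 | 0.008 |
| | Beta | $T^{1/3}$ | 0.059 | 0.004 | 0.912 | 0.001 | 0.011 | 0 | 0.012 | 0.001 |
| | | $\log(T)$ | 0.044 | 0.003 | 0.911 | 0.001 | 0.018 | 0 | 0.021 | 0.002 |
| | | $T^{1/5}$ | 0.009 | 0.001 | 0.803 | 0.001 | 0.081 | 0 | 0.092 | 0.013 |
| 300 | Fixed | $T^{1/3}$ | 0.013 | 0 | 0.97 | 0.001 | 0.008 | 0 | 0.008 | 0 |
| | | $\log(T)$ | 0.007 | 0 | 0.96 | 0.001 | 0.015 | 0 | 0.017 | 0 |
| | | $T^{1/5}$ | 0.001 | 0 | 0.836 | 0 | 0.076 | 0 | 0.077 | 0.01 |
| | Uniform | $T^{1/3}$ | 0.017 | 0 | 0.956 | 0 | 0.016 | 0 | 0.01 | 0.001 |
| | | $\log(T)$ | 0.009 | 0 | 0.94 | 0 | 0.025 | 0 | 0.024 | 0.002 |
| | | $T^{1/5}$ | 0 | 0 | 0.839 | 0 | 0.081 | 0 | 0.264 | 0.016 |
| | Beta | $T^{1/3}$ | 0.011 | 0 | 0.967 | 0 | 0.012 | 0 | 0.01 | 0 |
| | | $\log(T)$ | 0.008 | 0 | 0.956 | 0 | 0.019 | 0 | 0.017 | 0 |
| | | $T^{1/5}$ | 0 | 0 | 0.857 | 0 | 0.078 | 0 | 0.058 | 0.007 |
| 500 | Fixed | $T^{1/3}$ | 0 | 0 | 0.99 | 0 | 0.008 | 0 | 0.002 | 0 |
| | | $\log(T)$ | 0 | 0 | 0.977 | 0 | 0.012 | 0 | 0.011 | 0 |
| | | $T^{1/5}$ | 0 | 0 | 0.868 | 0 | 0.054 | 0 | 0.066 | 0.012 |
| | Uniform | $T^{1/3}$ | 0 | 0 | 0.991 | 0 | 0.003 | 0 | 0.006 | 0 |
| | | $\log(T)$ | 0 | 0 | 0.974 | 0 | 0.01 | 0 | 0.016 | 0 |
| | | $T^{1/5}$ | 0 | 0 | 0.874 | 0 | 0.047 | 0 | 0.073 | 0.006 |
| | Beta | $T^{1/3}$ | 0 | 0 | 0.993 | 0 | 0.004 | 0 | 0.003 | 0 |
| | | $\log(T)$ | 0 | 0 | 0.976 | 0 | 0.011 | 0 | 0.013 | 0 |
| | | $T^{1/5}$ | 0 | 0 | 0.879 | 0 | 0.056 | 0 | 0.058 | 0.007 |
| 1000 | Fixed | $T^{1/3}$ | 0 | 0 | 0.998 | 0 | 0 | 0 | 0.002 | 0 |
| | | $\log(T)$ | 0 | 0 | 0.987 | 0 | 0.006 | 0 | 0.007 | 0 |
| | | $T^{1/5}$ | 0 | 0 | 0.916 | 0 | 0.04 | 0 | 0.043 | 0 |
| | Uniform | $T^{1/3}$ | 0 | 0 | 0.996 | 0 | 0.001 | 0 | 0.003 | 0 |
| | | $\log(T)$ | 0 | 0 | 0.978 | 0 | 0.012 | 0 | 0.01 | 0 |
| | | $T^{1/5}$ | 0 | 0 | 0.914 | 0 | 0.045 | 0 | 0.036 | 0.005 |
| | Beta | $T^{1/3}$ | 0 | 0 | 0.998 | 0 | 0.001 | 0 | 0.001 | 0 |
| | | $\log(T)$ | 0 | 0 | 0.987 | 0 | 0.005 | 0 | 0.008 | 0 |
| | | $T^{1/5}$ | 0 | 0 | 0.901 | 0 | 0.041 | 0 | 0.055 | 0.003 |
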
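Table A2. Frequency of model selection for INARCH model of order 1 by the penalized criterion (6).
Data-generating process: $Y_t \mid \mathcal{F}_{t-1} \sim \mathrm{Poisson}(\lambda_t)$, $\lambda_t = \phi_0 + \phi_1 Y_{t-1}$, with $\phi_1 = 0.3$. The last eight columns are the models to be selected.

| T | Criterion | $P_T$ | i.i.d. | $y_{t-1}$ | $y_{t-2}$ | $y_{t-3}$ | $y_{t-1}, y_{t-2}$ | $y_{t-1}, y_{t-3}$ | $y_{t-2}, y_{t-3}$ | $y_{t-1}, y_{t-2}, y_{t-3}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 100 | $H + P_T \cdot m$ | $T^{1/3}$ | 0.09 | 0.849 | 0.003 | 0.02 | 0.032 | 0.022 | 0 | 0.002 |
| | | $\log(T)$ | 0.087 | 0.85 | 0.003 | 0.002 | 0.033 | 0.023 | 0 | 0.002 |
| | | $T^{1/5}$ | 0.024 | 0.746 | 0.006 | 0.002 | 0.109 | 0.094 | 0 | 0.019 |
| | AIC | | 0.011 | 0.713 | 0.005 | 0.001 | 0.125 | 0.116 | 0 | 0.029 |
| | BIC | | 0.06 | 0.888 | 0.002 | 0.003 | 0.023 | 0.023 | 0 | 0.001 |
| 200 | $H + P_T \cdot m$ | $T^{1/3}$ | 0.001 | 0.958 | 0 | 0 | 0.017 | 0.023 | 0 | 0.001 |
| | | $\log(T)$ | 0.001 | 0.947 | 0 | 0 | 0.022 | 0.029 | 0 | 0.001 |
| | | $T^{1/5}$ | 0 | 0.821 | 0 | 0 | 0.071 | 0.09 | 0 | 0.018 |
| | AIC | | 0 | 0.719 | 0 | 0 | 0.113 | 0.129 | 0 | 0.039 |
| | BIC | | 0 | 0.959 | 0 | 0 | 0.019 | 0.021 | 0 | 0.001 |
| 300 | $H + P_T \cdot m$ | $T^{1/3}$ | 0 | 0.986 | 0 | 0 | 0.005 | 0.009 | 0 | 0 |
| | | $\log(T)$ | 0 | 0.978 | 0 | 0 | 0.01 | 0.012 | 0 | 0 |
| | | $T^{1/5}$ | 0 | 0.854 | 0 | 0 | 0.065 | 0.075 | 0 | 0 |
| | AIC | | 0 | 0.707 | 0 | 0 | 0.124 | 0.148 | 0 | 0.021 |
| | BIC | | 0 | 0.977 | 0 | 0 | 0.009 | 0.014 | 0 | 0 |
| 500 | $H + P_T \cdot m$ | $T^{1/3}$ | 0 | 0.994 | 0 | 0 | 0.004 | 0.002 | 0 | 0 |
| | | $\log(T)$ | 0 | 0.978 | 0 | 0 | 0.012 | 0.01 | 0 | 0 |
| | | $T^{1/5}$ | 0 | 0.878 | 0 | 0 | 0.06 | 0.057 | 0 | 0.005 |
| | AIC | | 0 | 0.704 | 0 | 0 | 0.135 | 0.126 | 0 | 0.035 |
| | BIC | | 0 | 0.973 | 0 | 0 | 0.017 | 0.01 | 0 | 0 |
| 1000 | $H + P_T \cdot m$ | $T^{1/3}$ | 0 | 0.997 | 0 | 0 | 0.02 | 0.01 | 0 | 0 |
| | | $\log(T)$ | 0 | 0.979 | 0 | 0 | 0.008 | 0.013 | 0 | 0 |
| | | $T^{1/5}$ | 0 | 0.904 | 0 | 0 | 0.042 | 0.05 | 0 | 0.004 |
| | AIC | | 0 | 0.701 | 0 | 0 | 0.13 | 0.142 | 0 | 0.027 |
| | BIC | | 0 | 0.976 | 0 | 0 | 0.008 | 0.016 | 0 | 0 |
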
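Table A3. Frequency of model selection for AR model by the penalized criterion (6) with $Z_t$ misspecification.
Data-generating process: $Y_t = \phi_0 + \phi_1 Y_{t-1} + \phi_3 Y_{t-3} + Z_t$, $Z_t \sim \mathrm{Uniform}([-2, 2])$, with $\phi_1 = 0.4$, $\phi_3 = 0.2$. The last eight columns are the models to be selected.

| T | Criterion | $P_T$ | i.i.d. | $y_{t-1}$ | $y_{t-2}$ | $y_{t-3}$ | $y_{t-1}, y_{t-2}$ | $y_{t-1}, y_{t-3}$ | $y_{t-2}, y_{t-3}$ | $y_{t-1}, y_{t-2}, y_{t-3}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 100 | $H + P_T \cdot m$ | $T^{1/3}$ | 0.046 | 0.595 | 0.001 | 0.016 | 0.016 | 0.317 | 0.001 | 0.008 |
| | | $\log(T)$ | 0.046 | 0.59 | 0.001 | 0.015 | 0.017 | 0.322 | 0.001 | 0.008 |
| | | $T^{1/5}$ | 0.003 | 0.366 | 0.001 | 0.008 | 0.039 | 0.516 | 0 | 0.067 |
| | AIC | | 0.003 | 0.269 | 0.001 | 0.003 | 0.059 | 0.554 | 0 | 0.111 |
| | BIC | | 0.026 | 0.578 | 0.001 | 0.012 | 0.021 | 0.349 | 0.001 | 0.012 |
| 200 | $H + P_T \cdot m$ | $T^{1/3}$ | 0 | 0.356 | 0 | 0 | 0.008 | 0.63 | 0 | 0.006 |
| | | $\log(T)$ | 0 | 0.308 | 0 | 0 | 0.012 | 0.669 | 0 | 0.011 |
| | | $T^{1/5}$ | 0 | 0.115 | 0 | 0 | 0.026 | 0.794 | 0 | 0.065 |
| | AIC | | 0 | 0.069 | 0 | 0 | 0.027 | 0.787 | 0 | 0.117 |
| | BIC | | 0 | 0.283 | 0 | 0 | 0.014 | 0.691 | 0 | 0.012 |
| 300 | $H + P_T \cdot m$ | $T^{1/3}$ | 0 | 0.157 | 0 | 0 | 0.002 | 0.832 | 0 | 0.009 |
| | | $\log(T)$ | 0 | 0.105 | 0 | 0 | 0.005 | 0.875 | 0 | 0.015 |
| | | $T^{1/5}$ | 0 | 0.029 | 0 | 0 | 0.009 | 0.901 | 0 | 0.061 |
| | AIC | | 0 | 0.01 | 0 | 0 | 0.008 | 0.851 | 0 | 0.131 |
| | BIC | | 0 | 0.098 | 0 | 0 | 0.006 | 0.883 | 0 | 0.013 |
| 500 | $H + P_T \cdot m$ | $T^{1/3}$ | 0 | 0.029 | 0 | 0 | 0.01 | 0.966 | 0 | 0.004 |
| | | $\log(T)$ | 0 | 0.013 | 0 | 0 | 0.01 | 0.975 | 0 | 0.011 |
| | | $T^{1/5}$ | 0 | 0.002 | 0 | 0 | 0.01 | 0.933 | 0 | 0.064 |
| | AIC | | 0 | 0 | 0 | 0 | 0.01 | 0.841 | 0 | 0.158 |
| | BIC | | 0 | 0.011 | 0 | 0 | 0.02 | 0.977 | 0 | 0.01 |
| 1000 | $H + P_T \cdot m$ | $T^{1/3}$ | 0 | 0 | 0 | 0 | 0 | 0.999 | 0 | 0.001 |
| | | $\log(T)$ | 0 | 0 | 0 | 0 | 0 | 0.992 | 0 | 0.008 |
| | | $T^{1/5}$ | 0 | 0 | 0 | 0 | 0 | 0.962 | 0 | 0.038 |
| | AIC | | 0 | 0 | 0 | 0 | 0 | 0.864 | 0 | 0.136 |
| | BIC | | 0 | 0 | 0 | 0 | 0 | 0.992 | 0 | 0.008 |
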

References

  1. Steutel, F.W.; van Harn, K. Discrete analogues of self-decomposability and stability. Ann. Probab. 1979, 7, 893–899.
  2. Al-Osh, M.A.; Alzaid, A.A. First-order integer-valued autoregressive (INAR(1)) process. J. Time Ser. Anal. 1987, 8, 261–275.
  3. Du, J.G.; Li, Y. The integer-valued autoregressive (INAR(p)) model. J. Time Ser. Anal. 1991, 12, 129–142.
  4. Joe, H. Time series models with univariate margins in the convolution-closed infinitely divisible class. J. Appl. Probab. 1996, 33, 664–677.
  5. Zheng, H.T.; Basawa, I.V.; Datta, S. First-order random coefficient integer-valued autoregressive processes. J. Stat. Plan. Inference 2007, 137, 212–229.
  6. Zheng, H.T.; Basawa, I.V.; Datta, S. Inference for pth-order random coefficient integer-valued autoregressive processes. J. Time Ser. Anal. 2007, 23, 411–440.
  7. Zhang, H.X.; Wang, D.H.; Zhu, F.K. Empirical likelihood inference for random coefficient INAR(p) process. J. Time Ser. Anal. 2011, 32, 195–203.
  8. Ristić, M.M.; Bakouch, H.S.; Nastić, A.S. A new geometric first-order integer-valued autoregressive (NGINAR(1)) process. J. Stat. Plan. Inference 2009, 139, 2218–2226.
  9. Tian, S.Q.; Wang, D.H.; Shuai, C. A seasonal geometric INAR(1) process based on negative binomial thinning operator. Stat. Pap. 2018, 61, 2561–2581.
  10. Lu, Y. The predictive distributions of thinning-based count processes. Scand. J. Stat. 2019, 48, 42–67.
  11. Weiß, C.H. An Introduction to Discrete-Valued Time Series; John Wiley & Sons Ltd.: Hoboken, NJ, USA, 2018.
  12. Latour, A. Existence and stochastic structure of a non-negative integer-valued autoregressive process. J. Time Ser. Anal. 1998, 719, 439–455.
  13. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control. 1974, 19, 716–723.
  14. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
  15. Ding, J.; Tarokh, V.; Yang, Y.H. Bridging AIC and BIC: A New Criterion for Autoregression. IEEE Trans. Inf. Theory 2018, 64, 4024–4043.
  16. Variyath, A.M.; Chen, J.H.; Abraham, B. Empirical likelihood based variable selection. J. Stat. Plan. Inference 2010, 140, 971–981.
  17. Chen, C.X.; Wang, M.; Wu, R.L.; Li, R.Z. A robust consistent information criterion for model selection based on empirical likelihood. Stat. Sin. 2022, 32, 1205–1223.
  18. Konishi, S.; Kitagawa, G. Information Criteria and Statistical Modeling; Springer: New York, NY, USA, 2007.
  19. Ding, J.; Tarokh, V.; Yang, Y.H. Model selection techniques. IEEE Signal Process. Mag. 2018, 11, 16–34.
  20. Weiß, C.H.; Feld, M.H.-J. On the performance of information criteria for model identification of count time series. Stud. Nonlinear Dyn. Econ. 2019, 24, 20180012.
  21. Diop, M.L.; Kengne, W. Consistent model selection procedure for general integer-valued time series. Statistics 2022, 55, 1207–1230.
  22. Wang, X.Y.; Wang, D.H.; Yang, K. Integer-valued time series model order shrinkage and selection via penalized quasi-likelihood approach. Metrika 2021, 84, 713–750.
  23. Drost, F.C.; Van den Akker, R.; Werker, B.J.M. Efficient estimation of auto-regression parameters and innovation distributions for semiparametric integer-valued AR(p) models. J. R. Stat. Soc. Ser. B 2009, 71, 467–485.
  24. Kang, J.W.; Lee, S.Y. Parameter Change Test for Random Coefficient Integer-Valued Autoregressive Process with Application to Polio Data Analysis. J. Time Ser. Anal. 2009, 30, 239–258.
  25. Awale, M.; Balakrishna, N.; Ramanathan, T.V. Testing the constancy of the thinning parameter in a random coefficient integer autoregressive model. Stat. Pap. 2019, 60, 1515–1539.
  26. Fokianos, K.; Rahbek, A.; Tjøstheim, D. Poisson autoregression. J. Am. Stat. Assoc. 2009, 104, 1430–1439. [Google Scholar] [CrossRef]
  27. Zucchini, W.; MacDonald, I.L.; Langrock, R. Hidden Markov Models for Time Series an Introduction Using R; CRC Press: New York, NY, USA, 2016. [Google Scholar]
  28. Chen, J.; Gupta, A.K. Parametric Statistical Change Point Analysis; Birkhäuser: New York, NY, USA, 2013. [Google Scholar]
  29. Johnston, N. Advanced Linear and Matrix Algebra; Springer Nature: Cham, Switzerland, 2021. [Google Scholar]
Figure 1. The impact of sample size on accuracy under different $\phi_3$ settings.
Figure 2. The impact of $\phi_3$ settings on accuracy under different sample sizes.
Figure 3. Number of COVID-19 infections in Cyprus, 13 March to 12 May 2020.
Figure 4. Global frequency of earthquakes of magnitude seven or greater between 1900 and 2007.
Table 5. Result of model selection with COVID-19 data.
| $P_T$ | i.i.d. | $y_{t-1}$ | $y_{t-2}$ | $y_{t-3}$ | $y_{t-1}, y_{t-2}$ | $y_{t-1}, y_{t-3}$ | $y_{t-2}, y_{t-3}$ | $y_{t-1}, y_{t-2}, y_{t-3}$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $T^{1/3}$ | 28.6 | 15.9 | 22.9 | 20.5 | 17.9 | 11.8 | 22.9 | 15.7 |
| $\log(T)$ | 28.7 | 16.3 | 23.3 | 20.8 | 18.4 | 12.3 | 23.4 | 16.4 |
| $T^{1/5}$ | 26.9 | 12.6 | 19.6 | 17.1 | 12.9 | 6.8 | 17.9 | 9.1 |
Table 6. Result of model selection with seismic frequency data.
| $P_T$ | i.i.d. | $y_{t-1}$ | $y_{t-2}$ | $y_{t-3}$ | $y_{t-1}, y_{t-2}$ | $y_{t-1}, y_{t-3}$ | $y_{t-2}, y_{t-3}$ | $y_{t-1}, y_{t-2}, y_{t-3}$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $T^{1/3}$ | 29.8 | 10.7 | 29.5 | 35.9 | 15.3 | 15.1 | 33.1 | 19.0 |
| $\log(T)$ | 29.7 | 10.6 | 29.3 | 35.8 | 15.0 | 12.9 | 32.8 | 18.7 |
| $T^{1/5}$ | 27.5 | 6.3 | 25.1 | 31.6 | 8.6 | 8.5 | 26.5 | 10.2 |
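As a reading aid for Tables 5 and 6, the following minimal Python sketch picks, for each penalty term, the candidate lag set with the smallest criterion value. It assumes that the penalized criterion is minimized over the candidate models; the tables themselves do not state the selection rule, so this is an assumption. The names `MODELS`, `TABLE5_COVID`, `TABLE6_SEISMIC`, and `select_models` are hypothetical and are introduced only for illustration. The numbers are copied from the two tables.

```python
# Candidate lag sets, in the column order of Tables 5 and 6.
MODELS = [
    "i.i.d.",
    "y_{t-1}",
    "y_{t-2}",
    "y_{t-3}",
    "y_{t-1}, y_{t-2}",
    "y_{t-1}, y_{t-3}",
    "y_{t-2}, y_{t-3}",
    "y_{t-1}, y_{t-2}, y_{t-3}",
]

# Criterion values from Table 5 (COVID-19 data), one row per penalty P_T.
TABLE5_COVID = {
    "T^(1/3)": [28.6, 15.9, 22.9, 20.5, 17.9, 11.8, 22.9, 15.7],
    "log(T)": [28.7, 16.3, 23.3, 20.8, 18.4, 12.3, 23.4, 16.4],
    "T^(1/5)": [26.9, 12.6, 19.6, 17.1, 12.9, 6.8, 17.9, 9.1],
}

# Criterion values from Table 6 (seismic frequency data).
TABLE6_SEISMIC = {
    "T^(1/3)": [29.8, 10.7, 29.5, 35.9, 15.3, 15.1, 33.1, 19.0],
    "log(T)": [29.7, 10.6, 29.3, 35.8, 15.0, 12.9, 32.8, 18.7],
    "T^(1/5)": [27.5, 6.3, 25.1, 31.6, 8.6, 8.5, 26.5, 10.2],
}


def select_models(table):
    """Return the lag set with the smallest criterion value per penalty.

    Assumption: a smaller criterion value indicates the preferred model.
    """
    selected = {}
    for penalty, values in table.items():
        best = min(range(len(values)), key=values.__getitem__)
        selected[penalty] = MODELS[best]
    return selected


if __name__ == "__main__":
    for name, table in [("Table 5", TABLE5_COVID), ("Table 6", TABLE6_SEISMIC)]:
        for penalty, model in select_models(table).items():
            print(f"{name}, P_T = {penalty}: {model}")
```

Under that assumption, all three penalty choices agree within each data set: the lag set $y_{t-1}, y_{t-3}$ has the smallest value for the COVID-19 data, and $y_{t-1}$ has the smallest value for the seismic frequency data.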

Yu, K.; Tao, T. Consistent Model Selection Procedure for Random Coefficient INAR Models. Entropy 2023, 25, 1220. https://doi.org/10.3390/e25081220
