

Methods for Applied Macroeconomic Research
Ebook · 876 pages · 10 hours


About this ebook

The last twenty years have witnessed tremendous advances in the mathematical, statistical, and computational tools available to applied macroeconomists. This rapidly evolving field has redefined how researchers test models and validate theories. Yet until now there has been no textbook that unites the latest methods and bridges the divide between theoretical and applied work.


Fabio Canova brings together dynamic equilibrium theory, data analysis, and advanced econometric and computational methods to provide the first comprehensive set of techniques for use by academic economists as well as professional macroeconomists in banking and finance, industry, and government. This graduate-level textbook is for readers knowledgeable in modern macroeconomic theory, econometrics, and computational programming using RATS, MATLAB, or Gauss. Inevitably a modern treatment of such a complex topic requires a quantitative perspective, a solid dynamic theory background, and the development of empirical and numerical methods--which is where Canova's book differs from typical graduate textbooks in macroeconomics and econometrics. Rather than list a series of estimators and their properties, Canova starts from a class of DSGE models, finds an approximate linear representation for the decision rules, and describes methods needed to estimate their parameters, examining their fit to the data. The book is complete with numerous examples and exercises.


Today's economic analysts need a strong foundation in both theory and application. Methods for Applied Macroeconomic Research offers the essential tools for the next generation of macroeconomists.

Language: English
Release date: Sep 19, 2011
ISBN: 9781400841028
Author

Fabio Canova

Fabio Canova is ICREA Research Professor at Pompeu Fabra University in Barcelona and a Fellow of the Centre for Economic Policy Research in London.


    Book preview

    Methods for Applied Macroeconomic Research - Fabio Canova


    1

    Preliminaries

    This chapter is introductory and intended for readers who are unfamiliar with time series concepts, with the properties of stochastic processes, with basic asymptotic theory results, and with the principles of spectral analysis. Those who feel comfortable with these topics can skip directly to chapter 2.

    Since the material is vast and complex, an effort is made to present it at the simplest possible level, emphasizing a selected number of topics and only those aspects which are useful for the central topic of this book: comparing the properties of dynamic stochastic general equilibrium (DSGE) models to the data. This means that intuition rather than mathematical rigor is stressed. More specialized books, such as those by Brockwell and Davis (1991), Davidson (1994), Priestley (1981), or White (1984), provide a comprehensive and in-depth treatment of these topics.

    When trying to provide background material, there is always the risk of going too far back to the basics, of trying to reinvent the wheel. To avoid this, we assume that the reader is familiar with simple concepts of calculus such as limits, continuity, and uniform continuity of functions of real numbers, and that she is familiar with distribution functions, measures, and probability spaces.

    The chapter is divided into six sections. The first defines what a stochastic process is. The second examines the limiting behavior of stochastic processes introducing four concepts of convergence and characterizing their relationships. Section 1.3 deals with time series concepts. Section 1.4 deals with laws of large numbers. These laws are useful to ensure that functions of stochastic processes converge to appropriate limits. We examine three situations: a case where the elements of a stochastic process are dependent and identically distributed; one where they are dependent and heterogeneously distributed; and one where they are martingale differences. Section 1.5 describes three central limit theorems corresponding to the three situations analyzed in section 1.4. Central limit theorems are useful for deriving the limiting distribution of functions of stochastic processes and are the basis for (classical) tests of hypotheses and for some model evaluation criteria.

    Section 1.6 presents elements of spectral analysis. Spectral analysis is useful for breaking down economic time series into components (trends, cycles, etc.), for building measures of persistence in response to shocks, for computing the asymptotic covariance matrix of certain estimators, and for defining measures of distance between a model and the data. It may be challenging at first. However, once it is realized that most of the functions typically performed in everyday life employ spectral methods (frequency modulation in a stereo, frequency band reception in a cellular phone, etc.), the reader should feel more comfortable with it. Spectral analysis offers an alternative way to look at time series, translating serially dependent time observations into contemporaneously independent frequency observations. This change of coordinates allows us to analyze the primitive cycles which compose time series and to discuss their length, amplitude, and persistence.
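    The frequency-domain idea can be illustrated with a small simulation. The sketch below is illustrative (the series, sample size, and noise level are invented for this note, not taken from the text): it computes a raw periodogram of a hypothetical series containing one deterministic cycle of period 32 plus white noise, and the periodogram peaks at the frequency of that cycle.

```python
import cmath
import math
import random

def periodogram(y):
    """Raw periodogram I(w_j) = |sum_t y_t e^{-i w_j t}|^2 / (2 pi T)
    at the Fourier frequencies w_j = 2 pi j / T, j = 1, ..., T/2 - 1."""
    T = len(y)
    vals = []
    for j in range(1, T // 2):
        w = 2 * math.pi * j / T
        d = sum(y[t] * cmath.exp(-1j * w * t) for t in range(T))
        vals.append((w, abs(d) ** 2 / (2 * math.pi * T)))
    return vals

random.seed(0)
T = 256
# Hypothetical series: one cycle of period 32 buried in white noise.
y = [math.cos(2 * math.pi * t / 32) + 0.3 * random.gauss(0, 1) for t in range(T)]

# The periodogram peaks at the frequency of the primitive cycle, w = 2 pi / 32.
peak_w = max(periodogram(y), key=lambda p: p[1])[0]
```

    The change of coordinates is the point: serially correlated time observations are mapped into approximately uncorrelated frequency observations, and the dominant cycle shows up as a spike at its frequency.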

    Throughout, (Ω, F, P) denotes the probability space, where Ω is the space of states of nature, F a σ-algebra of its subsets, and P a probability measure. For each t, yt(ω) is a measurable function of ω ∈ Ω taking values in Rm, where R is the real line, so that any continuous function h(yt) is also a random variable. N(0, Σy) denotes a normal random variable with zero mean and variance Σy, and U[a1, a2] a random variable uniformly distributed over the interval [a1, a2]. Finally, i.i.d. indicates identically and independently distributed random variables, and a white noise is an i.i.d. process with zero mean and constant variance.

    1.1 Stochastic Processes

    Definition 1.1 (stochastic process). A stochastic process {yt(ω)} is a probability measure defined on sets of sequences of real vectors (the paths of the process).

    By fixing the value of yt(ω) for a given t, and by performing countable unions, finite intersections, and complementations of such sets of paths, we generate a set of events with well-defined probabilities. Note that a set of paths may leave yτ unrestricted for all τ ≤ t and restrict the process only at t. Two simple stochastic processes are the following.

    Example 1.1. Let yt = e1 cos(e2t), where e1 and e2 are random variables. Here yt is periodic: e1 controls the amplitude and e2 the periodicity of yt.

    A second process is such that P[yt = 1] = P[yt = −1] = 0.5 for all t, with draws independent across t. Such a process has no memory and flips randomly between −1 and 1 as t changes.

    Example 1.2. , and e1t and e2t , et ~ i.i.d. (0, 1) is a stochastic process.

    1.2 Convergence Concepts

    In a classical framework the properties of estimators are obtained by using sequences of estimators indexed by the sample size, and by showing that these sequences approach the true (unknown) parameter value as the sample size grows to infinity. Since estimators are continuous functions of the data, we need to ensure that the data possess a proper limit and that continuous functions of the data inherit these properties. To show that the former is the case, one can rely on a variety of convergence concepts. The first two deal with convergence of the sequence, the next with its moments, and the last with its distribution.

    1.2.1 Almost Sure Convergence

    The concept of almost sure (a.s.) convergence extends the idea of convergence to a limit employed in the case of a sequence of real numbers.

    For vector sequences, convergence can be similarly defined, element by element.

    Definition 1.2 (a.s. convergence). yt(ω) converges almost surely to y(ω) if P[ω : limt→∞ yt(ω) = y(ω)] = 1; that is, for almost every ω and every ε > 0, there exists a T(ω) such that |yt(ω) − y(ω)| < ε for all t > T(ω).

    In other words, {yt(ω)} converges a.s. if the probability of obtaining a path for yt which settles near y after some T is 1. When ω is infinite dimensional, a.s. convergence is called convergence almost everywhere; sometimes a.s. convergence is termed convergence with probability 1 or strong consistency.

    Next, we describe the limiting behavior of functions of a.s. convergent sequences.

    Result 1.1. Let yt(ω) converge a.s. to y(ω) and let h be an n × 1 vector of continuous functions. Then h(yt) converges a.s. to h(y).

    Result 1.1 is a simple extension of the standard fact that continuous functions of convergent sequences are convergent.

    Example 1.3. .

    Exercise 1.1. with probability 1 − 1/t with probability 1/tconverge a.s. to 1? Suppose

    . What is its a.s. limit?

    Result 1.1 no longer suffices when the limit is replaced by another sequence of random variables, i.e., when the distance between yt and some other sequence becomes arbitrarily small as t → ∞. To obtain convergence in this situation we need to strengthen the conditions by requiring uniform continuity of h (for example, assuming continuity on a compact set).

    Result 1.2. Let h be uniformly continuous and suppose there exists a T such that, for all t > T, |y1t − y2t| → 0 a.s. Then |h(y1t) − h(y2t)| → 0 a.s., uniformly in t.

    Suppose a simulated path and an actual path for yt are given, and let h be some continuous statistic, e.g., the mean or the variance. Then result 1.2 tells us that if simulated and actual paths are close enough as t → ∞, statistics generated from these paths will also be close.

    1.2.2 Convergence in Probability

    Convergence in probability is a weaker concept than a.s. convergence.

    Definition 1.3 (convergence in probability). yt(ω) converges in probability to y(ω) if, for every ε > 0, limt→∞ P[ω : |yt(ω) − y(ω)| > ε] = 0.

    Convergence in probability implies that it becomes less and less likely, as t → ∞, that yt is far from y; unlike a.s. convergence, it does not guarantee that each path stays close to y after some T.

    Example 1.4. Let yt and yτ be independent ∀t ≠ τ, let yt be either 0 or 1, and let

    P[yt = 0] = 1 − 1/(j + 1) for t = 2^(j−1) + 1, . . . , 2^j, j = 1, 2, . . . . Then P[|yt| > ε] = 1/(j + 1) → 0 as t → ∞, so yt converges in probability to 0. However, yt does not converge a.s. to 0, since the probability of drawing a convergent path is 0; i.e., the probability of getting a 1 at any given t = 2^(j−1) + 1, . . . , 2^j, j > 1, is small but, since the streak 2^(j−1) + 1, . . . , 2^j is long, the probability of getting at least one 1 within it is 1 − [1 − 1/(j + 1)]^(2^(j−1)), which converges to 1 as j → ∞.
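    The distinction in example 1.4 can be checked numerically. The sketch below follows the example's block construction, but the simulation design (horizon, seed) is our illustrative choice: the marginal probability of a 1 vanishes, consistent with convergence in probability to 0, yet essentially every late block still contains a 1, so no simulated path settles at 0.

```python
import random

random.seed(1)

def block_contains_one(J):
    """For block j = 1, ..., J, draw y_t with P[y_t = 1] = 1/(j + 1) at the
    2^(j-1) dates t = 2^(j-1) + 1, ..., 2^j; record whether a 1 occurred."""
    hits = []
    for j in range(1, J + 1):
        draws = (random.random() < 1 / (j + 1) for _ in range(2 ** (j - 1)))
        hits.append(any(draws))
    return hits

hits = block_contains_one(14)
# P[at least one 1 in block j] = 1 - (1 - 1/(j+1))^(2^(j-1)) -> 1 as j grows,
# so late blocks virtually always contain a 1: the path keeps being knocked off 0.
late_hits = hits[7:]
```

    Convergence in probability only controls the marginal distribution at each date; it says nothing about whole paths, which is exactly where this process misbehaves.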

    Although convergence in probability does not imply a.s. convergence, the following result shows how the latter can be obtained from the former.

    Result 1.3. If yt converges in probability to y, there exists a subsequence {ytj} which converges a.s. to y (see, for example, Lukacs 1975, p. 48).

    Intuitively, since convergence in probability allows a more erratic behavior in the converging sequence than a.s. convergence, one can obtain the latter by disregarding the erratic elements. The concept of convergence in probability is useful to show weak consistency of certain estimators.

    Example 1.5. (i) Let yt be a sequence of i.i.d. random variables with E(yt) = μ < ∞. Then (1/T) Σt yt converges to μ a.s. (Kolmogorov's strong law of large numbers).

    (ii) Let yt be a sequence of uncorrelated random variables with E(yt) = μ and var(yt) = σ² < ∞, ∀t. Then (1/T) Σt yt converges to μ in probability (Chebyshev's weak law of large numbers).

    In example 1.5 strong consistency requires i.i.d. random variables, while for weak consistency we just need a set of uncorrelated random variables with identical means and variances. Note also that weak consistency requires restrictions on the second moments of the sequence which are not needed in the former case.
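    A quick Monte Carlo makes these laws concrete. The sketch below is illustrative (the normal distribution, sample sizes, and replication count are our choices, not the text's): it replicates the sample mean of i.i.d. draws and shows its dispersion across replications shrinking as T grows, which is what both laws assert in their respective modes of convergence.

```python
import random
import statistics

random.seed(2)

def sample_mean(T):
    """Sample mean of T i.i.d. N(0, 1) draws; i.i.d. is stronger than needed,
    since uncorrelated draws with common mean and variance already suffice
    for the weak law."""
    return statistics.fmean(random.gauss(0, 1) for _ in range(T))

# Dispersion of the sample mean across 200 replications, for two sample sizes.
spread_small = statistics.pstdev(sample_mean(10) for _ in range(200))
spread_large = statistics.pstdev(sample_mean(1000) for _ in range(200))
# Theory: the standard deviation of the sample mean is 1/sqrt(T).
```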

    The analogs of results 1.1 and 1.2 for convergence in probability can be easily obtained.

    Result 1.4. If yt converges in probability to y, then h(yt) converges in probability to h(y) for any continuous function h (see White 1984, p. 23).

    Result 1.5. Let h be uniformly continuous and suppose that, for large t, |y1t − y2t| converges in probability to 0. Then |h(y1t) − h(y2t)| converges in probability to 0, uniformly in t (see White 1984, p. 25).

    Sometimes yt, although built from i.i.d. components ej, has a limit which is not in the space of i.i.d. variables. In other cases, the limit point may be unknown. For all these cases, we can redefine a.s. convergence and convergence in probability by using the concept of Cauchy sequences.

    Definition 1.4 (convergence a.s. and in probability). {yt(ω)} converges a.s. (in probability) if, for every ε > 0, limT→∞ P[ω : |yτ(ω) − yt(ω)| > ε for some τ > t > T] = 0 (respectively, limt,τ→∞ P[ω : |yτ(ω) − yt(ω)| > ε] = 0).

    1.2.3 Convergence in Lq-Norm

    While a.s. convergence and convergence in probability concern the path of yt, Lq-convergence refers to the qth moment of yt. Lq-convergence is typically analyzed when q = 2 (convergence in mean square), when q = 1 (absolute convergence), and when q = ∞ (minmax convergence).

    Definition 1.5 (convergence in the Lq-norm). {yt(ω)} converges in the Lq-norm (or in the qth mean) if there exists a y(ω) such that limt→∞ E|yt(ω) − y(ω)|^q = 0, for some q > 0.

    Obviously, if the qth moment does not exist, convergence in Lq does not apply (i.e., if yt is a Cauchy random variable, Lq-convergence is meaningless for all q), while convergence in probability applies even when moments do not exist. Intuitively, the difference between the two types of convergence lies in the fact that the latter allows the distance between yt and y to get large faster than the probability gets smaller, while this is not possible with Lq-convergence. Consequently, Lq-convergence is stronger than convergence in probability.
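    The Cauchy caveat is easy to see in simulation. The sketch below (sample size and threshold are our illustrative choices) draws standard Cauchy variables, for which no moments exist: the running sample mean never settles down, because the sample mean of T Cauchy draws is itself standard Cauchy for every T, so Lq-convergence is meaningless and convergence in probability to a constant fails.

```python
import math
import random

random.seed(3)

def cauchy():
    """Standard Cauchy draw via the tangent of a uniform angle."""
    return math.tan(math.pi * (random.random() - 0.5))

def running_mean_swing(T):
    """Spread (max minus min) of the running sample mean over the second
    half of the sample; for a process obeying a law of large numbers this
    swing shrinks toward 0 as T grows."""
    total, means = 0.0, []
    for t in range(1, T + 1):
        total += cauchy()
        means.append(total / t)
    tail = means[T // 2:]
    return max(tail) - min(tail)

# A single huge draw late in the sample still moves the running mean by a
# visible amount, so the swing stays bounded away from 0.
cauchy_swing = running_mean_swing(10_000)
```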

    Exercise 1.2. Let yt converge to 0 in Lq. Show that yt converges to 0 in probability. (Hint: use Chebyshev’s inequality.)

    The following result provides conditions ensuring that convergence in probability implies Lq-convergence.

    Result 1.6. If yt converges in probability to y and |yt|^q is uniformly integrable (for which supt E|yt|^(q+δ) < ∞ for some δ > 0 is sufficient), then yt converges to y in the Lq-norm (Davidson 1994, p. 287).

    Hence, convergence in probability plus the restriction that |yt|q is uniformly integrable, ensures convergence in the Lq-norm. In general, there is no relationship between Lq and a.s. convergence. The following shows that the two concepts are distinct.

    Example 1.6. Let ω be uniformly distributed on [0, 1], and let yt(ω) = t if ω ∈ [0, 1/t) and yt(ω) = 0 if ω ∈ [1/t, 1]. Then, for any ω > 0, yt(ω) = 0 for all t > 1/ω, so limt→∞ yt(ω) = 0 a.s. Since yt is not uniformly integrable, it fails to converge in the qth mean for any q > 1 (for q = 1, E|yt| = 1, ∀t). Hence, the limiting expectation of yt differs from its a.s. limit.

    Exercise 1.3. Let

    Show that the first and second moments of yt converge but that yt does not converge in quadratic mean to 1.

    The next result shows that convergence in the Lq′-norm obtains when we know that convergence in the Lq-norm occurs, q > q′. The result makes use of Jensen's inequality, which we state next. Let h be a convex function on R and let y be a random variable with E|y| < ∞. Then h(E(y)) ≤ E(h(y)); if h is concave, the inequality is reversed.

    Example 1.7. For

    .

    Result 1.7. Let q > q′ > 0. If yt converges to y in the Lq-norm, then yt converges to y in the Lq′-norm.

    Example 1.8. Let Ω = {ω1, ω2} and P(ω1) = P(ω2) = 0.5. Let yt(ω1) = (−1)^t, yt(ω2) = (−1)^(t+1), and let y(ω1) = y(ω2) = 0. The distribution of yt is the same for every t, but yt(ω) converges for no ω and E|yt − y|^q = 1 for all t, so yt converges neither a.s. nor in the Lq-norm.

    1.2.4 Convergence in Distribution

    Definition 1.6 (convergence in distribution). Let {yt(ω)} be an m × 1 vector of random variables with distribution function Dt. Then yt converges in distribution to y if Dt(z) → D(z), for every point of continuity z, where D is the distribution function of a random variable y.

    Convergence in distribution is the weakest convergence concept and does not imply, in general, anything about the convergence of a sequence of random variables. Moreover, while the previous three convergence concepts require {yt)} and the limit y) to be defined on the same probability space, convergence in distribution is meaningful even when this is not the case.

    It is useful to characterize the relationship between convergence in distribution and convergence in probability.

    Result 1.8. If yt converges in probability to y, then yt converges in distribution to y. Conversely, if yt converges in distribution to y, where D is the distribution of a random variable z such that P[z = ȳ] = 1 for some constant ȳ, then yt converges in probability to ȳ (see Rao 1973, p. 120).

    y is a continuous function of y.

    The next two results are handy when demonstrating the limiting properties of a class of estimators in dynamic models. Note that y1t(ω) is Op(t^j) if t^(−j)y1t is bounded in probability by an O(1) nonstochastic sequence y2t, and that y2t is O(1) if, for some 0 < Δ < ∞, there exists a T such that |y2t| < Δ for all t ≥ T.

    Result 1.9. If y1t converges in distribution to y and y2t converges in probability to a constant c, then y1t + y2t converges in distribution to y + c (Davidson 1994, p. 355). If y1t − y2t converges in probability to 0 and y2t converges in distribution to y, then y1t converges in distribution to y (Rao 1973, p. 123).

    Result 1.9 is useful when the distribution of y1t cannot be determined directly. In fact, if we can find a y2t with known asymptotic distribution which converges in probability to y1t, then the distribution of y1t can automatically be obtained. We will use this result in chapter 5 when discussing two-step estimators.

    The limiting behavior of continuous functions of sequences which converge in distribution is easy to characterize. In fact, we have the following result.

    Result 1.10. If yt converges in distribution to y and h is continuous, then h(yt) converges in distribution to h(y) (Davidson 1994, p. 355).

    1.3 Time Series Concepts

    Most of the analysis conducted in this book assumes that observable time series are stationary and have memory which dies out sufficiently fast over time. In some cases we will use alternative and weaker hypotheses which allow for selected forms of nonstationarity and/or for more general memory requirements. This section provides definitions of these concepts and compares the various alternatives.

    We need two preliminary definitions.

    Definition 1.7 (lag operator). The lag operator ℓ is defined by ℓyt = yt−1 and ℓ⁻¹yt = yt+1. The matrix lag operator A(ℓ) is defined by A(ℓ) = A0 + A1ℓ + A2ℓ² + . . . , where Aj, j = 0, 1, 2, . . . , are m × m matrices.

    Definition 1.8 (autocovariance function). The autocovariance function of {yt(ω)} is ACFt(τ) = E[(yt − E(yt))(yt−τ − E(yt−τ))], and its autocorrelation function is the autocovariance scaled by the standard deviations, ACFt(τ)/[var(yt) var(yt−τ)]^(1/2).

    In general, both the autocovariance and the autocorrelation functions depend on time t and on the gap τ between yt and yt−τ.

    Definition 1.9 (stationarity 1). {yt(ω)} is (strictly) stationary if, for any set of paths A, P[(yt1, . . . , ytj) ∈ A] = P[(yt1+τ, . . . , ytj+τ) ∈ A] for every t1, . . . , tj and every τ.

    A process is stationary if shifting a path over time does not change the probability distribution of that path. In this case, the joint distribution of (yt1, . . . , ytj) coincides with that of (yt1+τ, . . . , ytj+τ). A weaker concept is the following.

    Definition 1.10 (stationarity 2). is covariance (weakly) stationary if E(yt) is constant; E|yt|² < ∞; ACFt(τ) is independent of t.

    Definition 1.10 is weaker than 1.9 since it concerns only the first two moments of yt rather than its joint distribution. Clearly, a stationary process is weakly stationary, while the converse is true only when the yt are normal random variables. In fact, when yt is normal, the first two moments completely characterize the distribution of a path.

    Example 1.9. Let yt = e1 cos(ωt) + e2 sin(ωt), where e1, e2 are uncorrelated with mean zero, unit variance, and ω [0, 2π]. Clearly, the mean of yt is constant and E|yt|² < ∞. Also, cov(yt, yt+τ) = cos(ωt) cos(ω(t + τ)) + sin(ωt) × sin(ω(t + τ)) = cos(ωτ). Hence, yt is covariance stationary.
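    Example 1.9 can be verified by Monte Carlo. In the sketch below, the values of ω and τ, the normal draws, and the replication count are our illustrative choices (the example only requires e1, e2 uncorrelated with mean zero and unit variance): estimating cov(yt, yt+τ) at two different dates t recovers cos(ωτ) at both, as covariance stationarity requires.

```python
import math
import random

random.seed(4)
OMEGA, TAU = 0.7, 5

def mc_cov(t, n=20000):
    """Monte Carlo estimate of cov(y_t, y_{t+tau}) for
    y_t = e1 cos(w t) + e2 sin(w t), e1, e2 ~ N(0, 1) independent."""
    xs, ys = [], []
    for _ in range(n):
        e1, e2 = random.gauss(0, 1), random.gauss(0, 1)
        xs.append(e1 * math.cos(OMEGA * t) + e2 * math.sin(OMEGA * t))
        ys.append(e1 * math.cos(OMEGA * (t + TAU)) + e2 * math.sin(OMEGA * (t + TAU)))
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n

# Theory: cov(y_t, y_{t+tau}) = cos(w * tau) regardless of t.
c_at_0, c_at_17 = mc_cov(0), mc_cov(17)
target = math.cos(OMEGA * TAU)
```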

    Exercise 1.4. Suppose yt = et if t is odd and yt = et + 1 if t is even, where et ~ i.i.d. (0, 1). Show that yt is not stationary. Show also that yt = yt−1 + et, where y0 is a constant, is not a stationary process, but that Δyt = yt − yt−1 is stationary.

    When {yt(ω)} is covariance stationary, its autocovariance function satisfies (i) ACF(0) ≥ 0, (ii) |ACF(τ)| ≤ ACF(0), (iii) ACF(−τ) = ACF(τ) for all τ. Furthermore, if y1t and y2t are two stationary uncorrelated processes, y1t + y2t is stationary and the autocovariance function of y1t + y2t is ACFy1(τ) + ACFy2(τ).

    Example 1.10. Let yt = at + Det, where |D| < 1 and et ~ i.i.d. (0, σ²). Clearly, yt is not covariance stationary since E(yt) = at, which depends on time. Taking first differences we have Δyt = a + DΔet. Here E(Δyt) = a, E(Δyt − a)² = 2D²σ² > 0, E(Δyt − a)(Δyt−1 − a) = −D²σ² < E(Δyt − a)², and E(Δyt − a)(Δyt+1 − a) = −D²σ², none of which depends on t, so Δyt is covariance stationary.

    Exercise 1.5. , where et , a . Compute the mean and the autocovariance function of y2t. Is y2t stationary? Is it covariance stationary?

    Definition 1.11 (autocovariance generating function). The autocovariance generating function of a covariance stationary yt is ACGF(z) = Σ(τ=−∞ to ∞) ACF(τ)z^τ, provided that the sum converges for all z in an annulus around the unit circle, δ < |z| < δ⁻¹, for some δ < 1.

    Example 1.11. Consider the process yt = et − Det−1 = (1 − Dℓ)et, |D| < 1, et ~ i.i.d. (0, σ²). Its autocovariances are ACF(0) = (1 + D²)σ², ACF(±1) = −Dσ², and ACF(τ) = 0 for |τ| > 1. Hence,

    ACGF(z) = σ²[(1 + D²) − Dz − Dz⁻¹] = σ²(1 − Dz)(1 − Dz⁻¹).

    Example 1.11 can be generalized to more complex processes. In fact, if yt = A(ℓ)et, then ACGF(z) = A(z)A(z⁻¹)σ², and this holds for both univariate and multivariate yt. One interesting special case occurs when z = e^(−iω) = cos(ω) − i sin(ω), ω ∈ [0, 2π], in which case

    S(ω) = ACGF(e^(−iω))/(2π)

    is the spectral density of yt.
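    The MA(1) calculation of example 1.11 can be verified numerically. In the sketch below, the parameter values are our illustrative choices: the ACGF is built from the three nonzero autocovariances, and evaluating it at z = e^(−iω) reproduces the factored form σ²(1 − De^(−iω))(1 − De^(iω)), which divided by 2π is the spectral density.

```python
import cmath
import math

D, SIGMA2 = 0.5, 1.0

# Autocovariances of y_t = e_t - D e_{t-1}:
# ACF(0) = (1 + D^2) s2, ACF(1) = ACF(-1) = -D s2, ACF(tau) = 0 otherwise.
acf = {0: (1 + D ** 2) * SIGMA2, 1: -D * SIGMA2, -1: -D * SIGMA2}

def acgf(z):
    """ACGF(z) = sum over tau of ACF(tau) z^tau."""
    return sum(g * z ** tau for tau, g in acf.items())

def spectral_density(w):
    """S(w) = ACGF(e^{-iw}) / (2 pi); the imaginary part vanishes by symmetry."""
    return acgf(cmath.exp(-1j * w)).real / (2 * math.pi)

# The factored form sigma^2 |1 - D e^{-iw}|^2 / (2 pi) gives the same value.
w = 1.3
factored = SIGMA2 * abs(1 - D * cmath.exp(-1j * w)) ** 2 / (2 * math.pi)
```

    At ω = 0, for instance, the density is σ²(1 − D)²/(2π): the ACGF evaluated at z = 1, i.e., the long-run variance, scaled by 2π.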

    Exercise 1.6. Consider yt = (1 + 0.5ℓ + 0.8ℓ²)et and (1 − 0.25ℓ)yt = et, where et ~ i.i.d. (0, σ²). Are these processes covariance stationary? If so, derive their autocovariance and autocovariance generating functions.

    Exercise 1.7. Let {y1t)} be a stationary process and let h be an n × 1 vector of continuous functions. Show that y2t = h(y1t) is also stationary.

    Stationarity is a weaker requirement than i.i.d., where no dependence between the elements of a sequence is allowed, but it is stronger than the identically (not necessarily independently) distributed assumption.

    Example 1.12. Let yt ~ i.i.d. (0, 1), ∀t. Since yt−τ ~ i.i.d. (0, 1), ∀τ, any finite subsequence yt1+τ, . . . , ytj+τ will have the same distribution and therefore yt is stationary. It is easy to see that a stationary series is not necessarily i.i.d. For instance, let yt = et − Det−1. If |D| < 1, yt is stationary but not i.i.d.

    Exercise 1.8. Give an example of a process which is identically (but not necessarily independently) distributed but nonstationary.

    In this book, processes which are stationary will sometimes be indicated with the notation I(0), while processes which are stationary after d differences will be denoted by I(d).

    A property of stationary sequences which ensures that the sample average converges to the population average is ergodicity (see section 1.4). Ergodicity is typically defined in terms of invariant events.

    Definition 1.12 (ergodicity 1). An event is invariant if shifting every path one period leaves the event unchanged. A stationary {yt(ω)} is ergodic if every invariant event has probability either 0 or 1.

    When yt is ergodic, time averages computed along (almost) any path of yt(ω) will converge to the same limit. Hence, one path is sufficient to infer the moments of its distribution.

    Example 1.13. be the length of the interval [y0, yt, yt or 0 so yt is not ergodic.

    Example 1.14. Consider the process yt = et − 2et−1, where et ~ i.i.d. (0, σ²). Both E(yt) and cov(yt, yt−τ) do not depend on t, so the process is covariance stationary. To verify that it is ergodic in the mean, consider the sample mean (1/T) Σt yt. Its variance is

    (1/T²)[T ACF(0) + 2(T − 1)ACF(1)], which converges to 0 as T → ∞, so the time average converges to the population mean E(yt) = 0.

    Example 1.15. Let yt = e1 + e2t, where e2t ~ i.i.d. (0, 1) and e1 ~ i.i.d. (1, 1). Clearly, yt is stationary and E(yt) = 1. However, (1/T) Σt yt = e1 + (1/T) Σt e2t → e1 a.s. Since the time average of yt (equal to e1) is different from the population average of yt (equal to 1), yt is not ergodic.

    What is wrong with example 1.15? Intuitively, yt is not ergodic because it has too much memory (e1 appears in yt for every t). In fact, for ergodicity to hold, the process must forget its past reasonably fast. The laws of large numbers of section 1.4 give conditions ensuring that the memory of the process is not too strong.
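    Example 1.15's failure of ergodicity is easy to reproduce. In the sketch below, the normal distributions and the sample size are our illustrative choices (matching the stated means and variances): the time average of one simulated path converges to that path's draw of e1, not to the population mean E(yt) = 1.

```python
import random

random.seed(5)

def path_time_average(T):
    """One path of y_t = e1 + e_{2t}, with e1 ~ N(1, 1) drawn once per path
    and e_{2t} ~ i.i.d. N(0, 1); returns (e1, time average of the path)."""
    e1 = random.gauss(1, 1)
    total = sum(e1 + random.gauss(0, 1) for _ in range(T))
    return e1, total / T

# The time average inherits the path's e1: the process "remembers" e1 forever,
# so averaging over time cannot recover the population mean E(y_t) = 1.
e1, avg = path_time_average(20000)
```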

    Exercise 1.9. Suppose yt = 0.6yt−1 + 0.2yt−2 + et, where et ~ i.i.d. (0, 1). Is yt stationary? Is it ergodic? Find the effect of a unitary change in et on yt+3. Repeat the exercise for yt = 0.4yt−1 + 0.8yt−2 + et.

    Exercise 1.10. Consider the bivariate process:

    where E(e1t e1τ) = 1 for τ = t and 0 otherwise, E(e2t e2τ) = 2 for τ = t and 0 otherwise, and E(e1t e2τ) = 0 for all τ, t. What is the limit of this derivative as τ → ∞?

    Exercise 1.11. Suppose that at t is given by

    Show that yt is stationary but not ergodic. Show that a single path (i.e., a path composed of only 1s and 0s) is ergodic.

    Exercise 1.12. , where et . Show that yt is neither stationary nor ergodic. Show that the sequence {yt, yt+4, yt+8, . . . } is stationary and ergodic.

    Exercise 1.12 shows an important result: if a process is nonergodic, it may be possible to find a subsequence which is ergodic.

    Exercise 1.13. Show that if {y1t)} is ergodic, y2t = h(y1t) is ergodic if h is continuous.

    A concept which bears some resemblance to ergodicity is that of mixing.

    Definition 1.13 (mixing 1). Let F1 and F2 be two Borel σ-algebras. The mixing coefficients φ and α are defined as

    φ(F1, F2) = sup{A ∈ F1, B ∈ F2, P(A) > 0} |P(B|A) − P(B)|,
    α(F1, F2) = sup{A ∈ F1, B ∈ F2} |P(A ∩ B) − P(A)P(B)|.

    Here φ provides a measure of relative dependence while α measures absolute dependence.

    For a stochastic process, let F(−∞, t) be the Borel σ-algebra generated by values of yt from the infinite past up to t, and let F(t + τ, ∞) be the Borel σ-algebra generated by values of yt from t + τ on, so that F(−∞, t) contains information up to t and F(t + τ, ∞) information from t + τ on.

    Definition 1.14 (mixing 2). For a stochastic process {yt(ω)}, φ(τ) = supt φ(F(−∞, t), F(t + τ, ∞)) and α(τ) = supt α(F(−∞, t), F(t + τ, ∞)).

    φ(τ) and α(τ), called respectively uniform and strong mixing coefficients, measure how much dependence there is between elements of {yt} separated by τ periods. If φ(τ) = α(τ) = 0, yt and yt+τ are independent. If φ(τ) → 0 (α(τ) → 0) as τ → ∞, yt is said to be φ-mixing (α-mixing). Since α(τ) ≤ φ(τ), φ-mixing implies α-mixing.

    Example 1.16. Let yt be a normal process such that cov(yt, yt−τ) = 0 for all τ ≥ τ1. Then yt and yt−τ are independent for τ ≥ τ1, so α(τ) = 0 for τ ≥ τ1 and α(τ) → 0 as τ → ∞.

    Exercise 1.14. Give an example of a process for which φ(τ) does not go to zero as τ → ∞.

    Mixing is a somewhat stronger memory requirement than ergodicity. Rosenblatt (1978) shows the following result.

    Result 1.11. Let yt be stationary. If α(τ) → 0 as τ → ∞, yt is ergodic.

    Exercise 1.15. Show that α(τ) ≤ φ(τ), so that φ(τ) → 0 as τ → ∞ implies α(τ) → 0 and, by result 1.11, a stationary φ-mixing process is ergodic.

    Both ergodicity and mixing are hard to verify in practice. A concept which bears some relationship to both and is easier to check is the following.

    Definition 1.15 (asymptotic uncorrelatedness). {yt(ω)} has asymptotically uncorrelated elements if there exist constants ρτ ≥ 0, τ ≥ 0, such that corr(yt, yt+τ) ≤ ρτ for all t and

    Σ(τ=0 to ∞) ρτ < ∞, where var(yt) < ∞, ∀t.

    Intuitively, if we can find an upper bound to the correlation of yt and ytτ, ∀τ, and if the accumulation over τ of this bound is finite, the process has a memory that asymptotically dies out.

    Example 1.17. Let yt = Ayt−1 + et, |A| < 1, et ~ i.i.d. (0, σ²). Here corr(yt, yt−τ) = A^τ, which is bounded by ρτ = |A|^τ with Στ |A|^τ = 1/(1 − |A|) < ∞, so that yt has asymptotically uncorrelated elements.

    Note that in definition 1.15 only τ > 0 matters. From example 1.17 it is clear that when var(yt) is constant and the covariance of yt with ytτ only depends on τ, asymptotic uncorrelatedness is the same as covariance stationarity.
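    Example 1.17's geometric decay can be checked on simulated data. In the sketch below, the value A = 0.9, the normal innovations, the sample size, and the burn-in are our illustrative choices: sample correlations estimated from one long AR(1) path line up with A^τ, whose summability over τ delivers asymptotic uncorrelatedness.

```python
import math
import random

random.seed(6)
A = 0.9

def simulate_ar1(T, burn=500):
    """One path of y_t = A y_{t-1} + e_t, e_t ~ N(0, 1), after a burn-in
    so the retained path is approximately a draw from the stationary law."""
    y, path = 0.0, []
    for t in range(T + burn):
        y = A * y + random.gauss(0, 1)
        if t >= burn:
            path.append(y)
    return path

y = simulate_ar1(50_000)

def sample_corr(tau):
    """Sample correlation between y_t and y_{t-tau} along the path."""
    xs, ys = y[:-tau], y[tau:]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = math.sqrt(sum((a - mx) ** 2 for a in xs) * sum((b - my) ** 2 for b in ys))
    return num / den

# Theory: corr(y_t, y_{t-tau}) = A^tau, a summable geometric sequence.
r1, r5 = sample_corr(1), sample_corr(5)
```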

    Exercise 1.16. Show that yt has asymptotically uncorrelated elements when ρτ ≤ τ^(−1−b) for some b > 0 and τ sufficiently large.

    Exercise 1.17. Suppose that yt is such that the correlation between yt and ytτ goes to zero as τ → ∞. Is this sufficient to ensure that yt is ergodic?

    Instead of assuming stationarity and ergodicity or mixing, one can assume that yt satisfies an alternative set of conditions. These conditions considerably broaden the set of time series a researcher can work with.

    Definition 1.16 (martingale). {yt(ω)} is a martingale with respect to the information set Ft if yt is Ft-measurable and E[yt+τ | Ft] = yt for all t, τ > 0.

    Definition 1.17 (martingale difference). {yt(ω)} is a martingale difference with respect to Ft if yt is Ft-measurable and E[yt+τ | Ft] = 0 for all t, τ > 0.

    Example 1.18. Let yt be i.i.d. with E(yt) = 0. Then yt is a martingale difference sequence.

    Martingale difference is a much weaker requirement than stationarity and ergodicity since it only involves restrictions on the first conditional moment. It is therefore easy to build examples of processes which are martingale differences but are not stationary.

    Example 1.19. Suppose that yt is i.i.d. with mean zero and variance t². Then yt is a martingale difference, nonstationary process.
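    Example 1.19 can be made concrete with a simulation. In the sketch below, the construction yt = t·et with et ~ N(0, 1) is our illustrative choice of a process with mean zero and variance t², consistent with the example: the mean stays at zero while the variance grows with t, so yt is a martingale difference but not covariance stationary.

```python
import random
import statistics

random.seed(7)

def draws_at(t, n=4000):
    """n independent realizations of y_t = t * e_t, e_t ~ N(0, 1):
    E[y_t | past] = 0 (a martingale difference) but var(y_t) = t^2."""
    return [t * random.gauss(0, 1) for _ in range(n)]

v3 = statistics.pvariance(draws_at(3))   # approx 9
v6 = statistics.pvariance(draws_at(6))   # approx 36
m6 = statistics.fmean(draws_at(6))       # approx 0
```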

    Exercise 1.18. Let y1 be a random variable with E|y1| < ∞ and let y2t = E[y1 | Ft] be its conditional expectation given the information set Ft. Show that y2t is a martingale.

    Using the identity

    yt = E[yt | Ft−τ] + Σ(j=0 to τ−1) (E[yt | Ft−j] − E[yt | Ft−j−1]),

    one can write

    yt − E[yt | Ft−τ] = Σ(j=0 to τ−1) Revt−j(t),

    where

    Revt−j(t) = E[yt | Ft−j] − E[yt | Ft−j−1]

    is the one-step-ahead revision in yt, made with the new information accrued from t − j − 1 to t − j. Revt−j(t) plays an important role in deriving the properties of functions of stationary processes, and will be extensively used in chapters 4 and 10.

    Exercise 1.19. Show that Revt−j(t) is a martingale difference.

    1.4 Laws of Large Numbers

    This section describes conditions ensuring that sample averages of the data, which appear in the formulas of OLS or IV estimators, stochastically converge to well-defined limits. Since different conditions apply to different kinds of economic data, we consider here situations which are typically encountered in macro-time series contexts. Given the results of section 1.2, we will describe only strong laws of large numbers, since weak laws of large numbers hold as a consequence.

    Laws of large numbers typically come in the following form: given restrictions on the dependence and the heterogeneity of the observations and/or some restrictions on moments, (1/T) Σt [yt − E(yt)] → 0 almost surely.

    We will consider three cases: (i) yt has dependent and identically distributed elements; (ii) yt has dependent and heterogeneously distributed elements; (iii) yt has martingale difference elements. To better understand the applicability of each case note that in all cases observations are serially correlated. In the first case we restrict the distribution of the observations to be the same for every t; in the second we allow some carefully selected form of heterogeneity (for example, structural breaks in the mean or in the variance or conditional heteroskedasticity); in the third we do not restrict the distribution of the process, but impose conditions on its moments.

    1.4.1 Dependent and Identically Distributed Observations

    To state a law of large numbers (LLN) for stationary processes, we need conditions on the memory of the sequence. Typically, one assumes ergodicity since this implies average asymptotic independence of the elements of the {yt)} sequence.

    The LLN is then as follows. Let {yt(ω)} be stationary and ergodic with E|yt| < ∞, ∀t. Then (1/T) Σt yt → E(yt) a.s. (see Stout 1974, p. 181).

    To use this law when dealing with econometric estimators, recall that, for any measurable function h such that y2t = h(y1t), y2t is stationary and ergodic if y1t is stationary and ergodic.

    Exercise 1.20 (strong consistency of OLS and IV estimators). Let

    , and assume

    (i)

    (ii)

    , where Σzx, T is an O(1) random matrix which depends on T and has uniformly continuous column rank.

    Show that αOLS = (x′x)⁻¹(x′y) and αIV = (z′x)⁻¹(z′y) exist a.s. for T large under (ii). Show that under (ii′) αIV exists a.s. for T large. (Hint: if AT is a sequence of matrices, then AT has uniformly full column rank if there exists a sequence of k × k submatrices which is uniformly nonsingular.)

    1.4.2 Dependent and Heterogeneously Distributed Observations

    To derive an LLN for dependent and heterogeneously distributed processes, we drop the ergodicity assumption and we substitute it with a mixing requirement. In addition, we need to define the size of the mixing conditions.

    Definition 1.18. Let 1 ≤ a < ∞. If φ(τ) = O(τ^(−b)) for some b > a/(2a − 1), φ(τ) is of size a/(2a − 1). If a > 1 and α(τ) = O(τ^(−b)) for some b > a/(a − 1), α(τ) is of size a/(a − 1).

    With definition 1.18 one can make precise statements on the memory of the process. In fact, a regulates the memory of a process. As a → ∞, the dependence increases while as a → 1, the sequence exhibits less and less serial dependence.

    The LLN is the following. Let {yt(ω)} be a mixing sequence with φ(τ) of size a/(2a − 1) or α(τ) of size a/(a − 1), a > 1, and E(yt) < ∞, ∀t. If, for some 0 < b ≤ a, a summability condition on the moments of order a + b of yt holds, then

    (1/T) Σt [yt − E(yt)] → 0 a.s. (see McLeish 1974, theorem 2.10).

    In this law, the elements of yt are allowed to have time-varying distributions (e.g., E(yt) may depend on t), but the moment condition

    restricts the heterogeneity of the process. Note that, for a = 1 and b = 1, the above collapses to Kolmogorov's law of large numbers.

    The moment condition can be weakened somewhat if we are willing to impose a bound on the (a + b)th moment.

    Result 1.12. Let {yt(ω)} be a mixing sequence with φ(τ) of size a/(2a − 1) or with α(τ) of size a/(a − 1), a > 1, such that E|yt|^(a+b) is bounded for some b > 0 and all t. Then (1/T) Σt [yt − E(yt)] → 0 a.s.

    The next result mirrors the one obtained for stationary ergodic processes.

    Result 1.13. Let h be a measurable function of a finite number of lags of y1t. If y1t is mixing with φ(τ) (α(τ)) of size O(τ^(−b)) for some b > 0, then y2t = h(y1t, . . . , y1t−τ), τ finite, is mixing with φ(τ) (α(τ)) of the same size.

    From the above result it immediately follows that, if {zt, xt, et} are mixing, cross products such as zt′xt and zt′et are also mixing processes of the same size.

    The following result is useful when observations are heterogeneous.

    Result 1.14. Let {yt(see White 1984, p. 48).

    The LLN for processes with asymptotically uncorrelated elements is the following. Let {yt(ω)} be a process with asymptotically uncorrelated elements, mean E(yt), and var(yt) < Δ < ∞, ∀t. Then (1/T) Σt [yt − E(yt)] → 0.

    Compared with result 1.12, we have relaxed the dependence restriction from mixing to asymptotic uncorrelation at the cost of altering the restriction on moments: rather than bounding moments of order a + b, we now restrict second moments directly.

    1.4.3 Martingale Difference Process

    The LLN for this type of process is the following. Let {yt(ω)} be a martingale difference. If, for some a ≥ 1, Σt E|yt|^(2a)/t^(1+a) < ∞, then (1/T) Σt yt → 0 a.s.

    The martingale LLN requires restrictions on the moments of the process which are slightly stronger than those assumed in the case of independent yt. The analogue of result 1.12 for martingale differences is the following.

    Result 1.15. Let {yt(ω)} be a martingale difference such that E|yt|^(2a) < Δ < ∞, for some a ≥ 1 and all t. Then (1/T) Σt yt → 0 a.s.

    Exercise 1.21. Suppose {y1t(ω)} is a martingale difference and let zt−1 be measurable with respect to Ft−1. Show that y2t = y1tzt−1 is a martingale difference.

    Exercise 1.22. Let yt = xtα0 + et, and assume that et is a martingale difference and that the variance of xt is positive and finite. Show that αOLS converges to α0.

    1.5 Central Limit Theorems

    There are also several central limit theorems (CLTs) available in the literature. Clearly, their applicability depends on the type of data a researcher has available. In this section we list CLTs for the three cases we have described in section 1.4. Loeve (1977) or White (1984) provide theorems for other relevant cases.

    1.5.1 Dependent and Identically Distributed Observations

    Two conditions are typically required: first, that E[yt | Ft−τ] → 0 in quadratic mean for τ → ∞ (referred to as linear regularity); second, that the sum of the variances of the forecast revisions converges as τ → ∞. The second condition is obviously stronger than the first one. Restrictions on the variance of the process are needed since, when yt is a dependent and identically distributed process, its variance is the sum of the variances of the forecast revisions made at each t, and this may not converge to a finite limit. We ask the reader to show this in the next two exercises.

    Exercise 1.23. Show that yt can be written as the sum of the forecast revisions Revtj (t) defined just before exercise 1.19. Note that this implies that the variance of yt equals the sum of the variances of these revisions.

    Exercise 1.24. Give conditions on yt that make ρτ independent of t and such that the variance of (1/√T) Σt yt goes to ∞ as T → ∞.

    A sufficient condition for the variance of (1/√T) Σt yt to converge is that the autocovariances of yt are absolutely summable.

    A CLT is then as follows. Let (i) {yt} be a stationary and ergodic process, (ii) E[yt | Ft−τ] → 0 in quadratic mean as τ → ∞, and (iii) the sum of the standard deviations of the forecast revisions be finite. Then var((1/√T) Σt yt) → σ̄² < ∞ and (1/√T) Σt yt converges in distribution to a normal random variable with zero mean and variance σ̄² (see Gordin 1969).

    Example 1.20. The limiting variance σ̄² may well be zero. Consider, for example, yt = et − et−1, where et is an i.i.d. process with finite variance: the partial sums telescope, so (1/√T) Σt yt = (eT − e0)/√T converges to zero and the limiting distribution is degenerate.

    Exercise 1.25. Assume that xt and et satisfy the conditions of the above CLT. Derive the asymptotic distribution of √T(αOLS − α0), where αOLS is the OLS estimator of α0 in the model yt = xtα0 + et and T is the number of observations.
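A small Monte Carlo can make the content of exercise 1.25 concrete. The sketch below (the setup and all parameter values are our illustrative choices, not from the text) draws a stationary AR(1) regressor with i.i.d. errors and checks that √T(αOLS − α0) is approximately normal with variance σ²/E(xt²):

```python
import numpy as np

# Illustrative Monte Carlo for the asymptotic normality of OLS.
rng = np.random.default_rng(0)
T, reps = 500, 2000
alpha0, A, sigma = 1.5, 0.8, 1.0

draws = np.empty(reps)
for r in range(reps):
    u = rng.normal(0.0, 1.0, T)
    x = np.empty(T)
    x[0] = u[0] / np.sqrt(1 - A**2)   # start from the stationary distribution
    for t in range(1, T):
        x[t] = A * x[t - 1] + u[t]
    e = rng.normal(0.0, sigma, T)
    y = alpha0 * x + e
    draws[r] = np.sqrt(T) * ((x @ y) / (x @ x) - alpha0)

# Asymptotic std: sqrt(sigma^2 / E(x^2)) = sqrt(sigma^2 * (1 - A^2)) = 0.6 here
print(draws.mean(), draws.std())
```

The simulated mean is near zero and the simulated standard deviation near the asymptotic value 0.6.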

    1.5.2 Dependent Heterogeneously Distributed Observations

    The CLT in this case is the following. Let {yt} be a mixing process with φ(τ) of size a/(2a − 1) or α(τ) of size a/(a − 1), a > 1, with E(yt) = 0 and E|yt|^(2a) < Δ < ∞, ∀t, and suppose that var((1/√T) Σ_{t=b+1}^{b+T} yt) converges to σ̄² > 0 uniformly in b. Then (1/√T) Σt yt converges in distribution to a normal random variable with zero mean and variance σ̄² as T → ∞ (see White and Domowitz 1984).

    As in the previous CLT, we need the condition that the variance of (1/√T) Σ_{t=b+1}^{b+T} yt converges uniformly in b. This is equivalent to imposing that yt is asymptotically covariance stationary, that is, that heterogeneity in yt dies out as T increases (see White 1984, p. 128).

    1.5.3 Martingale Difference Observations

    The CLT in this case is as follows. Let {yt} be a martingale difference with σt² = E(yt²), let Ft be the distribution function of yt, and let σ̄T² = (1/T) Σt σt² > 0 for T sufficiently large. If, for every ε > 0, (1/T) Σt ∫_{y²>εTσ̄T²} y² dFt(y) → 0, then (1/(√T σ̄T)) Σt yt converges in distribution to a standard normal random variable (see McLeish 1974).

    The last condition is somewhat mysterious: it requires that the average contribution of the extreme tails of the distribution to the variance of yt is zero in the limit. If this condition holds, then yt satisfies a uniform asymptotic negligibility condition. In other words, none of the elements of {yt} can have a variance which dominates the variance of (1/T) Σt yt. We illustrate this condition in the next example.

    Example 1.21. Suppose {yt} is a martingale difference whose variances satisfy 0 < ρ ≤ σt² ≤ Δ < ∞ for all t, so that σ̄T² stays bounded away from zero as T → ∞. Then the contribution of each individual yt to the variance of (1/T) Σt yt vanishes as T → ∞ and the asymptotic negligibility condition holds.

    The martingale difference assumption allows us to weaken several of the conditions needed to prove a central limit theorem relative to the case of stationary processes, and it will be the assumption used in several parts of this book.

    A result, which will become useful in later chapters, concerns the asymptotic distribution of functions of converging stochastic processes.

    Result 1.16. Suppose the m × 1 vector {yt} satisfies (yt − ȳ)/at → N(0, Σy) in distribution, where Σy is a symmetric, nonnegative definite matrix and at → 0 as t → ∞. Let h(y) = (h1(y), . . . , hn(y))′ be such that each hj(y) is continuously differentiable in a neighborhood of ȳ, and let H = ∂h/∂y′, an n × m matrix evaluated at ȳ. Then (h(yt) − h(ȳ))/at → N(0, HΣyH′) in distribution.

    Example 1.22. Suppose yt is a scalar with √T(yt − ȳ) → N(0, σy²) in distribution and let h(y) = y². Then, provided ȳ ≠ 0, √T(yt² − ȳ²) → N(0, 4ȳ²σy²) in distribution.

    1.6 Elements of Spectral Analysis

    A central object in the analysis of time series is the spectral density (or spectrum).

    Definition 1.19 (spectral density). The spectral density of a stationary process {yt} at frequency ω is Sy(ω) = (1/2π) Σ_{τ=−∞}^{∞} ACFy(τ)e^(−iωτ), where ACFy(τ) is the autocovariance function of yt.

    We have already mentioned that the spectral density is a reparametrization of the covariance generating function and is obtained by setting z = e^(−iω) = cos(ω) − i sin(ω). Definition 1.19 also shows that the spectral density is the Fourier transform of the autocovariance of yt. Hence, the spectral density simply repackages the autocovariances of {yt} by using sine and cosine functions as weights but can be more useful than the autocovariance function since, for ω appropriately chosen, its elements are uncorrelated.

    In fact, if we evaluate the spectral density at the Fourier frequencies, i.e., at ωj = 2πj/T, j = 1, . . . , T − 1, the components of the process corresponding to any two ω1 ≠ ω2 are uncorrelated. Note that Fourier frequencies change with T, making recursive evaluation of the spectral density cumbersome.

    Example 1.23. It is easily verified that Sy(0) = (1/2π) Σ_τ ACFy(τ); that is, up to the constant 1/2π, the spectral density at frequency zero is the (unweighted) sum of all the elements of the autocovariance function. It is also easy to verify that ∫_{−π}^{π} Sy(ω) dω = ACFy(0); that is, the variance of the process is the area below the spectral density.
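Both claims in example 1.23 are easy to verify numerically. The sketch below (the MA(1) parametrization is our illustrative choice) builds the spectral density of yt = et + Det−1 from its autocovariances and checks that 2πSy(0) equals the sum of the autocovariances, while the area below Sy(ω) equals the variance:

```python
import numpy as np

# Illustrative MA(1): ACF(0) = sigma2*(1 + D^2), ACF(1) = ACF(-1) = D*sigma2.
D, sigma2 = 0.5, 1.0
acf = {0: sigma2 * (1 + D**2), 1: D * sigma2, -1: D * sigma2}

def spectrum(w):
    # Fourier transform of the autocovariance function
    return sum(g * np.exp(-1j * w * tau) for tau, g in acf.items()).real / (2 * np.pi)

omega = np.linspace(-np.pi, np.pi, 2001)
vals = np.array([spectrum(w) for w in omega[:-1]])
area = vals.sum() * (omega[1] - omega[0])   # Riemann sum over one full period

print(2 * np.pi * spectrum(0.0))  # sum of all autocovariances: 2.25
print(area)                        # variance ACF(0): 1.25
```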

    To understand how the spectral density transforms the autocovariance function, select, for example, ω = π/2. Note that cos(0) = 1, cos(π/2) = 0, cos(π) = −1, cos(3π/2) = 0, and that sin(0) = 0, sin(π/2) = 1, sin(π) = 0, sin(3π/2) = −1, and that these values repeat themselves since the sine and cosine functions are periodic. Hence, at ω = π/2, only autocovariances at even lags enter the real part of the spectral density, with alternating signs.

    Exercise 1.26. Repeat the above calculation for ω = π. Which autocovariances enter the spectral density at frequency π?

    For a Fourier frequency, the corresponding period of oscillation is 2π/ωj = T/j.

    Example 1.24. Suppose you have quarterly data. Then, at the Fourier frequency π/2, the period is equal to 4. That is, at frequency π/2 you have fluctuations with an annual periodicity. Similarly, at the frequency π, the period is 2 so that semiannual cycles are present at π.

    Exercise 1.27. Business cycles are typically thought to occur with a periodicity between two and eight years. Assuming that you have quarterly data, find the Fourier frequencies characterizing business cycle fluctuations. Repeat the exercise for annual and monthly data.
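For the quarterly case of exercise 1.27, the mapping ω = 2π/period gives the band directly; a minimal sketch (the helper function name is ours):

```python
import math

# Cycles of period between 2 and 8 years are 8 to 32 quarters long,
# so they correspond to frequencies between 2*pi/32 and 2*pi/8.
def freq_band(years_low, years_high, obs_per_year):
    p_short, p_long = years_low * obs_per_year, years_high * obs_per_year
    return 2 * math.pi / p_long, 2 * math.pi / p_short

lo, hi = freq_band(2, 8, 4)   # quarterly data
print(lo, hi)                 # pi/16 ≈ 0.196 and pi/4 ≈ 0.785
```

Changing `obs_per_year` to 1 or 12 gives the annual and monthly bands.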

    Figure 1.1. (a) Short and (b) long cycles.

    Given the formula to calculate the period of oscillation, we can immediately see that low frequencies are associated with cycles of long periods of oscillation, that is, with infrequent shifts from a peak to a trough, and high frequencies with cycles of short periods of oscillation, that is, with frequent shifts from a peak to a trough (see figure 1.1). Hence, trends (i.e., cycles with an infinite periodicity) are located in the lowest frequencies of the spectrum and irregular fluctuations in the highest frequencies. Since the spectral density is periodic mod(2π) and symmetric around ω = 0, it is sufficient to examine Sy(ω) over the interval [0, π].

    Exercise 1.28. Show that Sy(ωj) = Sy(−ωj), i.e., that the spectral density is symmetric around frequency zero.

    Example 1.25. Suppose {yt} is an i.i.d. process with variance σ². Since ACFy(τ) = 0 for τ ≠ 0, Sy(ωj) = σ²/2π, ∀ωj. That is, the spectral density of an i.i.d. process is constant for all ωj ∈ [0, π].

    Exercise 1.29. Consider a stationary AR(1) process {yt} with autoregressive coefficient equal to 0 ≤ A < 1. Calculate the autocovariance function of yt. Show that the spectral density is monotonically increasing as ωj → 0.

    Exercise 1.30. Consider a stationary MA(1) process {yt} with MA coefficient equal to D. Calculate the autocovariance function and the spectral density of yt. Show their shape when D > 0 and D < 0.
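A numerical check of the claim in exercise 1.29 (the value of A is our illustrative choice): taking as given that the AR(1) spectral density is Sy(ω) = σ²/(2π(1 + A² − 2A cos ω)), the sketch below verifies that it decreases monotonically as ω moves away from zero, so its maximum is at the lowest frequency:

```python
import numpy as np

# Illustrative AR(1) parameters (not from the text).
A, sigma2 = 0.9, 1.0

def spec_ar1(w):
    # S(w) = sigma2 / (2*pi * |1 - A e^{-iw}|^2)
    return sigma2 / (2 * np.pi * (1 + A**2 - 2 * A * np.cos(w)))

omega = np.linspace(1e-3, np.pi, 500)
S = spec_ar1(omega)
print(bool(np.all(np.diff(S) < 0)))   # True: the spectrum falls away from w = 0
```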

    Economic time series have a typical bell-shaped spectral density (see figure 1.2) with a large portion of the variance concentrated in the lower part of the spectrum. Given the result of exercise 1.29, it is therefore reasonable to posit that most economic time series can be represented with relatively simple AR processes.

    Figure 1.2. Spectral density.

    The definitions we have given are valid for univariate processes but can be easily extended to vectors of stochastic processes.

    Definition 1.20 (spectral density matrix). The spectral density matrix of an m × 1 vector of stationary processes {yt} is Sy(ω) = (1/2π) Σ_{τ=−∞}^{∞} ACFy(τ)e^(−iωτ), where ACFy(τ) = E[(yt − E(yt))(yt−τ − E(yt−τ))′].

    The elements on the diagonal of the spectral density matrix are real while the off-diagonal elements are typically complex. A measure of the strength of the relationship between two series at frequency ω is given by the coherence.

    Definition 1.21. Consider a bivariate stationary process {y1t, y2t}. The coherence between {y1t} and {y2t} at frequency ω is Co(ω) = |Sy1y2(ω)|/(Sy1(ω)Sy2(ω))^(1/2), where Sy1y2(ω) is the cross-spectral density of y1t and y2t.

    The coherence is the frequency domain version of the correlation coefficient. Note that Co(ω) is a real-valued function since |y| indicates the modulus of the complex number y.

    Example 1.26. Suppose yt = D(ℓ)et, where et is a white noise. It can be immediately verified that the coherence between et and yt is 1 at all frequencies. Suppose, on the other hand, that Co(ω) monotonically declines to 0 as ω moves from 0 to π. Then yt and et have similar low-frequency but different high-frequency components.

    Exercise 1.31. Suppose that et is a white noise and let yt = Ayt−1 + et, 0 ≤ A < 1. Calculate the coherence between yt and et.

    Interesting transformations of yt can be obtained with the use of filters.

    Definition 1.22. A filter is a linear transformation of a stochastic process, i.e., if yt = D(ℓ)et, where et is a white noise, then D(ℓ) is a filter.

    A moving average (MA) process is therefore a filter since a white noise is linearly transformed into another process. In general, stochastic processes can be thought of as filtered versions of some white noise process. To study the spectral properties of filtered processes, let CGFe(z) be the covariance generating function of et. Then the covariance generating function of yt is CGFy(z) = D(z)D(z^(−1)) CGFe(z) = |D(z)|² CGFe(z).

    Example 1.27. Suppose that et is a white noise so that Se(ω) = σ²/2π, ∀ω. Consider now the process yt = D(ℓ)et, where D(ℓ) = D0 + D1ℓ + D2ℓ² + ···. It is usual to interpret D(ℓ) as the response function of yt to a unitary change in et. Then Sy(ω) = |D(e^(−iω))|² Se(ω), where |D(e^(−iω))|² = D(e^(−iω))D(e^(iω)) and D(e^(−iω)) = Στ Dτ e^(−iωτ) measures how a unitary change in et affects yt at frequency ω.

    Example 1.28. Suppose that yt = y0 + at + D(ℓ)et, where et is a white noise. Since yt contains a deterministic trend, it is nonstationary and Sy(ω) does not exist. Differencing the process, we have yt − yt−1 = a + D(ℓ)(et − et−1), so that yt − yt−1 is stationary if et − et−1 is stationary and all the roots of D(ℓ) are greater than one in absolute value. If these conditions are met, the spectrum of Δyt is SΔy(ω) = |D(e^(−iω))|² SΔe(ω).

    D(e^(−iω)) is called the transfer function of the filter. |D(e^(−iω))|², the square modulus of the transfer function, measures the change in the variance of et induced by the filter at frequency ω. Writing D(e^(−iω)) = Ga(ω)e^(−iPh(ω)), Ga(ω) = |D(e^(−iω))| is the gain and Ph(ω) the phase of the filter: the gain measures the change in the amplitude of cycles induced by the filter, while the phase measures how much the lead–lag relationships in et are altered at frequency ω.
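As a small worked example of the gain, consider the first-difference filter D(ℓ) = 1 − ℓ, whose transfer function is 1 − e^(−iω):

```python
import numpy as np

# Gain of the first-difference filter D(L) = 1 - L: Ga(w) = |1 - e^{-iw}|.
def gain_diff(w):
    return abs(1 - np.exp(-1j * w))

print(gain_diff(0.0))      # 0.0: the zero-frequency (trend) component is wiped out
print(gain_diff(np.pi))    # 2.0: two-period cycles have their amplitude doubled
```

Differencing therefore removes trends but amplifies high-frequency fluctuations.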

    Filtering is an operation frequently performed in everyday life (e.g., tuning a radio to a station filters out all other signals (waves)). Several types of filter are used in modern macroeconomics. Figure 1.3 presents three general types of filter: a low pass, a high pass, and a band pass. A low pass filter leaves the low frequencies of the spectrum unchanged but wipes out high frequencies. A high pass filter does exactly the opposite. A band pass filter can be thought of as a combination of a low pass and a high pass filter: it wipes out very high and very low frequencies and leaves unchanged frequencies in the middle range.

    Figure 1.3. Filters: (a) low pass; (b) high pass; (c) band pass.

    Low pass, high pass, and band pass filters are nonrealizable, in the sense that, with samples of finite length, it is impossible to construct objects that look like those of figure 1.3. The ideal low pass (Blp(ℓ)), high pass (Bhp(ℓ)), and band pass (Bbp(ℓ)) filters have the following time representations.

    Low pass: B0 = ω0/π, Bj = sin(jω0)/(πj), ∀j > 0, for some ω0 ∈ (0, π).

    High pass: B̃0 = 1 − B0, B̃j = −Bj, ∀j > 0.

    Band pass: B0 = (ω2 − ω1)/π, Bj = (sin(jω2) − sin(jω1))/(πj), ∀j > 0, ω2 > ω1.

    When j is finite the box-like spectral shape of these filters can only be approximated with a bell-shaped function. This means that relative to the ideal, realizable filters generate a loss of power at the edges of the band (a phenomenon called leakage) and an increase in the importance of the frequencies in the middle of the band (a phenomenon called compression). Approximations to these ideal filters are discussed in chapter 3.
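The leakage problem is easy to see numerically. The sketch below (band limits and truncation point are illustrative choices of ours) truncates the ideal band pass weights at J lags and evaluates the resulting frequency response, which is close to, but not exactly, one inside the band and zero outside:

```python
import numpy as np

# Truncated ideal band pass: B_j = (sin(j*w2) - sin(j*w1)) / (pi*j), |j| <= J.
w1, w2, J = np.pi / 16, np.pi / 4, 24

j = np.arange(1, J + 1)
B0 = (w2 - w1) / np.pi
Bj = (np.sin(j * w2) - np.sin(j * w1)) / (np.pi * j)

def response(w):
    # frequency response of the symmetric, truncated filter
    return B0 + 2 * np.sum(Bj * np.cos(j * w))

r_in = response((w1 + w2) / 2)   # near 1 (inside the band)
r_out = response(0.9 * np.pi)    # near 0 (well outside the band)
print(r_in, r_out)
```

The ripples around one and zero are the leakage and compression effects of truncation.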

    Definition 1.23. The periodogram of a stationary process yt at frequency ω is Pey(ω) = (1/2π) Σ_{τ=−(T−1)}^{T−1} ĝ(τ)e^(−iωτ), where ĝ(τ) is the sample autocovariance of yt at lag τ.

    Perhaps surprisingly, the periodogram is an inconsistent estimator of the spectrum (see, for example, Priestley 1981, p. 433). Intuitively, this occurs because it consistently captures the power of yt over a band of frequencies but not in each single one of them. To obtain consistent estimates it is necessary to smooth periodogram estimates with a filter. Such a smoothing filter is typically called a kernel.

    Definition 1.24. A kernel KT(ω) is a sequence of weighting functions such that ∫_{−π}^{π} KT(ω) dω = 1 and KT(ω) → 0 uniformly as T → ∞, for |ω| ≥ ε > 0 and any ε.

    Kernels can be applied to both autocovariance and periodogram estimates. When applied to the periodogram, a kernel produces an estimate of the spectrum at frequency ω by using a weighted average of the values of the periodogram in a neighborhood of ω. Note that this neighborhood shrinks as T → ∞ since, in the limit, KT(ω) looks like a δ-function, i.e., it puts all its mass at one point.

    There are several types of kernel. Those used in this book are the following.

    (1) Box-car (truncated): KT(τ) = 1 for |τ| ≤ J(T), and 0 otherwise.

    (2) Bartlett: KT(τ) = 1 − |τ|/J(T) for |τ| ≤ J(T), and 0 otherwise.

    (3) Parzen: KT(τ) = 1 − 6(τ/J(T))² + 6(|τ|/J(T))³ for |τ| ≤ J(T)/2, KT(τ) = 2(1 − |τ|/J(T))³ for J(T)/2 < |τ| ≤ J(T), and 0 otherwise.

    (4) Quadratic spectral: KT(τ) = (25/(12π²x²))[sin(6πx/5)/(6πx/5) − cos(6πx/5)], where x = τ/J(T).

    Here J(T) is a truncation point, typically chosen to be a function of the sample size T. The quadratic spectral kernel has no truncation point; however, there is a first value of τ at which KQS crosses zero (call it J*(T)), and this point plays the same role as J(T) in the other three kernels.

    The Bartlett kernel and the quadratic spectral kernel are the most popular ones. The Bartlett kernel has the shape of a tent with width 2J(T). To ensure consistency of the spectral estimates, it is standard to select J(T) so that J(T)/T → 0 as T → ∞. In figure 1.4 we have set J(T) = 20. The quadratic spectral kernel has the form of a wave with infinite loops, but after the first crossing, the side loops are small.
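As a minimal sketch of Bartlett smoothing (sample size, truncation point, and all names are ours), the estimator below weights sample autocovariances with the Bartlett kernel; for a white noise the true spectrum is σ²/2π at every frequency:

```python
import numpy as np

# Bartlett-weighted autocovariance estimator of the spectrum of a white noise.
rng = np.random.default_rng(3)
T, J = 20_000, 20
y = rng.normal(0.0, 1.0, T)
y = y - y.mean()

# sample autocovariances up to lag J
g = np.array([y[: T - tau] @ y[tau:] / T for tau in range(J + 1)])
weights = 1 - np.arange(1, J + 1) / J   # Bartlett kernel: 1 - |tau|/J

def spec_hat(w):
    return (g[0] + 2 * np.sum(weights * g[1:] * np.cos(np.arange(1, J + 1) * w))) / (2 * np.pi)

print(spec_hat(0.5), 1 / (2 * np.pi))   # both close to 0.159
```

With J growing slowly relative to T (so J(T)/T → 0), the estimate settles near the flat true spectrum.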

    Exercise 1.32. Show that the estimator of the spectral density matrix of {y1t, y2t} obtained by smoothing the periodogram ordinates Peyiyi′(ω) with a kernel KT, where i, i′ = 1, 2, is consistent.

    Figure 1.4. (a) Bartlett and (b) quadratic spectral kernels.

    While for most of this book we will consider stationary processes, we will deal at times with processes which are only locally stationary (e.g., processes with time-varying coefficients). For these processes, the spectral density is not defined. However, it is possible to define a local spectral density, and practically all the properties we have described also apply to this alternative construction. For details, see Priestley (1981, chapter 11).

    Exercise 1.33. Compute the spectral density of consumption, investment, output, hours, real wage, consumer prices, M1, and the nominal interest rate by using quarterly U.S. data and compute their pairwise coherence with output. Are there any interesting features at business cycle frequencies you would like to emphasize? Repeat the exercise using euro area data. Are there important differences with the United States? (Hint: be careful with potential nonstationarities in the data.)

    ¹ A function h is measurable if the inverse image under h of any Borel set belongs to the relevant σ-algebra.

    ² A stochastic process could also be defined as a sequence of random variables which are jointly measurable (see, for example, Davidson 1994, p. 177).

    ³ A Borel algebra is the smallest collection of subsets of R that contains all the open sets and is closed under complementation and countable union (i.e., the smallest σ-algebra containing the open sets).
