STATA TIME-SERIES REFERENCE MANUAL
RELEASE 14

A Stata Press Publication
StataCorp LP
College Station, Texas

Copyright © 1985–2015 StataCorp LP
All rights reserved
Version 14

Published by Stata Press, 4905 Lakeway Drive, College Station, Texas 77845
Typeset in TeX
ISBN-10: 1-59718-169-2
ISBN-13: 978-1-59718-169-3

This manual is protected by copyright. All rights are reserved. No part of this manual may be reproduced, stored in a retrieval system, or transcribed, in any form or by any means—electronic, mechanical, photocopy, recording, or otherwise—without the prior written permission of StataCorp LP unless permitted subject to the terms and conditions of a license granted to you by StataCorp LP to use the software and documentation. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document.

StataCorp provides this manual "as is" without warranty of any kind, either expressed or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. StataCorp may make improvements and/or changes in the product(s) and the program(s) described in this manual at any time and without notice.

The software described in this manual is furnished under a license agreement or nondisclosure agreement. The software may be copied only in accordance with the terms of the agreement. It is against the law to copy the software onto DVD, CD, disk, diskette, tape, or any other medium for any purpose other than backup or archival purposes.

The automobile dataset appearing on the accompanying media is Copyright © 1979 by Consumers Union of U.S., Inc., Yonkers, NY 10703-1057 and is reproduced by permission from CONSUMER REPORTS, April 1979.

Stata, Stata Press, Mata, and NetCourse are registered trademarks of StataCorp LP. Stata and Stata Press are registered trademarks with the World Intellectual Property Organization of the United Nations. NetCourseNow is a trademark of StataCorp LP. Other brand and product names are registered trademarks or trademarks of their respective companies. For copyright information about the software, type help copyright within Stata.

The suggested citation for this software is
StataCorp. 2015. Stata: Release 14. Statistical Software. College Station, TX: StataCorp LP.

Contents

intro . . . Introduction to time-series manual
time series . . . Introduction to time-series commands
arch . . . Autoregressive conditional heteroskedasticity (ARCH) family of estimators
arch postestimation . . . Postestimation tools for arch
arfima . . . Autoregressive fractionally integrated moving-average models
arfima postestimation . . . Postestimation tools for arfima
arima . . . ARIMA, ARMAX, and other dynamic regression models
arima postestimation . . . Postestimation tools for arima
corrgram . . . Tabulate and graph autocorrelations
cumsp . . . Cumulative spectral distribution
dfactor . . . Dynamic-factor models
dfactor postestimation . . . Postestimation tools for dfactor
dfgls . . . DF-GLS unit-root test
dfuller . . . Augmented Dickey–Fuller unit-root test
estat acplot . . . Plot parametric autocorrelation and autocovariance functions
estat aroots . . . Check the stability condition of ARIMA estimates
estat sbknown . . . Test for a structural break with a known break date
estat sbsingle . . . Test for a structural break with an unknown break date
fcast compute . . . Compute dynamic forecasts after var, svar, or vec
fcast graph . . . Graph forecasts after fcast compute
forecast . . . Econometric model forecasting
forecast adjust . . . Adjust a variable by add factoring, replacing, etc.
forecast clear . . . Clear current model from memory
forecast coefvector . . . Specify an equation via a coefficient vector
forecast create . . . Create a new forecast model
forecast describe . . . Describe features of the forecast model
forecast drop . . . Drop forecast variables
forecast estimates . . . Add estimation results to a forecast model
forecast exogenous . . . Declare exogenous variables
forecast identity . . . Add an identity to a forecast model
forecast list . . . List forecast commands composing current model
forecast query . . . Check whether a forecast model has been started
forecast solve . . . Obtain static and dynamic forecasts
irf . . . Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
irf add . . . Add results from an IRF file to the active IRF file
irf cgraph . . . Combined graphs of IRFs, dynamic-multiplier functions, and FEVDs
irf create . . . Obtain IRFs, dynamic-multiplier functions, and FEVDs
irf ctable . . . Combined tables of IRFs, dynamic-multiplier functions, and FEVDs
irf describe . . . Describe an IRF file
irf drop . . . Drop IRF results from the active IRF file
irf graph . . . Graphs of IRFs, dynamic-multiplier functions, and FEVDs
irf ograph . . . Overlaid graphs of IRFs, dynamic-multiplier functions, and FEVDs
irf rename . . . Rename an IRF result in an IRF file
irf set . . . Set the active IRF file
irf table . . . Tables of IRFs, dynamic-multiplier functions, and FEVDs
mgarch . . . Multivariate GARCH models
mgarch ccc . . . Constant conditional correlation multivariate GARCH models
mgarch ccc postestimation . . . Postestimation tools for mgarch ccc
mgarch dcc . . . Dynamic conditional correlation multivariate GARCH models
mgarch dcc postestimation . . . Postestimation tools for mgarch dcc
mgarch dvech . . . Diagonal vech multivariate GARCH models
mgarch dvech postestimation . . . Postestimation tools for mgarch dvech
mgarch vcc . . . Varying conditional correlation multivariate GARCH models
mgarch vcc postestimation . . . Postestimation tools for mgarch vcc
mswitch . . . Markov-switching regression models
mswitch postestimation . . . Postestimation tools for mswitch
newey . . . Regression with Newey–West standard errors
newey postestimation . . . Postestimation tools for newey
pergram . . . Periodogram
pperron . . . Phillips–Perron unit-root test
prais . . . Prais–Winsten and Cochrane–Orcutt regression
prais postestimation . . . Postestimation tools for prais
psdensity . . . Parametric spectral density estimation after arima, arfima, and ucm
rolling . . . Rolling-window and recursive estimation
sspace . . . State-space models
sspace postestimation . . . Postestimation tools for sspace
tsappend . . . Add observations to a time-series dataset
tsfill . . . Fill in gaps in time variable
tsfilter . . . Filter a time series, keeping only selected periodicities
tsfilter bk . . . Baxter–King time-series filter
tsfilter bw . . . Butterworth time-series filter
tsfilter cf . . . Christiano–Fitzgerald time-series filter
tsfilter hp . . . Hodrick–Prescott time-series filter
tsline . . . Plot time-series data
tsreport . . . Report time-series aspects of a dataset or estimation sample
tsrevar . . . Time-series operator programming command
tsset . . . Declare data to be time-series data
tssmooth . . . Smooth and forecast univariate time-series data
tssmooth dexponential . . . Double-exponential smoothing
tssmooth exponential . . . Single-exponential smoothing
tssmooth hwinters . . . Holt–Winters nonseasonal smoothing
tssmooth ma . . . Moving-average filter
tssmooth nl . . . Nonlinear filter
tssmooth shwinters . . . Holt–Winters seasonal smoothing
ucm . . . Unobserved-components model
ucm postestimation . . . Postestimation tools for ucm
var intro . . . Introduction to vector autoregressive models
var . . . Vector autoregressive models
var postestimation . . . Postestimation tools for var
var svar . . . Structural vector autoregressive models
var svar postestimation . . . Postestimation tools for svar
varbasic . . . Fit a simple VAR and graph IRFs or FEVDs
varbasic postestimation . . . Postestimation tools for varbasic
vargranger . . . Perform pairwise Granger causality tests after var or svar
varlmar . . . Perform LM test for residual autocorrelation after var or svar
varnorm . . . Test for normally distributed disturbances after var or svar
varsoc . . . Obtain lag-order selection statistics for VARs and VECMs
varstable . . . Check the stability condition of VAR or SVAR estimates
varwle . . . Obtain Wald lag-exclusion statistics after var or svar
vec intro . . . Introduction to vector error-correction models
vec . . . Vector error-correction models
vec postestimation . . . Postestimation tools for vec
veclmar . . . Perform LM test for residual autocorrelation after vec
vecnorm . . . Test for normally distributed disturbances after vec
vecrank . . . Estimate the cointegrating rank of a VECM
vecstable . . . Check the stability condition of VECM estimates
wntestb . . . Bartlett's periodogram-based test for white noise
wntestq . . . Portmanteau (Q) test for white noise
xcorr . . . Cross-correlogram for bivariate time series
Glossary
Subject and author index

Cross-referencing the documentation

When reading this manual, you will find references to other Stata manuals. For example,

  [U] 26 Overview of Stata estimation commands
  [R] regress
  [D] reshape

The first example is a reference to chapter 26, Overview of Stata estimation commands, in the User's Guide; the second is a reference to the regress entry in the Base Reference Manual; and the third is a reference to the reshape entry in the Data Management Reference Manual.

All the manuals in the Stata Documentation have a shorthand notation:

  [GSM]    Getting Started with Stata for Mac
  [GSU]    Getting Started with Stata for Unix
  [GSW]    Getting Started with Stata for Windows
  [U]      Stata User's Guide
  [R]      Stata Base Reference Manual
  [BAYES]  Stata Bayesian Analysis Reference Manual
  [D]      Stata Data Management Reference Manual
  [FN]     Stata Functions Reference Manual
  [G]      Stata Graphics Reference Manual
  [IRT]    Stata Item Response Theory Reference Manual
  [XT]     Stata Longitudinal-Data/Panel-Data Reference Manual
  [ME]     Stata Multilevel Mixed-Effects Reference Manual
  [MI]     Stata Multiple-Imputation Reference Manual
  [MV]     Stata Multivariate Statistics Reference Manual
  [PSS]    Stata Power and Sample-Size Reference Manual
  [P]      Stata Programming Reference Manual
  [SEM]    Stata Structural Equation Modeling Reference Manual
  [SVY]    Stata Survey Data Reference Manual
  [ST]     Stata Survival Analysis Reference Manual
  [TS]     Stata Time-Series Reference Manual
  [TE]     Stata Treatment-Effects Reference Manual: Potential Outcomes/Counterfactual Outcomes
  [I]      Stata Glossary and Index
  [M]      Mata Reference Manual

Title
intro — Introduction to time-series manual

Description    Also see

Description

This manual documents Stata's time-series commands and is referred to as [TS] in cross-references.

After this entry, [TS] time series provides an overview of the ts commands. The other parts of this manual are arranged alphabetically.
If you are new to Stata's time-series features, we recommend that you read the following sections first:

  [TS] time series    Introduction to time-series commands
  [TS] tsset          Declare a dataset to be time-series data

Stata is continually being updated, and Stata users are always writing new commands. To ensure that you have the latest features, you should install the most recent official update; see [R] update.

Also see
  [U] 1.3 What's new
  [R] intro — Introduction to base reference manual

Title
time series — Introduction to time-series commands

Description    Remarks and examples    References    Also see

Description

The Time-Series Reference Manual organizes the commands alphabetically, making it easy to find individual command entries if you know the name of the command. This overview organizes and presents the commands conceptually, that is, according to the similarities in the functions that they perform. The tables below list the manual entries that you should see for additional information.

Data management tools and time-series operators. These commands help you prepare your data for further analysis.

Univariate time series. These commands are grouped together because they are either estimators or filters designed for univariate time series or preestimation or postestimation commands that are conceptually related to one or more univariate time-series estimators.

Multivariate time series. These commands are similarly grouped together because they are either estimators designed for use with multivariate time series or preestimation or postestimation commands conceptually related to one or more multivariate time-series estimators.

Forecasting models. These commands work as a group to provide the tools you need to create models by combining estimation results, identities, and other objects and to solve those models to obtain forecasts.

Within these four broad categories, similar commands have been grouped together.
Data management tools and time-series operators
  [TS] tsset         Declare data to be time-series data
  [TS] tsfill        Fill in gaps in time variable
  [TS] tsappend      Add observations to a time-series dataset
  [TS] tsreport      Report time-series aspects of a dataset or estimation sample
  [TS] tsrevar       Time-series operator programming command
  [TS] rolling       Rolling-window and recursive estimation
  [D] datetime business calendars    User-definable business calendars

Univariate time series

Estimators
  [TS] arfima                  Autoregressive fractionally integrated moving-average models
  [TS] arfima postestimation   Postestimation tools for arfima
  [TS] arima                   ARIMA, ARMAX, and other dynamic regression models
  [TS] arima postestimation    Postestimation tools for arima
  [TS] arch                    Autoregressive conditional heteroskedasticity (ARCH) family of estimators
  [TS] arch postestimation     Postestimation tools for arch
  [TS] mswitch                 Markov-switching regression models
  [TS] mswitch postestimation  Postestimation tools for mswitch
  [TS] newey                   Regression with Newey–West standard errors
  [TS] newey postestimation    Postestimation tools for newey
  [TS] prais                   Prais–Winsten and Cochrane–Orcutt regression
  [TS] prais postestimation    Postestimation tools for prais
  [TS] ucm                     Unobserved-components model
  [TS] ucm postestimation      Postestimation tools for ucm

Time-series smoothers and filters
  [TS] tsfilter bk             Baxter–King time-series filter
  [TS] tsfilter bw             Butterworth time-series filter
  [TS] tsfilter cf             Christiano–Fitzgerald time-series filter
  [TS] tsfilter hp             Hodrick–Prescott time-series filter
  [TS] tssmooth ma             Moving-average filter
  [TS] tssmooth dexponential   Double-exponential smoothing
  [TS] tssmooth exponential    Single-exponential smoothing
  [TS] tssmooth hwinters       Holt–Winters nonseasonal smoothing
  [TS] tssmooth shwinters      Holt–Winters seasonal smoothing
  [TS] tssmooth nl             Nonlinear filter

Diagnostic tools
  [TS] corrgram                Tabulate and graph autocorrelations
  [TS] xcorr                   Cross-correlogram for bivariate time series
  [TS] cumsp                   Cumulative spectral distribution
  [TS] pergram                 Periodogram
  [TS] psdensity               Parametric spectral density estimation
  [TS] estat acplot            Plot parametric autocorrelation and autocovariance functions
  [TS] estat aroots            Check the stability condition of ARIMA estimates
  [TS] dfgls                   DF-GLS unit-root test
  [TS] dfuller                 Augmented Dickey–Fuller unit-root test
  [TS] pperron                 Phillips–Perron unit-root test
  [TS] estat sbknown           Test for a structural break with a known break date
  [TS] estat sbsingle          Test for a structural break with an unknown break date
  [R] regress postestimation time series    Postestimation tools for regress with time series
  [TS] mswitch postestimation  Postestimation tools for mswitch
  [TS] wntestb                 Bartlett's periodogram-based test for white noise
  [TS] wntestq                 Portmanteau (Q) test for white noise

Multivariate time series

Estimators
  [TS] dfactor                      Dynamic-factor models
  [TS] dfactor postestimation       Postestimation tools for dfactor
  [TS] mgarch ccc                   Constant conditional correlation multivariate GARCH models
  [TS] mgarch ccc postestimation    Postestimation tools for mgarch ccc
  [TS] mgarch dcc                   Dynamic conditional correlation multivariate GARCH models
  [TS] mgarch dcc postestimation    Postestimation tools for mgarch dcc
  [TS] mgarch dvech                 Diagonal vech multivariate GARCH models
  [TS] mgarch dvech postestimation  Postestimation tools for mgarch dvech
  [TS] mgarch vcc                   Varying conditional correlation multivariate GARCH models
  [TS] mgarch vcc postestimation    Postestimation tools for mgarch vcc
  [TS] sspace                       State-space models
  [TS] sspace postestimation        Postestimation tools for sspace
  [TS] var                          Vector autoregressive models
  [TS] var postestimation           Postestimation tools for var
  [TS] var svar                     Structural vector autoregressive models
  [TS] var svar postestimation      Postestimation tools for svar
  [TS] varbasic                     Fit a simple VAR and graph IRFs or FEVDs
  [TS] varbasic postestimation      Postestimation tools for varbasic
  [TS] vec                          Vector error-correction models
  [TS] vec postestimation           Postestimation tools for vec
Diagnostic tools
  [TS] varlmar      Perform LM test for residual autocorrelation
  [TS] varnorm      Test for normally distributed disturbances
  [TS] varsoc       Obtain lag-order selection statistics for VARs and VECMs
  [TS] varstable    Check the stability condition of VAR or SVAR estimates
  [TS] varwle       Obtain Wald lag-exclusion statistics
  [TS] veclmar      Perform LM test for residual autocorrelation
  [TS] vecnorm      Test for normally distributed disturbances
  [TS] vecrank      Estimate the cointegrating rank of a VECM
  [TS] vecstable    Check the stability condition of VECM estimates

Forecasting, inference, and interpretation
  [TS] irf create       Obtain IRFs, dynamic-multiplier functions, and FEVDs
  [TS] fcast compute    Compute dynamic forecasts after var, svar, or vec
  [TS] vargranger       Perform pairwise Granger causality tests

Graphs and tables
  [TS] corrgram      Tabulate and graph autocorrelations
  [TS] xcorr         Cross-correlogram for bivariate time series
  [TS] pergram       Periodogram
  [TS] irf graph     Graphs of IRFs, dynamic-multiplier functions, and FEVDs
  [TS] irf cgraph    Combined graphs of IRFs, dynamic-multiplier functions, and FEVDs
  [TS] irf ograph    Overlaid graphs of IRFs, dynamic-multiplier functions, and FEVDs
  [TS] irf table     Tables of IRFs, dynamic-multiplier functions, and FEVDs
  [TS] irf ctable    Combined tables of IRFs, dynamic-multiplier functions, and FEVDs
  [TS] fcast graph   Graph forecasts after fcast compute
  [TS] tsline        Plot time-series data
  [TS] varstable     Check the stability condition of VAR or SVAR estimates
  [TS] vecstable     Check the stability condition of VECM estimates
  [TS] wntestb       Bartlett's periodogram-based test for white noise

Results management tools
  [TS] irf add       Add results from an IRF file to the active IRF file
  [TS] irf describe  Describe an IRF file
  [TS] irf drop      Drop IRF results from the active IRF file
  [TS] irf rename    Rename an IRF result in an IRF file
  [TS] irf set       Set the active IRF file

Forecasting models
  [TS] forecast             Econometric model forecasting
  [TS] forecast adjust      Adjust a variable by add factoring, replacing, etc.
  [TS] forecast clear       Clear current model from memory
  [TS] forecast coefvector  Specify an equation via a coefficient vector
  [TS] forecast create      Create a new forecast model
  [TS] forecast describe    Describe features of the forecast model
  [TS] forecast drop        Drop forecast variables
  [TS] forecast estimates   Add estimation results to a forecast model
  [TS] forecast exogenous   Declare exogenous variables
  [TS] forecast identity    Add an identity to a forecast model
  [TS] forecast list        List forecast commands composing current model
  [TS] forecast query       Check whether a forecast model has been started
  [TS] forecast solve       Obtain static and dynamic forecasts
Remarks and examples

Remarks are presented under the following headings:
  Data management tools and time-series operators
  Univariate time series
    Estimators
    Time-series smoothers and filters
    Diagnostic tools
  Multivariate time series
    Estimators
    Diagnostic tools
  Forecasting models

We also offer a NetCourse on Stata's time-series capabilities; see http://www.stata.com/netcourse/nc461.html.

Data management tools and time-series operators

Because time-series estimators are, by definition, a function of the temporal ordering of the observations in the estimation sample, Stata's time-series commands require the data to be sorted and indexed by time, using the tsset command, before they can be used. tsset is simply a way for you to tell Stata which variable in your dataset represents time; tsset then sorts and indexes the data appropriately for use with the time-series commands. Once your dataset has been tsset, you can use Stata's time-series operators in data manipulation or programming using that dataset and when specifying the syntax for most time-series commands. Stata has time-series operators for representing the lags, leads, differences, and seasonal differences of a variable. The time-series operators are documented in [TS] tsset.

You can also define a business-day calendar so that Stata's time-series operators respect the structure of missing observations in your data. The most common example is having Monday come after Friday in market data. [D] datetime business calendars provides a discussion and examples.

tsset can also be used to declare that your dataset contains cross-sectional time-series data, often referred to as panel data. When you use tsset to declare your dataset to contain panel data, you specify a variable that identifies the panels and a variable that identifies the time periods. Once your dataset has been tsset as panel data, the time-series operators work appropriately for the data.

tsfill, which is documented in [TS] tsfill, can be used after tsset to fill in missing times with missing observations. tsset will report any gaps in your data, and tsreport will provide more details about the gaps. tsappend adds observations to a time-series dataset by using the information set by tsset. It can be particularly useful when you wish to predict out of sample after fitting a model with a time-series estimator. tsrevar is a programmer's command that provides a way to use varlists that contain time-series operators with commands that do not otherwise support time-series operators.

rolling performs rolling regressions, recursive regressions, and reverse recursive regressions. Any command that stores results in e() or r() can be used with rolling.
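For instance, the basic workflow looks like this (a minimal sketch; the variables t and y are hypothetical):

  . tsset t                    // declare t as the time variable
  . generate ylag = L.y        // lag operator
  . generate dy = D.y          // first-difference operator: y - L.y
  . regress y L(1/4).y         // an operator list supplies four lags at once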
Univariate time series

Estimators

The seven univariate time-series estimators currently available in Stata are arfima, arima, arch, mswitch, newey, prais, and ucm. newey and prais are really just extensions to ordinary linear regression. When you fit a linear regression on time-series data via ordinary least squares (OLS), if the disturbances are autocorrelated, the parameter estimates are usually consistent, but the estimated standard errors tend to be underestimated. Several estimators have been developed to deal with this problem. One strategy is to use OLS for estimating the regression parameters and use a different estimator for the variances, one that is consistent in the presence of autocorrelated disturbances, such as the Newey–West estimator implemented in newey. Another strategy is to model the dynamics of the disturbances. The estimators found in prais, arima, arch, arfima, and ucm are based on such a strategy.

prais implements two such estimators: the Prais–Winsten and the Cochrane–Orcutt generalized least-squares (GLS) estimators. These estimators are GLS estimators, but they are fairly restrictive in that they permit only first-order autocorrelation in the disturbances. Although they have certain pedagogical and historical value, they are somewhat obsolete. Faster computers with more memory have made it possible to implement full information maximum likelihood (FIML) estimators, such as Stata's arima command. These estimators permit much greater flexibility when modeling the disturbances and are more efficient estimators.

arima provides the means to fit linear models with autoregressive moving-average (ARMA) disturbances, or in the absence of linear predictors, autoregressive integrated moving-average (ARIMA) models. This means that, whether you think that your data are best represented as a distributed-lag model, a transfer-function model, or a stochastic difference equation, or you simply wish to apply a Box–Jenkins filter to your data, the model can be fit using arima.

arch, a conditional maximum likelihood estimator, has similar modeling capabilities for the mean of the time series but can also model autoregressive conditional heteroskedasticity in the disturbances with a wide variety of specifications for the variance equation.

arfima estimates the parameters of autoregressive fractionally integrated moving-average (ARFIMA) models, which handle higher degrees of dependence than ARIMA models. ARFIMA models allow the autocorrelations to decay at a slower, hyperbolic rate, whereas ARIMA models handle processes whose autocorrelations decay at an exponential rate.

Markov-switching models are used for series that are believed to transition over a finite set of unobserved states, allowing the process to evolve differently in each state. The transitions occur according to a Markov process. mswitch estimates the state-dependent parameters of Markov-switching dynamic regression models and Markov-switching autoregression models.

Unobserved-components models (UCMs) decompose a time series into trend, seasonal, cyclical, and idiosyncratic components and allow for exogenous variables. ucm estimates the parameters of UCMs by maximum likelihood. UCMs can also model the stationary cyclical component using the stochastic-cycle parameterization that has an intuitive frequency-domain interpretation.
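For example, a few of these estimators could be called as follows (a hedged sketch; y, x, and the tsset declaration are hypothetical):

  . arima y x, ar(1) ma(1)     // regression with ARMA(1,1) disturbances
  . arima D.y, ar(1) ma(1)     // an ARIMA(1,1,1) model for y
  . newey y x, lag(4)          // OLS with Newey–West standard errors
  . prais y x                  // Prais–Winsten GLS regression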
Time-series smoothers and filters

In addition to the estimators mentioned above, Stata also provides time-series filters and smoothers. The Baxter–King and Christiano–Fitzgerald band-pass filters and the Butterworth and Hodrick–Prescott high-pass filters are implemented in tsfilter; see [TS] tsfilter for an overview.

Also included are a simple, uniformly weighted, moving-average filter with unit weights; a weighted moving-average filter in which you can specify the weights; single- and double-exponential smoothers; Holt–Winters seasonal and nonseasonal smoothers; and a nonlinear smoother. Most of these smoothers were originally developed as ad hoc procedures and are used for reducing the noise in a time series (smoothing) or forecasting. Although they have limited application for signal extraction, these smoothers have all been found to be optimal for some underlying modern time-series models; see [TS] tssmooth.
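For instance (a minimal sketch; lny is a hypothetical logged quarterly series that has been tsset):

  . tsfilter hp y_hp = lny, smooth(1600)     // Hodrick–Prescott cyclical component
  . tsfilter bk y_bk = lny                   // Baxter–King band-pass filter, default periodicities
  . tssmooth ma y_sm = lny, window(2 1 2)    // uniformly weighted five-period moving average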
Diagnostic tools

Stata's time-series commands also include several preestimation and postestimation diagnostic and interpretation commands. corrgram estimates the autocorrelation function and partial autocorrelation function of a univariate time series, as well as Q statistics. These functions and statistics are often used to determine the appropriate model specification before fitting ARIMA models. corrgram can also be used with wntestb and wntestq to examine the residuals after fitting a model for evidence of model misspecification.

Stata's time-series commands also include the commands pergram and cumsp, which provide the log-standardized periodogram and the cumulative-sample spectral distribution, respectively, for time-series analysts who prefer to estimate in the frequency domain rather than the time domain.

psdensity computes the spectral density implied by the parameters estimated by arfima, arima, or ucm. The estimated spectral density shows the relative importance of components at different frequencies. estat acplot computes the autocorrelation and autocovariance functions implied by the parameters estimated by arima. These functions provide a measure of the dependence structure in the time domain.

xcorr estimates the cross-correlogram for bivariate time series and can similarly be used for both preestimation and postestimation. For example, the cross-correlogram can be used before fitting a transfer-function model to produce initial estimates of the IRF. This estimate can then be used to determine the optimal lag length of the input series to include in the model specification. It can also be used as a postestimation tool after fitting a transfer function. The cross-correlogram between the residual from a transfer-function model and the prewhitened input series of the model can be examined for evidence of model misspecification.

When you fit ARMA or ARIMA models, the dependent variable being modeled must be covariance stationary (ARMA models), or the order of integration must be known (ARIMA models). Stata has three commands that can test for the presence of a unit root in a time-series variable: dfuller performs the augmented Dickey–Fuller test, pperron performs the Phillips–Perron test, and dfgls performs a modified Dickey–Fuller test. arfima can also be used to investigate the order of integration. After estimation, you can use estat aroots to check the stationarity of an ARMA process.

After using mswitch to fit a Markov-switching model, two postestimation commands help interpret the results. estat transition reports the transition probabilities and the corresponding standard errors in a tabular form. estat duration computes the expected duration of being in a given state; see [TS] mswitch postestimation.

After fitting a model with regress or ivregress, estat sbknown and estat sbsingle test for structural breaks. estat sbknown tests for breaks at known break dates, and estat sbsingle tests for a break at an unknown break date; see [TS] estat sbknown and [TS] estat sbsingle.

The remaining diagnostic tools for univariate time series are for use after fitting a linear model via OLS with Stata's regress command. They are documented collectively in [R] regress postestimation time series. They include estat dwatson, estat durbinalt, estat bgodfrey, and estat archlm. estat dwatson computes the Durbin–Watson d statistic to test for the presence of first-order autocorrelation in the OLS residuals. estat durbinalt likewise tests for the presence of autocorrelation in the residuals. By comparison, however, Durbin's alternative test is more general and easier to use than the Durbin–Watson test. With estat durbinalt, you can test for higher orders of autocorrelation, the assumption that the covariates in the model are strictly exogenous is relaxed, and there is no need to consult tables to compute rejection regions, as you must with the Durbin–Watson test. estat bgodfrey computes the Breusch–Godfrey test for autocorrelation in the residuals, and although the computations are different, the test in estat bgodfrey is asymptotically equivalent to the test in estat durbinalt. Finally, estat archlm performs Engle's LM test for the presence of autoregressive conditional heteroskedasticity.
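A typical preestimation pass might look like this (a sketch; y is a hypothetical tsset series):

  . corrgram y, lags(20)    // autocorrelations, partial autocorrelations, and Q statistics
  . dfuller y, lags(4)      // augmented Dickey–Fuller unit-root test
  . pperron y               // Phillips–Perron unit-root test
  . wntestq y               // portmanteau (Q) test for white noise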
Multivariate time series

Estimators

Stata provides commands for fitting the most widely applied multivariate time-series models. var and svar fit vector autoregressive and structural vector autoregressive models to stationary data. vec fits cointegrating vector error-correction models. dfactor fits dynamic-factor models. mgarch ccc, mgarch dcc, mgarch dvech, and mgarch vcc fit multivariate GARCH models. sspace fits state-space models. Many linear time-series models, including vector autoregressive moving-average (VARMA) models and structural time-series models, can be cast as state-space models and fit by sspace.

Diagnostic tools

Before fitting a multivariate time-series model, you must specify the number of lags of the dependent variable to include. varsoc produces statistics for determining the order of a VAR or VECM.

Several postestimation commands perform the most common specification analysis on a previously fitted VAR or SVAR. You can use varlmar to check for serial correlation in the residuals, varnorm to test the null hypothesis that the disturbances come from a multivariate normal distribution, and varstable to see if the fitted VAR or SVAR is stable. Two common types of inference about VAR models are whether one variable Granger-causes another and whether a set of lags can be excluded from the model. vargranger reports Wald tests of Granger causation, and varwle reports Wald lag exclusion tests.

Similarly, several postestimation commands perform the most common specification analysis on a previously fitted VECM. You can use veclmar to check for serial correlation in the residuals, vecnorm to test the null hypothesis that the disturbances come from a multivariate normal distribution, and vecstable to analyze the stability of the previously fitted VECM.

VARs and VECMs are often fit to produce baseline forecasts. fcast produces dynamic forecasts from previously fitted VARs and VECMs.

Many researchers fit VARs, SVARs, and VECMs because they want to analyze how unexpected shocks affect the dynamic paths of the variables. Stata has a suite of irf commands for estimating IRFs and interpreting, presenting, and managing these estimates; see [TS] irf.
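A typical VAR workflow (a hedged sketch; the variables and lag choices are illustrative):

  . varsoc dln_inv dln_inc, maxlag(8)    // lag-order selection statistics
  . var dln_inv dln_inc, lags(1/2)       // fit a VAR(2)
  . varstable                            // check the stability condition
  . irf create order1, set(myirfs) step(8)
  . irf graph oirf, impulse(dln_inc) response(dln_inv)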
Forecasting models

Stata provides a set of commands for obtaining forecasts by solving models, collections of equations that jointly determine the outcomes of one or more variables. You use Stata estimation commands such as regress, reg3, var, and vec to fit stochastic equations and store the results using estimates store. Then you create a forecast model using forecast create and use commands, including forecast estimates and forecast identity, to build models consisting of estimation results, nonstochastic relationships (identities), and other model features. Models can be as simple as a single linear regression for which you want to obtain dynamic forecasts, or they can be complicated systems consisting of dozens of estimation results and identities representing a complete macroeconometric model.

The forecast solve command allows you to obtain both static and dynamic forecasts. Confidence intervals for forecasts can be obtained via stochastic simulation incorporating both parameter uncertainty and additive random shocks. By using forecast adjust, you can incorporate outside information and specify different paths for some of the model's variables to obtain forecasts under alternative scenarios.
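A minimal sketch of that workflow (the model name, variables, and identity are hypothetical):

  . regress c income L.c               // a stochastic equation
  . estimates store conseq
  . forecast create mymodel            // start a new forecast model
  . forecast estimates conseq          // add the stored estimation results
  . forecast identity income = c + inv // add a nonstochastic relationship
  . forecast solve                     // obtain the forecasts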
References

Baum, C. F. 2005. Stata: The language of choice for time-series analysis? Stata Journal 5: 46–63.
Becketti, S. 2013. Introduction to Time Series Using Stata. College Station, TX: Stata Press.
Box-Steffensmeier, J. M., J. R. Freeman, M. P. Hitt, and J. C. W. Pevehouse. 2014. Time Series Analysis for the Social Sciences. New York: Cambridge University Press.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Lütkepohl, H. 1993. Introduction to Multiple Time Series Analysis. 2nd ed. New York: Springer.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.
Pickup, M. 2015. Introduction to Time Series Analysis. Thousand Oaks, CA: Sage.
Pisati, M. 2001. sg162: Tools for spatial data analysis. Stata Technical Bulletin 60: 21–37. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 277–298. College Station, TX: Stata Press.
Stock, J. H., and M. W. Watson. 2001. Vector autoregressions. Journal of Economic Perspectives 15: 101–115.

Also see
  [U] 1.3 What's new
  [R] intro — Introduction to base reference manual

Title
arch — Autoregressive conditional heteroskedasticity (ARCH) family of estimators

Description    Quick start    Menu    Syntax    Options    Remarks and examples    Stored results    Methods and formulas    References    Also see

Description

arch fits regression models in which the volatility of a series varies through time. Usually, periods of high and low volatility are grouped together. ARCH models estimate future volatility as a function of prior volatility. To accomplish this, arch fits models of autoregressive conditional heteroskedasticity (ARCH) by using conditional maximum likelihood. In addition to ARCH terms, models may include multiplicative heteroskedasticity. Gaussian (normal), Student's t, and generalized error distributions are supported.

Concerning the regression equation itself, models may also contain ARCH-in-mean and ARMA terms.

Quick start

ARCH model of y with first- and second-order ARCH components and regressor x using tsset data
  . arch y x, arch(1,2)

Add a second-order GARCH component
  . arch y x, arch(1,2) garch(2)

Add an autoregressive component of order 2 and a moving-average component of order 3
  . arch y x, arch(1,2) garch(2) ar(2) ma(3)

As above, but with the conditional variance included in the mean equation
  . arch y x, arch(1,2) garch(2) ar(2) ma(3) archm

EGARCH model of order 2 for y with an autoregressive component of order 1
  . arch y, earch(2) egarch(2) ar(1)

Menu

ARCH/GARCH:             Statistics > Time series > ARCH/GARCH > ARCH and GARCH models
EARCH/EGARCH:           Statistics > Time series > ARCH/GARCH > Nelson's EGARCH model
ABARCH/ATARCH/SDGARCH:  Statistics > Time series > ARCH/GARCH > Threshold ARCH model
ARCH/TARCH/GARCH:       Statistics > Time series > ARCH/GARCH > GJR form of threshold ARCH model
ARCH/SAARCH/GARCH:      Statistics > Time series > ARCH/GARCH > Simple asymmetric ARCH model
PARCH/PGARCH:           Statistics > Time series > ARCH/GARCH > Power ARCH model
NARCH/GARCH:            Statistics > Time series > ARCH/GARCH > Nonlinear ARCH model
NARCHK/GARCH:           Statistics > Time series > ARCH/GARCH > Nonlinear ARCH model with one shift
APARCH/PGARCH:          Statistics > Time series > ARCH/GARCH > Asymmetric power ARCH model
NPARCH/PGARCH:          Statistics > Time series > ARCH/GARCH > Nonlinear power ARCH model

Syntax

  arch depvar [indepvars] [if] [in] [weight] [, options]

options                     Description
Model
  noconstant                suppress constant term
  arch(numlist)             ARCH terms
  garch(numlist)            GARCH terms
  saarch(numlist)           simple asymmetric ARCH terms
  tarch(numlist)            threshold ARCH terms
  aarch(numlist)            asymmetric ARCH terms
  narch(numlist)            nonlinear ARCH terms
  narchk(numlist)           nonlinear ARCH terms with single shift
  abarch(numlist)           absolute value ARCH terms
  atarch(numlist)           absolute threshold ARCH terms
  sdgarch(numlist)          lags of $\sigma_t$
  earch(numlist)            news terms in Nelson's (1991) EGARCH model
  egarch(numlist)           lags of $\ln(\sigma_t^2)$
  parch(numlist)            power ARCH terms
  tparch(numlist)           threshold power ARCH terms
  aparch(numlist)           asymmetric power ARCH terms
  nparch(numlist)           nonlinear power ARCH terms
  nparchk(numlist)          nonlinear power ARCH terms with single shift
  pgarch(numlist)           power GARCH terms
  constraints(constraints)  apply specified linear constraints
  collinear                 keep collinear variables

Model 2
  archm                     include ARCH-in-mean term in the mean-equation specification
  archmlags(numlist)        include specified lags of conditional variance in mean equation
  archmexp(exp)             apply transformation in exp to any ARCH-in-mean terms
  arima(#p,#d,#q)           specify ARIMA(p, d, q) model for dependent variable
  ar(numlist)               autoregressive terms of the structural model disturbance
  ma(numlist)               moving-average terms of the structural model disturbances

Model 3
  distribution(dist [#])    use dist distribution for errors (may be gaussian, normal, t, or ged; default is gaussian)
  het(varlist)              include varlist in the specification of the conditional variance
  savespace                 conserve memory during estimation

Priming
  arch0(xb)                 compute priming values on the basis of the expected unconditional variance; the default
  arch0(xb0)                compute priming values on the basis of the estimated variance of the residuals from OLS
  arch0(xbwt)               compute priming values on the basis of the weighted sum of squares from OLS residuals
  arch0(xb0wt)              compute priming values on the basis of the weighted sum of squares from OLS residuals, with more weight at earlier times
  arch0(zero)               set priming values of ARCH terms to zero
  arch0(#)                  set priming values of ARCH terms to #
  arma0(zero)               set all priming values of ARMA terms to zero; the default
  arma0(p)                  begin estimation after observation p, where p is the maximum AR lag in model
  arma0(q)                  begin estimation after observation q, where q is the maximum MA lag in model
  arma0(pq)                 begin estimation after observation (p + q)
  arma0(#)                  set priming values of ARMA terms to #
  condobs(#)                set conditioning observations at the start of the sample to #

SE/Robust
  vce(vcetype)              vcetype may be opg, robust, or oim

Reporting
  level(#)                  set confidence level; default is level(95)
  detail                    report list of gaps in time series
  nocnsreport               do not display constraints
  display_options           control columns and column formats, row spacing, and line width

Maximization
  maximize_options          control the maximization process; seldom used

  coeflegend                display legend instead of statistics

You must tsset your data before using arch; see [TS] tsset.
depvar and varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, fp, rolling, statsby, and xi are allowed; see [U] 11.1.10 Prefix commands.
iweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.

To fit an ARCH(#m) model with Gaussian errors, type
  . arch depvar ..., arch(1/#m)

To fit a GARCH(#m, #k) model assuming that the errors follow Student's t distribution with 7 degrees of freedom, type
  . arch depvar ..., arch(1/#m) garch(1/#k) distribution(t 7)

You can also fit many other models.
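Options from the different groups can be combined; a hedged sketch (y, x, and z are hypothetical):

  . arch y x, arch(1) garch(1) archm vce(robust)    // GARCH(1,1) with ARCH-in-mean, robust SEs
  . arch y x, arch(1) garch(1) het(z)               // add multiplicative heteroskedasticity in z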
Details of syntax

The basic model arch fits is

$$y_t = \mathbf{x}_t\boldsymbol\beta + \epsilon_t$$
$$\mathrm{Var}(\epsilon_t) = \sigma_t^2 = \gamma_0 + A(\boldsymbol\sigma, \boldsymbol\epsilon) + B(\boldsymbol\sigma, \boldsymbol\epsilon)^2 \qquad (1)$$

The $y_t$ equation may optionally include ARCH-in-mean and ARMA terms:

$$y_t = \mathbf{x}_t\boldsymbol\beta + \sum_i \psi_i\, g(\sigma_{t-i}^2) + \mathrm{ARMA}(p, q) + \epsilon_t$$

If no options are specified, $A() = B() = 0$, and the model collapses to linear regression. The following options add to $A()$ ($\alpha$, $\gamma$, and $\kappa$ represent parameters to be estimated):

Option      Terms added to A()
arch()      $A() = A() + \alpha_{1,1}\epsilon_{t-1}^2 + \alpha_{1,2}\epsilon_{t-2}^2 + \cdots$
garch()     $A() = A() + \alpha_{2,1}\sigma_{t-1}^2 + \alpha_{2,2}\sigma_{t-2}^2 + \cdots$
saarch()    $A() = A() + \alpha_{3,1}\epsilon_{t-1} + \alpha_{3,2}\epsilon_{t-2} + \cdots$
tarch()     $A() = A() + \alpha_{4,1}\epsilon_{t-1}^2(\epsilon_{t-1} > 0) + \alpha_{4,2}\epsilon_{t-2}^2(\epsilon_{t-2} > 0) + \cdots$
aarch()     $A() = A() + \alpha_{5,1}(|\epsilon_{t-1}| + \gamma_{5,1}\epsilon_{t-1})^2 + \alpha_{5,2}(|\epsilon_{t-2}| + \gamma_{5,2}\epsilon_{t-2})^2 + \cdots$
narch()     $A() = A() + \alpha_{6,1}(\epsilon_{t-1} - \kappa_{6,1})^2 + \alpha_{6,2}(\epsilon_{t-2} - \kappa_{6,2})^2 + \cdots$
narchk()    $A() = A() + \alpha_{7,1}(\epsilon_{t-1} - \kappa_7)^2 + \alpha_{7,2}(\epsilon_{t-2} - \kappa_7)^2 + \cdots$

The following options add to $B()$:

Option      Terms added to B()
abarch()    $B() = B() + \alpha_{8,1}|\epsilon_{t-1}| + \alpha_{8,2}|\epsilon_{t-2}| + \cdots$
atarch()    $B() = B() + \alpha_{9,1}|\epsilon_{t-1}|(\epsilon_{t-1} > 0) + \alpha_{9,2}|\epsilon_{t-2}|(\epsilon_{t-2} > 0) + \cdots$
sdgarch()   $B() = B() + \alpha_{10,1}\sigma_{t-1} + \alpha_{10,2}\sigma_{t-2} + \cdots$

Each option requires a numlist argument (see [U] 11.1.8 numlist), which determines the lagged terms included. arch(1) specifies $\alpha_{1,1}\epsilon_{t-1}^2$, arch(2) specifies $\alpha_{1,2}\epsilon_{t-2}^2$, arch(1,2) specifies $\alpha_{1,1}\epsilon_{t-1}^2 + \alpha_{1,2}\epsilon_{t-2}^2$, arch(1/3) specifies $\alpha_{1,1}\epsilon_{t-1}^2 + \alpha_{1,2}\epsilon_{t-2}^2 + \alpha_{1,3}\epsilon_{t-3}^2$, etc.

If the earch() or egarch() option is specified, the basic model fit is

$$y_t = \mathbf{x}_t\boldsymbol\beta + \sum_i \psi_i\, g(\sigma_{t-i}^2) + \mathrm{ARMA}(p, q) + \epsilon_t \qquad (2)$$
$$\ln \mathrm{Var}(\epsilon_t) = \ln \sigma_t^2 = \gamma_0 + C(\ln\boldsymbol\sigma, \mathbf{z}) + A(\boldsymbol\sigma, \boldsymbol\epsilon) + B(\boldsymbol\sigma, \boldsymbol\epsilon)^2$$

where $z_t = \epsilon_t/\sigma_t$. $A()$ and $B()$ are given as above, but $A()$ and $B()$ now add to $\ln\sigma_t^2$ rather than $\sigma_t^2$. (The options corresponding to $A()$ and $B()$ are rarely specified here.) $C()$ is given by

Option      Terms added to C()
earch()     $C() = C() + \alpha_{11,1} z_{t-1} + \gamma_{11,1}(|z_{t-1}| - \sqrt{2/\pi}) + \alpha_{11,2} z_{t-2} + \gamma_{11,2}(|z_{t-2}| - \sqrt{2/\pi}) + \cdots$
egarch()    $C() = C() + \alpha_{12,1}\ln\sigma_{t-1}^2 + \alpha_{12,2}\ln\sigma_{t-2}^2 + \cdots$

Instead, if the parch(), tparch(), aparch(), nparch(), nparchk(), or pgarch() options are specified, the basic model fit is

$$y_t = \mathbf{x}_t\boldsymbol\beta + \sum_i \psi_i\, g(\sigma_{t-i}^2) + \mathrm{ARMA}(p, q) + \epsilon_t \qquad (3)$$
$$\{\mathrm{Var}(\epsilon_t)\}^{\varphi/2} = \sigma_t^\varphi = \gamma_0 + D(\boldsymbol\sigma, \boldsymbol\epsilon) + A(\boldsymbol\sigma, \boldsymbol\epsilon) + B(\boldsymbol\sigma, \boldsymbol\epsilon)^2$$

where $\varphi$ is a parameter to be estimated. $A()$ and $B()$ are given as above, but $A()$ and $B()$ now add to $\sigma_t^\varphi$. (The options corresponding to $A()$ and $B()$ are rarely specified here.) $D()$ is given by

Option      Terms added to D()
parch()     $D() = D() + \alpha_{13,1}\epsilon_{t-1}^\varphi + \alpha_{13,2}\epsilon_{t-2}^\varphi + \cdots$
tparch()    $D() = D() + \alpha_{14,1}\epsilon_{t-1}^\varphi(\epsilon_{t-1} > 0) + \alpha_{14,2}\epsilon_{t-2}^\varphi(\epsilon_{t-2} > 0) + \cdots$
aparch()    $D() = D() + \alpha_{15,1}(|\epsilon_{t-1}| + \gamma_{15,1}\epsilon_{t-1})^\varphi + \alpha_{15,2}(|\epsilon_{t-2}| + \gamma_{15,2}\epsilon_{t-2})^\varphi + \cdots$
nparch()    $D() = D() + \alpha_{16,1}|\epsilon_{t-1} - \kappa_{16,1}|^\varphi + \alpha_{16,2}|\epsilon_{t-2} - \kappa_{16,2}|^\varphi + \cdots$
nparchk()   $D() = D() + \alpha_{17,1}|\epsilon_{t-1} - \kappa_{17}|^\varphi + \alpha_{17,2}|\epsilon_{t-2} - \kappa_{17}|^\varphi + \cdots$
pgarch()    $D() = D() + \alpha_{18,1}\sigma_{t-1}^\varphi + \alpha_{18,2}\sigma_{t-2}^\varphi + \cdots$

Common models

Common term                                                          Options to specify
ARCH (Engle 1982)                                                    arch()
GARCH (Bollerslev 1986)                                              arch() garch()
ARCH-in-mean (Engle, Lilien, and Robins 1987)                        archm arch() [garch()]
GARCH with ARMA terms                                                arch() garch() ar() ma()
EGARCH (Nelson 1991)                                                 earch() egarch()
TARCH, threshold ARCH (Zakoian 1994)                                 abarch() atarch() sdgarch()
GJR, form of threshold ARCH (Glosten, Jagannathan, and Runkle 1993)  arch() tarch() [garch()]
SAARCH, simple asymmetric ARCH (Engle 1990)                          arch() saarch() [garch()]
PARCH, power ARCH (Higgins and Bera 1992)                            parch() [pgarch()]
NARCH, nonlinear ARCH                                                narch() [garch()]
NARCHK, nonlinear ARCH with one shift                                narchk() [garch()]
A-PARCH, asymmetric power ARCH (Ding, Granger, and Engle 1993)       aparch() [pgarch()]
NPARCH, nonlinear power ARCH                                         nparch() [pgarch()]

In all cases, you type

  . arch depvar [indepvars], options

where options are chosen from the table above. Each option requires that you specify as its argument a numlist that specifies the lags to be included. For most ARCH models, that value will be 1. For instance, to fit the classic first-order GARCH model on cpi, you would type

  . arch cpi, arch(1) garch(1)

If you wanted to fit a first-order GARCH model of cpi on wage, you would type

  . arch cpi wage, arch(1) garch(1)

If, for any of the options, you want first- and second-order terms, specify optionname(1/2). Specifying garch(1) arch(1/2) would fit a GARCH model with first- and second-order ARCH terms. If you specified arch(2), only the lag 2 term would be included.
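As a worked instance of this mapping (y is a hypothetical dependent variable), the Common models table together with (2) says that typing

  . arch y, earch(1) egarch(1)

fits Nelson's EGARCH(1,1), whose conditional-variance equation is

$$\ln\sigma_t^2 = \gamma_0 + \alpha_{11,1} z_{t-1} + \gamma_{11,1}(|z_{t-1}| - \sqrt{2/\pi}) + \alpha_{12,1}\ln\sigma_{t-1}^2$$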
Reading arch output

The regression table reported by arch when using the normal distribution for the errors will appear as

  op.depvar    Coef.  Std. Err.    z   P>|z|   [95% Conf. Interval]
  -----------------------------------------------------------------
  depvar
    x1             #    ...
    x2
      L1.          #    ...
      L2.          #    ...
    _cons          #    ...
  -----------------------------------------------------------------
  ARCHM
    sigma2         #    ...
  -----------------------------------------------------------------
  ARMA
    ar
      L1.          #    ...
    ma
      L1.          #    ...
  -----------------------------------------------------------------
  HET
    z1             #    ...
    z2
      L1.          #    ...
      L2.          #    ...
  -----------------------------------------------------------------
  ARCH
    arch
      L1.          #    ...
    garch
      L1.          #    ...
    aparch
      L1.          #    ...
    etc.
    _cons          #    ...
  -----------------------------------------------------------------
  POWER
    power          #    ...
  -----------------------------------------------------------------

Dividing lines separate "equations". The first one, two, or three equations report the mean model:

$$y_t = \mathbf{x}_t\boldsymbol\beta + \sum_i \psi_i\, g(\sigma_{t-i}^2) + \mathrm{ARMA}(p, q) + \epsilon_t$$

The first equation reports $\boldsymbol\beta$, and the equation will be named [depvar]; if you fit a model on d.cpi, the first equation would be named [cpi]. In Stata, the coefficient on x1 in the above example could be referred to as [depvar]_b[x1]. The coefficient on the lag 2 value of x2 would be referred to as [depvar]_b[L2.x2]. Such notation would be used, for instance, in a later test command; see [R] test.

The [ARCHM] equation reports the $\psi$ coefficients if your model includes ARCH-in-mean terms; see options discussed under the Model 2 tab below. Most ARCH-in-mean models include only a contemporaneous variance term, so the term $\sum_i \psi_i\, g(\sigma_{t-i}^2)$ becomes $\psi\sigma_t^2$. The coefficient $\psi$ will be [ARCHM]_b[sigma2]. If your model includes lags of $\sigma_t^2$, the additional coefficients will be [ARCHM]_b[L1.sigma2], and so on. If you specify a transformation g() (option archmexp()), the coefficients will be [ARCHM]_b[sigma2ex], [ARCHM]_b[L1.sigma2ex], and so on. sigma2ex refers to $g(\sigma_t^2)$, the transformed value of the conditional variance.

The [ARMA] equation reports the ARMA coefficients if your model includes them; see options discussed under the Model 2 tab below. This equation includes one or two "variables" named ar and ma. In later test statements, you could refer to the coefficient on the first lag of the autoregressive term by typing [ARMA]_b[L1.ar] or simply [ARMA]_b[L.ar] (the L operator is assumed to be lag 1 if you do not specify otherwise). The second lag on the moving-average term, if there were one, could be referred to by typing [ARMA]_b[L2.ma].

The next one, two, or three equations report the variance model.

The [HET] equation reports the multiplicative heteroskedasticity if the model includes it. When you fit such a model, you specify the variables (and their lags), determining the multiplicative heteroskedasticity; after estimation, their coefficients are simply [HET]_b[op.varname].

The [ARCH] equation reports the ARCH, GARCH, etc., terms by referring to "variables" arch, garch, and so on. For instance, if you specified arch(1) garch(1) when you fit the model, the conditional variance is given by $\sigma_t^2 = \gamma_0 + \alpha_{1,1}\epsilon_{t-1}^2 + \alpha_{2,1}\sigma_{t-1}^2$. The coefficients would be named [ARCH]_b[_cons] ($\gamma_0$), [ARCH]_b[L.arch] ($\alpha_{1,1}$), and [ARCH]_b[L.garch] ($\alpha_{2,1}$).

The [POWER] equation appears only if you are fitting a variance model in the form of (3) above; the estimated $\varphi$ is the coefficient [POWER]_b[power].

Also, if you use the distribution() option and specify either Student's t or the generalized error distribution but do not specify the degree-of-freedom or shape parameter, then you will see two additional rows in the table. The final row contains the estimated degree-of-freedom or shape parameter. Immediately preceding the final row is a transformed version of the parameter that arch used during estimation to ensure that the degree-of-freedom parameter is greater than two or that the shape parameter is positive.
The naming convention for estimated ARCH, GARCH, etc., parameters is as follows (definitions for parameters α_i, γ_i, and κ_i can be found in the tables for A(), B(), C(), and D() above):

    Option     1st parameter               2nd parameter               Common parameter
    ---------------------------------------------------------------------------------
    arch()     α₁  = [ARCH]_b[arch]
    garch()    α₂  = [ARCH]_b[garch]
    saarch()   α₃  = [ARCH]_b[saarch]
    tarch()    α₄  = [ARCH]_b[tarch]
    aarch()    α₅  = [ARCH]_b[aarch]    γ₅  = [ARCH]_b[aarch_e]
    narch()    α₆  = [ARCH]_b[narch]    κ₆  = [ARCH]_b[narch_k]
    narchk()   α₇  = [ARCH]_b[narch]    κ₇  = [ARCH]_b[narch_k]

    abarch()   α₈  = [ARCH]_b[abarch]
    atarch()   α₉  = [ARCH]_b[atarch]
    sdgarch()  α₁₀ = [ARCH]_b[sdgarch]

    earch()    α₁₁ = [ARCH]_b[earch]    γ₁₁ = [ARCH]_b[earch_a]
    egarch()   α₁₂ = [ARCH]_b[egarch]

    parch()    α₁₃ = [ARCH]_b[parch]                                φ = [POWER]_b[power]
    tparch()   α₁₄ = [ARCH]_b[tparch]                               φ = [POWER]_b[power]
    aparch()   α₁₅ = [ARCH]_b[aparch]   γ₁₅ = [ARCH]_b[aparch_e]    φ = [POWER]_b[power]
    nparch()   α₁₆ = [ARCH]_b[nparch]   κ₁₆ = [ARCH]_b[nparch_k]    φ = [POWER]_b[power]
    nparchk()  α₁₇ = [ARCH]_b[nparch]   κ₁₇ = [ARCH]_b[nparch_k]    φ = [POWER]_b[power]
    pgarch()   α₁₈ = [ARCH]_b[pgarch]                               φ = [POWER]_b[power]
    ---------------------------------------------------------------------------------

Options

Model

noconstant; see [R] estimation options.

arch(numlist) specifies the ARCH terms (lags of ε_t²). Specify arch(1) to include first-order terms, arch(1/2) to specify first- and second-order terms, arch(1/3) to specify first-, second-, and third-order terms, etc. Terms may be omitted. Specify arch(1/3 5) to specify terms with lags 1, 2, 3, and 5. All the options work this way.

arch() may not be specified with aarch(), narch(), narchk(), nparchk(), or nparch(), as this would result in collinear terms.

garch(numlist) specifies the GARCH terms (lags of σ_t²).

saarch(numlist) specifies the simple asymmetric ARCH terms. Adding these terms is one way to make the standard ARCH and GARCH models respond asymmetrically to positive and negative innovations. Specifying saarch() with arch() and garch() corresponds to the SAARCH model of Engle (1990).

saarch() may not be specified with narch(), narchk(), nparchk(), or nparch(), as this would result in collinear terms.

tarch(numlist) specifies the threshold ARCH terms. Adding these is another way to make the standard ARCH and GARCH models respond asymmetrically to positive and negative innovations. Specifying tarch() with arch() and garch() corresponds to one form of the GJR model (Glosten, Jagannathan, and Runkle 1993).

tarch() may not be specified with tparch() or aarch(), as this would result in collinear terms.

aarch(numlist) specifies the lags of the two-parameter term α_i(|ε_t| + γ_i ε_t)². This term provides the same underlying form of asymmetry as including arch() and tarch(), but it is expressed in a different way.

aarch() may not be specified with arch() or tarch(), as this would result in collinear terms.

narch(numlist) specifies the lags of the two-parameter term α_i(ε_t − κ_i)². This term allows the minimum conditional variance to occur at a value of lagged innovations other than zero. For any term specified at lag L, the minimum contribution to conditional variance of that lag occurs when ε²_{t-L} = κ_L — the squared innovations at that lag are equal to the estimated constant κ_L.

narch() may not be specified with arch(), saarch(), narchk(), nparchk(), or nparch(), as this would result in collinear terms.
narchk(numlist) specifies the lags of the two-parameter term α_i(ε_t − κ)²; this is a variation of narch() with κ held constant for all lags.

narchk() may not be specified with arch(), saarch(), narch(), nparchk(), or nparch(), as this would result in collinear terms.

abarch(numlist) specifies lags of the term |ε_t|.

atarch(numlist) specifies lags of |ε_t|(ε_t > 0), where (ε_t > 0) represents the indicator function returning 1 when true and 0 when false. Like the TARCH terms, these ATARCH terms allow the effect of unanticipated innovations to be asymmetric about zero.

sdgarch(numlist) specifies lags of σ_t. Combining atarch(), abarch(), and sdgarch() produces the model by Zakoian (1994) that the author called the TARCH model. The acronym TARCH, however, refers to any model using thresholding to obtain asymmetry.

earch(numlist) specifies lags of the two-parameter term α z_t + γ(|z_t| − √(2/π)). These terms represent the influence of news—lagged innovations—in Nelson's (1991) EGARCH model. For these terms, z_t = ε_t/σ_t, and arch assumes z_t ~ N(0, 1). Nelson derived the general form of an EGARCH model for any assumed distribution and performed estimation assuming a generalized error distribution (GED). See Hamilton (1994) for a derivation where z_t is assumed normal. The z_t terms can be parameterized in either of these two equivalent ways. arch uses Nelson's original parameterization; see Hamilton (1994) for an equivalent alternative.

egarch(numlist) specifies lags of ln(σ_t²).

For the following options, the model is parameterized in terms of h(ε_t)^φ and σ_t^φ. One φ is estimated, even when more than one option is specified.

parch(numlist) specifies lags of |ε_t|^φ. parch() combined with pgarch() corresponds to the class of nonlinear models of conditional variance suggested by Higgins and Bera (1992).

tparch(numlist) specifies lags of (ε_t > 0)|ε_t|^φ, where (ε_t > 0) represents the indicator function returning 1 when true and 0 when false. As with tarch(), tparch() specifies terms that allow for a differential impact of "good" (positive innovations) and "bad" (negative innovations) news for lags specified by numlist.

tparch() may not be specified with tarch(), as this would result in collinear terms.

aparch(numlist) specifies lags of the two-parameter term α(|ε_t| + γ ε_t)^φ. This asymmetric power ARCH model, A-PARCH, was proposed by Ding, Granger, and Engle (1993) and corresponds to a Box–Cox function in the lagged innovations. The authors fit the original A-PARCH model on more than 16,000 daily observations of the Standard and Poor's 500, and for good reason. As the number of parameters and the flexibility of the specification increase, more data are required to estimate the parameters of the conditional heteroskedasticity. See Ding, Granger, and Engle (1993) for a discussion of how seven popular ARCH models nest within the A-PARCH model.

When γ goes to 1, the full term goes to zero for many observations and can then be numerically unstable.

nparch(numlist) specifies lags of the two-parameter term α|ε_t − κ_i|^φ.

nparch() may not be specified with arch(), saarch(), narch(), narchk(), or nparchk(), as this would result in collinear terms.

nparchk(numlist) specifies lags of the two-parameter term α|ε_t − κ|^φ; this is a variation of nparch() with κ held constant for all lags. This is the direct analog of narchk(), except for the power of φ.
nparchk() corresponds to an extended form of the model of Higgins and Bera (1992) as presented by Bollerslev, Engle, and Nelson (1994). nparchk() would typically be combined with the pgarch() option.

nparchk() may not be specified with arch(), saarch(), narch(), narchk(), or nparch(), as this would result in collinear terms.

pgarch(numlist) specifies lags of σ_t^φ.

constraints(constraints), collinear; see [R] estimation options.

Model 2

archm specifies that an ARCH-in-mean term be included in the specification of the mean equation. This term allows the expected value of depvar to depend on the conditional variance. ARCH-in-mean is most commonly used in evaluating financial time series when a theory supports a tradeoff between asset risk and return. By default, no ARCH-in-mean terms are included in the model.

archm specifies that the contemporaneous expected conditional variance be included in the mean equation. For example, typing

    . arch y x, archm arch(1)

specifies the model

    y_t = β₀ + β₁ x_t + ψ σ_t² + ε_t
    σ_t² = γ₀ + γ ε²_{t-1}

archmlags(numlist) is an expansion of archm that includes lags of the conditional variance σ_t² in the mean equation. To specify a contemporaneous and once-lagged variance, specify either archm archmlags(1) or archmlags(0/1).

archmexp(exp) applies the transformation in exp to any ARCH-in-mean terms in the model. The expression should contain an X wherever a value of the conditional variance is to enter the expression. This option can be used to produce the commonly used ARCH-in-mean of the conditional standard deviation. With the example from archm, typing

    . arch y x, archm arch(1) archmexp(sqrt(X))

specifies the mean equation y_t = β₀ + β₁ x_t + ψ σ_t + ε_t. Alternatively, typing

    . arch y x, archm arch(1) archmexp(1/sqrt(X))

specifies y_t = β₀ + β₁ x_t + ψ/σ_t + ε_t.

arima(#p,#d,#q) is an alternative, shorthand notation for specifying autoregressive models in the dependent variable. The dependent variable and any independent variables are differenced #d times, 1 through #p lags of autocorrelations are included, and 1 through #q lags of moving averages are included. For example, the specification

    . arch y, arima(2,1,3)

is equivalent to

    . arch D.y, ar(1/2) ma(1/3)

The former is easier to write for classic ARIMA models of the mean equation, but it is not nearly as expressive as the latter. If gaps in the AR or MA lags are to be modeled, or if different operators are to be applied to independent variables, the latter syntax is required.

ar(numlist) specifies the autoregressive terms of the structural model disturbance to be included in the model. For example, ar(1/3) specifies that lags 1, 2, and 3 of the structural disturbance be included in the model; ar(1 4) specifies that lags 1 and 4 be included, possibly to account for quarterly effects. If the model does not contain regressors, these terms can also be considered autoregressive terms for the dependent variable; see [TS] arima.

ma(numlist) specifies the moving-average terms to be included in the model. These are the terms for the lagged innovations or white-noise disturbances.
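As a sketch of the more expressive ar()/ma() syntax, a mean equation with an AR(1) disturbance and MA terms at lags 1 and 4 (the latter to absorb quarterly effects) could be requested as in example 2 below:

    . arch D.ln_wpi, ar(1) ma(1 4) arch(1) garch(1)

The arima() shorthand could not express the gap between the two MA lags.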
Model 3

distribution(dist [#]) specifies the distribution to assume for the error term. dist may be gaussian, normal, t, or ged. gaussian and normal are synonyms, and # cannot be specified with them.

If distribution(t) is specified, arch assumes that the errors follow Student's t distribution, and the degree-of-freedom parameter is estimated along with the other parameters of the model. If distribution(t #) is specified, then arch uses Student's t distribution with # degrees of freedom. # must be greater than 2.

If distribution(ged) is specified, arch assumes that the errors have a generalized error distribution, and the shape parameter is estimated along with the other parameters of the model. If distribution(ged #) is specified, then arch uses the generalized error distribution with shape parameter #. # must be positive. The generalized error distribution is identical to the normal distribution when the shape parameter equals 2.

het(varlist) specifies that varlist be included in the specification of the conditional variance. varlist may contain time-series operators. This varlist enters the variance specification collectively as multiplicative heteroskedasticity; see Judge et al. (1985). If het() is not specified, the model will not contain multiplicative heteroskedasticity.

Assume that the conditional variance depends on variables x and w and has an ARCH(1) component. We request this specification by using the het(x w) arch(1) options, and this corresponds to the conditional-variance model

    σ_t² = exp(λ₀ + λ₁ x_t + λ₂ w_t) + α ε²_{t-1}

Multiplicative heteroskedasticity enters differently with an EGARCH model because the variance is already specified in logs. For the het(x w) earch(1) egarch(1) options, the variance model is

    ln(σ_t²) = λ₀ + λ₁ x_t + λ₂ w_t + α z_{t-1} + γ(|z_{t-1}| − √(2/π)) + δ ln(σ²_{t-1})

savespace conserves memory by retaining only those variables required for estimation. The original dataset is restored after estimation. This option is rarely used and should be specified only if there is insufficient memory to fit a model without the option. arch requires considerably more temporary storage during estimation than most estimation commands in Stata.

Priming

arch0(cond_method) is a rarely used option that specifies how to compute the conditioning (presample or priming) values for σ_t² and ε_t². In the presample period, it is assumed that σ_t² = ε_t² and that this value is constant. If arch0() is not specified, the priming values are computed as the expected unconditional variance given the current estimates of the β coefficients and any ARMA parameters.

arch0(xb), the default, specifies that the priming values are the expected unconditional variance of the model, which is Σ_{t=1}^{T} ε̂_t²/T, where ε̂_t is computed from the mean equation and any ARMA terms.

arch0(xb0) specifies that the priming values are the estimated variance of the residuals from an OLS estimate of the mean equation.

arch0(xbwt) specifies that the priming values are the weighted sum of the ε̂_t² from the current conditional mean equation (and ARMA terms) that places more weight on estimates of ε_t² at the beginning of the sample.

arch0(xb0wt) specifies that the priming values are the weighted sum of the ε̂_t² from an OLS estimate of the mean equation (and ARMA terms) that places more weight on estimates of ε_t² at the beginning of the sample.

arch0(zero) specifies that the priming values are 0. Unlike the priming values for ARIMA models, 0 is generally not a consistent estimate of the presample conditional variance or squared innovations.

arch0(#) specifies that σ_t² = ε_t² = # for any specified nonnegative #. Thus arch0(0) is equivalent to arch0(zero).
arma0(cond_method) is a rarely used option that specifies how the ε_t values are initialized at the beginning of the sample for the ARMA component, if the model has one. This option has an effect only when AR or MA terms are included in the model (the ar(), ma(), or arima() options specified).

arma0(zero), the default, specifies that all priming values of ε_t be taken as 0. This fits the model over the entire requested sample and takes ε_t as its expected value of 0 for all lags required by the ARMA terms; see Judge et al. (1985).

arma0(p), arma0(q), and arma0(pq) specify that estimation begin after priming the recursions for a certain number of observations. p specifies that estimation begin after the pth observation in the sample, where p is the maximum AR lag in the model; q specifies that estimation begin after the qth observation in the sample, where q is the maximum MA lag in the model; and pq specifies that estimation begin after the (p + q)th observation in the sample.

During the priming period, the recursions necessary to generate predicted disturbances are performed, but results are used only to initialize preestimation values of ε_t. To understand the definition of preestimation, say that you fit a model in 10/100. If the model is specified with ar(1/2), preestimation refers to observations 10 and 11.

The ARCH terms σ_t² and ε_t² are also updated over these observations. Any required lags of ε_t before the priming period are taken to be their expected value of 0, and ε_t² and σ_t² take the values specified in arch0().

arma0(#) specifies that the presample values of ε_t are to be taken as # for all lags required by the ARMA terms. Thus arma0(0) is equivalent to arma0(zero).

condobs(#) is a rarely used option that specifies a fixed number of conditioning observations at the start of the sample. Over these priming observations, the recursions necessary to generate predicted disturbances are performed, but only to initialize preestimation values of ε_t, ε_t², and σ_t². Any required lags of ε_t before the initialization period are taken to be their expected value of 0 (or the value specified in arma0()), and required values of ε_t² and σ_t² assume the values specified by arch0(). condobs() can be used if conditioning observations are desired for the lags in the ARCH terms of the model. If arma0() is also specified, the maximum number of conditioning observations required by arma0() and condobs(#) is used.
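These priming options rarely need to be changed, but as a minimal sketch (model and data from the examples below; the particular choices are illustrative only):

    . arch D.ln_wpi, ar(1) arch(1) garch(1) arch0(xb0) arma0(p)

This primes the ARCH recursions with the OLS residual variance and begins estimation after the first observation, the maximum AR lag in this model.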
SE/Robust

vce(vcetype) specifies the type of standard error reported, which includes types that are robust to some kinds of misspecification (robust) and that are derived from asymptotic theory (oim, opg); see [R] vce_option.

For ARCH models, the robust or quasi–maximum likelihood estimates (QMLE) of variance are robust to symmetric nonnormality in the disturbances. The robust variance estimates generally are not robust to functional misspecification of the mean equation; see Bollerslev and Wooldridge (1992).

The robust variance estimates computed by arch are based on the full Huber/White/sandwich formulation, as discussed in [P] _robust. Many other software packages report robust estimates that set some terms to their expectations of zero (Bollerslev and Wooldridge 1992), which saves them from calculating second derivatives of the log-likelihood function.

Reporting

level(#); see [R] estimation options.

detail specifies that a detailed list of any gaps in the series be reported, including gaps due to missing observations or missing data for the dependent variable or independent variables.

nocnsreport; see [R] estimation options.

display_options: noci, nopvalues, vsquish, cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.

Maximization

maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace, gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#), gtolerance(#), nrtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize for all options except gtolerance(), and see below for information on gtolerance().

These options are often more important for ARCH models than for other maximum likelihood models because of convergence problems associated with ARCH models—ARCH model likelihoods are notoriously difficult to maximize.

Setting technique() to something other than the default or BHHH changes the vcetype to vce(oim).

The following options are all related to maximization and are either particularly important in fitting ARCH models or not available for most other estimators.

gtolerance(#) specifies the tolerance for the gradient relative to the coefficients. When |g_i b_i| ≤ gtolerance() for all parameters b_i and the corresponding elements of the gradient g_i, the gradient tolerance criterion is met. The default gradient tolerance for arch is gtolerance(.05).

gtolerance(999) may be specified to disable the gradient criterion. If the optimizer becomes stuck with repeated "(backed up)" messages, the gradient probably still contains substantial values, but an uphill direction cannot be found for the likelihood. With this option, results can often be obtained, but whether the global maximum likelihood has been found is unclear.

When the maximization is not going well, it is also possible to set the maximum number of iterations (see [R] maximize) to the point where the optimizer appears to be stuck and to inspect the estimation results at that point.

from(init_specs) specifies the initial values of the coefficients. ARCH models may be sensitive to initial values and may have coefficient values that correspond to local maximums. The default starting values are obtained via a series of regressions, producing results that, on the basis of asymptotic theory, are consistent for the β and ARMA parameters and generally reasonable for the rest. Nevertheless, these values may not always be feasible in that the likelihood function cannot be evaluated at the initial values arch first chooses. In such cases, the estimation is restarted with ARCH and ARMA parameters initialized to zero. It is possible, but unlikely, that even these values will be infeasible and that you will have to supply initial values yourself.

The standard syntax for from() accepts a matrix, a list of values, or coefficient name value pairs; see [R] maximize. arch also allows the following:

from(archb0) sets the starting value for all the ARCH/GARCH/. . . parameters in the conditional-variance equation to 0.

from(armab0) sets the starting value for all ARMA parameters in the model to 0.

from(archb0 armab0) sets the starting value for all ARCH/GARCH/. . . and ARMA parameters to 0.

The following option is available with arch but is not shown in the dialog box:

coeflegend; see [R] estimation options.
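Putting several of these options together, a sketch of a fit that requests QMLE standard errors and zeroed-out starting values for the difficult parts of the parameter vector (data from the examples below; the combination is illustrative only):

    . arch D.ln_wpi, ar(1) ma(1 4) arch(1) garch(1) vce(robust) from(archb0 armab0) difficult

All the options shown are documented above.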
Remarks and examples

The volatility of a series is not constant through time; periods of relatively low volatility and periods of relatively high volatility tend to be grouped together. This is a commonly observed characteristic of economic time series and is even more pronounced in many frequently sampled financial series. ARCH models seek to estimate this time-dependent volatility as a function of observed prior volatility. Sometimes the model of volatility is of more interest than the model of the conditional mean. As implemented in arch, the volatility model may also include regressors to account for a structural component in the volatility—usually referred to as multiplicative heteroskedasticity.

ARCH models were introduced by Engle (1982) in a study of inflation rates, and there has since been a barrage of proposed parametric and nonparametric specifications of autoregressive conditional heteroskedasticity. Overviews of the literature can be found in Bollerslev, Engle, and Nelson (1994) and Bollerslev, Chou, and Kroner (1992). Introductions to basic ARCH models appear in many general econometrics texts, including Davidson and MacKinnon (1993), Greene (2012), Kmenta (1997), Stock and Watson (2011), and Wooldridge (2013). Harvey (1989) and Enders (2004) provide introductions to ARCH in the larger context of econometric time-series modeling, and Hamilton (1994) gives considerably more detail in the same context. Becketti (2013, chap. 8) provides a simple introduction to ARCH modeling with an emphasis on how to use Stata's arch command.

arch fits models of autoregressive conditional heteroskedasticity (ARCH, GARCH, etc.) using conditional maximum likelihood. By "conditional", we mean that the likelihood is computed based on an assumed or estimated set of priming values for the squared innovations ε_t² and variances σ_t² prior to the estimation sample; see Hamilton (1994) or Bollerslev (1986). Sometimes more conditioning is done on the first a, g, or a + g observations in the sample, where a is the maximum ARCH term lag and g is the maximum GARCH term lag (or the maximum lags from the other ARCH family terms).

The original ARCH model proposed by Engle (1982) modeled the variance of a regression model's disturbances as a linear function of lagged values of the squared regression disturbances. We can write an ARCH(m) model as

    y_t = x_t β + ε_t                                              (conditional mean)
    σ_t² = γ₀ + γ₁ ε²_{t-1} + γ₂ ε²_{t-2} + ··· + γ_m ε²_{t-m}     (conditional variance)

where

    ε_t² is the squared residuals (or innovations)
    γ_i are the ARCH parameters

The ARCH model has a specification for both the conditional mean and the conditional variance, and the variance is a function of the size of prior unanticipated innovations—ε_t². This model was generalized by Bollerslev (1986) to include lagged values of the conditional variance—a GARCH model. The GARCH(m, k) model is written as

    y_t = x_t β + ε_t
    σ_t² = γ₀ + γ₁ ε²_{t-1} + γ₂ ε²_{t-2} + ··· + γ_m ε²_{t-m}
              + δ₁ σ²_{t-1} + δ₂ σ²_{t-2} + ··· + δ_k σ²_{t-k}

where

    γ_i are the ARCH parameters
    δ_i are the GARCH parameters
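To make the recursion concrete, here is a small simulation sketch—not from the manual; the seed, the parameter values γ₀ = 0.0001, γ₁ = 0.2, δ₁ = 0.7, and the variable names are our own choices—that generates a GARCH(1,1) series and then recovers the parameters with arch:

    . clear
    . set seed 12345
    . set obs 2000
    . generate t = _n
    . tsset t
    . generate double sig2 = .001          // unconditional variance: .0001/(1-.2-.7)
    . generate double eps  = sqrt(sig2)*rnormal()
    . quietly forvalues i = 2/2000 {
            replace sig2 = .0001 + .2*eps[`i'-1]^2 + .7*sig2[`i'-1] in `i'
            replace eps  = sqrt(sig2)*rnormal() in `i'
      }
    . generate y = .01 + eps
    . arch y, arch(1) garch(1)

In a sample of this size, the estimated [ARCH] coefficients should land near 0.2 and 0.7.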
In his pioneering work, Engle (1982) assumed that the error term, ε_t, followed a Gaussian (normal) distribution: ε_t ~ N(0, σ_t²). However, as Mandelbrot (1963) and many others have noted, the distribution of stock returns appears to be leptokurtotic, meaning that extreme stock returns are more frequent than would be expected if the returns were normally distributed. Researchers have therefore assumed other distributions that can have fatter tails than the normal distribution; arch allows you to fit models assuming the errors follow Student's t distribution or the generalized error distribution. The t distribution has fatter tails than the normal distribution; as the degree-of-freedom parameter approaches infinity, the t distribution converges to the normal distribution. The generalized error distribution's tails are fatter than the normal distribution's when the shape parameter is less than two and are thinner than the normal distribution's when the shape parameter is greater than two.

The GARCH model of conditional variance can be considered an ARMA process in the squared innovations, although not in the variances as the equations might seem to suggest; see Hamilton (1994). Specifically, the standard GARCH model implies that the squared innovations result from

    ε_t² = γ₀ + (γ₁ + δ₁) ε²_{t-1} + (γ₂ + δ₂) ε²_{t-2} + ··· + (γ_k + δ_k) ε²_{t-k}
              + w_t − δ₁ w_{t-1} − δ₂ w_{t-2} − ··· − δ_k w_{t-k}

where

    w_t = ε_t² − σ_t²
    w_t is a white-noise process that is fundamental for ε_t²

One of the primary benefits of the GARCH specification is its parsimony in identifying the conditional variance. As with ARIMA models, the ARMA specification in GARCH allows the conditional variance to be modeled with fewer parameters than with an ARCH specification alone. Empirically, many series with a conditionally heteroskedastic disturbance have been adequately modeled with a GARCH(1,1) specification.

An ARMA process in the disturbances can easily be added to the mean equation. For example, the mean equation can be written with an ARMA(1, 1) disturbance as

    y_t = x_t β + ρ(y_{t-1} − x_{t-1} β) + θ ε_{t-1} + ε_t

with an obvious generalization to ARMA(p, q) by adding terms; see [TS] arima for more discussion of this specification. This change affects only the conditional-variance specification in that ε_t² now results from a different specification of the conditional mean.

Much of the literature on ARCH models focuses on alternative specifications of the variance equation. arch allows many of these specifications to be requested using the saarch() through pgarch() options, which imply that one or more terms may be changed or added to the specification of the variance equation.

These alternative specifications also address asymmetry. Both the ARCH and GARCH specifications imply a symmetric impact of innovations. Whether an innovation ε_t is positive or negative makes no difference to the expected variance σ_t² in the ensuing periods; only the size of the innovation matters—good news and bad news have the same effect. Many theories, however, suggest that positive and negative innovations should vary in their impact. For risk-averse investors, a large unanticipated drop in the market is more likely to lead to higher volatility than a large unanticipated increase (see Black [1976], Nelson [1991]). saarch(), tarch(), aarch(), abarch(), earch(), aparch(), and tparch() allow various specifications of asymmetric effects.

narch(), narchk(), nparch(), and nparchk() imply an asymmetric impact of a specific form. All the models considered so far have a minimum conditional variance when the lagged innovations are all zero. "No news is good news" when it comes to keeping the conditional variance small. narch(), narchk(), nparch(), and nparchk() also have a symmetric response to innovations, but they are not centered at zero. The entire news-response function (response to innovations) is shifted horizontally so that minimum variance lies at some specific positive or negative value for prior innovations.
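For instance, a minimal sketch of requesting one such shifted specification—the NARCH form from the Common models table above, on the WPI data used in the examples below:

    . arch D.ln_wpi, narch(1) garch(1)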
ARCH-in-mean models allow the conditional variance of the series to influence the conditional mean. This is particularly convenient for modeling the risk–return relationship in financial series; the riskier an investment, with all else equal, the higher its expected return. ARCH-in-mean models modify the specification of the conditional mean equation to be

    y_t = x_t β + ψ σ_t² + ε_t                                     (ARCH-in-mean)

Although this linear form in the current conditional variance has dominated the literature, arch allows the conditional variance to enter the mean equation through a nonlinear transformation g() and for this transformed term to be included contemporaneously or lagged.

    y_t = x_t β + ψ₀ g(σ_t²) + ψ₁ g(σ²_{t-1}) + ψ₂ g(σ²_{t-2}) + ··· + ε_t

Square root is the most commonly used g() transformation because researchers want to include a linear term for the conditional standard deviation, but any transform g() is allowed.

Example 1: ARCH model

Consider a simple model of the U.S. Wholesale Price Index (WPI) (Enders 2004, 87–93), which we also consider in [TS] arima. The data are quarterly over the period 1960q1 through 1990q4.

In [TS] arima, we fit a model of the continuously compounded rate of change in the WPI, ln(WPI_t) − ln(WPI_{t-1}). The graph of the differenced series—see [TS] arima—clearly shows periods of high volatility and other periods of relative tranquility. This makes the series a good candidate for ARCH modeling. Indeed, price indices have been a common target of ARCH models. Engle (1982) presented the original ARCH formulation in an analysis of U.K. inflation rates.

First, we fit a constant-only model by OLS and test ARCH effects by using Engle's Lagrange multiplier test (estat archlm).

    . use http://www.stata-press.com/data/r14/wpi1
    . regress D.ln_wpi

          Source |       SS           df       MS      Number of obs   =       123
    -------------+----------------------------------   F(0, 122)       =      0.00
           Model |           0         0           .   Prob > F        =         .
        Residual |   .02521709       122  .000206697   R-squared       =    0.0000
    -------------+----------------------------------   Adj R-squared   =    0.0000
           Total |   .02521709       122  .000206697   Root MSE        =    .01438

        D.ln_wpi |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _cons |   .0108215   .0012963     8.35   0.000     .0082553    .0133878

    . estat archlm, lags(1)
    LM test for autoregressive conditional heteroskedasticity (ARCH)

        lags(p)   |         chi2              df              Prob > chi2
        ----------+-------------------------------------------------------
           1      |        8.366               1                0.0038

              H0: no ARCH effects      vs.      H1: ARCH(p) disturbance

Because the LM test shows a p-value of 0.0038, which is well below 0.05, we reject the null hypothesis of no ARCH(1) effects. Thus we can further estimate the ARCH(1) parameter by specifying arch(1). See [R] regress postestimation time series for more information on Engle's LM test.

The first-order generalized ARCH model (GARCH, Bollerslev 1986) is the most commonly used specification for the conditional variance in empirical work and is typically written GARCH(1, 1). We can estimate a GARCH(1, 1) process for the log-differenced series by typing

    . arch D.ln_wpi, arch(1) garch(1)
    (setting optimization to BHHH)
    Iteration 0:   log likelihood =  355.23458
    Iteration 1:   log likelihood =  365.64586
    (output omitted)
    Iteration 10:  log likelihood =  373.23397
    ARCH family regression

    Sample: 1960q2 - 1990q4                         Number of obs     =        123
    Distribution: Gaussian                          Wald chi2(.)      =          .
    Log likelihood =  373.234                       Prob > chi2       =          .

    ------------------------------------------------------------------------------
                 |                 OPG
        D.ln_wpi |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    ln_wpi       |
           _cons |   .0061167   .0010616     5.76   0.000     .0040361    .0081974
    -------------+----------------------------------------------------------------
    ARCH         |
            arch |
             L1. |   .4364123   .2437428     1.79   0.073    -.0413147    .9141394
                 |
           garch |
             L1. |   .4544606   .1866606     2.43   0.015     .0886127    .8203086
                 |
           _cons |   .0000269   .0000122     2.20   0.028     2.97e-06    .0000508
    ------------------------------------------------------------------------------

We have estimated the ARCH(1) parameter to be 0.436 and the GARCH(1) parameter to be 0.454, so our fitted GARCH(1, 1) model is

    y_t = 0.0061 + ε_t
    σ_t² = 0.436 ε²_{t-1} + 0.454 σ²_{t-1}

where y_t = ln(wpi_t) − ln(wpi_{t-1}).

The model Wald test and probability are both reported as missing (.). By convention, Stata reports the model test for the mean equation. Here and fairly often for ARCH models, the mean equation consists only of a constant, and there is nothing to test.

Example 2: ARCH model with ARMA process

We can retain the GARCH(1, 1) specification for the conditional variance and model the mean as an ARMA process with AR(1) and MA(1) terms as well as a fourth-lag MA term to control for quarterly seasonal effects by typing

    . arch D.ln_wpi, ar(1) ma(1 4) arch(1) garch(1)
    (setting optimization to BHHH)
    Iteration 0:   log likelihood =   380.9997
    Iteration 1:   log likelihood =  388.57823
    Iteration 2:   log likelihood =  391.34143
    Iteration 3:   log likelihood =  396.36991
    Iteration 4:   log likelihood =  398.01098
    (switching optimization to BFGS)
    Iteration 5:   log likelihood =  398.23668
    BFGS stepping has contracted, resetting BFGS Hessian (0)
    Iteration 6:   log likelihood =  399.21497
    Iteration 7:   log likelihood =  399.21537  (backed up)
    (output omitted)
    (switching optimization to BHHH)
    Iteration 15:  log likelihood =  399.51441
    Iteration 16:  log likelihood =  399.51443
    Iteration 17:  log likelihood =  399.51443

    ARCH family regression -- ARMA disturbances

    Sample: 1960q2 - 1990q4                         Number of obs     =        123
    Distribution: Gaussian                          Wald chi2(3)      =     153.56
    Log likelihood =  399.5144                      Prob > chi2       =     0.0000

    ------------------------------------------------------------------------------
                 |                 OPG
        D.ln_wpi |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    ln_wpi       |
           _cons |   .0069541   .0039517     1.76   0.078     -.000791    .0146992
    -------------+----------------------------------------------------------------
    ARMA         |
              ar |
             L1. |   .7922674   .1072225     7.39   0.000     .5821153     1.00242
                 |
              ma |
             L1. |   -.341774   .1499943    -2.28   0.023    -.6357574   -.0477905
             L4. |   .2451724   .1251131     1.96   0.050    -.0000447    .4903896
    -------------+----------------------------------------------------------------
    ARCH         |
            arch |
             L1. |   .2040449   .1244991     1.64   0.101    -.0399688    .4480587
                 |
           garch |
             L1. |   .6949687   .1892176     3.67   0.000     .3241091    1.065828
                 |
           _cons |   .0000119   .0000104     1.14   0.253    -8.52e-06    .0000324
    ------------------------------------------------------------------------------

To clarify exactly what we have estimated, we could write our model as

    y_t = 0.007 + 0.792 (y_{t-1} − 0.007) − 0.342 ε_{t-1} + 0.245 ε_{t-4} + ε_t
    σ_t² = 0.204 ε²_{t-1} + 0.695 σ²_{t-1}

where y_t = ln(wpi_t) − ln(wpi_{t-1}).

The ARCH(1) coefficient, 0.204, is not significantly different from zero, but the ARCH(1) and GARCH(1) coefficients are significant collectively. If you doubt this, you can check with test.

    . test [ARCH]L1.arch [ARCH]L1.garch
     ( 1)  [ARCH]L.arch = 0
     ( 2)  [ARCH]L.garch = 0
               chi2(  2) =   84.92
             Prob > chi2 =   0.0000

(For comparison, we fit the model over the same sample used in example 1 of [TS] arima; Enders fits this GARCH model but over a slightly different sample.)
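After either of these fits, the estimated conditional variance is available through predict; see [TS] arch postestimation. A minimal sketch (the new variable name ht is our own choice):

    . predict ht, variance
    . tsline ht

Plotting ht shows the volatility clustering that motivated the model in the first place.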
Technical note

The rather ugly iteration log on the previous result is typical, as difficulty in converging is common in ARCH models. This is actually a fairly well-behaved likelihood for an ARCH model. The "switching optimization to . . ." messages are standard messages from the default optimization method for arch. The "backed up" messages are typical of BFGS stepping, as the BFGS Hessian is often overoptimistic, particularly during early iterations. These messages are nothing to be concerned about.

Nevertheless, watch out for the messages "BFGS stepping has contracted, resetting BFGS Hessian" and "backed up", which can flag problems that may result in an iteration log that goes on and on, with Stata never reporting convergence and never reporting final results. The question is, when do you give up and press Break, and if you do, what then?

If the "BFGS stepping has contracted" message occurs repeatedly (more than, say, five times), it often indicates that convergence will never be achieved. Literally, it means that the BFGS algorithm was stuck, reset its Hessian, and took a steepest-descent step.

The "backed up" message, if it occurs repeatedly, also indicates problems, but only if the likelihood value is simultaneously not changing. If the message occurs repeatedly but the likelihood value is changing, as it did above, all is going well; it is just going slowly.

If you have convergence problems, you can specify options to assist the current maximization method or try a different method. Or, your model specification and data may simply lead to a likelihood that is not concave in the allowable region and thus cannot be maximized.

If you see the "backed up" message with no change in the likelihood, you can reset the gradient tolerance to a larger value. Specifying the gtolerance(999) option disables gradient checking, allowing convergence to be declared more easily. This does not guarantee that convergence will be declared, and even if it is, the global maximum likelihood may not have been found.

You can also try to specify initial values.

Finally, you can try a different maximization method; see options discussed under the Maximization tab above.

ARCH models are notorious for having convergence difficulties. Unlike in most estimators in Stata, it is common for convergence to require many steps or even to fail. This is particularly true of the explicitly nonlinear terms such as aarch(), narch(), aparch(), or archm (ARCH-in-mean), and of any model with several lags in the ARCH terms. There is not always a solution. You can try other maximization methods or different starting values, but if your data do not support your assumed ARCH structure, convergence simply may not be possible.

ARCH models can be susceptible to irrelevant regressors or unnecessary lags, whether in the specification of the conditional mean or in the conditional variance. In these situations, arch will often continue to iterate, making little to no improvement in the likelihood. We view this conservative approach as better than declaring convergence prematurely when the likelihood has not been fully maximized. arch is estimating the conditional form of second sample moments, often with flexible functions, and that is asking much of the data.

Technical note

if exp and in range are interpreted differently with commands accepting time-series operators. The time-series operators are resolved before the conditions are tested, which may lead to some confusion. Note the results of the following list commands:

    . use http://www.stata-press.com/data/r14/archxmpl
    . list t y l.y in 5/10

         +--------------------------+
         |      t      y        L.y |
         |--------------------------|
      5. | 1961q1   30.8       30.7 |
      6. | 1961q2   30.5       30.8 |
      7. | 1961q3   30.5       30.5 |
      8. | 1961q4   30.6       30.5 |
      9. | 1962q1   30.7       30.6 |
     10. | 1962q2   30.6       30.7 |
         +--------------------------+

    . keep in 5/10
    (118 observations deleted)
    . list t y l.y

         +--------------------------+
         |      t      y        L.y |
         |--------------------------|
      1. | 1961q1   30.8          . |
      2. | 1961q2   30.5       30.8 |
      3. | 1961q3   30.5       30.5 |
      4. | 1961q4   30.6       30.5 |
      5. | 1962q1   30.7       30.6 |
      6. | 1962q2   30.6       30.7 |
         +--------------------------+

We have one more lagged observation for y in the first case: l.y was resolved before the in restriction was applied. In the second case, the dataset no longer contains the value of y to compute the first lag. This means that

    . use http://www.stata-press.com/data/r14/archxmpl, clear
    . arch y l.x if twithin(1962q2, 1990q3), arch(1)

is not the same as

    . keep if twithin(1962q2, 1990q3)
    . arch y l.x, arch(1)

Example 3: Asymmetric effects—EGARCH model

Continuing with the WPI data, we might be concerned that the economy as a whole responds differently to unanticipated increases in wholesale prices than it does to unanticipated decreases. Perhaps unanticipated increases lead to cash flow issues that affect inventories and lead to more volatility. We can see if the data support this supposition by specifying an ARCH model that allows an asymmetric effect of "news"—innovations or unanticipated changes. One of the most popular such models is EGARCH (Nelson 1991). The full first-order EGARCH model for the WPI can be specified as follows:

    . use http://www.stata-press.com/data/r14/wpi1, clear
    . arch D.ln_wpi, ar(1) ma(1 4) earch(1) egarch(1)
    (setting optimization to BHHH)
    Iteration 0:   log likelihood =   227.5251
    Iteration 1:   log likelihood =  381.68426
    (output omitted)
    Iteration 23:  log likelihood =  405.31453

    ARCH family regression -- ARMA disturbances

    Sample: 1960q2 - 1990q4                         Number of obs     =        123
    Distribution: Gaussian                          Wald chi2(3)      =     156.02
    Log likelihood =  405.3145                      Prob > chi2       =     0.0000

    ------------------------------------------------------------------------------
                 |                 OPG
        D.ln_wpi |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    ln_wpi       |
           _cons |   .0087342   .0034004     2.57   0.010     .0020695    .0153989
    -------------+----------------------------------------------------------------
    ARMA         |
              ar |
             L1. |   .7692139   .0968393     7.94   0.000     .5794124    .9590154
                 |
              ma |
             L1. |  -.3554623   .1265721    -2.81   0.005    -.6035391   -.1073855
             L4. |   .2414626   .0863834     2.80   0.005     .0721543    .4107709
    -------------+----------------------------------------------------------------
    ARCH         |
           earch |
             L1. |   .4063939     .11635     3.49   0.000     .1783521    .6344358
                 |
         earch_a |
             L1. |   .2467327   .1233357     2.00   0.045     .0049993    .4884662
                 |
          egarch |
             L1. |   .8417332   .0704074    11.96   0.000     .7037372    .9797291
                 |
           _cons |  -1.488366   .6604354    -2.25   0.024    -2.782795   -.1939363
    ------------------------------------------------------------------------------

Our result for the variance is

    ln(σ_t²) = −1.49 + .406 z_{t-1} + .247 (|z_{t-1}| − √(2/π)) + .842 ln(σ²_{t-1})

where z_t = ε_t/σ_t, which is distributed as N(0, 1).

This is a strong indication for a leverage effect. The positive L1.earch coefficient implies that positive innovations (unanticipated price increases) are more destabilizing than negative innovations. The effect appears strong (0.406) and is substantially larger than the symmetric effect (0.247). In fact, the relative scales of the two coefficients imply that the positive leverage completely dominates the symmetric effect.

This can readily be seen if we plot what is often referred to as the news-response or news-impact function. This curve shows the resulting conditional variance as a function of unanticipated news, in the form of innovations, that is, the conditional variance σ_t² as a function of ε_t. Thus we must evaluate σ_t² for various values of ε_t—say, −4 to 4—and then graph the result.
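One way to trace out such a curve is with twoway function, plugging in the estimates above. This is a sketch only: we hold the lagged variance fixed at an arbitrary value (0.0001 here), and the horizontal axis is the standardized innovation z rather than ε itself:

    . twoway function y = exp(-1.488 + .406*x + .247*(abs(x) - sqrt(2/_pi)) +
    >       .842*ln(.0001)), range(-4 4) xtitle("z(t-1)")
    >       ytitle("conditional variance")

The curve rises much more steeply to the right of zero than to the left, which is the asymmetry just described.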
Example 4: Asymmetric power ARCH model

As an example of a frequently sampled, long-run series, consider the daily closing indices of the Dow Jones Industrial Average, variable dowclose. To avoid the first half of the century, when the New York Stock Exchange was open for Saturday trading, only data after 1jan1953 are used. The compound return of the series is used as the dependent variable and is graphed below.

    (figure omitted: DOW, compound return on DJIA, daily, 1953–1990)

We formed this difference by referring to D.ln_dow, but only after playing a trick. The series is daily, and each observation represents the Dow closing index for the day. Our data included a time variable recorded as a daily date. We wanted, however, to model the log differences in the series, and we wanted the span from Friday to Monday to appear as a single-period difference. That is, the day before Monday is Friday. Because our dataset was tsset with date, the span from Friday to Monday was 3 days. The solution was to create a second variable that sequentially numbered the observations. By tsseting the data with this new variable, we obtained the desired differences.

    . generate t = _n
    . tsset t

Now our data look like this:

    . use http://www.stata-press.com/data/r14/dow1, clear
    . generate dayofwk = dow(date)
    . list date dayofwk t ln_dow D.ln_dow in 1/8

          +----------------------------------------------------+
          |      date   dayofwk      t     ln_dow    D.ln_dow  |
          |----------------------------------------------------|
       1. | 02jan1953         5      1   5.677096           .  |
       2. | 05jan1953         1      2   5.682899    .0058026  |
       3. | 06jan1953         2      3   5.677439   -.0054603  |
       4. | 07jan1953         3      4   5.672636   -.0048032  |
       5. | 08jan1953         4      5   5.671259   -.0013762  |
       6. | 09jan1953         5      6   5.661223   -.0100365  |
       7. | 12jan1953         1      7   5.653191   -.0080323  |
       8. | 13jan1953         2      8   5.659134    .0059433  |
          +----------------------------------------------------+

    . list date dayofwk t ln_dow D.ln_dow in -8/l

          +----------------------------------------------------+
          |      date   dayofwk      t     ln_dow    D.ln_dow  |
          |----------------------------------------------------|
    9334. | 08feb1990         4   9334   7.880188    .0016198  |
    9335. | 09feb1990         5   9335   7.881635    .0014472  |
    9336. | 12feb1990         1   9336   7.870601    -.011034  |
    9337. | 13feb1990         2   9337   7.872665    .0020638  |
    9338. | 14feb1990         3   9338   7.872577   -.0000877  |
    9339. | 15feb1990         4   9339    7.88213     .009553  |
    9340. | 16feb1990         5   9340   7.876863   -.0052676  |
    9341. | 20feb1990         2   9341   7.862054   -.0148082  |
          +----------------------------------------------------+

The difference operator D spans weekends because the specified time variable, t, is not a true date and has a difference of 1 for all observations. We must leave this contrived time variable in place during estimation, or arch will be convinced that our dataset has gaps. If we were using calendar dates, we would indeed have gaps.

Ding, Granger, and Engle (1993) fit an A-PARCH model of daily returns of the Standard and Poor's 500 (S&P 500) for 3jan1928–30aug1991. We will fit the same model for the Dow data shown above. The model includes an AR(1) term as well as the A-PARCH specification of conditional variance.

    . arch D.ln_dow, ar(1) aparch(1) pgarch(1)
    (setting optimization to BHHH)
    Iteration 0:   log likelihood =  31139.547
    Iteration 1:   log likelihood =  31350.751
    (output omitted)
    Iteration 68:  log likelihood =  32273.555
    Iteration 69:  log likelihood =  32273.555  (backed up)
    ARCH family regression -- AR disturbances

    Sample: 2 - 9341                                Number of obs     =      9,340
    Distribution: Gaussian                          Wald chi2(1)      =     175.46
    Log likelihood =  32273.56                      Prob > chi2       =     0.0000

    ------------------------------------------------------------------------------
                 |                 OPG
        D.ln_dow |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    ln_dow       |
           _cons |   .0001786   .0000875     2.04   0.041     7.15e-06      .00035
    -------------+----------------------------------------------------------------
    ARMA         |
              ar |
             L1. |   .1410944   .0106519    13.25   0.000     .1202171    .1619716
    -------------+----------------------------------------------------------------
    ARCH         |
          aparch |
             L1. |   .0626323   .0034307    18.26   0.000     .0559082    .0693564
                 |
        aparch_e |
             L1. |  -.3645093   .0378485    -9.63   0.000    -.4386909   -.2903277
                 |
          pgarch |
             L1. |   .9299015   .0030998   299.99   0.000      .923826     .935977
                 |
           _cons |   7.19e-06   2.53e-06     2.84   0.004     2.23e-06    .0000121
    -------------+----------------------------------------------------------------
    POWER        |
           power |   1.585187   .0629186    25.19   0.000     1.461869    1.708505
    ------------------------------------------------------------------------------

In the iteration log, the final iteration reports the message "backed up". For most estimators, ending on a "backed up" message would be a cause for great concern, but not with arch or, for that matter, arima, as long as you do not specify the gtolerance() option. arch and arima, by default, monitor the gradient and declare convergence only if, in addition to everything else, the gradient is small enough.

The fitted model demonstrates substantial asymmetry, with the large negative L1.aparch_e coefficient indicating that the market responds with much more volatility to unexpected drops in returns (bad news) than it does to increases in returns (good news).

Example 5: ARCH model with nonnormal errors

Stock returns tend to be leptokurtotic, meaning that large returns (either positive or negative) occur more frequently than one would expect if returns were in fact normally distributed. Here we refit the previous A-PARCH model assuming the errors follow the generalized error distribution, and we let arch estimate the shape parameter of the distribution.

    . use http://www.stata-press.com/data/r14/dow1, clear
    . arch D.ln_dow, ar(1) aparch(1) pgarch(1) distribution(ged)
    (setting optimization to BHHH)
    Iteration 0:   log likelihood =  31139.547
    Iteration 1:   log likelihood =   31348.13
    (output omitted)
    Iteration 74:  log likelihood =  32486.461

    ARCH family regression -- AR disturbances

    Sample: 2 - 9341                                Number of obs     =      9,340
    Distribution: GED                               Wald chi2(1)      =     178.22
    Log likelihood =  32486.46                      Prob > chi2       =     0.0000

    ------------------------------------------------------------------------------
                 |                 OPG
        D.ln_dow |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    ln_dow       |
           _cons |   .0002735    .000078     3.51   0.000     .0001207    .0004264
    -------------+----------------------------------------------------------------
    ARMA         |
              ar |
             L1. |   .1337479   .0100187    13.35   0.000     .1141116    .1533842
    -------------+----------------------------------------------------------------
    ARCH         |
          aparch |
             L1. |   .0641772   .0049401    12.99   0.000     .0544949    .0738595
                 |
        aparch_e |
             L1. |   -.405225   .0573059    -7.07   0.000    -.5175426   -.2929074
                 |
          pgarch |
             L1. |   .9341739   .0045668   204.56   0.000     .9252231    .9431247
                 |
           _cons |   .0000216   .0000117     1.84   0.066    -1.39e-06    .0000446
    -------------+----------------------------------------------------------------
    POWER        |
           power |    1.32524   .1030699    12.86   0.000     1.123227    1.527253
    -------------+----------------------------------------------------------------
        /lnshape |   .3527019   .0094819    37.20   0.000     .3341177     .371286
    -------------+----------------------------------------------------------------
           shape |   1.422907   .0134919                      1.396707    1.449598
    ------------------------------------------------------------------------------

The ARMA and ARCH coefficients are similar to those we obtained when we assumed normally distributed errors, though we do note that the power term is now closer to 1. The estimated shape parameter for the generalized error distribution is shown at the bottom of the output. Here the shape parameter is 1.42; because it is less than 2, the distribution of the errors has tails that are fatter than they would be if the errors were normally distributed.

Example 6: ARCH model with constraints

Engle's (1982) original model, which sparked the interest in ARCH, provides an example requiring constraints. Most current ARCH specifications use GARCH terms to provide flexible dynamic properties without estimating an excessive number of parameters. The original model was limited to ARCH terms, and to help cope with the collinearity of the terms, a declining lag structure was imposed in the parameters. The conditional variance equation was specified as

    σ_t² = α₀ + α(.4 ε²_{t-1} + .3 ε²_{t-2} + .2 ε²_{t-3} + .1 ε²_{t-4})
         = α₀ + .4α ε²_{t-1} + .3α ε²_{t-2} + .2α ε²_{t-3} + .1α ε²_{t-4}

From the earlier arch output, we know how the coefficients will be named.
In Stata, the formula is

    σ_t² = [ARCH]_cons + .4 [ARCH]L1.arch ε²_{t-1} + .3 [ARCH]L2.arch ε²_{t-2}
         + .2 [ARCH]L3.arch ε²_{t-3} + .1 [ARCH]L4.arch ε²_{t-4}

We could specify these linear constraints many ways, but the following seems fairly intuitive; see [R] constraint for syntax.

    . use http://www.stata-press.com/data/r14/wpi1, clear
    . constraint 1 (3/4)*[ARCH]l1.arch = [ARCH]l2.arch
    . constraint 2 (2/4)*[ARCH]l1.arch = [ARCH]l3.arch
    . constraint 3 (1/4)*[ARCH]l1.arch = [ARCH]l4.arch

The original model was fit on U.K. inflation; we will again use the WPI data and retain our earlier specification of the mean equation, which differs from Engle's U.K. inflation model. With our constraints, we type

    . arch D.ln_wpi, ar(1) ma(1 4) arch(1/4) constraints(1/3)
    (setting optimization to BHHH)
    Iteration 0:   log likelihood =  396.80198
    Iteration 1:   log likelihood =  399.07809
    (output omitted)
    Iteration 9:   log likelihood =  399.46243

    ARCH family regression -- ARMA disturbances

    Sample: 1960q2 - 1990q4                         Number of obs     =        123
    Distribution: Gaussian                          Wald chi2(3)      =     123.32
    Log likelihood =  399.4624                      Prob > chi2       =     0.0000

     ( 1)  .75*[ARCH]L.arch - [ARCH]L2.arch = 0
     ( 2)  .5*[ARCH]L.arch - [ARCH]L3.arch = 0
     ( 3)  .25*[ARCH]L.arch - [ARCH]L4.arch = 0

    ------------------------------------------------------------------------------
                 |                 OPG
        D.ln_wpi |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    ln_wpi       |
           _cons |   .0077204   .0034531     2.24   0.025     .0009525    .0144883
    -------------+----------------------------------------------------------------
    ARMA         |
              ar |
             L1. |   .7388168   .1126811     6.56   0.000     .5179659    .9596676
                 |
              ma |
             L1. |  -.2559691   .1442861    -1.77   0.076    -.5387646    .0268264
             L4. |   .2528923   .1140185     2.22   0.027       .02942    .4763645
    -------------+----------------------------------------------------------------
    ARCH         |
            arch |
             L1. |   .2180138   .0737787     2.95   0.003     .0734101    .3626174
             L2. |   .1635103    .055334     2.95   0.003     .0550576    .2719631
             L3. |   .1090069   .0368894     2.95   0.003     .0367051    .1813087
             L4. |   .0545034   .0184447     2.95   0.003     .0183525    .0906544
                 |
           _cons |   .0000483   7.66e-06     6.30   0.000     .0000333    .0000633
    ------------------------------------------------------------------------------

The L1.arch, L2.arch, L3.arch, and L4.arch coefficients have the constrained relative sizes.

Stored results

arch stores the following in e():

Scalars
    e(N)              number of observations
    e(N_gaps)         number of gaps
    e(condobs)        number of conditioning observations
    e(k)              number of parameters
    e(k_eq)           number of equations in e(b)
    e(k_eq_model)     number of equations in overall model test
    e(k_dv)           number of dependent variables
    e(k_aux)          number of auxiliary parameters
    e(df_m)           model degrees of freedom
    e(ll)             log likelihood
    e(chi2)           χ²
    e(p)              significance
    e(archi)          σ₀² = ε₀², priming values
    e(archany)        1 if model contains ARCH terms, 0 otherwise
    e(tdf)            degrees of freedom for Student's t distribution
    e(shape)          shape parameter for generalized error distribution
    e(tmin)           minimum time
    e(tmax)           maximum time
    e(power)          φ for power ARCH terms
    e(rank)           rank of e(V)
    e(ic)             number of iterations
    e(rc)             return code
    e(converged)      1 if converged, 0 otherwise

Macros
    e(cmd)            arch
    e(cmdline)        command as typed
    e(depvar)         name of dependent variable
    e(covariates)     list of covariates
    e(eqnames)        names of equations
    e(wtype)          weight type
    e(wexp)           weight expression
    e(title)          title in estimation output
    e(tmins)          formatted minimum time
    e(tmaxs)          formatted maximum time
    e(dist)           distribution for error term: gaussian, t, or ged
    e(mhet)           1 if multiplicative heteroskedasticity
    e(dfopt)          yes if degrees of freedom for t distribution or shape
                      parameter for GED distribution was estimated; no otherwise
    e(chi2type)       Wald; type of model χ² test
    e(vce)            vcetype specified in vce()
    e(vcetype)        title used to label Std. Err.
    e(ma)             lags for moving-average terms
    e(ar)             lags for autoregressive terms
    e(arch)           lags for ARCH terms
    e(archm)          ARCH-in-mean lags
    e(archmexp)       ARCH-in-mean exp
    e(earch)          lags for EARCH terms
    e(egarch)         lags for EGARCH terms
    e(aarch)          lags for AARCH terms
    e(narch)          lags for NARCH terms
    e(aparch)         lags for A-PARCH terms
    e(nparch)         lags for NPARCH terms
    e(saarch)         lags for SAARCH terms
    e(parch)          lags for PARCH terms
    e(tparch)         lags for TPARCH terms
    e(abarch)         lags for ABARCH terms
    e(tarch)          lags for TARCH terms
    e(atarch)         lags for ATARCH terms
    e(sdgarch)        lags for SDGARCH terms
    e(pgarch)         lags for PGARCH terms
    e(garch)          lags for GARCH terms
    e(opt)            type of optimization
    e(ml_method)      type of ml method
    e(user)           name of likelihood-evaluator program
    e(technique)      maximization technique
    e(tech)           maximization technique, including number of iterations
    e(tech_steps)     number of iterations performed before switching techniques
    e(properties)     b V
    e(estat_cmd)      program used to implement estat
    e(predict)        program used to implement predict
    e(marginsok)      predictions allowed by margins
    e(marginsnotok)   predictions disallowed by margins

Matrices
    e(b)              coefficient vector
    e(Cns)            constraints matrix
    e(ilog)           iteration log (up to 20 iterations)
    e(gradient)       gradient vector
    e(V)              variance–covariance matrix of the estimators
    e(V_modelbased)   model-based variance

Functions
    e(sample)         marks estimation sample

Methods and formulas

The mean equation for the model fit by arch and with ARMA terms can be written as

    y_t = x_t β + Σ_i ψ_i g(σ²_{t-i})
          + Σ_{j=1}^{p} ρ_j { y_{t-j} − x_{t-j} β − Σ_i ψ_i g(σ²_{t-j-i}) }
          + Σ_{k=1}^{q} θ_k ε_{t-k} + ε_t                      (conditional mean)

where β are the regression parameters, ψ are the ARCH-in-mean parameters, ρ are the autoregression parameters, θ are the moving-average parameters, and g() is a general function; see the archmexp() option.

Any of the parameters in this full specification of the conditional mean may be zero. For example, the model need not have moving-average parameters (θ = 0) or ARCH-in-mean parameters (ψ = 0).

The variance equation will be one of the following:

    σ_t² = γ₀ + A(σ, ε) + B(σ, ε)²                          (1)
    ln σ_t² = γ₀ + C(ln σ, z) + A(σ, ε) + B(σ, ε)²          (2)
    σ_t^φ = γ₀ + D(σ, ε) + A(σ, ε) + B(σ, ε)²               (3)

where A(σ, ε), B(σ, ε), C(ln σ, z), and D(σ, ε) are linear sums of the appropriate ARCH terms; see Details of syntax for more information. Equation (1) is used if no EGARCH or power ARCH terms are included in the model, (2) if EGARCH terms are included, and (3) if any power ARCH terms are included; see Details of syntax.

Methods and formulas are presented under the following headings:

    Priming values
    Likelihood from prediction error decomposition
    Missing data

Priming values

The above model is recursive with potentially long memory. It is necessary to assume preestimation sample values for ε_t, ε_t², and σ_t² to begin the recursions, and the remaining computations are therefore conditioned on these priming values, which can be controlled using the arch0() and arma0() options. See options discussed under the Priming tab above.

The arch0(xb0wt) and arch0(xbwt) options compute a weighted sum of estimated disturbances with more weight on the early observations. With either of these options,

    σ²_{t₀−i} = ε²_{t₀−i} = (1 − 0.7) Σ_{t=0}^{T−1} 0.7^{T−t−1} ε̂²_{T−t}    for all i

where t₀ is the first observation for which the likelihood is computed; see options discussed under the Priming tab above. The ε̂_t² are all computed from the conditional mean equation.
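As a quick check (our own algebra, not part of the original text), the weights in this sum form a geometric series:

    Σ_{t=0}^{T−1} (1 − 0.7) 0.7^{T−t−1} = (1 − 0.7) Σ_{j=0}^{T−1} 0.7^j = 1 − 0.7^T ≈ 1

so the priming value is essentially a weighted average of the squared estimated disturbances, with weight 0.3 on the first observation, 0.3 × 0.7 on the second, and so on.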
If arch0(xb0wt) is specified, β, ψ_i, ρ_j, and θ_k are taken from initial regression estimates and held constant during optimization. If arch0(xbwt) is specified, the current estimates of β, ψ_i, ρ_j, and θ_k are used to compute ε̂_t² on every iteration. If any ψ_i is in the mean equation (ARCH-in-mean is specified), the estimates of ε̂_t² from the initial regression estimates are not consistent.

Likelihood from prediction error decomposition

The likelihood function for ARCH has a particularly simple form. Given priming (or conditioning) values of ε_t, ε_t², and σ_t², the mean equation above can be solved recursively for every t (prediction error decomposition). Likewise, the conditional variance can be computed recursively for each observation by using the variance equation. Using these predicted errors, their associated variances, and the assumption that ε_t ~ N(0, σ_t²), we find that the log likelihood for each observation t is

    ln L_t = −(1/2) { ln(2π σ_t²) + ε_t²/σ_t² }

If we assume that ε_t ~ t(df), then as given in Hamilton (1994, 662),

    ln L_t = ln Γ((df + 1)/2) − ln Γ(df/2)
             − (1/2) [ ln{(df − 2)π σ_t²} + (df + 1) ln{1 + ε_t²/((df − 2)σ_t²)} ]

The likelihood is not defined for df ≤ 2, so instead of estimating df directly, we estimate m = ln(df − 2). Then df = exp(m) + 2 > 2 for any m.

Following Bollerslev, Engle, and Nelson (1994, 2978), the log likelihood for the tth observation, assuming ε_t ~ GED(s), is

    ln L_t = ln s − ln λ − {(s + 1)/s} ln 2 − ln Γ(s⁻¹) − (1/2) |ε_t/(λσ_t)|^s

where

    λ = [ Γ(s⁻¹) / {2^{2/s} Γ(3s⁻¹)} ]^{1/2}

To enforce the restriction that s > 0, we estimate r = ln s.

This command supports the Huber/White/sandwich estimator of the variance using vce(robust). See [P] _robust, particularly Maximum likelihood estimators and Methods and formulas.

Missing data

ARCH allows missing data or missing observations but does not attempt to condition on the surrounding data. If a dynamic component cannot be computed—ε_t, ε_t², and/or σ_t²—its priming value is substituted. If a covariate, the dependent variable, or the entire observation is missing, the observation does not enter the likelihood, and its dynamic components are set to their priming values for that observation. This is acceptable only asymptotically and should not be used with a great deal of missing data.

------------------------------------------------------------------------------
Robert Fry Engle (1942– ) was born in Syracuse, New York. He earned degrees in physics and economics at Williams College and Cornell and then worked at MIT and the University of California, San Diego, before moving to New York University Stern School of Business in 2000. He was awarded the 2003 Nobel Prize in Economics for research on autoregressive conditional heteroskedasticity and is a leading expert in time-series analysis, especially the analysis of financial markets.
------------------------------------------------------------------------------

References

Adkins, L. C., and R. C. Hill. 2011. Using Stata for Principles of Econometrics. 4th ed. Hoboken, NJ: Wiley.

Baum, C. F. 2000. sts15: Tests for stationarity of a time series. Stata Technical Bulletin 57: 36–39. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 356–360. College Station, TX: Stata Press.

Baum, C. F., and R. I. Sperling. 2000. sts15.1: Tests for stationarity of a time series: Update. Stata Technical Bulletin 58: 35–36. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 360–362. College Station, TX: Stata Press.

Baum, C. F., and V. L. Wiggins. 2000. sts16: Tests for long memory in a time series. Stata Technical Bulletin 57: 39–44.
This command supports the Huber/White/sandwich estimator of the variance using vce(robust). See [P] _robust, particularly Maximum likelihood estimators and Methods and formulas.

Missing data

ARCH allows missing data or missing observations but does not attempt to condition on the surrounding data. If a dynamic component (ε_t, ε_t², and/or σ_t²) cannot be computed, its priming value is substituted. If a covariate, the dependent variable, or the entire observation is missing, the observation does not enter the likelihood, and its dynamic components are set to their priming values for that observation. This is acceptable only asymptotically and should not be used with a great deal of missing data.

Robert Fry Engle (1942– ) was born in Syracuse, New York. He earned degrees in physics and economics at Williams College and Cornell and then worked at MIT and the University of California, San Diego, before moving to New York University Stern School of Business in 2000. He was awarded the 2003 Nobel Prize in Economics for research on autoregressive conditional heteroskedasticity and is a leading expert in time-series analysis, especially the analysis of financial markets.

References
Adkins, L. C., and R. C. Hill. 2011. Using Stata for Principles of Econometrics. 4th ed. Hoboken, NJ: Wiley.
Baum, C. F. 2000. sts15: Tests for stationarity of a time series. Stata Technical Bulletin 57: 36–39. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 356–360. College Station, TX: Stata Press.
Baum, C. F., and R. I. Sperling. 2000. sts15.1: Tests for stationarity of a time series: Update. Stata Technical Bulletin 58: 35–36. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 360–362. College Station, TX: Stata Press.
Baum, C. F., and V. L. Wiggins. 2000. sts16: Tests for long memory in a time series. Stata Technical Bulletin 57: 39–44. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 362–368. College Station, TX: Stata Press.
Becketti, S. 2013. Introduction to Time Series Using Stata. College Station, TX: Stata Press.
Berndt, E. K., B. H. Hall, R. E. Hall, and J. A. Hausman. 1974. Estimation and inference in nonlinear structural models. Annals of Economic and Social Measurement 3/4: 653–665.
Black, F. 1976. Studies of stock price volatility changes. Proceedings of the American Statistical Association, Business and Economics Statistics 177–181.
Bollerslev, T. 1986. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31: 307–327.
Bollerslev, T., R. Y. Chou, and K. F. Kroner. 1992. ARCH modeling in finance. Journal of Econometrics 52: 5–59.
Bollerslev, T., R. F. Engle, and D. B. Nelson. 1994. ARCH models. In Vol. 4 of Handbook of Econometrics, ed. R. F. Engle and D. L. McFadden. Amsterdam: Elsevier.
Bollerslev, T., and J. M. Wooldridge. 1992. Quasi-maximum likelihood estimation and inference in dynamic models with time-varying covariances. Econometric Reviews 11: 143–172.
Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University Press.
Diebold, F. X. 2003. The ET Interview: Professor Robert F. Engle. Econometric Theory 19: 1159–1193.
Ding, Z., C. W. J. Granger, and R. F. Engle. 1993. A long memory property of stock market returns and a new model. Journal of Empirical Finance 1: 83–106.
Enders, W. 2004. Applied Econometric Time Series. 2nd ed. New York: Wiley.
Engle, R. F. 1982. Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50: 987–1007.
———. 1990. Discussion: Stock volatility and the crash of '87. Review of Financial Studies 3: 103–106.
Engle, R. F., D. M. Lilien, and R. P. Robins. 1987. Estimating time varying risk premia in the term structure: The ARCH-M model. Econometrica 55: 391–407.
Glosten, L. R., R. Jagannathan, and D. E. Runkle. 1993. On the relation between the expected value and the volatility of the nominal excess return on stocks. Journal of Finance 48: 1779–1801.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Harvey, A. C. 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge University Press.
———. 1990. The Econometric Analysis of Time Series. 2nd ed. Cambridge, MA: MIT Press.
Higgins, M. L., and A. K. Bera. 1992. A class of nonlinear ARCH models. International Economic Review 33: 137–158.
Hill, R. C., W. E. Griffiths, and G. C. Lim. 2011. Principles of Econometrics. 4th ed. Hoboken, NJ: Wiley.
Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lütkepohl, and T.-C. Lee. 1985. The Theory and Practice of Econometrics. 2nd ed. New York: Wiley.
Kmenta, J. 1997. Elements of Econometrics. 2nd ed. Ann Arbor: University of Michigan Press.
Mandelbrot, B. B. 1963. The variation of certain speculative prices. Journal of Business 36: 394–419.
Nelson, D. B. 1991. Conditional heteroskedasticity in asset returns: A new approach. Econometrica 59: 347–370.
Pickup, M. 2015. Introduction to Time Series Analysis. Thousand Oaks, CA: Sage.
Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. 2007. Numerical Recipes: The Art of Scientific Computing. 3rd ed. New York: Cambridge University Press.
Stock, J. H., and M. W. Watson. 2011. Introduction to Econometrics. 3rd ed. Boston: Addison–Wesley.
Wooldridge, J. M. 2013. Introductory Econometrics: A Modern Approach. 5th ed. Mason, OH: South-Western.
Zakoian, J. M. 1994. Threshold heteroskedastic models. Journal of Economic Dynamics and Control 18: 931–955.

Also see
[TS] arch postestimation — Postestimation tools for arch
[TS] tsset — Declare data to be time-series data
[TS] arima — ARIMA, ARMAX, and other dynamic regression models
[TS] mgarch — Multivariate GARCH models
[R] regress — Linear regression
[U] 20 Estimation and postestimation commands

Title
arch postestimation — Postestimation tools for arch

    Postestimation commands    predict    margins    Remarks and examples    Also see

Postestimation commands

The following postestimation commands are available after arch:

Command            Description
---------------------------------------------------------------------------
estat ic           Akaike's and Schwarz's Bayesian information criteria
                     (AIC and BIC)
estat summarize    summary statistics for the estimation sample
estat vce          variance–covariance matrix of the estimators (VCE)
estimates          cataloging estimation results
forecast           dynamic forecasts and simulations
lincom             point estimates, standard errors, testing, and inference
                     for linear combinations of coefficients
lrtest             likelihood-ratio test
margins            marginal means, predictive margins, marginal effects, and
                     average marginal effects
marginsplot        graph the results from margins (profile plots,
                     interaction plots, etc.)
nlcom              point estimates, standard errors, testing, and inference
                     for nonlinear combinations of coefficients
predict            predictions, residuals, influence statistics, and other
                     diagnostic measures
predictnl          point estimates, standard errors, testing, and inference
                     for generalized predictions
test               Wald tests of simple and composite linear hypotheses
testnl             Wald tests of nonlinear hypotheses
---------------------------------------------------------------------------

predict

Description for predict

predict creates a new variable containing predictions such as expected values and residuals. All predictions are available as static one-step-ahead predictions or as dynamic multistep predictions, and you can control when dynamic predictions begin.

Menu for predict

Statistics > Postestimation

Syntax for predict

    predict [type] newvar [if] [in] [, statistic options]

statistic       Description
---------------------------------------------------------------------------
Main
  xb            predicted values for mean equation (the differenced series);
                  the default
  y             predicted values for the mean equation in y (the
                  undifferenced series)
  variance      predicted values for the conditional variance
  het           predicted values of the variance, considering only the
                  multiplicative heteroskedasticity
  residuals     residuals or predicted innovations
  yresiduals    residuals or predicted innovations in y (the undifferenced
                  series)
---------------------------------------------------------------------------
These statistics are available both in and out of sample; type
predict ... if e(sample) ... if wanted only for the estimation sample.

options                                Description
---------------------------------------------------------------------------
Options
  dynamic(time_constant)               how to handle the lags of y_t
  at(varname_ε|#_ε varname_σ²|#_σ²)    make static predictions
  t0(time_constant)                    set starting point for the recursions
                                         to time_constant
  structural                           calculate considering the structural
                                         component only
---------------------------------------------------------------------------
time_constant is a # or a time literal, such as td(1jan1995) or tq(1995q1),
etc.; see Conveniently typing SIF values in [D] datetime.
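For instance, the statistics above can be requested as follows. A minimal sketch, assuming a GARCH model like those fit in [TS] arch (the model specification is illustrative):

    . use http://www.stata-press.com/data/r14/wpi1
    . arch D.ln_wpi, arch(1) garch(1)
    . predict v, variance          // one-step conditional variance
    . predict eps, residuals       // predicted innovations
    . predict yres, yresiduals     // innovations in terms of undifferenced ln_wpi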
Options for predict

Six statistics can be computed by using predict after arch: the predictions of the mean equation (option xb, the default), the undifferenced predictions of the mean equation (option y), the predictions of the conditional variance (option variance), the predictions of the multiplicative heteroskedasticity component of variance (option het), the predictions of residuals or innovations (option residuals), and the predictions of residuals or innovations in terms of y (option yresiduals). Given the dynamic nature of ARCH models and because the dependent variable might be differenced, there are other ways of computing each statistic. We can use all the data on the dependent variable available right up to the time of each prediction (the default, which is often called a one-step prediction), or we can use the data up to a particular time, after which the predicted value of the dependent variable is used recursively to make later predictions (option dynamic()). Either way, we can consider or ignore the ARMA disturbance component, which is considered by default and is ignored if you specify the structural option. We might also be interested in predictions at certain fixed points where we specify the prior values of ε_t and σ_t² (option at()).

Main

xb, the default, calculates the predictions from the mean equation. If D.depvar is the dependent variable, these predictions are of D.depvar and not of depvar itself.

y specifies that predictions of depvar are to be made even if the model was specified for, say, D.depvar.

variance calculates predictions of the conditional variance σ̂_t².

het calculates predictions of the multiplicative heteroskedasticity component of variance.

residuals calculates the residuals. If no other options are specified, these are the predicted innovations ε_t; that is, they include any ARMA component. If the structural option is specified, these are the residuals from the mean equation, ignoring any ARMA terms; see structural below. The residuals are always from the estimated equation, which may have a differenced dependent variable; if depvar is differenced, they are not the residuals of the undifferenced depvar.

yresiduals calculates the residuals for depvar, even if the model was specified for, say, D.depvar. As with residuals, the yresiduals are computed from the model, including any ARMA component. If the structural option is specified, any ARMA component is ignored, and yresiduals are the residuals from the structural equation; see structural below.

Options

dynamic(time_constant) specifies how lags of y_t in the model are to be handled. If dynamic() is not specified, actual values are used everywhere lagged values of y_t appear in the model to produce one-step-ahead forecasts.

dynamic(time_constant) produces dynamic (also known as recursive) forecasts. time_constant specifies when the forecast is to switch from one step ahead to dynamic. In dynamic forecasts, references to y_t evaluate to the prediction of y_t for all periods at or after time_constant; they evaluate to the actual value of y_t for all prior periods.

dynamic(10), for example, would calculate predictions where any reference to y_t with t < 10 evaluates to the actual value of y_t and any reference to y_t with t ≥ 10 evaluates to the prediction of y_t. This means that one-step-ahead predictions would be calculated for t < 10 and dynamic predictions would be calculated thereafter. Depending on the lag structure of the model, the dynamic predictions might still refer to some actual values of y_t.

You may also specify dynamic(.) to have predict automatically switch from one-step-ahead to dynamic predictions at p + q, where p is the maximum AR lag and q is the maximum MA lag.
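A minimal sketch of the distinction, assuming quarterly tsset data and an already-fit arch model (the switch date is illustrative):

    . predict y1                         // one-step-ahead predictions throughout
    . predict yd, dynamic(tq(1990q1))    // one step ahead before 1990q1,
                                         // recursive (dynamic) thereafter
    . predict yd2, dynamic(.)            // switch automatically at p + q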
at(varname_ε|#_ε varname_σ²|#_σ²) makes static predictions. at() and dynamic() may not be specified together.

Specifying at() allows static evaluation of results for a given set of disturbances. This is useful, for instance, in generating the news response function. at() specifies two sets of values to be used for ε_t and σ_t², the dynamic components in the model. These specified values are treated as given. Also, any lagged values of depvar in the model are obtained from the real values of the dependent variable. All computations are based on actual data and the given values.

at() requires that you specify two arguments, which can be either a variable name or a number. The first argument supplies the values to be used for ε_t; the second supplies the values to be used for σ_t². If σ_t² plays no role in your model, the second argument may be specified as '.' to indicate missing.

t0(time_constant) specifies the starting point for the recursions to compute the predicted statistics; disturbances are assumed to be 0 for t < t0(). The default is to set t0() to the minimum t observed in the estimation sample, meaning that observations before that are assumed to have disturbances of 0.

t0() is irrelevant if structural is specified because then all observations are assumed to have disturbances of 0.

t0(5), for example, would begin recursions at t = 5. If your data were quarterly, you might instead type t0(tq(1961q2)) to obtain the same result.

Any ARMA component in the mean equation or GARCH term in the conditional-variance equation makes arch recursive and dependent on the starting point of the predictions. This includes one-step-ahead predictions.

structural makes the calculation considering the structural component only, ignoring any ARMA terms, and producing the steady-state equilibrium predictions.

margins

Description for margins

margins estimates margins of response for expected values.

Menu for margins

Statistics > Postestimation

Syntax for margins

    margins [marginlist] [, options]
    margins [marginlist], predict(statistic ...) [predict(statistic ...) ...] [options]

statistic       Description
---------------------------------------------------------------------------
  xb            predicted values for mean equation (the differenced series);
                  the default
  y             predicted values for the mean equation in y (the
                  undifferenced series)
  variance      predicted values for the conditional variance
  het           predicted values of the variance, considering only the
                  multiplicative heteroskedasticity
  residuals     not allowed with margins
  yresiduals    not allowed with margins
---------------------------------------------------------------------------

Statistics not allowed with margins are functions of stochastic quantities other than e(b).

For the full syntax, see [R] margins.

Remarks and examples

Example 1

Continuing with our EGARCH model example (example 3) in [TS] arch, we can see that predict, at() calculates σ_t² given a set of specified innovations (ε_t, ε_{t−1}, ...) and prior conditional variances (σ_{t−1}², σ_{t−2}², ...). The syntax is

    . predict newvar, variance at(epsilon sigma2)

epsilon and sigma2 are either variables or numbers. Using sigma2 is a little tricky because you specify values of σ_t², which predict is supposed to predict.
predict does not simply copy variable sigma2 into newvar but uses the lagged values contained in sigma2 to produce the predicted value of σ_t². It does this for all t, and those results are saved in newvar. (If you are interested in dynamic predictions of σ_t², see Options for predict.)

We will generate predictions for σ_t², assuming that the lagged values of σ_t² are 1, and we will vary ε_t from −4 to 4. First, we will create variable et containing ε_t, and then we will create and graph the predictions:

    . generate et = (_n-64)/15
    . predict sigma2, variance at(et 1)
    . line sigma2 et in 2/l, m(i) c(l) title(News response function)

[Graph omitted: "News response function"; the one-step conditional variance sigma2 (0 to 2.5) plotted against et (−4 to 4)]

The positive asymmetry does indeed dominate the shape of the news response function. In fact, the response is a monotonically increasing function of news. The form of the response function shows that, for our simple model, only positive, unanticipated price increases have the destabilizing effect that we observe as larger conditional variances.

Example 2

Continuing with our ARCH model with constraints example (example 6) in [TS] arch, using lincom we can recover the α parameter from the original specification.

    . lincom [ARCH]l1.arch/.4
     ( 1)  2.5*[ARCH]L.arch = 0

        D.ln_wpi |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+---------------------------------------------------------------
             (1) |   .5450344   .1844468     2.95   0.003     .1835253    .9065436

Any arch parameter could be used to produce an identical estimate.

Also see
[TS] arch — Autoregressive conditional heteroskedasticity (ARCH) family of estimators
[U] 20 Estimation and postestimation commands

Title
arfima — Autoregressive fractionally integrated moving-average models

    Description    Quick start    Menu    Syntax    Options
    Remarks and examples    Stored results    Methods and formulas
    References    Also see

Description

arfima estimates the parameters of autoregressive fractionally integrated moving-average (ARFIMA) models. Long-memory processes are stationary processes whose autocorrelation functions decay more slowly than those of short-memory processes. The ARFIMA model provides a parsimonious parameterization of long-memory processes that nests the autoregressive moving-average (ARMA) model, which is widely used for short-memory processes. By allowing for fractional degrees of integration, the ARFIMA model also generalizes the autoregressive integrated moving-average (ARIMA) model with integer degrees of integration. See [TS] arima for ARMA and ARIMA parameter estimation.
Quick start

Autoregressive fractionally integrated moving-average model for y with regressor x using tsset data
    arfima y x

Add autoregressive components of orders 1 and 2 and a moving-average component of order 4
    arfima y x, ar(1 2) ma(4)

ARMA model for y with autoregressive components of orders 1 and 2
    arfima y, ar(1 2) smemory

Menu

Statistics > Time series > ARFIMA models

Syntax

    arfima depvar [indepvars] [if] [in] [, options]

options                  Description
---------------------------------------------------------------------------
Model
  noconstant             suppress constant term
  ar(numlist)            autoregressive terms
  ma(numlist)            moving-average terms
  smemory                estimate short-memory model without fractional
                           integration
  mle                    maximum likelihood estimates; the default
  mpl                    maximum modified-profile-likelihood estimates
  constraints(numlist)   apply specified linear constraints
  collinear              do not drop collinear variables

SE/Robust
  vce(vcetype)           vcetype may be oim or robust

Reporting
  level(#)               set confidence level; default is level(95)
  nocnsreport            do not display constraints
  display_options        control columns and column formats, row spacing,
                           line width, display of omitted variables and base
                           and empty cells, and factor-variable labeling

Maximization
  maximize_options       control the maximization process; seldom used

  coeflegend             display legend instead of statistics
---------------------------------------------------------------------------
You must tsset your data before using arfima; see [TS] tsset.
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
depvar and indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, fp, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.

Options

Model

noconstant; see [R] estimation options.

ar(numlist) specifies the autoregressive (AR) terms to be included in the model. An AR(p), p ≥ 1, specification would be ar(1/p). This model includes all lags from 1 to p, but not all lags need to be included. For example, the specification ar(1 p) would specify an AR(p) with only lags 1 and p included, setting all the other AR lag parameters to 0.

ma(numlist) specifies the moving-average terms to be included in the model. These are the terms for the lagged innovations (white-noise disturbances). ma(1/q), q ≥ 1, specifies an MA(q) model, but like the ar() option, not all lags need to be included.

smemory causes arfima to fit a short-memory model with d = 0. This option causes arfima to estimate the parameters of an ARMA model by a method that is asymptotically equivalent to that produced by arima; see [TS] arima.

mle causes arfima to estimate the parameters by maximum likelihood. This method is the default.

mpl causes arfima to estimate the parameters by maximum modified profile likelihood (MPL). The MPL estimator of the fractional-difference parameter has less small-sample bias than the maximum likelihood estimator when there are covariates in the model. mpl may only be specified when there is a constant term or indepvars in the model, and it may not be combined with the mle option.

constraints(numlist), collinear; see [R] estimation options.

SE/Robust

vce(vcetype) specifies the type of standard error reported, which includes types that are robust to some kinds of misspecification (robust) and that are derived from asymptotic theory (oim); see [R] vce_option. Options vce(robust) and mpl may not be combined.
Reporting

level(#), nocnsreport; see [R] estimation options.

display_options: noci, nopvalues, noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.

Maximization

maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace, gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#), nrtolerance(#), gtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize for all options. Some special points for arfima's maximize_options are listed below.

technique(algorithm_spec) sets the optimization algorithm. The default algorithm is BFGS; BHHH is not allowed. See [R] maximize for a description of the available optimization algorithms. You can specify multiple optimization methods. For example, technique(bfgs 10 nr) requests that the optimizer perform 10 BFGS iterations and then switch to Newton–Raphson until convergence.

iterate(#) sets the maximum number of iterations. When the maximization is not going well, set the maximum number of iterations to the point where the optimizer appears to be stuck and inspect the estimation results at that point.

from(matname) allows you to specify starting values for the model parameters in a row vector. We recommend that you use the iterate(0) option, retrieve the initial estimates from e(b), and modify these elements; a sketch of this workflow follows below.

The following option is available with arfima but is not shown in the dialog box:

coeflegend; see [R] estimation options.
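A minimal sketch of the starting-values workflow just described; the model and the element being adjusted are illustrative, and the element names and positions in e(b) are assumptions to confirm with arfima, coeflegend or matrix list:

    . arfima y, ar(1) iterate(0)    // evaluate at the default starting values only
    . matrix b0 = e(b)              // retrieve the initial parameter vector
    . * hypothetical: nudge one element as a new starting value; confirm
    . * names/positions with -arfima, coeflegend- or -matrix list b0-
    . matrix b0[1,1] = 0.9*b0[1,1]
    . arfima y, ar(1) from(b0)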
Remarks and examples

Long-memory processes are stationary processes whose autocorrelation functions decay more slowly than those of short-memory processes. Because the autocorrelations die out so slowly, long-memory processes display a type of long-run dependence. The autoregressive fractionally integrated moving-average (ARFIMA) model provides a parsimonious parameterization of long-memory processes. This parameterization nests the autoregressive moving-average (ARMA) model, which is widely used for short-memory processes. The ARFIMA model also generalizes the autoregressive integrated moving-average (ARIMA) model with integer degrees of integration.

ARFIMA models provide a solution for the tendency to overdifference stationary series that exhibit long-run dependence. In the ARIMA approach, a nonstationary time series is differenced d times until the differenced series is stationary, where d is an integer. Such series are said to be integrated of order d, denoted I(d), with no differencing, I(0), corresponding to a stationary series. Many series exhibit too much dependence to be I(0) but are not I(1), and ARFIMA models are designed to represent these series. The ARFIMA model allows for a continuum of fractional differences, −0.5 < d < 0.5. The generalization to fractional differences allows the ARFIMA model to handle processes that are neither I(0) nor I(1), to test for overdifferencing, and to model long-run effects that only die out at long horizons.

Technical note

An ARIMA model for the series y_t is given by

$$ \rho(L)(1-L)^d y_t = \theta(L)\epsilon_t \tag{1} $$

where ρ(L) = (1 − ρ₁L − ρ₂L² − ··· − ρ_p L^p) is the autoregressive (AR) polynomial in the lag operator L; Ly_t = y_{t−1}; θ(L) = (1 + θ₁L + θ₂L² + ··· + θ_q L^q) is the moving-average (MA) lag polynomial; ε_t is the independent and identically distributed innovation term; and d is the integer number of differences required to make y_t stationary. An ARFIMA model is also specified by (1) with the generalization that −0.5 < d < 0.5. Series with d ≥ 0.5 are handled by differencing and subsequent ARFIMA modeling.

Because long-memory processes are stationary, one might be tempted to approximate the processes with many terms in an ARMA model. But these approximate models are difficult to fit and to interpret because ARMA models with many terms are difficult to estimate and the ARMA parameterization has an inherent short-run nature. In contrast, the ARFIMA model has the d parameter for the long-run dependence and ARMA parameters for short-run dependence. Using different parameters for different types of dependence facilitates estimation and interpretation, as discussed by Sowell (1992a).

Technical note

An ARFIMA model specifies a fractionally integrated ARMA process. Formally, the ARFIMA model specifies that

$$ y_t = (1-L)^{-d}\,\{\rho(L)\}^{-1}\theta(L)\,\epsilon_t $$

The short-run ARMA process ρ(L)^{−1}θ(L)ε_t captures the short-run effects, and the long-run effects are captured by fractionally integrating the short-run ARMA process.

Essentially, the fractional-integration parameter d captures the long-run effects, and the ARMA parameters capture the short-run effects. Having separate parameters for short-run and long-run effects makes the ARFIMA model more flexible and easier to interpret than the ARMA model. After estimating the ARFIMA parameters, the short-run effects are obtained by setting d = 0, whereas the long-run effects use the estimated value for d. The short-run effects describe the behavior of the fractionally differenced process (1 − L)^d y_t, whereas the long-run effects describe the behavior of the fractionally integrated y_t.

ARFIMA models have been useful in fields as diverse as hydrology and economics. Long-memory processes were first introduced in hydrology by Hurst (1951). Hosking (1981), in hydrology, and Granger and Joyeux (1980), in economics, independently discovered the ARFIMA representation of long-memory processes. Beran (1994), Baillie (1996), and Palma (2007) provide good introductions to long-memory processes and ARFIMA models.
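To see what the fractional-difference operator does, a minimal Mata sketch of the first few coefficients of (1 − L)^d, computed from the recursion π₀ = 1, π_j = π_{j−1}(j − 1 − d)/j, so that π₁ = −d, π₂ = d(d − 1)/2, and so on (the value of d is an arbitrary illustration):

    mata:
    // first weights pi_j in (1-L)^d = sum_{j>=0} pi_j L^j
    d = 0.4                          // illustrative value of the parameter
    w = J(1, 7, 1)                   // w[1] holds pi_0 = 1
    for (j=1; j<=6; j++) {
        w[j+1] = w[j]*(j-1-d)/j      // recursion given above
    }
    w
    end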
Example 1: Mount Campito tree ring data

Baillie (1996) discusses a time series of measurements of the widths of the annual rings of a Mount Campito Bristlecone pine. The series contains measurements on rings formed in the tree from 3436 BC to 1969 AD. Essentially, larger widths were good years for the tree and narrower widths were harsh years. We begin by plotting the time series.

    . use http://www.stata-press.com/data/r14/campito
    (Campito Mnt. tree ring data from 3435BC to 1969AD)
    . tsline width, xlabel(-3435(500)1969) ysize(2)

[Graph omitted: time-series plot of tree ring width (in 0.01 mm, 0 to 100) against year, −3435 to 1969]

Good years and bad years seem to run together, causing the appearance of local trends. The local trends are evidence of dependence, but they are not as pronounced as those in a nonstationary series.

We plot the autocorrelations for another view:

    . ac width, ysize(2)

[Graph omitted: autocorrelations of width through lag 40, with Bartlett's formula for MA(q) 95% confidence bands; the autocorrelations begin near 0.8 and decline gradually]

The autocorrelations start out high and decay very slowly. Granger and Joyeux (1980) show that the autocorrelations from an ARMA model decay exponentially, whereas the autocorrelations from an ARFIMA process decay at the much slower hyperbolic rate. Box, Jenkins, and Reinsel (2008) define short-memory processes as those whose autocorrelations decay exponentially fast and long-memory processes as those whose autocorrelations decay at the hyperbolic rate. The above plot of autocorrelations looks closer to hyperbolic than exponential.

Together, the above plots make us suspect that the series was generated by a long-memory process. We see evidence that the series is stationary but that the autocorrelations die out much more slowly than a short-memory process would predict.

Given that we believe the data were generated by a stationary process, we begin by fitting the data to an ARMA model. We begin by using a short-memory model because a comparison of the results highlights the advantages of using an ARFIMA model for a long-memory process.

    . arima width, ar(1/2) ma(1) technique(bhhh 4 nr)
    (setting optimization to BHHH)
    Iteration 0:   log likelihood = -18934.593
    Iteration 1:   log likelihood = -18914.337
    Iteration 2:   log likelihood = -18913.407
    Iteration 3:   log likelihood =  -18913.24
    (switching optimization to Newton-Raphson)
    Iteration 4:   log likelihood = -18913.214
    Iteration 5:   log likelihood = -18913.208
    Iteration 6:   log likelihood = -18913.208

    ARIMA regression

    Sample:  -3435 - 1969                     Number of obs     =       5405
                                              Wald chi2(3)      =  133686.46
    Log likelihood = -18913.21                Prob > chi2       =     0.0000

    ------------------------------------------------------------------------
            |                OIM
      width |     Coef.  Std. Err.      z   P>|z|    [95% Conf. Interval]
    --------+---------------------------------------------------------------
    width   |
      _cons |  42.45055    1.02142   41.56  0.000    40.44861      44.4525
    --------+---------------------------------------------------------------
    ARMA    |
         ar |
        L1. |  1.264367   .0253199   49.94  0.000    1.214741     1.313993
        L2. | -.2848827   .0227534  -12.52  0.000   -.3294785     -.240287
         ma |
        L1. | -.8066007   .0189699  -42.52  0.000   -.8437811    -.7694204
    --------+---------------------------------------------------------------
     /sigma |  8.005814   .0770004  103.97  0.000    7.854896     8.156732
    ------------------------------------------------------------------------
    Note: The test of the variance against zero is one sided, and the
          two-sided confidence interval is truncated at zero.

The estimated coefficients seem high in magnitude. We use estat aroots to investigate further.

    . estat aroots

    Eigenvalue stability condition
        Eigenvalue      Modulus
          .9709661      .970966
          .2934013      .293401
    All the eigenvalues lie inside the unit circle.
    AR parameters satisfy stability condition.

    Eigenvalue stability condition
        Eigenvalue      Modulus
         -.8066007      .806601
    All the eigenvalues lie inside the unit circle.
    MA parameters satisfy invertibility condition.

The roots of the AR polynomial are 0.971 and 0.293, and the root of the MA polynomial is −0.807; all of these are less than one in magnitude, indicating that the series is stationary and invertible but has a high level of persistence. See Hamilton (1994, 59) and [TS] estat aroots for details about computing and interpreting the roots of the polynomials from the estimated ARIMA coefficients.

Below we estimate the parameters of an ARFIMA model with only the fractional difference parameter and a constant.
    . arfima width
    Iteration 0:   log likelihood = -18918.219
    Iteration 1:   log likelihood =  -18916.84
    Iteration 2:   log likelihood = -18908.508
    Iteration 3:   log likelihood = -18908.508
    Iteration 4:   log likelihood =  -18907.29
    Iteration 5:   log likelihood = -18907.286
    Iteration 6:   log likelihood = -18907.279
    Iteration 7:   log likelihood = -18907.279  (backed up)
    Refining estimates:
    Iteration 0:   log likelihood = -18907.279
    Iteration 1:   log likelihood = -18907.279

    ARFIMA regression

    Sample:  -3435 - 1969                     Number of obs     =      5,405
                                              Wald chi2(1)      =    1864.43
    Log likelihood = -18907.279               Prob > chi2       =     0.0000

    ------------------------------------------------------------------------
            |                OIM
      width |     Coef.  Std. Err.      z   P>|z|    [95% Conf. Interval]
    --------+---------------------------------------------------------------
    width   |
      _cons |  44.01432   9.174318    4.80  0.000    26.03299     61.99565
    --------+---------------------------------------------------------------
    ARFIMA  |
          d |  .4468888   .0103497   43.18  0.000    .4266038     .4671737
    --------+---------------------------------------------------------------
    /sigma2 |  63.92927   1.229754   51.99  0.000      61.519     66.33955
    ------------------------------------------------------------------------
    Note: The test of the variance against zero is one sided, and the
          two-sided confidence interval is truncated at zero.

The estimate of d is large and statistically significant. The relative parsimony of the ARFIMA model is illustrated by the fact that the estimates of the standard deviation of the idiosyncratic errors are about the same in the 5-parameter ARMA model and the 3-parameter ARFIMA model.

Let's add an AR parameter to the above ARFIMA model:

    . arfima width, ar(1)
    Iteration 0:   log likelihood = -18910.997
    Iteration 1:   log likelihood = -18910.949
    Iteration 2:   log likelihood = -18908.158
    Iteration 3:   log likelihood = -18907.248
    Iteration 4:   log likelihood = -18907.233
    Iteration 5:   log likelihood = -18907.233  (backed up)
    Iteration 6:   log likelihood = -18907.233  (backed up)
    Refining estimates:
    Iteration 0:   log likelihood = -18907.233
    Iteration 1:   log likelihood = -18907.233

    ARFIMA regression

    Sample:  -3435 - 1969                     Number of obs     =      5,405
                                              Wald chi2(2)      =    1875.34
    Log likelihood = -18907.233               Prob > chi2       =     0.0000

    ------------------------------------------------------------------------
            |                OIM
      width |     Coef.  Std. Err.      z   P>|z|    [95% Conf. Interval]
    --------+---------------------------------------------------------------
    width   |
      _cons |  43.98774   8.685171    5.06  0.000    26.96512     61.01036
    --------+---------------------------------------------------------------
    ARFIMA  |
         ar |
        L1. |  .0063323   .0209987    0.30  0.763   -.0348244      .047489
          d |  .4432471   .0158937   27.89  0.000     .412096     .4743981
    --------+---------------------------------------------------------------
    /sigma2 |  63.92915   1.229755   51.99  0.000    61.51887     66.33942
    ------------------------------------------------------------------------
    Note: The test of the variance against zero is one sided, and the
          two-sided confidence interval is truncated at zero.

That the estimated AR term is tiny and statistically insignificant indicates that the d parameter has accounted for all the dependence in the series.

As mentioned above, there is a sense in which the main advantages of an ARFIMA model over an ARMA model for long-memory processes are the relative parsimony of the ARFIMA parameterization and the ability of the ARFIMA parameterization to separate out the long-run effects from the short-run effects. If the true process were generated from an ARFIMA model, an ARMA model with many terms could approximate the process, but the terms make estimation difficult and the lack of separate long-run and short-run parameters complicates interpretation.

This example highlights the relative parsimony of the ARFIMA model. In the examples below, we illustrate the advantages of having separate parameters for long-run and short-run effects.

Technical note

You may be wondering what long-run effects can be produced by a model for stationary processes. Because the autocorrelations of a long-memory process die out so slowly, the spectral density becomes infinite as the frequency goes to 0, and the impulse–response functions die out at a much slower rate.
The spectral density of a process describes the relative contributions of random components at different frequencies to the variance of the process, with the low-frequency components corresponding to long-run effects. See [TS] psdensity for an introduction to estimating and interpreting spectral densities implied by the estimated parameters of parametric models.

Granger and Joyeux (1980) motivate ARFIMA models by noting that their implied spectral densities are finite except at frequency 0 with 0 < d < 0.5, whereas stationary ARMA models have finite spectral densities at all frequencies. Granger and Joyeux (1980) argue that the ability of ARFIMA models to capture this long-range dependence, which cannot be captured by stationary ARMA models, is an important advantage of ARFIMA models over ARMA models when modeling long-memory processes.

Impulse–response functions are the coefficients on the infinite-order MA representation of a process, and they describe how a shock feeds through the dynamic system. If the process is stationary, the coefficients decay to 0 and they sum to a finite constant. As expected, the coefficients from an ARFIMA model die out at a slower rate than those from an ARMA model. Because the ARMA terms model the short-run effects and the d parameter models the long-run effects, an ARFIMA model specifies both a short-run impulse–response function and a long-run impulse–response function. When an ARMA model is used to approximate a long-memory model, the ARMA impulse–response-function coefficients confound the two effects.

Example 2

In this example, we model the log of the monthly levels of carbon dioxide above Mauna Loa, Hawaii. To remove the seasonality, we model the twelfth seasonal difference of the log of the series. This example illustrates that the ARFIMA model parameterizes long-run and short-run effects, whereas the ARMA model confounds the two effects. (Sowell [1992a] discusses this point in greater depth.)

We begin by fitting the series to an ARMA model with an AR(1) term and an MA(2).

    . use http://www.stata-press.com/data/r14/mloa, clear
    . arima S12.log, ar(1) ma(2)
    (setting optimization to BHHH)
    Iteration 0:   log likelihood =  2000.9262
    Iteration 1:   log likelihood =  2001.5484
    Iteration 2:   log likelihood =  2001.5637
    Iteration 3:   log likelihood =  2001.5641
    Iteration 4:   log likelihood =  2001.5641

    ARIMA regression

    Sample:  1960m1 - 1990m12                 Number of obs     =        372
                                              Wald chi2(2)      =     500.41
    Log likelihood = 2001.564                 Prob > chi2       =     0.0000

    ------------------------------------------------------------------------
            |                OPG
    S12.log |     Coef.  Std. Err.      z   P>|z|    [95% Conf. Interval]
    --------+---------------------------------------------------------------
    log     |
      _cons |  .0036754   .0002475   14.85  0.000    .0031903     .0041605
    --------+---------------------------------------------------------------
    ARMA    |
         ar |
        L1. |  .7354346   .0357715   20.56  0.000    .6653237     .8055456
         ma |
        L2. |  .1353086   .0513156    2.64  0.008    .0347319     .2358853
    --------+---------------------------------------------------------------
     /sigma |  .0011129   .0000401   27.77  0.000    .0010344     .0011914
    ------------------------------------------------------------------------
    Note: The test of the variance against zero is one sided, and the
          two-sided confidence interval is truncated at zero.

All the parameters are statistically significant, and they indicate a high degree of dependence.

Below we nest the previously fit ARMA model into an ARFIMA model.
    . arfima S12.log, ar(1) ma(2)
    Iteration 0:   log likelihood =  2006.0757
    Iteration 1:   log likelihood =  2006.0774
    Iteration 2:   log likelihood =  2006.0775
    Iteration 3:   log likelihood =  2006.0804
    Iteration 4:   log likelihood =  2006.0805
    Refining estimates:
    Iteration 0:   log likelihood =  2006.0805  (backed up)
    Iteration 1:   log likelihood =  2006.0805  (backed up)

    ARFIMA regression

    Sample:  1960m1 - 1990m12                 Number of obs     =        372
                                              Wald chi2(3)      =     248.88
    Log likelihood = 2006.0805                Prob > chi2       =     0.0000

    ------------------------------------------------------------------------
            |                OIM
    S12.log |     Coef.  Std. Err.      z   P>|z|    [95% Conf. Interval]
    --------+---------------------------------------------------------------
    S12.log |
      _cons |   .003616   .0012968    2.79  0.005    .0010743     .0061578
    --------+---------------------------------------------------------------
    ARFIMA  |
         ar |
        L1. |  .2160894   .1015547    2.13  0.033    .0170458      .415133
         ma |
        L2. |  .1633916   .0516905    3.16  0.002    .0620801     .2647031
          d |  .4042573   .0805417    5.02  0.000    .2463985      .562116
    --------+---------------------------------------------------------------
    /sigma2 |  1.20e-06   8.84e-08   13.63  0.000    1.03e-06     1.38e-06
    ------------------------------------------------------------------------
    Note: The test of the variance against zero is one sided, and the
          two-sided confidence interval is truncated at zero.

All the parameters are statistically significant at the 5% level. That the confidence interval for the fractional-difference parameter d includes numbers greater than 0.5 is evidence that the series may be nonstationary. Alternatively, we proceed as if the series is stationary, and the wide confidence interval for d reflects the difficulty of fitting a complicated dynamic model with only 372 observations.

With the above caveat, we can now proceed to compare the interpretations of the ARMA and ARFIMA estimates. We compare these estimates in terms of their implied spectral densities. The spectral density of a stationary time series describes the relative importance of components at different frequencies. See [TS] psdensity for an introduction to spectral densities.

Below we quietly refit the ARMA model and use psdensity to estimate the parametric spectral density implied by the ARMA parameter estimates.

    . quietly arima S12.log, ar(1) ma(2)
    . psdensity d_arma omega1

The psdensity command above put the estimated ARMA spectral density into the new variable d_arma at the frequencies stored in the new variable omega1.

Below we quietly refit the ARFIMA model and use psdensity to estimate the long-run parametric spectral density and then the short-run parametric spectral density implied by the ARFIMA parameter estimates. The long-run estimates use the estimated d, and the short-run estimates set d to 0 (as is implied by specifying the smemory option). The long-run estimates describe the fractionally integrated series, and the short-run estimates describe the fractionally differenced series.

    . quietly arfima S12.log, ar(1) ma(2)
    . psdensity d_arfima omega2
    . psdensity ds_arfima omega3, smemory

Now that we have the ARMA estimates, the long-run ARFIMA estimates, and the short-run ARFIMA estimates, we graph them below.

    . line d_arma d_arfima omega1, name(lmem) nodraw
    . line d_arma ds_arfima omega1, name(smem) nodraw
    . graph combine lmem smem, cols(1) xcommon

[Graph omitted: two panels plotted against frequency. Top panel: "ARMA spectral density" with "ARFIMA long-memory spectral density". Bottom panel: "ARMA spectral density" with "ARFIMA short-memory spectral density"]

The top graph contains a plot of the spectral densities implied by the ARMA parameter estimates and by the long-run ARFIMA parameter estimates. As discussed by Granger and Joyeux (1980), the two models imply different spectral densities for frequencies close to 0 when d > 0.
When d > 0, the spectral density implied by the ARFIMA estimates diverges to infinity, whereas the spectral density implied by the ARMA estimates remains finite at frequency 0 for stable ARMA processes. This difference reflects the ability of ARFIMA models to capture long-run effects that ARMA models only capture as the parameters approach those of an unstable model.

The bottom graph contains a plot of the spectral densities implied by the ARMA parameter estimates and by the short-run ARFIMA parameter estimates, which are the ARMA parameters for the fractionally differenced process. Comparing the two plots illustrates the ability of the short-run ARFIMA parameters to capture both low-frequency and high-frequency components in the fractionally differenced series. In contrast, the ARMA parameters captured only low-frequency components in the fractionally integrated series.

Comparing the ARFIMA and ARMA spectral densities in the two graphs illustrates that the additional fractional-difference parameter allows the ARFIMA model to identify both long-run and short-run effects, which the ARMA model confounds.

Technical note

As noted above, the spectral density of an ARFIMA process with d > 0 diverges to infinity as the frequency goes to 0. In contrast, the spectral density of an ARFIMA process with d < 0 is 0 at frequency 0.

The autocorrelation function of an ARFIMA process with d < 0 also decays at the slower hyperbolic rate. ARFIMA processes with d < 0 are sometimes called antipersistent because all the autocorrelations for lags greater than 0 are negative.

Hosking (1981), Baillie (1996), and others refer to ARFIMA processes with d < 0 as "intermediate memory" processes and ARFIMA processes with d > 0 as long-memory processes. Box, Jenkins, and Reinsel (2008, 429) define long-memory processes as those with the slower hyperbolic rate of decay, which includes ARFIMA processes with d < 0. We follow Box, Jenkins, and Reinsel (2008) and thus call ARFIMA processes for −0.5 < d < 0 and 0 < d < 0.5 long-memory processes.

Sowell (1992a) uses the properties of ARFIMA processes with d < 0 to derive tests for whether a series was generated by an I(1) process or an I(d) process with d < 1.

Example 3

In this example, we use arfima to test whether a series is nonstationary. More specifically, we test whether the series was generated by an I(1) process by testing whether the first difference of the series is overdifferenced.

We have monthly data on the log of the number of reported cases of mumps in New York City between January 1928 and December 1972. We believe that the series is stationary, after accounting for the monthly seasonal effects. We use an ARFIMA model for the differenced series to test the null hypothesis of nonstationarity. We use the confidence interval for the d parameter from an ARFIMA model for the first difference of the log of the series to perform the test. If the right-hand end of the 95% CI is less than 0, we conclude that the differenced series was overdifferenced, which implies that the original series was already stationary.

More formally, if y_t is I(1), then Δy_t = y_t − y_{t−1} must be I(0). If Δy_t is I(d) with d < 0, then Δy_t is overdifferenced and y_t is I(d) with d < 1.

We use seasonal indicators to account for the seasonal effects. In the output below, we specify the mpl option to use the MPL estimator, which is less biased in the presence of covariates.
arfima computes the maximum likelihood estimates (MLE) for the parameters of this stationary and invertible Gaussian process. Alternatively, the maximum MPL estimates may be computed. See Methods and formulas for a description of these two estimation techniques, but suffice it to say that the ML estimates of d are biased in the presence of exogenous variables, even the constant term, for small samples. The MPL estimator reduces this bias; see Hauser (1999) and Doornik and Ooms (2004).

    . use http://www.stata-press.com/data/r14/mumps2, clear
    (Hipel and Mcleod (1994), http://robjhyndman.com/tsdldata/epi/mumps.dat)
    . arfima D.log i.month, ma(1 2) mpl
    Iteration 0:   log modified profile likelihood = 53.766763
    Iteration 1:   log modified profile likelihood = 54.388641
    Iteration 2:   log modified profile likelihood = 54.934726  (backed up)
    Iteration 3:   log modified profile likelihood = 54.937524  (backed up)
    Iteration 4:   log modified profile likelihood = 55.002186
    Iteration 5:   log modified profile likelihood =  55.20462
    Iteration 6:   log modified profile likelihood = 55.205939
    Iteration 7:   log modified profile likelihood = 55.205949
    Iteration 8:   log modified profile likelihood = 55.205949
    Refining estimates:
    Iteration 0:   log modified profile likelihood = 55.205949
    Iteration 1:   log modified profile likelihood = 55.205949

    ARFIMA regression

    Sample:  1928m2 - 1972m6                  Number of obs     =        533
                                              Wald chi2(14)     =    1360.28
    Log modified profile likelihood = 55.205949
                                              Prob > chi2       =     0.0000

    -------------------------------------------------------------------------
              |                OIM
        D.log |     Coef.  Std. Err.      z   P>|z|    [95% Conf. Interval]
    ----------+--------------------------------------------------------------
    D.log     |
        month |
     February |  -.220719   .0428112   -5.16  0.000   -.3046275    -.1368105
        March |  .0314683   .0424718    0.74  0.459   -.0517749     .1147115
        April | -.2800296   .0460084   -6.09  0.000   -.3702043    -.1898548
          May | -.3703179   .0449932   -8.23  0.000   -.4585029    -.2821329
         June | -.4722035   .0446764  -10.57  0.000   -.5597676    -.3846394
         July | -.9613239   .0448375  -21.44  0.000   -1.049204     -.873444
       August | -1.063042   .0449272  -23.66  0.000   -1.151098    -.9749868
    September | -.7577301   .0452529  -16.74  0.000   -.8464242     -.669036
      October | -.3024251   .0462887   -6.53  0.000   -.3931494    -.2117009
     November | -.0115317   .0426911   -0.27  0.787   -.0952046     .0721413
     December |  .0247135   .0430401    0.57  0.566   -.0596435     .1090705
              |
        _cons |  .3656807   .0303215   12.06  0.000    .3062517     .4251096
    ----------+--------------------------------------------------------------
    ARFIMA    |
           ma |
          L1. |   .258056   .0684414    3.77  0.000    .1239133     .3921986
          L2. |  .1972011   .0506439    3.89  0.000    .0979409     .2964612
            d | -.2329426    .067336   -3.46  0.001   -.3649187    -.1009664
    -------------------------------------------------------------------------

We interpret the fact that the estimated 95% CI is strictly less than 0 to mean that the differenced series is overdifferenced, which implies that the original series is stationary.
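The bound can also be pulled from the results programmatically. A minimal sketch, assuming the r(table) matrix that estimation commands leave behind; the column label for d is our guess and should be verified with matrix list:

    . quietly arfima D.log i.month, ma(1 2) mpl
    . matrix T = r(table)
    . * "ARFIMA:d" is a hypothetical column name; verify with -matrix list T-
    . scalar d_ub = T[rownumb(T, "ul"), colnumb(T, "ARFIMA:d")]
    . display "95% CI upper bound for d = " d_ub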
Stored results

arfima stores the following in e():

Scalars
  e(N)               number of observations
  e(k)               number of parameters
  e(k_eq)            number of equations in e(b)
  e(k_dv)            number of dependent variables
  e(k_aux)           number of auxiliary parameters
  e(df_m)            model degrees of freedom
  e(ll)              log likelihood
  e(chi2)            χ²
  e(p)               significance
  e(s2)              idiosyncratic error variance estimate, if e(method) = mpl
  e(tmin)            minimum time
  e(tmax)            maximum time
  e(ar_max)          maximum AR lag
  e(ma_max)          maximum MA lag
  e(rank)            rank of e(V)
  e(ic)              number of iterations
  e(rc)              return code
  e(converged)       1 if converged, 0 otherwise
  e(constant)        0 if noconstant, 1 otherwise

Macros
  e(cmd)             arfima
  e(cmdline)         command as typed
  e(depvar)          name of dependent variable
  e(covariates)      list of covariates
  e(method)          mle or mpl
  e(eqnames)         names of equations
  e(title)           title in estimation output
  e(tmins)           formatted minimum time
  e(tmaxs)           formatted maximum time
  e(chi2type)        Wald; type of model χ² test
  e(vce)             vcetype specified in vce()
  e(vcetype)         title used to label Std. Err.
  e(ma)              lags for MA terms
  e(ar)              lags for AR terms
  e(technique)       maximization technique
  e(tech_steps)      number of iterations performed before switching techniques
  e(properties)      b V
  e(estat_cmd)       program used to implement estat
  e(predict)         program used to implement predict
  e(marginsok)       predictions allowed by margins
  e(marginsnotok)    predictions disallowed by margins
  e(asbalanced)      factor variables fvset as asbalanced
  e(asobserved)      factor variables fvset as asobserved

Matrices
  e(b)               coefficient vector
  e(Cns)             constraints matrix
  e(ilog)            iteration log (up to 20 iterations)
  e(gradient)        gradient vector
  e(V)               variance–covariance matrix of the estimators
  e(V_modelbased)    model-based variance

Functions
  e(sample)          marks estimation sample

Methods and formulas

Methods and formulas are presented under the following headings:

    Introduction
    The likelihood function
    The autocovariance function
    The profile likelihood
    The MPL

Introduction

We model an observed second-order stationary time series y_t, t = 1, ..., T, using the ARFIMA(p, d, q) model defined as

$$ \rho(L^p)(1-L)^d (y_t - \mathbf{x}_t\boldsymbol\beta) = \theta(L^q)\epsilon_t $$

where

$$ \rho(L^p) = 1 - \rho_1 L - \rho_2 L^2 - \cdots - \rho_p L^p $$

$$ \theta(L^q) = 1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q $$

$$ (1-L)^d = \sum_{j=0}^{\infty} (-1)^j \binom{d}{j} L^j
   = \sum_{j=0}^{\infty} \frac{\Gamma(j-d)}{\Gamma(j+1)\Gamma(-d)}\,L^j $$

and the lag operator is defined as L^j y_t = y_{t−j}, t = 1, ..., T and j = 1, ..., t − 1; ε_t ~ N(0, σ²); Γ() is the gamma function; and −0.5 < d < 0.5, d ≠ 0. The row vector x_t contains the exogenous variables specified as indepvars in the arfima syntax.

The process is stationary and invertible for −0.5 < d < 0.5; the roots of the AR polynomial, ρ(z) = 1 − ρ₁z − ρ₂z² − ··· − ρ_p z^p = 0, and the MA polynomial, θ(z) = 1 + θ₁z + θ₂z² + ··· + θ_q z^q = 0, lie outside the unit circle, and there are no common roots. When 0 < d < 0.5, the process has long memory in that the autocovariance function, γ_h, decays to 0 at a hyperbolic rate, such that Σ_{h=−∞}^{∞} |γ_h| = ∞. When −0.5 < d < 0, the autocovariance function also decays to 0 at a hyperbolic rate, but in this case Σ_{h=−∞}^{∞} |γ_h| < ∞. (As discussed in the text, some authors refer to ARFIMA processes with −0.5 < d < 0 as having intermediate memory, but we follow Box, Jenkins, and Reinsel [2008] and refer to them as long-memory processes.)

Granger and Joyeux (1980), Hosking (1981), Sowell (1992b), Sowell (1992a), Baillie (1996), and Palma (2007) provide overviews of long-memory processes, fractional integration, and introductions to ARFIMA models.

The likelihood function

Estimation of the ARFIMA parameters ρ, θ, d, β, and σ² is done by the method of maximum likelihood. The log Gaussian likelihood of y given parameter estimates η̂ = (ρ̂′, θ̂′, d̂, β̂′, σ̂²)′ is

$$
\ell(\mathbf{y}\,|\,\widehat{\boldsymbol\eta})
  = -\frac{1}{2}\left\{ T\log(2\pi) + \log|\widehat{\mathbf{V}}|
  + (\mathbf{y}-\mathbf{X}\widehat{\boldsymbol\beta})'\,\widehat{\mathbf{V}}^{-1}
    (\mathbf{y}-\mathbf{X}\widehat{\boldsymbol\beta}) \right\} \tag{2}
$$
Granger and Joyeux (1980), Hosking (1981), Sowell (1992b), Sowell (1992a), Baillie (1996), and Palma (2007) provide overviews of long-memory processes, fractional integration, and introductions to ARFIMA models. The likelihood function Estimation of the ARFIMA parameters ρ, θ, d, β and σ 2 is done by the method of maximum 0 b0, σ likelihood. The log Gaussian likelihood of y given parameter estimates η b = (b ρ0 , b θ , db, β b2 ) is `(y|b η) = − 1 b + (y − Xβ b −1 (y − Xβ b )0 V b) T log(2π) + log |V| 2 (2) 68 arfima — Autoregressive fractionally integrated moving-average models where the covariance matrix V has a Toeplitz structure  γ γ1 γ2 0 γ0 γ1  γ1 V= .. ..  .. . . . γT −1 γT −2 γT −3 . . . γT −1  . . . γT −2  ..  ..  . . ... γ0 Var(yt ) = γ0 , Cov(yt , yt−h ) = γh (for h = 1, . . . , t − 1), and t = 1, . . . , T (Sowell 1992b). We use the Durbin–Levinson algorithm (Palma 2007; Golub and Van Loan 2013) to factor and invert V. Using only the vector of autocovariances γ, the Durbin–Levinson algorithm will compute b −0.5 L b −1 (y − Xβ b ), where L is lower triangular and V = LDL0 and D = Diag(ν), b  = D νt = Var(yt ). The algorithm performs these computations without generating the T × T matrix L−1 . During optimization, we restrict the fractional-integration parameter to (−0.5, 0.5) using a logistic transform, d∗ = log {(x + 0.5)/(0.5 − x)}, so that the range of d∗ encompasses the real line. During the “Refining estimates” step, the fractional-integration parameter is transformed back to the restricted space, where we obtain its standard error from the observed information matrix. The autocovariance function Computation of the autocovariances γh is given by Sowell (1992b) with numerical enhancements by Doornik and Ooms (2003) and is reviewed by Palma (2007, sec. 3.2.4). We reproduce it here. The autocovariance of an ARFIMA(0, d, 0) process is γh∗ = σ 2 Γ(h + d) Γ(1 − 2d) Γ(1 − d)Γ(d) Γ(1 + h − d) where h = 0, 1, . . . . For ARFIMA(p, d, q ), we have γh = σ 2 q X p X ψ(i)ξj C(d, p + i − h, ρj ) (3) i=−q j=1 where ψ(i) = minX (q,q+i) θk θk−i k= max(0,i)  −1 p  Y  Y ξj = ρj (1 − ρi ρj ) (ρj − ρm )   i=1 m6=j and γh∗  2p ρ F (d + h, 1, 1 − d + h, ρ) + F (d − h, 1, 1 − d − h, ρ) − 1 σ2 F (·) is the hypergeometric series (Gradshteyn and Ryzhik 2007) C(d, h, ρ) = F (a, b, c, x) = 1 + ab a(a + 1)b(b + 1) 2 a(a + 1)(a + 2)b(b + 1)(b + 2) 3 x+ x + x + ··· c·1 c(c + 1) · 1 · 2 c(c + 1)(c + 2) · 1 · 2 · 3 The series recursions are evaluated backward as Doornik and Ooms (2003) emphasize. Doornik and Ooms (2003) also provide other computational enhancements, such as not dividing by ρj in (3). arfima — Autoregressive fractionally integrated moving-average models 69 The profile likelihood Doornik and Ooms (2003) show that the parameters σ 2 and β can be concentrated out of the likelihood. Using (2), the MLE for σ 2 is 1 b −1 (y − Xβ b )0 R b) (y − Xβ T σ b2 = where R = 1 σ2 V (4) and b −1 X)−1 X0 R b −1 y b = (X0 R β (5) is the weighted least-squares estimates for β. Substituting (4) into (2) results in the profile likelihood `p (y|b ηr ) = − T 2  1 + log(2π) + 1 b + log σ log |R| b2 T  We compute the MLEs using the profile likelihood for the reduced parameter set ηr = (ρ0 , θ0 , d). Equations (4) and (5) provide MLEs for σ 2 and β to create the full parameter vector η = (β0 , ρ0 , θ0 , d, σ 2 ). We follow with the “Refining estimates” step, optimizing on the log likelihood (1). 
The profile likelihood

Doornik and Ooms (2003) show that the parameters σ² and β can be concentrated out of the likelihood. Using (2), the MLE for σ² is

$$
\widehat{\sigma}^2 = \frac{1}{T}\,
  (\mathbf{y}-\mathbf{X}\widehat{\boldsymbol\beta})'\,\widehat{\mathbf{R}}^{-1}
  (\mathbf{y}-\mathbf{X}\widehat{\boldsymbol\beta}) \tag{4}
$$

where R = (1/σ²)V and

$$
\widehat{\boldsymbol\beta}
  = (\mathbf{X}'\widehat{\mathbf{R}}^{-1}\mathbf{X})^{-1}
    \mathbf{X}'\widehat{\mathbf{R}}^{-1}\mathbf{y} \tag{5}
$$

is the weighted least-squares estimate for β. Substituting (4) into (2) results in the profile likelihood

$$
\ell_p(\mathbf{y}\,|\,\widehat{\boldsymbol\eta}_r)
  = -\frac{T}{2}\left\{ 1 + \log(2\pi)
  + \frac{1}{T}\log|\widehat{\mathbf{R}}| + \log\widehat{\sigma}^2 \right\}
$$

We compute the MLEs using the profile likelihood for the reduced parameter set η_r = (ρ′, θ′, d). Equations (4) and (5) provide MLEs for σ² and β to create the full parameter vector η = (β′, ρ′, θ′, d, σ²). We follow with the "Refining estimates" step, optimizing on the log likelihood (2).

The refining step does not change the estimates; it produces the coefficient variance–covariance matrix from the observed information matrix. Using this profile likelihood prevents the use of the BHHH optimization method because there are no observation-level scores.

The MPL

The small-sample MLE for d can be biased when there are exogenous variables in the model. The MPL reduces this bias (Hauser 1999; Doornik and Ooms 2004). The mpl option will direct arfima to use this optimization criterion. The MPL is expressed as

$$
\ell_m(\mathbf{y}\,|\,\widehat{\boldsymbol\eta}_r)
  = -\frac{T}{2}\left\{ 1 + \log(2\pi) \right\}
  - \left( \frac{1}{2} - \frac{1}{T} \right) \log|\widehat{\mathbf{R}}|
  - \left( \frac{T-k-2}{2} \right) \log\widehat{\sigma}^2
  - \frac{1}{2}\log|\mathbf{X}'\widehat{\mathbf{R}}^{-1}\mathbf{X}|
$$

where k = rank(X) (An and Bloomfield 1993).

There is no MPL estimator for σ², and you will notice its absence from the coefficient table. However, the unbiased estimate assuming ARFIMA(0, 0, 0),

$$
\widetilde{\sigma}^2 = \frac{
  (\mathbf{y}-\mathbf{X}\widehat{\boldsymbol\beta})'\,\widehat{\mathbf{R}}^{-1}
  (\mathbf{y}-\mathbf{X}\widehat{\boldsymbol\beta})}{T-k}
$$

is stored in e() for postestimation computation of the forecast and residual root mean squared errors.

References
An, S., and P. Bloomfield. 1993. Cox and Reid's modification in regression models with correlated errors. Technical report, Department of Statistics, North Carolina State University, Raleigh, NC.
Baillie, R. T. 1996. Long memory processes and fractional integration in econometrics. Journal of Econometrics 73: 5–59.
Beran, J. 1994. Statistics for Long-Memory Processes. Boca Raton: Chapman & Hall/CRC.
Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. 2008. Time Series Analysis: Forecasting and Control. 4th ed. Hoboken, NJ: Wiley.
Doornik, J. A., and M. Ooms. 2003. Computational aspects of maximum likelihood estimation of autoregressive fractionally integrated moving average models. Computational Statistics & Data Analysis 42: 333–348.
———. 2004. Inference and forecasting for ARFIMA models with an application to US and UK inflation. Studies in Nonlinear Dynamics & Econometrics 8: 1–23.
Golub, G. H., and C. F. Van Loan. 2013. Matrix Computations. 4th ed. Baltimore: Johns Hopkins University Press.
Gradshteyn, I. S., and I. M. Ryzhik. 2007. Table of Integrals, Series, and Products. 7th ed. San Diego: Elsevier.
Granger, C. W. J., and R. Joyeux. 1980. An introduction to long-memory time series models and fractional differencing. Journal of Time Series Analysis 1: 15–29.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Hauser, M. A. 1999. Maximum likelihood estimators for ARMA and ARFIMA models: a Monte Carlo study. Journal of Statistical Planning and Inference 80: 229–255.
Hosking, J. R. M. 1981. Fractional differencing. Biometrika 68: 165–176.
Hurst, H. E. 1951. Long-term storage capacity of reservoirs. Transactions of the American Society of Civil Engineers 116: 770–779.
Palma, W. 2007. Long-Memory Time Series: Theory and Methods. Hoboken, NJ: Wiley.
Sowell, F. 1992a. Modeling long-run behavior with the fractional ARIMA model. Journal of Monetary Economics 29: 277–302.
———. 1992b. Maximum likelihood estimation of stationary univariate fractionally integrated time series models. Journal of Econometrics 53: 165–188.
Also see
[TS] arfima postestimation — Postestimation tools for arfima
[TS] tsset — Declare data to be time-series data
[TS] arima — ARIMA, ARMAX, and other dynamic regression models
[TS] sspace — State-space models
[U] 20 Estimation and postestimation commands

Title
arfima postestimation — Postestimation tools for arfima

    Postestimation commands    predict    margins    Remarks and examples
    Methods and formulas    References    Also see

Postestimation commands

The following postestimation commands are of special interest after arfima:

Command         Description
---------------------------------------------------------------------------
estat acplot    estimate autocorrelations and autocovariances
irf             create and analyze IRFs
psdensity       estimate the spectral density
---------------------------------------------------------------------------

The following standard postestimation commands are also available:

Command              Description
---------------------------------------------------------------------------
  contrast           contrasts and ANOVA-style joint tests of estimates
* estat ic           Akaike's and Schwarz's Bayesian information criteria
                       (AIC and BIC)
  estat summarize    summary statistics for the estimation sample
  estat vce          variance–covariance matrix of the estimators (VCE)
  estimates          cataloging estimation results
  forecast           dynamic forecasts and simulations
  lincom             point estimates, standard errors, testing, and
                       inference for linear combinations of coefficients
  lrtest             likelihood-ratio test
* margins            marginal means, predictive margins, marginal effects,
                       and average marginal effects
* marginsplot        graph the results from margins (profile plots,
                       interaction plots, etc.)
* nlcom              point estimates, standard errors, testing, and
                       inference for nonlinear combinations of coefficients
  predict            predictions, residuals, influence statistics, and other
                       diagnostic measures
* predictnl          point estimates, standard errors, testing, and
                       inference for generalized predictions
  pwcompare          pairwise comparisons of estimates
  test               Wald tests of simple and composite linear hypotheses
  testnl             Wald tests of nonlinear hypotheses
---------------------------------------------------------------------------
* estat ic, margins, marginsplot, nlcom, and predictnl are not appropriate
  after arfima, mpl.

predict

Description for predict

predict creates a new variable containing predictions such as expected values, fractionally differenced series, and innovations. All predictions are available as static one-step-ahead predictions, and the dependent variable is also available as a dynamic multistep prediction.

Menu for predict

Statistics > Postestimation

Syntax for predict

    predict [type] newvar [if] [in] [, statistic options]

statistic        Description
---------------------------------------------------------------------------
Main
  xb             predicted values; the default
  residuals      predicted innovations
  rstandard      standardized innovations
  fdifference    fractionally differenced series
---------------------------------------------------------------------------
These statistics are available both in and out of sample; type
predict ... if e(sample) ... if wanted only for the estimation sample.

options                  Description
---------------------------------------------------------------------------
Options
  rmse([type] newvar)    put the estimated root mean squared error of the
                           predicted statistic in a new variable; only
                           permitted with options xb and residuals
  dynamic(datetime)      forecast the time series starting at datetime;
                           only permitted with option xb
---------------------------------------------------------------------------
datetime is a # or a time literal, such as td(1jan1995) or tq(1995q1); see
[D] datetime.

Options for predict

Main

xb, the default, calculates the predictions for the level of depvar.

residuals calculates the predicted innovations.

rstandard calculates the standardized innovations.

fdifference calculates the fractionally differenced predictions of depvar.

Options

rmse([type] newvar) puts the root mean squared errors of the predicted statistics into the specified new variables. The root mean squared errors measure the variances due to the disturbances but do not account for estimation error. rmse() is only permitted with the xb and residuals options.

dynamic(datetime) specifies when predict starts producing dynamic forecasts. The specified datetime must be in the scale of the time variable specified in tsset, and the datetime must be inside a sample for which observations on the dependent variables are available. For example, dynamic(tq(2008q4)) causes dynamic predictions to begin in the fourth quarter of 2008, assuming that your time variable is quarterly; see [D] datetime. If the model contains exogenous variables, they must be present for the whole predicted sample. dynamic() may only be specified with xb.
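A minimal sketch of these options, assuming quarterly tsset data; the model, variable names, and switch date are illustrative:

    . arfima y, ar(1) ma(1)
    . predict yhat, xb rmse(se_yhat)          // predictions and their RMSEs
    . predict fd, fdifference                 // fractionally differenced series
    . predict ydyn, xb dynamic(tq(2005q1))    // dynamic forecasts from 2005q1 on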
margins

Description for margins

margins estimates margins of response for expected values.

Menu for margins

Statistics > Postestimation

Syntax for margins

    margins [marginlist] [, options]
    margins [marginlist], predict(statistic ...) [options]

statistic      Description
----------------------------------------------
xb             predicted values; the default
residuals      not allowed with margins
rstandard      not allowed with margins
fdifference    not allowed with margins

Statistics not allowed with margins are functions of stochastic quantities other than e(b). For the full syntax, see [R] margins.

Remarks and examples

Remarks are presented under the following headings:
    Forecasting after ARFIMA
    IRF results for ARFIMA

Forecasting after ARFIMA

We assume that you have already read [TS] arfima. In this section, we illustrate some of the features of predict after fitting an ARFIMA model using arfima.

Example 1

We have monthly data on the one-year Treasury bill secondary market rate imported from the Federal Reserve Economic Data (FRED) database using freduse; see Drukker (2006) and the Stata YouTube video Using freduse to download time-series data from the Federal Reserve for an introduction to freduse. Below we fit an ARFIMA model with two autoregressive terms and one moving-average term to the data.

. use http://www.stata-press.com/data/r14/tb1yr
(FRED, 1-year treasury bill; secondary market rate, monthly 1959-2001)
. arfima tb1yr, ar(1/2) ma(1)
Iteration 0:   log likelihood = -235.31856
Iteration 1:   log likelihood = -235.26104  (backed up)
Iteration 2:   log likelihood = -235.25974  (backed up)
Iteration 3:   log likelihood =  -235.2544  (backed up)
Iteration 4:   log likelihood = -235.13353
Iteration 5:   log likelihood = -235.13063
Iteration 6:   log likelihood = -235.12108
Iteration 7:   log likelihood = -235.11917
Iteration 8:   log likelihood = -235.11869
Iteration 9:   log likelihood = -235.11868
Refining estimates:
Iteration 0:   log likelihood = -235.11868
Iteration 1:   log likelihood = -235.11868
ARFIMA regression
Sample: 1959m7 - 2001m8                     Number of obs   =        506
                                            Wald chi2(4)    =    1864.15
Log likelihood = -235.11868                 Prob > chi2     =     0.0000

                         OIM
 tb1yr       Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
tb1yr
 _cons    5.496709   2.920357     1.88   0.060     -.2270864    11.2205
ARFIMA
 ar
  L1.     .2326107   .1136655     2.05   0.041      .0098304   .4553911
  L2.     .3885212   .0835665     4.65   0.000      .2247337   .5523086
 ma
  L1.     .7755848   .0669562    11.58   0.000      .6443531   .9068166
 d        .4606489   .0646542     7.12   0.000       .333929   .5873688
/sigma2   .1466495    .009232    15.88   0.000      .1285551   .1647439

Note: The test of the variance against zero is one sided, and the two-sided confidence interval is truncated at zero.
All the parameters are statistically significant at the 5% level, and they indicate a high degree of dependence in the series. In fact, the confidence interval for the fractional-difference parameter d indicates that the series may be nonstationary. We will proceed as if the series is stationary and suppose that it is fractionally integrated of order 0.46.

We begin our postestimation analysis by predicting the series in sample:

. predict ptb
(option xb assumed)

We continue by using the estimated fractional-difference parameter to fractionally difference the original series and by plotting the original series, the predicted series, and the fractionally differenced series. See [TS] arfima for a definition of the fractional-difference operator.

. predict fdtb, fdifference
. twoway tsline tb1yr ptb fdtb, legend(cols(1))

[Figure: time-series plot of the 1-Year Treasury Bill secondary market rate (tb1yr), the xb prediction, and the fractionally differenced series, 1960m1–2000m1]

The above graph shows that the in-sample predictions appear to track the original series well and that the fractionally differenced series looks much more like a stationary series than does the original.

Example 2

In this example, we use the above estimates to produce a dynamic forecast and a confidence interval for the forecast for the one-year treasury bill rate and plot them.

We begin by extending the dataset and using predict to put the dynamic forecast in the new ftb variable and the root mean squared error of the forecast in the new rtb variable. (As discussed in Methods and formulas, the root mean squared error of the forecast accounts for the idiosyncratic error but not for the estimation error.)

. tsappend, add(12)
. predict ftb, xb dynamic(tm(2001m9)) rmse(rtb)

Now we compute a 90% confidence interval around the dynamic forecast and plot the original series, the in-sample forecast, the dynamic forecast, and the confidence interval of the dynamic forecast.

. scalar z = invnormal(0.95)
. generate lb = ftb - z*rtb if month>=tm(2001m9)
(506 missing values generated)
. generate ub = ftb + z*rtb if month>=tm(2001m9)
(506 missing values generated)
. twoway tsline tb1yr ftb if month>tm(1998m12) ||
>        tsrline lb ub if month>=tm(2001m9),
>        legend(cols(1) label(3 "90% prediction interval"))

[Figure: tb1yr, the xb prediction made dynamic from 2001m9, and the 90% prediction interval, 1999m1–2002m1]

IRF results for ARFIMA

We assume that you have already read [TS] irf and [TS] irf create. In this section, we illustrate how to calculate the impulse–response function (IRF) of an ARFIMA model.

Example 3

Here we use the estimates obtained in example 1 to calculate the IRF of the ARFIMA model; see [TS] irf and [TS] irf create for more details about IRFs.

. irf create arfima, step(50) set(myirf)
(file myirf.irf created)
(file myirf.irf now active)
(file myirf.irf updated)
. irf graph irf

[Figure: impulse–response function of tb1yr with 95% CI over steps 0–50; graphs by irfname, impulse variable, and response variable]

The figure shows that a shock to tb1yr causes an initial spike in tb1yr, after which the impact of the shock starts decaying slowly. This behavior is characteristic of long-memory processes.
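If numerical values are preferred to the graph, the same results can be tabulated; a minimal sketch, assuming the myirf file created in example 3 is still active:

. irf table irf, irf(arfima)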
Methods and formulas

Denote by γ_h, h = 1, ..., t, the autocovariance function of the ARFIMA(p, d, q) process for two observations, y_t and y_{t−h}, h time periods apart. The covariance matrix V of the process of length T has a Toeplitz structure of

    \mathbf{V} = \begin{pmatrix}
    \gamma_0 & \gamma_1 & \gamma_2 & \cdots & \gamma_{T-1}\\
    \gamma_1 & \gamma_0 & \gamma_1 & \cdots & \gamma_{T-2}\\
    \vdots & \vdots & \vdots & \ddots & \vdots\\
    \gamma_{T-1} & \gamma_{T-2} & \gamma_{T-3} & \cdots & \gamma_0
    \end{pmatrix}

where the process variance is γ_0 = Var(y_t). We factor V = LDL′, where L is lower triangular and D = Diag(ν_t). The structure of L⁻¹ is of importance.

    \mathbf{L}^{-1} = \begin{pmatrix}
    1 & 0 & 0 & \cdots & 0 & 0\\
    -\tau_{1,1} & 1 & 0 & \cdots & 0 & 0\\
    -\tau_{2,2} & -\tau_{2,1} & 1 & \cdots & 0 & 0\\
    \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
    -\tau_{T-1,T-1} & -\tau_{T-1,T-2} & -\tau_{T-1,T-3} & \cdots & -\tau_{T-1,1} & 1
    \end{pmatrix}

Let z_t = y_t − x_t β. The best linear predictor of z_{t+1} based on z_1, z_2, ..., z_t is ẑ_{t+1} = Σ_{k=1}^{t} τ_{t,k} z_{t−k+1}. Define −τ_t = (−τ_{t,t}, −τ_{t,t−1}, ..., −τ_{t,1}) to be the tth row of L⁻¹ up to, but not including, the diagonal. Then τ_t = V_t⁻¹ γ_t, where V_t is the t × t upper left submatrix of V and γ_t = (γ_1, γ_2, ..., γ_t)′. Hence, the best linear predictor of the innovations is computed as ε̂ = L̂⁻¹ ẑ, and the one-step predictions are ŷ = ε̂ + Xβ̂. In practice, the computation is

    ŷ = L̂⁻¹(y − Xβ̂) + Xβ̂

where L̂ and V̂ are computed from the maximum likelihood estimates. We use the Durbin–Levinson algorithm (Palma 2007; Golub and Van Loan 2013) to factor V̂, invert L̂, and scale y − Xβ̂ using only the vector of estimated autocovariances γ̂.

The prediction error variances of the one-step predictions are computed recursively in the Durbin–Levinson algorithm. They are the ν_t elements in the diagonal matrix D computed from the Cholesky factorization of V. The recursive formula is ν_0 = γ_0, and ν_t = ν_{t−1}(1 − τ_{t,t}²).

Forecasting is carried out as described by Beran (1994, sec. 8.7), ẑ_{T+k} = γ̃′_k V̂⁻¹ ẑ, where γ̃_k = (γ̂_{T+k−1}, γ̂_{T+k−2}, ..., γ̂_k)′. The forecast mean squared error is computed as MSE(ẑ_{T+k}) = γ̂_0 − γ̃′_k V̂⁻¹ γ̃_k. Computation of V̂⁻¹ γ̃_k is carried out efficiently using algorithm 4.7.2 of Golub and Van Loan (2013).
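To make the recursion concrete, the following Mata sketch computes the τ coefficients and prediction-error variances ν from a vector of autocovariances. It illustrates the Durbin–Levinson algorithm described above and is not StataCorp's internal implementation; the function name dlev and its interface are hypothetical.

mata:
// Durbin-Levinson recursion: from autocovariances gamma (gamma[1] = gamma_0,
// gamma[2] = gamma_1, ...), fill in the coefficient matrix tau used to build
// L^{-1} and the prediction-error variances nu.  Assumes rows(gamma) >= 2.
void dlev(real colvector gamma, real matrix tau, real colvector nu)
{
    real scalar T, t, k, num
    T = rows(gamma)
    tau = J(T-1, T-1, 0)
    nu  = J(T, 1, 0)
    nu[1]    = gamma[1]                         // nu_0 = gamma_0
    tau[1,1] = gamma[2]/gamma[1]
    nu[2]    = nu[1]*(1 - tau[1,1]^2)           // nu_t = nu_{t-1}(1 - tau_{t,t}^2)
    for (t = 2; t <= T-1; t++) {
        num = gamma[t+1]                        // gamma_t ...
        for (k = 1; k <= t-1; k++) {            // ... minus sum of tau_{t-1,k}*gamma_{t-k}
            num = num - tau[t-1,k]*gamma[t-k+1]
        }
        tau[t,t] = num/nu[t]
        for (k = 1; k <= t-1; k++) {
            tau[t,k] = tau[t-1,k] - tau[t,t]*tau[t-1,t-k]
        }
        nu[t+1] = nu[t]*(1 - tau[t,t]^2)
    }
}
end

The tth row of tau then holds (τ_{t,t}, ..., τ_{t,1}) reversed into the layout used in L⁻¹ above, and nu holds ν_0, ..., ν_{T−1}.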
References

Beran, J. 1994. Statistics for Long-Memory Processes. Boca Raton: Chapman & Hall/CRC.
Drukker, D. M. 2006. Importing Federal Reserve economic data. Stata Journal 6: 384–386.
Golub, G. H., and C. F. Van Loan. 2013. Matrix Computations. 4th ed. Baltimore: Johns Hopkins University Press.
Palma, W. 2007. Long-Memory Time Series: Theory and Methods. Hoboken, NJ: Wiley.

Also see

[TS] arfima — Autoregressive fractionally integrated moving-average models
[TS] estat acplot — Plot parametric autocorrelation and autocovariance functions
[TS] irf — Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
[TS] psdensity — Parametric spectral density estimation after arima, arfima, and ucm
[U] 20 Estimation and postestimation commands

Title

arima — ARIMA, ARMAX, and other dynamic regression models

Description    Quick start    Menu    Syntax
Options    Remarks and examples    Stored results    Methods and formulas
References    Also see

Description

arima fits univariate models for a time series, where the disturbances are allowed to follow a linear autoregressive moving-average (ARMA) specification. When independent variables are included in the specification, such models are often called ARMAX models; and when independent variables are not specified, they reduce to Box–Jenkins autoregressive integrated moving-average (ARIMA) models in the dependent variable.

Quick start

AR(1) model using tsset data
    arima y, ar(1)
MA(1) model
    arima y, ma(1)
ARMA(2,1) model
    arima y, ar(1/2) ma(1)
Same as above
    arima y, arima(2,0,1)
As above, and take first difference of y and restrict estimation to years 1990 to 2010
    arima D.y if tin(1990,2010), ar(1/2) ma(1)
Same as above
    arima y if tin(1990,2010), arima(2,1,1)
Multiplicative SARIMA model with quarterly data
    arima y, arima(2,1,1) sarima(2,1,0,4)
ARMAX model with covariates x1 and x2, an AR(1) process, and robust standard errors
    arima y x1 x2, ar(1) vce(robust)

Menu

Statistics > Time series > ARIMA and ARMAX models

Syntax

Basic syntax for a regression model with ARMA disturbances
    arima depvar [indepvars], ar(numlist) ma(numlist)

Basic syntax for an ARIMA(p, d, q) model
    arima depvar, arima(#p,#d,#q)

Basic syntax for a multiplicative seasonal ARIMA(p, d, q) × (P, D, Q)s model
    arima depvar, arima(#p,#d,#q) sarima(#P,#D,#Q,#s)

Full syntax
    arima depvar [indepvars] [if] [in] [weight] [, options]

options                     Description
---------------------------------------------------------------------------
Model
 noconstant                 suppress constant term
 arima(#p,#d,#q)            specify ARIMA(p, d, q) model for dependent variable
 ar(numlist)                autoregressive terms of the structural model disturbance
 ma(numlist)                moving-average terms of the structural model disturbance
 constraints(constraints)   apply specified linear constraints
 collinear                  keep collinear variables
Model 2
 sarima(#P,#D,#Q,#s)        specify period-#s multiplicative seasonal ARIMA term
 mar(numlist, #s)           multiplicative seasonal autoregressive term; may be repeated
 mma(numlist, #s)           multiplicative seasonal moving-average term; may be repeated
Model 3
 condition                  use conditional MLE instead of full MLE
 savespace                  conserve memory during estimation
 diffuse                    use diffuse prior for starting Kalman filter recursions
 p0(# | matname)            use alternate prior for starting Kalman recursions; seldom used
 state0(# | matname)        use alternate state vector for starting Kalman filter recursions
SE/Robust
 vce(vcetype)               vcetype may be opg, robust, or oim
Reporting
 level(#)                   set confidence level; default is level(95)
 detail                     report list of gaps in time series
 nocnsreport                do not display constraints
 display options            control columns and column formats, row spacing, and line width
Maximization
 maximize options           control the maximization process; seldom used
 coeflegend                 display legend instead of statistics

You must tsset your data before using arima; see [TS] tsset.
depvar and indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, fp, rolling, statsby, and xi are allowed; see [U] 11.1.10 Prefix commands.
iweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Options

Model

noconstant; see [R] estimation options.

arima(#p,#d,#q) is an alternative, shorthand notation for specifying models with ARMA disturbances. The dependent variable and any independent variables are differenced #d times, and 1 through #p lags of autocorrelations and 1 through #q lags of moving averages are included in the model. For example, the specification

. arima D.y, ar(1/2) ma(1/3)

is equivalent to

. arima y, arima(2,1,3)

The latter is easier to write for simple ARMAX and ARIMA models, but if gaps in the AR or MA lags are to be modeled, or if different operators are to be applied to independent variables, the first syntax is required.

ar(numlist) specifies the autoregressive terms of the structural model disturbance to be included in the model. For example, ar(1/3) specifies that lags 1, 2, and 3 of the structural disturbance be included in the model; ar(1 4) specifies that lags 1 and 4 be included, perhaps to account for additive quarterly effects. If the model does not contain regressors, these terms can also be considered autoregressive terms for the dependent variable.

ma(numlist) specifies the moving-average terms to be included in the model. These are the terms for the lagged innovations (white-noise disturbances).

constraints(constraints), collinear; see [R] estimation options. If constraints are placed between structural model parameters and ARMA terms, the first few iterations may attempt steps into nonstationary areas. This process can be ignored if the final solution is well within the bounds of stationary solutions.

Model 2

sarima(#P,#D,#Q,#s) is an alternative, shorthand notation for specifying the multiplicative seasonal components of models with ARMA disturbances. The dependent variable and any independent variables are lag-#s seasonally differenced #D times, and 1 through #P seasonal lags of autoregressive terms and 1 through #Q seasonal lags of moving-average terms are included in the model. For example, the specification

. arima DS12.y, ar(1/2) ma(1/3) mar(1/2,12) mma(1/2,12)

is equivalent to

. arima y, arima(2,1,3) sarima(2,1,2,12)

mar(numlist, #s) specifies the lag-#s multiplicative seasonal autoregressive terms. For example, mar(1/2,12) requests that the first two lag-12 multiplicative seasonal autoregressive terms be included in the model.

mma(numlist, #s) specifies the lag-#s multiplicative seasonal moving-average terms. For example, mma(1 3,12) requests that the first and third (but not the second) lag-12 multiplicative seasonal moving-average terms be included in the model.

Model 3

condition specifies that conditional, rather than full, maximum likelihood estimates be produced. The presample values for ε_t and µ_t are taken to be their expected value of zero, and the estimate of the variance of ε_t is taken to be constant over the entire sample; see Hamilton (1994, 132). This estimation method is not appropriate for nonstationary series but may be preferable for long series or for models that have one or more long AR or MA lags. diffuse, p0(), and state0() have no meaning for models fit from the conditional likelihood and may not be specified with condition.

If the series is long and stationary and the underlying data-generating process does not have a long memory, estimates will be similar, whether estimated by unconditional maximum likelihood (the default), conditional maximum likelihood (condition), or maximum likelihood from a diffuse prior (diffuse). In small samples, however, results of conditional and unconditional maximum likelihood may differ substantially; see Ansley and Newbold (1980).
Whereas the default unconditional maximum likelihood estimates make the most use of sample information when all the assumptions of the model are met, Harvey (1989) and Ansley and Kohn (1985) argue for diffuse priors often, particularly in ARIMA models corresponding to an underlying structural model.

The condition or diffuse options may also be preferred when the model contains one or more long AR or MA lags; this avoids inverting potentially large matrices (see diffuse below).

When condition is specified, estimation is performed by the arch command (see [TS] arch), and more control of the estimation process can be obtained using arch directly. condition cannot be specified if the model contains any multiplicative seasonal terms.

savespace specifies that memory use be conserved by retaining only those variables required for estimation. The original dataset is restored after estimation. This option is rarely used and should be used only if there is not enough space to fit a model without the option. However, arima requires considerably more temporary storage during estimation than most estimation commands in Stata.

diffuse specifies that a diffuse prior (see Harvey 1989 or 1993) be used as a starting point for the Kalman filter recursions. Using diffuse, nonstationary models may be fit with arima (see the p0() option below; diffuse is equivalent to specifying p0(1e9)).

By default, arima uses the unconditional expected value of the state vector ξ_t (see Methods and formulas) and the mean squared error (MSE) of the state vector to initialize the filter. When the process is stationary, this corresponds to the expected value and expected variance of a random draw from the state vector and produces unconditional maximum likelihood estimates of the parameters. When the process is not stationary, however, this default is not appropriate, and the unconditional MSE cannot be computed. For a nonstationary process, another starting point must be used for the recursions.

In the absence of nonsample or presample information, diffuse may be specified to start the recursions from a state vector of zero and a state MSE matrix corresponding to an effectively infinite variance on this initial state. This method amounts to an uninformative and improper prior that is updated to a proper MSE as data from the sample become available; see Harvey (1989).

Nonstationary models may also correspond to models with infinite variance given a particular specification. This and other problems with nonstationary series make convergence difficult and sometimes impossible.

diffuse can also be useful if a model contains one or more long AR or MA lags. Computation of the unconditional MSE of the state vector (see Methods and formulas) requires construction and inversion of a square matrix that is of dimension {max(p, q + 1)}², where p and q are the maximum AR and MA lags, respectively. If q = 27, for example, we would require a 784 × 784 matrix. Estimation with diffuse does not require this matrix.

For large samples, there is little difference between using the default starting point and the diffuse starting point. Unless the series has a long memory, the initial conditions affect the likelihood of only the first few observations.
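For example, a minimal sketch, assuming a tsset series y that may be nonstationary: either of the following lines (equivalent, per the description above) starts the Kalman recursions from a diffuse prior.

. arima y, ar(1) ma(1) diffuse
. arima y, ar(1) ma(1) p0(1e9)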
p0(# | matname) is a rarely specified option that can be used for nonstationary series or when an alternate prior for starting the Kalman recursions is desired (see diffuse above for a discussion of the default starting point and Methods and formulas for background). matname specifies a matrix to be used as the MSE of the state vector for starting the Kalman filter recursions, P_{1|0}. Instead, one number, #, may be supplied, and the MSE of the initial state vector P_{1|0} will have this number on its diagonal and all off-diagonal values set to zero. This option may be used with nonstationary series to specify a larger or smaller diagonal for P_{1|0} than that supplied by diffuse. It may also be used with state0() when you believe that you have a better prior for the initial state vector and its MSE.

state0(# | matname) is a rarely used option that specifies an alternate initial state vector, ξ_{1|0} (see Methods and formulas), for starting the Kalman filter recursions. If # is specified, all elements of the vector are taken to be #. The default initial state vector is state0(0).

SE/Robust

vce(vcetype) specifies the type of standard error reported, which includes types that are robust to some kinds of misspecification (robust) and that are derived from asymptotic theory (oim, opg); see [R] vce_option.

For state-space models in general and ARMAX and ARIMA models in particular, the robust or quasi–maximum likelihood estimates (QMLEs) of variance are robust to symmetric nonnormality in the disturbances, including, as a special case, heteroskedasticity. The robust variance estimates are not generally robust to functional misspecification of the structural or ARMA components of the model; see Hamilton (1994, 389) for a brief discussion.

Reporting

level(#); see [R] estimation options.

detail specifies that a detailed list of any gaps in the series be reported, including gaps due to missing observations or missing data for the dependent variable or independent variables.

nocnsreport; see [R] estimation options.

display options: noci, nopvalues, vsquish, cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.

Maximization

maximize options: difficult, technique(algorithm spec), iterate(#), [no]log, trace, gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#), nrtolerance(#), gtolerance(#), nonrtolerance, and from(init specs); see [R] maximize for all options except gtolerance(), and see below for information on gtolerance().

These options are sometimes more important for ARIMA models than for most other maximum likelihood models because of potential convergence problems with ARIMA models, particularly if the specified model and the sample data imply a nonstationary model.

Several alternate optimization methods, such as Berndt–Hall–Hall–Hausman (BHHH) and Broyden–Fletcher–Goldfarb–Shanno (BFGS), are provided for ARIMA models. Although ARIMA models are not as difficult to optimize as ARCH models, their likelihoods are nevertheless generally not quadratic and often pose optimization difficulties; this is particularly true if a model is nonstationary or nearly nonstationary. Because each method approaches optimization differently, some problems can be successfully optimized by an alternate method when one method fails.

Setting technique() to something other than the default or BHHH changes the vcetype to vce(oim).
The following options are all related to maximization and are either particularly important in fitting ARIMA models or not available for most other estimators.

technique(algorithm spec) specifies the optimization technique to use to maximize the likelihood function.

    technique(bhhh) specifies the Berndt–Hall–Hall–Hausman (BHHH) algorithm.
    technique(dfp) specifies the Davidon–Fletcher–Powell (DFP) algorithm.
    technique(bfgs) specifies the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm.
    technique(nr) specifies Stata's modified Newton–Raphson (NR) algorithm.

You can specify multiple optimization methods. For example, technique(bhhh 10 nr 20) requests that the optimizer perform 10 BHHH iterations, switch to Newton–Raphson for 20 iterations, switch back to BHHH for 10 more iterations, and so on. The default for arima is technique(bhhh 5 bfgs 10).

gtolerance(#) specifies the tolerance for the gradient relative to the coefficients. When |g_i b_i| ≤ gtolerance() for all parameters b_i and the corresponding elements of the gradient g_i, the gradient tolerance criterion is met. The default gradient tolerance for arima is gtolerance(.05).

gtolerance(999) may be specified to disable the gradient criterion. If the optimizer becomes stuck with repeated "(backed up)" messages, the gradient probably still contains substantial values, but an uphill direction cannot be found for the likelihood. With this option, results can often be obtained, but whether the global maximum likelihood has been found is unclear. When the maximization is not going well, it is also possible to set the maximum number of iterations (see [R] maximize) to the point where the optimizer appears to be stuck and to inspect the estimation results at that point.

from(init specs) allows you to set the starting values of the model coefficients; see [R] maximize for a general discussion and syntax options. The standard syntax for from() accepts a matrix, a list of values, or coefficient name value pairs; see [R] maximize. arima also accepts from(armab0), which sets the starting value for all ARMA parameters in the model to zero prior to optimization.

ARIMA models may be sensitive to initial conditions and may have coefficient values that correspond to local maximums. The default starting values for arima are generally good, particularly in large samples for stationary series.
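Putting these options together, a minimal sketch, assuming a tsset series y for which the default optimization stalls with repeated "(backed up)" messages:

. arima D.y, ar(1) ma(1) technique(bhhh 10 nr 20) gtolerance(999) from(armab0)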
The following option is available with arima but is not shown in the dialog box:

coeflegend; see [R] estimation options.

Remarks and examples

Remarks are presented under the following headings:
    Introduction
    ARIMA models
    Multiplicative seasonal ARIMA models
    ARMAX models
    Dynamic forecasting
    Video example

Introduction

arima fits both standard ARIMA models that are autoregressive in the dependent variable and structural models with ARMA disturbances. Good introductions to the former models can be found in Box, Jenkins, and Reinsel (2008); Hamilton (1994); Harvey (1993); Newton (1988); Diggle (1990); and many others. The latter models are developed fully in Hamilton (1994) and Harvey (1989), both of which provide extensive treatment of the Kalman filter (Kalman 1960) and the state-space form used by arima to fit the models. Becketti (2013) discusses ARIMA models and Stata's arima command, and he devotes an entire chapter to explaining how the principles of ARIMA models are applied to real datasets in practice.

Consider a first-order autoregressive moving-average process. Then arima estimates all the parameters in the model

    y_t = x_t β + µ_t                          (structural equation)
    µ_t = ρ µ_{t−1} + θ ε_{t−1} + ε_t           (disturbance, ARMA(1, 1))

where
    ρ is the first-order autocorrelation parameter
    θ is the first-order moving-average parameter
    ε_t ~ i.i.d. N(0, σ²), meaning that ε_t is a white-noise disturbance

You can combine the two equations and write a general ARMA(p, q) in the disturbances process as

    y_t = x_t β + ρ_1(y_{t−1} − x_{t−1}β) + ρ_2(y_{t−2} − x_{t−2}β) + ··· + ρ_p(y_{t−p} − x_{t−p}β)
          + θ_1 ε_{t−1} + θ_2 ε_{t−2} + ··· + θ_q ε_{t−q} + ε_t

It is also common to write the general form of the ARMA model more succinctly using lag operator notation as

    ρ(L^p)(y_t − x_t β) = θ(L^q) ε_t           ARMA(p, q)

where

    ρ(L^p) = 1 − ρ_1 L − ρ_2 L² − ··· − ρ_p L^p
    θ(L^q) = 1 + θ_1 L + θ_2 L² + ··· + θ_q L^q

and L^j y_t = y_{t−j}. Multiplicative seasonal ARMAX and ARIMA models can also be fit.

For stationary series, full or unconditional maximum likelihood estimates are obtained via the Kalman filter. For nonstationary series, if some prior information is available, you can specify initial values for the filter by using state0() and p0() as suggested by Hamilton (1994) or assume an uninformative prior by using the diffuse option as suggested by Harvey (1989).

Missing data are allowed and are handled using the Kalman filter and methods suggested by Harvey (1989 and 1993); see Methods and formulas.

ARIMA models

Pure ARIMA models without a structural component do not have regressors and are often written as autoregressions in the dependent variable, rather than autoregressions in the disturbances from a structural equation. For example, an ARMA(1, 1) model can be written as

    y_t = α + ρ y_{t−1} + θ ε_{t−1} + ε_t        (1a)

Other than a scale factor for the constant term α, these models are equivalent to the ARMA in the disturbances formulation estimated by arima, though the latter are more flexible and allow a wider class of models.

To see this effect, replace x_t β in the structural equation above with a constant term β_0 so that

    y_t = β_0 + µ_t
        = β_0 + ρ µ_{t−1} + θ ε_{t−1} + ε_t
        = β_0 + ρ(y_{t−1} − β_0) + θ ε_{t−1} + ε_t
        = (1 − ρ)β_0 + ρ y_{t−1} + θ ε_{t−1} + ε_t        (1b)

Equations (1a) and (1b) are equivalent, with α = (1 − ρ)β_0, so whether we consider an ARIMA model as autoregressive in the dependent variable or in the disturbances is immaterial. Our illustration can easily be extended from the ARMA(1, 1) case to the general ARIMA(p, d, q) case.

Example 1: ARIMA model

Enders (2004, 87–93) considers an ARIMA model of the U.S. Wholesale Price Index (WPI) using quarterly data over the period 1960q1 through 1990q4. The simplest ARIMA model that includes differencing and both autoregressive and moving-average components is the ARIMA(1,1,1) specification. We can fit this model with arima by typing

. use http://www.stata-press.com/data/r14/wpi1
. arima wpi, arima(1,1,1)
(setting optimization to BHHH)
Iteration 0:   log likelihood = -139.80133
Iteration 1:   log likelihood =  -135.6278
Iteration 2:   log likelihood = -135.41838
Iteration 3:   log likelihood = -135.36691
Iteration 4:   log likelihood = -135.35892
(switching optimization to BFGS)
Iteration 5:   log likelihood = -135.35471
Iteration 6:   log likelihood = -135.35135
Iteration 7:   log likelihood = -135.35132
Iteration 8:   log likelihood = -135.35131
ARIMA regression
Sample: 1960q2 - 1990q4                     Number of obs   =        123
                                            Wald chi2(2)    =     310.64
Log likelihood = -135.3513                  Prob > chi2     =     0.0000

                         OPG
 D.wpi       Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
wpi
 _cons    .7498197   .3340968     2.24   0.025      .0950019   1.404637
ARMA
 ar
  L1.     .8742288   .0545435    16.03   0.000      .7673256    .981132
 ma
  L1.    -.4120458   .1000284    -4.12   0.000     -.6080979  -.2159938
/sigma    .7250436   .0368065    19.70   0.000      .6529042   .7971829

Note: The test of the variance against zero is one sided, and the two-sided confidence interval is truncated at zero.
Examining the estimation results, we see that the AR(1) coefficient is 0.874, the MA(1) coefficient is −0.412, and both are highly significant. The estimated standard deviation of the white-noise disturbance ε is 0.725.

This model also could have been fit by typing

. arima D.wpi, ar(1) ma(1)

The D. placed in front of the dependent variable wpi is the Stata time-series operator for differencing. Thus we would be modeling the first difference in WPI from the second quarter of 1960 through the fourth quarter of 1990, the first observation being lost to differencing. This second syntax allows a richer choice of models.

Example 2: ARIMA model with additive seasonal effects

After examining first-differences of WPI, Enders chose a model of differences in the natural logarithms to stabilize the variance in the differenced series. The raw data and first-difference of the logarithms are graphed below.

[Figure: US Wholesale Price Index (left panel) and US Wholesale Price Index, difference of logs (right panel), by quarterly date, 1960q1–1990q4]

On the basis of the autocorrelations, partial autocorrelations (see graphs below), and the results of preliminary estimations, Enders identified an ARMA model in the log-differenced series.

. ac D.ln_wpi, ylabels(-.4(.2).6)
. pac D.ln_wpi, ylabels(-.4(.2).6)

[Figure: autocorrelations of D.ln_wpi with Bartlett's formula for MA(q) 95% confidence bands (left) and partial autocorrelations of D.ln_wpi with 95% confidence bands, se = 1/sqrt(n) (right), lags 0–40]

In addition to an autoregressive term and an MA(1) term, an MA(4) term is included to account for a remaining quarterly effect. Thus the model to be fit is

    ∆ln(wpi_t) = β_0 + ρ_1{∆ln(wpi_{t−1}) − β_0} + θ_1 ε_{t−1} + θ_4 ε_{t−4} + ε_t

We can fit this model with arima and Stata's standard difference operator:

. arima D.ln_wpi, ar(1) ma(1 4)
(setting optimization to BHHH)
Iteration 0:   log likelihood =  382.67447
Iteration 1:   log likelihood =  384.80754
Iteration 2:   log likelihood =  384.84749
Iteration 3:   log likelihood =  385.39213
Iteration 4:   log likelihood =  385.40983
(switching optimization to BFGS)
Iteration 5:   log likelihood =   385.9021
Iteration 6:   log likelihood =  385.95646
Iteration 7:   log likelihood =  386.02979
Iteration 8:   log likelihood =  386.03326
Iteration 9:   log likelihood =  386.03354
Iteration 10:  log likelihood =  386.03357
ARIMA regression
Sample: 1960q2 - 1990q4                     Number of obs   =        123
                                            Wald chi2(3)    =     333.60
Log likelihood = 386.0336                   Prob > chi2     =     0.0000

                           OPG
D.ln_wpi      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
ln_wpi
 _cons     .0110493   .0048349     2.29   0.022      .0015731   .0205255
ARMA
 ar
  L1.      .7806991   .0944946     8.26   0.000      .5954931    .965905
 ma
  L1.     -.3990039   .1258753    -3.17   0.002     -.6457149  -.1522928
  L4.      .3090813   .1200945     2.57   0.010      .0737003   .5444622
/sigma     .0104394   .0004702    22.20   0.000      .0095178   .0113609

Note: The test of the variance against zero is one sided, and the two-sided confidence interval is truncated at zero.

In this final specification, the log-differenced series is still highly autocorrelated at a level of 0.781, though innovations have a negative impact in the ensuing quarter (−0.399) and a positive seasonal impact of 0.309 in the following year.
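Having settled on this specification, one might also confirm that the fitted model is stationary and invertible. A minimal sketch, typed immediately after estimation, uses the postestimation command estat aroots described in [TS] arima postestimation:

. estat aroots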
Technical note

In one way, the results differ from most of Stata's estimation commands: the standard error of the coefficients is reported as OPG Std. Err. The default standard errors and covariance matrix for arima estimates are derived from the outer product of gradients (OPG). This is one of three asymptotically equivalent methods of estimating the covariance matrix of the coefficients (only two of which are usually tractable to derive). Discussions and derivations of all three estimates can be found in Davidson and MacKinnon (1993), Greene (2012), and Hamilton (1994). Bollerslev, Engle, and Nelson (1994) suggest that the OPG estimates are more numerically stable in time-series regressions when the likelihood and its derivatives depend on recursive computations, which is certainly the case for the Kalman filter. To date, we have found no numerical instabilities in either estimate of the covariance matrix—subject to the stability and convergence of the overall model.

Most of Stata's estimation commands provide covariance estimates derived from the Hessian of the likelihood function. These alternate estimates can also be obtained from arima by specifying the vce(oim) option.
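For instance, refitting example 2 with observed-information-matrix standard errors requires only the extra option; this is a sketch, and the point estimates are unchanged because only the reported covariance estimate differs:

. arima D.ln_wpi, ar(1) ma(1 4) vce(oim)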
Multiplicative seasonal ARIMA models

Many time series exhibit a periodic seasonal component, and a seasonal ARIMA model, often abbreviated SARIMA, can then be used. For example, monthly sales data for air conditioners have a strong seasonal component, with sales high in the summer months and low in the winter months.

In the previous example, we accounted for quarterly effects by fitting the model

    (1 − ρ_1 L){∆ln(wpi_t) − β_0} = (1 + θ_1 L + θ_4 L⁴) ε_t

This is an additive seasonal ARIMA model, in the sense that the first- and fourth-order MA terms work additively: (1 + θ_1 L + θ_4 L⁴).

Another way to handle the quarterly effect would be to fit a multiplicative seasonal ARIMA model. A multiplicative SARIMA model of order (1, 1, 1) × (0, 0, 1)_4 for the ln(wpi_t) series is

    (1 − ρ_1 L){∆ln(wpi_t) − β_0} = (1 + θ_1 L)(1 + θ_{4,1} L⁴) ε_t

or, upon expanding terms,

    ∆ln(wpi_t) = β_0 + ρ_1{∆ln(wpi_{t−1}) − β_0} + θ_1 ε_{t−1} + θ_{4,1} ε_{t−4} + θ_1 θ_{4,1} ε_{t−5} + ε_t        (2)

In the notation (1, 1, 1) × (0, 0, 1)_4, the (1, 1, 1) means that there is one nonseasonal autoregressive term (1 − ρ_1 L) and one nonseasonal moving-average term (1 + θ_1 L) and that the time series is first-differenced one time. The (0, 0, 1)_4 indicates that there is no lag-4 seasonal autoregressive term, that there is one lag-4 seasonal moving-average term (1 + θ_{4,1} L⁴), and that the series is seasonally differenced zero times. This is known as a multiplicative SARIMA model because the nonseasonal and seasonal factors work multiplicatively: (1 + θ_1 L)(1 + θ_{4,1} L⁴). Multiplying the terms imposes nonlinear constraints on the parameters of the fifth-order lagged values; arima imposes these constraints automatically.

To further clarify the notation, consider a (2, 1, 1) × (1, 1, 2)_4 multiplicative SARIMA model:

    (1 − ρ_1 L − ρ_2 L²)(1 − ρ_{4,1} L⁴) ∆ ∆_4 z_t = (1 + θ_1 L)(1 + θ_{4,1} L⁴ + θ_{4,2} L⁸) ε_t        (3)

where ∆ denotes the difference operator ∆y_t = y_t − y_{t−1} and ∆_s denotes the lag-s seasonal difference operator ∆_s y_t = y_t − y_{t−s}.

Expanding (3), we have

    z̃_t = ρ_1 z̃_{t−1} + ρ_2 z̃_{t−2} + ρ_{4,1} z̃_{t−4} − ρ_1 ρ_{4,1} z̃_{t−5} − ρ_2 ρ_{4,1} z̃_{t−6}
          + θ_1 ε_{t−1} + θ_{4,1} ε_{t−4} + θ_1 θ_{4,1} ε_{t−5} + θ_{4,2} ε_{t−8} + θ_1 θ_{4,2} ε_{t−9} + ε_t

where z̃_t = ∆ ∆_4 z_t = ∆(z_t − z_{t−4}) = z_t − z_{t−1} − (z_{t−4} − z_{t−5}) and z_t = y_t − x_t β if regressors are included in the model, z_t = y_t − β_0 if just a constant term is included, and z_t = y_t otherwise.

More generally, a (p, d, q) × (P, D, Q)_s multiplicative SARIMA model is

    ρ(L^p) ρ_s(L^P) ∆^d ∆_s^D z_t = θ(L^q) θ_s(L^Q) ε_t

where

    ρ_s(L^P) = 1 − ρ_{s,1} L^s − ρ_{s,2} L^{2s} − ··· − ρ_{s,P} L^{Ps}
    θ_s(L^Q) = 1 + θ_{s,1} L^s + θ_{s,2} L^{2s} + ··· + θ_{s,Q} L^{Qs}

ρ(L^p) and θ(L^q) were defined previously, ∆^d means apply the ∆ operator d times, and similarly for ∆_s^D. Typically, d and D will be 0 or 1; and p, q, P, and Q will seldom be more than 2 or 3. s will typically be 4 for quarterly data and 12 for monthly data. In fact, the model can be extended to include both monthly and quarterly seasonal factors, as we explain below.

If a plot of the data suggests that the seasonal effect is proportional to the mean of the series, then the seasonal effect is probably multiplicative and a multiplicative SARIMA model may be appropriate. Box, Jenkins, and Reinsel (2008, sec. 9.3.1) suggest starting with a multiplicative SARIMA model with any data that exhibit seasonal patterns and then exploring nonmultiplicative SARIMA models if the multiplicative models do not fit the data well. On the other hand, Chatfield (2004, 14) suggests that taking the logarithm of the series will make the seasonal effect additive, in which case an additive SARIMA model as fit in the previous example would be appropriate. In short, the analyst should probably try both additive and multiplicative SARIMA models to see which provides better fits and forecasts.

Unless diffuse is used, arima must create square matrices of dimension {max(p, q + 1)}², where p and q are the maximum AR and MA lags, respectively; and the inclusion of long seasonal terms can make this dimension rather large. For example, with monthly data, you might fit a (0, 1, 1) × (0, 1, 2)_12 SARIMA model. The maximum MA lag is 2 × 12 + 1 = 25, requiring a matrix with 26² = 676 rows and columns.

Example 3: Multiplicative SARIMA model

One of the most common multiplicative SARIMA specifications is the (0, 1, 1) × (0, 1, 1)_12 "airline" model of Box, Jenkins, and Reinsel (2008, sec. 9.2). The dataset airline.dta contains monthly international airline passenger data from January 1949 through December 1960. After first- and seasonally differencing the data, we do not suspect the presence of a trend component, so we use the noconstant option with arima:

. use http://www.stata-press.com/data/r14/air2
(TIMESLAB: Airline passengers)
. generate lnair = ln(air)
. arima lnair, arima(0,1,1) sarima(0,1,1,12) noconstant
(setting optimization to BHHH)
Iteration 0:   log likelihood =   223.8437
Iteration 1:   log likelihood =  239.80405
 (output omitted)
Iteration 8:   log likelihood =  244.69651
ARIMA regression
Sample: 14 - 144                            Number of obs   =        131
                                            Wald chi2(2)    =      84.53
Log likelihood = 244.6965                   Prob > chi2     =     0.0000

                            OPG
DS12.lnair     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
ARMA
 ma
  L1.      -.4018324   .0730307    -5.50   0.000     -.5449698  -.2586949
ARMA12
 ma
  L1.      -.5569342   .0963129    -5.78   0.000      -.745704  -.3681644
/sigma      .0367167   .0020132    18.24   0.000      .0327708   .0406625

Note: The test of the variance against zero is one sided, and the two-sided confidence interval is truncated at zero.
Thus our model of the monthly number of international airline passengers is

    ∆ ∆_12 lnair_t = −0.402 ε_{t−1} − 0.557 ε_{t−12} + 0.224 ε_{t−13} + ε_t        σ̂ = 0.037

In (2), for example, the coefficient on ε_{t−13} is the product of the coefficients on the ε_{t−1} and ε_{t−12} terms (0.224 ≈ −0.402 × −0.557). arima labeled the dependent variable DS12.lnair to indicate that it has applied the difference operator ∆ and the lag-12 seasonal difference operator ∆_12 to lnair; see [U] 11.4.4 Time-series varlists for more information.

We could have fit this model by typing

. arima DS12.lnair, ma(1) mma(1, 12) noconstant

For simple multiplicative models, using the sarima() option is easier, though this second syntax allows us to incorporate more complicated seasonal terms.

The mar() and mma() options can be repeated, allowing us to control for multiple seasonal patterns. For example, we may have monthly sales data that exhibit a quarterly pattern as businesses purchase our product at the beginning of calendar quarters when new funds are budgeted, and our product is purchased more frequently in a few months of the year than in most others, even after we control for quarterly fluctuations. Thus we might choose to fit the model

    (1 − ρL)(1 − ρ_{4,1}L⁴)(1 − ρ_{12,1}L¹²)(∆ ∆_4 ∆_12 sales_t − β_0) = (1 + θL)(1 + θ_{4,1}L⁴)(1 + θ_{12,1}L¹²) ε_t

Although this model looks rather complicated, estimating it using arima is straightforward:

. arima DS4S12.sales, ar(1) mar(1, 4) mar(1, 12) ma(1) mma(1, 4) mma(1, 12)

If we instead wanted to include two lags in the lag-4 seasonal AR term and the first and third (but not the second) term in the lag-12 seasonal MA term, we would type

. arima DS4S12.sales, ar(1) mar(1 2, 4) mar(1, 12) ma(1) mma(1 3, 12)

However, models with multiple seasonal terms can be difficult to fit. Usually, one seasonal factor with just one or two AR or MA terms is adequate.

ARMAX models

Thus far, all our examples have been pure ARIMA models in which the dependent variable was modeled solely as a function of its past values and disturbances. arima can also fit ARMAX models, which model the dependent variable in terms of a linear combination of independent variables, as well as an ARMA disturbance process. The prais command (see [TS] prais), for example, allows you to control for only AR(1) disturbances, whereas arima allows you to control for a much richer dynamic error structure. arima allows for both nonseasonal and seasonal ARMA components in the disturbances.

Example 4: ARMAX model

For a simple example of a model including covariates, we can estimate an update of Friedman and Meiselman's (1963) equation representing the quantity theory of money. They postulate a straightforward relationship between personal-consumption expenditures (consump) and the money supply as measured by M2 (m2).

    consump_t = β_0 + β_1 m2_t + µ_t

Friedman and Meiselman fit the model over a period ending in 1956; we will refit the model over the period 1959q1 through 1981q4. We restrict our attention to the period prior to 1982 because the Federal Reserve manipulated the money supply extensively in the later 1980s to control inflation, and the relationship between consumption and the money supply becomes much more complex during the later part of the decade.
To demonstrate arima, we will include both an autoregressive term and a moving-average term for the disturbances in the model; the original estimates included neither. Thus we model the disturbance of the structural equation as

    µ_t = ρ µ_{t−1} + θ ε_{t−1} + ε_t

As per the original authors, the relationship is estimated on seasonally adjusted data, so there is no need to include seasonal effects explicitly. Obtaining seasonally unadjusted data and simultaneously modeling the structural and seasonal effects might be preferable.

We will restrict the estimation to the desired sample by using the tin() function in an if expression; see [FN] Selecting time-span functions. By leaving the first argument of tin() blank, we are including all available data through the second date (1981q4). We fit the model by typing

. use http://www.stata-press.com/data/r14/friedman2, clear
. arima consump m2 if tin(, 1981q4), ar(1) ma(1)
(setting optimization to BHHH)
Iteration 0:   log likelihood = -344.67575
Iteration 1:   log likelihood = -341.57248
 (output omitted)
Iteration 10:  log likelihood = -340.50774
ARIMA regression
Sample: 1959q1 - 1981q4                     Number of obs   =         92
                                            Wald chi2(3)    =    4394.80
Log likelihood = -340.5077                  Prob > chi2     =     0.0000

                           OPG
consump       Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
consump
 m2        1.122029   .0363563    30.86   0.000      1.050772   1.193286
 _cons    -36.09872   56.56703    -0.64   0.523     -146.9681   74.77062
ARMA
 ar
  L1.      .9348486   .0411323    22.73   0.000      .8542308   1.015467
 ma
  L1.      .3090592   .0885883     3.49   0.000      .1354293   .4826891
/sigma     9.655308   .5635157    17.13   0.000      8.550837   10.75978

Note: The test of the variance against zero is one sided, and the two-sided confidence interval is truncated at zero.

We find a relatively small money velocity with respect to consumption (1.122) over this period, although consumption is only one facet of the income velocity. We also note a very large first-order autocorrelation in the disturbances, as well as a statistically significant first-order moving average.

We might be concerned that our specification has led to disturbances that are heteroskedastic or non-Gaussian. We refit the model by using the vce(robust) option.

. arima consump m2 if tin(, 1981q4), ar(1) ma(1) vce(robust)
(setting optimization to BHHH)
Iteration 0:   log pseudolikelihood = -344.67575
Iteration 1:   log pseudolikelihood = -341.57248
 (output omitted)
Iteration 10:  log pseudolikelihood = -340.50774
ARIMA regression
Sample: 1959q1 - 1981q4                     Number of obs   =         92
                                            Wald chi2(3)    =    1176.26
Log pseudolikelihood = -340.5077            Prob > chi2     =     0.0000

                        Semirobust
consump       Coef.    Std. Err.      z    P>|z|     [95% Conf. Interval]
consump
 m2        1.122029    .0433302    25.89   0.000      1.037103   1.206954
 _cons    -36.09872    28.10478    -1.28   0.199     -91.18308   18.98564
ARMA
 ar
  L1.      .9348486    .0493428    18.95   0.000      .8381385   1.031559
 ma
  L1.      .3090592    .1605359     1.93   0.054     -.0055854   .6237038
/sigma     9.655308    1.082639     8.92   0.000      7.533375   11.77724

Note: The test of the variance against zero is one sided, and the two-sided confidence interval is truncated at zero.

We note a substantial increase in the estimated standard errors, and our once clearly significant moving-average term is now only marginally significant.
Dynamic forecasting

Another feature of the arima command is the ability to use predict afterward to make dynamic forecasts. Suppose that we wish to fit the regression model

    y_t = β_0 + β_1 x_t + ρ y_{t−1} + ε_t

by using a sample of data from t = 1, ..., T and make forecasts beginning at time f.

If we use regress or prais to fit the model, then we can use predict to make one-step-ahead forecasts. That is, predict will compute

    ŷ_f = β̂_0 + β̂_1 x_f + ρ̂ y_{f−1}

Most importantly, here predict will use the actual value of y at period f − 1 in computing the forecast for time f. Thus, if we use regress or prais, we cannot make forecasts for any periods beyond f = T + 1 unless we have observed values for y for those periods.

If we instead fit our model with arima, then predict can produce dynamic forecasts by using the Kalman filter. If we use the dynamic(f) option, then for period f predict will compute

    ŷ_f = β̂_0 + β̂_1 x_f + ρ̂ y_{f−1}

by using the observed value of y_{f−1} just as predict after regress or prais. However, for period f + 1, predict newvar, dynamic(f) will compute

    ŷ_{f+1} = β̂_0 + β̂_1 x_{f+1} + ρ̂ ŷ_f

using the predicted value of y_f instead of the observed value. Similarly, the period f + 2 forecast will be

    ŷ_{f+2} = β̂_0 + β̂_1 x_{f+2} + ρ̂ ŷ_{f+1}

Of course, because our model includes the regressor x_t, we can make forecasts only through periods for which we have observations on x_t. However, for pure ARIMA models, we can compute dynamic forecasts as far beyond the final period of our dataset as desired. For more information on predict after arima, see [TS] arima postestimation.
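A minimal sketch of such a forecast, assuming the consumption model from example 4 is still in memory and a hypothetical forecast start of 1978q1 (the y statistic, documented in [TS] arima postestimation, predicts the dependent variable itself):

. predict chat, dynamic(tq(1978q1)) y

Forecasts through 1977q4 use the observed values of consump; from 1978q1 onward, they feed on the model's own predictions as described above.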
Video example

Time series, part 5: Introduction to ARMA/ARIMA models

Stored results

arima stores the following in e():

Scalars
  e(N)              number of observations
  e(N_gaps)         number of gaps
  e(k)              number of parameters
  e(k_eq)           number of equations in e(b)
  e(k_eq_model)     number of equations in overall model test
  e(k_dv)           number of dependent variables
  e(k1)             number of variables in first equation
  e(df_m)           model degrees of freedom
  e(ll)             log likelihood
  e(sigma)          sigma
  e(chi2)           χ²
  e(p)              significance
  e(tmin)           minimum time
  e(tmax)           maximum time
  e(ar_max)         maximum AR lag
  e(ma_max)         maximum MA lag
  e(rank)           rank of e(V)
  e(ic)             number of iterations
  e(rc)             return code
  e(converged)      1 if converged, 0 otherwise

Macros
  e(cmd)            arima
  e(cmdline)        command as typed
  e(depvar)         name of dependent variable
  e(covariates)     list of covariates
  e(eqnames)        names of equations
  e(wtype)          weight type
  e(wexp)           weight expression
  e(title)          title in estimation output
  e(tmins)          formatted minimum time
  e(tmaxs)          formatted maximum time
  e(chi2type)       Wald; type of model χ² test
  e(vce)            vcetype specified in vce()
  e(vcetype)        title used to label Std. Err.
  e(ma)             lags for moving-average terms
  e(ar)             lags for autoregressive terms
  e(mari)           multiplicative AR terms and lag i=1...(# seasonal AR terms)
  e(mmai)           multiplicative MA terms and lag i=1...(# seasonal MA terms)
  e(seasons)        seasonal lags in model
  e(unsta)          unstationary or blank
  e(opt)            type of optimization
  e(ml_method)      type of ml method
  e(user)           name of likelihood-evaluator program
  e(technique)      maximization technique
  e(tech_steps)     number of iterations performed before switching techniques
  e(properties)     b V
  e(estat_cmd)      program used to implement estat
  e(predict)        program used to implement predict
  e(marginsok)      predictions allowed by margins
  e(marginsnotok)   predictions disallowed by margins

Matrices
  e(b)              coefficient vector
  e(Cns)            constraints matrix
  e(ilog)           iteration log (up to 20 iterations)
  e(gradient)       gradient vector
  e(V)              variance–covariance matrix of the estimators
  e(V_modelbased)   model-based variance

Functions
  e(sample)         marks estimation sample

Methods and formulas

Estimation is by maximum likelihood using the Kalman filter via the prediction error decomposition; see Hamilton (1994), Gourieroux and Monfort (1997), or, in particular, Harvey (1989). Any of these sources will serve as excellent background for the fitting of these models with the state-space form; each source also provides considerable detail on the method outlined below.

Methods and formulas are presented under the following headings:
    ARIMA model
    Kalman filter equations
    Kalman filter or state-space representation of the ARIMA model
    Kalman filter recursions
    Kalman filter initial conditions
    Likelihood from prediction error decomposition
    Missing data

ARIMA model

The model to be fit is

    y_t = x_t β + µ_t
    µ_t = Σ_{i=1}^{p} ρ_i µ_{t−i} + Σ_{j=1}^{q} θ_j ε_{t−j} + ε_t

which can be written as the single equation

    y_t = x_t β + Σ_{i=1}^{p} ρ_i (y_{t−i} − x_{t−i} β) + Σ_{j=1}^{q} θ_j ε_{t−j} + ε_t

Some of the ρs and θs may be constrained to zero or, for multiplicative seasonal models, the products of other parameters.

Kalman filter equations

We will roughly follow Hamilton's (1994) notation and write the Kalman filter

    ξ_t = F ξ_{t−1} + v_t                  (state equation)
    y_t = A′x_t + H′ξ_t + w_t              (observation equation)

and

    \begin{pmatrix} \mathbf{v}_t \\ \mathbf{w}_t \end{pmatrix} \sim
    N\left\{\mathbf{0},\,\begin{pmatrix} \mathbf{Q} & \mathbf{0} \\ \mathbf{0} & \mathbf{R} \end{pmatrix}\right\}

We maintain the standard Kalman filter matrix and vector notation, although for univariate models y_t, w_t, and R are scalars.

Kalman filter or state-space representation of the ARIMA model

A univariate ARIMA model can be cast in state-space form by defining the Kalman filter matrices as follows (see Hamilton [1994] or Gourieroux and Monfort [1997] for details):

    \mathbf{F} = \begin{pmatrix}
    \rho_1 & \rho_2 & \cdots & \rho_{p-1} & \rho_p\\
    1 & 0 & \cdots & 0 & 0\\
    0 & 1 & \cdots & 0 & 0\\
    \vdots & \vdots & \ddots & \vdots & \vdots\\
    0 & 0 & \cdots & 1 & 0
    \end{pmatrix}
    \qquad
    \mathbf{v}_t = (\epsilon_{t-1},\,0,\,\ldots,\,0)'

    A′ = β        H′ = (1, θ_1, θ_2, ..., θ_q)        w_t = 0

The Kalman filter representation does not require the moving-average terms to be invertible.

Kalman filter recursions

To demonstrate how missing data are handled, the updating recursions for the Kalman filter will be left in two steps. Writing the updating equations as one step using the gain matrix K is common. We will provide the updating equations with little justification; see the sources listed above for details.

As a linear combination of a vector of random variables, the state ξ_t can be updated to its expected value on the basis of the prior state as

    ξ_{t|t−1} = F ξ_{t−1} + v_{t−1}        (4)

This state is a quadratic form that has the covariance matrix

    P_{t|t−1} = F P_{t−1} F′ + Q           (5)

The estimator of y_t is

    ŷ_{t|t−1} = x_t β + H′ ξ_{t|t−1}

which implies an innovation or prediction error

    ι̂_t = y_t − ŷ_{t|t−1}

This value or vector has mean squared error (MSE)

    M_t = H′ P_{t|t−1} H + R

Now the expected value of ξ_t conditional on a realization of y_t is

    ξ_t = ξ_{t|t−1} + P_{t|t−1} H M_t⁻¹ ι̂_t        (6)

with MSE

    P_t = P_{t|t−1} − P_{t|t−1} H M_t⁻¹ H′ P_{t|t−1}        (7)

This expression gives the full set of Kalman filter recursions.
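To make the recursions concrete, the following Mata sketch carries a univariate model through one period of equations (4)–(7). It is an illustration only, not Stata's internal code; the function name kstep and its interface are hypothetical, and the state disturbance drops out of the prediction step because it has mean zero.

mata:
// One period of the Kalman recursions (4)-(7) for the univariate
// state-space form above.  xi (the state) and P (its MSE) are
// updated in place; y is the current observation, xb is x_t*beta,
// and R is the observation-equation variance (0 for ARIMA).
void kstep(real colvector xi, real matrix P,
           real matrix F, real matrix Q, real colvector H,
           real scalar y, real scalar xb, real scalar R)
{
    real scalar e, M
    xi = F*xi                       // (4): E(v) = 0, so v drops out
    P  = F*P*F' + Q                 // (5): MSE of the predicted state
    e  = y - (xb + H'*xi)           // innovation (prediction error)
    M  = H'*P*H + R                 // its MSE
    xi = xi + P*H*(e/M)             // (6): update the state given y_t
    P  = P - P*H*H'*P/M             // (7): update the state MSE
}
end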
Kalman filter initial conditions

When the series is stationary, conditional on x_t β, the initial conditions for the filter can be considered a random draw from the stationary distribution of the state equation. The initial values of the state and the state MSE are the expected values from this stationary distribution. For an ARIMA model, these can be written as

    ξ_{1|0} = 0

and

    vec(P_{1|0}) = (I_{r²} − F ⊗ F)⁻¹ vec(Q)

where vec() is an operator representing the column matrix resulting from stacking each successive column of the target matrix.

If the series is not stationary, the initial state conditions do not constitute a random draw from a stationary distribution, and some other values must be chosen. Hamilton (1994) suggests that they be chosen based on prior expectations, whereas Harvey suggests a diffuse and improper prior having a state vector of 0 and an infinite variance. This method corresponds to P_{1|0} with diagonal elements of ∞. Stata allows either approach to be taken for nonstationary series—initial priors may be specified with state0() and p0(), and a diffuse prior may be specified with diffuse.

Likelihood from prediction error decomposition

Given the outputs from the Kalman filter recursions and assuming that the state and observation vectors are Gaussian, the likelihood for the state-space model follows directly from the resulting multivariate normal in the predicted innovations. The log likelihood for observation t is

    lnL_t = −(1/2){ ln(2π) + ln(|M_t|) + ι̂′_t M_t⁻¹ ι̂_t }

This command supports the Huber/White/sandwich estimator of the variance using vce(robust). See [P] _robust, particularly Maximum likelihood estimators and Methods and formulas.

Missing data

Missing data, whether a missing dependent variable y_t, one or more missing covariates x_t, or completely missing observations, are handled by continuing the state-updating equations without any contribution from the data; see Harvey (1989 and 1993). That is, (4) and (5) are iterated for every missing observation, whereas (6) and (7) are ignored. Thus, for observations with missing data, ξ_t = ξ_{t|t−1} and P_t = P_{t|t−1}. Without any information from the sample, this effectively assumes that the prediction error for the missing observations is 0. Other methods of handling missing data on the basis of the EM algorithm have been suggested, for example, Shumway (1984, 1988).

George Edward Pelham Box (1919–2013) was born in Kent, England, and earned degrees in statistics at the University of London. After work in the chemical industry, he taught and researched at Princeton and the University of Wisconsin. His many major contributions to statistics include papers and books in Bayesian inference, robustness (a term he introduced to statistics), modeling strategy, experimental design and response surfaces, time-series analysis, distribution theory, transformations, and nonlinear estimation.

Gwilym Meirion Jenkins (1933–1982) was a British mathematician and statistician who spent his career in industry and academia, working for extended periods at Imperial College London and the University of Lancaster before running his own company. His interests were centered on time series, and he collaborated with G. E. P. Box on what are often called Box–Jenkins models. The last years of Jenkins's life were marked by a slowly losing battle against Hodgkin's disease.
George Edward Pelham Box (1919–2013) was born in Kent, England, and earned degrees in statistics at the University of London. After work in the chemical industry, he taught and researched at Princeton and the University of Wisconsin. His many major contributions to statistics include papers and books in Bayesian inference, robustness (a term he introduced to statistics), modeling strategy, experimental design and response surfaces, time-series analysis, distribution theory, transformations, and nonlinear estimation.

Gwilym Meirion Jenkins (1933–1982) was a British mathematician and statistician who spent his career in industry and academia, working for extended periods at Imperial College London and the University of Lancaster before running his own company. His interests centered on time series, and he collaborated with G. E. P. Box on what are often called Box–Jenkins models. The last years of Jenkins's life were marked by a slow, losing battle against Hodgkin's disease.

References

Ansley, C. F., and R. J. Kohn. 1985. Estimation, filtering, and smoothing in state space models with incompletely specified initial conditions. Annals of Statistics 13: 1286–1316.
Ansley, C. F., and P. Newbold. 1980. Finite sample properties of estimators for autoregressive moving average models. Journal of Econometrics 13: 159–183.
Baum, C. F. 2000. sts15: Tests for stationarity of a time series. Stata Technical Bulletin 57: 36–39. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 356–360. College Station, TX: Stata Press.
Baum, C. F., and T. Rõõm. 2001. sts18: A test for long-range dependence in a time series. Stata Technical Bulletin 60: 37–39. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 370–373. College Station, TX: Stata Press.
Baum, C. F., and R. I. Sperling. 2000. sts15.1: Tests for stationarity of a time series: Update. Stata Technical Bulletin 58: 35–36. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 360–362. College Station, TX: Stata Press.
Baum, C. F., and V. L. Wiggins. 2000. sts16: Tests for long memory in a time series. Stata Technical Bulletin 57: 39–44. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 362–368. College Station, TX: Stata Press.
Becketti, S. 2013. Introduction to Time Series Using Stata. College Station, TX: Stata Press.
Berndt, E. K., B. H. Hall, R. E. Hall, and J. A. Hausman. 1974. Estimation and inference in nonlinear structural models. Annals of Economic and Social Measurement 3/4: 653–665.
Bollerslev, T., R. F. Engle, and D. B. Nelson. 1994. ARCH models. In Vol. 4 of Handbook of Econometrics, ed. R. F. Engle and D. L. McFadden. Amsterdam: Elsevier.
Box, G. E. P. 1983. Obituary: G. M. Jenkins, 1933–1982. Journal of the Royal Statistical Society, Series A 146: 205–206.
Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. 2008. Time Series Analysis: Forecasting and Control. 4th ed. Hoboken, NJ: Wiley.
Box-Steffensmeier, J. M., J. R. Freeman, M. P. Hitt, and J. C. W. Pevehouse. 2014. Time Series Analysis for the Social Sciences. New York: Cambridge University Press.
Chatfield, C. 2004. The Analysis of Time Series: An Introduction. 6th ed. Boca Raton, FL: Chapman & Hall/CRC.
David, J. S. 1999. sts14: Bivariate Granger causality test. Stata Technical Bulletin 51: 40–41. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 350–351. College Station, TX: Stata Press.
Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University Press.
DeGroot, M. H. 1987. A conversation with George Box. Statistical Science 2: 239–258.
Diggle, P. J. 1990. Time Series: A Biostatistical Introduction. Oxford: Oxford University Press.
Enders, W. 2004. Applied Econometric Time Series. 2nd ed. New York: Wiley.
Friedman, M., and D. Meiselman. 1963. The relative stability of monetary velocity and the investment multiplier in the United States, 1897–1958. In Stabilization Policies, Commission on Money and Credit, 123–126. Englewood Cliffs, NJ: Prentice Hall.
Gourieroux, C. S., and A. Monfort. 1997. Time Series and Dynamic Models. Trans. ed. G. M. Gallo. Cambridge: Cambridge University Press.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Harvey, A. C. 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge University Press.
———. 1993. Time Series Models. 2nd ed. Cambridge, MA: MIT Press.
Hipel, K. W., and A. I. McLeod. 1994. Time Series Modelling of Water Resources and Environmental Systems. Amsterdam: Elsevier.
Holan, S. H., R. Lund, and G. Davis. 2010. The ARMA alphabet soup: A tour of ARMA model variants. Statistics Surveys 4: 232–274.
Kalman, R. E. 1960. A new approach to linear filtering and prediction problems. Transactions of the ASME–Journal of Basic Engineering, Series D 82: 35–45.
McDowell, A. W. 2002. From the help desk: Transfer functions. Stata Journal 2: 71–85.
———. 2004. From the help desk: Polynomial distributed lag models. Stata Journal 4: 180–189.
Newton, H. J. 1988. TIMESLAB: A Time Series Analysis Laboratory. Belmont, CA: Wadsworth.
Pickup, M. 2015. Introduction to Time Series Analysis. Thousand Oaks, CA: Sage.
Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. 2007. Numerical Recipes: The Art of Scientific Computing. 3rd ed. New York: Cambridge University Press.
Sánchez, G. 2012. Comparing predictions after arima with manual computations. The Stata Blog: Not Elsewhere Classified. http://blog.stata.com/2012/02/16/comparing-predictions-after-arima-with-manual-computations/.
Shumway, R. H. 1984. Some applications of the EM algorithm to analyzing incomplete time series data. In Time Series Analysis of Irregularly Observed Data, ed. E. Parzen, 290–324. New York: Springer.
———. 1988. Applied Statistical Time Series Analysis. Upper Saddle River, NJ: Prentice Hall.
Wang, Q., and N. Wu. 2012. Menu-driven X-12-ARIMA seasonal adjustment in Stata. Stata Journal 12: 214–241.

Also see
[TS] arima postestimation — Postestimation tools for arima
[TS] tsset — Declare data to be time-series data
[TS] arch — Autoregressive conditional heteroskedasticity (ARCH) family of estimators
[TS] dfactor — Dynamic-factor models
[TS] forecast — Econometric model forecasting
[TS] mgarch — Multivariate GARCH models
[TS] mswitch — Markov-switching regression models
[TS] prais — Prais–Winsten and Cochrane–Orcutt regression
[TS] sspace — State-space models
[TS] ucm — Unobserved-components model
[R] regress — Linear regression
[U] 20 Estimation and postestimation commands

Title
arima postestimation — Postestimation tools for arima

Postestimation commands    predict    margins    Remarks and examples
Reference    Also see

Postestimation commands

The following postestimation commands are of special interest after arima:

Command         Description
---------------------------------------------------------------------------
estat acplot    estimate autocorrelations and autocovariances
estat aroots    check stability condition of estimates
irf             create and analyze IRFs
psdensity       estimate the spectral density
---------------------------------------------------------------------------

The following standard postestimation commands are also available:

Command           Description
---------------------------------------------------------------------------
estat ic          Akaike's and Schwarz's Bayesian information criteria (AIC and BIC)
estat summarize   summary statistics for the estimation sample
estat vce         variance–covariance matrix of the estimators (VCE)
estimates         cataloging estimation results
forecast          dynamic forecasts and simulations
lincom            point estimates, standard errors, testing, and inference for linear combinations of coefficients
lrtest            likelihood-ratio test
margins           marginal means, predictive margins, marginal effects, and average marginal effects
marginsplot       graph the results from margins (profile plots, interaction plots, etc.)
nlcom             point estimates, standard errors, testing, and inference for nonlinear combinations of coefficients
predict           predictions, residuals, influence statistics, and other diagnostic measures
predictnl         point estimates, standard errors, testing, and inference for generalized predictions
test              Wald tests of simple and composite linear hypotheses
testnl            Wald tests of nonlinear hypotheses
---------------------------------------------------------------------------
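The special-interest commands can be run immediately after estimation. A minimal sketch using the wpi1 model fit in [TS] arima (the specification is taken from that entry; output is omitted here):

. use http://www.stata-press.com/data/r14/wpi1, clear
. arima D.ln_wpi, ar(1) ma(1 4)
  (output omitted)
. estat aroots       // verify that the AR and MA roots satisfy the stability condition
  (output omitted)
. estat acplot       // plot the model-implied autocorrelation function
  (output omitted)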
predict

Description for predict

predict creates a new variable containing predictions such as expected values and mean squared errors. All predictions are available as static one-step-ahead predictions or as dynamic multistep predictions, and you can control when dynamic predictions begin.

Menu for predict

Statistics > Postestimation

Syntax for predict

    predict [type] newvar [if] [in] [, statistic options]

statistic     Description
---------------------------------------------------------------------------
Main
xb            predicted values for mean equation—the differenced series; the default
stdp          standard error of the linear prediction
y             predicted values for the mean equation in y—the undifferenced series
mse           mean squared error of the predicted values
residuals     residuals or predicted innovations
yresiduals    residuals or predicted innovations in y, reversing any time-series operators
---------------------------------------------------------------------------
These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for the estimation sample.
Predictions are not available for conditional ARIMA models fit to panel data.

options                  Description
---------------------------------------------------------------------------
Options
dynamic(time_constant)   how to handle the lags of y_t
t0(time_constant)        set starting point for the recursions to time_constant
structural               calculate considering the structural component only
---------------------------------------------------------------------------
time_constant is a # or a time literal, such as td(1jan1995) or tq(1995q1); see Conveniently typing SIF values in [D] datetime.

Options for predict

Six statistics can be computed using predict after arima: the predictions from the model (the default, also given by xb), the standard error of the linear prediction (stdp), the predictions after reversing any time-series operators applied to the dependent variable (y), the MSE of xb (mse), the predictions of residuals or innovations (residuals), and the predicted residuals or innovations in terms of y (yresiduals). Given the dynamic nature of the ARMA component and because the dependent variable might be differenced, there are other ways of computing each. We can use all the data on the dependent variable that are available right up to the time of each prediction (the default, which is often called a one-step prediction), or we can use the data up to a particular time, after which the predicted value of the dependent variable is used recursively to make later predictions (dynamic()). Either way, we can consider or ignore the ARMA disturbance component (the component is considered by default and is ignored if you specify structural). All calculations can be made in or out of sample.

Main

xb, the default, calculates the predictions from the model. If D.depvar is the dependent variable, these predictions are of D.depvar and not of depvar itself.

stdp calculates the standard error of the linear prediction xb. stdp does not include the variation arising from the disturbance equation; use mse to calculate standard errors and confidence bands around the predicted values.

y specifies that predictions of depvar be made, even if the model was specified in terms of, say, D.depvar.

mse calculates the MSE of the predictions.

residuals calculates the residuals. If no other options are specified, these are the predicted innovations \epsilon_t; that is, they include the ARMA component. If structural is specified, these are the residuals \mu_t from the structural equation; see structural below.

yresiduals calculates the residuals in terms of depvar, even if the model was specified in terms of, say, D.depvar. As with residuals, the yresiduals are computed from the model, including any ARMA component. If structural is specified, any ARMA component is ignored, and yresiduals are the residuals from the structural equation; see structural below.

Options

dynamic(time_constant) specifies how lags of y_t in the model are to be handled. If dynamic() is not specified, actual values are used everywhere that lagged values of y_t appear in the model to produce one-step-ahead forecasts.

dynamic(time_constant) produces dynamic (also known as recursive) forecasts. time_constant specifies when the forecast is to switch from one step ahead to dynamic. In dynamic forecasts, references to y_t evaluate to the prediction of y_t for all periods at or after time_constant; they evaluate to the actual value of y_t for all prior periods.

For example, dynamic(10) would calculate predictions in which any reference to y_t with t < 10 evaluates to the actual value of y_t and any reference to y_t with t ≥ 10 evaluates to the prediction of y_t. This means that one-step-ahead predictions are calculated for t < 10 and dynamic predictions thereafter. Depending on the lag structure of the model, the dynamic predictions might still refer to some actual values of y_t.

You may also specify dynamic(.) to have predict automatically switch from one-step-ahead to dynamic predictions at p + q, where p is the maximum AR lag and q is the maximum MA lag.

t0(time_constant) specifies the starting point for the recursions to compute the predicted statistics; disturbances are assumed to be 0 for t < t0(). The default is to set t0() to the minimum t observed in the estimation sample, meaning that observations before that are assumed to have disturbances of 0.

t0() is irrelevant if structural is specified because then all observations are assumed to have disturbances of 0.

t0(5) would begin recursions at t = 5. If the data were quarterly, you might instead type t0(tq(1961q2)) to obtain the same result.

The ARMA component of ARIMA models is recursive and depends on the starting point of the predictions. This includes one-step-ahead predictions.

structural specifies that the calculation be made considering the structural component only, ignoring the ARMA terms, producing the steady-state equilibrium predictions.
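As the stdp entry notes, mse is the statistic to use for bands around forecasts. A minimal sketch, assuming a model with an undifferenced dependent variable has just been fit; the variable names are invented, and for a differenced model, remember that mse is the MSE of xb, the differenced series:

. predict fc, xb                       // point predictions
. predict fv, mse                      // their mean squared errors
. generate lb = fc - 1.96*sqrt(fv)     // pointwise 95% bands
. generate ub = fc + 1.96*sqrt(fv)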
margins

Description for margins

margins estimates margins of response for expected values.

Menu for margins

Statistics > Postestimation

Syntax for margins

    margins [marginlist] [, options]

    margins [marginlist], predict(statistic ...) [predict(statistic ...) ...] [options]

statistic     Description
---------------------------------------------------------------------------
xb            predicted values for mean equation—the differenced series; the default
y             predicted values for the mean equation in y—the undifferenced series
stdp          not allowed with margins
mse           not allowed with margins
residuals     not allowed with margins
yresiduals    not allowed with margins
---------------------------------------------------------------------------
Statistics not allowed with margins are functions of stochastic quantities other than e(b).

For the full syntax, see [R] margins.
Remarks and examples

Remarks are presented under the following headings:
    Forecasting after ARIMA
    IRF results for ARIMA

Forecasting after ARIMA

We assume that you have already read [TS] arima. In this section, we illustrate some of the features of predict after fitting ARIMA, ARMAX, and other dynamic models by using arima. In example 2 of [TS] arima, we fit the model

    \Delta \ln(wpi_t) = \beta_0 + \rho_1 \{\Delta \ln(wpi_{t-1}) - \beta_0\} + \theta_1 \epsilon_{t-1} + \theta_4 \epsilon_{t-4} + \epsilon_t

by typing

. use http://www.stata-press.com/data/r14/wpi1
. arima D.ln_wpi, ar(1) ma(1 4)
  (output omitted)

If we use the command

. predict xb, xb

then Stata computes xb_t as

    xb_t = \hat{\beta}_0 + \hat{\rho}_1 \{\Delta \ln(wpi_{t-1}) - \hat{\beta}_0\} + \hat{\theta}_1 \hat{\epsilon}_{t-1} + \hat{\theta}_4 \hat{\epsilon}_{t-4}

where

    \hat{\epsilon}_{t-j} = \Delta \ln(wpi_{t-j}) - xb_{t-j}    if t - j > 0
    \hat{\epsilon}_{t-j} = 0                                   otherwise

meaning that predict newvar, xb calculates predictions by using the metric of the dependent variable. In this example, the dependent variable represented changes in ln(wpi_t), and so the predictions are likewise for changes in that variable.

If we instead use

. predict y, y

Stata computes y_t as

    y_t = xb_t + \ln(wpi_{t-1})

so that y_t represents the predicted levels of ln(wpi_t). In general, predict newvar, y will reverse any time-series operators applied to the dependent variable during estimation.

If we want to ignore the ARMA error components when making predictions, we use the structural option,

. predict xbs, xb structural

which generates xbs_t = \hat{\beta}_0 because there are no regressors in this model, and

. predict ys, y structural

generates

    ys_t = \hat{\beta}_0 + \ln(wpi_{t-1})
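Continuing this example, the relationship between the two statistics is easy to confirm by hand; a quick sketch (the tolerance is arbitrary):

. generate double ycheck = xb + L.ln_wpi           // y_t = xb_t + ln(wpi_{t-1})
. assert abs(y - ycheck) < 1e-6 if !missing(y, ycheck)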
Example 1: Dynamic forecasts

An attractive feature of the arima command is the ability to make dynamic forecasts. In example 4 of [TS] arima, we fit the model

    consump_t = \beta_0 + \beta_1 m2_t + \mu_t

    \mu_t = \rho \mu_{t-1} + \theta \epsilon_{t-1} + \epsilon_t

First, we refit the model by using data up through the first quarter of 1978, and then we will evaluate the one-step-ahead and dynamic forecasts.

. use http://www.stata-press.com/data/r14/friedman2
. keep if time<=tq(1981q4)
(67 observations deleted)
. arima consump m2 if tin(, 1978q1), ar(1) ma(1)
  (output omitted)

To make one-step-ahead forecasts, we type

. predict chat, y
(52 missing values generated)

(Because our dependent variable contained no time-series operators, we could have instead used predict chat, xb and accomplished the same thing.) We will also make dynamic forecasts, switching from observed values of consump to forecasted values at the first quarter of 1978:

. predict chatdy, dynamic(tq(1978q1)) y
(52 missing values generated)

The following graph compares the forecasted values to the observed values for the first few years following the estimation sample:

(figure: Personal consumption, in billions of dollars, 1977q1–1982q1; the observed series, the one-step-ahead forecast, and the dynamic forecast beginning in 1978q1)

The one-step-ahead forecasts never deviate far from the observed values, though over time the dynamic forecasts have larger errors. To understand why that is the case, rewrite the model as

    consump_t = \beta_0 + \beta_1 m2_t + \rho \mu_{t-1} + \theta \epsilon_{t-1} + \epsilon_t
              = \beta_0 + \beta_1 m2_t + \rho (consump_{t-1} - \beta_0 - \beta_1 m2_{t-1}) + \theta \epsilon_{t-1} + \epsilon_t

This form shows that the forecasted value of consumption at time t depends on the value of consumption at time t − 1. When making the one-step-ahead forecast for period t, we know the actual value of consumption at time t − 1. On the other hand, with the dynamic(tq(1978q1)) option, the forecasted value of consumption for period 1978q1 is based on the observed value of consumption in period 1977q4, but the forecast for 1978q2 is based on the forecast value for 1978q1, the forecast for 1978q3 is based on the forecast value for 1978q2, and so on. Thus, with dynamic forecasts, prior forecast errors accumulate over time. The following graph illustrates this effect.

(figure: Forecast error, forecast − actual, 1978q1–1982q1, for the one-step-ahead forecast and the dynamic forecast beginning in 1978q1)

IRF results for ARIMA

We assume that you have already read [TS] irf and [TS] irf create. In this section, we illustrate how to calculate the impulse–response function (IRF) of an ARIMA model.

Example 2

Consider a model of the quarterly U.S. money supply, as measured by M1, from Enders (2004). Enders (2004, 93–97) discusses why seasonal shopping patterns cause seasonal effects in M1. The variable lnm1 contains data on the natural log of the money supply. We fit seasonal and nonseasonal ARIMA models and compare the IRFs calculated from both models.

We fit the following nonseasonal ARIMA model:

    \Delta\Delta_4 lnm1_t = \rho_1 (\Delta\Delta_4 lnm1_{t-1}) + \rho_4 (\Delta\Delta_4 lnm1_{t-4}) + \epsilon_t

The code below fits the above model and saves a set of IRF results to a file called myirf.irf.

. use http://www.stata-press.com/data/r14/m1nsa, clear
(U.S. money supply (M1) from Enders (2004), 95-99.)
. arima DS4.lnm1, ar(1 4) noconstant nolog

ARIMA regression
Sample: 1961q2 - 2008q2                         Number of obs   =        189
                                                Wald chi2(2)    =      78.34
Log likelihood = 579.3036                       Prob > chi2     =     0.0000

-----------------------------------------------------------------------------
            |                OPG
   DS4.lnm1 |     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------+----------------------------------------------------------------
ARMA        |
         ar |
        L1. |  .3551862   .0503011     7.06   0.000     .2565979    .4537745
        L4. | -.3275808   .0594953    -5.51   0.000    -.4441895    -.210972
------------+----------------------------------------------------------------
     /sigma |  .0112678   .0004882    23.08   0.000     .0103109    .0122246
-----------------------------------------------------------------------------
Note: The test of the variance against zero is one sided, and the two-sided
      confidence interval is truncated at zero.

. irf create nonseasonal, set(myirf) step(30)
(file myirf.irf created)
(file myirf.irf now active)
(file myirf.irf updated)

We fit the following seasonal ARIMA model:

    (1 - \rho_1 L)(1 - \rho_{4,1} L^4) \Delta\Delta_4 lnm1_t = \epsilon_t

The code below fits this seasonal ARIMA model and saves a set of IRF results to the active IRF file, which is myirf.irf.

. arima DS4.lnm1, ar(1) mar(1,4) noconstant nolog

ARIMA regression
Sample: 1961q2 - 2008q2                         Number of obs   =        189
                                                Wald chi2(2)    =     119.78
Log likelihood = 588.6689                       Prob > chi2     =     0.0000

-----------------------------------------------------------------------------
            |                OPG
   DS4.lnm1 |     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------+----------------------------------------------------------------
ARMA        |
         ar |
        L1. |   .489277   .0538033     9.09   0.000     .3838245    .5947296
------------+----------------------------------------------------------------
ARMA4       |
         ar |
        L1. | -.4688653   .0601248    -7.80   0.000    -.5867076   -.3510229
------------+----------------------------------------------------------------
     /sigma |  .0107075   .0004747    22.56   0.000     .0097771    .0116379
-----------------------------------------------------------------------------
Note: The test of the variance against zero is one sided, and the two-sided
      confidence interval is truncated at zero.

. irf create seasonal, step(30)
(file myirf.irf updated)

We now have two sets of IRF results in the file myirf.irf. We can graph both IRF functions side by side by calling irf graph.
. irf graph irf

(figure: impulse–response functions with 95% confidence intervals over steps 0–30, graphed by irfname, impulse variable, and response variable; panels for nonseasonal, DS4.lnm1, DS4.lnm1 and seasonal, DS4.lnm1, DS4.lnm1)

The trajectories of the two IRFs are similar: each figure shows that a shock to lnm1 causes a temporary oscillation in lnm1 that dies out after about 15 time periods. This behavior is characteristic of short-memory processes.

See [TS] psdensity for an introduction to estimating spectral densities using the parameters estimated by arima.

Reference

Enders, W. 2004. Applied Econometric Time Series. 2nd ed. New York: Wiley.

Also see
[TS] arima — ARIMA, ARMAX, and other dynamic regression models
[TS] estat acplot — Plot parametric autocorrelation and autocovariance functions
[TS] estat aroots — Check the stability condition of ARIMA estimates
[TS] irf — Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
[TS] psdensity — Parametric spectral density estimation after arima, arfima, and ucm
[U] 20 Estimation and postestimation commands

Title
corrgram — Tabulate and graph autocorrelations

Description    Quick start    Menu    Syntax
Options for corrgram    Options for ac and pac    Remarks and examples
Stored results    Methods and formulas    Acknowledgment    References
Also see

Description

corrgram produces a table of the autocorrelations, partial autocorrelations, and portmanteau (Q) statistics. It also displays a character-based plot of the autocorrelations and partial autocorrelations. See [TS] wntestq for more information on the Q statistic.

ac produces a correlogram (a graph of autocorrelations) with pointwise confidence intervals that is based on Bartlett's formula for MA(q) processes.

pac produces a partial correlogram (a graph of partial autocorrelations) with confidence intervals calculated using a standard error of 1/√n. The residual variances for each lag may optionally be included on the graph.

Quick start

Produce correlogram for y using tsset data
    corrgram y
As above, but limit the number of computed autocorrelations to 10
    corrgram y, lags(10)
Plot the autocorrelation function for y
    ac y
As above, and generate newv to hold the autocorrelations
    ac y, generate(newv)
Plot partial autocorrelation function for y and include standardized residual variances in the graph
    pac y, srv

Menu

corrgram
    Statistics > Time series > Graphs > Autocorrelations & partial autocorrelations
ac
    Statistics > Time series > Graphs > Correlogram (ac)
pac
    Statistics > Time series > Graphs > Partial correlogram (pac)

Syntax

Autocorrelations, partial autocorrelations, and portmanteau (Q) statistics
    corrgram varname [if] [in] [, corrgram_options]

Graph autocorrelations with confidence intervals
    ac varname [if] [in] [, ac_options]

Graph partial autocorrelations with confidence intervals
    pac varname [if] [in] [, pac_options]

corrgram_options    Description
---------------------------------------------------------------------------
Main
lags(#)             calculate # autocorrelations
noplot              suppress character-based plots
yw                  calculate partial autocorrelations by using Yule–Walker equations
---------------------------------------------------------------------------

ac_options                 Description
---------------------------------------------------------------------------
Main
lags(#)                    calculate # autocorrelations
generate(newvar)           generate a variable to hold the autocorrelations
level(#)                   set confidence level; default is level(95)
fft                        calculate autocorrelation by using Fourier transforms
Plot
line_options               change look of dropped lines
marker_options             change look of markers (color, size, etc.)
marker_label_options       add marker labels; change look or position
CI plot
ciopts(area_options)       affect rendition of the confidence bands
Add plots
addplot(plot)              add other plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
twoway_options             any options other than by() documented in [G-3] twoway_options
---------------------------------------------------------------------------

pac_options                Description
---------------------------------------------------------------------------
Main
lags(#)                    calculate # partial autocorrelations
generate(newvar)           generate a variable to hold the partial autocorrelations
yw                         calculate partial autocorrelations by using Yule–Walker equations
level(#)                   set confidence level; default is level(95)
Plot
line_options               change look of dropped lines
marker_options             change look of markers (color, size, etc.)
marker_label_options       add marker labels; change look or position
CI plot
ciopts(area_options)       affect rendition of the confidence bands
SRV plot
srv                        include standardized residual variances in graph
srvopts(marker_options)    affect rendition of the plotted standardized residual variances (SRVs)
Add plots
addplot(plot)              add other plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
twoway_options             any options other than by() documented in [G-3] twoway_options
---------------------------------------------------------------------------
You must tsset your data before using corrgram, ac, or pac; see [TS] tsset. Also, the time series must be dense (nonmissing and no gaps in the time variable) in the sample if you specify the fft option.
varname may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Options for corrgram

Main

lags(#) specifies the number of autocorrelations to calculate. The default is to use min(⌊n/2⌋ − 2, 40), where ⌊n/2⌋ is the greatest integer less than or equal to n/2.

noplot prevents the character-based plots from being included in the listed table of autocorrelations and partial autocorrelations.

yw specifies that the partial autocorrelations be calculated using the Yule–Walker equations instead of using the default regression-based technique. yw cannot be used if srv is used.

Options for ac and pac

Main

lags(#) specifies the number of autocorrelations to calculate. The default is to use min(⌊n/2⌋ − 2, 40), where ⌊n/2⌋ is the greatest integer less than or equal to n/2.

generate(newvar) specifies a new variable to contain the autocorrelation (ac command) or partial autocorrelation (pac command) values. This option is required if the nograph option is used.

nograph (implied when using generate() in the dialog box) prevents ac and pac from constructing a graph. This option requires the generate() option.

yw (pac only) specifies that the partial autocorrelations be calculated using the Yule–Walker equations instead of using the default regression-based technique. yw cannot be used if srv is used.

level(#) specifies the confidence level, as a percentage, for the confidence bands in the ac or pac graph. The default is level(95) or as set by set level; see [R] level.

fft (ac only) specifies that the autocorrelations be calculated using two Fourier transforms. This technique can be faster than simply iterating over the requested number of lags.

Plot

line_options, marker_options, and marker_label_options affect the rendition of the plotted autocorrelations (with ac) or partial autocorrelations (with pac).

line_options specify the look of the dropped lines, including pattern, width, and color; see [G-3] line_options.

marker_options specify the look of markers. This look includes the marker symbol, the marker size, and its color and outline; see [G-3] marker_options.
marker_label_options specify if and how the markers are to be labeled; see [G-3] marker_label_options.

CI plot

ciopts(area_options) affects the rendition of the confidence bands; see [G-3] area_options.

SRV plot

srv (pac only) specifies that the standardized residual variances be plotted with the partial autocorrelations. srv cannot be used if yw is used.

srvopts(marker_options) (pac only) affects the rendition of the plotted standardized residual variances; see [G-3] marker_options. This option implies the srv option.

Add plots

addplot(plot) adds specified plots to the generated graph; see [G-3] addplot_option.

Y axis, X axis, Titles, Legend, Overall

twoway_options are any of the options documented in [G-3] twoway_options, excluding by(). These include options for titling the graph (see [G-3] title_options) and for saving the graph to disk (see [G-3] saving_option).

Remarks and examples

Remarks are presented under the following headings:
    Basic examples
    Video example

Basic examples

corrgram tabulates autocorrelations, partial autocorrelations, and portmanteau (Q) statistics and plots the autocorrelations and partial autocorrelations. The Q statistics are the same as those produced by [TS] wntestq. ac produces graphs of the autocorrelations, and pac produces graphs of the partial autocorrelations. See Becketti (2013) for additional examples of how these commands are used in practice.

Example 1

Here we use the international airline passengers dataset (Box, Jenkins, and Reinsel 2008, Series G). This dataset has 144 observations on the monthly number of international airline passengers from 1949 through 1960. We can list the autocorrelations and partial autocorrelations by using corrgram.

. use http://www.stata-press.com/data/r14/air2
(TIMESLAB: Airline passengers)
. corrgram air, lags(20)

 LAG       AC       PAC        Q     Prob>Q
-------------------------------------------
   1    0.9480    0.9589   132.14    0.0000
   2    0.8756   -0.3298   245.65    0.0000
   3    0.8067    0.2018   342.67    0.0000
   4    0.7526    0.1450   427.74    0.0000
   5    0.7138    0.2585    504.8    0.0000
   6    0.6817   -0.0269    575.6    0.0000
   7    0.6629    0.2043   643.04    0.0000
   8    0.6556    0.1561   709.48    0.0000
   9    0.6709    0.5686   779.59    0.0000
  10    0.7027    0.2926   857.07    0.0000
  11    0.7432    0.8402   944.39    0.0000
  12    0.7604    0.6127   1036.5    0.0000
  13    0.7127   -0.6660     1118    0.0000
  14    0.6463   -0.3846   1185.6    0.0000
  15    0.5859    0.0787   1241.5    0.0000
  16    0.5380   -0.0266     1289    0.0000
  17    0.4997   -0.0581   1330.4    0.0000
  18    0.4687   -0.0435     1367    0.0000
  19    0.4499    0.2773   1401.1    0.0000
  20    0.4416   -0.0405   1434.1    0.0000
-------------------------------------------
(the character-based plots of the autocorrelations and partial autocorrelations are not reproduced here)

We can use ac to produce a graph of the autocorrelations.

. ac air, lags(20)

(figure: autocorrelations of air, lags 0–20, with Bartlett's formula for MA(q) 95% confidence bands)

The data probably have a trend component as well as a seasonal component. First-differencing will mitigate the effects of the trend, and seasonal differencing will help control for seasonality. To accomplish this goal, we can use Stata's time-series operators. Here we graph the partial autocorrelations after controlling for trends and seasonality. We also use srv to include the standardized residual variances.

. pac DS12.air, lags(20) srv

(figure: partial autocorrelations of DS12.air, lags 0–20, with standardized variances and 95% confidence bands [se = 1/sqrt(n)])

See [U] 11.4.4 Time-series varlists for more information about time-series operators.

Video example

Time series, part 4: Correlograms and partial correlograms

Stored results

corrgram stores the following in r():

Scalars
    r(lags)    number of lags
    r(ac#)     AC for lag #
    r(pac#)    PAC for lag #
    r(q#)      Q for lag #

Matrices
    r(AC)      vector of autocorrelations
    r(PAC)     vector of partial autocorrelations
    r(Q)       vector of Q statistics
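The stored results make the numbers above easy to reuse in later computations. A short sketch continuing example 1 (the lag chosen is arbitrary):

. corrgram air, lags(20) noplot
  (output omitted)
. display "autocorrelation at lag 12 = " r(ac12)
. matrix ac = r(AC)                  // full vector of autocorrelations
. matrix list ac
  (output omitted)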
Methods and formulas

Box, Jenkins, and Reinsel (2008, sec. 2.1.4); Newton (1988); Chatfield (2004); and Hamilton (1994) provide excellent descriptions of correlograms. Newton (1988) also discusses the calculation of the various quantities.

The autocovariance function for a time series x_1, x_2, ..., x_n is defined for |v| < n as

    \hat{R}(v) = \frac{1}{n} \sum_{i=1}^{n-|v|} (x_i - \bar{x})(x_{i+v} - \bar{x})

where \bar{x} is the sample mean, and the autocorrelation function is then defined as

    \hat{\rho}_v = \hat{R}(v) / \hat{R}(0)

The variance of \hat{\rho}_v is given by Bartlett's formula for MA(q) processes. From Brockwell and Davis (2002, 94), we have

    Var(\hat{\rho}_v) = 1/n                                                      for v = 1

    Var(\hat{\rho}_v) = \frac{1}{n} \left\{1 + 2 \sum_{i=1}^{v-1} \hat{\rho}^2(i)\right\}    for v > 1

The partial autocorrelation at lag v measures the correlation between x_t and x_{t+v} after the effects of x_{t+1}, ..., x_{t+v-1} have been removed. By default, corrgram and pac use a regression-based method to estimate it. We run an OLS regression of x_t on x_{t-1}, ..., x_{t-v} and a constant term. The estimated coefficient on x_{t-v} is our estimate of the vth partial autocorrelation. The residual variance is the estimated variance of that regression, which we then standardize by dividing by \hat{R}(0).

If the yw option is specified, corrgram and pac use the Yule–Walker equations to estimate the partial autocorrelations. Per Enders (2010, 66–67), let \phi_{vv} denote the vth partial autocorrelation coefficient. We then have

    \hat{\phi}_{11} = \hat{\rho}_1

and for v > 1

    \hat{\phi}_{vv} = \frac{\hat{\rho}_v - \sum_{j=1}^{v-1} \hat{\phi}_{v-1,j}\, \hat{\rho}_{v-j}}{1 - \sum_{j=1}^{v-1} \hat{\phi}_{v-1,j}\, \hat{\rho}_j}

and

    \hat{\phi}_{vj} = \hat{\phi}_{v-1,j} - \hat{\phi}_{vv}\, \hat{\phi}_{v-1,v-j},    j = 1, 2, ..., v-1

Unlike the regression-based method, the Yule–Walker equations-based method ensures that the first sample partial autocorrelation equals the first sample autocorrelation coefficient, as must be true in the population; see Greene (2008, 725).

McCullough (1998) discusses other methods of estimating \phi_{vv}; he finds that relative to other methods, such as linear regression, the Yule–Walker equations-based method performs poorly, in part because it is susceptible to numerical error. Box, Jenkins, and Reinsel (2008, 69) also caution against using the Yule–Walker equations-based method, especially with data that are nearly nonstationary.
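The regression-based estimator is straightforward to reproduce by hand. A minimal sketch for the third partial autocorrelation of the air series (v = 3 is arbitrary):

. regress air L(1/3).air
  (output omitted)
. display "regression-based 3rd partial autocorrelation = " _b[L3.air]

The displayed value should match r(pac3) from corrgram, up to rounding.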
Acknowledgment

The ac and pac commands are based on the ac and pac commands written by Sean Becketti (1992), a past editor of the Stata Technical Bulletin and author of the Stata Press book Introduction to Time Series Using Stata.

References

Becketti, S. 1992. sts1: Autocorrelation and partial autocorrelation graphs. Stata Technical Bulletin 5: 27–28. Reprinted in Stata Technical Bulletin Reprints, vol. 1, pp. 221–223. College Station, TX: Stata Press.
———. 2013. Introduction to Time Series Using Stata. College Station, TX: Stata Press.
Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. 2008. Time Series Analysis: Forecasting and Control. 4th ed. Hoboken, NJ: Wiley.
Brockwell, P. J., and R. A. Davis. 2002. Introduction to Time Series and Forecasting. 2nd ed. New York: Springer.
Chatfield, C. 2004. The Analysis of Time Series: An Introduction. 6th ed. Boca Raton, FL: Chapman & Hall/CRC.
Enders, W. 2010. Applied Econometric Time Series. 3rd ed. New York: Wiley.
Greene, W. H. 2008. Econometric Analysis. 6th ed. Upper Saddle River, NJ: Prentice Hall.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
McCullough, B. D. 1998. Algorithm choice for (partial) autocorrelation functions. Journal of Economic and Social Measurement 24: 265–278.
Newton, H. J. 1988. TIMESLAB: A Time Series Analysis Laboratory. Belmont, CA: Wadsworth.

Also see
[TS] tsset — Declare data to be time-series data
[TS] pergram — Periodogram
[TS] wntestq — Portmanteau (Q) test for white noise

Title
cumsp — Cumulative spectral distribution

Description    Quick start    Menu    Syntax
Options    Remarks and examples    Methods and formulas
References    Also see

Description

cumsp plots the cumulative sample spectral-distribution function evaluated at the natural frequencies for a (dense) time series.

Quick start

Plot cumulative sample spectral-distribution function for y using tsset data
    cumsp y
As above, and create newv containing the cumulative distribution estimates
    cumsp y, generate(newv)

Menu

Statistics > Time series > Graphs > Cumulative spectral distribution

Syntax

    cumsp varname [if] [in] [, options]

options                 Description
---------------------------------------------------------------------------
Main
generate(newvar)        create newvar holding distribution values
Plot
cline_options           affect rendition of the plotted points connected by lines
marker_options          change look of markers (color, size, etc.)
marker_label_options    add marker labels; change look or position
Add plots
addplot(plot)           add other plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
twoway_options          any options other than by() documented in [G-3] twoway_options
---------------------------------------------------------------------------
You must tsset your data before using cumsp; see [TS] tsset. Also, the time series must be dense (nonmissing with no gaps in the time variable) in the sample specified.
varname may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Options

Main

generate(newvar) specifies a new variable to contain the estimated cumulative spectral-distribution values.

Plot

cline_options affect the rendition of the plotted points connected by lines; see [G-3] cline_options.

marker_options specify the look of markers. This look includes the marker symbol, the marker size, and its color and outline; see [G-3] marker_options.

marker_label_options specify if and how the markers are to be labeled; see [G-3] marker_label_options.

Add plots

addplot(plot) provides a way to add other plots to the generated graph; see [G-3] addplot_option.

Y axis, X axis, Titles, Legend, Overall

twoway_options are any of the options documented in [G-3] twoway_options, excluding by(). These include options for titling the graph (see [G-3] title_options) and for saving the graph to disk (see [G-3] saving_option).

Remarks and examples

Example 1

Here we use the international airline passengers dataset (Box, Jenkins, and Reinsel 2008, Series G). This dataset has 144 observations on the monthly number of international airline passengers from 1949 through 1960. In the cumulative sample spectral distribution function for these data, we also request a vertical line at frequency 1/12. Because the data are monthly, there will be a pronounced jump in the cumulative sample spectral-distribution plot at the 1/12 value if there is an annual cycle in the data.

. use http://www.stata-press.com/data/r14/air2
(TIMESLAB: Airline passengers)
. cumsp air, xline(.083333333)

(figure: Airline Passengers (1949–1960); sample spectral-distribution function, cumulative spectral distribution plotted against frequency 0.00–0.50, points evaluated at the natural frequencies)

The cumulative sample spectral-distribution function clearly illustrates the annual cycle.

Methods and formulas

A time series of interest is decomposed into a unique set of sinusoids of various frequencies and amplitudes. A plot of the sinusoidal amplitudes versus the frequencies for the sinusoidal decomposition of a time series gives us the spectral density of the time series. If we calculate the sinusoidal amplitudes for a discrete set of "natural" frequencies (1/n, 2/n, ..., q/n), we obtain the periodogram.

Let x(1), ..., x(n) be a time series, and let \omega_k = (k-1)/n denote the natural frequencies for k = 1, ..., ⌊n/2⌋ + 1, where ⌊⌋ indicates the greatest integer function. Define

    C_k^2 = \frac{1}{n^2} \left| \sum_{t=1}^{n} x(t)\, e^{2\pi i (t-1) \omega_k} \right|^2

A plot of nC_k^2 versus \omega_k is then called the periodogram.

The sample spectral density may then be defined as \hat{f}(\omega_k) = nC_k^2. If we let \hat{f}(\omega_1), ..., \hat{f}(\omega_Q) be the sample spectral density function of the time series evaluated at the frequencies \omega_j = (j-1)/Q for j = 1, ..., Q and we let q = ⌊Q/2⌋ + 1, then

    \hat{F}(\omega_k) = \frac{\sum_{j=1}^{k} \hat{f}(\omega_j)}{\sum_{j=1}^{q} \hat{f}(\omega_j)}

is the sample spectral-distribution function of the time series.
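These formulas translate directly into code. Below is a minimal Mata sketch, a direct and unoptimized transcription that takes Q = n so that the evaluation grid is the natural frequencies; the simulated series stands in for real data:

. mata:
:     x = rnormal(144, 1, 0, 1)                // stand-in for a real series
:     n = rows(x); q = floor(n/2) + 1
:     f = J(q, 1, 0)
:     for (k=1; k<=q; k++) {
:         w = (k-1)/n                          // natural frequency omega_k
:         s = sum(x :* exp(2*pi()*C(0,1)*w*(0::(n-1))))
:         f[k] = abs(s)^2 / n                  // n*C_k^2, the sample density
:     }
:     F = runningsum(f) / sum(f)               // cumulative spectral distribution
: end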
References

Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. 2008. Time Series Analysis: Forecasting and Control. 4th ed. Hoboken, NJ: Wiley.
Newton, H. J. 1988. TIMESLAB: A Time Series Analysis Laboratory. Belmont, CA: Wadsworth.

Also see
[TS] tsset — Declare data to be time-series data
[TS] corrgram — Tabulate and graph autocorrelations
[TS] pergram — Periodogram

Title
dfactor — Dynamic-factor models

Description    Quick start    Menu    Syntax
Options    Remarks and examples    Stored results
Methods and formulas    References    Also see

Description

dfactor estimates the parameters of dynamic-factor models by maximum likelihood. Dynamic-factor models are flexible models for multivariate time series in which unobserved factors have a vector autoregressive structure, exogenous covariates are permitted in both the equations for the latent factors and the equations for observable dependent variables, and the disturbances in the equations for the dependent variables may be autocorrelated.

Quick start

Dynamic-factor model with y1 and y2 a function of x and an unobserved factor that follows a third-order autoregressive process using tsset data
    dfactor (y1 y2=x) (f=, ar(1/3))
As above, but with equations for the observed variables following an autoregressive process of order 1
    dfactor (y1 y2=x, ar(1)) (f=, ar(1/3))
As above, but with an unstructured covariance matrix for the errors of y1 and y2
    dfactor (y1 y2=x, ar(1) covstructure(unstructured)) (f=, ar(1/3))

Menu

Statistics > Multivariate time series > Dynamic-factor models

Syntax

    dfactor obs_eq [fac_eq] [if] [in] [, options]

obs_eq specifies the equation for the observed dependent variables, and it has the form

    (depvars = [exog_d] [, sopts])

fac_eq specifies the equation for the unobserved factors, and it has the form

    (facvars = [exog_f] [, sopts])

depvars are the observed dependent variables. exog_d are the exogenous variables that enter into the equations for the observed dependent variables. (All factors are automatically entered into the equations for the observed dependent variables.) facvars are the names for the unobserved factors in the model. You may specify the names of existing variables in facvars, but dfactor treats them only as names and takes no notice that they are also variables. exog_f are the exogenous variables that enter into the equations for the factors.

options                     Description
---------------------------------------------------------------------------
Model
constraints(constraints)    apply specified linear constraints
SE/Robust
vce(vcetype)                vcetype may be oim or robust
Reporting
level(#)                    set confidence level; default is level(95)
nocnsreport                 do not display constraints
display_options             control columns and column formats, row spacing, display of omitted variables and base and empty cells, and factor-variable labeling
Maximization
maximize_options            control the maximization process; seldom used
from(matname)               specify initial values for the maximization process; seldom used
Advanced
method(method)              specify the method for calculating the log likelihood; seldom used
coeflegend                  display legend instead of statistics
---------------------------------------------------------------------------

sopts                         Description
---------------------------------------------------------------------------
Model
noconstant                    suppress constant term from the equation; allowed only in obs_eq
ar(numlist)                   autoregressive terms
arstructure(arstructure)      structure of autoregressive coefficient matrices
covstructure(covstructure)    covariance structure
---------------------------------------------------------------------------

arstructure    Description
---------------------------------------------------------------------------
diagonal       diagonal matrix; the default
ltriangular    lower triangular matrix
general        general matrix
---------------------------------------------------------------------------

covstructure    Description
---------------------------------------------------------------------------
identity        identity matrix
dscalar         diagonal scalar matrix
diagonal        diagonal matrix
unstructured    symmetric, positive-definite matrix
---------------------------------------------------------------------------

method    Description
---------------------------------------------------------------------------
hybrid    use the stationary Kalman filter and the De Jong diffuse Kalman filter; the default
dejong    use the stationary De Jong method and the De Jong diffuse Kalman filter
---------------------------------------------------------------------------
You must tsset your data before using dfactor; see [TS] tsset.
exog_d and exog_f may contain factor variables; see [U] 11.4.3 Factor variables.
depvars, exog_d, and exog_f may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, fp, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.

Options

Model

constraints(constraints) apply linear constraints. Some specifications require linear constraints for parameter identification.

noconstant suppresses the constant term.

ar(numlist) specifies the vector autoregressive lag structure in the equation. By default, no lags are included in either the observable or the factor equations.

arstructure(diagonal|ltriangular|general) specifies the structure of the matrices in the vector autoregressive lag structure.

arstructure(diagonal) specifies the matrices to be diagonal—separate parameters for each lag, but no cross-equation autocorrelations. arstructure(diagonal) is the default for both the observable and the factor equations.

arstructure(ltriangular) specifies the matrices to be lower triangular—parameterizes a recursive, or Wold causal, structure.

arstructure(general) specifies the matrices to be general matrices—separate parameters for each possible autocorrelation and cross-correlation.

covstructure(identity|dscalar|diagonal|unstructured) specifies the covariance structure of the errors.

covstructure(identity) specifies a covariance matrix equal to an identity matrix, and it is the default for the errors in the factor equations.
covstructure(dscalar) specifies a covariance matrix equal to σ² times an identity matrix.

covstructure(diagonal) specifies a diagonal covariance matrix, and it is the default for the errors in the observable variables.

covstructure(unstructured) specifies a symmetric, positive-definite covariance matrix with parameters for all variances and covariances.

SE/Robust

vce(vcetype) specifies the estimator for the variance–covariance matrix of the estimator.

vce(oim), the default, causes dfactor to use the observed information matrix estimator.

vce(robust) causes dfactor to use the Huber/White/sandwich estimator.

Reporting

level(#); see [R] estimation options.

nocnsreport; see [R] estimation options.

display_options: noci, nopvalues, noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), and sformat(%fmt); see [R] estimation options.

Maximization

maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace, gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#), nrtolerance(#), and from(matname); see [R] maximize for all options except from(), and see below for information on from(). These options are seldom used.

from(matname) specifies initial values for the maximization process. from(b0) causes dfactor to begin the maximization algorithm with the values in b0. b0 must be a row vector; the number of columns must equal the number of parameters in the model; and the values in b0 must be in the same order as the parameters in e(b). This option is seldom used.

Advanced

method(method) specifies how to compute the log likelihood. dfactor writes the model in state-space form and uses sspace to estimate the parameters; see [TS] sspace. method() offers two methods for dealing with some of the technical aspects of the state-space likelihood. This option is seldom used.

method(hybrid), the default, uses the Kalman filter with model-based initial values when the model is stationary and uses the De Jong (1988, 1991) diffuse Kalman filter when the model is nonstationary.

method(dejong) uses the De Jong (1988) method for estimating the initial values for the Kalman filter when the model is stationary and uses the De Jong (1988, 1991) diffuse Kalman filter when the model is nonstationary.

The following option is available with dfactor but is not shown in the dialog box:

coeflegend; see [R] estimation options.

Remarks and examples

Remarks are presented under the following headings:
    An introduction to dynamic-factor models
    Some examples

An introduction to dynamic-factor models

dfactor estimates the parameters of dynamic-factor models by maximum likelihood (ML). Dynamic-factor models represent a vector of k endogenous variables as linear functions of nf < k unobserved factors and some exogenous covariates. The unobserved factors and the disturbances in the equations for the observed variables may follow vector autoregressive structures.

Dynamic-factor models have been developed and applied in macroeconomics; see Geweke (1977), Sargent and Sims (1977), Stock and Watson (1989, 1991), and Watson and Engle (1983).

Dynamic-factor models are very flexible; in a sense, they are too flexible. Constraints must be imposed to identify the parameters of dynamic-factor and static-factor models. The parameters in the default specifications in dfactor are identified, but other specifications require additional restrictions.
The factors are identified only up to a sign, which means that the coefficients on the unobserved factors can flip signs and still produce the same predictions and the same log likelihood. The flexibility of the model sometimes produces convergence problems.

dfactor is designed to handle cases in which the number of modeled endogenous variables, k, is small. The ML estimator is implemented by writing the model in state-space form and by using the Kalman filter to derive and implement the log likelihood. As k grows, the number of parameters quickly exceeds the number that can be estimated.

A dynamic-factor model has the form

    y_t = P f_t + Q x_t + u_t

    f_t = R w_t + A_1 f_{t-1} + A_2 f_{t-2} + ... + A_p f_{t-p} + \nu_t

    u_t = C_1 u_{t-1} + C_2 u_{t-2} + ... + C_q u_{t-q} + \epsilon_t

where the definitions are given in the following table:

Item         Dimension    Definition
---------------------------------------------------------------------------
y_t          k × 1        vector of dependent variables
P            k × nf       matrix of parameters
f_t          nf × 1       vector of unobservable factors
Q            k × nx       matrix of parameters
x_t          nx × 1       vector of exogenous variables
u_t          k × 1        vector of disturbances
R            nf × nw      matrix of parameters
w_t          nw × 1       vector of exogenous variables
A_i          nf × nf      matrix of autocorrelation parameters for i ∈ {1, 2, ..., p}
\nu_t        nf × 1       vector of disturbances
C_i          k × k        matrix of autocorrelation parameters for i ∈ {1, 2, ..., q}
\epsilon_t   k × 1        vector of disturbances
---------------------------------------------------------------------------

By selecting different numbers of factors and lags, the dynamic-factor model encompasses the six models in the table below:

Model                                                       nf        p        q
---------------------------------------------------------------------------
Dynamic factors with vector autoregressive errors (DFAR)    nf > 0    p > 0    q > 0
Dynamic factors (DF)                                        nf > 0    p > 0    q = 0
Static factors with vector autoregressive errors (SFAR)     nf > 0    p = 0    q > 0
Static factors (SF)                                         nf > 0    p = 0    q = 0
Vector autoregressive errors (VAR)                          nf = 0    p = 0    q > 0
Seemingly unrelated regression (SUR)                        nf = 0    p = 0    q = 0
---------------------------------------------------------------------------

In addition to the time-series models, dfactor can estimate the parameters of SF models and SUR models. dfactor can place equality constraints on the disturbance covariances, which sureg and var do not allow.
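To fix ideas, the following Mata sketch simulates one draw from the simplest dynamic-factor data-generating process above, with k = 2, nf = 1, p = 1, q = 0, and no covariates; all parameter values are invented for the illustration:

. mata:
:     T = 200
:     P  = (0.8 \ 0.5)                  // k x nf matrix of loadings
:     A1 = 0.6                          // nf x nf autoregressive parameter
:     f = J(T, 1, 0)                    // unobserved factor
:     y = J(T, 2, .)                    // observed dependent variables
:     for (t=2; t<=T; t++) {
:         f[t]   = A1*f[t-1] + rnormal(1, 1, 0, 1)       // factor equation
:         y[t,.] = (P*f[t] + rnormal(2, 1, 0, 0.5))'     // observation equation
:     }
: end

Fitting dfactor (y1 y2 = , noconstant) (f = , ar(1)) to such data should recover P and A_1, up to the sign of the loadings.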
Some examples

Example 1: Dynamic-factor model

Stock and Watson (1989, 1991) wrote a simple macroeconomic model as a DF model, estimated the parameters by ML, and extracted an economic indicator. In this example, we estimate the parameters of a DF model. In [TS] dfactor postestimation, we extend this example and extract an economic indicator for the differenced series.

We have data on an industrial-production index, ipman; real disposable income, income; an aggregate weekly hours index, hours; and aggregate unemployment, unemp. We believe that these variables are first-difference stationary. We model their first-differences as linear functions of an unobserved factor that follows a second-order autoregressive process.

. use http://www.stata-press.com/data/r14/dfex
(St. Louis Fed (FRED) macro data)
. dfactor (D.(ipman income hours unemp) = , noconstant) (f = , ar(1/2))
searching for initial values ....................
(setting technique to bhhh)
Iteration 0:   log likelihood = -675.19823
Iteration 1:   log likelihood = -666.74344
  (output omitted)
Refining estimates:
Iteration 0:   log likelihood = -662.09507
Iteration 1:   log likelihood = -662.09507

Dynamic-factor model
Sample: 1972m2 - 2008m11                        Number of obs   =        442
                                                Wald chi2(6)    =     751.95
Log likelihood = -662.09507                     Prob > chi2     =     0.0000

------------------------------------------------------------------------------
               |                OIM
               |     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+---------------------------------------------------------------
f              |
             f |
           L1. |  .2651932   .0568663     4.66   0.000     .1537372    .3766491
           L2. |  .4820398   .0624635     7.72   0.000     .3596136     .604466
---------------+---------------------------------------------------------------
D.ipman        |
             f |  .3502249   .0287389    12.19   0.000     .2938976    .4065522
---------------+---------------------------------------------------------------
D.income       |
             f |  .0746338   .0217319     3.43   0.001     .0320401    .1172276
---------------+---------------------------------------------------------------
D.hours        |
             f |  .2177469   .0186769    11.66   0.000     .1811407     .254353
---------------+---------------------------------------------------------------
D.unemp        |
             f | -.0676016   .0071022    -9.52   0.000    -.0815217   -.0536816
---------------+---------------------------------------------------------------
 var(De.ipman) |  .1383158   .0167086     8.28   0.000     .1055675    .1710641
var(De.income) |  .2773808   .0188302    14.73   0.000     .2404743    .3142873
 var(De.hours) |  .0911446   .0080847    11.27   0.000     .0752988    .1069903
 var(De.unemp) |  .0237232   .0017932    13.23   0.000     .0202086    .0272378
------------------------------------------------------------------------------
Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.

For a discussion of the atypical iteration log, see example 1 in [TS] sspace.

The header in the output describes the estimation sample, reports the log-likelihood function at the maximum, and gives the results of a Wald test against the null hypothesis that the coefficients on the independent variables, the factors, and the autoregressive components are all zero. In this example, the null hypothesis that all parameters except for the variance parameters are zero is rejected at all conventional levels.

The results in the estimation table indicate that the unobserved factor is quite persistent and that it is a significant predictor for each of the observed variables.

dfactor writes the DF model as a state-space model and uses the same methods as sspace to estimate the parameters. Example 5 in [TS] sspace writes the model considered here in state-space form and uses sspace to estimate the parameters.

Technical note

The signs of the coefficients on the unobserved factors are not identified. They are not identified because we can multiply the unobserved factors and the coefficients on the unobserved factors by negative one without changing the log likelihood or any of the model predictions. Altering either the starting values for the maximization process, the maximization technique() used, or the platform on which the command is run can cause the signs of the estimated coefficients on the unobserved factors to change. Changes in the signs of the estimated coefficients on the unobserved factors do not alter the implications of the model or the model predictions.
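One way to see this numerically is to flip the signs of all the estimated factor loadings and restart the maximization from the flipped values; the refitted results should reproduce the same log likelihood. The sketch below does this in Mata by editing a copy of e(b). It assumes the loadings are the coefficients named f in the D.* equations, which should be confirmed interactively (for example, with the coeflegend option) before relying on it:

. mata:
:     b = st_matrix("e(b)")
:     s = st_matrixcolstripe("e(b)")    // column 1: equation, column 2: name
:     for (j=1; j<=cols(b); j++) {
:         if (substr(s[j,1], 1, 2) == "D." & s[j,2] == "f") b[j] = -b[j]
:     }
:     st_matrix("b1", b)
: end
. dfactor (D.(ipman income hours unemp) = , noconstant) (f = , ar(1/2)), from(b1)
  (output omitted)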
Example 2: Dynamic-factor model with covariates

Here we extend the previous example by allowing the errors in the equations for the observables to be autocorrelated. This extension yields a constrained VAR model with an unobserved autocorrelated factor. We estimate the parameters by typing

. dfactor (D.(ipman income hours unemp) = , noconstant ar(1)) (f = , ar(1/2))
searching for initial values ..............
(setting technique to bhhh)
Iteration 0:   log likelihood = -659.68789
Iteration 1:   log likelihood =  -631.6043
  (output omitted)
Refining estimates:
Iteration 0:   log likelihood = -610.28846
Iteration 1:   log likelihood = -610.28846

Dynamic-factor model
Sample: 1972m2 - 2008m11                        Number of obs   =        442
                                                Wald chi2(10)   =     990.91
Log likelihood = -610.28846                     Prob > chi2     =     0.0000

------------------------------------------------------------------------------
               |                OIM
               |     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+---------------------------------------------------------------
f              |
             f |
           L1. |  .4058457   .0906183     4.48   0.000     .2282371    .5834544
           L2. |  .3663499   .0849584     4.31   0.000     .1998344    .5328654
---------------+---------------------------------------------------------------
De.ipman       |
       e.ipman |
           LD. | -.2772149    .068808    -4.03   0.000    -.4120761   -.1423538
---------------+---------------------------------------------------------------
De.income      |
      e.income |
           LD. | -.2213824   .0470578    -4.70   0.000    -.3136141   -.1291508
---------------+---------------------------------------------------------------
De.hours       |
       e.hours |
           LD. | -.3969317   .0504256    -7.87   0.000     -.495764   -.2980994
---------------+---------------------------------------------------------------
De.unemp       |
       e.unemp |
           LD. | -.1736835   .0532071    -3.26   0.001    -.2779675   -.0693995
---------------+---------------------------------------------------------------
D.ipman        |
             f |  .3214972    .027982    11.49   0.000     .2666535    .3763408
---------------+---------------------------------------------------------------
D.income       |
             f |  .0760412   .0173844     4.37   0.000     .0419684     .110114
---------------+---------------------------------------------------------------
D.hours        |
             f |  .1933165   .0172969    11.18   0.000     .1594151    .2272179
---------------+---------------------------------------------------------------
D.unemp        |
             f | -.0711994   .0066553   -10.70   0.000    -.0842435   -.0581553
---------------+---------------------------------------------------------------
 var(De.ipman) |  .1387909   .0154558     8.98   0.000     .1084981    .1690837
var(De.income) |  .2636239   .0179043    14.72   0.000     .2285322    .2987157
 var(De.hours) |  .0822919   .0071096    11.57   0.000     .0683574    .0962265
 var(De.unemp) |  .0218056   .0016658    13.09   0.000     .0185407    .0250704
------------------------------------------------------------------------------
Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.

The autoregressive (AR) terms are displayed in error notation. e.varname stands for the error in the equation for varname. The estimate of the pth AR term from y1 on y2 is reported as Lpe.y1 in equation e.y2. In the above output, the estimated first-order AR term of D.ipman on D.ipman is −0.277 and is labeled as LDe.ipman in equation De.ipman.

The previous two examples illustrate how to use dfactor to estimate the parameters of DF models. Although the previous example indicates that the more general DFAR model fits the data well, we use these data to illustrate how to estimate the parameters of more restrictive models.

Example 3: A VAR with constrained error variance

In this example, we use dfactor to estimate the parameters of a SUR model with constraints on the error-covariance matrix. The model is also a constrained VAR with constraints on the error-covariance matrix, because we include the lags of two dependent variables as exogenous variables to model the dynamic structure of the data. Previous exploratory work suggested that we should drop the lag of D.unemp from the model.

. constraint 1 [cov(De.unemp,De.income)]_cons = 0
. dfactor (D.(ipman income unemp) = LD.(ipman income), noconstant
>     covstructure(unstructured)), constraints(1)
searching for initial values ............
(setting technique to bhhh)
Iteration 0:   log likelihood = -569.34353
Iteration 1:   log likelihood =  -548.7669
  (output omitted)
Refining estimates:
Iteration 0:   log likelihood = -535.12973
Iteration 1:   log likelihood = -535.12973

Dynamic-factor model
Sample: 1972m3 - 2008m11                        Number of obs   =        441
                                                Wald chi2(6)    =      88.32
Log likelihood = -535.12973                     Prob > chi2     =     0.0000
 ( 1)  [cov(De.income,De.unemp)]_cons = 0

-------------------------------------------------------------------------------------
                        |                OIM
                        |     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------+-------------------------------------------------------------
D.ipman                 |
                  ipman |
                    LD. |   .206276   .0471654     4.37   0.000     .1138335    .2987185
                 income |
                    LD. |  .1867384   .0512139     3.65   0.000      .086361    .2871158
------------------------+-------------------------------------------------------------
D.income                |
                  ipman |
                    LD. |  .1043733   .0434048     2.40   0.016     .0193015    .1894451
                 income |
                    LD. | -.1957893   .0471305    -4.15   0.000    -.2881634   -.1034153
------------------------+-------------------------------------------------------------
D.unemp                 |
                  ipman |
                    LD. | -.0865823   .0140747    -6.15   0.000    -.1141681   -.0589964
                 income |
                    LD. | -.0200749   .0152828    -1.31   0.189    -.0500285    .0098788
------------------------+-------------------------------------------------------------
          var(De.ipman) |  .3243902   .0218533    14.84   0.000     .2815584    .3672219
cov(De.ipman,De.income) |  .0445794    .013696     3.25   0.001     .0177358     .071423
 cov(De.ipman,De.unemp) | -.0298076   .0047755    -6.24   0.000    -.0391674   -.0204478
         var(De.income) |  .2747234   .0185008    14.85   0.000     .2384624    .3109844
cov(De.income,De.unemp) |         0  (constrained)
          var(De.unemp) |  .0288866   .0019453    14.85   0.000     .0250738    .0326994
-------------------------------------------------------------------------------------
Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.

The output indicates that the model fits well, except that the lag of first-differenced income is not a significant predictor of first-differenced unemployment.
Technical note

The previous example shows how to use dfactor to estimate the parameters of a SUR model with constraints on the error-covariance matrix. Neither sureg nor var allows for constraints on the error-covariance matrix. Without the constraints on the error-covariance matrix and including the lag of D.unemp,

. dfactor (D.(ipman income unemp) = LD.(ipman income unemp),
>     noconstant covstructure(unstructured))
(output omitted)
. var D.(ipman income unemp), lags(1) noconstant
(output omitted)

and

. sureg (D.ipman LD.(ipman income unemp), noconstant)
>       (D.income LD.(ipman income unemp), noconstant)
>       (D.unemp LD.(ipman income unemp), noconstant)
(output omitted)

produce the same estimates after allowing for small numerical differences.

Example 4: A lower-triangular VAR with constrained error variance

The previous example estimated the parameters of a constrained VAR model with a constraint on the error-covariance matrix. This example makes two refinements on the previous one: we use an unconditional estimator instead of a conditional estimator, and we constrain the AR parameters to have a lower triangular structure. (See the next technical note for a discussion of conditional and unconditional estimators.) The results are

. constraint 1 [cov(De.unemp,De.income)]_cons = 0
. dfactor (D.(ipman income unemp) = , ar(1) arstructure(ltriangular) noconstant
>     covstructure(unstructured)), constraints(1)
searching for initial values ............
(setting technique to bhhh)
Iteration 0:   log likelihood = -543.89917
Iteration 1:   log likelihood = -541.47792
(output omitted)
Refining estimates:
Iteration 0:   log likelihood = -540.36159
Iteration 1:   log likelihood = -540.36159

Dynamic-factor model
Sample: 1972m2 - 2008m11                        Number of obs   =        442
                                                Wald chi2(6)    =      75.48
Log likelihood = -540.36159                     Prob > chi2     =     0.0000
 ( 1)  [cov(De.income,De.unemp)]_cons = 0

                           OIM
              Coef.   Std. Err.       z   P>|z|    [95% Conf. Interval]
De.ipman
  e.ipman
    LD.    .2297308   .0473147     4.86   0.000     .1369957    .3224659

De.income
  e.ipman
    LD.    .1075441   .0433357     2.48   0.013     .0226077    .1924805
  e.income
    LD.   -.2209485    .047116    -4.69   0.000    -.3132943   -.1286028

De.unemp
  e.ipman
    LD.   -.0975759   .0151301    -6.45   0.000    -.1272304   -.0679215
  e.income
    LD.   -.0000467   .0147848    -0.00   0.997    -.0290244    .0289309
  e.unemp
    LD.   -.0795348   .0482213    -1.65   0.099    -.1740469    .0149773

var(De.ipman)             .3335286   .0224282   14.87   0.000   .2895702    .377487
cov(De.ipman,De.income)   .0457804   .0139123    3.29   0.001   .0185127   .0730481
cov(De.ipman,De.unemp)   -.0329438   .0051423   -6.41   0.000  -.0430226   -.022865
var(De.income)            .2743375   .0184657   14.86   0.000   .2381454   .3105296
cov(De.income,De.unemp)          0   (constrained)
var(De.unemp)             .0292088     .00199   14.68   0.000   .0253083   .0331092

Note: Tests of variances against zero are one sided, and the two-sided confidence intervals are truncated at zero.

The estimated AR terms of D.income and D.unemp on D.unemp are -0.000047 and -0.079535, and they are not significant at the 1% or 5% levels. The estimated AR term of D.ipman on D.income is 0.107544 and is significant at the 5% level but not at the 1% level.
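The two insignificant AR terms in the De.unemp equation can also be tested jointly with a Wald test. A minimal sketch follows; the coefficient names are our reading of the labels in the output above, so confirm the exact names first by redisplaying the results with the coeflegend option:

. dfactor, coeflegend
. test [De.unemp]LD.e.income [De.unemp]LD.e.unemp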
Technical note

We obtained the unconditional estimator in example 4 by specifying the ar() option instead of including the lags of the endogenous variables as exogenous variables, as we did in example 3. The unconditional estimator uses one additional observation and is more efficient. This change is analogous to estimating an AR coefficient by arima instead of using regress on the lagged endogenous variable. For example, to obtain the unconditional estimator in a univariate model, typing

. arima D.ipman, ar(1) noconstant technique(nr)
(output omitted)

will produce the same estimated AR coefficient as

. dfactor (D.ipman, ar(1) noconstant)
(output omitted)

We obtain the conditional estimator by typing either

. regress D.ipman LD.ipman, noconstant
(output omitted)

or

. dfactor (D.ipman = LD.ipman, noconstant)
(output omitted)

Example 5: A static factor model

In this example, we fit regional unemployment data to an SF model. We have data on the unemployment levels for the four regions in the U.S. census: west for the West, south for the South, ne for the Northeast, and midwest for the Midwest. We treat the variables as first-difference stationary and model the first differences of these variables. Using dfactor yields

. use http://www.stata-press.com/data/r14/urate
(Monthly unemployment rates in US Census regions)
. dfactor (D.(west south ne midwest) = , noconstant) (z = )
searching for initial values .............
(setting technique to bhhh)
Iteration 0:   log likelihood = 872.71993
Iteration 1:   log likelihood = 873.04786
(output omitted)
Refining estimates:
Iteration 0:   log likelihood =  873.0755
Iteration 1:   log likelihood =  873.0755

Dynamic-factor model
Sample: 1990m2 - 2008m12                        Number of obs   =        227
                                                Wald chi2(4)    =     342.56
Log likelihood = 873.0755                       Prob > chi2     =     0.0000

                           OIM
              Coef.   Std. Err.       z   P>|z|    [95% Conf. Interval]
D.west
  z        .0978324   .0065644    14.90   0.000     .0849664    .1106983

D.south
  z        .0859494   .0061762    13.92   0.000     .0738442    .0980546

D.ne
  z        .0918607   .0072814    12.62   0.000     .0775893     .106132

D.midwest
  z        .0861102   .0074652    11.53   0.000     .0714787    .1007417

var(De.west)     .0036887   .0005834    6.32   0.000   .0025453   .0048322
var(De.south)    .0038902   .0005228    7.44   0.000   .0028656   .0049149
var(De.ne)       .0064074   .0007558    8.48   0.000   .0049261   .0078887
var(De.midw~t)   .0074749   .0008271    9.04   0.000   .0058538    .009096

Note: Tests of variances against zero are one sided, and the two-sided confidence intervals are truncated at zero.

The estimates indicate that we could reasonably suppose that the unobserved factor has the same effect on the changes in unemployment in all four regions. The output below shows that we cannot reject the null hypothesis that these coefficients are the same.

. test [D.west]z = [D.south]z = [D.ne]z = [D.midwest]z
 ( 1)  [D.west]z - [D.south]z = 0
 ( 2)  [D.west]z - [D.ne]z = 0
 ( 3)  [D.west]z - [D.midwest]z = 0
           chi2(  3) =    3.58
         Prob > chi2 =    0.3109

Example 6: A static factor with constraints

In this example, we impose the constraint that the unobserved factor has the same impact on changes in unemployment in all four regions. This constraint was suggested by the results of the previous example. The previous example did not allow for any dynamics in the variables, a problem we alleviate by allowing the disturbances in the equation for each observable to follow an AR(1) process.

. constraint 2 [D.west]z = [D.south]z
. constraint 3 [D.west]z = [D.ne]z
. constraint 4 [D.west]z = [D.midwest]z
. dfactor (D.(west south ne midwest) = , noconstant ar(1)) (z = ),
>     constraints(2/4)
searching for initial values .............
(setting technique to bhhh)
Iteration 0:   log likelihood = 827.97004
Iteration 1:   log likelihood = 874.74471
(output omitted)
Refining estimates:
Iteration 0:   log likelihood = 880.97488
Iteration 1:   log likelihood = 880.97488

Dynamic-factor model
Sample: 1990m2 - 2008m12                        Number of obs   =        227
                                                Wald chi2(5)    =     363.34
Log likelihood = 880.97488                      Prob > chi2     =     0.0000
 ( 1)  [D.west]z - [D.south]z = 0
 ( 2)  [D.west]z - [D.ne]z = 0
 ( 3)  [D.west]z - [D.midwest]z = 0

                           OIM
              Coef.   Std. Err.       z   P>|z|    [95% Conf. Interval]
De.west
  e.west
    LD.    .1297198   .0992663     1.31   0.191    -.0648386    .3242781

De.south
  e.south
    LD.   -.2829014   .0909205    -3.11   0.002    -.4611023   -.1047004

De.ne
  e.ne
    LD.    .2866958   .0847851     3.38   0.001       .12052    .4528715

De.midwest
  e.midwest
    LD.    .0049427   .0782188     0.06   0.950    -.1483634    .1582488

D.west
  z        .0904724   .0049326    18.34   0.000     .0808047    .1001401

D.south
  z        .0904724   .0049326    18.34   0.000     .0808047    .1001401

D.ne
  z        .0904724   .0049326    18.34   0.000     .0808047    .1001401

D.midwest
  z        .0904724   .0049326    18.34   0.000     .0808047    .1001401

var(De.west)     .0038959   .0005111    7.62   0.000   .0028941   .0048977
var(De.south)    .0035518   .0005097    6.97   0.000   .0025528   .0045507
var(De.ne)       .0058173   .0006983    8.33   0.000   .0044488   .0071859
var(De.midw~t)   .0075444   .0008268    9.12   0.000   .0059239    .009165

Note: Tests of variances against zero are one sided, and the two-sided confidence intervals are truncated at zero.

The results indicate that the model might not fit well: two of the four AR coefficients are statistically insignificant, and the two significant coefficients have opposite signs and sum to about zero. We suspect that a DF model might fit these data better than an SF model with autocorrelated disturbances.

Stored results

dfactor stores the following in e():

Scalars
  e(N)                  number of observations
  e(k)                  number of parameters
  e(k_aux)              number of auxiliary parameters
  e(k_eq)               number of equations in e(b)
  e(k_eq_model)         number of equations in overall model test
  e(k_dv)               number of dependent variables
  e(k_obser)            number of observation equations
  e(k_factor)           number of factors specified
  e(o_ar_max)           number of AR terms for the disturbances
  e(f_ar_max)           number of AR terms for the factors
  e(df_m)               model degrees of freedom
  e(ll)                 log likelihood
  e(chi2)               chi-squared
  e(p)                  significance
  e(tmin)               minimum time in sample
  e(tmax)               maximum time in sample
  e(stationary)         1 if the estimated parameters indicate a stationary model, 0 otherwise
  e(rank)               rank of VCE
  e(ic)                 number of iterations
  e(rc)                 return code
  e(converged)          1 if converged, 0 otherwise

Macros
  e(cmd)                dfactor
  e(cmdline)            command as typed
  e(depvar)             unoperated names of dependent variables in observation equations
  e(obser_deps)         names of dependent variables in observation equations
  e(covariates)         list of covariates
  e(indeps)             independent variables
  e(factor_deps)        names of unobserved factors in model
  e(tvar)               variable denoting time within groups
  e(eqnames)            names of equations
  e(model)              type of dynamic-factor model specified
  e(title)              title in estimation output
  e(tmins)              formatted minimum time
  e(tmaxs)              formatted maximum time
  e(o_ar)               list of AR terms for disturbances
  e(f_ar)               list of AR terms for factors
  e(observ_cov)         structure of observation-error covariance matrix
  e(factor_cov)         structure of factor-error covariance matrix
  e(chi2type)           Wald; type of model chi-squared test
  e(vce)                vcetype specified in vce()
  e(vcetype)            title used to label Std. Err.
  e(opt)                type of optimization
  e(method)             likelihood method
  e(initial_values)     type of initial values
  e(technique)          maximization technique
  e(tech_steps)         iterations taken in maximization technique(s)
  e(datasignature)      the checksum
  e(datasignaturevars)  variables used in calculation of checksum
  e(properties)         b V
  e(estat_cmd)          program used to implement estat
  e(predict)            program used to implement predict
  e(marginsok)          predictions allowed by margins
  e(marginsnotok)       predictions disallowed by margins
  e(asbalanced)         factor variables fvset as asbalanced
  e(asobserved)         factor variables fvset as asobserved

Matrices
  e(b)                  coefficient vector
  e(Cns)                constraints matrix
  e(ilog)               iteration log (up to 20 iterations)
  e(gradient)           gradient vector
  e(V)                  variance-covariance matrix of the estimators
  e(V_modelbased)       model-based variance

Functions
  e(sample)             marks estimation sample

Methods and formulas

dfactor writes the specified model as a state-space model and uses sspace to estimate the parameters by maximum likelihood. See Lütkepohl (2005, 619–621) for how to write the DF model in state-space form. See [TS] sspace for the technical details.

References

De Jong, P. 1988. The likelihood for a state space model. Biometrika 75: 165–169.
———. 1991. The diffuse Kalman filter. Annals of Statistics 19: 1073–1083.
Geweke, J. 1977. The dynamic factor analysis of economic time series models. In Latent Variables in Socioeconomic Models, ed. D. J. Aigner and A. S. Goldberger, 365–383. Amsterdam: North-Holland.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.
Sargent, T. J., and C. A. Sims. 1977. Business cycle modeling without pretending to have too much a priori economic theory. In New Methods in Business Cycle Research: Proceedings from a Conference, ed. C. A. Sims, 45–109. Minneapolis: Federal Reserve Bank of Minneapolis.
Stock, J. H., and M. W. Watson. 1989. New indexes of coincident and leading economic indicators. In NBER Macroeconomics Annual 1989, ed. O. J. Blanchard and S. Fischer, vol. 4, 351–394. Cambridge, MA: MIT Press.
———. 1991. A probability model of the coincident economic indicators. In Leading Economic Indicators: New Approaches and Forecasting Records, ed. K. Lahiri and G. H. Moore, 63–89. Cambridge: Cambridge University Press.
Watson, M. W., and R. F. Engle. 1983. Alternative algorithms for the estimation of dynamic factor, MIMIC and varying coefficient regression models. Journal of Econometrics 23: 385–400.
Also see

[TS] dfactor postestimation — Postestimation tools for dfactor
[TS] arima — ARIMA, ARMAX, and other dynamic regression models
[TS] sspace — State-space models
[TS] tsset — Declare data to be time-series data
[TS] var — Vector autoregressive models
[R] regress — Linear regression
[R] sureg — Zellner's seemingly unrelated regression
[U] 20 Estimation and postestimation commands

Title

dfactor postestimation — Postestimation tools for dfactor

Postestimation commands     predict     Remarks and examples
Methods and formulas        Also see

Postestimation commands

The following standard postestimation commands are available after dfactor:

Command           Description
estat ic          Akaike's and Schwarz's Bayesian information criteria (AIC and BIC)
estat summarize   summary statistics for the estimation sample
estat vce         variance-covariance matrix of the estimators (VCE)
estimates         cataloging estimation results
forecast          dynamic forecasts and simulations
lincom            point estimates, standard errors, testing, and inference for
                    linear combinations of coefficients
lrtest            likelihood-ratio test
nlcom             point estimates, standard errors, testing, and inference for
                    nonlinear combinations of coefficients
predict           predictions, residuals, influence statistics, and other
                    diagnostic measures
predictnl         point estimates, standard errors, testing, and inference for
                    generalized predictions
test              Wald tests of simple and composite linear hypotheses
testnl            Wald tests of nonlinear hypotheses

predict

Description for predict

predict creates a new variable containing predictions such as expected values, unobserved factors, autocorrelated disturbances, and innovations. The root mean squared error is available for all predictions. All predictions are also available as static one-step-ahead predictions or as dynamic multistep predictions, and you can control when dynamic predictions begin.

Menu for predict

Statistics > Postestimation

Syntax for predict

predict [type] {stub* | newvarlist} [if] [in] [, statistic options]

statistic     Description
Main
y             dependent variable, which is xbf + residuals
xb            linear predictions using the observable independent variables
xbf           linear predictions using the observable independent variables
                plus the factor contributions
factors       unobserved factor variables
residuals     autocorrelated disturbances
innovations   innovations, the observed dependent variable minus the predicted y

These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for the estimation sample.

options                    Description
Options
equation(eqnames)          specify name(s) of equation(s) for which predictions
                             are to be made
rmse(stub* | newvarlist)   put estimated root mean squared errors of predicted
                             objects in new variables
dynamic(time_constant)     begin dynamic forecast at specified time
Advanced
smethod(method)            method for predicting unobserved states

method     Description
onestep    predict using past information
smooth     predict using all sample information
filter     predict using past and contemporaneous information

Options for predict

The mathematical notation used in this section is defined in Description of [TS] dfactor.

Main

y, xb, xbf, factors, residuals, and innovations specify the statistic to be predicted.

y, the default, predicts the dependent variables.
The predictions include the contributions of the unobserved factors, the linear predictions using the observable independent variables, and any autocorrelation: $\hat{\mathbf P}\hat{\mathbf f}_t + \hat{\mathbf Q}\mathbf x_t + \hat{\mathbf u}_t$.

xb calculates the linear prediction using the observable independent variables, $\hat{\mathbf Q}\mathbf x_t$.

xbf calculates the contributions of the unobserved factors plus the linear prediction using the observable independent variables, $\hat{\mathbf P}\hat{\mathbf f}_t + \hat{\mathbf Q}\mathbf x_t$.

factors estimates the unobserved factors, $\hat{\mathbf f}_t = \hat{\mathbf R}\mathbf w_t + \hat{\mathbf A}_1\hat{\mathbf f}_{t-1} + \hat{\mathbf A}_2\hat{\mathbf f}_{t-2} + \cdots + \hat{\mathbf A}_p\hat{\mathbf f}_{t-p}$.

residuals calculates the autocorrelated residuals, $\hat{\mathbf u}_t = \hat{\mathbf C}_1\hat{\mathbf u}_{t-1} + \hat{\mathbf C}_2\hat{\mathbf u}_{t-2} + \cdots + \hat{\mathbf C}_q\hat{\mathbf u}_{t-q}$.

innovations calculates the innovations, $\hat{\boldsymbol\epsilon}_t = \mathbf y_t - \hat{\mathbf P}\hat{\mathbf f}_t - \hat{\mathbf Q}\mathbf x_t - \hat{\mathbf u}_t$.

Options

equation(eqnames) specifies the equation(s) for which the predictions are to be calculated. You specify equation names, such as equation(income consumption) or equation(factor1 factor2), to identify the equations. For the factors statistic, you must specify names of equations for factors; for all other statistics, you must specify names of equations for observable variables.

If you do not specify equation() and do not specify stub*, the results are the same as if you had specified the name of the first equation for the predicted statistic. equation() may not be specified with stub*.

rmse(stub* | newvarlist) puts the root mean squared errors of the predicted objects into the specified new variables. The root mean squared errors measure the variances due to the disturbances but do not account for estimation error.

dynamic(time_constant) specifies when predict starts producing dynamic forecasts. The specified time_constant must be in the scale of the time variable specified in tsset, and the time_constant must be inside a sample for which observations on the dependent variables are available. For example, dynamic(tq(2008q4)) causes dynamic predictions to begin in the fourth quarter of 2008, assuming that your time variable is quarterly; see [D] datetime. If the model contains exogenous variables, they must be present for the whole predicted sample. dynamic() may not be specified with xb, xbf, innovations, smethod(filter), or smethod(smooth).

Advanced

smethod(method) specifies the method used to predict the unobserved states in the model. smethod() may not be specified with xb.

smethod(onestep), the default, causes predict to use previous information on the dependent variables. The Kalman filter is performed on previous periods, but only the one-step predictions are made for the current period.

smethod(smooth) causes predict to estimate the states at each time period using all the sample data by the Kalman smoother.

smethod(filter) causes predict to estimate the states at each time period using previous and contemporaneous data by the Kalman filter. The Kalman filter is performed on previous periods and the current period. smethod(filter) may be specified only with factors and residuals.
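For instance, smoothed estimates of the unobserved factor together with their root mean squared errors might be obtained by typing something like the following; this is a sketch in which f is the factor-equation name used in the examples of [TS] dfactor, and fhat and f_rmse are our own variable names:

. predict fhat, factors equation(f) smethod(smooth) rmse(f_rmse)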
Remarks and examples

We assume that you have already read [TS] dfactor. In this entry, we illustrate some of the features of predict after using dfactor.

dfactor writes the specified model as a state-space model and estimates the parameters by maximum likelihood. The unobserved factors and the residuals are states in the state-space form of the model, and they are estimated by the Kalman filter or the Kalman smoother. The smethod() option controls how these states are estimated.

The Kalman filter or Kalman smoother is run over the specified sample. Changing the sample can alter the predicted value for a given observation, because the Kalman filter and Kalman smoother are recursive algorithms.

After estimating the parameters of a dynamic-factor model, there are many quantities of potential interest. Here we will discuss several of these statistics and illustrate how to use predict to compute them.

Example 1: One-step, out-of-sample forecasts

Let's begin by estimating the parameters of the dynamic-factor model considered in example 2 in [TS] dfactor.

. use http://www.stata-press.com/data/r14/dfex
(St. Louis Fed (FRED) macro data)
. dfactor (D.(ipman income hours unemp) = , noconstant ar(1)) (f = , ar(1/2))
(output omitted)

While several of the six statistics computed by predict might be of interest, we will look only at a few of these statistics for D.ipman. We begin by obtaining one-step predictions in the estimation sample and a six-month dynamic forecast for D.ipman. The graph of the in-sample predictions indicates that our model accounts for only a small fraction of the variability in D.ipman.

. tsappend, add(6)
. predict Dipman_f, dynamic(tm(2008m12)) equation(D.ipman)
(option y assumed; fitted values)
. tsline D.ipman Dipman_f if month<=tm(2008m11), lcolor(gs13) xtitle("")
>     legend(rows(2))

(figure: D.ipman and the one-step y predictions, 1970m1-2008m11)

Graphing the last year of the sample and the six-month out-of-sample forecast yields

. tsline D.ipman Dipman_f if month>=tm(2008m1), xtitle("") legend(rows(2))

(figure: D.ipman and the dynamic forecast, 2008m1-2009m4)

Example 2: Estimating an unobserved factor

Another common task is to estimate an unobserved factor. We can estimate the unobserved factor at each time period by using only previous information (the smethod(onestep) option), previous and contemporaneous information (the smethod(filter) option), or all the sample information (the smethod(smooth) option). We are interested in the one-step predictive power of the unobserved factor, so we use the default, smethod(onestep).

. predict fac if e(sample), factor
. tsline D.ipman fac, lcolor(gs10) xtitle("") legend(rows(2))

(figure: D.ipman and the one-step estimates of the factor f, 1970m1-2008m11)
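For comparison, one could also compute the smoothed estimates of the factor, which use all the sample information, and plot them against the one-step estimates. A minimal sketch, with fac_sm as our own variable name:

. predict fac_sm if e(sample), factor smethod(smooth)
. tsline fac fac_sm, xtitle("") legend(rows(2))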
Methods and formulas

dfactor estimates the parameters by writing the model in state-space form and using sspace. Analogously, predict after dfactor uses the methods described in [TS] sspace postestimation. The unobserved factors and the residuals are states in the state-space form of the model.

See Methods and formulas of [TS] sspace postestimation for how predictions are made after estimating the parameters of a state-space model.

Also see

[TS] dfactor — Dynamic-factor models
[TS] sspace — State-space models
[TS] sspace postestimation — Postestimation tools for sspace
[U] 20 Estimation and postestimation commands

Title

dfgls — DF-GLS unit-root test

Description       Quick start            Menu             Syntax
Options           Remarks and examples   Stored results   Methods and formulas
Acknowledgments   References             Also see

Description

dfgls performs a modified Dickey–Fuller t test for a unit root in which the series has been transformed by a generalized least-squares regression.

Quick start

Modified Dickey–Fuller unit-root test for y1 using GLS-transformed series using tsset data
    dfgls y1

As above, for series y2 that has no linear time trend
    dfgls y2, notrend

As above, but with at most 2 lags
    dfgls y2, notrend maxlag(2)

Menu

Statistics > Time series > Tests > DF-GLS test for a unit root

Syntax

dfgls varname [if] [in] [, options]

options     Description
Main
maxlag(#)   use # as the highest lag order for Dickey–Fuller GLS regressions
notrend     series is stationary around a mean instead of around a linear
              time trend
ers         present interpolated critical values from Elliott, Rothenberg,
              and Stock (1996)

You must tsset your data before using dfgls; see [TS] tsset.
varname may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Options

Main

maxlag(#) sets the value of k, the highest lag order for the first-differenced, detrended variable in the Dickey–Fuller regression. By default, dfgls sets k according to the method proposed by Schwert (1989); that is, dfgls sets $k_{\max} = \mathrm{floor}\{12[(T+1)/100]^{0.25}\}$.

notrend specifies that the alternative hypothesis be that the series is stationary around a mean instead of around a linear time trend. By default, a trend is included.

ers specifies that dfgls should present interpolated critical values from tables presented by Elliott, Rothenberg, and Stock (1996), which they obtained from simulations. See Critical values under Methods and formulas for details.

Remarks and examples

dfgls tests for a unit root in a time series. It performs the modified Dickey–Fuller t test (known as the DF-GLS test) proposed by Elliott, Rothenberg, and Stock (1996). Essentially, the test is an augmented Dickey–Fuller test, similar to the test performed by Stata's dfuller command, except that the time series is transformed via a generalized least squares (GLS) regression before performing the test. Elliott, Rothenberg, and Stock and later studies have shown that this test has significantly greater power than the previous versions of the augmented Dickey–Fuller test.

dfgls performs the DF-GLS test for the series of models that include 1 to k lags of the first-differenced, detrended variable, where k can be set by the user or by the method described in Schwert (1989). Stock and Watson (2011, 644–649) provide an excellent discussion of the approach.

As discussed in [TS] dfuller, the augmented Dickey–Fuller test involves fitting a regression of the form

$$\Delta y_t = \alpha + \beta y_{t-1} + \delta t + \zeta_1 \Delta y_{t-1} + \zeta_2 \Delta y_{t-2} + \cdots + \zeta_k \Delta y_{t-k} + \epsilon_t$$

and then testing the null hypothesis $H_0\colon \beta = 0$. The DF-GLS test is performed analogously but on GLS-detrended data. The null hypothesis of the test is that $y_t$ is a random walk, possibly with drift. There are two possible alternative hypotheses: $y_t$ is stationary about a linear time trend, or $y_t$ is stationary with a possibly nonzero mean but with no linear time trend. The default is to use the former. To specify the latter alternative, use the notrend option.
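Schwert's default kmax can also be reproduced by hand before running dfgls. A sketch, assuming the series in memory has no gaps so that the count returned in r(N) plays the role of T:

. count
. display floor(12*((r(N) + 1)/100)^0.25)

With the 92 quarterly observations used in example 1 below, this expression evaluates to 11, matching the Maxlag = 11 reported in the header there; the result is the same whether r(N) is taken as T or T + 1.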
Example 1

Here we use the German macroeconomic dataset and test whether the natural log of investment exhibits a unit root. We use the default options with dfgls.

. use http://www.stata-press.com/data/r14/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. dfgls ln_inv

DF-GLS for ln_inv                               Number of obs = 80
Maxlag = 11 chosen by Schwert criterion

              DF-GLS tau    1% Critical   5% Critical   10% Critical
  [lags]    Test Statistic     Value         Value          Value
    11          -2.925        -3.610        -2.763         -2.489
    10          -2.671        -3.610        -2.798         -2.523
     9          -2.766        -3.610        -2.832         -2.555
     8          -3.259        -3.610        -2.865         -2.587
     7          -3.536        -3.610        -2.898         -2.617
     6          -3.115        -3.610        -2.929         -2.646
     5          -3.054        -3.610        -2.958         -2.674
     4          -3.016        -3.610        -2.986         -2.699
     3          -2.071        -3.610        -3.012         -2.723
     2          -1.675        -3.610        -3.035         -2.744
     1          -1.752        -3.610        -3.055         -2.762

Opt Lag (Ng-Perron seq t) =  7 with RMSE .0388771
Min SC   = -6.169137 at lag  4 with RMSE .0398949
Min MAIC = -6.136371 at lag  1 with RMSE .0440319

The null hypothesis of a unit root is not rejected for lags 1–3; it is rejected at the 10% level for lags 9–10 and at the 5% level for lags 4–8 and 11.

For comparison, we also test for a unit root in the log of investment by using dfuller with two different lag specifications. We need to use the trend option with dfuller because a trend is not included by default.

. dfuller ln_inv, lag(4) trend

Augmented Dickey-Fuller test for unit root      Number of obs = 87

                        Interpolated Dickey-Fuller
         Test      1% Critical   5% Critical   10% Critical
       Statistic      Value         Value          Value
Z(t)     -3.133       -4.069        -3.463         -3.158

MacKinnon approximate p-value for Z(t) = 0.0987

. dfuller ln_inv, lag(7) trend

Augmented Dickey-Fuller test for unit root      Number of obs = 84

                        Interpolated Dickey-Fuller
         Test      1% Critical   5% Critical   10% Critical
       Statistic      Value         Value          Value
Z(t)     -3.994       -4.075        -3.466         -3.160

MacKinnon approximate p-value for Z(t) = 0.0090

The critical values and the test statistic produced by dfuller with 4 lags do not support rejecting the null hypothesis, although the MacKinnon approximate p-value is less than 0.1. With 7 lags, the critical values and the test statistic reject the null hypothesis at the 5% level, and the MacKinnon approximate p-value is less than 0.01.

That the dfuller results are not as strong as those produced by dfgls is not surprising, because the DF-GLS test with a trend has been shown to be more powerful than the standard augmented Dickey–Fuller test.

Stored results

If maxlag(0) is specified, dfgls stores the following in r():

Scalars
  r(rmse0)    RMSE
  r(dft0)     DF-GLS statistic

Otherwise, dfgls stores the following in r():

Scalars
  r(maxlag)   highest lag order k
  r(N)        number of observations
  r(sclag)    lag chosen by Schwarz criterion
  r(maiclag)  lag chosen by modified AIC method
  r(optlag)   lag chosen by sequential-t method

Matrices
  r(results)  k, MAIC, SIC, RMSE, and DF-GLS statistics
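The per-lag statistics displayed in the table above are also saved in the matrix r(results) and can be inspected or reused directly. For example, immediately after the dfgls run one might type

. matrix list r(results)

where, as documented under Stored results, each row holds k, the MAIC, the SIC, the RMSE, and the DF-GLS statistic for that lag.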
Methods and formulas

dfgls tests for a unit root. There are two possible alternative hypotheses: $y_t$ is stationary around a linear trend, or $y_t$ is stationary with no linear time trend.

Under the first alternative hypothesis, the DF-GLS test is performed by first estimating the intercept and trend via GLS. The GLS estimation is performed by generating the new variables $\tilde y_t$, $x_t$, and $z_t$, where

$$\tilde y_1 = y_1, \qquad \tilde y_t = y_t - \alpha^* y_{t-1}, \quad t = 2,\dots,T$$
$$x_1 = 1, \qquad x_t = 1 - \alpha^*, \quad t = 2,\dots,T$$
$$z_1 = 1, \qquad z_t = t - \alpha^*(t-1)$$

and $\alpha^* = 1 - (13.5/T)$. An OLS regression is then estimated for the equation

$$\tilde y_t = \delta_0 x_t + \delta_1 z_t + \epsilon_t$$

The OLS estimators $\hat\delta_0$ and $\hat\delta_1$ are then used to remove the trend from $y_t$; that is, we generate

$$y_t^* = y_t - (\hat\delta_0 + \hat\delta_1 t)$$

Finally, we perform an augmented Dickey–Fuller test on the transformed variable by fitting the OLS regression

$$\Delta y_t^* = \alpha + \beta y_{t-1}^* + \sum_{j=1}^{k} \zeta_j\, \Delta y_{t-j}^* + \epsilon_t \tag{1}$$

and then test the null hypothesis $H_0\colon \beta = 0$ by using tabulated critical values.

To perform the DF-GLS test under the second alternative hypothesis, we proceed as before but define $\alpha^* = 1 - (7/T)$, eliminate $z$ from the GLS regression, compute $y_t^* = y_t - \hat\delta_0$, fit the augmented Dickey–Fuller regression by using the newly transformed variable, and perform a test of the null hypothesis that $\beta = 0$ by using the tabulated critical values.

dfgls reports the DF-GLS statistic and its critical values obtained from the regression in (1) for $k \in \{1, 2, \dots, k_{\max}\}$. By default, dfgls sets $k_{\max} = \mathrm{floor}\{12[(T+1)/100]^{0.25}\}$ as proposed by Schwert (1989), although you can override this choice with another value. The sample size available with $k_{\max}$ lags is used in all the regressions. Because there are $k_{\max}$ lags of the first-differenced series, $k_{\max}+1$ observations are lost, leaving $T - k_{\max}$ observations. dfgls requires that the sample of $T+1$ observations on $y_t = (y_0, y_1, \dots, y_T)$ have no gaps.

dfgls reports the results of three different methods for choosing which value of k to use. These are method 1, the Ng–Perron sequential t; method 2, the minimum Schwarz information criterion (SIC); and method 3, the Ng–Perron modified Akaike information criterion (MAIC). Although the SIC has a long history in time-series modeling, the Ng–Perron sequential t was developed by Ng and Perron (1995), and the MAIC was developed by Ng and Perron (2001).

The SIC can be calculated using either the log likelihood or the sum-of-squared errors from a regression; dfgls uses the latter definition. Specifically, for each k,

$$\mathrm{SIC} = \ln(\widehat{\mathrm{rmse}}^{\,2}) + (k+1)\,\frac{\ln(T - k_{\max})}{T - k_{\max}}$$

where

$$\widehat{\mathrm{rmse}}^{\,2} = \frac{1}{T - k_{\max}} \sum_{t=k_{\max}+1}^{T} \hat e_t^{\,2}$$

dfgls reports the value of the smallest SIC and the k that produced it.

Ng and Perron (1995) derived a sequential-t algorithm for choosing k:

i. Set n = 0 and run the regression in (1) with all $k_{\max} - n$ lags. If the coefficient on the longest included lag is significantly different from zero at level $\alpha$, set $k = k_{\max}$. Otherwise, continue to ii.

ii. If $n < k_{\max}$, set $n = n + 1$ and continue to iii. Otherwise, set $k = 0$ and stop.

iii. Run the regression in (1) with $k_{\max} - n$ lags. If the coefficient on the longest included lag is significantly different from zero at level $\alpha$, set $k = k_{\max} - n$. Otherwise, return to ii.

Per Ng and Perron (1995), dfgls uses $\alpha$ = 10%. dfgls reports the k selected by this sequential-t algorithm and the RMSE from the regression.

Method 3 is based on choosing k to minimize the MAIC. The MAIC is calculated as

$$\mathrm{MAIC}(k) = \ln(\widehat{\mathrm{rmse}}^{\,2}) + \frac{2\{\tau(k) + k\}}{T - k_{\max}}$$

where

$$\tau(k) = \frac{\hat\beta_0^{\,2}}{\widehat{\mathrm{rmse}}^{\,2}} \sum_{t=k_{\max}+1}^{T} \tilde y_t^{\,2}$$

and $\tilde y_t$ was defined previously.

Critical values

By default, dfgls uses the 5% and 10% critical values computed from the response surface analysis of Cheung and Lai (1995). Because Cheung and Lai (1995) did not present results for the 1% case, the 1% critical values are always interpolated from the critical values presented by ERS.

ERS presented critical values, obtained from simulations, for the DF-GLS test with a linear trend and showed that the critical values for the mean-only DF-GLS test were the same as those for the ADF test. If dfgls is run with the ers option, dfgls will present interpolated critical values from these tables. The method of interpolation is standard. For the trend case, there is no interpolation below 50 observations or above 200; the values for 50 and $\infty$ are reported from the tables. For a value N that lies between two values in the table, say, $N_1$ and $N_2$, with corresponding critical values $\mathrm{CV}_1$ and $\mathrm{CV}_2$, the critical value

$$\mathrm{CV} = \mathrm{CV}_1 + \frac{N - N_1}{N_2 - N_1}\,(\mathrm{CV}_2 - \mathrm{CV}_1)$$

is presented. The same method is used for the mean-only case, except that interpolation is possible for values between 50 and 500.
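As a concrete illustration of the interpolation rule with hypothetical adjacent table rows: if $N = 150$ lies between tabulated sample sizes $N_1 = 100$ and $N_2 = 200$ with critical values $\mathrm{CV}_1$ and $\mathrm{CV}_2$, then

$$\mathrm{CV} = \mathrm{CV}_1 + \frac{150 - 100}{200 - 100}\,(\mathrm{CV}_2 - \mathrm{CV}_1) = \tfrac{1}{2}(\mathrm{CV}_1 + \mathrm{CV}_2)$$

that is, the midpoint of the two tabulated values.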
Acknowledgments

We thank Christopher F. Baum of the Department of Economics at Boston College and author of the Stata Press books An Introduction to Modern Econometrics Using Stata and An Introduction to Stata Programming, and Richard Sperling, for a previous version of dfgls.

References

Cheung, Y.-W., and K. S. Lai. 1995. Lag order and critical values of a modified Dickey–Fuller test. Oxford Bulletin of Economics and Statistics 57: 411–419.
Dickey, D. A., and W. A. Fuller. 1979. Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association 74: 427–431.
Elliott, G. R., T. J. Rothenberg, and J. H. Stock. 1996. Efficient tests for an autoregressive unit root. Econometrica 64: 813–836.
Ng, S., and P. Perron. 1995. Unit root tests in ARMA models with data-dependent methods for the selection of the truncation lag. Journal of the American Statistical Association 90: 268–281.
———. 2001. Lag length selection and the construction of unit root tests with good size and power. Econometrica 69: 1519–1554.
Schwert, G. W. 1989. Tests for unit roots: A Monte Carlo investigation. Journal of Business and Economic Statistics 7: 147–159.
Stock, J. H., and M. W. Watson. 2011. Introduction to Econometrics. 3rd ed. Boston: Addison–Wesley.

Also see

[TS] dfuller — Augmented Dickey–Fuller unit-root test
[TS] pperron — Phillips–Perron unit-root test
[TS] tsset — Declare data to be time-series data
[XT] xtunitroot — Panel-data unit-root tests

Title

dfuller — Augmented Dickey–Fuller unit-root test

Description   Quick start            Menu             Syntax
Options       Remarks and examples   Stored results   Methods and formulas
References    Also see

Description

dfuller performs the augmented Dickey–Fuller test that a variable follows a unit-root process. The null hypothesis is that the variable contains a unit root, and the alternative is that the variable was generated by a stationary process. You may optionally exclude the constant, include a trend term, and include lagged values of the difference of the variable in the regression.

Quick start

Augmented Dickey–Fuller test for presence of a unit root in y using tsset data
    dfuller y

As above, but with a trend term
    dfuller y, trend

Augmented Dickey–Fuller test for presence of a unit root in y with a drift term
    dfuller y, drift

As above, but include 3 lagged differences and display the regression table
    dfuller y, drift lags(3) regress

Menu

Statistics > Time series > Tests > Augmented Dickey-Fuller unit-root test
Syntax

dfuller varname [if] [in] [, options]

options      Description
Main
noconstant   suppress constant term in regression
trend        include trend term in regression
drift        include drift term in regression
regress      display regression table
lags(#)      include # lagged differences

You must tsset your data before using dfuller; see [TS] tsset.
varname may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Options

Main

noconstant suppresses the constant term (intercept) in the model and indicates that the process under the null hypothesis is a random walk without drift. noconstant cannot be used with the trend or drift option.

trend specifies that a trend term be included in the associated regression and that the process under the null hypothesis is a random walk, perhaps with drift. This option may not be used with the noconstant or drift option.

drift indicates that the process under the null hypothesis is a random walk with nonzero drift. This option may not be used with the noconstant or trend option.

regress specifies that the associated regression table appear in the output. By default, the regression table is not produced.

lags(#) specifies the number of lagged difference terms to include in the covariate list.

Remarks and examples

Dickey and Fuller (1979) developed a procedure for testing whether a variable has a unit root or, equivalently, whether the variable follows a random walk. Hamilton (1994, 528–529) describes the four different cases to which the augmented Dickey–Fuller test can be applied. The null hypothesis is always that the variable has a unit root. The cases differ in whether the null hypothesis includes a drift term and whether the regression used to obtain the test statistic includes a constant term and time trend. Becketti (2013, chap. 9) provides additional examples showing how to conduct these tests.

The true model is assumed to be

$$y_t = \alpha + y_{t-1} + u_t$$

where $u_t$ is an independent and identically distributed zero-mean error term. In cases one and two, presumably $\alpha = 0$, which is a random walk without drift. In cases three and four, we allow for a drift term by letting $\alpha$ be unrestricted.

The Dickey–Fuller test involves fitting the model

$$y_t = \alpha + \rho y_{t-1} + \delta t + u_t$$

by ordinary least squares (OLS), perhaps setting $\alpha = 0$ or $\delta = 0$. However, such a regression is likely to be plagued by serial correlation. To control for that, the augmented Dickey–Fuller test instead fits a model of the form

$$\Delta y_t = \alpha + \beta y_{t-1} + \delta t + \zeta_1 \Delta y_{t-1} + \zeta_2 \Delta y_{t-2} + \cdots + \zeta_k \Delta y_{t-k} + \epsilon_t \tag{1}$$

where k is the number of lags specified in the lags() option. The noconstant option removes the constant term $\alpha$ from this regression, and the trend option includes the time trend $\delta t$, which by default is not included. Testing $\beta = 0$ is equivalent to testing $\rho = 1$ or, equivalently, that $y_t$ follows a unit-root process.

In the first case, the null hypothesis is that $y_t$ follows a random walk without drift, and (1) is fit without the constant term $\alpha$ and the time trend $\delta t$.
The second case has the same null hypothesis as the first, except that we include $\alpha$ in the regression. In both cases, the population value of $\alpha$ is zero under the null hypothesis. In the third case, we hypothesize that $y_t$ follows a unit root with drift, so that the population value of $\alpha$ is nonzero; we do not include the time trend in the regression. Finally, in the fourth case, the null hypothesis is that $y_t$ follows a unit root with or without drift, so that $\alpha$ is unrestricted, and we include a time trend in the regression. The following table summarizes the four cases.

Case   Process under null hypothesis       Regression restrictions   dfuller option
1      Random walk without drift           α = 0, δ = 0              noconstant
2      Random walk without drift           δ = 0                     (default)
3      Random walk with drift              δ = 0                     drift
4      Random walk with or without drift   (none)                    trend

Except in the third case, the t statistic used to test $H_0\colon \beta = 0$ does not have a standard distribution. Hamilton (1994, chap. 17) derives the limiting distributions, which are different for each of the three other cases. The critical values reported by dfuller are interpolated based on the tables in Fuller (1996). MacKinnon (1994) shows how to approximate the p-values on the basis of a regression surface, and dfuller also reports that p-value. In the third case, where the regression includes a constant term and under the null hypothesis the series has a nonzero drift parameter $\alpha$, the t statistic has the usual t distribution; dfuller reports the one-sided critical values and p-value for the test of $H_0$ against the alternative $H_a\colon \beta < 0$, which is equivalent to $\rho < 1$.

Deciding which case to use involves a combination of theory and visual inspection of the data. If economic theory favors a particular null hypothesis, the appropriate case can be chosen based on that. If a graph of the data shows an upward trend over time, then case four may be preferred. If the data do not show a trend but do have a nonzero mean, then case two would be a valid alternative.

Example 1

In this example, we examine the international airline passengers dataset from Box, Jenkins, and Reinsel (2008, Series G). This dataset has 144 observations on the monthly number of international airline passengers from 1949 through 1960. Because the data show a clear upward trend, we use the trend option with dfuller to include a constant and time trend in the augmented Dickey–Fuller regression.

. use http://www.stata-press.com/data/r14/air2
(TIMESLAB: Airline passengers)
. dfuller air, lags(3) trend regress

Augmented Dickey-Fuller test for unit root      Number of obs = 140

                        Interpolated Dickey-Fuller
         Test      1% Critical   5% Critical   10% Critical
       Statistic      Value         Value          Value
Z(t)     -6.936       -4.027        -3.445         -3.145

MacKinnon approximate p-value for Z(t) = 0.0000

D.air        Coef.   Std. Err.      t   P>|t|    [95% Conf. Interval]
air
  L1.    -.5217089   .0752195   -6.94   0.000      -.67048   -.3729379
  LD.     .5572871   .0799894    6.97   0.000      .399082    .7154923
  L2D.     .095912   .0876692    1.09   0.276    -.0774825    .2693065
  L3D.      .14511   .0879922    1.65   0.101    -.0289232    .3191433
_trend    1.407534   .2098378    6.71   0.000     .9925118    1.822557
_cons     44.49164    7.78335    5.72   0.000     29.09753    59.88575

Here we can overwhelmingly reject the null hypothesis of a unit root at all common significance levels. From the regression output, the estimated $\beta$ of $-0.522$ implies that $\rho = 1 - 0.522 = 0.478$. Experiments with fewer or more lags in the augmented regression yield the same conclusion.
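The robustness claim in the last sentence can be verified by rerunning the test with a few different lag choices; any small set of lag lengths works, for example,

. dfuller air, lags(1) trend
. dfuller air, lags(6) trend

Per the discussion above, each run leads to the same rejection of the unit-root null.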
Example 2

In this example, we use the German macroeconomic dataset to determine whether the log of consumption follows a unit root. We will again use the trend option, because consumption grows over time.

. use http://www.stata-press.com/data/r14/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. tsset qtr
        time variable:  qtr, 1960q1 to 1982q4
                delta:  1 quarter
. dfuller ln_consump, lags(4) trend

Augmented Dickey-Fuller test for unit root      Number of obs = 87

                        Interpolated Dickey-Fuller
         Test      1% Critical   5% Critical   10% Critical
       Statistic      Value         Value          Value
Z(t)     -1.318       -4.069        -3.463         -3.158

MacKinnon approximate p-value for Z(t) = 0.8834

As we might expect from economic theory, here we cannot reject the null hypothesis that log consumption exhibits a unit root. Again, using different numbers of lag terms yields the same conclusion.

Stored results

dfuller stores the following in r():

Scalars
  r(N)      number of observations
  r(lags)   number of lagged differences
  r(Zt)     Dickey–Fuller test statistic
  r(p)      MacKinnon approximate p-value (if there is a constant or trend in
              the associated regression)

Methods and formulas

In the OLS estimation of an AR(1) process with Gaussian errors,

$$y_t = \rho y_{t-1} + \epsilon_t$$

where $\epsilon_t$ are independent and identically distributed as $N(0,\sigma^2)$ and $y_0 = 0$, the OLS estimate (based on an n-observation time series) of the autocorrelation parameter $\rho$ is given by

$$\hat\rho_n = \frac{\sum_{t=1}^{n} y_{t-1}\, y_t}{\sum_{t=1}^{n} y_{t-1}^{\,2}}$$

If $|\rho| < 1$, then

$$\sqrt{n}\,(\hat\rho_n - \rho) \to N(0,\, 1 - \rho^2)$$

If this result were valid when $\rho = 1$, the resulting distribution would have a variance of zero. When $\rho = 1$, the OLS estimate $\hat\rho$ still converges in probability to one, though we need to find a suitable nondegenerate distribution so that we can perform hypothesis tests of $H_0\colon \rho = 1$. Hamilton (1994, chap. 17) provides a superb exposition of the requisite theory.

To compute the test statistics, we fit the augmented Dickey–Fuller regression

$$\Delta y_t = \alpha + \beta y_{t-1} + \delta t + \sum_{j=1}^{k} \zeta_j\, \Delta y_{t-j} + e_t$$

via OLS where, depending on the options specified, the constant term $\alpha$ or time trend $\delta t$ is omitted and k is the number of lags specified in the lags() option. The test statistic for $H_0\colon \beta = 0$ is $Z_t = \hat\beta/\hat\sigma_\beta$, where $\hat\sigma_\beta$ is the standard error of $\hat\beta$. The critical values included in the output are linearly interpolated from the table of values that appears in Fuller (1996), and the MacKinnon approximate p-values use the regression surface published in MacKinnon (1994).
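As a quick check against example 1, the reported statistic can be reproduced from the regression table shown there:

$$Z_t = \hat\beta/\hat\sigma_\beta = -0.5217089/0.0752195 \approx -6.94$$

which matches the Z(t) = -6.936 in that output.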
David Alan Dickey (1945– ) was born in Ohio and obtained degrees in mathematics at Miami University and a PhD in statistics at Iowa State University in 1976 as a student of Wayne Fuller. He works at North Carolina State University and specializes in time-series analysis.

Wayne Arthur Fuller (1931– ) was born in Iowa, obtained three degrees at Iowa State University, and then served on the faculty between 1959 and 2001. He has made many distinguished contributions to time series, measurement-error models, survey sampling, and econometrics.

References

Becketti, S. 2013. Introduction to Time Series Using Stata. College Station, TX: Stata Press.
Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. 2008. Time Series Analysis: Forecasting and Control. 4th ed. Hoboken, NJ: Wiley.
Dickey, D. A., and W. A. Fuller. 1979. Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association 74: 427–431.
Fuller, W. A. 1996. Introduction to Statistical Time Series. 2nd ed. New York: Wiley.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
MacKinnon, J. G. 1994. Approximate asymptotic distribution functions for unit-root and cointegration tests. Journal of Business and Economic Statistics 12: 167–176.
Newton, H. J. 1988. TIMESLAB: A Time Series Analysis Laboratory. Belmont, CA: Wadsworth.

Also see

[TS] tsset — Declare data to be time-series data
[TS] dfgls — DF-GLS unit-root test
[TS] pperron — Phillips–Perron unit-root test
[XT] xtunitroot — Panel-data unit-root tests

Title

estat acplot — Plot parametric autocorrelation and autocovariance functions

Description   Quick start            Menu for estat         Syntax
Options       Remarks and examples   Methods and formulas   References
Also see

Description

estat acplot plots the estimated autocorrelation and autocovariance functions of a stationary process using the parameters of a previously fit parametric model. estat acplot is available after arima and arfima; see [TS] arima and [TS] arfima.

Quick start

Autocorrelation function using estimates from arima or arfima
    estat acplot

Autocovariance function using estimates from arima or arfima
    estat acplot, covariance

As above, and save results in mydata.dta
    estat acplot, covariance saving(mydata)

Menu for estat

Statistics > Postestimation

Syntax

estat acplot [, options]

options                   Description
saving(filename[, ...])   save results to filename; save variables in double
                            precision; save variables with prefix stubname
level(#)                  set confidence level; default is level(95)
lags(#)                   use # autocorrelations
covariance                calculate autocovariances; the default is to
                            calculate autocorrelations
smemory                   report short-memory ACF; only allowed after arfima
CI plot
ciopts(rcap_options)      affect rendition of the confidence bands
Plot
marker_options            change look of markers (color, size, etc.)
marker_label_options      add marker labels; change look or position
cline_options             affect rendition of the plotted points
Y axis, X axis, Titles, Legend, Overall
twoway_options            any options other than by() documented in
                            [G-3] twoway_options

Options

saving(filename[, suboptions]) creates a Stata data file (.dta file) consisting of the autocorrelation estimates, standard errors, and confidence bounds. Five variables are saved: lag (lag number), ac (autocorrelation estimate), se (standard error), ci_l (lower confidence bound), and ci_u (upper confidence bound).

    double specifies that the variables be saved as doubles, meaning 8-byte reals. By default, they are saved as floats, meaning 4-byte reals.

    name(stubname) specifies that variables be saved with prefix stubname.

    replace indicates that filename be overwritten if it exists.

level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is level(95) or as set by set level; see [R] level.

lags(#) specifies the number of autocorrelations to calculate. The default is to use min{floor(n/2) − 2, 40}, where floor(n/2) is the greatest integer less than or equal to n/2 and n is the number of observations.

covariance specifies the calculation of autocovariances instead of the default autocorrelations.

smemory specifies that the ARFIMA fractional integration parameter be ignored. The computed autocorrelations are for the short-memory ARMA component of the model. This option is allowed only after arfima.
CI plot

ciopts(rcap_options) affects the rendition of the confidence bands; see [G-3] rcap_options.

Plot

marker_options affect the rendition of markers drawn at the plotted points, including their shape, size, color, and outline; see [G-3] marker_options.

marker_label_options specify if and how the markers are to be labeled; see [G-3] marker_label_options.

cline_options affect whether lines connect the plotted points and the rendition of those lines; see [G-3] cline_options.

Y axis, X axis, Titles, Legend, Overall

twoway_options are any of the options documented in [G-3] twoway_options, except by(). These include options for titling the graph (see [G-3] title_options) and options for saving the graph to disk (see [G-3] saving_option).

Remarks and examples

The dependent variable evolves over time because of random shocks in the time domain representation. The autocovariances $\gamma_j$, $j \in \{0, 1, \dots, \infty\}$, of a covariance-stationary process $y_t$ specify its variance and dependence structure, and the autocorrelations $\rho_j$, $j \in \{1, 2, \dots, \infty\}$, provide a scale-free measure of $y_t$'s dependence structure. The autocorrelation at lag j specifies whether realizations at time t and realizations at time t − j are positively related, unrelated, or negatively related. estat acplot uses the estimated parameters of a parametric model to estimate and plot the autocorrelations and autocovariances of a stationary process.

Example 1

In example 1 of [TS] arima, we fit an ARIMA(1,1,1) model of the U.S. Wholesale Price Index (WPI) using quarterly data over the period 1960q1 through 1990q4.

. use http://www.stata-press.com/data/r14/wpi1
. arima wpi, arima(1,1,1)
(setting optimization to BHHH)
Iteration 0:   log likelihood = -139.80133
Iteration 1:   log likelihood =  -135.6278
Iteration 2:   log likelihood = -135.41838
Iteration 3:   log likelihood = -135.36691
Iteration 4:   log likelihood = -135.35892
(switching optimization to BFGS)
Iteration 5:   log likelihood = -135.35471
Iteration 6:   log likelihood = -135.35135
Iteration 7:   log likelihood = -135.35132
Iteration 8:   log likelihood = -135.35131

ARIMA regression
Sample: 1960q2 - 1990q4                         Number of obs   =        123
                                                Wald chi2(2)    =     310.64
Log likelihood = -135.3513                      Prob > chi2     =     0.0000

                           OPG
D.wpi        Coef.    Std. Err.      z   P>|z|    [95% Conf. Interval]
wpi
  _cons    .7498197   .3340968    2.24   0.025     .0950019    1.404637
ARMA
  ar
    L1.    .8742288   .0545435   16.03   0.000     .7673256     .981132
  ma
    L1.   -.4120458   .1000284   -4.12   0.000    -.6080979   -.2159938
/sigma     .7250436   .0368065   19.70   0.000     .6529042    .7971829

Note: The test of the variance against zero is one sided, and the two-sided confidence interval is truncated at zero.

Now we use estat acplot to estimate the autocorrelations implied by the estimated ARMA parameters. We include lags(50) to indicate that autocorrelations be computed for 50 lags. By default, a 95% confidence interval is provided for each autocorrelation.

. estat acplot, lags(50)

(figure: parametric autocorrelations of D.wpi with 95% confidence intervals; y axis: autocorrelations, 0 to 1; x axis: quarterly lag, 0 to 50)

The graph is similar to a typical autocorrelation function of an AR(1) process with a positive coefficient. The autocorrelations of a stationary AR(1) process decay exponentially toward zero.
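The estimates behind the plot can also be saved and inspected as data via the saving() option described above. A minimal sketch, where the filename acf is our own:

. estat acplot, lags(50) saving(acf, replace)
. use acf, clear
. list lag ac se ci_l ci_u in 1/5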
Methods and formulas

The autocovariance function for ARFIMA models is described in Methods and formulas of [TS] arfima. The autocovariance function for ARIMA models is obtained by setting the fractional difference parameter to zero. Box, Jenkins, and Reinsel (2008) provide excellent descriptions of the autocovariance function for ARIMA and seasonal ARIMA models. Palma (2007) provides an excellent summary of the autocovariance function for ARFIMA models.

References

Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. 2008. Time Series Analysis: Forecasting and Control. 4th ed. Hoboken, NJ: Wiley.
Palma, W. 2007. Long-Memory Time Series: Theory and Methods. Hoboken, NJ: Wiley.

Also see

[TS] arfima — Autoregressive fractionally integrated moving-average models
[TS] arima — ARIMA, ARMAX, and other dynamic regression models

Title

estat aroots — Check the stability condition of ARIMA estimates

Description   Quick start            Menu for estat   Syntax
Options       Remarks and examples   Stored results   Methods and formulas
Reference     Also see

Description

estat aroots checks the eigenvalue stability condition after estimating the parameters of an ARIMA model using arima. A graph of the eigenvalues of the companion matrices for the AR and MA polynomials is also produced. estat aroots is available only after arima; see [TS] arima.

Quick start

Verify that all eigenvalues of the autoregressive polynomial lie inside the unit circle after arima
    estat aroots

As above, but suppress the graph
    estat aroots, nograph

Label each plotted eigenvalue with its distance from the unit circle
    estat aroots, dlabel

Menu for estat

Statistics > Postestimation

Syntax

estat aroots [, options]

options                 Description
nograph                 suppress graph of eigenvalues for the companion matrices
dlabel                  label eigenvalues with the distance from the unit circle
modlabel                label eigenvalues with the modulus
Grid
nogrid                  suppress polar grid circles
pgrid([...])            specify radii and appearance of polar grid circles; see
                          Options for details
Plot
marker_options          change look of markers (color, size, etc.)
Reference unit circle
rlopts(cline_options)   affect rendition of reference unit circle
Y axis, X axis, Titles, Legend, Overall
twoway_options          any options other than by() documented in
                          [G-3] twoway_options

Options

nograph specifies that no graph of the eigenvalues of the companion matrices be drawn.

dlabel labels each eigenvalue with its distance from the unit circle. dlabel cannot be specified with modlabel.

modlabel labels the eigenvalues with their moduli. modlabel cannot be specified with dlabel.

Grid

nogrid suppresses the polar grid circles.

pgrid([numlist][, line_options]) determines the radii and appearance of the polar grid circles. By default, the graph includes nine polar grid circles with radii 0.1, 0.2, ..., 0.9 that have the grid line style. The numlist specifies the radii for the polar grid circles. The line_options determine the appearance of the polar grid circles; see [G-3] line_options. Because the pgrid() option can be repeated, circles with different radii can have distinct appearances.

Plot

marker_options specify the look of markers. This look includes the marker symbol, the marker size, and its color and outline; see [G-3] marker_options.

Reference unit circle

rlopts(cline_options) affect the rendition of the reference unit circle; see [G-3] cline_options.
Y axis, X axis, Titles, Legend, Overall

twoway_options are any of the options documented in [G-3] twoway_options, except by(). These include options for titling the graph (see [G-3] title_options) and for saving the graph to disk (see [G-3] saving_option).

Remarks and examples

Inference after arima requires that the variable $y_t$ be covariance stationary. The variable $y_t$ is covariance stationary if its first two moments exist and are time invariant. More explicitly, $y_t$ is covariance stationary if

1. $E(y_t)$ is finite and not a function of t;
2. $\mathrm{Var}(y_t)$ is finite and independent of t; and
3. $\mathrm{Cov}(y_t, y_s)$ is a finite function of |t − s| but not of t or s alone.

The stationarity of an ARMA process depends on the autoregressive (AR) parameters. If the inverse roots of the AR polynomial all lie inside the unit circle, the process is stationary, invertible, and has an infinite-order moving-average (MA) representation. Hamilton (1994, chap. 1) shows that if the modulus of each eigenvalue of the matrix F(ρ) is strictly less than 1, the estimated ARMA is stationary; see Methods and formulas for the definition of the matrix F(ρ).

The MA part of an ARMA process can be rewritten as an infinite-order AR process provided that the MA process is invertible. Hamilton (1994, chap. 1) shows that if the modulus of each eigenvalue of the matrix F(θ) is strictly less than 1, the estimated ARMA is invertible; see Methods and formulas for the definition of the matrix F(θ).

Example 1

In this example, we check the stability condition of the SARIMA model that we fit in example 3 of [TS] arima. We begin by reestimating the parameters of the model.

. use http://www.stata-press.com/data/r14/air2
(TIMESLAB: Airline passengers)
. generate lnair = ln(air)
. arima lnair, arima(0,1,1) sarima(0,1,1,12) noconstant
(setting optimization to BHHH)
Iteration 0:   log likelihood =   223.8437
Iteration 1:   log likelihood =  239.80405
Iteration 2:   log likelihood =  244.10265
Iteration 3:   log likelihood =  244.65895
Iteration 4:   log likelihood =  244.68945
(switching optimization to BFGS)
Iteration 5:   log likelihood =  244.69431
Iteration 6:   log likelihood =  244.69647
Iteration 7:   log likelihood =  244.69651
Iteration 8:   log likelihood =  244.69651

ARIMA regression
Sample: 14 - 144                                Number of obs   =        131
                                                Wald chi2(2)    =      84.53
Log likelihood = 244.6965                       Prob > chi2     =     0.0000

                            OPG
DS12.lnair    Coef.   Std. Err.      z   P>|z|    [95% Conf. Interval]
ARMA
  ma
    L1.   -.4018324   .0730307   -5.50   0.000    -.5449698   -.2586949
ARMA12
  ma
    L1.   -.5569342   .0963129   -5.78   0.000     -.745704   -.3681644
/sigma     .0367167   .0020132   18.24   0.000     .0327708    .0406625

Note: The test of the variance against zero is one sided, and the two-sided confidence interval is truncated at zero.

We can now use estat aroots to check the stability condition of the MA part of the model.

. estat aroots

Eigenvalue stability condition

  Eigenvalue                   Modulus
   .824798  + .4761974i        .952395
   .824798  - .4761974i        .952395
   .9523947                    .952395
  -.824798  + .4761974i        .952395
  -.824798  - .4761974i        .952395
  -.4761974 + .824798i         .952395
  -.4761974 - .824798i         .952395
   2.776e-16 + .9523947i       .952395
   2.776e-16 - .9523947i       .952395
   .4761974 + .824798i         .952395
   .4761974 - .824798i         .952395
  -.9523947                    .952395
   .4018324                    .401832

All the eigenvalues lie inside the unit circle.
MA parameters satisfy invertibility condition.

(figure: inverse roots of the MA polynomial plotted in the complex plane, real part on the x axis and imaginary part on the y axis, with the reference unit circle)

Because the modulus of each eigenvalue is strictly less than 1, the MA process is invertible and can be represented as an infinite-order AR process. The graph produced by estat aroots displays the eigenvalues with the real components on the x axis and the imaginary components on the y axis. The graph indicates visually that these eigenvalues are just inside the unit circle.
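The moduli shown in the table are also stored for programmatic use. A short sketch using the stored results documented below:

. estat aroots, nograph
. matrix list r(Modulus_ma)

A check that the largest modulus is below 1 could then be built on this matrix.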
(figure: inverse roots of the MA polynomial, plotted in the complex plane with Real on the x axis and Imaginary on the y axis)

Because the modulus of each eigenvalue is strictly less than 1, the MA process is invertible and can be represented as an infinite-order AR process. The graph produced by estat aroots displays the eigenvalues with the real components on the x axis and the imaginary components on the y axis. The graph indicates visually that these eigenvalues are just inside the unit circle.

Stored results

aroots stores the following in r():

Matrices
    r(Re_ar)         real part of the eigenvalues of F(ρ)
    r(Im_ar)         imaginary part of the eigenvalues of F(ρ)
    r(Modulus_ar)    modulus of the eigenvalues of F(ρ)
    r(ar)            F(ρ), the AR companion matrix
    r(Re_ma)         real part of the eigenvalues of F(θ)
    r(Im_ma)         imaginary part of the eigenvalues of F(θ)
    r(Modulus_ma)    modulus of the eigenvalues of F(θ)
    r(ma)            F(θ), the MA companion matrix

Methods and formulas

Recall the general form of the ARMA model,
$$\rho(L^p)(y_t - \mathbf{x}_t\boldsymbol\beta) = \theta(L^q)\epsilon_t$$
where
$$\rho(L^p) = 1 - \rho_1 L - \rho_2 L^2 - \cdots - \rho_p L^p$$
$$\theta(L^q) = 1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q$$
and $L^j y_t = y_{t-j}$.

estat aroots forms the companion matrix
$$\mathbf{F}(\boldsymbol\gamma) = \begin{pmatrix} \gamma_1 & \gamma_2 & \cdots & \gamma_{r-1} & \gamma_r \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}$$
where $\boldsymbol\gamma = \boldsymbol\rho$ and $r = p$ for the AR part of ARMA, and $\boldsymbol\gamma = -\boldsymbol\theta$ and $r = q$ for the MA part of ARMA. aroots obtains the eigenvalues of $\mathbf{F}$ by using matrix eigenvalues. The modulus of the complex eigenvalue $r + ci$ is $\sqrt{r^2 + c^2}$. As shown by Hamilton (1994, chap. 1), a process is stable and invertible if the modulus of each eigenvalue of $\mathbf{F}$ is strictly less than 1.

Reference

Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.

Also see

[TS] arima — ARIMA, ARMAX, and other dynamic regression models

Title

estat sbknown — Test for a structural break with a known break date

Description    Quick start    Menu for estat    Syntax    Options
Remarks and examples    Stored results    Methods and formulas    Reference    Also see

Description

estat sbknown performs a Wald or a likelihood-ratio (LR) test of whether the coefficients in a time-series regression vary over the periods defined by known break dates. estat sbknown requires that the current estimation results be from regress or ivregress 2sls.

Quick start

Test for a structural break at January 1983 for current estimation results
    estat sbknown, break(tm(1983m1))
As above, but for the first quarter of 1997
    estat sbknown, break(tq(1997q1))
As above, but perform an LR test instead of a Wald test
    estat sbknown, break(tq(1997q1)) lr
Perform a Wald test for multiple breaks at dates 1997q1 and 2005q1
    estat sbknown, break(tq(1997q1) tq(2005q1))

Menu for estat

Statistics > Postestimation

Syntax

    estat sbknown, break(time_constant_list) [options]

    options                            Description
    -------------------------------------------------------------------------
  * break(time_constant_list)          specify one or more break dates
    breakvars([varlist][, constant])   specify variables to be included in the test; by default, all coefficients are tested
    wald                               request a Wald test; the default
    lr                                 request an LR test
    -------------------------------------------------------------------------
    * break() is required.
    You must tsset your data before using estat sbknown; see [TS] tsset.

Options

break(time_constant_list) specifies a list of one or more hypothesized break dates. break() is required with at least one break date.
time_constant_list is a list of one or more time constant elements specified using dates in Stata internal form (SIF) or human-readable form (HRF). If you specify the time_constant_list using HRF, you must use one of the datetime pseudofunctions; see [D] datetime.

breakvars([varlist][, constant]) specifies variables to be included in the test. By default, all the coefficients are tested. constant specifies that a constant be included in the list of variables to be tested. constant may be specified only if the original model was fit with a constant term.

wald requests that a Wald test be performed. This is the default.

lr requests that an LR test be performed instead of a Wald test.

Remarks and examples

estat sbknown performs a test of the null hypothesis that the coefficients do not vary over the subsamples defined by the specified known break dates. The null hypothesis of no structural break can be tested using a Wald or an LR test.

Consider the linear regression
$$y_t = \mathbf{x}_t\boldsymbol\beta + \epsilon_t$$
A model with a structural break allows the coefficients to change after a break date. If $b$ is the break date, the model is
$$y_t = \begin{cases} \mathbf{x}_t\boldsymbol\beta + \epsilon_t & \text{if } t \le b \\ \mathbf{x}_t(\boldsymbol\beta + \boldsymbol\delta) + \epsilon_t & \text{if } t > b \end{cases}$$
For this model, the null and alternative hypotheses are $H_0\colon \boldsymbol\delta = \mathbf{0}$ and $H_a\colon \boldsymbol\delta \ne \mathbf{0}$. For a classical linear model where $\epsilon_t$ is independent and identically distributed, this test is known as the Chow (1960) test.

Example 1: Test for a single known break date

In usmacro.dta, we have data for the fedfunds series from the third quarter of 1954 to the fourth quarter of 2010 from the Federal Reserve Economic Database (FRED), a macroeconomic database provided by the Federal Reserve Bank of Saint Louis. The data are plotted below.

(figure: U.S. Federal Funds Rate; federal funds rate plotted against the quarterly time variable, 1950q1–2010q1)

We note that the 1970s and 1980s were characterized by periods of high interest rates, with the interest rate peaking in 1981q2. We want to model the federal funds rate as a function of its first lag, but we are concerned that there may be a structural break after 1981q2. We fit the model parameters using regress, and then we use estat sbknown to test for a structural break.

. use http://www.stata-press.com/data/r14/usmacro
(Federal Reserve Economic Data - St. Louis Fed)
. regress fedfunds L.fedfunds
(output omitted)
. estat sbknown, break(tq(1981q2))
Wald test for a structural break: Known break date
Number of obs  =     225
Sample: 1954q4 - 2010q4
Break date: 1981q2
Ho: No structural break
chi2(2)      =   6.4147
Prob > chi2  =   0.0405
Exogenous variables: L.fedfunds
Coefficients included in test: L.fedfunds _cons

We reject the null hypothesis of no structural break at the 5% level.
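As the Quick start above indicates, the lr option requests the LR version of the same test; for example, one might type

    . estat sbknown, break(tq(1981q2)) lr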
Example 2: Test for multiple known breaks

Suppose we divide the data into three subsamples for the periods 1954q4 to 1970q1, 1970q2 to 1995q1, and 1995q2 to 2010q4, specified by the break dates at 1970q1 and 1995q1. We would like to test whether the coefficients are the same in these subsamples. We do this by specifying multiple dates in the break() option.

. estat sbknown, break(tq(1970q1) tq(1995q1))
Wald test for a structural break: Known break date
Number of obs  =     225
Sample: 1954q4 - 2010q4
Break date: 1970q1 1995q1
Ho: No structural break
chi2(4)      =   4.6739
Prob > chi2  =   0.3224
Exogenous variables: L.fedfunds
Coefficients included in test: L.fedfunds _cons

We fail to reject the null hypothesis of no structural break for the specified dates.

Stored results

estat sbknown stores the following in r():

Scalars
    r(chi2)         χ² test statistic
    r(p)            level of significance
    r(df)           degrees of freedom
Macros
    r(breakdate)    list of break dates
    r(breakvars)    list of variables whose coefficients are included in the test
    r(test)         type of test

Methods and formulas

A test for a structural break with a known break date can be constructed by fitting a linear regression with an indicator variable as
$$y_t = \mathbf{x}_t\boldsymbol\beta + (b \le t)\,\mathbf{x}_t\boldsymbol\delta + \epsilon_t$$
The null hypothesis of no structural break is $H_0\colon \boldsymbol\delta = \mathbf{0}$. This can be tested by constructing a Wald statistic or an LR statistic, both with $\chi^2(k)$ as the limiting distribution, where $k$ is the number of parameters in the model.

A regression model with multiple breaks may be expressed as
$$y_t = \mathbf{x}_t\boldsymbol\beta + \mathbf{x}_t\left\{(b_1 \le t < b_2)\boldsymbol\delta_1 + (b_2 \le t < b_3)\boldsymbol\delta_2 + \cdots + (b_m \le t)\boldsymbol\delta_m\right\} + \epsilon_t$$
where $b_1, \dots, b_m$ are $m \ge 2$ break dates. The null hypothesis of no structural break is the joint test $H_0\colon \boldsymbol\delta_1 = \cdots = \boldsymbol\delta_m = \mathbf{0}$.
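The single-break construction can be replicated by hand with an indicator variable and a test of the interaction terms. A sketch for example 1's model, where datevar stands in for the dataset's quarterly time variable (a hypothetical name) and where regress reports the F-statistic version of the Wald test:

    . generate byte post = (datevar >= tq(1981q2))
    . regress fedfunds c.L.fedfunds##i.post
    . test 1.post 1.post#c.L.fedfunds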
Reference

Chow, G. C. 1960. Tests of equality between sets of coefficients in two linear regressions. Econometrica 28: 591–605.

Also see

[TS] estat sbsingle — Test for a structural break with an unknown break date
[TS] tsset — Declare data to be time-series data
[R] ivregress — Single-equation instrumental-variables regression
[R] regress — Linear regression

Title

estat sbsingle — Test for a structural break with an unknown break date

Description    Quick start    Menu for estat    Syntax    Options
Remarks and examples    Stored results    Methods and formulas    References    Also see

Description

estat sbsingle performs a test of whether the coefficients in a time-series regression vary over the periods defined by an unknown break date. estat sbsingle requires that the current estimation results be from regress or ivregress 2sls.

Quick start

Supremum Wald test for a structural break at an unknown break date for current estimation results using default symmetric trimming of 15%
    estat sbsingle
Same as above
    estat sbsingle, swald
As above, but also report average Wald test
    estat sbsingle, swald awald
Supremum Wald test with symmetric trimming of 20%
    estat sbsingle, trim(20)
As above, but use asymmetric trimming with a left trim of 10% and a right trim of 20%
    estat sbsingle, ltrim(10) rtrim(20)

Menu for estat

Statistics > Postestimation

Syntax

    estat sbsingle [, options]

    options                            Description
    -------------------------------------------------------------------------
    breakvars([varlist][, constant])   specify variables to be included in the test; by default, all coefficients are tested
    trim(#)                            specify a trimming percentage; default is trim(15)
    ltrim(#_l)                         specify a left trimming percentage
    rtrim(#_r)                         specify a right trimming percentage
    swald                              request a supremum Wald test; the default
    awald                              request an average Wald test
    ewald                              request an exponential Wald test
    all                                report all tests
    slr                                request a supremum likelihood-ratio (LR) test
    alr                                request an average LR test
    elr                                request an exponential LR test
    generate(newvarlist)               create newvarlist containing Wald or LR test statistics
    nodots                             suppress iteration dots
    -------------------------------------------------------------------------
    You must tsset your data before using estat sbsingle; see [TS] tsset.

Options

breakvars([varlist][, constant]) specifies variables to be included in the test. By default, all the coefficients are tested. constant specifies that a constant be included in the list of variables to be tested. constant may be specified only if the original model was fit with a constant term.

trim(#) specifies an equal left and right trimming percentage as an integer. Specifying trim(#) causes the observation at the #th percentile to be treated as the first possible break date and the observation at the (100 − #)th percentile to be treated as the last possible break date. By default, the trimming percentage is set to 15 but may be set to any value between 1 and 49.

ltrim(#_l) specifies a left trimming percentage as an integer. Specifying ltrim(#_l) causes the observation at the #_lth percentile to be treated as the first possible break date. This option must be specified with rtrim(#_r) and may not be combined with trim(#). #_l must be between 1 and 99.

rtrim(#_r) specifies a right trimming percentage as an integer. Specifying rtrim(#_r) causes the observation at the (100 − #_r)th percentile to be treated as the last possible break date. This option must be specified with ltrim(#_l) and may not be combined with trim(#). #_r must be less than (100 − #_l). Specifying #_l = #_r is equivalent to specifying trim(#) with # = #_l = #_r.

swald requests that a supremum Wald test be performed. This is the default.

awald requests that an average Wald test be performed.

ewald requests that an exponential Wald test be performed.

all specifies that all tests be displayed in a table.

slr requests that a supremum LR test be performed.

alr requests that an average LR test be performed.

elr requests that an exponential LR test be performed.

generate(newvarlist) creates either one or two new variables containing the Wald statistics, LR statistics, or both that are transformed and used to calculate the requested Wald or LR tests.
If you request only Wald-type tests (swald, awald, or ewald) or only LR-type tests (slr, alr, or elr), then you may specify only one varname in generate(). By default, newvar will contain Wald or LR statistics, depending on the type of test specified.

A variable containing Wald statistics and a variable containing LR statistics are created if you specify both Wald-type and LR-type tests and specify two varnames in generate(). If you specify only one varname in generate() with both Wald-type and LR-type tests specified, then Wald statistics are returned. If no test is specified and generate() is specified, Wald statistics are returned.

nodots suppresses display of the iteration dots. By default, one dot character is displayed for each iteration in the range of possible break dates.

Remarks and examples

estat sbsingle constructs a test statistic for a structural break without imposing a known break date by combining the test statistics computed for each possible break date in the sample. estat sbsingle uses the maximum, an average, or the exponential of the average of the tests computed at each possible break date. The test at each possible break date can be either a Wald or an LR test.

The limiting distribution of each of these test statistics is known but nonstandard. Not only is each test statistic a function of many sample statistics, but each of these test statistics also depends on the unknown break date, which is not identified under the null hypothesis; see Davies (1987) for a seminal treatment.

Tests that use the maximum of the sample tests are known as supremum tests. Specifically, the supremum Wald test uses the maximum of the sample Wald tests, and the supremum LR test uses the maximum of the sample LR tests. The intuition behind these tests is to compare the maximum sample test with what could be expected under the null hypothesis of no break (Quandt [1960], Kim and Siegmund [1989], and Andrews [1993]). Supremum tests have much less power than average tests and exponential tests.

Average tests use the average of the sample tests, and exponential tests use the natural log of the average of the exponential of the sample tests. An average test is optimal when the alternative hypothesis is a small change in parameter values at the structural break. An exponential test is optimal when the alternative hypothesis is a larger structural break. See Andrews and Ploberger (1994) for details about the properties of average and exponential tests.

All tests implemented in estat sbsingle are a function of the sample statistics computed over a range of possible break dates. However, not all sample observations can be tested as break dates because there are insufficient observations to estimate the parameters for dates too near the beginning or the end of the sample. This identification problem is solved by trimming, which excludes observations too near the beginning or the end of the sample from the set of possible break dates. Andrews (1993) recommends a symmetric trimming of 15% when the researcher has no other information on good trimming values.

Much research went into deriving the properties of the implemented tests, and we have cited only a few of the many papers on the subject. See Perron (2006) for an excellent survey.
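For instance, all six tests over a window that drops the first 10% and the last 25% of the sample can be requested in one call (a sketch built from the options documented above):

    . estat sbsingle, ltrim(10) rtrim(25) all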
Example 1: Test for a structural break with unknown break date

In usmacro.dta, we have data for the fedfunds series from the third quarter of 1954 to the fourth quarter of 2010 from the Federal Reserve Economic Database (FRED) provided by the Federal Reserve Bank of Saint Louis. Consider a model for the federal funds rate as a function of its first lag and the inflation rate (inflation). We want to test whether the coefficients changed at an unknown break date. Below, we fit the model using regress and perform the test using estat sbsingle.

. use http://www.stata-press.com/data/r14/usmacro
(Federal Reserve Economic Data - St. Louis Fed)
. regress fedfunds L.fedfunds inflation
(output omitted)
. estat sbsingle
(iteration log showing one dot per possible break date omitted)
Test for a structural break: Unknown break date
Number of obs  =     222
Full sample:           1955q3 - 2010q4
Trimmed sample:        1964q1 - 2002q3
Estimated break date:  1980q4
Ho: No structural break

    Test        Statistic     p-value
    swald         14.1966      0.0440

Exogenous variables: L.fedfunds inflation
Coefficients included in test: L.fedfunds inflation _cons

By default, a supremum Wald test is performed. The output indicates that we reject the null hypothesis of no structural break at the 5% level and that the estimated break date is 1980q4.

Some researchers perform more than one test. Below, we present results for the supremum Wald, average Wald, and average LR tests.

. estat sbsingle, swald awald alr
(iteration log showing one dot per possible break date omitted)
Test for a structural break: Unknown break date
Number of obs  =     222
Full sample:           1955q3 - 2010q4
Trimmed sample:        1964q1 - 2002q3
Ho: No structural break

    Test        Statistic     p-value
    swald         14.1966      0.0440
    awald          4.5673      0.1474
    alr            4.6319      0.1411

Exogenous variables: L.fedfunds inflation
Coefficients included in test: L.fedfunds inflation _cons

Only the supremum Wald test rejects the null hypothesis of no break.

Example 2: Testing for a structural break in a subset of coefficients

Below, we test the null hypothesis that there is a break in the intercept, when we assume that there is no break in either the autoregressive coefficient or the coefficient on inflation.

. estat sbsingle, breakvars(, constant)
(iteration log showing one dot per possible break date omitted)
Test for a structural break: Unknown break date
Number of obs  =     222
Full sample:           1955q3 - 2010q4
Trimmed sample:        1964q1 - 2002q3
Estimated break date:  2001q1
Ho: No structural break

    Test        Statistic     p-value
    swald          6.7794      0.1141

Exogenous variables: L.fedfunds inflation
Coefficients included in test: _cons

We fail to reject the null hypothesis of no structural break in the intercept when there is no break in any other coefficient.

Example 3: Reviewing sample test statistics

The observation-level Wald or LR test statistics sometimes provide useful diagnostic information. Below, we use the generate() option to store the observation-level Wald statistics in the new variable wald, which we subsequently plot using tsline.

. estat sbsingle, breakvars(L.fedfunds) generate(wald)
(output omitted)
. tsline wald, title("Wald test statistics")

(figure: Wald test statistics plotted against the quarterly date, 1950q1–2010q1)

We see a spike in the value of the test statistic at the estimated break date of 1980q4. The bump to the left of the spike may indicate a second break.

Example 4: Structural break test with an endogenous regressor

We can use estat sbsingle to test for a structural break in a regression with endogenous variables. Suppose we want to estimate the New Keynesian hybrid Phillips curve, which defines inflation as a function of the lagged value of inflation (L.inflation), the output gap (ogap), and the expected value of inflation in t + 1, E_t(F.inflation), conditional on information available at time t (Gali and Gertler 1999). See [U] 11.4.4 Time-series varlists.

Expected future inflation cannot be directly observed, so macroeconomists use instruments to predict the one-step-ahead inflation rate. This prediction is obtained by regressing the one-step-ahead inflation rate on a set of instruments. We can write this mathematically as
$$\text{inflation} = \alpha + \text{L.inflation}\,\beta + \text{ogap}\,\delta + E_t(\text{F.inflation})\,\gamma + \epsilon_t$$
and
$$\text{F.inflation} = \mathbf{z}_t\boldsymbol\theta + \nu_{t+1}$$
where $\mathbf{z}_t$ is a vector of instruments. The forecasted values given by $E_t(\text{F.inflation}\mid\mathbf{z}_t) = \mathbf{z}_t\widehat{\boldsymbol\theta}$ are uncorrelated with $\nu_{t+1}$ by construction.

In this example, we fit the Phillips curve model for the period 1970q1 to 1997q4. We are interested in testing whether the expectation of future inflation changed during this period. We instrument the future value of inflation with the first two lags of inflation, the federal funds rate, and the output gap. We use ivregress 2sls to fit the model.

. ivregress 2sls inflation L.inflation ogap
>     (F.inflation = L(1/2).inflation L(1/2).ogap L(1/2).fedfunds)
>     if tin(1970q1,1997q4)
(output omitted)
. estat sbsingle, breakvars(F.inflation)
(iteration log showing one dot per possible break date omitted)
Test for a structural break: Unknown break date
Number of obs  =     112
Full sample:           1970q1 - 1997q4
Trimmed sample:        1974q2 - 1993q4
Estimated break date:  1981q3
Ho: No structural break

    Test        Statistic     p-value
    swald          6.7345      0.1164

Coefficients included in test: F.inflation

We fail to reject the null hypothesis of no structural break in the coefficient of expected future inflation.

Stored results

estat sbsingle stores the following in r():

Scalars
    r(chi2)         χ² test statistic
    r(p)            level of significance
    r(df)           degrees of freedom
Macros
    r(ltrim)        start of trim date
    r(rtrim)        end of trim date
    r(breakvars)    list of variables whose coefficients are included in the test
    r(breakdate)    estimated break date (only after supremum tests)
    r(test)         type of test

Methods and formulas

Each supremum test statistic is the maximum value of the test statistic that is obtained from a series of Wald or LR tests over a range of possible break dates in the sample. Let $b$ denote a possible break date in the range $[b_1, b_2]$ for a sample size $T$. The supremum test statistic for testing the null hypothesis of no structural change in $k$ coefficients is given by
$$\text{supremum } S_T = \sup_{b_1 \le b \le b_2} S_T(b)$$
where $S_T(b)$ is the Wald or LR test statistic evaluated at a potential break date $b$. The average and the exponential versions of the test statistic are
$$\text{average } S_T = \frac{1}{b_2 - b_1 + 1}\sum_{b=b_1}^{b_2} S_T(b)$$
and
$$\text{exponential } S_T = \ln\left[\frac{1}{b_2 - b_1 + 1}\sum_{b=b_1}^{b_2}\exp\left\{\frac{1}{2}S_T(b)\right\}\right]$$
respectively.

The limiting distributions of the test statistics are given by
$$\text{supremum } S_T \rightarrow_d \sup_{\lambda\in[\varepsilon_1,\varepsilon_2]} S(\lambda)$$
$$\text{average } S_T \rightarrow_d \frac{1}{\varepsilon_2-\varepsilon_1}\int_{\varepsilon_1}^{\varepsilon_2} S(\lambda)\,d\lambda$$
$$\text{exponential } S_T \rightarrow_d \ln\left[\frac{1}{\varepsilon_2-\varepsilon_1}\int_{\varepsilon_1}^{\varepsilon_2}\exp\left\{\frac{1}{2}S(\lambda)\right\}d\lambda\right]$$
where
$$S(\lambda) = \frac{\{\mathbf{B}_k(\lambda)-\lambda\mathbf{B}_k(1)\}'\{\mathbf{B}_k(\lambda)-\lambda\mathbf{B}_k(1)\}}{\lambda(1-\lambda)}$$
$\mathbf{B}_k(\lambda)$ is a vector of $k$-dimensional independent Brownian motions, $\varepsilon_1 = b_1/T$, $\varepsilon_2 = b_2/T$, and $\lambda = \varepsilon_2(1-\varepsilon_1)/\{\varepsilon_1(1-\varepsilon_2)\}$.

Computing the p-values for the nonstandard limiting distributions is computationally complicated. For each test, the reported p-value is computed using the method in Hansen (1997).

References

Andrews, D. W. K. 1993. Tests for parameter instability and structural change with unknown change point. Econometrica 61: 821–856.
Andrews, D. W. K., and W. Ploberger. 1994. Optimal tests when a nuisance parameter is present only under the alternative. Econometrica 62: 1383–1414.
Davies, R. B. 1987. Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 74: 33–43.
Gali, J., and M. Gertler. 1999. Inflation dynamics: A structural econometric analysis. Journal of Monetary Economics 44: 195–222.
Hansen, B. E. 1997. Approximate asymptotic p values for structural-change tests. Journal of Business and Economic Statistics 15: 60–67.
Kim, H.-J., and D. Siegmund. 1989. The likelihood ratio test for a change-point in simple linear regression. Biometrika 76: 409–423.
Perron, P. 2006. Dealing with structural breaks. In Palgrave Handbook of Econometrics: Econometric Theory, Vol. 1, ed. T. C. Mills and K. Patterson, 278–352. Basingstoke, UK: Palgrave.
Quandt, R. E. 1960. Tests of the hypothesis that a linear regression system obeys two separate regimes. Journal of the American Statistical Association 55: 324–330.

Also see

[TS] estat sbknown — Test for a structural break with a known break date
[TS] tsset — Declare data to be time-series data
[R] ivregress — Single-equation instrumental-variables regression
[R] regress — Linear regression

Title

fcast compute — Compute dynamic forecasts after var, svar, or vec

Description    Quick start    Menu    Syntax    Options
Remarks and examples    Methods and formulas    References    Also see

Description

fcast compute produces dynamic forecasts of the dependent variables in a model previously fit by var, svar, or vec. fcast compute creates new variables and, if necessary, extends the time frame of the dataset to contain the prediction horizon.

Quick start

Dynamic forecasts stored in f_y1, f_y2, and f_y3 after fitting a model with var for dependent variables y1, y2, and y3
    fcast compute f_
As above, but begin forecast on the first quarter of 1979
    fcast compute f_, dynamic(q(1979q1))
As above, but specify that 10 periods should be forecasted
    fcast compute f_, dynamic(q(1979q1)) step(10)

Menu

Statistics > Multivariate time series > VEC/VAR forecasts > Compute forecasts (required for graph)

Syntax

After var and svar
    fcast compute prefix [, options1]

After vec
    fcast compute prefix [, options2]

prefix is the prefix appended to the names of the dependent variables to create the names of the variables holding the dynamic forecasts.
    options1                       Description
    -------------------------------------------------------------------------
    Main
      step(#)                      set # periods to forecast; default is step(1)
      dynamic(time_constant)       begin dynamic forecasts at time_constant
      estimates(estname)           use previously stored results estname; default is to use active results
      replace                      replace existing forecast variables that have the same prefix
    Std. Errors
      nose                         suppress asymptotic standard errors
      bs                           obtain standard errors from bootstrapped residuals
      bsp                          obtain standard errors from parametric bootstrap
      bscentile                    estimate bounds by using centiles of bootstrapped dataset
      reps(#)                      perform # bootstrap replications; default is reps(200)
      nodots                       suppress the usual dot after each bootstrap replication
      saving(filename[, replace])  save bootstrap results as filename; use replace to overwrite existing filename
    Reporting
      level(#)                     set confidence level; default is level(95)
    -------------------------------------------------------------------------

    options2                       Description
    -------------------------------------------------------------------------
    Main
      step(#)                      set # periods to forecast; default is step(1)
      dynamic(time_constant)       begin dynamic forecasts at time_constant
      estimates(estname)           use previously stored results estname; default is to use active results
      replace                      replace existing forecast variables that have the same prefix
      differences                  save dynamic predictions of the first-differenced variables
    Std. Errors
      nose                         suppress asymptotic standard errors
    Reporting
      level(#)                     set confidence level; default is level(95)
    -------------------------------------------------------------------------
    Default is to use asymptotic standard errors if no options are specified.
    fcast compute can be used only after var, svar, and vec; see [TS] var, [TS] var svar, and [TS] vec.
    You must tsset your data before using fcast compute; see [TS] tsset.

Options

Main

step(#) specifies the number of periods to be forecast. The default is step(1).

dynamic(time_constant) specifies the period to begin the dynamic forecasts. The default is the period after the last observation in the estimation sample. The dynamic() option accepts either a Stata date function that returns an integer or an integer that corresponds to a date using the current tsset format. dynamic() must specify a date in the range of two or more periods into the estimation sample to one period after the estimation sample.

estimates(estname) specifies that fcast compute use the estimation results stored as estname. By default, fcast compute uses the active estimation results. See [R] estimates for more information on manipulating estimation results.

replace causes fcast compute to replace the variables in memory with the specified predictions.

differences specifies that fcast compute also save dynamic predictions of the first-differenced variables. differences can be specified only with vec estimation results.

Std. Errors

nose specifies that the asymptotic standard errors of the forecasted levels, and thus the asymptotic confidence intervals for the levels, not be calculated. By default, the asymptotic standard errors and the asymptotic confidence intervals of the forecasted levels are calculated.

bs specifies that fcast compute use confidence bounds estimated by a simulation method based on bootstrapping the residuals.

bsp specifies that fcast compute use confidence bounds estimated via simulation in which the innovations are drawn from a multivariate normal distribution.

bscentile specifies that fcast compute use centiles of the bootstrapped dataset to estimate the bounds of the confidence intervals. By default, fcast compute uses the estimated standard errors and the quantiles of the standard normal distribution determined by level().

reps(#) gives the number of repetitions used in the simulations. The default is reps(200).

nodots specifies that no dots be displayed while obtaining the simulation-based standard errors. By default, for each replication, a dot is displayed.

saving(filename[, replace]) specifies the name of the file to hold the dataset that contains the bootstrap replications. The replace suboption overwrites any file with this name.
    replace specifies that filename be overwritten if it exists. This suboption is not shown in the dialog box.

Reporting

level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals.

Remarks and examples

Researchers often use VARs and VECMs to construct dynamic forecasts. fcast compute computes dynamic forecasts of the dependent variables in a VAR or VECM previously fit by var, svar, or vec. If you are interested in conditional, one-step-ahead predictions, use predict (see [TS] var, [TS] var svar, and [TS] vec).

To obtain and analyze dynamic forecasts, you fit a model, use fcast compute to compute the dynamic forecasts, and use fcast graph to graph the results.

Example 1

Typing

. use http://www.stata-press.com/data/r14/lutkepohl2
. var dln_inc dln_consump dln_inv if qtr<tq(1979q1)
. fcast compute m2_, step(8)
. fcast graph m2_dln_inc m2_dln_inv m2_dln_consump, observed

fits a VAR with two lags, computes eight-step dynamic predictions for each endogenous variable, and produces the graph

(figure: forecasts for dln_inc, dln_inv, and dln_consump over 1978q3–1980q3; each panel shows the 95% CI, observed values, and forecast)

The graph shows that the model is better at predicting changes in income and investment than in consumption. The graph also shows how quickly the predictions from the two-lag model settle down to their mean values.

fcast compute creates new variables in the dataset. If there are K dependent variables in the previously fitted model, fcast compute generates 4K new variables:

K new variables that hold the forecasted levels, named by appending the specified prefix to the name of the original variable

K estimated lower bounds for the forecast interval, named by appending the specified prefix and the suffix "_LB" to the name of the original variable

K estimated upper bounds for the forecast interval, named by appending the specified prefix and the suffix "_UB" to the name of the original variable

K estimated standard errors of the forecast, named by appending the specified prefix and the suffix "_SE" to the name of the original variable

If you specify options so that fcast compute does not calculate standard errors, the 3K variables that hold them and the bounds of the confidence intervals are not generated.

If the model previously fit is a VECM, specifying differences generates another K variables that hold the forecasts of the first differences of the dependent variables, named by appending the prefix "prefixD_" to the name of the original variable.
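These generated series can be used directly. For instance, to overlay the forecast of dln_inc and its confidence bounds on the observed series, one might type the following (a sketch; the _LB and _UB names follow the naming scheme just described):

    . tsline dln_inc m2_dln_inc m2_dln_inc_LB m2_dln_inc_UB if qtr >= tq(1978q1)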
Example 2

Plots of the forecasts from different models along with the observations from a holdout sample can provide insights into their relative forecasting performance. Continuing the previous example,

. var dln_inc dln_consump dln_inv if qtr<tq(1979q1), lags(1/6)
(output omitted)
. fcast compute m6_, step(8)
. graph twoway line m6_dln_inv m2_dln_inv dln_inv qtr
>     if m6_dln_inv < ., legend(cols(1))

(figure: m6_dln_inv, dyn(1979q1); m2_dln_inv, dyn(1979q1); and the first difference of ln_inv plotted over 1978q4–1980q4)

The model with six lags predicts changes in investment better than the two-lag model in some periods but markedly worse in other periods.
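One way to make such comparisons concrete is to compute each model's root-mean-squared forecast error over the holdout sample. A sketch, using the variable and prefix names from the example:

    . generate double err2 = (dln_inv - m2_dln_inv)^2
    . generate double err6 = (dln_inv - m6_dln_inv)^2
    . quietly summarize err2 if qtr >= tq(1979q1)
    . display "RMSE, 2-lag model: " sqrt(r(mean))
    . quietly summarize err6 if qtr >= tq(1979q1)
    . display "RMSE, 6-lag model: " sqrt(r(mean))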
Similarly, these algorithms are conditional on the first p observations of the endogenous variables in the data. However, the simulation-based estimates of the standard errors are also conditional on the estimated coefficients. The asymptotic standard errors are not conditional on the coefficient estimates because the second term on the right-hand side of (2) accounts for the uncertainty arising from using estimated parameters. For a simulation with R repetitions, this method uses the following algorithm: 1. Fit the model and save the estimated coefficients. 2. Use the estimated coefficients to calculate the residuals. 3. Repeat steps 3a–3c R times. 3a. Draw a simple random sample with replacement of size T + h from the residuals. When the tth observation is drawn, all K residuals are selected, preserving any contemporaneous correlation among the residuals. 3b. Use the sampled residuals, p initial values of the endogenous variables, any exogenous variables, and the estimated coefficients to construct a new sample dataset. 3c. Save the simulated endogenous variables for the h forecast periods in the bootstrapped dataset. 4. For each endogenous variable and each forecast period, the simulated standard error is the estimated standard error of the R simulated forecasts. By default, the upper and lower bounds of the (1 − α)100% are estimated using the simulation-based estimates of the standard errors and the normality assumption, as in (3). If the bscentile option is specified, the sample centiles for the upper and lower bounds of the R simulated forecasts are used for the upper and lower bounds of the confidence intervals. If the bsp option is specified, a parametric simulation algorithm is used. Specifically, everything is as above except that 3a is replaced by 3a(bsp) as follows: 3a(bsp). Draw T + h observations from a multivariate normal distribution with covariance b matrix Σ. The algorithm above assumes that h forecast periods come after the original sample of T observations. If the h forecast periods lie within the original sample, smaller simulated datasets are sufficient. Dynamic forecasts after vec Methods and formulas of [TS] vec discusses how to obtain the one-step predicted differences and levels. fcast compute uses the previous dynamic predictions as inputs for later dynamic predictions. 194 fcast compute — Compute dynamic forecasts after var, svar, or vec Per Lütkepohl (2005, sec. 6.5), fcast compute uses  b (h) = Σ b y T T −d  h−1 X b iΩ bΦ bi Φ i=0 b i are the estimated matrices of impulse–response functions, T is the number of observations where the Φ b is the estimated cross-equation variance in the sample, d is the number of degrees of freedom, and Ω b matrix. The formulas for d and Ω are given in Methods and formulas of [TS] vec. b (h). The estimated standard errors at step h are the square roots of the diagonal elements of Σ b y Per Lütkepohl (2005), the estimated forecast-error variance does not consider parameter uncertainty. As the sample size gets infinitely large, the importance of parameter uncertainty diminishes to zero. References Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press. Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer. 
References

Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.

Also see

[TS] fcast graph — Graph forecasts after fcast compute
[TS] var intro — Introduction to vector autoregressive models
[TS] vec intro — Introduction to vector error-correction models

Title

fcast graph — Graph forecasts after fcast compute

Description    Quick start    Menu    Syntax    Options
Remarks and examples    Also see

Description

fcast graph graphs dynamic forecasts of the endogenous variables from a VAR(p) or VECM that has already been obtained from fcast compute; see [TS] fcast compute.

Quick start

Graph forecasts in f_y1 after fcast compute
    fcast graph f_y1
As above, and include observed values of the predicted variable
    fcast graph f_y1, observed
As above, but suppress confidence bands
    fcast graph f_y1, observed noci

Menu

Statistics > Multivariate time series > VEC/VAR forecasts > Graph forecasts

Syntax

    fcast graph varlist [if] [in] [, options]

where varlist contains one or more forecasted variables generated by fcast compute.

    options                      Description
    -------------------------------------------------------------------------
    Main
      differences                graph forecasts of the first-differenced variables (vec only)
      noci                       suppress confidence bands
      observed                   include observed values of the predicted variables
    Forecast plot
      cline_options              affect rendition of the forecast lines
    CI plot
      ciopts(area_options)       affect rendition of the confidence bands
    Observed plot
      obopts(cline_options)      affect rendition of the observed values
    Y axis, Time axis, Titles, Legend, Overall
      twoway_options             any options other than by() documented in [G-3] twoway_options
      byopts(by_option)          affect appearance of the combined graph; see [G-3] by_option
    -------------------------------------------------------------------------

Options

Main

differences specifies that the forecasts of the first-differenced variables be graphed. This option is available only with forecasts computed by fcast compute after vec. The differences option implies noci.

noci specifies that the confidence intervals be suppressed. By default, the confidence intervals are included.

observed specifies that observed values of the predicted variables be included in the graph. By default, observed values are not graphed.

Forecast plot

cline_options affect the rendition of the plotted lines corresponding to the forecast; see [G-3] cline_options.

CI plot

ciopts(area_options) affects the rendition of the confidence bands for the forecasts; see [G-3] area_options.

Observed plot

obopts(cline_options) affects the rendition of the observed values of the predicted variables; see [G-3] cline_options. This option implies the observed option.

Y axis, Time axis, Titles, Legend, Overall

twoway_options are any of the options documented in [G-3] twoway_options, excluding by().

byopts(by_option) are documented in [G-3] by_option. These options affect the appearance of the combined graph.

Remarks and examples

fcast graph graphs dynamic forecasts created by fcast compute.
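For instance, to overlay the observed series as a dashed line and lighten the confidence bands, the plot options documented above can be combined as follows (a sketch; f_y1 is a forecast variable created by fcast compute):

    . fcast graph f_y1, observed obopts(lpattern(dash)) ciopts(color(gs13))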
Example 1

In this example, we use a cointegrating VECM to model the state-level unemployment rates in Missouri, Indiana, Kentucky, and Illinois, and we graph the forecasts against a 6-month holdout sample.

. use http://www.stata-press.com/data/r14/urates
. vec missouri indiana kentucky illinois if t < tm(2003m7), trend(rconstant)
>     rank(2) lags(4)
(output omitted)
. fcast compute m1_, step(6)
. fcast graph m1_missouri m1_indiana m1_kentucky m1_illinois, observed

(figure: forecasts for missouri, indiana, kentucky, and illinois over 2003m6–2003m12; each panel shows the 95% CI, observed values, and forecast)

Because the 95% confidence bands for the predicted unemployment rates in Missouri and Indiana do not contain all their observed values, the model does not reliably predict these unemployment rates.

Also see

[TS] fcast compute — Compute dynamic forecasts after var, svar, or vec
[TS] var intro — Introduction to vector autoregressive models
[TS] vec intro — Introduction to vector error-correction models

Title

forecast — Econometric model forecasting

Description    Quick start    Syntax    Remarks and examples    References    Also see

Description

forecast is a suite of commands for obtaining forecasts by solving models, collections of equations that jointly determine the outcomes of one or more variables. Equations can be stochastic relationships fit using estimation commands such as regress, ivregress, var, or reg3; or they can be nonstochastic relationships, called identities, that express one variable as a deterministic function of other variables. Forecasting models may also include exogenous variables whose values are already known or determined by factors outside the purview of the system being examined. The forecast commands can also be used to obtain dynamic forecasts in single-equation models.

The forecast suite lets you incorporate outside information into your forecasts through the use of add factors and similar devices, and you can specify the future path for some model variables and obtain forecasts for other variables conditional on that path. Each set of forecast variables has its own name prefix or suffix, so you can compare forecasts based on alternative scenarios. Confidence intervals for forecasts can be obtained via stochastic simulation and can incorporate both parameter uncertainty and additive error terms.

forecast works with both time-series and panel datasets. Time-series datasets may not contain any gaps, and panel datasets must be strongly balanced.

This manual entry provides an overview of forecasting models and several examples showing how the forecast commands are used together. See the individual subcommands' manual entries for detailed discussions of the various options available and specific remarks about those subcommands.

Quick start

Estimate a linear and an ARIMA regression and store their results as myreg and myarima, respectively
    regress y1 x1 x2
    estimates store myreg
    arima y2 x3 y1, ar(1) ma(1)
    estimates store myarima
Create a forecast model with the name mymodel
    forecast create mymodel
Add stored estimates myreg and myarima to forecast model mymodel
    forecast estimates myreg
    forecast estimates myarima
Compute dynamic forecasts from 2012 through 2020 of y1 and y2 using mymodel with nonmissing values of x1, x2, and x3 for the entire forecast horizon
    forecast solve, begin(2012) end(2020)

See [TS] forecast adjust, [TS] forecast coefvector, [TS] forecast create, [TS] forecast describe, [TS] forecast drop, [TS] forecast estimates, [TS] forecast exogenous, [TS] forecast identity, [TS] forecast list, and [TS] forecast solve for additional Quick starts.

Syntax

    forecast subcommand ... [, options]

    subcommand     Description
    -------------------------------------------------------------------------
    create         create a new model
    estimates      add estimation result to current model
    identity       specify an identity (nonstochastic equation)
    coefvector     specify an equation via a coefficient vector
    exogenous      declare exogenous variables
    solve          obtain one-step-ahead or dynamic forecasts
    adjust         adjust a variable by add factoring, replacing, etc.
    describe       describe a model
    list           list all forecast commands composing current model
    clear          clear current model from memory
    drop           drop forecast variables
    query          check whether a forecast model has been started
    -------------------------------------------------------------------------

Remarks and examples

A forecasting model is a system of equations that jointly determine the outcomes of one or more endogenous variables, whereby the term endogenous variables contrasts with exogenous variables, whose values are not determined by the interplay of the system's equations. A model, in the context of the forecast commands, consists of

1. zero or more stochastic equations fit using Stata estimation commands and added to the current model using forecast estimates. These stochastic equations describe the behavior of endogenous variables.
2. zero or more nonstochastic equations (identities) defined using forecast identity. These equations often describe the behavior of endogenous variables that are based on accounting identities or adding-up conditions.
3. zero or more equations stored as coefficient vectors and added to the current model using forecast coefvector. Typically, you will fit your equations in Stata and use forecast estimates to add them to the model. forecast coefvector is used to add equations obtained elsewhere.
4. zero or more exogenous variables declared using forecast exogenous.
5. at least one stochastic equation or identity.
6. optional adjustments to be made to the variables of the model declared using forecast adjust. One use of adjustments is to produce forecasts under alternative scenarios.

The forecast commands are designed to be easy to use, so without further ado, we dive headfirst into an example.

Example 1: Klein's model

Example 3 of [R] reg3 shows how to fit Klein's (1950) model of the U.S. economy using the three-stage least-squares estimator (3SLS). Here we focus on how to make forecasts from that model once the parameters have been estimated. In Klein's model, there are seven equations that describe the seven endogenous variables. Three of those equations are stochastic relationships, while the rest are identities:

$$\begin{aligned}
c_t &= \beta_0 + \beta_1 p_t + \beta_2 p_{t-1} + \beta_3 w_t + \epsilon_{1t} && (1)\\
i_t &= \beta_4 + \beta_5 p_t + \beta_6 p_{t-1} + \beta_7 k_{t-1} + \epsilon_{2t} && (2)\\
wp_t &= \beta_8 + \beta_9 y_t + \beta_{10} y_{t-1} + \beta_{11} yr_t + \epsilon_{3t} && (3)\\
y_t &= c_t + i_t + g_t && (4)\\
p_t &= y_t - t_t - wp_t && (5)\\
k_t &= k_{t-1} + i_t && (6)\\
w_t &= wg_t + wp_t && (7)
\end{aligned}$$

The variables in the model are defined as follows:

    Name   Description                            Type
    -------------------------------------------------------------
    c      Consumption                            endogenous
    p      Private-sector profits                 endogenous
    wp     Private-sector wages                   endogenous
    wg     Government-sector wages                exogenous
    w      Total wages                            endogenous
    i      Investment                             endogenous
    k      Capital stock                          endogenous
    y      National income                        endogenous
    g      Government spending                    exogenous
    t      Indirect bus. taxes + net exports      exogenous
    yr     Time trend = Year − 1931               exogenous
    -------------------------------------------------------------

Our model has four exogenous variables: government-sector wages (wg), government spending (g), a time-trend variable (yr), and, for simplicity, a variable that lumps indirect business taxes and net exports together (t). To make out-of-sample forecasts, we must populate those variables over the entire forecast horizon before solving our model. (We use the phrases "solve our model" and "obtain forecasts from our model" interchangeably.)

We will illustrate the entire process of fitting and forecasting our model, though our focus will be on the latter task. See [R] reg3 for a more in-depth look at fitting models like this one. Before we solve our model, we first estimate the parameters of the stochastic equations by loading the dataset and calling reg3:
(We use the phrases “solve our model” and “obtain forecasts from our model” interchangeably.) We will illustrate the entire process of fitting and forecasting our model, though our focus will be on the latter task. See [R] reg3 for a more in-depth look at fitting models like this one. Before we solve our model, we first estimate the parameters of the stochastic equations by loading the dataset and calling reg3: 202 forecast — Econometric model forecasting . use http://www.stata-press.com/data/r14/klein2 . reg3 (c p L.p w) (i p L.p L.k) (wp y L.y yr), endog(w p y) exog(t wg g) Three-stage least-squares regression Equation c i wp Obs Parms RMSE "R-sq" chi2 P 21 21 21 3 3 3 .9443305 1.446736 .7211282 0.9801 0.8258 0.9863 864.59 162.98 1594.75 0.0000 0.0000 0.0000 [95% Conf. Interval] Coef. Std. Err. z P>|z| c p --. L1. .1248904 .1631439 .1081291 .1004382 1.16 1.62 0.248 0.104 -.0870387 -.0337113 .3368194 .3599992 w _cons .790081 16.44079 .0379379 1.304549 20.83 12.60 0.000 0.000 .715724 13.88392 .8644379 18.99766 p --. L1. -.0130791 .7557238 .1618962 .1529331 -0.08 4.94 0.936 0.000 -.3303898 .4559805 .3042316 1.055467 k L1. -.1948482 .0325307 -5.99 0.000 -.2586072 -.1310893 _cons 28.17785 6.793768 4.15 0.000 14.86231 41.49339 y --. L1. .4004919 .181291 .0318134 .0341588 12.59 5.31 0.000 0.000 .3381388 .1143411 .462845 .2482409 yr _cons .149674 1.797216 .0279352 1.115854 5.36 1.61 0.000 0.107 .094922 -.3898181 .2044261 3.984251 i wp Endogenous variables: Exogenous variables: c i wp w p y L.p L.k L.y yr t wg g The output from reg3 indicates that we have a total of six endogenous variables even though our model in fact has seven. The discrepancy stems from (6) of our model. The capital stock variable (k) is a function of the endogenous investment variable and is therefore itself endogenous. However, kt does not appear in any of our model’s stochastic equations, so we did not declare it in the endog() option of reg3; from a purely estimation perspective, the contemporaneous value of the capital stock variable is irrelevant, though it does play a role in terms of solving our model. We next store the estimation results using estimates store: . estimates store klein Now we are ready to define our model using the forecast commands. We first tell Stata to initialize a new model; we will call our model kleinmodel: . forecast create kleinmodel Forecast model kleinmodel started. forecast — Econometric model forecasting 203 The name you give the model mainly controls how output from forecast commands is labeled. More importantly, forecast create creates the internal data structures Stata uses to keep track of your model. The next step is to add all the equations to the model. To add the three stochastic equations we fit using reg3, we use forecast estimates: . forecast estimates klein Added estimation results from reg3. Forecast model kleinmodel now contains 3 endogenous variables. That command tells Stata to find the estimates stored as klein and add them to our model. forecast estimates uses those estimation results to determine that there are three endogenous variables (c, i, and wp), and it will save the estimated parameters and other information that forecast solve will later need to obtain predictions for those variables. forecast estimates confirmed our request by reporting that the estimation results added were from reg3. forecast estimates reports that our forecast model has three endogenous variables because our reg3 command included three left-hand-side variables. 
The fact that we specified three additional endogenous variables in the endog() option of reg3 so that reg3 reports a total of six endogenous variables is irrelevant to forecast. All that matters is the number of left-hand-side variables in the model. We also need to specify the four identities, equations (4) through (7), that determine the other four endogenous variables in our model. To do that, we use forecast identity: . forecast Forecast . forecast Forecast identity y = c + model kleinmodel identity p = y model kleinmodel i + now t now g contains 4 endogenous variables. wp contains 5 endogenous variables. . forecast Forecast . forecast Forecast identity k = L.k + i model kleinmodel now contains 6 endogenous variables. identity w = wg + wp model kleinmodel now contains 7 endogenous variables. You specify identities similarly to how you use the generate command, except that the left-hand-side variable is an endogenous variable in your model rather than a new variable you want to create in your dataset. Time-series operators often come in handy when specifying identities; here we expressed capital, a stock variable, as its previous value plus current-period investment, a flow variable. An identity defines an endogenous variable, so each time we use forecast identity, the number of endogenous variables in our forecast model increases by one. Finally, we will tell Stata about the four exogenous variables. We do that with the forecast exogenous command: . forecast Forecast . forecast Forecast . forecast Forecast . forecast Forecast exogenous wg model kleinmodel exogenous g model kleinmodel exogenous t model kleinmodel exogenous yr model kleinmodel now contains 1 declared exogenous variable. now contains 2 declared exogenous variables. now contains 3 declared exogenous variables. now contains 4 declared exogenous variables. 204 forecast — Econometric model forecasting forecast keeps track of the exogenous variables that you declare using the forecast exogenous command and reports the number currently in the model. When you later use forecast solve, forecast verifies that these variables contain nonmissing data over the forecast horizon. In fact, we could have instead typed . forecast exogenous wg g t yr but to avoid confusing ourselves, we prefer to issue one command for each variable in our model. Now Stata knows everything it needs to know about the structure of our model. klein2.dta in memory contains annual observations from 1920 to 1941. Before we make out-of-sample forecasts, we should first see how well our model works by comparing its forecasts with actual data. There are a couple of ways to do that. The first is to produce static forecasts. In static forecasts, actual values of all lagged variables that appear in the model are used. Because actual values will be missing beyond the last historical time period in the dataset, static forecasts can only forecast one period into the future (assuming only first lags appear in the model); for that reason, they are often called one-step-ahead forecasts. To obtain these one-step-ahead forecasts, we type . forecast solve, prefix(s_) begin(1921) static Computing static forecasts for model kleinmodel. Starting period: 1921 Ending period: 1941 Forecast prefix: s_ 1921: ............................................ 1922: .............................................. 1923: ............................................. (output omitted ) 1940: ............................................. 1941: .............................................. 
Forecast 7 variables spanning 21 periods.

We specified begin(1921) to request that the first year for which forecasts are produced be 1921. Our model includes variables that are lagged one period; because our data start in 1920, 1921 is the first year in which we can evaluate all the equations of the model. If we had not specified the begin(1921) option, forecast solve would have started forecasting in 1941. By default, forecast solve looks for the earliest time period in which any of the endogenous variables contains a missing value and begins forecasting in that period; in klein2.dta, k is missing in 1941.

The header of the output confirms that we requested static forecasts for our model, and it indicates that forecasts will be produced for 1921 through 1941, the last year in our dataset. By default, forecast solve produces a status report in which the time period being forecast is displayed along with a dot for each iteration the equation solver performs. The footer of the output confirms that we forecast seven endogenous variables for 21 years.

The command we just typed created seven new variables in our dataset, one for each endogenous variable, containing the static forecasts. Because we specified prefix(s_), the seven new variables are named s_c, s_i, s_wp, s_y, s_p, s_k, and s_w. Here we graph a subset of the variables and their forecasts:

[Figure: Static Forecasts. Four panels, Total Income, Consumption, Investment, and Private Wages, plot values against year, 1920-1941. Solid lines denote actual values; dashed lines denote forecast values.]

Our static forecasts appear to fit the data relatively well. Had they not fit well, we would have to go back and reexamine the specification of our model. If the static forecasts are poor, then the dynamic forecasts that use previous periods' forecast values are unlikely to work well either. On the other hand, even if the model produces good static forecasts, it may not produce accurate dynamic forecasts more than one or two periods into the future. Another way to check how well a model forecasts is to produce dynamic forecasts for time periods in which observed values are available. Here we begin dynamic forecasts in 1936, giving us six years' data with which to compare actual and forecast values, and then we graph our results:

. forecast solve, prefix(d_) begin(1936)
Computing dynamic forecasts for model kleinmodel.
Starting period: 1936
Ending period:   1941
Forecast prefix: d_
1936: ............................................
1937: ..........................................
1938: .............................................
1939: .............................................
1940: ............................................
1941: ..............................................
Forecast 7 variables spanning 6 periods.

[Figure: Dynamic Forecasts. The same four panels, Total Income, Consumption, Investment, and Private Wages, with dynamic forecasts beginning in 1936. Solid lines denote actual values; dashed lines denote forecast values.]

Most of the in-sample forecasts look okay, though our model was unable to predict the outsized increase in investment in 1936 and the sharp drop in 1938.
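Graphs like the ones above can be drawn with Stata's standard time-series graph commands. The manual does not show the graph code, but as a minimal sketch for one panel, using the forecast variables created by the prefix() options above (the published figures use additional styling):

. tsline y s_y d_y, title("Total Income")
>      legend(order(1 "Actual" 2 "Static forecast" 3 "Dynamic forecast"))

Analogous tsline calls for c, i, and wp could then be assembled into the four-panel layout with graph combine.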
Our first example was particularly easy because all the endogenous variables appeared in levels. However, endogenous variables are often better modeled using mathematical transformations such as logarithms, first differences, or percentage changes; transformations of the endogenous variables may also appear as explanatory variables in other equations. The next few examples illustrate these complications.

Example 2: Models with transformed endogenous variables

hardware.dta contains hypothetical quarterly sales data from the Hughes Hardware Company, a regional distributor of building products. Hughes Hardware has three main product lines: dimensional lumber (dim), sheet goods such as plywood and fiberboard (sheet), and miscellaneous hardware, including fasteners and hand tools (misc). Based on past experience, we know that dimensional lumber sales are closely tied to the level of new home construction and that the other product lines' sales can be modeled in terms of the quantity of lumber sold. We are going to use the following set of equations to model sales of the three product lines:

%∆dimt = β10 + β11 ln(startst) + β12 %∆gdpt + β13 unratet + ε1t
sheett = β20 + β21 dimt + β22 %∆gdpt + β23 unratet + ε2t
misct = β30 + β31 dimt + β32 %∆gdpt + β33 unratet + ε3t

Here startst represents the number of new homes for which construction began in quarter t, gdpt denotes real (inflation-adjusted) gross domestic product (GDP), and unratet represents the quarterly average unemployment rate. Our equation for dimt is written in terms of percentage changes from quarter to quarter rather than in levels, and the percentage change in GDP appears as a regressor in each equation rather than the level of GDP itself. In our model, these three macroeconomic factors are exogenous, and here we will reserve the last few years' data to make forecasts; in practice, we would need to make our own forecasts of these macroeconomic variables or else purchase a forecast.

We will approximate the percentage change variables by taking first-differences of the natural logarithms of the respective underlying variables; for small changes, ln(xt) - ln(xt-1) is approximately the growth rate (xt - xt-1)/xt-1. In terms of estimation, this does not present any challenges. Here we load the dataset into memory, create the necessary log-transformed variables, and fit the three equations using regress with the data through the end of 2009. We use quietly to suppress the output from regress to save space, and we store each set of estimation results as we go. In Stata, we type

. use http://www.stata-press.com/data/r14/hardware, clear
(Hughes Hardware sales data)
. generate lndim = ln(dim)
. generate lngdp = ln(gdp)
. generate lnstarts = ln(starts)
. quietly regress D.lndim lnstarts D.lngdp unrate if qdate <= tq(2009q4)
. estimates store dim
. quietly regress sheet dim D.lngdp unrate if qdate <= tq(2009q4)
. estimates store sheet
. quietly regress misc dim D.lngdp unrate if qdate <= tq(2009q4)
. estimates store misc

The equations for sheet goods and miscellaneous items do not present any challenges for forecast, so we proceed by creating a new forecast model named salesfcast and adding those two equations:

. forecast create salesfcast, replace
(Forecast model kleinmodel ended.)
Forecast model salesfcast started.
. forecast estimates sheet
Added estimation results from regress.
Forecast model salesfcast now contains 1 endogenous variable.
. forecast estimates misc
Added estimation results from regress.
Forecast model salesfcast now contains 2 endogenous variables.
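An aside: because sheet and misc are untransformed left-hand-side variables fit by regress, forecast can evaluate these equations as linear combinations without calling predict, which speeds up solving. Had we wanted confirmation of that, we could have added each result with the advise option of forecast estimates (described in [TS] forecast estimates); a sketch:

. forecast estimates sheet, advise

advise reports whether the estimation results being added must remain in memory while the model is solved.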
The equation for dimensional lumber requires more finesse. First, because our dependent variable contains a time-series operator, we must use the names() option of forecast estimates to specify a valid name for the endogenous variable being added:

. forecast estimates dim, names(dlndim)
Added estimation results from regress.
Forecast model salesfcast now contains 3 endogenous variables.

We have entered the endogenous variable dlndim into our model, but it represents the left-hand-side variable of the regression equation we just added. That is, dlndim is the first-difference of the logarithm of dim, the sales variable we ultimately want to forecast. We can specify an identity to reverse the first-differencing, providing us with a variable containing the logarithm of dim:

. forecast identity lndim = L.lndim + dlndim
Forecast model salesfcast now contains 4 endogenous variables.

Finally, we can specify another identity to obtain dim from lndim:

. forecast identity dim = exp(lndim)
Forecast model salesfcast now contains 5 endogenous variables.

Now we can solve the model. We will obtain dynamic forecasts starting in the first quarter of 2010, and we will use the log(off) option to suppress the iteration log:

. forecast solve, begin(tq(2010q1)) log(off)
Computing dynamic forecasts for model salesfcast.
Starting period: 2010q1
Ending period:   2012q3
Forecast prefix: f_
Forecast 5 variables spanning 11 periods.

We did not specify the prefix() or suffix() option, so by default, forecast prefixed our forecast variables with f_. The following graph illustrates our forecasts:

[Figure: Hughes Hardware Sales ($mil.). Three panels, Dimensional Lumber, Sheet Goods, and Miscellany, plot actual and forecast sales against quarter, 2008q1-2012q3.]

Our model performed well in 2010, but it did not forecast the pickup in sales that occurred in 2011 and 2012.

Technical note

For more information about working with log-transformed variables, see the second technical note in [TS] forecast estimates.

The forecast commands can also be used to make forecasts for strongly balanced panel datasets. A panel dataset is strongly balanced when all the panels have the same number of observations and the observations for different panels were all made at the same times. Our next example illustrates how to produce a forecast with panel data and highlights a couple of key assumptions one must make.

Example 3: Forecasting a panel dataset

In the previous example, we mentioned that Hughes Hardware was a regional distributor of building products. In fact, Hughes Hardware operates in five states across the southern United States: Texas, Oklahoma, Louisiana, Arkansas, and Mississippi. The company is in the process of deciding whether it should open additional distribution centers or move existing ones to new locations. As part of the process, we need to make sales forecasts for each of the states the company serves.

To make our state-level forecasts, we will use essentially the same model that we did for the company-wide forecast, though we will also include state-specific effects.
The model we will use is

%∆dimit = β10 + β11 ln(startsit) + β12 rgspgrowthit + β13 unrateit + u1i + ε1it
sheetit = β20 + β21 dimit + β22 rgspgrowthit + β23 unrateit + u2i + ε2it
miscit = β30 + β31 dimit + β32 rgspgrowthit + β33 unrateit + u3i + ε3it

The subscript i indexes states, and we have replaced the gdp variable that was in our previous model with rgspgrowth, which measures the annual growth rate in real gross state product (GSP), the state-level analogue to national GDP. The GSP data are released only annually, so we have replicated the annual growth rate for all four quarterly observations in a given year. For example, rgspgrowth is about 5.3 for the four observations for the state of Texas in the year 2007; in 2007, Texas's real GSP was 5.3% higher than in 2006. The state-level error terms are u1i, u2i, and u3i. Here we will use the fixed-effects estimator and fit the three equations via xtreg, fe, again using data only through the end of 2009 so that we can examine how well our model forecasts.

Our first task is to fit the three equations and store the estimation results. At the same time, we will also use predict to obtain the predicted fixed-effects terms. You will see why in just a moment. Because the regression results are not our primary concern here, we will use quietly to suppress the output. In Stata, we type

. use http://www.stata-press.com/data/r14/statehardware, clear
(Hughes state-level sales data)
. generate lndim = ln(dim)
. generate lnstarts = ln(starts)
. quietly xtreg D.lndim lnstarts rgspgrowth unrate if qdate <= tq(2009q4), fe
. predict dlndim_u, u
(45 missing values generated)
. estimates store dim
. quietly xtreg sheet dim rgspgrowth unrate if qdate <= tq(2009q4), fe
. predict sheet_u, u
(40 missing values generated)
. estimates store sheet
. quietly xtreg misc dim rgspgrowth unrate if qdate <= tq(2009q4), fe
. predict misc_u, u
(40 missing values generated)
. estimates store misc

Having fit the model, we are almost ready to make forecasts. First, though, we need to consider how to handle the state-level error terms. If we simply created a forecast model, added our three estimation results, and then called forecast solve, Stata would forecast miscit, for example, as a function of dimit, rgspgrowthit, unrateit, and the estimate of the constant term β30. However, our model implies that miscit also depends on u3i and the idiosyncratic error term ε3it. We will ignore the idiosyncratic error for now (but see the discussion of simulations in [TS] forecast solve). By construction, u3i has a mean of zero when averaged across all panels, but in general, u3i is nonzero for any individual panel. Therefore, we should include it in our forecasts.

After you fit a model with xtreg, you can predict the panel-specific error component for the subset of observations in the estimation sample. Typically, xtreg is used in situations where the number of observations per panel T is modest. In those cases, the estimates of the panel-specific error components are likely to be "noisy" (analogous to estimating a sample mean with just a few observations). Often, asymptotic analyses of panel-data estimators assume T is fixed, and in those cases, the estimators of the panel-specific errors are inconsistent. However, in forecasting applications, the number of observations per panel is usually larger than in most other panel-data applications. With enough observations, we can have more confidence in the estimated panel-specific errors.
If we are willing to assume that we have decent estimates of the panel-specific errors and that those panel-level effects will remain constant over the forecast horizon, then we can incorporate them into our forecasts. Because predict provided us with estimates of the panel-level effects only for the estimation sample, we need to extend them into the forecast horizon. An easy way to do that is to use egen to create a new set of variables:

. by state: egen dlndim_u2 = mean(dlndim_u)
. by state: egen sheet_u2 = mean(sheet_u)
. by state: egen misc_u2 = mean(misc_u)

We can use forecast adjust to incorporate these terms into our forecasts. The following commands define our forecast model, including the estimated panel-specific terms:

. forecast create statemodel, replace
(Forecast model salesfcast ended.)
Forecast model statemodel started.
. forecast estimates dim, name(dlndim)
Added estimation results from xtreg.
Forecast model statemodel now contains 1 endogenous variable.
. forecast adjust dlndim = dlndim + dlndim_u2
Endogenous variable dlndim now has 1 adjustment.
. forecast identity lndim = L.lndim + dlndim
Forecast model statemodel now contains 2 endogenous variables.
. forecast identity dim = exp(lndim)
Forecast model statemodel now contains 3 endogenous variables.
. forecast estimates sheet
Added estimation results from xtreg.
Forecast model statemodel now contains 4 endogenous variables.
. forecast adjust sheet = sheet + sheet_u2
Endogenous variable sheet now has 1 adjustment.
. forecast estimates misc
Added estimation results from xtreg.
Forecast model statemodel now contains 5 endogenous variables.
. forecast adjust misc = misc + misc_u2
Endogenous variable misc now has 1 adjustment.

We used forecast adjust to perform our adjustment to dlndim immediately after we added those estimation results so that we would not forget to do so and before we used identities to obtain the actual dim variable. However, we could have specified the adjustment at any time. Regardless of when you specify an adjustment, forecast solve performs those adjustments immediately after the variable being adjusted is computed.

Now we can solve our model. Here we obtain dynamic forecasts beginning in the first quarter of 2010:

. forecast solve, begin(tq(2010q1))
Computing dynamic forecasts for model statemodel.
Starting period:  2010q1
Ending period:    2011q4
Number of panels: 5
Forecast prefix:  f_
Solving panel 1
Solving panel 2
Solving panel 3
Solving panel 4
Solving panel 5
Forecast 5 variables spanning 8 periods for 5 panels.

Here is our state-level forecast for sheet goods:

[Figure: Sales of Sheet Goods ($mil.). Five panels, one per state (AR, LA, MS, OK, TX), plot actual and forecast sales against quarter, 2008-2012.]

Similar to our company-wide forecast, our state-level forecast failed to call the bottom in sales that occurred in 2011. Because our model missed the shift in sales momentum in every one of the five states, we would be inclined to go back and try respecifying one or more of the equations in our model. On the other hand, if our model forecast most of the states well but performed poorly in just a few states, we would first want to investigate whether any events in those states could account for the unexpected results.
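Though we rely on the graph to assess fit here, one informal numerical check is to summarize the forecast errors over the forecast horizon, state by state. A minimal sketch, assuming the default f_ prefix used above:

. generate double sheet_err = sheet - f_sheet
. bysort state: summarize sheet_err if qdate >= tq(2010q1)

A mean error far from zero in only one or two states would point toward state-specific events rather than a misspecified equation.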
Technical note

Stata also provides the areg command for fitting a linear regression with a large dummy-variable set; areg is designed for situations in which the number of groups (panels) is fixed while the number of observations per panel increases with the sample size. When the goal is to create a forecast model for panel data, you should nevertheless use xtreg rather than areg. The forecast commands require the panel-data settings declared using xtset, as well as the panel-related estimation information saved by the other panel-data commands, to produce forecasts with panel datasets.

In the previous example, none of our equations contained lagged dependent variables as regressors. If an equation did contain a lagged dependent variable, then one could use a dynamic panel-data (DPD) estimator such as xtabond, xtdpd, or xtdpdsys. DPD estimators are designed for cases where the number of observations per panel T is small. As shown by Nickell (1981), the bias of the standard fixed- and random-effects estimators in the presence of lagged dependent variables is of order 1/T and is thus particularly severe when each panel has relatively few observations. Judson and Owen (1999) perform Monte Carlo experiments to examine the relative performance of different panel-data estimators in the presence of lagged dependent variables when used with panel datasets having dimensions more commonly encountered in macroeconomic applications. Based on their results, while the bias of the standard fixed-effects estimator (LSDV in their notation) is not inconsequential even when T = 20, for T = 30, the fixed-effects estimator works as well as most alternatives. The only estimator that appreciably outperformed the standard fixed-effects estimator when T = 30 was the least-squares dummy variable corrected estimator (LSDVC in their notation); Bruno (2005) provides a Stata implementation of that estimator. Many datasets used in forecasting situations contain even more observations per panel, so the "Nickell bias" is unlikely to be a major concern.

In this manual entry, we have provided an overview of the forecast commands and several examples to get you started. The command-specific entries fill in the details.

Video example

Tour of forecasting

References

Box-Steffensmeier, J. M., J. R. Freeman, M. P. Hitt, and J. C. W. Pevehouse. 2014. Time Series Analysis for the Social Sciences. New York: Cambridge University Press.
Bruno, G. S. F. 2005. Estimation and inference in dynamic unbalanced panel-data models with a small number of individuals. Stata Journal 5: 473-500.
Judson, R. A., and A. L. Owen. 1999. Estimating dynamic panel data models: a guide for macroeconomists. Economics Letters 65: 9-15.
Klein, L. R. 1950. Economic Fluctuations in the United States 1921-1941. New York: Wiley.
Nickell, S. J. 1981. Biases in dynamic models with fixed effects. Econometrica 49: 1417-1426.

Also see

[TS] var — Vector autoregressive models
[TS] tsset — Declare data to be time-series data
[R] ivregress — Single-equation instrumental-variables regression
[R] reg3 — Three-stage estimation for systems of simultaneous equations
[R] regress — Linear regression
[XT] xtreg — Fixed-, between-, and random-effects and population-averaged linear models
[XT] xtset — Declare data to be panel data

Title

forecast adjust — Adjust a variable by add factoring, replacing, etc.
Description    Quick start    Syntax    Remarks and examples    Stored results    Reference    Also see

Description

forecast adjust specifies an adjustment to be applied to an endogenous variable in the model. Adjustments are typically used to produce alternative forecast scenarios or to incorporate outside information into a model. For example, you could use forecast adjust with a macroeconomic model to simulate the effect of an oil price shock whereby the price of oil spikes $50 higher than your model otherwise predicts in a given quarter.

Quick start

Adjust the endogenous variable y in the forecast to account for the variable shock in 1990
    forecast adjust y = y + shock if year==1990

Adjust the endogenous variable y in the forecast to account for a structural change in its mean that occurred in the year 2000
    forecast adjust y = y + 400000 if year > 2000

Syntax

forecast adjust varname = exp [if] [in]

varname is the name of an endogenous variable that has been previously added to the model using forecast estimates or forecast coefvector. exp represents a Stata expression; see [U] 13 Functions and expressions.

Remarks and examples

When preparing a forecast, you often want to produce several different scenarios. The baseline scenario is the default forecast that your model produces. It reflects the interplay among the equations and exogenous variables without any outside forces acting on the model. Users of forecasts often want answers to questions like "What happens to the economy if housing prices decline 10% more than your baseline forecast suggests they will?" or "What happens to unemployment and interest rates if tax rates increase?" forecast adjust lets you explore such questions by specifying alternative paths for one or more endogenous variables in your model.

Example 1: Revisiting the Klein model

In example 1 of [TS] forecast, we produced a baseline forecast for the classic Klein (1950) model. We noted that investment declined quite substantially in 1938. Suppose the government had a plan, such as a one-year investment tax credit, that it could enact in 1939 to stimulate investment. Based on discussions with accountants, tax experts, and business leaders, say this plan would encourage an additional $1 billion of investment in 1939. How would this additional investment affect the economy?

To answer this question, we first refit the Klein (1950) model from [TS] forecast using the data through 1938 and then obtain dynamic forecasts starting in 1939. We will prefix these forecast variables with bl_ to indicate they are the baseline forecasts. In Stata, we type

. use http://www.stata-press.com/data/r14/klein2
. quietly reg3 (c p L.p w) (i p L.p L.k) (wp y L.y yr) if year < 1939,
>      endog(w p y) exog(t wg g)
. estimates store klein
. forecast create kleinmodel
Forecast model kleinmodel started.
. forecast estimates klein
Added estimation results from reg3.
Forecast model kleinmodel now contains 3 endogenous variables.
. forecast identity y = c + i + g
Forecast model kleinmodel now contains 4 endogenous variables.
. forecast identity p = y - t - wp
Forecast model kleinmodel now contains 5 endogenous variables.
. forecast identity k = L.k + i
Forecast model kleinmodel now contains 6 endogenous variables.
. forecast identity w = wg + wp
Forecast model kleinmodel now contains 7 endogenous variables.
. forecast exogenous wg
Forecast model kleinmodel now contains 1 declared exogenous variable.
. forecast exogenous g
Forecast model kleinmodel now contains 2 declared exogenous variables.
. forecast exogenous t
Forecast model kleinmodel now contains 3 declared exogenous variables.
. forecast exogenous yr
Forecast model kleinmodel now contains 4 declared exogenous variables.
. forecast solve, prefix(bl_) begin(1939)
Computing dynamic forecasts for model kleinmodel.
Starting period: 1939
Ending period:   1941
Forecast prefix: bl_
1939: ...................................................................
1940: ...................................................................
1941: ...................................................................
Forecast 7 variables spanning 3 periods.

To model our $1 billion increase in investment in 1939, we type

. forecast adjust i = i + 1 if year == 1939
Endogenous variable i now has 1 adjustment.

While computing the forecasts for 1939, whenever forecast evaluates the equation for i, it will set i to be higher than it would otherwise be by 1. Now we re-solve our model using the prefix alt_ to indicate this is an alternative forecast:

. forecast solve, prefix(alt_) begin(1939)
Computing dynamic forecasts for model kleinmodel.
Starting period: 1939
Ending period:   1941
Forecast prefix: alt_
1939: ...................................................................
1940: ...................................................................
1941: ...................................................................
Forecast 7 variables spanning 3 periods.

The following graph shows how investment and total income respond to this policy shock.

[Figure: Effect of $1 billion investment tax credit. Two panels, Investment and Total Income, plot the forecasts in $ billion against year, 1938-1941. Solid lines denote the forecast without the tax credit; dashed lines denote the forecast with the tax credit.]

Both investment and total income would be higher not just in 1939 but also in 1940; the higher capital stock implied by the additional investment raises total output (and hence income) even after the tax credit expires. Let's look at these two variables in more detail:

. list year bl_i alt_i bl_y alt_y if year >= 1938, sep(0)

       +------------------------------------------------------+
       | year       bl_i      alt_i       bl_y       alt_y    |
       |------------------------------------------------------|
   19. | 1938       -1.9       -1.9       60.9        60.9    |
   20. | 1939   3.757227   6.276423   75.57685    80.71709    |
   21. | 1940   7.971523   9.501909   89.67435    94.08473    |
   22. | 1941   16.16375   16.20362   123.0809     124.238    |
       +------------------------------------------------------+

Although we simulated a policy that we thought would encourage $1 billion of investment, investment in fact rises about $2.5 billion in 1939 according to our model. That is because higher investment raises total income, which also affects private-sector profits, which beget further changes in investment, and so on. The investment multiplier in this example might strike you as implausibly large, but it highlights an important attribute of forecasting models. Studying each equation's estimated coefficients in isolation can help to unveil some specification errors, but one must also consider how those equations interact.
It is possible to construct models in which each equation appears to be well specified, but the model nevertheless forecasts poorly or suggests unlikely behavior in response to policy shocks.

In the previous example, we applied a single adjustment to a single endogenous variable in a single time period. However, forecast allows you to specify forecast adjust multiple times with each endogenous variable, and many real-world policy simulations require adjustments to multiple variables. You can also consider policies that affect variables for multiple periods. For example, suppose we wanted to see what would happen if our investment tax credit lasted two years instead of one. One way would be to use forecast adjust twice:

. forecast adjust i = i + 1 if year == 1939
. forecast adjust i = i + 1 if year == 1940

A second way would be to make that adjustment using one command:

. forecast adjust i = i + 1 if year == 1939 | year == 1940

For adjustments lasting more than one or two periods, creating an adjustment variable makes more sense. A third way to simulate our two-year tax credit is

. generate i_adj = 0
. replace i_adj = 1 if year == 1939 | year == 1940
. forecast adjust i = i + i_adj

So far in our discussion of forecast adjust, we have always shown an endogenous variable being adjusted by adding a number or variable to it. However, any valid expression is allowed on the right-hand side of the equals sign. If you want to explore the effects of a policy that will increase investment by 10% in 1939, you could type

. forecast adjust i = 1.1*i if year == 1939

If you believe investment will be -2.0 in 1939, you could type

. forecast adjust i = -2.0 if year == 1939

An alternative way to force forecasts of endogenous variables to take on prespecified values is discussed in example 1 of [TS] forecast solve.

Stored results

forecast adjust stores the following in r():

Macros
    r(lhs)          left-hand-side (endogenous) variable
    r(rhs)          right-hand side of the adjustment
    r(basenames)    base names of variables found on the right-hand side
    r(fullnames)    full names of variables found on the right-hand side

Reference

Klein, L. R. 1950. Economic Fluctuations in the United States 1921-1941. New York: Wiley.

Also see

[TS] forecast — Econometric model forecasting
[TS] forecast solve — Obtain static and dynamic forecasts

Title

forecast clear — Clear current model from memory

Description    Syntax    Remarks and examples    Also see

Description

forecast clear removes the current forecast model from memory.

Syntax

forecast clear

Remarks and examples

For an overview of the forecast commands, see [TS] forecast. This manual entry assumes you have already read that manual entry. forecast allows you to have only one model in memory at a time. You use forecast clear to remove the current model from memory. Forecast models themselves do not consume a significant amount of memory, so there is no need to clear a model from memory unless you intend to create a new one. An alternative to forecast clear is the replace option with forecast create. Calling forecast clear when no forecast model exists in memory does not result in an error.

Also see

[TS] forecast — Econometric model forecasting
[TS] forecast create — Create a new forecast model

Title

forecast coefvector — Specify an equation via a coefficient vector

Description    Quick start    Syntax    Options    Remarks and examples    Methods and formulas    Also see

Description

forecast coefvector adds equations that are stored as coefficient vectors to your forecast model.
Typically, equations are added using forecast estimates and forecast identity. forecast coefvector is used in less-common situations where you have a vector of parameters that represents a linear equation. Most users of the forecast commands will not need to use forecast coefvector. We recommend skipping this manual entry until you are familiar with the other features of forecast.

Quick start

Add the equation stored in coefficient vector y to the model to be used by forecast solve
    forecast coefvector y

As above, but include the variance of the estimated parameters stored in matrix mymat
    forecast coefvector y, variance(mymat)

Syntax

forecast coefvector cname [, options]

cname is a Stata matrix with one row.

options                        Description
-----------------------------------------------------------
variance(vname)                specify parameter variance matrix
errorvariance(ename)           specify additive error variance matrix
names(namelist[, replace])     use namelist for names of left-hand-side variables
-----------------------------------------------------------

Options

variance(vname) specifies that Stata matrix vname contains the variance matrix of the estimated parameters. This option has an effect only if you specify the simulate() option when calling forecast solve and request sim_technique betas or residuals. See [TS] forecast solve.

errorvariance(ename) specifies that the equations being added include an additive error term with variance ename, where ename is the name of a Stata matrix. The number of rows and columns in ename must match the number of equations represented by coefficient vector cname. This option has an effect only if you specify the simulate() option when calling forecast solve and request sim_technique errors or residuals. See [TS] forecast solve.

names(namelist[, replace]) instructs forecast coefvector to use namelist as the names of the left-hand-side variables in the coefficient vector being added. By default, forecast coefvector uses the equation names on the column stripe of cname. You must use this option if any of the equation names stored with cname contains time-series operators.

Remarks and examples

For an overview of the forecast commands, see [TS] forecast. This manual entry assumes you have already read that manual entry. It also assumes that you are familiar with Stata's matrices and the concepts of row and column names that can be attached to them; see [P] matrix. You use forecast coefvector to add endogenous variables to your model that are defined by linear equations, where the linear equations are stored in a coefficient (parameter) vector.

Remarks are presented under the following headings:
    Introduction
    Simulations with coefficient vectors

Introduction

forecast coefvector can be used to add equations that you obtained elsewhere to your model. For example, you might see the estimated coefficients for an equation in an article and want to add that equation to your model. User-written estimators that do not implement a predict command can also be included in forecast models via forecast coefvector. forecast coefvector can also be useful in situations where you want to simulate time-series data, as the next example illustrates.

Example 1: A shock to an autoregressive process

Consider the following autoregressive process:

yt = 0.9yt-1 - 0.6yt-2 + 0.3yt-3

Suppose yt is initially equal to zero. How does yt evolve in response to a one-unit shock at time t = 5? We can use forecast coefvector to find out. First, we create a small dataset with time variable t and set our target variable y equal to zero:
. set obs 20
number of observations (_N) was 0, now 20
. generate t = _n
. tsset t
        time variable:  t, 1 to 20
                delta:  1 unit
. generate y = 0

Now let's think about our coefficient vector. The only tricky part is in labeling the columns. We can represent the lagged values of yt using time-series operators; there is just one equation, corresponding to variable y. We can use matrix coleq to apply both variable and equation names to the columns of our matrix. In Stata, we type

. matrix y = (.9, -.6, 0.3)
. matrix coleq y = y:L.y y:L2.y y:L3.y
. matrix list y
y[1,3]
          y:    y:    y:
         L.   L2.   L3.
          y     y     y
r1       .9   -.6    .3

forecast coefvector ignores the row name of the vector being added (r1 here), so we can leave it as is. Next we create a forecast model and add y:

. forecast create
Forecast model started.
. forecast coefvector y
Forecast model now contains 1 endogenous variable.

To shock our system at t = 5, we can use forecast adjust:

. forecast adjust y = 1 in 5
Endogenous variable y now has 1 adjustment.

Now we can solve our model. Because our y variable is filled in for the entire dataset, forecast solve will not be able to determine automatically when forecasting should commence. We have three lags in our process, so we will start at t = 4. To reduce the amount of output, we specify log(off):

. forecast solve, begin(4) log(off)
Computing dynamic forecasts for current model.
Starting period: 4
Ending period:   20
Forecast prefix: f_
Forecast 1 variable spanning 17 periods.

[Figure: Impulse-response function. The response is plotted against t, 0 to 20, showing the evolution of yt in response to a unit shock at t = 5.]

The graph shows our shock causing y to jump to 1 at t = 5. At t = 6, we can see that y = 0.9, and at t = 7, we can see that y = 0.9 × 0.9 - 0.6 × 1 = 0.21.

The previous example used a coefficient vector representing a single equation. However, coefficient vectors can contain multiple equations. For example, say we read an article and saw the following results displayed:

xt = 0.2 + 0.3xt-1 - 0.8yt
yt = 0.1 + 0.7yt-1 + 0.3xt - 0.2xt-1

We can add both equations at once to our forecast model. Again the key is in labeling the columns. forecast coefvector understands _cons to mean a constant term, and it looks at the equation names on the vector's columns to determine how many equations there are and to what endogenous variables they correspond:

. matrix eqvector = (0.2, 0.3, -0.8, 0.1, 0.7, 0.3, -0.2)
. matrix coleq eqvector = x:_cons x:L.x x:y y:_cons y:L.y y:x y:L.x
. matrix list eqvector
eqvector[1,7]
            x:      x:     x:      y:      y:     y:     y:
                    L.                     L.            L.
        _cons       x      y   _cons       y      x      x
r1         .2      .3    -.8      .1      .7     .3    -.2

We could then type

. forecast coefvector eqvector

to add our coefficient vector to a model. Just as with estimation results whose left-hand-side variables contain time-series operators, if any of the equation names of the coefficient vector being added contains time-series operators, you must use the names() option of forecast coefvector to specify alternative names.
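To see this two-equation vector in action, the workflow from example 1 can be repeated; a minimal sketch (the dataset setup and the begin() period are illustrative, with begin(2) chosen because only first lags appear):

. clear
. set obs 20
. generate t = _n
. tsset t
. generate x = 0
. generate y = 0
. forecast create, replace
. forecast coefvector eqvector
. forecast solve, begin(2) log(off)

Because the vector defines two equations, forecast coefvector declares both x and y endogenous, and forecast solve computes their joint dynamic path.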
Simulations with coefficient vectors

The forecast solve command provides the option simulate(sim_technique, ...) to perform stochastic simulations and obtain measures of forecast uncertainty. How forecast solve handles coefficient vectors when performing these simulations depends on the options provided with forecast coefvector. There are four cases to consider:

1. You specify neither variance() nor errorvariance() with forecast coefvector. You have provided no measures of uncertainty with this coefficient vector. Therefore, forecast solve treats it like an identity. No random errors or residuals are added to this coefficient vector's linear combination, nor are the coefficients perturbed in any way.

2. You specify variance() but not errorvariance(). The variance() option provides the covariance matrix of the estimated parameters in the coefficient vector. Therefore, the coefficient vector is taken to be stochastic. If you request sim_technique betas, this coefficient vector is assumed to be distributed multivariate normal with a mean equal to the original value of the vector and covariance matrix as specified in the variance() option, and random draws are taken from this distribution. If you request sim_technique residuals, randomly chosen static residuals are added to this coefficient vector's linear combination. Because you did not specify a covariance matrix for the error terms with the errorvariance() option, sim_technique errors cannot draw random errors for this coefficient vector's linear combination, so sim_technique errors has no impact on the equations.

3. You specify errorvariance() but not variance(). Because you specified a covariance matrix for the assumed additive error term, the equations represented by this coefficient vector are stochastic. If you request sim_technique residuals, randomly chosen static residuals are added to this coefficient vector's linear combination. If you request sim_technique errors, multivariate normal errors with mean zero and covariance matrix as specified in the errorvariance() option are added during the simulations. However, specifying sim_technique betas does not affect the equations because there is no covariance matrix associated with the coefficients.

4. You specify both variance() and errorvariance(). The equations represented by this coefficient vector are stochastic, and forecast solve treats the coefficient vector just like an estimation result. sim_technique betas, residuals, and errors all work as expected.

Methods and formulas

Let β denote the 1 × k coefficient vector being added. Then the matrix specified in the variance() option must be k × k. Row and column names for that matrix are ignored. Let m denote the number of equations represented by β. That is, if β is stored as Stata matrix beta and local macro m is to hold the number of equations, then in Stata parlance,

. local eqnames : coleq beta
. local eq : list uniq eqnames
. local m : list sizeof eq

Then the matrix specified in the errorvariance() option must be m × m. Row and column names for that matrix are ignored.

Also see

[TS] forecast — Econometric model forecasting
[TS] forecast solve — Obtain static and dynamic forecasts
[P] matrix — Introduction to matrix commands
[P] matrix rownames — Name rows and columns

Title

forecast create — Create a new forecast model

Description    Quick start    Syntax    Option    Remarks and examples    Also see

Description

forecast create creates a new forecast model in Stata.

Quick start

Start a forecast model called myforecast
    forecast create myforecast

As above, but clear the existing model myforecast from memory if it exists
    forecast create myforecast, replace

Syntax

forecast create [name] [, replace]

name is an optional name that can be given to the model. name must follow the naming conventions described in [U] 11.3 Naming conventions.
Option

replace causes Stata to clear the existing model from memory before creating name. You may have only one model in memory at a time. By default, forecast create issues an error message if another model is already in memory.

Remarks and examples

For an overview of the forecast commands, see [TS] forecast. This manual entry assumes you have already read that manual entry. The forecast create command creates a new forecast model in Stata. You must create a model before you can add equations or solve it. You can have only one model in memory at a time. You may optionally specify a name for your model. That name will appear in the output produced by the various forecast subcommands.

Example 1

Here we create a model named salesfcast:

. forecast create salesfcast
Forecast model salesfcast started.

Technical note

Warning: Do not type clear all, clear mata, or clear results after creating a forecast model with forecast create unless you intend to remove your forecast model. Typing clear all or clear mata eliminates the internal structures used to store your forecast model. Typing clear results clears all estimation results from memory. If your forecast model includes estimation results that rely on the ability to call predict, you will not be able to solve your model.

Also see

[TS] forecast — Econometric model forecasting
[TS] forecast clear — Clear current model from memory

Title

forecast describe — Describe features of the forecast model

Description    Quick start    Syntax    Options    Remarks and examples    Stored results    Reference    Also see

Description

forecast describe displays information about the forecast model currently in memory. For example, you can obtain information regarding all the endogenous or exogenous variables in the model, the adjustments used for alternative scenarios, or the solution method used. Typing forecast describe without specifying a particular aspect of the model is equivalent to typing forecast describe for every available aspect and can result in more output than you want, particularly if you also request a detailed description.

Quick start

Display information about the estimates in the current forecast
    forecast describe estimates
Display information about coefficient vectors
    forecast describe coefvector
Display endogenous variables defined by identities
    forecast describe identity
Display names of declared exogenous variables
    forecast describe exogenous
Display information about the solution method used
    forecast describe solve
Display information about endogenous variables
    forecast describe endogenous
All the above
    forecast describe

Syntax

Describe the current forecast model
    forecast describe [, options]

Describe particular aspects of the current forecast model
    forecast describe aspect [, options]

aspect          Description
-----------------------------------------------
estimates       estimation results
coefvector      coefficient vectors
identity        identities
exogenous       declared exogenous variables
adjust          adjustments to endogenous variables
solve           forecast solution information
endogenous      all endogenous variables
-----------------------------------------------

options         Description
-----------------------------------------------
brief           provide a one-line summary
detail*         provide more-detailed information
-----------------------------------------------
* Specifying detail provides no additional information with aspects exogenous, endogenous, and solve.

Options
For example, forecast describe exogenous, brief will tell you just the current forecast model’s name and the number of exogenous variables in the model. detail requests a more-detailed description of the aspect specified. For example, typing forecast describe estimates lists all the estimation results added to the model using forecast estimates, the estimation commands used, and the number of left-hand-side variables in each estimation result. When you specify forecast describe estimates, detail, the output includes a list of all the left-hand-side variables entered with forecast estimates. Remarks and examples For an overview of the forecast commands, see [TS] forecast. This manual entry assumes you have already read that manual entry. forecast describe displays information about the forecast model currently in memory. You can obtain either all the information at once or information about individual aspects of your model, whereby we use the word “aspect” to refer to, for example, just the estimation results, identities, or solution information. forecast describe — Describe features of the forecast model 227 Example 1 In example 1 of [TS] forecast, we created and forecasted Klein’s (1950) model of the U.S. economy. Here we obtain information about all the endogenous variables in the model: . forecast describe endogenous Forecast model kleinmodel contains 7 endogenous variables: Variable 1. 2. 3. 4. 5. 6. 7. c i wp y p k w Source estimates estimates estimates identity identity identity identity # adjustments 0 0 0 0 0 0 0 As we mentioned in [TS] forecast, there are seven endogenous variables in this model. Three of those variables (c, i, and wp) were left-hand-side variables in equations we fitted and added to our forecast model with forecast estimates. The other four variables were defined by identities added with forecast identity. The right-hand column of the table indicates that none of our endogenous variables contains adjustments specified using forecast adjust. We can obtain more information about the estimated equations in our model using forecast describe estimates: . forecast describe estimates, detail Forecast model kleinmodel contains 1 estimation result: Estimation result 1. klein Command reg3 LHS variables c i wp Our model has one estimation result, klein, containing results produced by the reg3 command. If we had not specified the detail option, forecast describe estimates would have simply stated the number of left-hand-side variables (3) rather than listing them. 228 forecast describe — Describe features of the forecast model At the end of example 1 in [TS] forecast, we obtained dynamic forecasts beginning in 1936. Here we obtain information about the solution: . forecast describe solve Forecast model kleinmodel has been solved: Forecast horizon Begin End Number of periods Forecast variables Prefix Number of variables Storage type Type of forecast Solution Technique Maximum iterations Tolerance for function values Tolerance for function zero 1936 1941 6 d_ 7 float Dynamic Damped Gauss-Seidel (0.200) 500 1.0e-09 (not applicable) We obtain information about the forecast horizon, how the variables holding our forecasts were created and stored, and the solution technique used. If we had used the simulate() option with forecast solve, we would have obtained information about the types of simulations performed and the variables used to hold the results. Stored results When you specify option brief, only a limited number of results are stored. 
Stored results

When you specify option brief, only a limited number of results are stored. In the tables below, a superscript B indicates results that are available even after brief is specified. forecast describe coefvector saves certain results only if detail is specified; these are indicated by superscript D. Typing forecast describe without specifying an aspect does not return any results.

forecast describe estimates stores the following in r():

Scalars
    r(n estimates)B     number of estimation results
    r(n lhs)            number of left-hand-side variables defined by estimation results
Macros
    r(model)B           name of forecast model, if named
    r(lhs)              left-hand-side variables
    r(estimates)        names of estimation results

forecast describe identity stores the following in r():

Scalars
    r(n identities)B    number of identities
Macros
    r(model)B           name of forecast model, if named
    r(lhs)              left-hand-side variables
    r(identities)       list of identities

forecast describe coefvector stores the following in r():

Scalars
    r(n coefvectors)B   number of coefficient vectors
    r(n lhs)B           number of left-hand-side variables defined by coefficient vectors
Macros
    r(model)B           name of forecast model, if named
    r(lhs)              left-hand-side variables
    r(rhs)D             right-hand-side variables
    r(names)            names of coefficient vectors
    r(Vnames)D          names of variance matrices ("." if not specified)
    r(Enames)D          names of error variance matrices ("." if not specified)

forecast describe exogenous stores the following in r():

Scalars
    r(n exogenous)B     number of declared exogenous variables
Macros
    r(model)B           name of forecast model, if named
    r(exogenous)        declared exogenous variables

forecast describe endogenous stores the following in r():

Scalars
    r(n endogenous)B    number of endogenous variables
Macros
    r(model)B           name of forecast model, if named
    r(varlist)          endogenous variables
    r(source list)      sources of endogenous variables (estimates, identity, coefvector)
    r(adjust cnt)       number of adjustments per endogenous variable

forecast describe solve stores the following in r():

Scalars
    r(periods)          number of periods forecast per panel
    r(Npanels)          number of panels forecast
    r(Nvar)             number of forecast variables
    r(damping)          damping parameter for damped Gauss-Seidel
    r(maxiter)          maximum number of iterations
    r(vtolerance)       tolerance for forecast values
    r(ztolerance)       tolerance for function zero
    r(sim nreps)        number of simulations
Macros
    r(solved)B          solved, if the model has been solved
    r(model)B           name of forecast model, if named
    r(actuals)          actuals, if specified with forecast solve
    r(double)           double, if specified with forecast solve
    r(static)           static, if specified with forecast solve
    r(begin)            first period in forecast horizon
    r(end)              last period in forecast horizon
    r(technique)        solver technique
    r(sim technique)    specified sim_technique
    r(prefix)           forecast variable prefix
    r(suffix)           forecast variable suffix
    r(sim prefix i)     ith simulation statistic prefix
    r(sim suffix i)     ith simulation statistic suffix
    r(sim stat i)       ith simulation statistic

forecast describe adjust stores the following in r():

Scalars
    r(n adjustments)B   total number of adjustments
    r(n adjust vars)B   number of variables with adjustments
Macros
    r(model)B           name of forecast model, if named
    r(varlist)          variables with adjustments
    r(adjust cnt)       number of adjustments per endogenous variable
    r(adjust list)      list of adjustments

Reference

Klein, L. R. 1950. Economic Fluctuations in the United States 1921-1941. New York: Wiley.
Also see

[TS] forecast — Econometric model forecasting
[TS] forecast list — List forecast commands composing current model

Title

forecast drop — Drop forecast variables

Description    Quick start    Syntax    Options    Remarks and examples    Stored results    Also see

Description

forecast drop drops variables previously created by forecast solve.

Quick start

Remove all variables created by forecast solve from the current dataset
    forecast drop
Remove only forecast variables starting with f_
    forecast drop, prefix(f_)
Remove only forecast variables ending with _f
    forecast drop, suffix(_f)

Syntax

forecast drop [, options]

options           Description
-----------------------------------------------
prefix(string)    specify prefix for forecast variables
suffix(string)    specify suffix for forecast variables
-----------------------------------------------
You can specify prefix() or suffix() but not both.

Options

prefix(string) and suffix(string) specify either a name prefix or a name suffix that will be used to identify forecast variables to be dropped. You may specify prefix() or suffix() but not both. By default, forecast drop removes all forecast variables produced by the previous invocation of forecast solve. Suppose, however, that you previously specified the simulate() option with forecast solve and wish to remove variables containing simulation results but retain the variables containing the point forecasts. Then you can use the prefix() or suffix() option to identify the simulation variables you want dropped.

Remarks and examples

For an overview of the forecast commands, see [TS] forecast. This manual entry assumes you have already read that manual entry. forecast drop safely removes variables previously created using forecast solve. Say you previously solved your model and created forecast variables that were suffixed with _f. Do not type

. drop *_f

to remove those variables from the dataset. Rather, type

. forecast drop

The former command is dangerous: suppose you were given the dataset and asked to produce the forecast. The person who previously worked with the dataset may have created other variables that end with _f. Using drop would remove those variables as well. forecast drop removes only those variables that were previously created by forecast solve based on the model in memory.

If you do not specify any options, forecast drop removes all the forecast variables created by the current model, including the variables that contain the point forecasts as well as any variables that contain simulation results specified by the simulate() option with forecast solve. Suppose you had typed

. forecast solve, prefix(s_) simulate(betas, statistic(stddev, prefix(sd_)))

Then if you type

. forecast drop, prefix(sd_)

forecast drop will remove the variables containing the standard deviations of the forecasts and will leave the variables containing the point forecasts (prefixed with s_) untouched. forecast drop does not exit with an error if a variable it intends to drop does not exist in the dataset.

Stored results

forecast drop stores the following in r():

Scalars
    r(n dropped)    number of variables dropped

Also see

[TS] forecast — Econometric model forecasting
[TS] forecast solve — Obtain static and dynamic forecasts

Title

forecast estimates — Add estimation results to a forecast model

Description    Quick start    Syntax    Options    Remarks and examples    References    Also see

Description

forecast estimates adds estimation results to the forecast model currently in memory. You must first create a new model using forecast create before you can add estimation results with forecast estimates.
After estimating the parameters of an equation or set of equations, you must use estimates store to store the estimation results in memory or use estimates save to save them on disk before adding them to the model.

Quick start

Add estimation results stored in myestimates to the forecast model in memory
    forecast estimates myestimates
As above, but specify the prediction produced by predict, pr outcome(#1)
    forecast estimates myestimates, predict("pr outcome(#1)")
Add estimates from var estimation stored in memory as varest
    forecast estimates varest
Also add the second estimation result saved on disk as notcurrent.ster to the forecast model
    forecast estimates using notcurrent, number(2)

Syntax

Add estimation result currently in memory to model
    forecast estimates name [, options]

name is the name of a stored estimation result; see [R] estimates store.

Add estimation result currently saved on disk to model
    forecast estimates using filename [, number(#) options]

filename is an estimation results file created by estimates save; see [R] estimates save. If no file extension is specified, .ster is assumed.

options                        Description
-----------------------------------------------------------
predict(p_options)             call predict using p_options
names(namelist[, replace])     use namelist for names of left-hand-side variables
advise                         advise whether estimation results can be dropped from memory
-----------------------------------------------------------

Options

predict(p_options) specifies the predict options to use when predicting the dependent variables. For a single-equation estimation command, you simply specify the appropriate options to pass to predict. If multiple options are required, enclose them in quotation marks:

. forecast estimates ..., predict("pr outcome(#1)")

For a multiple-equation estimation command, you can either specify one set of options that will be applied to all equations or specify p_options, where p is the number of endogenous variables being added. If multiple options are required for each equation, enclose each equation's options in quotes:

. forecast estimates ..., predict("pr eq(#1)" "pr eq(#2)")

If you do not specify the eq() option for any of the equations, forecast automatically includes it for you. If you are adding results from a linear estimation command that forecast recognizes as one whose predictions can be calculated as the linear combination x_t'β, do not specify the predict() option, because this will slow forecast's computation time substantially. Use the advise option to determine whether forecast needs to call predict. If you do not specify any predict options, forecast uses the default type of prediction for the command whose results are being added.

names(namelist[, replace]) instructs forecast estimates to use namelist as the names of the left-hand-side variables in the estimation result being added. You must use this option if any of the left-hand-side variables contains time-series operators. By default, forecast estimates uses the names stored in the e(depvar) macro of the results being added. forecast estimates creates a new variable in the dataset for each element of namelist. If a variable of the same name already exists in your dataset, forecast estimates exits with an error unless you specify the replace option, in which case existing variables are overwritten.
This option is useful if you expect your model to contain more than 300 sets of estimation results, the maximum number that Stata allows you to store in memory; see [R] limits. This option also provides an indication of the speed with which the model can be solved: forecast executes much more slowly with estimation results that must remain in memory.

number(#), for use with forecast estimates using, specifies that the #th set of estimation results from filename be loaded. This assumes that multiple sets of estimation results have been saved in filename. The default is number(1). See [R] estimates save for more information on saving multiple sets of estimation results in a single file.

Remarks and examples
For an overview of the forecast commands, see [TS] forecast. This manual entry assumes you have already read that manual entry.
forecast estimates adds stochastic equations previously fit by Stata estimation commands to a forecast model.
Remarks are presented under the following headings:
    Introduction
    The advise option
    Using saved estimation results
    The predict option
    Forecasting with ARIMA models

Introduction
After you fit an equation that will become a part of your model, you must use either estimates store to store the estimation results in memory or estimates save to save the estimation results to disk. Then you can use forecast estimates to add that equation to your model.
We usually refer to "equation" in the singular, but of course, you can also use a multiple-equation estimation command to fit several equations at once and add them to the model. When we discuss adding a stochastic equation to a model, we really mean adding a single estimation result.
In this discussion, we also need to make a distinction between making a forecast and obtaining a prediction. We use the word "predict" to refer to the process of obtaining a fitted value for a single equation, just as you can use the predict command to obtain fitted values, residuals, or other statistics after fitting a model with an estimation command. We use the word "forecast" to mean finding a solution to the complete set of equations that compose the forecast model. The iterative techniques we use to solve the model and produce forecasts require that we be able to obtain predictions from each of the equations in the model.

Example 1: A simple example
Here we illustrate how to add estimation results from a regression model in which none of the left-hand-side variables contains time-series operators or mathematical transformations. We use quietly with the estimation command because the output is not relevant here. We type
. use http://www.stata-press.com/data/r14/klein2
. quietly reg3 (c p L.p w) (i p L.p L.k) (wp y L.y yr), endog(w p y) exog(t wg g)
. estimates store klein
. forecast create kleinmodel
Forecast model kleinmodel started.
. forecast estimates klein
Added estimation results from reg3.
Forecast model kleinmodel now contains 3 endogenous variables.
forecast estimates indicated that three endogenous variables were added to the forecast model. That is because we specified three equations in our call to reg3. As we mentioned in example 1 in [TS] forecast, the endog() option of reg3 has no bearing on forecast. All that matters are the three left-hand-side variables.

Technical note
When you add an estimation result to your forecast model, forecast looks at the macro e(depvar) to determine the endogenous variables being added.
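For instance, after fitting an equation, you can inspect that macro yourself (a sketch; the regression is hypothetical):

. quietly regress y L.y x
. display "`e(depvar)'"            // the macro forecast reads
y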
If that macro is empty, forecast tries a few other macros to account for nonstandard commands. The number of endogenous variables being added to the model is based on the number of words found in the macro containing the dependent variables.
You can fit equations with the D. and S. first- and seasonal-difference time-series operators adorning the left-hand-side variables, but in those cases, when you add the equations to the model, you must use the names() option of forecast estimates. When you specify names(namelist), forecast estimates uses namelist as the names of the newly declared endogenous variables and ignores what is in e(depvar). Moreover, forecast does not automatically "undo" the operators on left-hand-side variables. For example, you might fit a regression with D.x as the regressand and then add it to the model using forecast estimates ..., name(Dx). In that case, forecast will solve the model in terms of Dx. You must add an identity to convert Dx to the corresponding level variable x, as the next example illustrates. Of course, you are free to use the D., S., and L. time-series operators on endogenous variables when they appear on the right-hand sides of equations. It is only when D. or S. appears on the left-hand side that you must use the names() option to provide alternative names for them. You cannot add equations to models for which the L. operator appears on left-hand-side variables. You cannot use the F. forward operator anywhere in forecast models.

Example 2: Differenced and log-transformed dependent variables
Consider the following model:

    D.logC = β_10 + β_11 D.logW + β_12 D.logY + u_1t                    (1)
    logW = β_20 + β_21 L.logW + β_22 M + β_23 logY + β_24 logC + u_2t   (2)

Here logY and M are exogenous variables, so we will assume they are filled in over the forecast horizon before solving the model. Ultimately, we are interested in forecasting C and W. However, the first equation is specified in terms of changes in the logarithm of C, and the second equation is specified in terms of the logarithm of W.
We will refer to variables and transformations like logC, D.logC, and C as "related" variables because they are related to one another by simple mathematical functions. Including the related variables, we in fact have a five-equation model with two stochastic equations and three identities:

    dlogC = β_10 + β_11 D.logW + β_12 D.logY + u_1t
    logC = L.logC + dlogC
    C = exp(logC)
    logW = β_20 + β_21 L.logW + β_22 M + β_23 logY + β_24 logC + u_2t
    W = exp(logW)

To fit (1) and (2) in Stata and create a forecast model, we type
. use http://www.stata-press.com/data/r14/fcestimates, clear
(1978 Automobile Data)
. quietly regress D.logC D.logW D.logY
. estimates store dlogceq
. quietly regress logW L.logW M logY logC
. estimates store logweq
. forecast create cwmodel, replace
(Forecast model kleinmodel ended.)
Forecast model cwmodel started.
. forecast estimates dlogceq, names(dlogC)
Added estimation results from regress.
Forecast model cwmodel now contains 1 endogenous variable.
. forecast identity logC = L.logC + dlogC
Forecast model cwmodel now contains 2 endogenous variables.
. forecast identity C = exp(logC)
Forecast model cwmodel now contains 3 endogenous variables.
. forecast estimates logweq
Added estimation results from regress.
Forecast model cwmodel now contains 4 endogenous variables.
. forecast identity W = exp(logW)
Forecast model cwmodel now contains 5 endogenous variables.
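At this point, one way to review what the model now contains is the forecast describe command (a sketch; output omitted):

. forecast describe               // summarizes the model's endogenous variables and equations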
Because the left-hand-side variable in (1) contains a time-series operator, we had to use the names() option of forecast estimates when adding that equation's estimation results to our forecast model. Here we named this endogenous variable dlogC. We then added the other four equations to our model. In general, when we have a set of related variables, we prefer to specify the identities right after we add the stochastic equation so that we do not forget about them.

Technical note
In the previous example, we "undid" the log-transformations by simply exponentiating the logarithmic variable. However, that is only an approximation that does not work well in many applications. Suppose we fit the linear regression model

    ln(y_t) = x_t'β + u_t

where u_t is a zero-mean regression error term. Then E(y_t | x_t) = exp(x_t'β) × E{exp(u_t)}. Although E(u_t) = 0, Jensen's inequality suggests that E{exp(u_t)} ≠ 1, implying that we cannot predict y_t by simply taking the exponential of the linear prediction x_t'β.
If we assume that u_t ~ N(0, σ²), then E{exp(u_t)} = exp(σ²/2). Moreover, many estimation commands like regress provide an estimate σ̂² of σ², so for regression models that contain a logarithmic dependent variable, we can obtain better forecasts for the dependent variable in levels if we approximate E{exp(u_t)} as exp(σ̂²/2). Suppose we run the regression
. regress lny x1 x2 x3
. estimates store myreg
then we could add lny and y as endogenous variables like this:
. forecast estimates myreg
. forecast identity y = exp(lny)*`=exp(e(rmse)^2/2)'
In the second command, Stata will first evaluate the expression `=exp(e(rmse)^2/2)' and replace it with its numerical value. After regress, the macro e(rmse) contains the square root of the estimate of σ², so the value of this expression will be our estimate of E{exp(u_t)}. Then forecast will forecast y as the product of this number and exp(lny). Here we had to use a macro expression including an equals sign to force Stata to evaluate the expression immediately and obtain the expression's value. Identities are not associated with estimation results, so as soon as we used another estimation command or restored some other estimation results (perhaps unknowingly by invoking forecast solve), our reference to e(rmse) would no longer be meaningful. See [U] 18.3.8 Macro expressions for more information on macro evaluation.
Another alternative would be to use Duan's (1983) smearing technique. Stata code for this is provided in Cameron and Trivedi (2010).
A third alternative is to use the generalized linear model (GLM) as implemented by the glm command with a log-link function. In a GLM framework, we would be modeling ln{E(y_t)} rather than E{ln(y_t)} as we would with regress, but oftentimes, the two quantities are similar. Moreover, obtaining predicted values for y_t in the GLM does not present the transformation problem as happens with linear regression. The forecast commands contain special code to handle estimation results obtained by using glm with the link(log) option, and you do not need to specify an identity to obtain y as a function of lny. All you would need to do is
. glm y x1 x2 x3, link(log)
. estimates store myglm
. forecast estimates myglm

The advise option
To produce forecasts from your model, forecast must be able to obtain predictions for each estimation result that you have added.
For many of the most commonly used estimation commands such as regress, ivregress, and var, forecast includes special code to quickly obtain these predictions. For estimation commands that either require more involved computations to obtain predictions or are not widely used in forecasting, forecast instead relies on the predict command to obtain predictions. The advise option of forecast estimates advises you as to whether forecast includes the special code to obtain fast predictions for the command whose estimation results are being added to the model. For example, here we use advise with forecast estimates when building the Klein (1950) model.

Example 3: Using the advise option
. use http://www.stata-press.com/data/r14/klein2, clear
. quietly reg3 (c p L.p w) (i p L.p L.k) (wp y L.y yr), endog(w p y) exog(t wg g)
. estimates store klein
. forecast create kleinmodel, replace
(Forecast model cwmodel ended.)
Forecast model kleinmodel started.
. forecast estimates klein, advise
(These estimation results are no longer needed; you can drop them.)
Added estimation results from reg3.
Forecast model kleinmodel now contains 3 endogenous variables.
After we typed forecast estimates, Stata advised us that "[t]hese estimation results are no longer needed; you can drop them". That means forecast includes code to obtain predictions from reg3 without having to call predict. forecast has recorded all the information it needs about the estimation results stored in klein, and we could type
. estimates drop klein
to remove those estimates from memory.
For relatively small models, there is no need to use estimates drop to remove estimation results from memory. However, Stata allows no more than 300 sets of estimation results to be in memory at once, and forecast solve requires estimation results to be in memory (and not merely saved on disk) before it can produce forecasts. For very large models in which that limit may bind, you can use the advise option to determine which estimation results are needed to solve the model and which can be dropped.
Suppose we had estimation results from a command for which forecast must call predict to obtain predictions. Then instead of obtaining the note saying the estimation results were no longer needed, we would obtain a note stating
. forecast estimates IUsePredict
(These estimation results are needed to solve the model.)
In that case, the estimation results would need to be in memory before calling forecast solve.
The advise option also provides an indication of how quickly forecasts can be produced from the model. Models for which forecast never needs to call predict can be solved much more quickly than models that include equations for which forecast must restore estimation results and call predict to obtain predictions.

Using saved estimation results
Stata's estimates commands allow you to save estimation results to disk so that they are available in subsequent Stata sessions. You can use the using option of forecast estimates to use estimation results saved on disk without having to first call estimates use. In fact, you can even retrieve estimation results stored on a website, as the next example demonstrates.

Example 4: Adding saved estimation results
The file klein.ster contains the estimation results produced by reg3 for the three stochastic equations of Klein's (1950) model.
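A file like klein.ster is created with estimates save after fitting the equations; a minimal sketch:

. quietly reg3 (c p L.p w) (i p L.p L.k) (wp y L.y yr), endog(w p y) exog(t wg g)
. estimates save klein             // writes klein.ster to the working directory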
That file is stored on the Stata Press website in the same location as the example datasets. Here we create a forecast model and add those results:
. use http://www.stata-press.com/data/r14/klein2
. forecast create example4, replace
(Forecast model kleinmodel ended.)
Forecast model example4 started.
. forecast estimates using http://www.stata-press.com/data/r14/klein
Added estimation results from reg3.
Forecast model example4 now contains 3 endogenous variables.
If you do not specify a file extension, forecast estimates assumes the file ends in .ster. You are more likely to save your estimation results on your computer's disk drive rather than a web server, but in either case, this example shows that you can fit equations in one session of Stata, save the results to disk, and then build your forecast model later.
The estimates save command allows you to save multiple estimation results to the same file and numbers them sequentially starting at 1. You can use the number() option of forecast estimates using to specify which set of estimation results from the specified file you wish to add to the forecast model. If you do not specify number(), forecast estimates using uses the first set of results.
When you use forecast estimates using, forecast loads the estimation results from disk and stores them in memory using a temporary name. Later, when you proceed to solve your model, forecast checks to see whether those estimation results are still in memory. If not, it will attempt to reload them from the file you had specified. You should therefore not move or rename estimation result files between the time you add them to your model and the time you solve the model.

The predict option
As we mentioned while discussing the advise option, the forecast commands include code to quickly obtain predictions from some of the most commonly used commands, while they use predict to obtain predictions from other estimation commands. When you add estimation results that require forecast to use predict, by default, forecast assumes that it can pass the option xb on to predict to obtain the appropriate predicted values. You use the predict() option of forecast estimates to specify the option that predict must use to obtain predicted values from the estimates being added.
For example, suppose you used tobit to fit an equation whose dependent variable is left-censored at zero and then stored the estimation results under the name tobitreg. When solving the model, you want to use the predicted values of the left-truncated mean, the expected value of the dependent variable conditional on its being greater than zero. Looking at the syntax for predict in [R] tobit postestimation, we see that the appropriate option we must pass to predict is e(0,.). To add this estimation result to an existing forecast model, we would therefore type
. forecast estimates tobitreg, predict(e(0,.))
Now, whenever forecast calls predict with those estimation results, it will pass the option e(0,.) so that we obtain the appropriate predictions. If you are adding results from a multiple-equation estimation command with k dependent variables, then you must specify k predict options within the predict() option, separated by spaces.

Forecasting with ARIMA models
Practitioners often use ARIMA models to forecast some of the variables in their models, and you can certainly use estimation results produced by commands such as arima with forecast.
There are just two rules to follow when using commands that use the Kalman filter to obtain predictions. First, do not specify the predict() option with forecast estimates. The forecast commands know how to handle these estimators automatically. Second, as we stated earlier, the forecast commands do not "undo" any time-series operators that may adorn the left-hand-side variables of estimation results, so you must use forecast identity to specify identities to recover the underlying variables in levels.

Example 5: An ARIMA model with first- and seasonal-differencing
wpi1.dta contains quarterly observations on the variable wpi. First, let's fit a multiplicative seasonal ARIMA model with both first- and seasonal-difference operators applied to the dependent variable and store the estimation results:
. use http://www.stata-press.com/data/r14/wpi1
. arima wpi, arima(1, 1, 1) sarima(1, 1, 1, 4)
(output omitted)
. estimates store arima
(For details on fitting seasonal ARIMA models, see [TS] arima.)
With the difference operators used here, when forecast calls predict, it will obtain predictions in terms of DS4.wpi. Using the definitions of time-series operators in [TS] tsset, we have

    DS4.wpi_t = (wpi_t − wpi_{t−4}) − (wpi_{t−1} − wpi_{t−5})

so that

    wpi_t = DS4.wpi_t + wpi_{t−4} + (wpi_{t−1} − wpi_{t−5})

Because our arima results include a dependent variable with time-series operators, we must use the names() option of forecast estimates to specify an alternative variable name. We will name ours ds4wpi. Then we can specify an identity by using the previous equation to recover our forecasts in terms of wpi. We type
. forecast create arimaexample, replace
(Forecast model example4 ended.)
Forecast model arimaexample started.
. forecast estimates arima, name(ds4wpi)
Added estimation results from arima.
Forecast model arimaexample now contains 1 endogenous variable.
. forecast identity wpi = ds4wpi + L4.wpi + (L.wpi - L5.wpi)
Forecast model arimaexample now contains 2 endogenous variables.
. forecast solve, begin(tq(1988q1))
Computing dynamic forecasts for model arimaexample.
Starting period: 1988q1
Ending period: 1990q4
Forecast prefix: f_
1988q1: .............
1988q2: ...............
1988q3: ...............
(output omitted)
1990q4: ............
Forecast 2 variables spanning 12 periods.
Because our entire forecast model consists of a single equation fit by arima, we can also call predict to obtain forecasts:
. predict a_wpi, y dynamic(tq(1988q1))
(5 missing values generated)
. list t f_wpi a_wpi in -5/l

              t      f_wpi      a_wpi
    120. 1989q4   110.2182   110.2182
    121. 1990q1   111.6782   111.6782
    122. 1990q2   112.9945   112.9945
    123. 1990q3   114.3281   114.3281
    124. 1990q4   115.5142   115.5142

Looking at the last few observations in the dataset, we see that the forecasts produced by forecast (f_wpi) match those produced by predict (a_wpi). Of course, the advantage of forecast is that we can combine multiple sets of estimation results and obtain forecasts for an entire system of equations.

Technical note
Do not add estimation results to your forecast model that you have stored after calling an estimation command with the by: prefix. The stored estimation results will contain information from only the last group on which the estimation command was executed. forecast will then use those results for all observations in the forecast horizon regardless of the value of the group variable you specified with by:.
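For example (a sketch with hypothetical variables), the stored result below describes only the last group that the by: prefix processed:

. by region, sort: regress y x
. estimates store lastonly         // e() holds results for the last region only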
References
Cameron, A. C., and P. K. Trivedi. 2010. Microeconometrics Using Stata. Rev. ed. College Station, TX: Stata Press.
Duan, N. 1983. Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78: 605–610.
Klein, L. R. 1950. Economic Fluctuations in the United States 1921–1941. New York: Wiley.

Also see
[TS] forecast — Econometric model forecasting
[R] estimates — Save and manipulate estimation results
[R] predict — Obtain predictions, residuals, etc., after estimation

Title
forecast exogenous — Declare exogenous variables

Description    Syntax    Remarks and examples    Also see

Description
forecast exogenous declares exogenous variables in the current forecast model.

Syntax
forecast exogenous varlist

Remarks and examples
For an overview of the forecast commands, see [TS] forecast. This manual entry assumes you have already read that manual entry.
forecast exogenous declares exogenous variables in your forecast model. Before you can solve your model, all the exogenous variables must be filled in with nonmissing values over the entire forecast horizon. When you use forecast solve, Stata first checks your exogenous variables and exits with an error message if any of them contains missing values for any periods being forecast. When you assemble a large model with many variables, it is easy to forget some variables and then have problems obtaining forecasts. forecast exogenous provides you with a mechanism to explicitly declare the exogenous variables in your model so that you do not forget about them.
Declaring exogenous variables with forecast exogenous is not strictly necessary, but we nevertheless strongly encourage doing so. Stata can check the exogenous variables before solving the model and issue an appropriate error message if missing values are found, whereas troubleshooting models for which forecasting failed is more difficult after the fact.

Example 1
Here we fit a simple single-equation dynamic model with two exogenous variables, x1 and x2:
. use http://www.stata-press.com/data/r14/forecastex1
. quietly regress y L.y x1 x2
. estimates store exregression
. forecast create myexample
Forecast model myexample started.
. forecast estimates exregression
Added estimation results from regress.
Forecast model myexample now contains 1 endogenous variable.
. forecast exogenous x1
Forecast model myexample now contains 1 declared exogenous variable.
. forecast exogenous x2
Forecast model myexample now contains 2 declared exogenous variables.
Instead of using forecast exogenous twice, we could have instead typed
. forecast exogenous x1 x2

Also see
[TS] forecast — Econometric model forecasting

Title
forecast identity — Add an identity to a forecast model

Description    Quick start    Syntax    Options    Remarks and examples    Stored results    Also see

Description
forecast identity adds an identity to the forecast model currently in memory. You must first create a new model using forecast create before you can add an identity with forecast identity. An identity is a nonstochastic equation that expresses an endogenous variable in the model as a function of other variables in the model. Identities often describe the behavior of endogenous variables that are based on accounting identities or adding-up conditions.
Quick start
Add an identity to the forecast that states that y3 is the sum of y1 and y2
    forecast identity y3=y1+y2
As above, and create new variable newy before adding it to the forecast
    forecast identity newy=y1+y2, generate

Syntax
forecast identity varname = exp [, options]

options     Description
generate    create new variable varname
* double    store new variable as a double instead of as a float

varname is the name of an endogenous variable to be added to the forecast model.
* You can only specify double if you also specify generate.

Options
generate specifies that the new variable varname be created equal to exp for all observations in the current dataset.
double, for use in conjunction with the generate option, requests that the new variable be created as a double instead of as a float. See [D] data types.

Remarks and examples
For an overview of the forecast commands, see [TS] forecast. This manual entry assumes you have already read that manual entry.
forecast identity specifies a nonstochastic equation that determines the value of an endogenous variable in the model. When you type
. forecast identity varname = exp
forecast identity registers varname as an endogenous variable in your forecast model that is equal to exp, where exp is a valid Stata expression that is typically a function of other endogenous variables and exogenous variables in your model and perhaps lagged values of varname as well. forecast identity was used in all the examples in [TS] forecast.

Example 1: Variables with constant growth rates
Some models contain variables that you are willing to assume will grow at a constant rate throughout the forecast horizon. For example, say we have a model using annual data and want to assume that our population variable pop grows at 0.75% per year. Then we can declare endogenous variable pop by using forecast identity:
. forecast identity pop = 1.0075*L.pop
Typically, you use forecast identity to define the relationship that determines an endogenous variable that is already in your dataset. For example, in example 1 of [TS] forecast, we used forecast identity to define total wages as the sum of government and private-sector wages, and the total wage variable already existed in our dataset.
The generate option of forecast identity is useful when you wish to use a transformation of one or more endogenous variables as a right-hand-side variable in a stochastic equation that describes another endogenous variable. For example, say you want to use regress to model variable y as a function of the ratio of two endogenous variables, u and w, as well as other covariates. Without the generate option of forecast identity, you would have to define the ratio variable x = u/w twice: first, you would have to use the generate command to create the variable before fitting your regression model, and then you would have to use forecast identity to add an identity to your forecast model to define x in terms of u and w. Assuming you have already created your forecast model, the generate option allows you to define the ratio variable just once, before you fit the regression equation, as the sketch below shows. In this example, the ratio variable is easy enough to specify twice, but it is very easy to forget to include identities that define regressors used in estimation results while building large forecast models.
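A sketch of that single-definition workflow (names hypothetical; the forecast model is assumed to already exist):

. forecast identity x = u/w, generate    // registers the identity and creates x in the dataset
. regress y x z
. estimates store yeq
. forecast estimates yeq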
In other cases, an endogenous variable may be a more complicated function of other endogenous variables, so having to specify the function only once reduces the chance for error.

Stored results
forecast identity stores the following in r():
Macros
    r(lhs)          left-hand-side (endogenous) variable
    r(rhs)          right-hand side of identity
    r(basenames)    base names of variables found on right-hand side
    r(fullnames)    full names of variables found on right-hand side

Also see
[TS] forecast — Econometric model forecasting

Title
forecast list — List forecast commands composing current model

Description    Quick start    Syntax    Options    Remarks and examples    Reference    Also see

Description
forecast list produces a list of forecast commands that compose the current model.

Quick start
List all forecast commands that compose the current model
    forecast list
Save a list of commands to replicate the current forecast model to myforecast.do
    forecast list, saving(myforecast)
As above, but save the commands as myforecast.txt
    forecast list, saving(myforecast.txt)

Syntax
forecast list [, options]

options                        Description
saving(filename[, replace])    save list of commands to file
notrim                         do not remove extraneous white space

Options
saving(filename[, replace]) requests that forecast list write the list of commands to disk with filename. If no extension is specified, .do is assumed. If filename already exists, an error is issued unless you specify replace, in which case the file is overwritten.
notrim requests that forecast list not remove any extraneous spaces and that commands be shown exactly as they were originally entered. By default, superfluous white space is removed.

Remarks and examples
For an overview of the forecast commands, see [TS] forecast. This manual entry assumes you have already read that manual entry.
forecast list produces a list of all the forecast commands you would need to enter to re-create the forecast model currently in memory. Unlike a command log, forecast list shows only the forecast-related commands, not any estimation commands or other commands you may have issued. If you specify saving(filename), forecast list saves the list as filename.do, which you can then edit using the Do-file Editor.
forecast creates models by accumulating estimation results, identities, and other features that you add to the model by using various forecast subcommands. Once you add a feature to a model, it remains a part of the model until you clear the entire model from memory. forecast list provides a list of all the forecast commands you would need to rebuild the current model.
When building all but the smallest forecast models, you will typically write a do-file to load your dataset, perhaps call some estimation commands, and issue a sequence of forecast commands to build and solve your forecast model. There are times, though, when you will type a forecast command interactively and then later want to undo the command or else wish you had not typed the command in the first place. forecast list provides the solution.
Suppose you use forecast adjust to perform some policy simulations and then decide you want to remove those adjustments from the model. forecast list makes this easy to do. You simply call forecast list with the saving() option to produce a do-file that contains all the forecast commands issued since the model was created. Then you can edit the do-file to remove the forecast adjust command, type forecast clear, and run the do-file.
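A sketch of that workflow (the filename is hypothetical):

. forecast list, saving(mymodel)   // writes mymodel.do
  * edit mymodel.do in the Do-file Editor, deleting the forecast adjust line
. forecast clear
. do mymodel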
Example 1: Klein's model
In example 1 of [TS] forecast, we obtained forecasts from Klein's (1950) macroeconomic model. If we type forecast list after typing all the commands in that example, we obtain
. forecast list
forecast create kleinmodel
forecast estimates klein
forecast identity y = c + i + g
forecast identity p = y - t - wp
forecast identity k = L.k + i
forecast identity w = wg + wp
forecast exogenous wg
forecast exogenous g
forecast exogenous t
forecast exogenous yr
The forecast solve command is not included in output produced by forecast list because solving the model does not add any features to the model.

Technical note
To prevent you from accidentally destroying the model in memory, forecast list does not add the replace option to forecast create even if you specified replace when you originally called forecast create.

Reference
Klein, L. R. 1950. Economic Fluctuations in the United States 1921–1941. New York: Wiley.

Also see
[TS] forecast — Econometric model forecasting

Title
forecast query — Check whether a forecast model has been started

Description    Syntax    Remarks and examples    Stored results    Also see

Description
forecast query issues a message indicating whether a forecast model has been started.

Syntax
forecast query

Remarks and examples
For an overview of the forecast commands, see [TS] forecast. This manual entry assumes you have already read that manual entry.
forecast query allows you to check whether a forecast model has been started. Most users of the forecast commands will not need to use forecast query. This command is most useful to programmers.
Suppose there is no forecast model in memory:
. forecast query
No forecast model exists.
Now we create a forecast model named fcmodel:
. forecast create fcmodel
Forecast model fcmodel started.
. forecast query
Forecast model fcmodel exists.

Stored results
forecast query stores the following in r():
Scalars
    r(found)    1 if model started; 0 otherwise
Macros
    r(name)     model name

Also see
[TS] forecast — Econometric model forecasting
[TS] forecast describe — Describe features of the forecast model

Title
forecast solve — Obtain static and dynamic forecasts

Description    Quick start    Syntax    Options    Remarks and examples    Stored results    Methods and formulas    References    Also see

Description
forecast solve computes static or dynamic forecasts based on the model currently in memory. Before you can solve a model, you must first create a new model using forecast create and add equations and variables to it using the commands summarized in [TS] forecast.
Quick start
Compute dynamic forecast after forecast create and forecast estimates
    forecast solve
As above, but with forecasts starting at 1990q1 and ending at 1995q3
    forecast solve, begin(tq(1990q1)) end(tq(1995q3))
As above, and change prefix of predicted endogenous variables to hat
    forecast solve, begin(tq(1990q1)) end(tq(1995q3)) prefix(hat)
As above, but forecast 11 periods starting at 1990q1
    forecast solve, begin(tq(1990q1)) prefix(hat) periods(11)
Incorporate forecast uncertainty via simulation and store point forecasts and their standard deviations in variables prefixed with d_ and sd_
    forecast solve, prefix(d_) ///
        simulate(betas, statistic(stddev, prefix(sd_)))

Syntax
forecast solve [, {prefix(stub) | suffix(stub)} options]

options                   Description
Model
* prefix(string)          specify prefix for forecast variables
* suffix(string)          specify suffix for forecast variables
  begin(time_constant)    specify period to begin forecasting
† end(time_constant)      specify period to end forecasting
† periods(#)              specify number of periods to forecast
  double                  store forecast variables as doubles instead of as floats
  static                  produce static forecasts instead of dynamic forecasts
  actuals                 use actual values if available instead of forecasts
Simulation
  simulate(sim_technique, sim_statistic sim_options)
                          specify simulation technique and options
Reporting
  log(log_level)          specify level of logging display; log_level may be detail, on, brief, or off
Solver
  vtolerance(#)           specify tolerance for forecast values
  ztolerance(#)           specify tolerance for function zero
  iterate(#)              specify maximum number of iterations
  technique(technique)    specify solution method; may be dampedgaussseidel #, gaussseidel, broydenpowell, or newtonraphson

* You can specify prefix() or suffix() but not both.
† You can specify end() or periods() but not both.

sim_technique    Description
betas            draw multivariate-normal parameter vectors
errors           draw additive errors from multivariate normal distribution
residuals        draw additive residuals based on static forecast errors

You can specify one or two sim_techniques separated by a space, though you cannot specify both errors and residuals.

sim_statistic is
    statistic(statistic, {prefix(string) | suffix(string)})
and may be repeated up to three times.

statistic    Description
mean         record the mean of the simulation forecasts
variance     record the variance of the simulation forecasts
stddev       record the standard deviation of the simulation forecasts

sim_options              Description
saving(filename, ...)    save results to file; save statistics in double precision; save results to filename every # replications
nodots                   suppress replication dots
reps(#)                  perform # replications; default is reps(50)

Options

Model

prefix(string) and suffix(string) specify a name prefix or suffix that will be used to name the variables holding the forecast values of the variables in the model. You may specify prefix() or suffix() but not both. Sometimes, it is more convenient to have all forecast variables start with the same set of characters, while other times, it is more convenient to have all forecast variables end with the same set of characters. If you specify prefix(f_), then the forecast values of endogenous variables x, y, and z will be stored in new variables f_x, f_y, and f_z. If you specify suffix(_g), then the forecast values of endogenous variables x, y, and z will be stored in new variables x_g, y_g, and z_g.
begin(time_constant) requests that forecast begin forecasting at period time_constant. By default, forecast determines when to begin forecasting automatically.
end(time_constant) requests that forecast end forecasting at period time_constant. By default, forecast produces forecasts for all periods on or after begin() in the dataset.
periods(#) specifies the number of periods after begin() to forecast. By default, forecast produces forecasts for all periods on or after begin() in the dataset.
double requests that the forecast and simulation variables be stored in double precision. The default is to use single-precision floats. See [D] data types for more information.
static requests that static forecasts be produced. Actual values of variables are used wherever lagged values of the endogenous variables appear in the model. By default, dynamic forecasts are produced, which use the forecast values of variables wherever lagged values of the endogenous variables appear in the model. Static forecasts are also called one-step-ahead forecasts.
actuals specifies how nonmissing values of endogenous variables in the forecast horizon are treated. By default, nonmissing values are ignored, and forecasts are produced for all endogenous variables. When you specify actuals, forecast sets the forecast values equal to the actual values if they are nonmissing. The forecasts for the other endogenous variables are then conditional on the known values of the endogenous variables with nonmissing data.

Simulation

simulate(sim_technique, sim_statistic sim_options) allows you to simulate your model to obtain measures of uncertainty surrounding the point forecasts produced by the model. Simulating a model involves repeatedly solving the model, each time accounting for the uncertainty associated with the error terms and the estimated coefficient vectors.
sim_technique can be betas, errors, or residuals, or you can specify both betas and one of errors or residuals separated by a space. You cannot specify both errors and residuals. The sim_technique controls how uncertainty is introduced into the model.
sim_statistic specifies a summary statistic to summarize the forecasts over all the simulations. sim_statistic takes the form
    statistic(statistic, {prefix(string) | suffix(string)})
where statistic may be mean, variance, or stddev. You may specify either the prefix or the suffix that will be used to name the variables that will contain the requested statistic. You may specify up to three sim_statistics, allowing you to track the mean, variance, and standard deviation of your forecasts.
sim_options include saving(filename[, suboptions]), nodots, and reps(#).
saving(filename[, suboptions]) creates a Stata data file (.dta file) consisting of (for each endogenous variable in the model) a variable containing the simulated values.
    double specifies that the results for each replication be saved as doubles, meaning 8-byte reals. By default, they are saved as floats, meaning 4-byte reals.
    replace specifies that filename be overwritten if it exists.
    every(#) specifies that results be written to disk every #th replication. every() should be specified only in conjunction with saving() when the command takes a long time for each replication. This will allow recovery of partial results should some other software crash your computer. See [P] postfile.
nodots suppresses display of the replication dots.
By default, one dot character is displayed for each successful replication. If during a replication convergence is not achieved, forecast solve exits with an error message.
reps(#) requests that forecast solve perform # replications; the default is reps(50).

Reporting

log(log_level) specifies the level of logging provided while solving the model. log_level may be detail, on, brief, or off.
    log(detail) provides a detailed iteration log including the current values of the convergence criteria for each period in each panel (in the case of panel data) for which the model is being solved.
    log(on), the default, provides an iteration log showing the current panel and period for which the model is being solved as well as a sequence of dots for each period indicating the number of iterations.
    log(brief), when used with a time-series dataset, is equivalent to log(on). When used with a panel dataset, log(brief) produces an iteration log showing the current panel being solved but does not show which period within the current panel is being solved.
    log(off) requests that no iteration log be produced.

Solver

vtolerance(#), ztolerance(#), and iterate(#) control when the solver of the system of equations stops. ztolerance() is ignored if either technique(dampedgaussseidel #) or technique(gaussseidel) is specified. These options are seldom used. See [M-5] solvenl().
technique(technique) specifies the technique to use to solve the system of equations. technique may be dampedgaussseidel #, gaussseidel, broydenpowell, or newtonraphson, where 0 < # < 1 specifies the amount of damping with smaller numbers indicating less damping. The default is technique(dampedgaussseidel 0.2), which works well in most situations. If you have convergence issues, first try continuing to use dampedgaussseidel # but with a larger damping factor. Techniques broydenpowell and newtonraphson usually work well, but because they require the computation of numerical derivatives, they tend to be much slower. See [M-5] solvenl().

Remarks and examples
For an overview of the forecast commands, see [TS] forecast. This manual entry assumes you have already read that manual entry.
The forecast solve command solves a forecast model in Stata. Before you can solve a model, you must first create a model using forecast create, and you must add at least one equation using forecast estimates, forecast coefvector, or forecast identity. We covered the most commonly used options of forecast solve in the examples in [TS] forecast. Here we focus on two sets of options that are available with forecast solve. First, we discuss the actuals option, which allows you to obtain forecasts conditional on prespecified values for one or more of the endogenous variables. Then we focus on performing simulations to obtain estimates of uncertainty around the point forecasts.
Remarks are presented under the following headings:
    Performing conditional forecasts
    Using simulations to measure forecast accuracy

Performing conditional forecasts
Sometimes, you already know the values of some of the endogenous variables in the forecast horizon and would like to obtain forecasts for the remaining endogenous variables conditional on those known values. Other times, you may not know the values but would nevertheless like to specify a path for some endogenous variables and see how the others would evolve conditional on that path. To accomplish these types of exercises, you can use the actuals option of forecast solve.

Example 1: Specifying alternative scenarios
gdpoil.dta contains quarterly data on the annualized growth rate of GDP and the percentage change in the quarterly average price of oil through the end of 2007. We want to explore how GDP would have evolved if the price of oil had risen 10% in each of the first three quarters of 2008 and then held steady for several years. We will use a bivariate vector autoregression (VAR) to forecast the variables gdp and oil. Results obtained from the varsoc command indicate that the Hannan–Quinn information criterion is minimized when the VAR includes two lags. First, we fit our VAR model and store the estimation results:
Example 1: Specifying alternative scenarios gdpoil.dta contains quarterly data on the annualized growth rate of GDP and the percentage change in the quarterly average price of oil through the end of 2007. We want to explore how GDP would have evolved if the price of oil had risen 10% in each of the first three quarters of 2008 and then held steady for several years. We will use a bivariate vector autoregression (VAR) to forecast the variables gdp and oil. Results obtained from the varsoc command indicate that the Hannan–Quinn information criterion is minimized when the VAR includes two lags. First, we fit our VAR model and store the estimation results: 256 forecast solve — Obtain static and dynamic forecasts . use http://www.stata-press.com/data/r14/gdpoil . var gdp oil, lags(1 2) Vector autoregression Sample: 1986q4 - 2007q4 Number of obs Log likelihood = -500.0749 AIC FPE = 559.0724 HQIC Det(Sigma_ml) = 441.7362 SBIC Equation Parms gdp oil 5 5 Coef. RMSE R-sq chi2 P>chi2 1.88516 11.8776 0.1820 0.1140 18.91318 10.93614 0.0008 0.0273 Std. Err. z P>|z| = = = = 85 12.00176 12.11735 12.28913 [95% Conf. Interval] gdp gdp L1. L2. .1498285 .3465238 .1015076 .1022446 1.48 3.39 0.140 0.001 -.0491227 .146128 .3487797 .5469196 oil L1. L2. -.0374609 .0119564 .0167968 .0164599 -2.23 0.73 0.026 0.468 -.070382 -.0203043 -.0045399 .0442172 _cons 1.519983 .4288145 3.54 0.000 .6795226 2.360444 gdp L1. L2. .8102233 1.090244 .6395579 .6442017 1.27 1.69 0.205 0.091 -.4432871 -.1723684 2.063734 2.352856 oil L1. L2. .0995271 -.1870052 .1058295 .103707 0.94 -1.80 0.347 0.071 -.1078949 -.3902672 .3069491 .0162568 _cons -4.041859 2.701785 -1.50 0.135 -9.33726 1.253543 oil . estimates store var The dataset ends in the fourth quarter of 2007, so before we can produce forecasts for 2008 and beyond, we need to extend our dataset. We can do that using the tsappend command. Here we extend our dataset three years: . tsappend, add(12) forecast solve — Obtain static and dynamic forecasts 257 Now we can create a forecast model and obtain baseline forecasts: . forecast create oilmodel Forecast model oilmodel started. . forecast estimates var Added estimation results from var. Forecast model oilmodel now contains 2 endogenous variables. . forecast solve, prefix(bl_) Computing dynamic forecasts for model oilmodel. Starting period: 2008q1 Ending period: 2010q4 Forecast prefix: bl_ 2008q1: ................. (output omitted ) 2010q4: ............ Forecast 2 variables spanning 12 periods. To see how GDP evolves if oil prices increase 10% in each of the first three quarters of 2008 and then remain flat, we need to obtain a forecast for gdp conditional on a specified path for oil. The actuals option of forecast solve will do that for us. With the actuals option, if an endogenous variable contains a nonmissing value for the period currently being forecast, forecast solve will use that value as the forecast, overriding whatever value might be produced by that variable’s underlying estimation result or identity. Then the endogenous variables with missing values will be forecast conditional on the endogenous variables that do have valid data. Here we fill in oil with our hypothesized price path: . replace oil = 10 if qdate == tq(2008q1) (1 real change made) . replace oil = 10 if qdate == tq(2008q2) (1 real change made) . replace oil = 10 if qdate == tq(2008q3) (1 real change made) . replace oil = 0 if qdate > tq(2008q3) (9 real changes made) Now we obtain forecasts conditional on our oil variable. 
We will use the prefix alt_ for these forecast variables:
. forecast solve, prefix(alt_) actuals
Computing dynamic forecasts for model oilmodel.
Starting period: 2008q1
Ending period: 2010q4
Forecast prefix: alt_
2008q1: ...............
(output omitted)
2010q4: ...........
Forecast 2 variables spanning 12 periods.
Forecasts used actual values if available.
Finally, we make a variable containing the difference between our alternative and our baseline gdp forecasts and graph it:
. generate diff_gdp = alt_gdp - bl_gdp

[Figure omitted: "Oil's Effect on GDP" — change in annualized GDP growth (vertical axis, −.4 to .1) against quarters since shock (horizontal axis, 0 to 12); note: assumes oil increases 10% for 3 quarters, then holds steady.]

Our model indicates GDP growth would be about 0.4% less in the second through fourth quarters of 2008 than it would otherwise be, but would be mostly unaffected thereafter if oil prices followed our hypothetical path. The one-quarter lag in the response of GDP is due to our using a VAR model. In our VAR model, lagged values of oil predict the current value of gdp, but the current value of oil does not.

Technical note
The previous example allowed us to demonstrate forecast solve's actuals option, but in fact measuring the economy's response to oil shocks is much more difficult than our simple VAR analysis would suggest. One obvious complication is that positive and negative oil price shocks do not have symmetric effects on the economy. In our simple model, if a 50% increase in oil prices lowers GDP by x%, then a 50% decrease in oil prices must raise GDP by x%. However, a 50% decrease in oil prices is perhaps more likely to portend weakness in the economy rather than an imminent growth spurt. See, for example, Hamilton (2003) and Kilian and Vigfusson (2013).
Another way to specify alternative scenarios for your forecasts is to use the forecast adjust command. That command is more flexible in the types of manipulations you can perform on endogenous variables but, depending on the task at hand, may involve more effort. The actuals option of forecast solve and the forecast adjust command are complementary. There is much overlap in what you can achieve; in some situations, specifying the actuals option will be easier, while in other situations, using adjustments via forecast adjust will prove to be easier.

Using simulations to measure forecast accuracy
To motivate the discussion, we will focus on the simple linear regression model. Even though forecast can handle models with many equations with equal ease, all the issues that arise can be illustrated with one equation. Suppose we have the following relationship between variables y and x:

    y_t = α + β x_t + ε_t    (1)

where ε_t is a zero-mean error term. Say we fit (1) by ordinary least squares (OLS) using observations 1, ..., T and obtain the point estimates α̂ and β̂. Assuming we have data for exogenous variable x at time T+1, we could forecast y_{T+1} as

    ŷ_{T+1} = α̂ + β̂ x_{T+1}    (2)

However, there are several factors that prevent us from guaranteeing ex ante that y_{T+1} will indeed equal ŷ_{T+1}. We must assume that (1) specifies the correct relationship between y and x. Even if that relationship held for times 1 through T, are we sure it will hold at time T+1? Uncertainty due to issues like that is inherent to the type of forecasting that the forecast commands are designed for.
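Subtracting (2) from (1) evaluated at time T+1 gives a compact way to see the two additional sources of uncertainty discussed next:

    y_{T+1} − ŷ_{T+1} = (α − α̂) + (β − β̂) x_{T+1} + ε_{T+1}

The first two terms on the right reflect sampling error in the estimated coefficients; the last term reflects the unobserved error itself.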
Here we discuss two additional sources of uncertainty that forecast solve can help you measure. First, we estimated α and β by OLS to obtain α̂ and β̂, but we must emphasize the word estimated. Our estimates are subject to sampling error. When you fit a regression using regress or any other estimation command, Stata presents not just the point estimates of the parameters but also the standard errors and confidence intervals representing the level of uncertainty surrounding those point estimates. Uncertainty surrounding the true values of α and β means that there is some level of uncertainty surrounding our predicted value ŷ_{T+1} as well.
Second, (1) states that y_t depends not just on α, β, and x_t but also on an unobserved error term ε_t. When we make our forecast using (2), we assume that the error term will equal its expected value of zero. Saying a random error has an expected value of zero is clearly not the same as saying it will be zero every time. If a positive outside shock occurs at T+1, y_{T+1} will be higher than our estimate based on (2) would lead us to believe.
Fortunately, quantifying both these sources of uncertainty is straightforward using simulation. First, we solve our model as usual, providing us with our point forecasts. To see how uncertainty surrounding our estimated parameters affects our forecasts, we can take random draws from a multivariate normal distribution whose mean is (α̂, β̂) and whose variance is the covariance matrix produced by regress. We then solve our model using these randomly drawn parameters rather than the original point estimates. If we repeat the process of drawing random parameters and solving the model many times, we can use the variance or standard deviation across replications for each time period as a measure of uncertainty.
To account for uncertainty surrounding the error term, we can also use simulation. Here, at each replication, we add a random noise term to our forecast for y_{T+1}, where we draw our random errors such that they have the same characteristics as ε_t. There are two ways we can do that. First, all the estimation commands commonly used in forecasting provide us with an estimate of the variance or standard deviation of the error term. For example, regress labels the estimated standard deviation of the error term "Root MSE" and conveniently saves it in a macro that forecast can access. If we are willing to assume that all the errors in the equations in our model are normally distributed, then we can use random-normal errors drawn with means equal to zero and variances as reported by the estimation command used to fit each equation.
Sometimes the assumption of normality is unpalatable. In those cases, an alternative is to solve the model to obtain static forecasts and then compute the sample residuals based on the observations for which we have nonmissing values of the endogenous variables. Then in our simulations, we randomly choose one of the residuals observed for that equation.
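In the simulate() option of forecast solve, these two error schemes correspond to the errors and residuals sim_techniques; a sketch (prefixes hypothetical):

. forecast solve, prefix(f_) simulate(errors, statistic(stddev, prefix(sd_)))      // random-normal errors
. forecast solve, prefix(f_) simulate(residuals, statistic(stddev, prefix(sd_)))   // draws from static-forecast residuals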
In fact, we can perform simulations that draw both random parameters and random errors to account for both sources of uncertainty at once. Example 2: Accounting for parameter uncertainty Here we revisit our Klein (1950) model from example 1 of [TS] forecast and perform simulations in which we account for uncertainty associated with the estimated parameters of the model. First, we load the dataset and set up our model: . . > . . use http://www.stata-press.com/data/r14/klein2, clear quietly reg3 (c p L.p w) (i p L.p L.k) (wp y L.y yr), endog(w p y) exog(t wg g) estimates store klein forecast create kleinmodel, replace (Forecast model oilmodel ended.) Forecast model kleinmodel started. . forecast estimates klein Added estimation results from reg3. Forecast model kleinmodel now contains 3 endogenous variables. . forecast identity y = c + i + g Forecast model kleinmodel now contains 4 endogenous variables. . forecast Forecast . forecast Forecast identity p = y model kleinmodel identity k = L.k model kleinmodel t - wp now contains 5 endogenous variables. + i now contains 6 endogenous variables. . forecast Forecast . forecast Forecast . forecast Forecast . forecast Forecast . forecast Forecast identity w = wg + wp model kleinmodel now exogenous wg model kleinmodel now exogenous g model kleinmodel now exogenous t model kleinmodel now exogenous yr model kleinmodel now contains 7 endogenous variables. contains 1 declared exogenous variable. contains 2 declared exogenous variables. contains 3 declared exogenous variables. contains 4 declared exogenous variables. Now we are ready to solve our model. We are going to begin dynamic forecasts in 1936, and we are going to perform 100 replications. We will store the point forecasts in variables prefixed with d , and we will store the standard deviations of our forecasts in variables prefixed with sd . Because the simulations involve the use of random numbers, we must remember to set the random-number seed if we want to be able to replicate our results; see [R] set seed. We type forecast solve — Obtain static and dynamic forecasts 261 . set seed 1 . forecast solve, prefix(d_) begin(1936) > simulate(betas, statistic(stddev, prefix(sd_)) reps(100)) Computing dynamic forecasts for model kleinmodel. Starting period: 1936 Ending period: 1941 Forecast prefix: d_ 1936: ............................................ 1937: .......................................... 1938: ............................................. 1939: ............................................. 1940: ............................................ 1941: .............................................. Performing simulations (100) 1 2 3 4 5 .................................................. 50 .................................................. 100 Forecast 7 variables spanning 6 periods. The key here is the simulate() option. We requested that forecast solve perform 100 simulations by taking random draws for the parameters (betas), and we requested that it record the standard deviation (stddev) of each endogenous variable in new variables that begin with sd . Next we compute the upper and lower bounds of a 95% prediction interval for our forecast of total income y: . generate d_y_up = d_y + invnormal(0.975)*sd_y (16 missing values generated) . generate d_y_dn = d_y + invnormal(0.025)*sd_y (16 missing values generated) We obtained 16 missing values after each generate because the simulation summary variables only contain nonmissing data for the periods in which forecasts were made. 
The point-forecast variables that begin with d_ in this example are filled in with the corresponding actual values of the endogenous variables for periods before the beginning of the forecast horizon; in our experience, having both the historical data and forecasts in one set of variables simplifies many tasks. Here we graph our forecast of total income along with the 95% prediction interval:

    (figure omitted: Total Income, 1935-1941; solid lines denote actual values,
    dashed lines denote forecast values; 95% confidence bands based on
    parameter uncertainty)

Our next example will use the same forecast model, but we will not need the forecast variables we just created. forecast drop makes removing those variables easy:

    . forecast drop
    (dropped 14 variables)

forecast drop drops all variables created by the previous invocation of forecast solve, including both the point-forecast variables and any variables that contain simulation results. In this case, forecast drop will remove all the variables that begin with sd_ as well as d_y, d_c, d_i, and so on. However, we are not done yet. We created the variables d_y_dn and d_y_up ourselves, and they were not part of the forecast model. Therefore, they are not removed by forecast drop, and we need to do that ourselves:

    . drop d_y_dn d_y_up

Example 3: Accounting for both parameter uncertainty and random errors

In the previous example, we measured uncertainty in our model stemming from the fact that our parameters were estimated. Here we not only simulate random draws for the parameters but also add random-normal errors to the stochastic equations. We type

    . set seed 1
    . forecast solve, prefix(d_) begin(1936)
    >     simulate(betas errors, statistic(stddev, prefix(sd_)) reps(100))
    Computing dynamic forecasts for model kleinmodel.
    Starting period: 1936
    Ending period:   1941
    Forecast prefix: d_
    1936: ............................................
    1937: ..........................................
    1938: .............................................
    1939: .............................................
    1940: ............................................
    1941: ..............................................
    Performing simulations (100)
    1 2 3 4 5
    .................................................. 50
    .................................................. 100
    Forecast 7 variables spanning 6 periods.

The only difference between this call to forecast solve and the one in the previous example is that here we specified betas errors in the simulate() option rather than just betas. Had we wanted to perform simulations involving the parameters and random draws from the pool of static-forecast residuals rather than random-normal errors, we would have specified betas residuals. After we re-create the variables containing the bounds on our prediction interval, we obtain the following graph:

    (figure omitted: Total Income, 1935-1941; solid lines denote actual values,
    dashed lines denote forecast values; 95% confidence bands based on
    parameter uncertainty and normally distributed errors)

Notice that by accounting for both parameter and additive error uncertainty, our prediction interval became much wider.
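A graph like the ones described above can be drawn from the variables created in these examples. The following is a minimal sketch, assuming the dataset's time variable is year and that d_y_dn and d_y_up have been re-created; the display options are illustrative rather than the ones used to produce the manual's figures:

    . tsline y d_y d_y_dn d_y_up if year >= 1935, lpattern(solid dash dash dash)
    >     title("Total Income") legend(off)

Here y holds the actual values, d_y the point forecasts, and d_y_dn and d_y_up the bounds of the prediction interval.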
Stored results

forecast solve stores the following in r():

Scalars
    r(first_obs)        first observation in forecast horizon
    r(last_obs)         last observation in forecast horizon (of first panel
                        if forecasting panel data)
    r(Npanels)          number of panels forecast
    r(Nvar)             number of forecast variables
    r(vtolerance)       tolerance for forecast values
    r(ztolerance)       tolerance for function zero
    r(iterate)          maximum number of iterations
    r(sim_nreps)        number of simulations
    r(damping)          damping parameter for damped Gauss–Seidel

Macros
    r(prefix)           forecast variable prefix
    r(suffix)           forecast variable suffix
    r(actuals)          actuals, if specified
    r(static)           static, if specified
    r(double)           double, if specified
    r(sim_technique)    specified sim_technique
    r(logtype)          on, off, brief, or detail

Methods and formulas

Formalizing the definition of a model provided in [TS] forecast, we represent the endogenous variables in the model as the $k \times 1$ vector $y$, and we represent the exogenous variables in the model as the $m \times 1$ vector $x$. We refer to the contemporaneous values as $y_t$ and $x_t$; for notational simplicity, we refer to lagged values as $y_{t-1}$ and $x_{t-1}$ with the implication that further lags of the variables can also be included with no loss of generality. We use $\theta$ to refer to the vector of all the estimated parameters in all the equations of the model. We use $u_t$ and $u_{t-1}$ to refer to contemporaneous and lagged error terms, respectively. The forecast commands solve models of the form

$$y_{it} = f_i(y_{-i,t}, y_{t-1}, x_t, x_{t-1}, u_t, u_{t-1}; \theta) \tag{3}$$

where $i = 1, \ldots, k$ and $y_{-i,t}$ refers to the $(k-1) \times 1$ vector of endogenous variables other than $y_i$ at time $t$. If equation $j$ is an identity, we take $u_{jt} = 0$ for all $t$; for stochastic equations, the errors correspond to the usual regression error terms. Equation (3) does not include subscripts indexing panels for notational simplicity, but the extension is obvious. A model is solvable if $k \geq 1$; $m$ may be zero.

Endogenous variables are added to the forecast model via forecast estimates, forecast identity, and forecast coefvector. Equations added via forecast estimates are always stochastic, while equations added via forecast identity are always nonstochastic. Equations added via forecast coefvector are treated as stochastic if options variance() or errorvariance() (or both) are specified and nonstochastic if neither is specified. Exogenous variables are declared using forecast exogenous, but the model may contain additional exogenous variables. For example, the right-hand side of an equation may contain exogenous variables that are not declared using forecast exogenous. Before solving the model, forecast solve determines whether the declared exogenous variables contain missing values over the forecast horizon and issues an informative error message if any do. Undeclared exogenous variables that contain missing values within the forecast horizon will cause forecast solve to exit with a less-informative error message and require the user to do more work to pinpoint the problem.

Adjustments added via forecast adjust easily fit within the framework of (3). Simply let $f_i(\cdot)$ represent the value of $y_{it}$ obtained by first evaluating the appropriate estimation result, coefficient vector, or identity and then performing the adjustments based on that intermediate result. Endogenous variables may have multiple adjustments; adjustments are made in the order in which they were specified via forecast adjust.
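For instance, an add factor that raises consumption in one year of the Klein model shown earlier might be declared as follows (a hypothetical adjustment, for illustration only; c is the model's consumption variable, and year is assumed to be the dataset's time variable):

    . forecast adjust c = c + 2 if year == 1941

With this adjustment declared, whenever the solver evaluates the equation for c in the 1941 observation, it adds 2 to the value produced by the estimation results before that value is used elsewhere in the model.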
For single-equation estimation results and coefficient vectors as well as identities, adjustments are performed right after the equation is evaluated. For multiple-equation estimation results and coefficient vectors, adjustments are made after all the equations within that set of results are evaluated. Suppose an estimation result that uses predict includes two left-hand-side variables, $y_{1t}$ and $y_{2t}$, and you have added two adjustments to $y_{1t}$ and one adjustment to $y_{2t}$. Here forecast solve first calls predict twice to obtain candidate values for $y_{1t}$ and $y_{2t}$; then it performs the two adjustments to $y_{1t}$, and finally it adjusts $y_{2t}$.

forecast solve offers four solution techniques: Gauss–Seidel, damped Gauss–Seidel, Broyden–Powell, and Newton–Raphson. The Gauss–Seidel techniques are simple iterative techniques that are often fast and typically work well, particularly when a damping factor is used. Gauss–Seidel is simply damped Gauss–Seidel without damping (a damping factor of 0). By default, damped Gauss–Seidel with a damping factor of 0.2 is used, representing a small amount of damping. As Fair (1984, 250) notes, while these techniques often work well, there is no guarantee that they will converge. The Newton–Raphson technique typically works well but is slow because it requires the use of numerical derivatives at every iteration to obtain a Jacobian matrix. The Broyden–Powell (Broyden 1970; Powell 1970) method is analogous to quasi-Newton methods used for function optimization in that an updating method is used at each iteration to update an estimate of the Jacobian matrix rather than actually recalculating it. For additional details as well as a discussion of the convergence criteria, see [M-5] solvenl().

If you do not specify the begin() option, forecast solve uses the following algorithm to select the starting time period. Suppose the time variable $t$ runs from 1 to $T$. If, at time $T$, none of the endogenous variables contains missing values, forecast solve exits with an error message: there are no periods in which the endogenous variables are not known; therefore, there are no periods where a forecast is obviously required. Otherwise, consider period $T-1$. If none of the endogenous variables contains missing values in that period, then the only period to forecast is $T$. Otherwise, work back through time to find the latest period in which all the endogenous variables contain nonmissing values and then begin forecasting in the subsequent period. In the case of panel datasets, the same algorithm is applied to each panel, and forecasts for all panels begin on the earliest period selected.

When you specify the simulate() option with sim_technique betas, forecast solve draws random vectors from the multivariate normal distribution for each estimation result individually. The mean and variance are based on the estimation result's e(b) and e(V) matrices, respectively. If the estimation result is from a multiple-equation estimator, the corresponding Stata command stores in e(b) and e(V) the full parameter vector and covariance matrix for all equations so that forecast solve's simulations will account for covariances among parameters in that estimation result's equations. However, covariances among parameters that appear in different estimation results are taken to be zero.

If you specify a coefficient vector using forecast coefvector and specify a variance matrix in the variance() option, then those coefficient vectors are simulated just like the parameter vectors from estimation results.
If you do not specify the variance() option, then the coefficient vector is assumed to be nonstochastic and therefore is not simulated.

When you specify the simulate() option with sim_technique residuals, forecast solve first obtains static forecasts from your model for all possible periods. For each endogenous variable defined by a stochastic equation, it then computes residuals as the forecast value minus the actual value for all observations with nonmissing data. At each replication and for each period in the forecast horizon, forecast solve randomly selects one element from each stochastic equation's pool of residuals before solving the model for that replication and period. Then whenever forecast solve evaluates a stochastic equation, it adds the chosen element to the predicted value for that equation. Suppose an estimation result represents a multiple-equation estimator with $m$ equations, and suppose that there are $n$ time periods for which sample residuals are available. Arrange the residuals into the $n \times m$ matrix $R$. Then when forecast solve is randomly selecting residuals for this estimation result, it will choose a random number $j$ between 1 and $n$ and select the entire $j$th row from $R$. That preserves the correlation structure among the error terms of the estimation result's equations.

If you specify a coefficient vector using forecast coefvector and specify either the variance() option or the errorvariance() option (or both), sim_technique residuals considers the equation represented by the coefficient vector to be stochastic and resamples residuals for that equation.

When you specify the simulate() option with sim_technique errors, forecast solve, for each stochastic equation, replication, and period, takes a random draw from a multivariate normal distribution with zero mean before solving the model for that replication and period. Then whenever forecast solve evaluates a stochastic equation, it adds that random draw to the predicted value for that equation. The variance of the distribution from which errors are drawn is based on the estimation results for that equation. The forecast commands look in e(rmse), e(sigma), and e(Sigma) to find the estimated variance. If you add an estimation result that does not set any of those three macros and you request sim_technique errors, forecast solve exits with an error message. Multiple-equation commands typically set e(Sigma) so that the randomly drawn errors reflect the estimated error correlation structure.

If you specify a coefficient vector using forecast coefvector and specify the errorvariance() option, sim_technique errors simulates errors for that equation. Otherwise, the equation is treated like an identity and no errors are added.

forecast solve solves panel-data models by solving for all periods in the forecast horizon for the first panel in the dataset, then the second panel, and so on. When you perform simulations with panel datasets, one replication is completed for all panels in the dataset before moving to the next replication. Simulations that include residual resampling select residuals from the pool containing residuals for all panels; forecast solve does not restrict itself to the static-forecast residuals for a single panel when simulating that panel.

References

Broyden, C. G. 1970. Recent developments in solving nonlinear algebraic systems. In Numerical Methods for Nonlinear Algebraic Equations, ed. P. Rabinowitz, 61–73. London: Gordon and Breach Science Publishers.
Fair, R. C. 1984. Specification, Estimation, and Analysis of Macroeconometric Models. Cambridge, MA: Harvard University Press.
Hamilton, J. D. 2003. What is an oil shock? Journal of Econometrics 113: 363–398.
Kilian, L., and R. J. Vigfusson. 2013. Do oil prices help forecast U.S. real GDP? The role of nonlinearities and asymmetries. Journal of Business and Economic Statistics 31: 78–93.
Klein, L. R. 1950. Economic Fluctuations in the United States 1921–1941. New York: Wiley.
Powell, M. J. D. 1970. A hybrid method for nonlinear equations. In Numerical Methods for Nonlinear Algebraic Equations, ed. P. Rabinowitz, 87–114. London: Gordon and Breach Science Publishers.

Also see
[TS] forecast — Econometric model forecasting
[TS] forecast adjust — Adjust a variable by add factoring, replacing, etc.
[TS] forecast drop — Drop forecast variables
[R] set seed — Specify random-number seed and state

Title

irf — Create and analyze IRFs, dynamic-multiplier functions, and FEVDs

Description    Quick start    Syntax    Remarks and examples    References    Also see

Description

irf creates and manipulates IRF files that contain estimates of the IRFs, dynamic-multiplier functions, and forecast-error variance decompositions (FEVDs) created after estimation by var, svar, or vec; see [TS] var, [TS] var svar, or [TS] vec.

irf creates and manipulates IRF files that contain estimates of the IRFs created after estimation by arima or arfima; see [TS] arima or [TS] arfima.

IRFs and FEVDs are described below, and the process of analyzing them is outlined. After reading this entry, please see [TS] irf create.

Quick start

Fit a VAR model
    var y1 y2 y3
Create impulse–response function myirf and IRF file myirfs.irf
    irf create myirf, set(myirfs)
Graph orthogonalized impulse–response function for dependent variables y1 and y2 given a shock to y1
    irf graph oirf, impulse(y1) response(y1 y2)
As above, but present results in a table
    irf table oirf, impulse(y1) response(y1 y2)

Note: irf commands can be used after var, svar, vec, arima, or arfima; see [TS] var, [TS] var svar, [TS] vec, [TS] arima, or [TS] arfima.

See [TS] irf add, [TS] irf cgraph, [TS] irf ctable, [TS] irf describe, [TS] irf drop, [TS] irf graph, [TS] irf ograph, [TS] irf rename, [TS] irf set, and [TS] irf table for additional Quick starts.

Syntax

    irf subcommand ... [, ...]

    subcommand    Description
    -----------------------------------------------------------------------
    create        create IRF file containing IRFs, dynamic-multiplier
                  functions, and FEVDs
    set           set the active IRF file
    graph         graph results from active file
    cgraph        combine graphs of IRFs, dynamic-multiplier functions,
                  and FEVDs
    ograph        graph overlaid IRFs, dynamic-multiplier functions,
                  and FEVDs
    table         create tables of IRFs, dynamic-multiplier functions,
                  and FEVDs from active file
    ctable        combine tables of IRFs, dynamic-multiplier functions,
                  and FEVDs
    describe      describe contents of active file
    add           add results from an IRF file to the active IRF file
    drop          drop IRF results from active file
    rename        rename IRF results within a file
    -----------------------------------------------------------------------
    IRF stands for impulse–response function; FEVD stands for forecast-error
    variance decomposition.
    irf can be used after var, svar, vec, arima, or arfima; see [TS] var,
    [TS] var svar, [TS] vec, [TS] arima, or [TS] arfima.

Remarks and examples

An IRF measures the effect of a shock to an endogenous variable on itself or on another endogenous variable; see Lütkepohl (2005, 51–63) and Hamilton (1994, 318–323) for formal definitions.
Becketti (2013) provides an approachable, gentle introduction to IRF analysis. Of the many types of IRFs, irf create estimates the five most important: simple IRFs, orthogonalized IRFs, cumulative IRFs, cumulative orthogonalized IRFs, and structural IRFs.

A dynamic-multiplier function, or transfer function, measures the impact of a unit increase in an exogenous variable on the endogenous variables over time; see Lütkepohl (2005, chap. 10) for formal definitions. irf create estimates simple and cumulative dynamic-multiplier functions after var.

The forecast-error variance decomposition (FEVD) measures the fraction of the forecast-error variance of an endogenous variable that can be attributed to orthogonalized shocks to itself or to another endogenous variable; see Lütkepohl (2005, 63–66) and Hamilton (1994, 323–324) for formal definitions. Of the many types of FEVDs, irf create estimates the two most important: Cholesky and structural.

To analyze IRFs and FEVDs in Stata, you first fit a model, then use irf create to estimate the IRFs and FEVDs and save them in a file, and finally use irf graph or any of the other irf analysis commands to examine results:

    . use http://www.stata-press.com/data/r14/lutkepohl2
    (Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
    . var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lags(1/2) dfk
      (output omitted)
    . irf create order1, step(10) set(myirf1)
    (file myirf1.irf created)
    (file myirf1.irf now active)
    (file myirf1.irf updated)
    . irf graph oirf, impulse(dln_inc) response(dln_consump)

    (figure omitted: order1, dln_inc, dln_consump; orthogonalized IRF with
    95% CI over steps 0-10; graphs by irfname, impulse variable, and
    response variable)

Multiple sets of IRFs and FEVDs can be placed in the same file, with each set of results in a file bearing a distinct name. The irf create command above created file myirf1.irf and put one set of results in it, named order1. The order1 results include estimates of the simple IRFs, orthogonalized IRFs, cumulative IRFs, cumulative orthogonalized IRFs, and Cholesky FEVDs.

IRF files are just files: they can be erased by erase, listed by dir, and copied by copy; see [D] erase, [D] dir, and [D] copy.

Below we use the same estimated var but use a different Cholesky ordering to create a second set of IRF results, which we will save as order2 in the same file, and then we will graph both results:

    . irf create order2, step(10) order(dln_inc dln_inv dln_consump)
    (file myirf1.irf updated)
    . irf graph oirf, irf(order1 order2) impulse(dln_inc) response(dln_consump)

    (figure omitted: two panels, order1 and order2, each for dln_inc,
    dln_consump; orthogonalized IRFs with 95% CIs over steps 0-10)

We have compared results for one model under two different identification schemes. We could just as well have compared results of two different models. We now use irf table to display the results tabularly:
    . irf table oirf, irf(order1 order2) impulse(dln_inc) response(dln_consump)

    Results from order1 order2

                   (1)        (1)        (1)        (2)        (2)        (2)
    step          oirf      Lower      Upper       oirf      Lower      Upper
    0          .004934    .003016    .006852    .005244    .003252    .007237
    1          .001309   -.000931    .003549    .001235   -.001011    .003482
    2          .003573    .001285    .005862     .00391    .001542    .006278
    3         -.000692   -.002333     .00095   -.000677   -.002347    .000993
    4          .000905   -.000541    .002351     .00094   -.000576    .002456
    5          .000328     -.0005    .001156    .000341   -.000518    .001201
    6          .000021   -.000675    .000717    .000042   -.000693    .000777
    7          .000154   -.000206    .000515    .000161   -.000218     .00054
    8          .000026   -.000248      .0003    .000027   -.000261    .000315
    9          .000026   -.000121    .000174     .00003   -.000125    .000184
    10         .000026   -.000061    .000113    .000027   -.000065     .00012

    95% lower and upper bounds reported
    (1) irfname = order1, impulse = dln_inc, and response = dln_consump
    (2) irfname = order2, impulse = dln_inc, and response = dln_consump

Both the table and the graph show that the two orthogonalized IRFs are essentially the same. In both functions, an increase in the orthogonalized shock to dln_inc causes a short series of increases in dln_consump that dies out after four or five periods.

References

Becketti, S. 2013. Introduction to Time Series Using Stata. College Station, TX: Stata Press.
Box-Steffensmeier, J. M., J. R. Freeman, M. P. Hitt, and J. C. W. Pevehouse. 2014. Time Series Analysis for the Social Sciences. New York: Cambridge University Press.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Lütkepohl, H. 1993. Introduction to Multiple Time Series Analysis. 2nd ed. New York: Springer.
———. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.

Also see
[TS] arfima — Autoregressive fractionally integrated moving-average models
[TS] arima — ARIMA, ARMAX, and other dynamic regression models
[TS] var — Vector autoregressive models
[TS] var svar — Structural vector autoregressive models
[TS] varbasic — Fit a simple VAR and graph IRFs or FEVDs
[TS] vec — Vector error-correction models
[TS] var intro — Introduction to vector autoregressive models
[TS] vec intro — Introduction to vector error-correction models

Title

irf add — Add results from an IRF file to the active IRF file

Description    Quick start    Menu    Syntax    Option    Remarks and examples    Also see

Description

irf add copies results from an existing IRF file on disk to the active IRF file, set by irf set; see [TS] irf set.

Quick start

Copy the IRF results myirf1 from myirfs.irf to newirf in the active IRF file
    irf add newirf = myirf1, using(myirfs)
As above, but copy all IRF results from myirfs.irf to the active file
    irf add _all, using(myirfs)

Note: irf commands can be used after var, svar, vec, arima, or arfima; see [TS] var, [TS] var svar, [TS] vec, [TS] arima, or [TS] arfima.

Menu

Statistics > Multivariate time series > Manage IRF results and files > Add IRF results

Syntax

    irf add { _all | [newname=] oldname [...] } , using(irf_filename)

Option

using(irf_filename) specifies the file from which results are to be obtained and is required. If irf_filename is specified without an extension, .irf is assumed.

Remarks and examples

If you have not read [TS] irf, please do so.

Example 1

After fitting a VAR model, we create two separate IRF files:

    . use http://www.stata-press.com/data/r14/lutkepohl2
    (Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
    . var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lags(1/2) dfk
      (output omitted)
    . irf create original, set(irf1, replace)
    (file irf1.irf created)
    (file irf1.irf now active)
    (file irf1.irf updated)
    . irf create order2, order(dln_inc dln_inv dln_consump) set(irf2, replace)
    (file irf2.irf created)
    (file irf2.irf now active)
    (file irf2.irf updated)

We copy IRF results original to the active file, giving them the name order1:

    . irf add order1 = original, using(irf1)
    (file irf2.irf updated)

Here we create new IRF results and save them in the new file irf3:

    . irf create order3, order(dln_inc dln_consump dln_inv) set(irf3, replace)
    (file irf3.irf created)
    (file irf3.irf now active)
    (file irf3.irf updated)

Now we copy all the IRF results in file irf2 into the active file:

    . irf add _all, using(irf2)
    (file irf3.irf updated)

Also see
[TS] irf — Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
[TS] var intro — Introduction to vector autoregressive models
[TS] vec intro — Introduction to vector error-correction models

Title

irf cgraph — Combined graphs of IRFs, dynamic-multiplier functions, and FEVDs

Description    Quick start    Menu    Syntax    Options    Remarks and examples    Stored results    Also see

Description

irf cgraph makes a graph or a combined graph of IRF results. A graph is drawn for specified combinations of named IRF results, impulse variables, response variables, and statistics. irf cgraph combines these graphs into one image, unless separate graphs are requested.

irf cgraph operates on the active IRF file; see [TS] irf set.

Quick start

Combine graphs of an orthogonalized IRF myirf and cumulative IRF mycirf for dependent variables y1 and y2
    irf cgraph (myirf y1 y2 oirf) (mycirf y1 y2 cirf)
As above, but suppress confidence bands and add a title
    irf cgraph (myirf y1 y2 oirf) (mycirf y1 y2 cirf), noci ///
        title("My Title")

Note: irf commands can be used after var, svar, vec, arima, or arfima; see [TS] var, [TS] var svar, [TS] vec, [TS] arima, or [TS] arfima.

Menu

Statistics > Multivariate time series > IRF and FEVD analysis > Combined graphs

Syntax

    irf cgraph (spec1) [(spec2) ... (specN)] [, options]

where (speck) is

    (irfname impulsevar responsevar stat [, spec_options])

irfname is the name of a set of IRF results in the active IRF file. impulsevar should be specified as an endogenous variable for all statistics except dm and cdm; for those, specify an exogenous variable. responsevar is an endogenous variable name. stat is one or more statistics from the list below:

    stat     Description
    ------------------------------------------------------------
    Main
    irf      impulse–response function
    oirf     orthogonalized impulse–response function
    dm       dynamic-multiplier function
    cirf     cumulative impulse–response function
    coirf    cumulative orthogonalized impulse–response function
    cdm      cumulative dynamic-multiplier function
    fevd     Cholesky forecast-error variance decomposition
    sirf     structural impulse–response function
    sfevd    structural forecast-error variance decomposition
    ------------------------------------------------------------

Notes:
1. No statistic may appear more than once.
2. If confidence intervals are included (the default), only two statistics may be included.
3. If confidence intervals are suppressed (option noci), up to four statistics may be included.
    options            Description
    ------------------------------------------------------------------
    Main
    set(filename)      make filename active
    Options
    combine_options    affect appearance of combined graph
    Y axis, X axis, Titles, Legend, Overall
    twoway_options     any options other than by() documented in
                       [G-3] twoway_options
    * spec_options     level, steps, and rendition of plots and their CIs
    individual         graph each combination individually
    ------------------------------------------------------------------
    * spec_options appear on multiple tabs in the dialog box.
    individual does not appear in the dialog box.

    spec_options               Description
    ------------------------------------------------------------------
    Main
    noci                       suppress confidence bands
    Options
    level(#)                   set confidence level; default is level(95)
    lstep(#)                   use # for first step
    ustep(#)                   use # for maximum step
    Plots
    plot#opts(line_options)    affect rendition of the line plotting the
                               # stat
    CI plots
    ci#opts(area_options)      affect rendition of the confidence interval
                               for the # stat
    ------------------------------------------------------------------
    spec_options may be specified within a graph specification, globally, or in both. When specified in a graph specification, the spec_options affect only the specification in which they are used. When supplied globally, the spec_options affect all graph specifications. When supplied in both places, options in the graph specification take precedence.

Options

Main

noci suppresses graphing the confidence interval for each statistic. noci is assumed when the model was fit by vec because no confidence intervals were estimated.

set(filename) specifies the file to be made active; see [TS] irf set. If set() is not specified, the active file is used.

Options

level(#) specifies the default confidence level, as a percentage, for confidence intervals, when they are reported. The default is level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals. The value of an overall level() can be overridden by the level() specified inside a (speck).

lstep(#) specifies the first step, or period, to be included in the graph. lstep(0) is the default.

ustep(#), # ≥ 1, specifies the maximum step, or period, to be included in the graph.

combine_options affect the appearance of the combined graph; see [G-2] graph combine.

Plots

plot1opts(cline_options), ..., plot4opts(cline_options) affect the rendition of the plotted statistics. plot1opts() affects the rendition of the first statistic; plot2opts(), the second; and so on. cline_options are as described in [G-3] cline_options.

CI plots

ci1opts(area_options) and ci2opts(area_options) affect the rendition of the confidence intervals for the first (ci1opts()) and second (ci2opts()) statistics. See [TS] irf graph for a description of this option and [G-3] area_options for the suboptions that change the look of the CI.

Y axis, X axis, Titles, Legend, Overall

twoway_options are any of the options documented in [G-3] twoway_options, excluding by(). These include options for titling the graph (see [G-3] title_options) and for saving the graph to disk (see [G-3] saving_option).

The following option is available with irf cgraph but is not shown in the dialog box:

individual specifies that each graph be displayed individually. By default, irf cgraph combines the subgraphs into one image.

Remarks and examples

If you have not read [TS] irf, please do so.

The relationship between irf cgraph and irf graph is syntactically and conceptually the same as that between irf ctable and irf table; see [TS] irf ctable for a description of the syntax.
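As a concrete illustration of that relationship, the following sketch builds a combined image by hand (hypothetical IRF results myirf and variables y1 and y2; name() and graph combine are standard Stata graphing tools):

    . irf graph oirf, irf(myirf) impulse(y1) response(y2) name(g1)
    . irf graph cirf, irf(myirf) impulse(y1) response(y2) name(g2)
    . graph combine g1 g2

The single command irf cgraph (myirf y1 y2 oirf) (myirf y1 y2 cirf) produces one combined image in the same spirit.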
irf cgraph is much the same as using irf graph to make individual graphs and then using graph combine to put them together. If you cannot use irf cgraph to do what you want, consider the other approach.

Example 1

You have previously issued the commands:

    . use http://www.stata-press.com/data/r14/lutkepohl2
    . mat a = (., 0, 0\0,.,0\.,.,.)
    . mat b = I(3)
    . svar dln_inv dln_inc dln_consump, aeq(a) beq(b)
    . irf create modela, set(results3) step(8)
    . svar dln_inc dln_inv dln_consump, aeq(a) beq(b)
    . irf create modelb, step(8)

You now type

    . irf cgraph (modela dln_inc dln_consump oirf sirf)
    >     (modelb dln_inc dln_consump oirf sirf)
    >     (modela dln_inc dln_consump fevd sfevd, lstep(1))
    >     (modelb dln_inc dln_consump fevd sfevd, lstep(1)),
    >     title("Results from modela and modelb")

    (figure omitted: combined graph titled "Results from modela and modelb"
    with four panels: modela: dln_inc -> dln_consump and
    modelb: dln_inc -> dln_consump showing oirf and sirf with 95% CIs over
    steps 0-8, and the same two panels showing fevd and sfevd with 95% CIs
    over steps 1-8)

Stored results

irf cgraph stores the following in r():

Scalars
    r(k)             number of specific graph commands

Macros
    r(individual)    individual, if specified
    r(save)          filename, replace from saving() option for combined graph
    r(name)          name, replace from name() option for combined graph
    r(title)         title of the combined graph
    r(save#)         filename, replace from saving() option for individual graphs
    r(name#)         name, replace from name() option for individual graphs
    r(title#)        title for the #th graph
    r(ci#)           level applied to the #th confidence interval or noci
    r(response#)     response specified in the #th command
    r(impulse#)      impulse specified in the #th command
    r(irfname#)      IRF name specified in the #th command
    r(stats#)        statistics specified in the #th command

Also see
[TS] irf — Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
[TS] var intro — Introduction to vector autoregressive models
[TS] vec intro — Introduction to vector error-correction models

Title

irf create — Obtain IRFs, dynamic-multiplier functions, and FEVDs

Description    Quick start    Menu    Syntax    Options    Remarks and examples    Methods and formulas    References    Also see

Description

irf create estimates multiple sets of impulse–response functions (IRFs), dynamic-multiplier functions, and forecast-error variance decompositions (FEVDs). All of these estimates and their standard errors are known collectively as IRF results and are saved in an IRF file under a specified filename. Once you have created a set of IRF results, you can use the other irf commands to analyze them.

Quick start

Create impulse–response function myirf with 8 forecast periods in the active IRF file
    irf create myirf
As above, and use IRF file myirfs.irf
    irf create myirf, set(myirfs)
As above, but compute the IRF for 12 periods
    irf create myirf, set(myirfs) step(12)

Note: irf can be used after var, svar, vec, arima, or arfima; see [TS] var, [TS] var svar, [TS] vec, [TS] arima, or [TS] arfima.
Menu

Statistics > Multivariate time series > IRF and FEVD analysis > Obtain IRFs, dynamic-multiplier functions, and FEVDs

Syntax

After var

    irf create irfname [, var_options]

After svar

    irf create irfname [, svar_options]

After vec

    irf create irfname [, vec_options]

After arima

    irf create irfname [, arima_options]

After arfima

    irf create irfname [, arfima_options]

irfname is any valid name that does not exceed 15 characters.

    var_options                     Description
    ---------------------------------------------------------------------
    Main
    set(filename[, replace])        make filename active
    replace                         replace irfname if it already exists
    step(#)                         set forecast horizon to #; default is
                                    step(8)
    order(varlist)                  specify Cholesky ordering of endogenous
                                    variables
    estimates(estname)              use previously stored results estname;
                                    default is to use active results
    Std. errors
    nose                            do not calculate standard errors
    bs                              obtain standard errors from bootstrapped
                                    residuals
    bsp                             obtain standard errors from parametric
                                    bootstrap
    nodots                          do not display "." for each bootstrap
                                    replication
    reps(#)                         use # bootstrap replications; default is
                                    reps(200)
    bsaving(filename[, replace])    save bootstrap results in filename
    ---------------------------------------------------------------------

    svar_options                    Description
    ---------------------------------------------------------------------
    Main
    set(filename[, replace])        make filename active
    replace                         replace irfname if it already exists
    step(#)                         set forecast horizon to #; default is
                                    step(8)
    estimates(estname)              use previously stored results estname;
                                    default is to use active results
    Std. errors
    nose                            do not calculate standard errors
    bs                              obtain standard errors from bootstrapped
                                    residuals
    bsp                             obtain standard errors from parametric
                                    bootstrap
    nodots                          do not display "." for each bootstrap
                                    replication
    reps(#)                         use # bootstrap replications; default is
                                    reps(200)
    bsaving(filename[, replace])    save bootstrap results in filename
    ---------------------------------------------------------------------

    vec_options                     Description
    ---------------------------------------------------------------------
    Main
    set(filename[, replace])        make filename active
    replace                         replace irfname if it already exists
    step(#)                         set forecast horizon to #; default is
                                    step(8)
    estimates(estname)              use previously stored results estname;
                                    default is to use active results
    ---------------------------------------------------------------------

    arima_options                   Description
    ---------------------------------------------------------------------
    Main
    set(filename[, replace])        make filename active
    replace                         replace irfname if it already exists
    step(#)                         set forecast horizon to #; default is
                                    step(8)
    estimates(estname)              use previously stored results estname;
                                    default is to use active results
    Std. errors
    nose                            do not calculate standard errors
    ---------------------------------------------------------------------

    arfima_options                  Description
    ---------------------------------------------------------------------
    Main
    set(filename[, replace])        make filename active
    replace                         replace irfname if it already exists
    step(#)                         set forecast horizon to #; default is
                                    step(8)
    smemory                         calculate short-memory IRFs
    estimates(estname)              use previously stored results estname;
                                    default is to use active results
    Std. errors
    nose                            do not calculate standard errors
    ---------------------------------------------------------------------

The default is to use asymptotic standard errors if no options are specified. irf create is for use after fitting a model with the var, svar, vec, arima, or arfima command; see [TS] var, [TS] var svar, [TS] vec, [TS] arima, or [TS] arfima. You must tsset your data before using var, svar, vec, arima, or arfima and, hence, before using irf create; see [TS] tsset.

Options

Main

set(filename[, replace]) specifies the IRF file to be used. If set() is not specified, the active IRF file is used; see [TS] irf set. If set() is specified, the specified file becomes the active file, just as if you had issued an irf set command.
replace specifies that the results saved under irfname may be replaced, if they already exist. IRF results are saved in files, and one file may contain multiple IRF results.

step(#) specifies the step (forecast) horizon; the default is eight periods.

order(varlist) is allowed only after estimation by var; it specifies the Cholesky ordering of the endogenous variables to be used when estimating the orthogonalized IRFs. By default, the order in which the variables were originally specified on the var command is used.

smemory is allowed only after estimation by arfima; it specifies that the IRFs are calculated based on a short-memory model with the fractional difference parameter d set to zero.

estimates(estname) specifies that estimation results previously estimated by var, svar, or vec, and stored by estimates, be used. This option is rarely specified; see [R] estimates.

Std. errors

nose, bs, and bsp are alternatives that specify how (whether) standard errors are to be calculated. If none of these options is specified, asymptotic standard errors are calculated, except in two cases: after estimation by vec and after estimation by svar in which long-run constraints were applied. In those two cases, the default is as if nose were specified, although in the second case, you could specify bs or bsp. After estimation by vec, standard errors are simply not available.

nose specifies that no standard errors be calculated.

bs specifies that standard errors be calculated by bootstrapping the residuals. bs may not be specified if there are gaps in the data.

bsp specifies that standard errors be calculated via a multivariate-normal parametric bootstrap. bsp may not be specified if there are gaps in the data.

nodots, reps(#), and bsaving(filename[, replace]) are relevant only if bs or bsp is specified.

nodots specifies that dots not be displayed each time irf create performs a bootstrap replication.

reps(#), # > 50, specifies the number of bootstrap replications to be performed. reps(200) is the default.

bsaving(filename[, replace]) specifies that file filename be created and that the bootstrap replications be saved in it. New file filename is just a .dta dataset that can be loaded later using use; see [D] use. If filename is specified without an extension, .dta is assumed.

Remarks and examples

If you have not read [TS] irf, please do so. An introductory example using IRFs is presented there.

irf create estimates several types of IRFs, dynamic-multiplier functions, and FEVDs.
Which estimates are saved depends on the estimation method previously used to fit the model, as summarized in the table below:

                                            Estimation command
    Saves                             arima   arfima   var   svar   vec
    -------------------------------------------------------------------
    simple IRFs                         x       x       x      x     x
    orthogonalized IRFs                 x       x       x      x     x
    dynamic multipliers                                 x
    cumulative IRFs                     x       x       x      x     x
    cumulative orthogonalized IRFs      x       x       x      x     x
    cumulative dynamic multipliers                      x
    structural IRFs                     x       x              x
    Cholesky FEVDs                                      x      x     x
    structural FEVDs                                           x
    -------------------------------------------------------------------

Remarks are presented under the following headings:

    Introductory examples
    Technical aspects of IRF files
    IRFs and FEVDs
        IRF results for VARs
            An introduction to impulse–response functions for VARs
            An introduction to dynamic-multiplier functions for VARs
            An introduction to forecast-error variance decompositions for VARs
        IRF results for VECMs
            An introduction to impulse–response functions for VECMs
            An introduction to forecast-error variance decompositions for VECMs
        IRF results for ARIMA and ARFIMA

Introductory examples

Example 1: After var

Below we compare bootstrap and asymptotic standard errors for a specific FEVD. We begin by fitting a VAR(2) model to the Lütkepohl data (we use the var command). We next use the irf create command twice, first to create results with asymptotic standard errors (saved under the name asymp) and then to re-create the same results, this time with bootstrap standard errors (saved under the name bs). Because bootstrapping is a random process, we set the random-number seed (set seed 123456) before using irf create the second time; this makes our results reproducible. Finally, we compare results by using the IRF analysis command irf ctable.

    . use http://www.stata-press.com/data/r14/lutkepohl2
    (Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
    . var dln_inv dln_inc dln_consump if qtr>=tq(1961q2) & qtr<=tq(1978q4),
    >     lags(1/2)
      (output omitted)
    . irf create asymp, step(8) set(results1)
    (file results1.irf created)
    (file results1.irf now active)
    (file results1.irf updated)
    . set seed 123456
    . irf create bs, step(8) bs reps(250) nodots
    (file results1.irf updated)
    . irf ctable (asymp dln_inc dln_consump fevd) (bs dln_inc dln_consump fevd),
    >     noci stderror

                  (1)       (1)       (2)       (2)
    step         fevd      S.E.      fevd      S.E.
    0               0         0         0         0
    1         .282135   .087373   .282135   .102756
    2         .278777   .083782   .278777   .098161
    3          .33855   .090006    .33855    .10586
    4         .339942   .089207   .339942   .104191
    5         .342813   .090494   .342813   .105351
    6         .343119   .090517   .343119   .105258
    7         .343079   .090499   .343079   .105266
    8          .34315   .090569    .34315   .105303

    (1) irfname = asymp, impulse = dln_inc, and response = dln_consump
    (2) irfname = bs, impulse = dln_inc, and response = dln_consump

Point estimates are, of course, the same. The bootstrap estimates of the standard errors, however, are larger than the asymptotic estimates, which suggests that the sample size of 71 is not large enough for the distribution of the estimator of the FEVD to be well approximated by the asymptotic distribution. Here we would expect the bootstrap confidence interval to be more reliable than the confidence interval that is based on the asymptotic standard error.

Technical note

The details of the bootstrap algorithms are given in Methods and formulas. These algorithms are conditional on the first p observations, where p is the order of the fitted VAR. (In an SVAR model, p is the order of the VAR that underlies the SVAR.)
The bootstrapped estimates are conditional on the first $p$ observations, just as the estimators of the coefficients in VAR models are conditional on the first $p$ observations. With bootstrap standard errors (option bs), the $p$ initial observations are used with resampling the residuals to produce the bootstrap samples used for estimation. With the more parametric bootstrap (option bsp), the $p$ initial observations are used with draws from a multivariate normal distribution with variance–covariance matrix $\hat{\Sigma}$ to generate the bootstrap samples.

Technical note

For var and svar e() results, irf uses $\hat{\Sigma}$, the estimated variance matrix of the disturbances, in computing the asymptotic standard errors of all the functions. The point estimates of the orthogonalized impulse–response functions, the structural impulse–response functions, and all the variance decompositions also depend on $\hat{\Sigma}$. As discussed in [TS] var, var and svar use the ML estimator of this matrix by default, but they have option dfk, which will instead use an estimator that includes a small-sample correction. Specifying dfk when the model is fit (that is, when the var or svar command is given) changes the estimate of $\hat{\Sigma}$ and will change the IRF results that depend on it.

Example 2: After var with exogenous variables

After fitting a VAR, irf create computes estimates of the dynamic multipliers, which describe the impact of a unit change in an exogenous variable on each endogenous variable. For instance, below we estimate and report the cumulative dynamic multipliers from a model in which changes in investment are exogenous. The results indicate that both of the cumulative dynamic multipliers are significant.

    . var dln_inc dln_consump if qtr>=tq(1961q2) & qtr<=tq(1978q4), lags(1/2)
    >     exog(L(0/2).dln_inv)
      (output omitted)
    . irf create dm, step(8)
    (file results1.irf updated)
    . irf table cdm, impulse(dln_inv) irf(dm)

    Results from dm

                  (1)        (1)        (1)
    step          cdm      Lower      Upper
    0         .032164   -.027215    .091544
    1         .096568    .003479    .189656
    2         .140107    .022897    .257317
    3         .150527    .032116    .268938
    4         .148979    .031939     .26602
    5         .151247    .033011    .269482
    6         .150267    .033202    .267331
    7         .150336    .032858    .267813
    8         .150525    .033103    .267948

                  (2)        (2)        (2)
    step          cdm      Lower      Upper
    0         .058681    .012529    .104832
    1         .062723   -.005058    .130504
    2         .126167    .032497    .219837
    3         .136583    .038691    .234476
    4         .146482     .04442    .248543
    5         .146075    .045201     .24695
    6         .145542    .044988    .246096
    7         .146309    .045315    .247304
    8         .145786    .045206    .246365

    95% lower and upper bounds reported
    (1) irfname = dm, impulse = dln_inv, and response = dln_inc
    (2) irfname = dm, impulse = dln_inv, and response = dln_consump

Example 3: After vec

Although all IRFs and orthogonalized IRFs (OIRFs) from models with stationary variables will taper off to zero, some of the IRFs and OIRFs from models with first-difference stationary variables will not. This is the key difference between IRFs and OIRFs from systems of stationary variables fit by var or svar and those obtained from systems of first-difference stationary variables fit by vec. When the effect of the innovations dies out over time, the shocks are said to be transitory. In contrast, when the effect does not taper off, shocks are said to be permanent.

In this example, we look at the OIRF from one of the VECMs fit to the unemployment-rate data analyzed in example 2 of [TS] vec.
We see that an orthogonalized shock to Indiana has a permanent effect on the unemployment rate in Missouri:

    . use http://www.stata-press.com/data/r14/urates
    . vec missouri indiana kentucky illinois, trend(rconstant) rank(2) lags(4)
      (output omitted)
    . irf create vec1, set(vecirfs) step(50)
    (file vecirfs.irf created)
    (file vecirfs.irf now active)
    (file vecirfs.irf updated)

Now we can use irf graph to graph the OIRF of interest:

    . irf graph oirf, impulse(indiana) response(missouri)

    (figure omitted: vec1, indiana, missouri; orthogonalized IRF over steps
    0-50; graphs by irfname, impulse variable, and response variable)

The graph shows that the estimated OIRF converges to a positive asymptote, which indicates that an orthogonalized innovation to the unemployment rate in Indiana has a permanent effect on the unemployment rate in Missouri.

Technical aspects of IRF files

This section is included for programmers wishing to extend the irf system.

irf create estimates a series of impulse–response functions and their standard errors. Although these estimates are saved in an IRF file, most users will never need to look at the contents of this file. The IRF commands fill in, analyze, present, and manage IRF results.

IRF files are just Stata datasets that have names ending in .irf instead of .dta. The dataset in the file has a nested panel structure.

Variable irfname contains the irfname specified by the user. Variable impulse records the name of the endogenous variable whose innovations are the impulse. Variable response records the name of the endogenous variable that is responding to the innovations. In a model with $K$ endogenous variables, there are $K^2$ combinations of impulse and response. Variable step records the periods for which these estimates were computed.

Below is a catalog of the statistics that irf create estimates and the variable names under which they are saved in the IRF file.

    Statistic                                                   Name
    ------------------------------------------------------------------
    impulse–response functions                                  irf
    orthogonalized impulse–response functions                   oirf
    dynamic-multiplier functions                                dm
    cumulative impulse–response functions                       cirf
    cumulative orthogonalized impulse–response functions        coirf
    cumulative dynamic-multiplier functions                     cdm
    Cholesky forecast-error decomposition                       fevd
    structural impulse–response functions                       sirf
    structural forecast-error decomposition                     sfevd
    standard error of the impulse–response functions            stdirf
    standard error of the orthogonalized impulse–response
      functions                                                 stdoirf
    standard error of the cumulative impulse–response
      functions                                                 stdcirf
    standard error of the cumulative orthogonalized
      impulse–response functions                                stdcoirf
    standard error of the Cholesky forecast-error
      decomposition                                             stdfevd
    standard error of the structural impulse–response
      functions                                                 stdsirf
    standard error of the structural forecast-error
      decomposition                                             stdsfevd
    ------------------------------------------------------------------

In addition to the variables, information is stored in _dta characteristics. Much of the following information is also available in r() after irf describe, where it is often more convenient to obtain the information. Characteristic _dta[version] contains the version number of the IRF file, which is currently 1.1. Characteristic _dta[irfnames] contains a list of all the irfnames in the IRF file.
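To see this structure for yourself, you can treat an IRF file as an ordinary dataset; a quick sketch using the file created in the examples above:

    . use myirf1.irf, clear
    . describe
    . char list _dta[]

describe shows the irfname, impulse, response, and step variables along with the statistic variables cataloged above, and char list displays the _dta characteristics, including those discussed next.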
For each irfname, there are a series of additional characteristics:

    Name                      Contents
    -----------------------------------------------------------------------
    _dta[irfname_model]       var, sr_var, lr_var, vec, arima, or arfima
    _dta[irfname_order]       Cholesky order used in IRF estimates
    _dta[irfname_exog]        exogenous variables, and their lags, in VAR
    _dta[irfname_exogvars]    exogenous variables in VAR
    _dta[irfname_constant]    constant or noconstant, depending on whether
                              noconstant was specified in var or svar
    _dta[irfname_lags]        lags in model
    _dta[irfname_exlags]      lags of exogenous variables in model
    _dta[irfname_tmin]        minimum value of timevar in the estimation
                              sample
    _dta[irfname_tmax]        maximum value of timevar in the estimation
                              sample
    _dta[irfname_timevar]     name of tsset timevar
    _dta[irfname_tsfmt]       format of timevar
    _dta[irfname_varcns]      constrained or colon-separated list of
                              constraints placed on VAR coefficients
    _dta[irfname_svarcns]     constrained or colon-separated list of
                              constraints placed on SVAR coefficients
    _dta[irfname_step]        maximum step in IRF estimates
    _dta[irfname_stderror]    asymptotic, bs, bsp, or none, depending on the
                              type of standard errors requested
    _dta[irfname_reps]        number of bootstrap replications performed
    _dta[irfname_version]     version of the IRF file that originally held
                              irfname IRF results
    _dta[irfname_rank]        number of cointegrating equations
    _dta[irfname_trend]       trend() specified in vec
    _dta[irfname_veccns]      constraints placed on VECM parameters
    _dta[irfname_sind]        normalized seasonal indicators included in vec
    _dta[irfname_d]           fractional difference parameter d in arfima
                              IRFs
    -----------------------------------------------------------------------

IRFs and FEVDs

irf create can estimate several types of IRFs and FEVDs for VARs and VECMs. irf create can also estimate IRFs and cumulative IRFs for ARIMA and ARFIMA models. We first discuss IRF results for VAR and SVAR models, and then we discuss them in the context of VECMs. Because the cointegrating VECM is an extension of the stationary VAR framework, the section that discusses the IRF results for VECMs draws on the earlier VAR material. We conclude our discussion with IRF results for ARIMA and ARFIMA models.

IRF results for VARs

An introduction to impulse–response functions for VARs

A pth-order vector autoregressive model (VAR) with exogenous variables is given by

$$y_t = v + A_1 y_{t-1} + \cdots + A_p y_{t-p} + B x_t + u_t$$

where $y_t = (y_{1t}, \ldots, y_{Kt})'$ is a $K \times 1$ random vector, the $A_i$ are fixed $K \times K$ matrices of parameters, $x_t$ is an $R \times 1$ vector of exogenous variables, $B$ is a $K \times R$ matrix of coefficients, $v$ is a $K \times 1$ vector of fixed parameters, and $u_t$ is assumed to be white noise; that is,

$$E(u_t) = 0, \qquad E(u_t u_t') = \Sigma, \qquad E(u_t u_s') = 0 \text{ for } t \neq s$$

As discussed in [TS] varstable, a VAR can be rewritten in moving-average form only if it is stable. Any exogenous variables are assumed to be covariance stationary. Because the functions of interest in this section depend only on the exogenous variables through their effect on the estimated $A_i$, we can simplify the notation by dropping them from the analysis. All the formulas given below still apply, although the $A_i$ are estimated jointly with $B$ on the exogenous variables.

Below we discuss conditions under which the IRFs and forecast-error variance decompositions have a causal interpretation. Although estimation requires only that the exogenous variables be predetermined, that is, that $E(x_{jt} u_{it}) = 0$ for all $i$, $j$, and $t$, assigning a causal interpretation to IRFs and FEVDs requires that the exogenous variables be strictly exogenous, that is, that $E(x_{js} u_{it}) = 0$ for all $i$, $j$, $s$, and $t$.
IRFs describe how the innovations to one variable affect another variable after a given number of periods. For an example of how IRFs are interpreted, see Stock and Watson (2001). They use IRFs to investigate the effect of surprise shocks to the Federal Funds rate on inflation and unemployment. In another example, Christiano, Eichenbaum, and Evans (1999) use IRFs to investigate how shocks to monetary policy affect other macroeconomic variables.

Consider a VAR without exogenous variables:

$$y_t = v + A_1 y_{t-1} + \cdots + A_p y_{t-p} + u_t \tag{1}$$

The VAR represents the variables in $y_t$ as functions of its own lags and serially uncorrelated innovations $u_t$. All the information about contemporaneous correlations among the $K$ variables in $y_t$ is contained in $\Sigma$. In fact, as discussed in [TS] var svar, a VAR can be viewed as the reduced form of a dynamic simultaneous-equation model.

To see how the innovations affect the variables in $y_t$ after, say, $i$ periods, rewrite the model in its moving-average form

$$y_t = \mu + \sum_{i=0}^{\infty} \Phi_i u_{t-i} \tag{2}$$

where $\mu$ is the $K \times 1$ time-invariant mean of $y_t$, and

$$\Phi_i = \begin{cases} I_K & \text{if } i = 0 \\ \sum_{j=1}^{i} \Phi_{i-j} A_j & \text{if } i = 1, 2, \ldots \end{cases}$$

We can rewrite a VAR in the moving-average form only if it is stable. Essentially, a VAR is stable if the variables are covariance stationary and none of the autocorrelations are too high (the issue of stability is discussed in greater detail in [TS] varstable).

The $\Phi_i$ are the simple IRFs. The $j,k$ element of $\Phi_i$ gives the effect of a one-time unit increase in the $k$th element of $u_t$ on the $j$th element of $y_t$ after $i$ periods, holding everything else constant. Unfortunately, these effects have no causal interpretation, which would require us to be able to answer the question, "How does an innovation to variable $k$, holding everything else constant, affect variable $j$ after $i$ periods?" Because the $u_t$ are contemporaneously correlated, we cannot assume that everything else is held constant. Contemporaneous correlation among the $u_t$ implies that a shock to one variable is likely to be accompanied by shocks to some of the other variables, so it does not make sense to shock one variable and hold everything else constant. For this reason, (2) cannot provide a causal interpretation.

This shortcoming may be overcome by rewriting (2) in terms of mutually uncorrelated innovations. Suppose that we had a matrix $P$ such that $\Sigma = PP'$. If we had such a $P$, then $P^{-1}\Sigma (P^{-1})' = I_K$, and

$$E\{P^{-1} u_t (P^{-1} u_t)'\} = P^{-1} E(u_t u_t') (P^{-1})' = P^{-1} \Sigma (P^{-1})' = I_K$$

We can thus use $P^{-1}$ to orthogonalize the $u_t$ and rewrite (2) as

$$y_t = \mu + \sum_{i=0}^{\infty} \Phi_i P P^{-1} u_{t-i} = \mu + \sum_{i=0}^{\infty} \Theta_i P^{-1} u_{t-i} = \mu + \sum_{i=0}^{\infty} \Theta_i w_{t-i}$$

where $\Theta_i = \Phi_i P$ and $w_t = P^{-1} u_t$. If we had such a $P$, the components of $w_t$ would be mutually orthogonal, and no information would be lost in the holding-everything-else-constant assumption, implying that the $\Theta_i$ would have the causal interpretation that we seek.

Choosing a $P$ is similar to placing identification restrictions on a system of dynamic simultaneous equations. The simple IRFs do not identify the causal relationships that we wish to analyze. Thus we seek at least as many identification restrictions as necessary to identify the causal IRFs.

So, where do we get such a $P$? Sims (1980) popularized the method of choosing $P$ to be the Cholesky decomposition of $\hat{\Sigma}$. The IRFs based on this choice of $P$ are known as the orthogonalized IRFs.
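In Stata terms, such a $P$ can be computed from an estimated covariance matrix with the cholesky() matrix function; the following quick check, using a hypothetical 2 x 2 covariance matrix, verifies the defining property $\Sigma = PP'$:

    . matrix sigma = (4, 1 \ 1, 2)     // hypothetical covariance matrix
    . matrix P = cholesky(sigma)       // lower-triangular P with sigma = P*P'
    . matrix PPt = P*P'
    . matrix list PPt                  // reproduces sigma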
Choosing P to be the Cholesky decomposition of Σ̂ is equivalent to imposing a recursive structure for the corresponding dynamic structural equation model. The ordering of the recursive structure is the same as the ordering imposed in the Cholesky decomposition. Because this choice is arbitrary, some researchers will look at the OIRFs with different orderings assumed in the Cholesky decomposition. The order() option available with irf create facilitates this type of analysis.

The SVAR approach integrates the need to identify the causal IRFs into the model specification and estimation process. Sufficient identification restrictions can be obtained by placing either short-run or long-run restrictions on the model. The VAR in (1) can be rewritten as

    y_t − v − A_1 y_{t-1} − ... − A_p y_{t-p} = u_t

Similarly, a short-run SVAR model can be written as

    A(y_t − v − A_1 y_{t-1} − ... − A_p y_{t-p}) = A u_t = B e_t        (3)

where A and B are K x K nonsingular matrices of parameters to be estimated, e_t is a K x 1 vector of disturbances with e_t ~ N(0, I_K), and E(e_t e_s') = 0_K for all s ≠ t. Sufficient constraints must be placed on A and B so that P is identified. One way to see the connection is to draw out the implications of the latter equality in (3). From (3) it can be shown that

    Σ = A^{-1} B (A^{-1} B)'

As discussed in [TS] var svar, the estimates Â and B̂ are obtained by maximizing the concentrated log-likelihood function on the basis of the Σ̂ obtained from the underlying VAR. The short-run SVAR approach chooses P̂ = Â^{-1}B̂ to identify the causal IRFs. The long-run SVAR approach works similarly, with P̂ = Ĉ, where Ĉ = Ā^{-1}B̂ and Ā^{-1} is the matrix of estimated long-run or accumulated effects of the reduced-form VAR shocks.

There is one important difference between long-run and short-run SVAR models. As discussed by Amisano and Giannini (1997, chap. 6), in the short-run model the constraints are applied directly to the parameters in A and B. Then A and B interact with the estimated parameters of the underlying VAR. In contrast, in a long-run model, the constraints are placed on functions of the estimated VAR parameters. Although estimation and inference of the parameters in C is straightforward, obtaining the asymptotic standard errors of the structural IRFs requires untenable assumptions. For this reason, irf create does not estimate the asymptotic standard errors of the structural IRFs generated by long-run SVAR models. However, bootstrap standard errors are still available.

An introduction to dynamic-multiplier functions for VARs

A dynamic-multiplier function measures the effect of a unit change in an exogenous variable on the endogenous variables over time. Per Lütkepohl (2005, chap. 10), if the VAR with exogenous variables is stable, it can be rewritten as

    y_t = Σ_{i=0}^∞ D_i x_{t-i} + Σ_{i=0}^∞ Φ_i u_{t-i}

where the D_i are the dynamic-multiplier functions. (See Methods and formulas for details.) Some authors refer to the dynamic-multiplier functions as transfer functions because they specify how a unit change in an exogenous variable is “transferred” to the endogenous variables.

Technical note

irf create computes dynamic-multiplier functions only after var. After short-run SVAR models, the dynamic multipliers from the VAR are the same as those from the SVAR. The dynamic multipliers for long-run SVARs have not yet been worked out.
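As a concrete illustration of the short-run approach, the sketch below imposes one recursive zero restriction on A, fixes B at the identity, and then creates the structural IRFs. The variable names y1 and y2 and the IRF file name mysvar are assumptions; aeq(), beq(), set(), and step() are the documented options. Missing values (.) in A mark free parameters, and the 0 is the identifying restriction.

    . matrix A = (., 0 \ ., .)
    . matrix B = I(2)
    . svar y1 y2, aeq(A) beq(B)
    . irf create mysr, set(mysvar) step(8)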
An introduction to forecast-error variance decompositions for VARs

Another measure of the effect of the innovations in variable k on variable j is the FEVD. This method, which is also known as innovation accounting, measures the fraction of the error in forecasting variable j after h periods that is attributable to the orthogonalized innovations in variable k. Because deriving the FEVD requires orthogonalizing the u_t innovations, the FEVD is always predicated upon a choice of P.

Lütkepohl (2005, sec. 2.2.2) shows that the h-step forecast error can be written as

    y_{t+h} − ŷ_t(h) = Σ_{i=0}^{h-1} Φ_i u_{t+h-i}        (4)

where y_{t+h} is the value observed at time t + h and ŷ_t(h) is the h-step-ahead predicted value for y_{t+h} that was made at time t.

Because the u_t are contemporaneously correlated, their distinct contributions to the forecast error cannot be ascertained. However, if we choose a P such that Σ = PP', as above, we can orthogonalize the u_t into w_t = P^{-1}u_t. We can then ascertain the relative contribution of the distinct elements of w_t. Thus we can rewrite (4) as

    y_{t+h} − ŷ_t(h) = Σ_{i=0}^{h-1} Φ_i P P^{-1} u_{t+h-i}
                     = Σ_{i=0}^{h-1} Θ_i w_{t+h-i}

Because the forecast errors can be written in terms of the orthogonalized errors, the forecast-error variance can be written in terms of the orthogonalized error variances. Forecast-error variance decompositions measure the fraction of the total forecast-error variance that is attributable to each orthogonalized shock.

Technical note

The details in this note are not critical to the discussion that follows. A forecast-error variance decomposition is derived for a given P. Per Lütkepohl (2005, sec. 2.3.3), letting θ_{mn,i} be the m, n element of Θ_i, we can express the h-step forecast error of the jth component of y_t as

    y_{j,t+h} − ŷ_j(h) = Σ_{i=0}^{h-1} (θ_{j1,i} w_{1,t+h-i} + ... + θ_{jK,i} w_{K,t+h-i})
                       = Σ_{k=1}^{K} (θ_{jk,0} w_{k,t+h} + ... + θ_{jk,h-1} w_{k,t+1})

The w_t, which were constructed using P, are mutually orthogonal with unit variance. This allows us to compute easily the mean squared error (MSE) of the forecast of variable j at horizon h in terms of the contributions of the components of w_t. Specifically,

    E[{y_{j,t+h} − y_{j,t}(h)}²] = Σ_{k=1}^{K} (θ²_{jk,0} + ... + θ²_{jk,h-1})

The kth term in the sum above is interpreted as the contribution of the orthogonalized innovations in variable k to the h-step forecast error of variable j. Note that the kth element in the sum above can be rewritten as

    θ²_{jk,0} + ... + θ²_{jk,h-1} = Σ_{i=0}^{h-1} (e_j' Θ_i e_k)²

where e_i is the ith column of I_K. Normalizing by the forecast error for variable j at horizon h yields

    ω_{jk,h} = { Σ_{i=0}^{h-1} (e_j' Θ_i e_k)² } / MSE{y_{j,t}(h)}

where MSE{y_{j,t}(h)} = Σ_{i=0}^{h-1} Σ_{k=1}^{K} θ²_{jk,i}.

Because the FEVD depends on the choice of P, there are different forecast-error variance decompositions associated with each distinct P. irf create can estimate the FEVD for a VAR or an SVAR. For a VAR, P is the Cholesky decomposition of Σ̂. For an SVAR, P is the estimated structural decomposition, P̂ = Â^{-1}B̂ for short-run models and P̂ = Ĉ for long-run SVAR models. Because of the same complications that arose with the structural impulse–response functions, the asymptotic standard errors of the structural FEVD are not available after long-run SVAR models, but bootstrap standard errors are still available.
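The decomposition is straightforward to compute directly from the Θ_i. The Mata sketch below uses assumed values for A_1 and Σ in a bivariate VAR(1), builds the Φ_i, orthogonalizes with the Cholesky factor, and forms the FEVD shares at horizon 8. It illustrates the formula above; it is not irf create's implementation.

    . mata:
    : A1    = (0.5, 0.1 \ 0.2, 0.4)   // assumed VAR(1) coefficients
    : Sigma = (1.0, 0.3 \ 0.3, 0.5)   // assumed error covariance
    : P = cholesky(Sigma)
    : h = 8
    : num = J(2, 2, 0)                // accumulates squared theta elements
    : Phi = I(2)                      // Phi_0
    : for (i=0; i<h; i++) {
    :     Theta = Phi*P               // Theta_i = Phi_i * P
    :     num = num + Theta:^2
    :     Phi = Phi*A1                // Phi_{i+1} = Phi_i * A_1 for a VAR(1)
    : }
    : num :/ rowsum(num)              // omega_{jk,h}; each row sums to 1
    : end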
IRF results for VECMs

An introduction to impulse–response functions for VECMs

As discussed in [TS] vec intro, the VECM is a reparameterization of the VAR that is especially useful for fitting VARs with cointegrating variables. This implies that the estimated parameters for the corresponding VAR model can be backed out from the estimated parameters of the VECM. This relationship means we can use the VAR form of the cointegrating VECM to discuss the IRFs for VECMs.

Consider a cointegrating VAR with one lag and with no constant or trend,

    y_t = A y_{t-1} + u_t        (5)

where y_t is a K x 1 vector of endogenous, first-difference stationary variables among which there are 1 ≤ r < K cointegrating equations; A is a K x K matrix of parameters; and u_t is a K x 1 vector of i.i.d. disturbances.

We developed intuition for the IRFs from a stationary VAR by rewriting the VAR as an infinite-order vector moving-average (VMA) process. While the Granger representation theorem establishes the existence of a VMA formulation of this model, because the cointegrating VAR is not stable, the inversion is not nearly so intuitive. (See Johansen [1995, chapters 3 and 4] for more details.) For this reason, we use (5) to develop intuition for the IRFs from a cointegrating VAR.

Suppose that K is 3, that u_1 = (1, 0, 0)', and that we want to analyze the time paths of the variables in y conditional on the initial values y_0 = 0, A, and the condition that there are no more shocks to the system, that is, 0 = u_2 = u_3 = .... These assumptions and (5) imply that

    y_1 = u_1
    y_2 = A y_1 = A u_1
    y_3 = A y_2 = A² u_1

and so on. The ith-row element of the first column of A^s contains the effect of the unit shock to the first variable after s periods. The first column of A^s contains the IRF of a unit impulse to the first variable after s periods. We could deduce the IRFs of a unit impulse to any of the other variables by administering the unit shock to one of them instead of to the first variable. Thus we can see that the (i, j)th element of A^s contains the unit IRF from variable j to variable i after s periods. By starting with orthogonalized shocks of the form P^{-1}u_t, we can use the same logic to derive the OIRFs to be A^s P.

For the stationary VAR, stability implies that all the eigenvalues of A have moduli strictly less than one, which in turn implies that all the elements of A^s go to zero as s → ∞. This implies that all the IRFs from a stationary VAR taper off to zero as s → ∞. In contrast, in a cointegrating VAR, some of the eigenvalues of A are 1, while the remaining eigenvalues have moduli strictly less than 1. This implies that in cointegrating VARs some of the elements of A^s are not going to zero as s → ∞, which in turn implies that some of the IRFs and OIRFs are not going to zero as s → ∞. The fact that the IRFs and OIRFs taper off to zero for stationary VARs but not for cointegrating VARs is one of the key differences between the two models.

When the IRF or OIRF from the innovation in one variable to another tapers off to zero as time goes on, the innovation to the first variable is said to have a transitory effect on the second variable. When the IRF or OIRF does not go to zero, the effect is said to be permanent. Note that, because some of the IRFs and OIRFs do not taper off to zero, some of the cumulative IRFs and OIRFs diverge over time.
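The permanent effect is easy to see numerically. In the Mata sketch below, the assumed A has eigenvalues 1 and 0.5, so the response to a unit shock to the first variable settles at a nonzero level instead of dying out; the matrix is an assumption chosen only to exhibit a unit eigenvalue.

    . mata:
    : A = (0.75, 0.25 \ 0.25, 0.75)   // assumed; eigenvalues are 1 and 0.5
    : y = (1 \ 0)                     // y_1 = u_1, a unit shock to variable 1
    : for (s=1; s<=10; s++) {
    :     printf("step %g: y = (%g, %g)\n", s, y[1], y[2])
    :     y = A*y                     // y_{s+1} = A y_s, no further shocks
    : }
    : end

Both elements converge to 0.5 rather than 0, so the shock has a permanent effect on both variables.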
An introduction to forecast-error variance decompositions for VECMs

The results from An introduction to impulse–response functions for VECMs can be used to show that the interpretation of FEVDs for a finite number of steps in cointegrating VARs is essentially the same as in the stationary case. Because the MSE of the forecast is diverging, this interpretation is valid only for a finite number of steps. (See [TS] vec intro and [TS] fcast compute for more information on this point.)

IRF results for ARIMA and ARFIMA

A covariance-stationary additive ARMA(p, q) model can be written as

    ρ(L^p)(y_t − x_t β) = θ(L^q) ε_t

where

    ρ(L^p) = 1 − ρ_1 L − ρ_2 L² − ... − ρ_p L^p
    θ(L^q) = 1 + θ_1 L + θ_2 L² + ... + θ_q L^q

and L^j y_t = y_{t-j}.

We can rewrite the above model as an infinite-order moving-average process

    y_t = x_t β + ψ(L) ε_t

where

    ψ(L) = θ(L)/ρ(L) = 1 + ψ_1 L + ψ_2 L² + ...        (6)

This representation shows the impact of the past innovations on the current y_t. The ith coefficient describes the response of y_t to a one-time impulse in ε_{t-i}, holding everything else constant. The ψ_i coefficients are collectively referred to as the impulse–response function of the ARMA model. For a covariance-stationary series, the ψ_i coefficients decay exponentially.

A covariance-stationary multiplicative seasonal ARMA model, often abbreviated SARMA, of order (p, q) x (P, Q)_s can be written as

    ρ(L^p) ρ_s(L^P)(y_t − x_t β) = θ(L^q) θ_s(L^Q) ε_t

where

    ρ_s(L^P) = 1 − ρ_{s,1} L^s − ρ_{s,2} L^{2s} − ... − ρ_{s,P} L^{Ps}
    θ_s(L^Q) = 1 + θ_{s,1} L^s + θ_{s,2} L^{2s} + ... + θ_{s,Q} L^{Qs}

with ρ(L^p) and θ(L^q) defined as above.

We can express this model as an additive ARMA model by multiplying the terms and imposing nonlinear constraints on the multiplied coefficients. For example, consider the SARMA model given by

    (1 − ρ_1 L)(1 − ρ_{4,1} L⁴) y_t = ε_t

Expanding the above equation and solving for y_t yields

    y_t = ρ_1 y_{t-1} + ρ_{4,1} y_{t-4} − ρ_1 ρ_{4,1} y_{t-5} + ε_t

or, in ARMA terms,

    y_t = ρ_1 y_{t-1} + ρ_4 y_{t-4} + ρ_5 y_{t-5} + ε_t

subject to the constraint ρ_5 = −ρ_1 ρ_{4,1}. Once we have obtained an ARMA representation of a SARMA process, we obtain the IRFs from (6).

An ARFIMA(p, d, q) model can be written as

    ρ(L^p)(1 − L)^d (y_t − x_t β) = θ(L^q) ε_t

with (1 − L)^d denoting a fractional integration operation. Solving for y_t, we obtain

    y_t = x_t β + (1 − L)^{-d} ψ(L) ε_t

This makes it clear that the impulse–response function for an ARFIMA model corresponds to a fractionally differenced impulse–response function for an ARIMA model. Because of the fractional differentiation, the ψ_i coefficients decay very slowly; see Remarks and examples in [TS] arfima.
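Returning to the SARMA example above, the multiplied-out AR polynomial can be obtained mechanically with Mata's polymult(); the parameter values ρ_1 = 0.6 and ρ_{4,1} = 0.3 are assumptions chosen for illustration.

    . mata:
    : rho1  = 0.6
    : rho41 = 0.3
    : p1 = (1, -rho1)                // 1 - rho1*L
    : p2 = (1, 0, 0, 0, -rho41)      // 1 - rho41*L^4
    : polymult(p1, p2)               // coefficients on L^0 ... L^5
    : end

The coefficient on L⁵ comes back as +ρ_1 ρ_{4,1} = 0.18; because the AR polynomial enters as 1 − ρ_1 L − ρ_4 L⁴ − ρ_5 L⁵, this is exactly the constraint ρ_5 = −ρ_1 ρ_{4,1}.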
Methods and formulas

Methods and formulas are presented under the following headings:

    Impulse–response function formulas for VARs
    Dynamic-multiplier function formulas for VARs
    Forecast-error variance decomposition formulas for VARs
    Impulse–response function formulas for VECMs
    Algorithms for bootstrapping the VAR IRF and FEVD standard errors
    Impulse–response function formulas for ARIMA and ARFIMA

Impulse–response function formulas for VARs

The previous discussion implies that there are three different choices of P that can be used to obtain distinct Θ_i. P is the Cholesky decomposition of Σ for the OIRFs. For the structural IRFs, P = A^{-1}B for short-run models, and P = C for long-run models. We will distinguish between the three by defining Θ_i^o to be the OIRFs, Θ_i^sr to be the short-run structural IRFs, and Θ_i^lr to be the long-run structural IRFs.

We also define P̂_c to be the Cholesky decomposition of Σ̂, P̂_sr = Â^{-1}B̂ to be the short-run structural decomposition, and P̂_lr = Ĉ to be the long-run structural decomposition.

Given estimates of the Â_i and Σ̂ from var or svar, the estimates of the simple IRFs and the OIRFs are, respectively,

    Φ̂_i = Σ_{j=1}^{i} Φ̂_{i-j} Â_j

and

    Θ̂_i^o = Φ̂_i P̂_c

where Â_j = 0_K for j > p.

Given the estimates Â and B̂, or Ĉ, from svar, the estimates of the structural IRFs are either

    Θ̂_i^sr = Φ̂_i P̂_sr

or

    Θ̂_i^lr = Φ̂_i P̂_lr

The estimated structural IRFs stored in an IRF file with the variable name sirf may be from either a short-run model or a long-run model, depending on the estimation results used to create the IRFs. As discussed in [TS] irf describe, you can use irf describe to determine whether the structural IRFs were generated from a short-run or a long-run SVAR model.

Following Lütkepohl (2005, sec. 3.7), estimates of the cumulative IRFs and the cumulative orthogonalized impulse–response functions (COIRFs) at period n are, respectively,

    Ψ̂_n = Σ_{i=0}^{n} Φ̂_i

and

    Ξ̂_n = Σ_{i=0}^{n} Θ̂_i
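A standard companion-form identity (Lütkepohl 2005) gives a compact way to compute the Φ̂_i: Φ̂_i = J M̂^i J', where J and the companion matrix M̂ are defined in the variance formulas below. The Mata sketch uses assumed coefficient matrices for a bivariate VAR(2); it is a numerical check of the recursion above, not irf create's internal code.

    . mata:
    : A1 = (0.5, 0.1 \ 0.2, 0.3)        // assumed VAR(2) coefficients
    : A2 = (0.2, 0   \ 0.1, 0.1)
    : K  = 2
    : M  = (A1, A2 \ I(K), J(K, K, 0))  // companion matrix (Kp x Kp)
    : Jm = (I(K), J(K, K, 0))           // J = (I_K, 0_K)   (K x Kp)
    : Mi = I(2*K)                       // M^0
    : Phis = J(0, K, .)
    : for (i=0; i<=4; i++) {
    :     Phis = Phis \ (Jm*Mi*Jm')     // Phi_i = J M^i J'
    :     Mi = Mi*M
    : }
    : Phis                              // Phi_0, ..., Phi_4, stacked
    : end

One can verify that the second K x K block equals Â_1 and the third equals Â_1² + Â_2, as the recursion for the Φ̂_i requires.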
The asymptotic standard errors of the different impulse–response functions are obtained by applications of the delta method. See Lütkepohl (2005, sec. 3.7) and Amisano and Giannini (1997, chap. 4) for the derivations. See Serfling (1980, sec. 3.3) for a discussion of the delta method. In presenting the variance–covariance matrix estimators, we make extensive use of the vec() operator, where vec(X) is the vector obtained by stacking the columns of X.

Lütkepohl (2005, sec. 3.7) derives the asymptotic VCEs of vec(Φ̂_i), vec(Θ̂_i^o), vec(Ψ̂_n), and vec(Ξ̂_n). Because vec(Φ̂_i) is K² x 1, the asymptotic VCE of vec(Φ̂_i) is K² x K², and it is given by

    G_i Σ̂_α̂ G_i'

where

    G_i = Σ_{m=0}^{i-1} J(M̂')^{i-1-m} ⊗ Φ̂_m        G_i is K² x K²p
    J = (I_K, 0_K, ..., 0_K)                        J is K x Kp

    M̂ = [ Â_1  Â_2  ...  Â_{p-1}  Â_p              M̂ is Kp x Kp
          I_K  0_K  ...  0_K      0_K
          0_K  I_K  ...  0_K      0_K
          ...
          0_K  0_K  ...  I_K      0_K ]

The Â_i are the estimates of the coefficients on the lagged variables in the VAR, and Σ̂_α̂ is the VCE matrix of α̂ = vec(Â_1, ..., Â_p). Σ̂_α̂ is a K²p x K²p matrix whose elements come from the VCE of the VAR coefficient estimator. As such, this VCE is the VCE of the constrained estimator if there are any constraints placed on the VAR coefficients.

The K² x K² asymptotic VCE matrix for vec(Ψ̂_n) after n periods is given by

    F_n Σ̂_α̂ F_n'    where    F_n = Σ_{i=1}^{n} G_i

The K² x K² asymptotic VCE matrix of the vectorized, orthogonalized IRFs at horizon i, vec(Θ̂_i^o), is

    C_i Σ̂_α̂ C_i' + C̄_i Σ̂_σ̂ C̄_i'

where

    C_0 = 0                                          C_0 is K² x K²p
    C_i = (P̂_c' ⊗ I_K) G_i,  i = 1, 2, ...           C_i is K² x K²p
    C̄_i = (I_K ⊗ Φ̂_i) H,  i = 0, 1, ...              C̄_i is K² x K(K+1)/2
    H = L_K' { L_K N_K (P̂_c ⊗ I_K) L_K' }^{-1}       H is K² x K(K+1)/2
    L_K solves vech(F) = L_K vec(F)                  L_K is K(K+1)/2 x K²
        for F K x K and symmetric
    K_K solves K_K vec(G) = vec(G')                  K_K is K² x K²
        for any K x K matrix G
    N_K = (1/2)(I_{K²} + K_K)                        N_K is K² x K²
    D_K solves D_K vech(F) = vec(F)                  D_K is K² x K(K+1)/2
        for F K x K and symmetric
    D_K^+ = (D_K' D_K)^{-1} D_K'                     D_K^+ is K(K+1)/2 x K²
    Σ̂_σ̂ = 2 D_K^+ (Σ̂ ⊗ Σ̂) D_K^+'                    Σ̂_σ̂ is K(K+1)/2 x K(K+1)/2

and, for X a K x K matrix, vech(X) is the K(K+1)/2 x 1 vector obtained by stacking the on-diagonal and below-diagonal elements of X column by column,

    vech(X) = (x_11, x_21, ..., x_K1, x_22, ..., x_K2, ..., x_KK)'

Note that Σ̂_σ̂ is the VCE of vech(Σ̂). More details about L_K, K_K, D_K, and vech() are available in Lütkepohl (2005, sec. A.12). Finally, as Lütkepohl (2005, 113–114) discusses, D_K^+ is the Moore–Penrose inverse of D_K.

As discussed in Amisano and Giannini (1997, chap. 6), the asymptotic standard errors of the structural IRFs are available for short-run SVAR models but not for long-run SVAR models. Following Amisano and Giannini (1997, chap. 5), the asymptotic K² x K² VCE of the short-run structural IRFs after i periods, when a maximum of h periods are estimated, is the i, i block of

    Σ̂(h)_{ij} = G̃_i Σ̂_α̂ G̃_j' + {I_K ⊗ (J M̂^i J')} Σ̂(0) {I_K ⊗ (J M̂^j J')}'

where

    G̃_0 = 0_K                                                     G̃_0 is K² x K²p
    G̃_i = Σ_{k=0}^{i-1} { P̂_sr' J (M̂')^{i-1-k} ⊗ (J M̂^k J') }    G̃_i is K² x K²p
    Σ̂(0) = Q_2 Σ̂_W Q_2'                                           Σ̂(0) is K² x K²
    Σ̂_W = Q_1 Σ̂_AB Q_1'                                           Σ̂_W is K² x K²
    Q_2 = P̂_sr' ⊗ P̂_sr                                            Q_2 is K² x K²
    Q_1 = { (I_K ⊗ B̂^{-1}), (−P̂_sr'^{-1} ⊗ B̂^{-1}) }              Q_1 is K² x 2K²

and Σ̂_AB is the 2K² x 2K² VCE of the estimator of vec(Â, B̂).

Dynamic-multiplier function formulas for VARs

This section provides the details of how irf create estimates the dynamic-multiplier functions and their asymptotic standard errors.

A pth-order vector autoregressive model (VAR) with exogenous variables may be written as

    y_t = v + A_1 y_{t-1} + ... + A_p y_{t-p} + B_0 x_t + B_1 x_{t-1} + ... + B_s x_{t-s} + u_t

where all the notation is the same as above except that the s K x R matrices B_1, B_2, ..., B_s are explicitly included and s is the number of lags of the R exogenous variables in the model.

Lütkepohl (2005) shows that the dynamic multipliers D_i are consistently estimated by

    D̂_i = J_x Ã_x^i B̃_x,    i ∈ {0, 1, ...}

where

    J_x = (I_K, 0_K, ..., 0_K)                       J_x is K x (Kp+Rs)

    Ã_x = [ M̂   B̃                                    Ã_x is (Kp+Rs) x (Kp+Rs)
            0̃   Ĩ  ]

    B̃ = [ B̂_1  B̂_2  ...  B̂_s                         B̃ is Kp x Rs
          0̈    0̈    ...  0̈
          ...
          0̈    0̈    ...  0̈  ]

    Ĩ = [ 0_R  0_R  ...  0_R  0_R                    Ĩ is Rs x Rs
          I_R  0_R  ...  0_R  0_R
          0_R  I_R  ...  0_R  0_R
          ...
          0_R  0_R  ...  I_R  0_R ]

    B̃_x' = ( B̃_0'  Ï' )                              B̃_x' is R x (Kp+Rs)
    B̃_0' = ( B̂_0'  0̈'  ...  0̈' )                     B̃_0' is R x Kp
    Ï' = ( I_R  0_R  ...  0_R )                      Ï' is R x Rs

0̈ is a K x R matrix of 0s, and 0̃ is an Rs x Kp matrix of 0s.

Consistent estimators of the cumulative dynamic-multiplier functions are given by

    D̄_i = Σ_{j=0}^{i} D̂_j

Letting β_x = vec(A_1 A_2 ... A_p B_1 B_2 ... B_s B_0) and letting Σ̂_β̂x be the asymptotic variance–covariance estimator (VCE) of β̂_x, Lütkepohl shows that an asymptotic VCE of D̂_i is G̃_i Σ̂_β̂x G̃_i', where

    G̃_i = ( Σ_{j=0}^{i-1} B̃_x'(Ã_x')^{i-1-j} ⊗ (J_x Ã_x^j J_x') ,  I_R ⊗ (J_x Ã_x^i J_x') )

Similarly, an asymptotic VCE of D̄_i is (Σ_{j=0}^{i} G̃_j) Σ̂_β̂x (Σ_{j=0}^{i} G̃_j)'.
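Substituting the moving-average form into the VAR shows that the dynamic multipliers also satisfy an equivalent recursion, D_i = B_i + Σ_{j=1}^{min(i,p)} A_j D_{i-j} with B_i = 0 for i > s; this recursion is our restatement, not the matrix form used above. The Mata sketch below applies it to an assumed model with K = 2, R = 1, p = 1, and s = 1; all numbers are assumptions.

    . mata:
    : A1 = (0.5, 0.1 \ 0.2, 0.3)   // assumed VAR coefficients (p=1)
    : B0 = (1 \ 0.5)               // assumed impact of x_t
    : B1 = (0.2 \ 0.1)             // assumed impact of x_{t-1} (s=1)
    : D  = B0                      // D_0 = B_0
    : Ds = D
    : for (i=1; i<=6; i++) {
    :     D  = A1*D + (i==1 ? B1 : J(2, 1, 0))   // D_i = A_1 D_{i-1} + B_i
    :     Ds = (Ds, D)
    : }
    : Ds                           // columns are D_0, ..., D_6
    : end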
Forecast-error variance decomposition formulas for VARs

This section provides details of how irf create estimates the Cholesky FEVD, the structural FEVD, and their standard errors. Beginning with the Cholesky-based forecast-error decompositions, the fraction of the h-step-ahead forecast-error variance of variable j that is attributable to the Cholesky orthogonalized innovations in variable k can be estimated as

    ω̂_{jk,h} = { Σ_{i=0}^{h-1} (e_j' Θ̂_i e_k)² } / M̂SE_j(h)

where M̂SE_j(h) is the jth diagonal element of

    Σ_{i=0}^{h-1} Φ̂_i Σ̂ Φ̂_i'

(See Lütkepohl [2005, 109] for a discussion of this result.) ω̂_{jk,h} and M̂SE_j(h) are scalars. The square of the standard error of ω̂_{jk,h} is

    d_{jk,h} Σ̂_α̂ d_{jk,h}' + d̄_{jk,h} Σ̂_σ̂ d̄_{jk,h}'

where

    d_{jk,h} = { Σ_{i=0}^{h-1} [ M̂SE_j(h)(e_j' Φ̂_i P̂_c e_k)(e_k' P̂_c' ⊗ e_j') G_i
                 − (e_j' Φ̂_i P̂_c e_k)² Σ_{m=0}^{h-1} (e_j' Φ̂_m Σ̂ ⊗ e_j') G_m ] } 2/M̂SE_j(h)²
                                                     d_{jk,h} is 1 x K²p

    d̄_{jk,h} = { Σ_{i=0}^{h-1} [ M̂SE_j(h)(e_j' Φ̂_i P̂_c e_k)(e_k' ⊗ e_j' Φ̂_i) H
                 − (e_j' Φ̂_i P̂_c e_k)² Σ_{m=0}^{h-1} (e_j' Φ̂_m ⊗ e_j' Φ̂_m) D_K ] } 1/M̂SE_j(h)²
                                                     d̄_{jk,h} is 1 x K(K+1)/2

G_0 = 0, and D_K is the K² x K(K+1)/2 duplication matrix defined previously.

Because the FEVD depends on the choice of P, there are different forecast-error variance decompositions associated with each distinct P.

For the structural forecast-error decompositions, we follow Amisano and Giannini (1997, sec. 5.2). They define the matrix of structural forecast-error decompositions at horizon s, when a maximum of h periods are estimated, as

    Ŵ_s = F̂_s^{-1} M̃_s

    F̂_s = ( Σ_{i=0}^{s-1} Θ̂_i^sr Θ̂_i^sr' ) ⊙ I_K        for s = 1, ..., h + 1

    M̃_s = Σ_{i=0}^{s-1} Θ̂_i^sr ⊙ Θ̂_i^sr

where ⊙ is the Hadamard, or element-by-element, product.

The K² x K² asymptotic VCE of vec(Ŵ_s) is given by

    Z̃_s Σ̂(h) Z̃_s'

where Σ̂(h) is as derived previously, and

    Z̃_s = ( ∂vec(Ŵ_s)/∂vec(Θ̂_0^sr), ∂vec(Ŵ_s)/∂vec(Θ̂_1^sr), ..., ∂vec(Ŵ_s)/∂vec(Θ̂_h^sr) )

    ∂vec(Ŵ_s)/∂vec(Θ̂_j^sr) = 2 { (I_K ⊗ F̂_s^{-1}) D̃(Θ̂_j^sr) − (Ŵ_s' ⊗ F̂_s^{-1}) D̃(I_K) N_K (Θ̂_j^sr ⊗ I_K) }

If X is an n x n matrix, then D̃(X) is the n² x n² matrix with vec(X) on the diagonal and zeros in all the off-diagonal elements, and N_K is as defined previously.

Due to the same complications that arose with the structural impulse–response functions, the asymptotic standard errors of the structural FEVD are not available after long-run SVAR models, but bootstrap standard errors are still available.

Impulse–response function formulas for VECMs

We begin by providing the formulas for backing out the estimates of the A_i from the Γ_i estimated by vec. As discussed in [TS] vec intro, the VAR in (1) can be rewritten as a VECM:

    Δy_t = v + Π y_{t-1} + Γ_1 Δy_{t-1} + ... + Γ_{p-1} Δy_{t-p+1} + ε_t

vec estimates Π and the Γ_i. Johansen (1995, 25) notes that

    Π = Σ_{i=1}^{p} A_i − I_K        (6)

where I_K is the K-dimensional identity matrix, and

    Γ_i = − Σ_{j=i+1}^{p} A_j        (7)

Defining

    Γ = I_K − Σ_{i=1}^{p-1} Γ_i

and using (6) and (7) allow us to solve for the A_i as

    A_1 = Π + Γ_1 + I_K
    A_i = Γ_i − Γ_{i-1}    for i = {2, ..., p − 1}
    A_p = −Γ_{p-1}

Using these formulas, we can back out estimates of the A_i from the estimates of the Γ_i and Π produced by vec. Then we simply use the formulas for the IRFs and OIRFs presented in Impulse–response function formulas for VARs. The running sums of the IRFs and OIRFs over the steps within each impulse–response pair are the cumulative IRFs and OIRFs.
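For the leading case p = 2, there is a single Γ_1, and the mapping reduces to A_1 = Π + Γ_1 + I_K and A_2 = −Γ_1. A Mata sketch with assumed estimates of Π and Γ_1, not output from vec:

    . mata:
    : Pi = (-0.2, 0.1 \ 0.1, -0.3)   // assumed estimate of Pi
    : G1 = ( 0.3, 0.0 \ 0.1,  0.2)   // assumed estimate of Gamma_1
    : A1 = Pi + G1 + I(2)            // A_1 = Pi + Gamma_1 + I_K
    : A2 = -G1                       // A_p = -Gamma_{p-1}
    : A1, A2                         // implied VAR(2) coefficient matrices
    : end

One can check that A_1 + A_2 − I_K reproduces Π, as (6) requires.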
Algorithms for bootstrapping the VAR IRF and FEVD standard errors

irf create offers two bootstrap algorithms for estimating the standard errors of the various IRFs and FEVDs. Both var and svar contain estimators for the coefficients in a VAR that are conditional on the first p observations. The two bootstrap algorithms are also conditional on the first p observations.

Specifying the bs option calculates the standard errors by bootstrapping the residuals. For a bootstrap with R repetitions, this method uses the following algorithm:

1. Fit the model and save the estimated parameters.

2. Use the estimated coefficients to calculate the residuals.

3. Repeat steps 3a to 3d R times.

   3a. Draw a simple random sample of size T with replacement from the residuals. The random samples are drawn over the K x 1 vectors of residuals. When the tth vector is drawn, all K residuals are selected. This preserves the contemporaneous correlations among the residuals.

   3b. Use the p initial observations, the sampled residuals, and the estimated coefficients to construct a new sample dataset.

   3c. Fit the model and calculate the different IRFs and FEVDs.

   3d. Save these estimates as observation r in the bootstrapped dataset.

4. For each IRF and FEVD, the estimated standard deviation from the R bootstrapped estimates is the estimated standard error of that impulse–response function or forecast-error variance decomposition.

Specifying the bsp option estimates the standard errors by a multivariate normal parametric bootstrap. The algorithm for the multivariate normal parametric bootstrap is identical to the one above, with the exception that 3a is replaced by 3a(bsp):

   3a(bsp). Draw T pseudovariates from a multivariate normal distribution with covariance matrix Σ̂.

Impulse–response function formulas for ARIMA and ARFIMA

The previous discussion showed that a SARMA process can be rewritten as an ARMA process and that for an ARMA process, we can express ψ(L) in terms of θ(L) and ρ(L),

    ψ(L) = θ(L)/ρ(L)

Expanding the above, we obtain

    ψ_0 + ψ_1 L + ψ_2 L² + ... = (1 + θ_1 L + θ_2 L² + ...)/(1 − ρ_1 L − ρ_2 L² − ...)

Given the estimates of the autoregressive terms ρ̂ and the moving-average terms θ̂, the IRF is obtained by solving the above equation for the ψ weights. The ψ_i are calculated using the recursion

    ψ̂_i = θ̂_i + Σ_{j=1}^{p} ρ̂_j ψ̂_{i-j}

with ψ_0 = 1 and θ_i = 0 for i > max(p, q + 1).

The asymptotic standard errors for the IRF for ARMA are calculated using the delta method; see Serfling (1980, sec. 3.3) for a discussion of the delta method. Let Σ̂ be the estimate of the variance–covariance matrix for ρ̂ and θ̂, and let Ψ be a matrix of derivatives of ψ_i with respect to ρ̂ and θ̂. Then the standard errors for ψ̂_i are calculated as

    Ψ_i Σ̂ Ψ_i'

The IRF for the ARFIMA(p, d, q) model is obtained by applying the filter (1 − L)^{-d} to ψ(L). The filter is given by Hassler and Kokoszka (2010) as

    (1 − L)^{-d} = Σ_{i=0}^{∞} b_i L^i

with b_0 = 1 and subsequent b_i calculated by the recursion

    b̂_i = {(d̂ + i − 1)/i} b̂_{i-1}

The resulting IRF is then given by

    φ̂_i = Σ_{j=0}^{i} ψ̂_j b̂_{i-j}

The asymptotic standard errors for the IRF for ARFIMA are calculated using the delta method. Let Σ̂ be the estimate of the variance–covariance matrix for ρ̂, θ̂, and d̂, and let Φ be a matrix of derivatives of φ_i with respect to ρ̂, θ̂, and d̂. Then the standard errors for φ̂_i are calculated as

    Φ_i Σ̂ Φ_i'
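Both recursions are short enough to verify by hand. The Mata sketch below computes the ψ weights for an assumed ARMA(1,1), the fractional-filter weights b_i for an assumed d, and their convolution φ_i; the parameter values are assumptions, not estimates.

    . mata:
    : rho = 0.5; theta = 0.3; d = 0.2    // assumed parameter values
    : h = 6
    : psi = J(1, h+1, .)
    : psi[1] = 1                         // psi_0 = 1
    : for (i=1; i<=h; i++) {
    :     psi[1+i] = (i==1 ? theta : 0) + rho*psi[i]  // psi_i = theta_i + rho*psi_{i-1}
    : }
    : b = J(1, h+1, .)
    : b[1] = 1                           // b_0 = 1
    : for (i=1; i<=h; i++) b[1+i] = (d+i-1)/i*b[i]    // fractional-filter weights
    : phi = J(1, h+1, 0)
    : for (i=0; i<=h; i++) {             // ARFIMA IRF: phi_i = sum_j psi_j b_{i-j}
    :     for (j=0; j<=i; j++) phi[1+i] = phi[1+i] + psi[1+j]*b[1+i-j]
    : }
    : psi \ b \ phi                      // rows: psi, b, phi for horizons 0 ... 6
    : end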
References

Amisano, G., and C. Giannini. 1997. Topics in Structural VAR Econometrics. 2nd ed. Heidelberg: Springer.
Christiano, L. J., M. Eichenbaum, and C. L. Evans. 1999. Monetary policy shocks: What have we learned and to what end? In Handbook of Macroeconomics: Volume 1A, ed. J. B. Taylor and M. Woodford. New York: Elsevier.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Hassler, U., and P. Kokoszka. 2010. Impulse responses of fractionally integrated processes with long memory. Econometric Theory 26: 1855–1861.
Johansen, S. 1995. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University Press.
Lütkepohl, H. 1993. Introduction to Multiple Time Series Analysis. 2nd ed. New York: Springer.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.
Serfling, R. J. 1980. Approximation Theorems of Mathematical Statistics. New York: Wiley.
Sims, C. A. 1980. Macroeconomics and reality. Econometrica 48: 1–48.
Stock, J. H., and M. W. Watson. 2001. Vector autoregressions. Journal of Economic Perspectives 15: 101–115.

Also see
[TS] irf — Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
[TS] var intro — Introduction to vector autoregressive models
[TS] vec intro — Introduction to vector error-correction models

Title
irf ctable — Combined tables of IRFs, dynamic-multiplier functions, and FEVDs

Description    Quick start    Menu    Syntax
Options    Remarks and examples    Stored results    Also see

Description

irf ctable makes a table or a combined table of IRF results. A table is made for specified combinations of named IRF results, impulse variables, response variables, and statistics. irf ctable combines these tables into one table, unless separate tables are requested. irf ctable operates on the active IRF file; see [TS] irf set.

Quick start

Combine tables of an orthogonalized IRF myirf and cumulative IRF mycirf for dependent variables y1 and y2
    irf ctable (myirf y1 y2 oirf) (mycirf y1 y2 cirf)

As above, but suppress confidence intervals and add a title
    irf ctable (myirf y1 y2 oirf) (mycirf y1 y2 cirf), ///
          noci title("My Title")

Note: irf commands can be used after var, svar, vec, arima, or arfima; see [TS] var, [TS] var svar, [TS] vec, [TS] arima, or [TS] arfima.

Menu

Statistics > Multivariate time series > IRF and FEVD analysis > Combined tables

Syntax

    irf ctable (spec1) [(spec2) ... (specN)] [, options]

where (speck) is

    (irfname impulsevar responsevar stat [, spec_options])

irfname is the name of a set of IRF results in the active IRF file. impulsevar should be specified as an endogenous variable for all statistics except dm and cdm; for those, specify as an exogenous variable. responsevar is an endogenous variable name. stat is one or more statistics from the list below:

    stat     Description
    irf      impulse–response function
    oirf     orthogonalized impulse–response function
    dm       dynamic-multiplier function
    cirf     cumulative impulse–response function
    coirf    cumulative orthogonalized impulse–response function
    cdm      cumulative dynamic-multiplier function
    fevd     Cholesky forecast-error variance decomposition
    sirf     structural impulse–response function
    sfevd    structural forecast-error variance decomposition

    options            Description
    set(filename)      make filename active
    noci               do not report confidence intervals
    stderror           include standard errors for each statistic
    individual         make an individual table for each combination
    title("text")      use text as overall table title
    step(#)            set common maximum step
    level(#)           set confidence level; default is level(95)

    spec_options       Description
    noci               do not report confidence intervals
    stderror           include standard errors for each statistic
    level(#)           set confidence level; default is level(95)
    ititle("text")     use text as individual subtitle for specific table

spec_options may be specified within a table specification, globally, or both. When specified in a table specification, the spec_options affect only the specification in which they are used. When supplied globally, the spec_options affect all table specifications. When specified in both places, options for the table specification take precedence. ititle() does not appear in the dialog box.
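Because spec_options may be supplied both globally and within a specification, one specification can, say, report 90% intervals while the rest of the table uses the default level. A sketch using assumed result and variable names:

    . irf ctable (myirf y1 y2 oirf) (myirf y1 y2 fevd, level(90)), stderror

Here stderror applies to both blocks, while level(90) affects only the second.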
Options

set(filename) specifies the file to be made active; see [TS] irf set. If set() is not specified, the active file is used.

noci suppresses reporting of the confidence intervals for each statistic. noci is assumed when the model was fit by vec because no confidence intervals were estimated.

stderror specifies that standard errors for each statistic also be included in the table.

individual places each block, or (speck), in its own table. By default, irf ctable combines all the blocks into one table.

title("text") specifies a title for the table or the set of tables.

step(#) specifies the maximum number of steps to use for all tables. By default, each table is constructed using all steps available.

level(#) specifies the default confidence level, as a percentage, for confidence intervals, when they are reported. The default is level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals.

The following option is available with irf ctable but is not shown in the dialog box:

ititle("text") specifies an individual subtitle for a specific table. ititle() may be specified only when the individual option is also specified.

Remarks and examples

If you have not read [TS] irf, please do so. Also see [TS] irf table for a slightly easier to use, but less powerful, table command.

irf ctable creates a series of tables from IRF results. The information enclosed within each set of parentheses,

    (irfname impulsevar responsevar stat [, spec_options])

forms a request for a specific table. The first part—irfname impulsevar responsevar—identifies a set of IRF estimates or a set of variance decomposition estimates. The next part—stat—specifies which statistics are to be included in the table. The last part—spec_options—includes the noci, level(), and stderror options, and places (or suppresses) additional columns in the table.

Each specific table displays the requested statistics corresponding to the specified combination of irfname, impulsevar, and responsevar over the step horizon. By default, all the individual tables are combined into one table. Also by default, all the steps, or periods, available are included in the table. You can use the step() option to impose a common maximum for all tables.

Example 1

In example 1 of [TS] irf table, we fit a model using var and we saved the IRFs for two different orderings. The commands we used were

    . use http://www.stata-press.com/data/r14/lutkepohl2
    . var dln_inv dln_inc dln_consump
    . irf set results4
    . irf create ordera, step(8)
    . irf create orderb, order(dln_inc dln_inv dln_consump) step(8)

We then formed the desired table by typing

    . irf table oirf fevd, impulse(dln_inc) response(dln_consump) noci std
    > title("Ordera versus orderb")

Using irf ctable, we can form the equivalent table by typing

    . irf ctable (ordera dln_inc dln_consump oirf fevd)
    > (orderb dln_inc dln_consump oirf fevd),
    > noci std title("Ordera versus orderb")

    Ordera versus orderb

    step    (1) oirf    (1) S.E.    (1) fevd    (1) S.E.
    0        .005123     .000878     0           0
    1        .001635     .000984     .288494     .077483
    2        .002948     .000993     .294288     .073722
    3       -.000221     .000662     .322454     .075562
    4        .000811     .000586     .319227     .074063
    5        .000462     .000333     .322579     .075019
    6        .000044     .000275     .323552     .075371
    7        .000151     .000162     .323383     .075314
    8        .000091     .000114     .323499     .075386
    step    (2) oirf    (2) S.E.    (2) fevd    (2) S.E.
    0        .005461     .000925     0           0
    1        .001578     .000988     .327807     .08159
    2        .003307     .001042     .328795     .077519
    3       -.00019      .000676     .370775     .080604
    4        .000846     .000617     .366896     .079019
    5        .000491     .000349     .370399     .079941
    6        .000069     .000292     .371487     .080323
    7        .000158     .000172     .371315     .080287
    8        .000096     .000122     .371438     .080366

    (1) irfname = ordera, impulse = dln_inc, and response = dln_consump
    (2) irfname = orderb, impulse = dln_inc, and response = dln_consump

The output is displayed in one table. Because the table did not fit horizontally, it automatically wrapped. At the bottom of the table is a list of keys that appear at the top of each column.

The results in the table above indicate that the orthogonalized IRFs do not change by much. Because the estimated forecast-error variances do change, we might want to produce two tables that contain the estimated forecast-error variance decompositions and their 95% confidence intervals:

    . irf ctable (ordera dln_inc dln_consump fevd)
    > (orderb dln_inc dln_consump fevd), individual

    Table 1

    step    (1) fevd    (1) Lower    (1) Upper
    0        0           0            0
    1        .288494     .13663       .440357
    2        .294288     .149797      .43878
    3        .322454     .174356      .470552
    4        .319227     .174066      .464389
    5        .322579     .175544      .469613
    6        .323552     .175826      .471277
    7        .323383     .17577       .470995
    8        .323499     .175744      .471253

    95% lower and upper bounds reported
    (1) irfname = ordera, impulse = dln_inc, and response = dln_consump

    Table 2

    step    (2) fevd    (2) Lower    (2) Upper
    0        0           0            0
    1        .327807     .167893      .487721
    2        .328795     .17686       .48073
    3        .370775     .212794      .528757
    4        .366896     .212022      .52177
    5        .370399     .213718      .52708
    6        .371487     .214058      .528917
    7        .371315     .213956      .528674
    8        .371438     .213923      .528953

    95% lower and upper bounds reported
    (2) irfname = orderb, impulse = dln_inc, and response = dln_consump

Because we specified the individual option, the output contains two tables, one for each specific table command. At the bottom of each table is a list of the keys used in that table and a note indicating the level of the confidence intervals that we requested. The results from table 1 and table 2 indicate that each estimated function is well within the confidence interval of the other, so we conclude that the functions are not significantly different.

Stored results

irf ctable stores the following in r():

Scalars
    r(ncols)     number of columns in all tables
    r(k_umax)    number of distinct keys
    r(k)         number of specific table commands

Macros
    r(key#)      #th key
    r(tnotes)    list of keys applied to each column

Also see
[TS] irf — Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
[TS] var intro — Introduction to vector autoregressive models
[TS] vec intro — Introduction to vector error-correction models

Title
irf describe — Describe an IRF file

Description    Quick start    Menu    Syntax
Options    Remarks and examples    Stored results    Also see

Description

irf describe describes the specification of the estimation command and the specification of the IRF used to create the IRF results that are saved in an IRF file.
Quick start

Short summary of all IRF results in the active IRF file
    irf describe

Summary of model and IRF specification for irf1 in the active IRF file
    irf describe irf1

As above, but for irf1 in IRF file myirf.irf
    irf describe irf1, using(myirf)

As above, and also set myirf.irf as the active IRF file
    irf describe irf1, set(myirf)

Note: irf commands can be used after var, svar, vec, arima, or arfima; see [TS] var, [TS] var svar, [TS] vec, [TS] arima, or [TS] arfima.

Menu

Statistics > Multivariate time series > Manage IRF results and files > Describe IRF file

Syntax

    irf describe [irf_resultslist] [, options]

    options                Description
    set(filename)          make filename active
    using(irf_filename)    describe irf_filename without making active
    detail                 show additional details of IRF results
    variables              show underlying structure of the IRF dataset

Options

set(filename) specifies the IRF file to be described and set; see [TS] irf set. If filename is specified without an extension, .irf is assumed.

using(irf_filename) specifies the IRF file to be described. The active IRF file, if any, remains unchanged. If irf_filename is specified without an extension, .irf is assumed.

detail specifies that irf describe display detailed information about each set of IRF results. detail is implied when irf_resultslist is specified.

variables is a programmer's option; it additionally displays the output produced by the describe command.

Remarks and examples

If you have not read [TS] irf, please do so.

irf describe specified without irf_resultslist provides a short summary of the model used to create each set of results in an IRF file. If irf_resultslist is specified, then irf describe provides details of the model specification and the IRF specification used to create each set of IRF results. If set() or using() is not specified, the IRF results of the active IRF file are described.

Example 1

    . use http://www.stata-press.com/data/r14/lutkepohl2
    (Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
    . var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lags(1/2) dfk
    (output omitted)

We create three sets of IRF results:

    . irf create order1, set(myirfs, replace)
    (file myirfs.irf created)
    (file myirfs.irf now active)
    (file myirfs.irf updated)
    . irf create order2, order(dln_inc dln_inv dln_consump)
    (file myirfs.irf updated)
    . irf create order3, order(dln_inc dln_consump dln_inv)
    (file myirfs.irf updated)

    . irf describe
    Contains irf results from myirfs.irf (dated 11 Nov 2014 09:22)

    irfname    model    endogenous variables and order (*)
    order1     var      dln_inv dln_inc dln_consump
    order2     var      dln_inc dln_inv dln_consump
    order3     var      dln_inc dln_consump dln_inv

    (*) order is relevant only when model is var

The output reveals the order in which we specified the variables.

    . irf describe order1

    irf results for order1

    Estimation specification
        model:      var
        endog:      dln_inv dln_inc dln_consump
        sample:     quarterly data from 1960q4 to 1978q4
        lags:       1 2
        constant:   constant
        exog:       none
        exogvars:   none
        exlags:     none
        varcns:     unconstrained

    IRF specification
        step:       8
        order:      dln_inv dln_inc dln_consump
        std error:  asymptotic
        reps:       none

Here we see a summary of the model we fit as well as the specification of the IRFs.
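When more than one IRF file is in play, using() lets you look inside a file without disturbing the active one. For example, even if some other IRF file were active, typing

    . irf describe order1, using(myirfs)

would describe order1 in myirfs.irf while leaving the active IRF file unchanged.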
Stored results

irf describe stores the following in r():

Scalars
    r(N)           number of observations in the IRF file
    r(k)           number of variables in the IRF file
    r(width)       width of dataset in the IRF file
    r(N_max)       maximum number of observations
    r(k_max)       maximum number of variables
    r(widthmax)    maximum width of the dataset
    r(changed)     flag indicating that data have changed since last saved

Macros
    r(_version)            version of IRF results file
    r(irfnames)            names of IRF results in the IRF file
    r(irfname_model)       var, sr_var, lr_var, or vec
    r(irfname_order)       Cholesky order assumed in IRF estimates
    r(irfname_exog)        exogenous variables, and their lags, in VAR or underlying VAR
    r(irfname_exogvar)     exogenous variables in VAR or underlying VAR
    r(irfname_constant)    constant or noconstant
    r(irfname_lags)        lags in model
    r(irfname_exlags)      lags of exogenous variables in model
    r(irfname_tmin)        minimum value of timevar in the estimation sample
    r(irfname_tmax)        maximum value of timevar in the estimation sample
    r(irfname_timevar)     name of tsset timevar
    r(irfname_tsfmt)       format of timevar in the estimation sample
    r(irfname_varcns)      unconstrained or colon-separated list of constraints placed
                           on VAR coefficients
    r(irfname_svarcns)     "." or colon-separated list of constraints placed on SVAR
                           coefficients
    r(irfname_step)        maximum step in IRF estimates
    r(irfname_stderror)    asymptotic, bs, bsp, or none, depending on type of standard
                           errors specified to irf create
    r(irfname_reps)        "." or number of bootstrap replications performed
    r(irfname_version)     version of IRF file that originally held irfname IRF results
    r(irfname_rank)        "." or number of cointegrating equations
    r(irfname_trend)       "." or trend() specified in vec
    r(irfname_veccns)      "." or constraints placed on VECM parameters
    r(irfname_sind)        "." or normalized seasonal indicators included in vec

Also see
[TS] irf — Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
[TS] var intro — Introduction to vector autoregressive models
[TS] vec intro — Introduction to vector error-correction models

Title
irf drop — Drop IRF results from the active IRF file

Description    Quick start    Menu    Syntax
Option    Remarks and examples    Also see

Description

irf drop removes IRF results from the active IRF file.

Quick start

Drop impulse–response functions irf1 and irf2 from the active IRF file
    irf drop irf1 irf2

Drop irf1 and irf2 from the IRF file myirfs.irf
    irf drop irf1 irf2, set(myirfs)

Note: irf commands can be used after var, svar, vec, arima, or arfima; see [TS] var, [TS] var svar, [TS] vec, [TS] arima, or [TS] arfima.

Menu

Statistics > Multivariate time series > Manage IRF results and files > Drop IRF results

Syntax

    irf drop irf_resultslist [, set(filename)]

Option

set(filename) specifies the file to be made active; see [TS] irf set. If set() is not specified, the active file is used.

Remarks and examples

If you have not read [TS] irf, please do so.

Example 1

    . use http://www.stata-press.com/data/r14/lutkepohl2
    (Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
    . var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lags(1/2) dfk
    (output omitted)

We create three sets of IRF results:

    . irf create order1, set(myirfs, replace)
    (file myirfs.irf created)
    (file myirfs.irf now active)
    (file myirfs.irf updated)
    . irf create order2, order(dln_inc dln_inv dln_consump)
    (file myirfs.irf updated)
    . irf create order3, order(dln_inc dln_consump dln_inv)
    (file myirfs.irf updated)
    . irf describe
    Contains irf results from myirfs.irf (dated 11 Nov 2014 09:22)

    irfname    model    endogenous variables and order (*)
    order1     var      dln_inv dln_inc dln_consump
    order2     var      dln_inc dln_inv dln_consump
    order3     var      dln_inc dln_consump dln_inv

    (*) order is relevant only when model is var

Now let's remove order1 and order2 from myirfs.irf.

    . irf drop order1 order2
    (order1 dropped)
    (order2 dropped)
    file myirfs.irf updated

    . irf describe
    Contains irf results from myirfs.irf (dated 11 Nov 2014 09:22)

    irfname    model    endogenous variables and order (*)
    order3     var      dln_inc dln_consump dln_inv

    (*) order is relevant only when model is var

order1 and order2 have been dropped.

Also see
[TS] irf — Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
[TS] var intro — Introduction to vector autoregressive models
[TS] vec intro — Introduction to vector error-correction models

Title
irf graph — Graphs of IRFs, dynamic-multiplier functions, and FEVDs

Description    Quick start    Menu    Syntax
Options    Remarks and examples    Stored results    Also see

Description

irf graph graphs impulse–response functions (IRFs), dynamic-multiplier functions, and forecast-error variance decompositions (FEVDs) over time.

Quick start

Graph impulse–response function for dependent variables y1 and y2 given an unexpected shock to y1
    irf graph irf, impulse(y1) response(y2)

As above, but for orthogonalized shocks
    irf graph oirf, impulse(y1) response(y2)

As above, but begin the plot with the third forecast period
    irf graph oirf, impulse(y1) response(y2) lstep(3)

As above, but with a separate graph for each IRF in the current IRF file
    irf graph oirf, impulse(y1) response(y2) lstep(3) individual

Note: irf commands can be used after var, svar, vec, arima, or arfima; see [TS] var, [TS] var svar, [TS] vec, [TS] arima, or [TS] arfima.

Menu

Statistics > Multivariate time series > IRF and FEVD analysis > Graphs by impulse or response

Syntax

    irf graph stat [, options]

    stat     Description
    irf      impulse–response function
    oirf     orthogonalized impulse–response function
    dm       dynamic-multiplier function
    cirf     cumulative impulse–response function
    coirf    cumulative orthogonalized impulse–response function
    cdm      cumulative dynamic-multiplier function
    fevd     Cholesky forecast-error variance decomposition
    sirf     structural impulse–response function
    sfevd    structural forecast-error variance decomposition

Notes: 1. No statistic may appear more than once.
       2. If confidence intervals are included (the default), only two statistics may be included.
       3. If confidence intervals are suppressed (option noci), up to four statistics may be included.
    options                                Description
    Main
      set(filename)                        make filename active
      irf(irfnames)                        use irfnames IRF result sets
      impulse(impulsevar)                  use impulsevar as impulse variables
      response(endogvars)                  use endogenous variables as response variables
      noci                                 suppress confidence bands
      level(#)                             set confidence level; default is level(95)
      lstep(#)                             use # for first step
      ustep(#)                             use # for maximum step
    Advanced
      individual                           graph each combination individually
      iname(namestub [, replace])          stub for naming the individual graphs
      isaving(filenamestub [, replace])    stub for saving the individual graphs to files
    Plots
      plot#opts(cline_options)             affect rendition of the line plotting the # stat
    CI plots
      ci#opts(area_options)                affect rendition of the confidence interval for the # stat
    Y axis, X axis, Titles, Legend, Overall
      twoway_options                       any options other than by() documented in [G-3] twoway_options
      byopts(by_option)                    how subgraphs are combined, labeled, etc.

Options

Main

set(filename) specifies the file to be made active; see [TS] irf set. If set() is not specified, the active file is used.

irf(irfnames) specifies the IRF result sets to be used. If irf() is not specified, each of the results in the active IRF file is used. (Files often contain just one set of IRF results saved under one irfname; in that case, those results are used.)

impulse(impulsevar) and response(endogvars) specify the impulse and response variables. Usually one of each is specified, and one graph is drawn. If multiple variables are specified, a separate subgraph is drawn for each impulse–response combination. If impulse() and response() are not specified, subgraphs are drawn for all combinations of impulse and response variables. impulsevar should be specified as an endogenous variable for all statistics except dm or cdm; for those, specify as an exogenous variable.

noci suppresses graphing the confidence interval for each statistic. noci is assumed when the model was fit by vec because no confidence intervals were estimated.

level(#) specifies the default confidence level, as a percentage, for confidence intervals, when they are reported. The default is level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals. Also see [TS] irf cgraph for a graph command that allows the confidence level to vary over the graphs.

lstep(#) specifies the first step, or period, to be included in the graphs. lstep(0) is the default.

ustep(#), # ≥ 1, specifies the maximum step, or period, to be included in the graphs.

Advanced

individual specifies that each graph be displayed individually. By default, irf graph combines the subgraphs into one image. When individual is specified, byopts() may not be specified, but the isaving() and iname() options may be specified.

iname(namestub [, replace]) specifies that the ith individual graph be stored in memory under the name namestubi, which must be a valid Stata name of 24 characters or fewer. iname() may be specified only with the individual option.

isaving(filenamestub [, replace]) specifies that the ith individual graph should be saved to disk in the current working directory under the name filenamestubi.gph. isaving() may be specified only when the individual option is also specified.

Plots

plot1opts(cline_options), ..., plot4opts(cline_options) affect the rendition of the plotted statistics (the stat). plot1opts() affects the rendition of the first statistic; plot2opts(), the second; and so on.
cline_options are as described in [G-3] cline_options.

CI plots

ci1opts(area_options) and ci2opts(area_options) affect the rendition of the confidence intervals for the first (ci1opts()) and second (ci2opts()) statistics in stat. area_options are as described in [G-3] area_options.

Y axis, X axis, Titles, Legend, Overall

twoway_options are any of the options documented in [G-3] twoway_options, excluding by(). These include options for titling the graph (see [G-3] title_options) and for saving the graph to disk (see [G-3] saving_option). The saving() and name() options may not be combined with the individual option.

byopts(by_option) is as documented in [G-3] by_option and may not be specified when individual is specified. byopts() affects how the subgraphs are combined, labeled, etc.
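The Advanced options are handy for batch processing. The sketch below would draw each impulse–response combination separately and save each panel to disk as mypanel1.gph, mypanel2.gph, and so on; the IRF result name order1 and the file stub mypanel are assumptions.

    . irf graph oirf, irf(order1) individual isaving(mypanel, replace)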
Remarks and examples

If you have not read [TS] irf, please do so. Also see [TS] irf cgraph, which produces combined graphs; [TS] irf ograph, which produces overlaid graphs; and [TS] irf table, which displays results in tabular form.

irf graph produces one or more graphs and displays them arrayed into one image unless the individual option is specified, in which case the individual graphs are displayed separately. Each individual graph consists of all the specified stat and represents one impulse–response combination.

Because all the specified stat appear on the same graph, putting together statistics with very different scales is not recommended. For instance, sometimes sirf and oirf are on similar scales while irf is on a different scale. In such cases, combining sirf and oirf on the same graph looks fine, but combining either with irf produces an uninformative graph.

Example 1

Suppose that we have results generated from two different SVAR models. We want to know whether the shapes of the structural IRFs and the structural FEVDs are similar in the two models. We are also interested in knowing whether the structural IRFs and the structural FEVDs differ significantly from their Cholesky counterparts.

Filling in the background, we have previously issued the commands:

    . use http://www.stata-press.com/data/r14/lutkepohl2
    . mat a = (., 0, 0\0,.,0\.,.,.)
    . mat b = I(3)
    . svar dln_inv dln_inc dln_consump, aeq(a) beq(b)
    . irf create modela, set(results3) step(8)
    . svar dln_inc dln_inv dln_consump, aeq(a) beq(b)
    . irf create modelb, step(8)

To see whether the shapes of the structural IRFs and the structural FEVDs are similar in the two models, we type

    . irf graph oirf sirf, impulse(dln_inc) response(dln_consump)

    [Graph: panels "modela, dln_inc, dln_consump" and "modelb, dln_inc, dln_consump";
    x axis: step; legend: 95% CI for oirf, orthogonalized irf, 95% CI for sirf,
    structural irf; graphs by irfname, impulse variable, and response variable]

The graph reveals that the oirf and the sirf estimates are essentially the same for both models and that the shapes of the functions are very similar for the two models.

To see whether the structural IRFs and the structural FEVDs differ significantly from their Cholesky counterparts, we type

    . irf graph fevd sfevd, impulse(dln_inc) response(dln_consump) lstep(1)
    > legend(cols(1))

    [Graph: panels "modela, dln_inc, dln_consump" and "modelb, dln_inc, dln_consump";
    x axis: step; legend: 95% CI for fevd, 95% CI for sfevd, fraction of mse due to
    impulse, fraction of mse due to impulse (structural); graphs by irfname, impulse
    variable, and response variable]

This combined graph reveals that the shapes of these functions are also similar for the two models. However, the graph illuminates one minor difference between them: In modela, the estimated structural FEVD is slightly larger than the Cholesky-based estimates, whereas in modelb the Cholesky-based estimates are slightly larger than the structural estimates. For both models, however, the structural estimates are close to the center of the wide confidence intervals for the two estimates.

Example 2

Let's focus on the results from modela. Suppose that we were interested in examining how dln_consump responded to impulses in its own structural innovations, structural innovations to dln_inc, and structural innovations to dln_inv. We type

    . irf graph sirf, irf(modela) response(dln_consump)

    [Graph: panels "modela, dln_consump, dln_consump", "modela, dln_inc, dln_consump",
    and "modela, dln_inv, dln_consump"; x axis: step; legend: 95% CI, structural irf;
    graphs by irfname, impulse variable, and response variable]

The upper-left graph shows the structural IRF of an innovation in dln_consump on dln_consump. It indicates that the identification restrictions used in modela imply that a positive shock to dln_consump causes an increase in dln_consump, followed by a decrease, followed by an increase, and so on, until the effect dies out after roughly 5 periods.

The upper-right graph shows the structural IRF of an innovation in dln_inc on dln_consump, indicating that a positive shock to dln_inc causes an increase in dln_consump, which dies out after 4 or 5 periods.

Technical note

[TS] irf table contains a technical note warning you to be careful in naming variables when you fit models. What is said there applies equally here.

Stored results

irf graph stores the following in r():

Scalars
    r(k)             number of graphs

Macros
    r(stats)         statlist
    r(irfname)       resultslist
    r(impulse)       impulselist
    r(response)      responselist
    r(plot#)         contents of plot#opts()
    r(ci)            level applied to confidence intervals or noci
    r(ciopts#)       contents of ci#opts()
    r(byopts)        contents of byopts()
    r(saving)        supplied saving() option
    r(name)          supplied name() option
    r(individual)    individual or blank
    r(isaving)       contents of isaving()
    r(iname)         contents of iname()
    r(subtitle#)     subtitle for individual graph #

Also see
[TS] irf — Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
[TS] var intro — Introduction to vector autoregressive models
[TS] vec intro — Introduction to vector error-correction models

Title
irf ograph — Overlaid graphs of IRFs, dynamic-multiplier functions, and FEVDs

Description    Quick start    Menu    Syntax
Options    Remarks and examples    Stored results    Also see

Description

irf ograph displays plots of irf results on one graph (one pair of axes). To become familiar with this command, type db irf ograph.
Quick start

Graph of an orthogonalized IRF myirf overlaid on cumulative IRF mycirf for dependent variables y1 and y2
    irf ograph (myirf y1 y2 oirf) (mycirf y1 y2 cirf)

As above, and include confidence bands and add a title
    irf ograph (myirf y1 y2 oirf) (mycirf y1 y2 cirf), ci ///
        title("My Title")

Note: irf commands can be used after var, svar, vec, arima, or arfima; see [TS] var, [TS] var svar, [TS] vec, [TS] arima, or [TS] arfima.

Menu

Statistics > Multivariate time series > IRF and FEVD analysis > Overlaid graph

Syntax

irf ograph (spec1) [(spec2) ... (spec15)] [, options]

where (speck) is

    (irfname impulsevar responsevar stat [, spec_options])

irfname is the name of a set of IRF results in the active IRF file or ".", which means the first named result in the active IRF file.

impulsevar should be specified as an endogenous variable for all statistics except dm and cdm; for those, specify as an exogenous variable.

responsevar is an endogenous variable name.

stat is one or more statistics from the list below:

stat     Description
irf      impulse–response function
oirf     orthogonalized impulse–response function
dm       dynamic-multiplier function
cirf     cumulative impulse–response function
coirf    cumulative orthogonalized impulse–response function
cdm      cumulative dynamic-multiplier function
fevd     Cholesky forecast-error variance decomposition
sirf     structural impulse–response function
sfevd    structural forecast-error variance decomposition

options             Description
Plots
  plot_options      define the IRF plots
  set(filename)     make filename active
Options
  common_options    level and steps
Y axis, X axis, Titles, Legend, Overall
  twoway_options    any options other than by() documented in [G-3] twoway options

plot_options           Description
Main
  set(filename)        make filename active
  irf(irfnames)        use irfnames IRF result sets
  impulse(impulsevar)  use impulsevar as impulse variables
  response(endogvars)  use endogenous variables as response variables
  ci                   add confidence bands to the graph

spec_options             Description
Options
  common_options         level and steps
Plot
  cline_options          affect rendition of the plotted lines
CI plot
  ciopts(area_options)   affect rendition of the confidence intervals

common_options    Description
Options
  level(#)        set confidence level; default is level(95)
  lstep(#)        use # for first step
  ustep(#)        use # for maximum step

common_options may be specified within a plot specification, globally, or in both. When specified in a plot specification, the common_options affect only the specification in which they are used. When supplied globally, the common_options affect all plot specifications. When supplied in both places, options in the plot specification take precedence.

Options

Plots

plot_options define the IRF plots and are found under the Main, Plot, and CI plot tabs.

set(filename) specifies the file to be made active; see [TS] irf set. If set() is not specified, the active file is used.

Main

set(filename) specifies the file to be made active; see [TS] irf set. If set() is not specified, the active file is used.

irf(irfnames) specifies the IRF result sets to be used. If irf() is not specified, each of the results in the active IRF file is used. (Files often contain just one set of IRF results saved under one irfname; in that case, those results are used.)

impulse(varlist) and response(endogvars) specify the impulse and response variables.
Usually one of each is specified, and one graph is drawn. If multiple variables are specified, a separate subgraph is drawn for each impulse–response combination. If impulse() and response() are not specified, subgraphs are drawn for all combinations of impulse and response variables.

ci adds confidence bands to the graph. The noci option may be used within a plot specification to suppress its confidence bands when the ci option is supplied globally.

Plot

cline_options affect the rendition of the plotted lines; see [G-3] cline options.

CI plot

ciopts(area_options) affects the rendition of the confidence bands for the plotted statistic; see [G-3] area options. ciopts() implies ci.

Options

level(#) specifies the confidence level, as a percentage, for confidence bands; see [U] 20.7 Specifying the width of confidence intervals.

lstep(#) specifies the first step, or period, to be included in the graph. lstep(0) is the default.

ustep(#), # >= 1, specifies the maximum step, or period, to be included.

Y axis, X axis, Titles, Legend, Overall

twoway_options are any of the options documented in [G-3] twoway options, excluding by(). These include options for titling the graph (see [G-3] title options) and for saving the graph to disk (see [G-3] saving option).

Remarks and examples

If you have not read [TS] irf, please do so.

irf ograph overlays plots of IRFs and FEVDs on one graph.

Example 1

We have previously issued the commands:

. use http://www.stata-press.com/data/r14/lutkepohl2
. var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lags(1/2) dfk
. irf create order1, step(10) set(myirf1, new)
. irf create order2, step(10) order(dln_inc dln_inv dln_consump)

We now wish to compare the oirf for impulse dln_inc and response dln_consump for two different Cholesky orderings:

. irf ograph (order1 dln_inc dln_consump oirf)
> (order2 dln_inc dln_consump oirf)

(figure omitted: overlaid plots of order1: oirf of dln_inc -> dln_consump and order2: oirf of dln_inc -> dln_consump against step)

Technical note

Graph options allow you to change the appearance of each plot. The following graph contains the plots of the FEVDs for impulse dln_inc and each response using the results from the first collection of results in the active IRF file (using the "." shortcut). In the second plot, we supply the clpat(dash) option (an abbreviation for clpattern(dash)) to give the line a dashed pattern. In the third plot, we supply the m(o) clpat(dash_dot) recast(connected) options to get small circles connected by a line with a dash–dot pattern; the cilines option plots the confidence bands by using lines instead of areas. We use the title() option to add a descriptive title to the graph and supply the ci option globally to add confidence bands to all the plots.
. irf ograph (. dln_inc dln_inc fevd)
> (. dln_inc dln_consump fevd, clpat(dash))
> (. dln_inc dln_inv fevd, cilines m(o) clpat(dash_dot)
> recast(connected))
> , ci title("Comparison of forecast-error variance decomposition")

(figure omitted: overlaid plots titled "Comparison of forecast-error variance decomposition", showing the fevd of dln_inc -> dln_inc, dln_inc -> dln_consump, and dln_inc -> dln_inv, each with its 95% CI, against step)

The clpattern() option is described in [G-3] connect options, msymbol() is described in [G-3] marker options, title() is described in [G-3] title options, and recast() is described in [G-3] advanced options.

Stored results

irf ograph stores the following in r():

Scalars
  r(plots)       number of plot specifications
  r(ciplots)     number of plotted confidence bands
Macros
  r(irfname#)    irfname from (spec#)
  r(impulse#)    impulse from (spec#)
  r(response#)   response from (spec#)
  r(stat#)       statistics from (spec#)
  r(ci#)         level from (spec#) or noci

Also see

[TS] irf — Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
[TS] var intro — Introduction to vector autoregressive models
[TS] vec intro — Introduction to vector error-correction models

Title

irf rename — Rename an IRF result in an IRF file

Description    Quick start    Menu    Syntax
Option    Remarks and examples    Stored results    Also see

Description

irf rename changes the name of a set of IRF results saved in the active IRF file.

Quick start

Rename impulse–response function oldirf in the current file to newirf
    irf rename oldirf newirf

As above, but for IRF file myirfs.irf
    irf rename oldirf newirf, set(myirfs)

Note: irf commands can be used after var, svar, vec, arima, or arfima; see [TS] var, [TS] var svar, [TS] vec, [TS] arima, or [TS] arfima.

Menu

Statistics > Multivariate time series > Manage IRF results and files > Rename IRF results

Syntax

irf rename oldname newname [, set(filename)]

Option

set(filename) specifies the file to be made active; see [TS] irf set. If set() is not specified, the active file is used.

Remarks and examples

If you have not read [TS] irf, please do so.

Example 1

. use http://www.stata-press.com/data/r14/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lags(1/2) dfk
(output omitted)

We create three sets of IRF results:

. irf create original, set(myirfs, replace)
(file myirfs.irf created)
(file myirfs.irf now active)
(file myirfs.irf updated)
. irf create order2, order(dln_inc dln_inv dln_consump)
(file myirfs.irf updated)
. irf create order3, order(dln_inc dln_consump dln_inv)
(file myirfs.irf updated)

. irf describe
Contains irf results from myirfs.irf (dated 11 Nov 2014 09:22)

irfname     model   endogenous variables and order (*)
original    var     dln_inv dln_inc dln_consump
order2      var     dln_inc dln_inv dln_consump
order3      var     dln_inc dln_consump dln_inv

(*) order is relevant only when model is var

Now let's rename IRF result original to order1.

. irf rename original order1
(81 real changes made)
original renamed to order1

. irf describe
Contains irf results from myirfs.irf (dated 11 Nov 2014 09:22)

irfname     model   endogenous variables and order (*)
order1      var     dln_inv dln_inc dln_consump
order2      var     dln_inc dln_inv dln_consump
order3      var     dln_inc dln_consump dln_inv

(*) order is relevant only when model is var

original has been renamed to order1.
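When the results to be renamed live in a file other than the active one, the set() option switches files and renames in one step. A minimal sketch (ours, not the manual's), reusing myirfs.irf from above with a hypothetical new name cholesky1:

. irf rename order1 cholesky1, set(myirfs)

Because set() makes myirfs.irf the active file, a follow-up irf describe would list cholesky1 in place of order1.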
Stored results

irf rename stores the following in r():

Macros
  r(irfnames)   irfnames after rename
  r(oldnew)     oldname newname

Also see

[TS] irf — Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
[TS] var intro — Introduction to vector autoregressive models
[TS] vec intro — Introduction to vector error-correction models

Title

irf set — Set the active IRF file

Description    Quick start    Menu    Syntax
Options    Remarks and examples    Stored results    Also see

Description

irf set without arguments reports the identity of the active IRF file, if there is one. irf set with a filename specifies that the file be created and set as the active file. irf set, clear specifies that, if any IRF file is set, it be unset and that there be no active IRF file.

Quick start

Display filename of active IRF file
    irf set

Set file myirfs.irf as the active file and create it if it does not exist
    irf set myirfs

Set file myirfs.irf as the active file, but replace myirfs.irf if it exists
    irf set myirfs, replace

Clear the active IRF file so that no files are active
    irf set, clear

Note: irf commands can be used after var, svar, vec, arima, or arfima; see [TS] var, [TS] var svar, [TS] vec, [TS] arima, or [TS] arfima.

Menu

Statistics > Multivariate time series > Manage IRF results and files > Set active IRF file

Syntax

Report identity of active file
    irf set

Set, and if necessary create, active file
    irf set irf_filename [, replace]

Clear any active IRF file
    irf set, clear

If irf_filename is specified without an extension, .irf is assumed.

Options

replace specifies that if irf_filename already exists, the file is to be erased and a new, empty IRF file is to be created in its place. If it does not already exist, a new, empty file is created.

clear unsets the active IRF file.

Remarks and examples

If you have not read [TS] irf, please do so.

irf set reports the identity of the active IRF file:

. irf set
no irf file active

irf set irf_filename creates and sets an IRF file:

. irf set results1
(file results1.irf now active)

We specified the name results1, and results1.irf became the active file.

irf set irf_filename can also be used to create a new file:

. use http://www.stata-press.com/data/r14/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. var dln_inc dln_consump, exog(l.dln_inv)
(output omitted)
. irf set results2
(file results2.irf created)
(file results2.irf now active)
. irf create order1
(file results2.irf updated)

Stored results

irf set stores the following in r():

Macros
  r(irffile)   name of active IRF file, if there is an active IRF

Also see

[TS] irf — Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
[TS] irf describe — Describe an IRF file
[TS] var intro — Introduction to vector autoregressive models
[TS] vec intro — Introduction to vector error-correction models

Title

irf table — Tables of IRFs, dynamic-multiplier functions, and FEVDs

Description    Quick start    Menu    Syntax
Options    Remarks and examples    Stored results    Also see

Description

irf table makes a table of the values of the requested statistics at each time since impulse. Each column represents a combination of an impulse variable and a response variable for each statistic from the named IRF results.
Quick start

Table of impulse–response function for dependent variables y1 and y2 given an unexpected shock to y1
    irf table irf, impulse(y1) response(y2)

As above, but for orthogonalized shocks
    irf table oirf, impulse(y1) response(y2)

As above, but with 3 as the common maximum step horizon for all tables
    irf table oirf, impulse(y1) response(y2) step(3)

As above, but with a separate table for each IRF in the active IRF file
    irf table oirf, impulse(y1) response(y2) step(3) individual

Note: irf commands can be used after var, svar, vec, arima, or arfima; see [TS] var, [TS] var svar, [TS] vec, [TS] arima, or [TS] arfima.

Menu

Statistics > Multivariate time series > IRF and FEVD analysis > Tables by impulse or response

Syntax

irf table [stat] [, options]

stat       Description
Main
  irf      impulse–response function
  oirf     orthogonalized impulse–response function
  dm       dynamic-multiplier function
  cirf     cumulative impulse–response function
  coirf    cumulative orthogonalized impulse–response function
  cdm      cumulative dynamic-multiplier function
  fevd     Cholesky forecast-error variance decomposition
  sirf     structural impulse–response function
  sfevd    structural forecast-error variance decomposition

If stat is not specified, all statistics are included, unless option nostructural is also specified, in which case sirf and sfevd are excluded. You may specify more than one stat.

options                  Description
Main
  set(filename)          make filename active
  irf(irfnames)          use irfnames IRF result sets
  impulse(impulsevar)    use impulsevar as impulse variables
  response(endogvars)    use endogenous variables as response variables
  individual             make an individual table for each result set
  title("text")          use text for overall table title
Options
  level(#)               set confidence level; default is level(95)
  noci                   suppress confidence intervals
  stderror               include standard errors in the tables
  nostructural           suppress sirf and sfevd from the default list of statistics
  step(#)                use common maximum step horizon # for all tables

Options

Main

set(filename) specifies the file to be made active; see [TS] irf set. If set() is not specified, the active file is used.

All results are obtained from one IRF file. If you have results in different files that you want in one table, use irf add to copy results into one file; see [TS] irf add.

irf(irfnames) specifies the IRF result sets to be used. If irf() is not specified, all the results in the active IRF file are used. (Files often contain just one set of IRF results, saved under one irfname; in that case, those results are used. When there are multiple IRF results, you may also wish to specify the individual option.)

impulse(impulsevar) specifies the impulse variables for which the statistics are to be reported. If impulse() is not specified, each model variable, in turn, is used. impulsevar should be specified as an endogenous variable for all statistics except dm or cdm; for those, specify as an exogenous variable.

response(endogvars) specifies the response variables for which the statistics are to be reported. If response() is not specified, each endogenous variable, in turn, is used.

individual specifies that each set of IRF results be placed in its own table, with its own title and footer. By default, irf table places all the IRF results in one table with one title and one footer. individual may not be combined with title().

title("text") specifies a title for the overall table.
Options

level(#) specifies the default confidence level, as a percentage, for confidence intervals, when they are reported. The default is level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals.

noci suppresses reporting of the confidence intervals for each statistic. noci is assumed when the model was fit by vec because no confidence intervals were estimated.

stderror specifies that standard errors for each statistic also be included in the table.

nostructural specifies that stat, when not specified, exclude sirf and sfevd.

step(#) specifies the maximum step horizon for all tables. If step() is not specified, each table is constructed using all steps available.

Remarks and examples

If you have not read [TS] irf, please do so. Also see [TS] irf graph, which produces output in graphical form, and see [TS] irf ctable, which also produces tabular output. irf ctable is more difficult to use but provides more control over how tables are formed.

Example 1

We have fit a model with var, and we saved the IRFs from two different orderings. The commands we previously used were

. use http://www.stata-press.com/data/r14/lutkepohl2
. var dln_inv dln_inc dln_consump
. irf set results4
. irf create ordera, step(8)
. irf create orderb, order(dln_inc dln_inv dln_consump) step(8)

We now wish to compare the two orderings:

. irf table oirf fevd, impulse(dln_inc) response(dln_consump) noci std
> title("Ordera versus orderb")

Ordera versus orderb

step   (1) oirf    (1) S.E.   (1) fevd   (1) S.E.
0      .005123     .000878    0          0
1      .001635     .000984    .288494    .077483
2      .002948     .000993    .294288    .073722
3      -.000221    .000662    .322454    .075562
4      .000811     .000586    .319227    .074063
5      .000462     .000333    .322579    .075019
6      .000044     .000275    .323552    .075371
7      .000151     .000162    .323383    .075314
8      .000091     .000114    .323499    .075386

step   (2) oirf    (2) S.E.   (2) fevd   (2) S.E.
0      .005461     .000925    0          0
1      .001578     .000988    .327807    .08159
2      .003307     .001042    .328795    .077519
3      -.00019     .000676    .370775    .080604
4      .000846     .000617    .366896    .079019
5      .000491     .000349    .370399    .079941
6      .000069     .000292    .371487    .080323
7      .000158     .000172    .371315    .080287
8      .000096     .000122    .371438    .080366

(1) irfname = ordera, impulse = dln_inc, and response = dln_consump
(2) irfname = orderb, impulse = dln_inc, and response = dln_consump

The output is displayed as a "single" table; because the table did not fit horizontally, it wrapped automatically. At the bottom of the table is a definition of the keys that appear at the top of each column. The results in the table above indicate that the orthogonalized IRFs do not change by much.

Example 2

Because the estimated FEVDs do change significantly, we might want to produce two tables that contain the estimated FEVDs and their 95% confidence intervals:
. irf table fevd, impulse(dln_inc) response(dln_consump) individual

Results from ordera

step   (1) fevd   (1) Lower   (1) Upper
0      0          0           0
1      .288494    .13663      .440357
2      .294288    .149797     .43878
3      .322454    .174356     .470552
4      .319227    .174066     .464389
5      .322579    .175544     .469613
6      .323552    .175826     .471277
7      .323383    .17577      .470995
8      .323499    .175744     .471253

95% lower and upper bounds reported
(1) irfname = ordera, impulse = dln_inc, and response = dln_consump

Results from orderb

step   (1) fevd   (1) Lower   (1) Upper
0      0          0           0
1      .327807    .167893     .487721
2      .328795    .17686      .48073
3      .370775    .212794     .528757
4      .366896    .212022     .52177
5      .370399    .213718     .52708
6      .371487    .214058     .528917
7      .371315    .213956     .528674
8      .371438    .213923     .528953

95% lower and upper bounds reported
(1) irfname = orderb, impulse = dln_inc, and response = dln_consump

Because we specified the individual option, the output contains two tables, one for each set of IRF results. Examining the results in the tables indicates that each of the estimated functions is well within the confidence interval of the other, so we conclude that the functions are not significantly different.

Technical note

Be careful in how you name variables when you fit models. Say that you fit one model with var and used time-series operators to form one of the endogenous variables

. var d.ln_inv ...

and in another model, you created a new variable:

. generate dln_inv = d.ln_inv
. var dln_inv ...

Say that you saved IRF results from both (perhaps they differ in the number of lags). Now you wish to use irf table to compare them. You would not be able to specify response(d.ln_inv) or response(dln_inv) because neither variable is in both models. Similarly, you could not specify impulse(d.ln_inv) or impulse(dln_inv) for the same reason.

All is not lost; if impulse() is not specified, all endogenous variables are used, and similarly if response() is not specified, so you could obtain the result you desired by simply not specifying the options, but you will also obtain a lot more, besides. If you want to specify the impulse() or response() options, be sure to name variables consistently.

Also, you may forget how the endogenous variables were named. If so, irf describe, detail can provide the answer. In irf describe's output, the endogenous variables are listed next to endog.

Stored results

If the individual option is not specified, irf table stores the following in r():

Scalars
  r(ncols)    number of columns in table
  r(k_umax)   number of distinct keys
  r(k)        number of specific table commands
Macros
  r(key#)     #th key
  r(tnotes)   list of keys applied to each column

If the individual option is specified, then for each irfname, irf table stores the following in r():

Scalars
  r(irfname_ncols)    number of columns in table for irfname
  r(irfname_k_umax)   number of distinct keys in table for irfname
  r(irfname_k)        number of specific table commands used to create table for irfname
Macros
  r(irfname_key#)     #th key for irfname table
  r(irfname_tnotes)   list of keys applied to each column in table for irfname

Also see

[TS] irf — Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
[TS] var intro — Introduction to vector autoregressive models
[TS] vec intro — Introduction to vector error-correction models

Title

mgarch — Multivariate GARCH models

Description    Syntax    Remarks and examples    References    Also see

Description

mgarch estimates the parameters of multivariate generalized autoregressive conditional heteroskedasticity (MGARCH) models.
MGARCH models allow both the conditional mean and the conditional covariance to be dynamic. The general MGARCH model is so flexible that not all the parameters can be estimated. For this reason, there are many MGARCH models that parameterize the problem more parsimoniously.

mgarch implements four commonly used parameterizations: the diagonal vech model, the constant conditional correlation model, the dynamic conditional correlation model, and the time-varying conditional correlation model.

Syntax

mgarch model eq [eq ... eq] [if] [in] [, ...]

Family                                  model
Vech
  diagonal vech                         dvech
Conditional correlation
  constant conditional correlation      ccc
  dynamic conditional correlation       dcc
  varying conditional correlation       vcc

Remarks and examples

Remarks are presented under the following headings:

    An introduction to MGARCH models
    Diagonal vech MGARCH models
    Conditional correlation MGARCH models
        Constant conditional correlation MGARCH model
        Dynamic conditional correlation MGARCH model
        Varying conditional correlation MGARCH model
    Error distributions and quasimaximum likelihood
    Treatment of missing data

An introduction to MGARCH models

Multivariate GARCH models allow the conditional covariance matrix of the dependent variables to follow a flexible dynamic structure and allow the conditional mean to follow a vector-autoregressive (VAR) structure.

The general MGARCH model is too flexible for most problems. There are many restricted MGARCH models in the literature because there is no parameterization that always provides an optimal trade-off between flexibility and parsimony. mgarch implements four commonly used parameterizations: the diagonal vech (DVECH) model, the constant conditional correlation (CCC) model, the dynamic conditional correlation (DCC) model, and the time-varying conditional correlation (VCC) model.

Bollerslev, Engle, and Wooldridge (1988); Bollerslev, Engle, and Nelson (1994); Bauwens, Laurent, and Rombouts (2006); Silvennoinen and Teräsvirta (2009); and Engle (2009) provide general introductions to MGARCH models. We provide a quick introduction organized around the models implemented in mgarch.

We give a formal definition of the general MGARCH model to establish notation that facilitates comparisons of the models. The general MGARCH model is given by

    y_t = C x_t + ε_t
    ε_t = H_t^{1/2} ν_t

where

    y_t is an m × 1 vector of dependent variables;
    C is an m × k matrix of parameters;
    x_t is a k × 1 vector of independent variables, which may contain lags of y_t;
    H_t^{1/2} is the Cholesky factor of the time-varying conditional covariance matrix H_t; and
    ν_t is an m × 1 vector of zero-mean, unit-variance, and independent and identically distributed innovations.

In the general MGARCH model, H_t is a matrix generalization of univariate GARCH models. For example, in a general MGARCH model with one autoregressive conditional heteroskedastic (ARCH) term and one GARCH term,

    vech(H_t) = s + A vech(ε_{t−1} ε′_{t−1}) + B vech(H_{t−1})    (1)

where the vech() function stacks the unique elements that lie on or below the main diagonal in a symmetric matrix into a vector, s is a vector of parameters, and A and B are conformable matrices of parameters. Because this model uses the vech() function to extract and model the unique elements of H_t, it is also known as the VECH model.

Because it is a conditional covariance matrix, H_t must be positive definite.
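As a worked illustration of the vech() notation in (1), which is ours rather than the manual's, take m = 2:

    vech(H_t) = vech[ h_{11,t}  h_{12,t} ; h_{12,t}  h_{22,t} ] = (h_{11,t}, h_{21,t}, h_{22,t})′

so vech(H_t) is 3 × 1, s is 3 × 1, and A and B are each 3 × 3, giving 3 + 9 + 9 = 21 parameters in (1) alone. In general, vech(H_t) has m(m + 1)/2 elements.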
Equation (1) can be used to show that the parameters in s, A, and B are not uniquely identified and that further restrictions must be placed on s, A, and B to ensure that H_t is positive definite for all t.

The various MGARCH models proposed in the literature differ in how they trade off flexibility and parsimony in their specifications for H_t. Increased flexibility allows a model to capture more complex H_t processes. Increased parsimony makes parameter estimation feasible for more datasets. An important measure of the flexibility–parsimony trade-off is how fast the number of model parameters increases with the number of time series m, because many applied models use multiple time series.

Diagonal vech MGARCH models

Bollerslev, Engle, and Wooldridge (1988) derived the diagonal vech (DVECH) model by restricting A and B to be diagonal. Although the DVECH model is much more parsimonious than the general model, it can only handle a few series because the number of parameters grows quadratically with the number of series. For example, there are 3m(m + 1)/2 parameters in a DVECH(1,1) model for H_t.

Despite the large number of parameters, the diagonal structure implies that each conditional variance and each conditional covariance depends on its own past but not on the past of the other conditional variances and covariances. Formally, in the DVECH(1,1) model each element of H_t is modeled by

    h_{ij,t} = s_{ij} + a_{ij} ε_{i,t−1} ε_{j,t−1} + b_{ij} h_{ij,t−1}

Parameter estimation can be difficult because it requires that H_t be positive definite for each t. The requirement that H_t be positive definite for each t imposes complicated restrictions on the off-diagonal elements.

See [TS] mgarch dvech for more details about this model.

Conditional correlation MGARCH models

Conditional correlation (CC) models use nonlinear combinations of univariate GARCH models to represent the conditional covariances. In each of the conditional correlation models, the conditional covariance matrix is positive definite by construction and has a simple structure, which facilitates parameter estimation. CC models have a slower parameter growth rate than DVECH models as the number of time series increases.

In CC models, H_t is decomposed into a matrix of conditional correlations R_t and a diagonal matrix of conditional variances D_t:

    H_t = D_t^{1/2} R_t D_t^{1/2}    (2)

where each conditional variance follows a univariate GARCH process and the parameterizations of R_t vary across models.

Equation (2) implies that

    h_{ij,t} = ρ_{ij,t} σ_{i,t} σ_{j,t}    (3)

where σ²_{i,t} is modeled by a univariate GARCH process. Equation (3) highlights that CC models use nonlinear combinations of univariate GARCH models to represent the conditional covariances and that the parameters in the model for ρ_{ij,t} describe the extent to which the errors from equations i and j move together.

Comparing (1) and (2) shows that the number of parameters increases more slowly with the number of time series in a CC model than in a DVECH model.

The three CC models implemented in mgarch differ in how they parameterize R_t.

Constant conditional correlation MGARCH model

Bollerslev (1990) proposed a CC MGARCH model in which the correlation matrix is time invariant. It is for this reason that the model is known as a constant conditional correlation (CCC) MGARCH model. Restricting R_t to a constant matrix reduces the number of parameters and simplifies the estimation but may be too strict in many empirical applications.
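To put numbers on that reduction (our arithmetic, not the manual's): with m = 5 series, a DVECH(1,1) specification carries 3m(m + 1)/2 = 45 covariance parameters, while a CCC model with univariate GARCH(1,1) variances carries 3m = 15 variance parameters plus m(m − 1)/2 = 10 constant correlations, for 25 in total; the gap widens quickly as m grows.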
See [TS] mgarch ccc for more details about this model.

Dynamic conditional correlation MGARCH model

Engle (2002) introduced a dynamic conditional correlation (DCC) MGARCH model in which the conditional quasicorrelations R_t follow a GARCH(1,1)-like process. (As described by Engle [2009] and Aielli [2009], the parameters in R_t are not standardized to be correlations and are thus known as quasicorrelations.) To preserve parsimony, all the conditional quasicorrelations are restricted to follow the same dynamics. The DCC model is significantly more flexible than the CCC model without introducing an unestimable number of parameters for a reasonable number of series.

See [TS] mgarch dcc for more details about this model.

Varying conditional correlation MGARCH model

Tse and Tsui (2002) derived the varying conditional correlation (VCC) MGARCH model in which the conditional correlations at each period are a weighted sum of a time-invariant component, a measure of recent correlations among the residuals, and last period's conditional correlations. For parsimony, all the conditional correlations are restricted to follow the same dynamics.

See [TS] mgarch vcc for more details about this model.

Error distributions and quasimaximum likelihood

By default, mgarch dvech, mgarch ccc, mgarch dcc, and mgarch vcc estimate the parameters of MGARCH models by maximum likelihood (ML), assuming that the errors come from a multivariate normal distribution. Both the ML estimator and the quasi-maximum likelihood (QML) estimator, which drops the normality assumption, are assumed to be consistent and normally distributed in large samples; see Jeantheau (1998), Berkes and Horváth (2003), Comte and Lieberman (2003), Ling and McAleer (2003), and Fiorentini and Sentana (2007). Specify vce(robust) to estimate the parameters by QML. The QML parameter estimates are the same as the ML estimates, but the VCEs are different.

Based on low-level assumptions, Jeantheau (1998), Comte and Lieberman (2003), and Ling and McAleer (2003) prove that some of the ML and QML estimators implemented in mgarch are consistent and asymptotically normal. Based on higher-level assumptions, Fiorentini and Sentana (2007) prove that all the ML and QML estimators implemented in mgarch are consistent and asymptotically normal. The low-level assumption proofs specify the technical restrictions on the data-generating processes more precisely than the high-level proofs, but they do not cover as many models or cases as the high-level proofs.

It is generally accepted that there could be more low-level theoretical work done to substantiate the claims that the ML and QML estimators are consistent and asymptotically normally distributed. These widely applied estimators have been subjected to many Monte Carlo studies that show that the large-sample theory performs well in finite samples.

The distribution(t) option causes the mgarch commands to estimate the parameters of the corresponding model by ML assuming that the errors come from a multivariate Student t distribution. The choice between the multivariate normal and the multivariate t distributions is one between robustness and efficiency. If the disturbances come from a multivariate Student t, then the ML estimates based on the multivariate Student t assumption will be consistent and efficient, while the QML estimates based on the multivariate normal assumption will be consistent but not efficient.
In contrast, if the disturbances come from a well-behaved distribution that is neither multivariate Student t nor multivariate normal, then the ML estimates based on the multivariate Student t assumption will not be consistent, while the QML estimates based on the multivariate normal assumption will be consistent but not efficient.

Fiorentini and Sentana (2007) compare the ML and QML estimators implemented in mgarch and provide many useful technical results pertaining to the estimators.

Treatment of missing data

mgarch allows for gaps due to missing data. The unconditional expectations are substituted for the dynamic components that cannot be computed because of gaps. This method of handling gaps can only handle the case in which g/T goes to zero as T goes to infinity, where g is the number of observations lost to gaps in the data and T is the number of nonmissing observations.

References

Aielli, G. P. 2009. Dynamic Conditional Correlations: On Properties and Estimation. Working paper, Dipartimento di Statistica, University of Florence, Florence, Italy.

Bauwens, L., S. Laurent, and J. V. K. Rombouts. 2006. Multivariate GARCH models: A survey. Journal of Applied Econometrics 21: 79–109.

Berkes, I., and L. Horváth. 2003. The rate of consistency of the quasi-maximum likelihood estimator. Statistics and Probability Letters 61: 133–143.

Bollerslev, T. 1990. Modelling the coherence in short-run nominal exchange rates: A multivariate generalized ARCH model. Review of Economics and Statistics 72: 498–505.

Bollerslev, T., R. F. Engle, and D. B. Nelson. 1994. ARCH models. In Vol. 4 of Handbook of Econometrics, ed. R. F. Engle and D. L. McFadden. Amsterdam: Elsevier.

Bollerslev, T., R. F. Engle, and J. M. Wooldridge. 1988. A capital asset pricing model with time-varying covariances. Journal of Political Economy 96: 116–131.

Comte, F., and O. Lieberman. 2003. Asymptotic theory for multivariate GARCH processes. Journal of Multivariate Analysis 84: 61–84.

Engle, R. F. 2002. Dynamic conditional correlation: A simple class of multivariate generalized autoregressive conditional heteroskedasticity models. Journal of Business & Economic Statistics 20: 339–350.

———. 2009. Anticipating Correlations: A New Paradigm for Risk Management. Princeton, NJ: Princeton University Press.

Fiorentini, G., and E. Sentana. 2007. On the efficiency and consistency of likelihood estimation in multivariate conditionally heteroskedastic dynamic regression models. Working paper 0713, CEMFI, Madrid, Spain. ftp://ftp.cemfi.es/wp/07/0713.pdf.

Jeantheau, T. 1998. Strong consistency of estimators for multivariate ARCH models. Econometric Theory 14: 70–86.

Ling, S., and M. McAleer. 2003. Asymptotic theory for a vector ARMA–GARCH model. Econometric Theory 19: 280–310.

Silvennoinen, A., and T. Teräsvirta. 2009. Multivariate GARCH models. In Handbook of Financial Time Series, ed. T. G. Andersen, R. A. Davis, J.-P. Kreiß, and T. Mikosch, 201–229. New York: Springer.

Tse, Y. K., and A. K. C. Tsui. 2002. A multivariate generalized autoregressive conditional heteroscedasticity model with time-varying correlations. Journal of Business & Economic Statistics 20: 351–362.
Also see

[TS] arch — Autoregressive conditional heteroskedasticity (ARCH) family of estimators
[TS] var — Vector autoregressive models
[U] 20 Estimation and postestimation commands

Title

mgarch ccc — Constant conditional correlation multivariate GARCH models

Description    Quick start    Menu    Syntax
Options    Remarks and examples    Stored results    Methods and formulas
References    Also see

Description

mgarch ccc estimates the parameters of constant conditional correlation (CCC) multivariate generalized autoregressive conditionally heteroskedastic (MGARCH) models in which the conditional variances are modeled as univariate generalized autoregressive conditionally heteroskedastic (GARCH) models and the conditional covariances are modeled as nonlinear functions of the conditional variances. The conditional correlation parameters that weight the nonlinear combinations of the conditional variance are constant in the CCC MGARCH model.

The CCC MGARCH model is less flexible than the dynamic conditional correlation MGARCH model (see [TS] mgarch dcc) and varying conditional correlation MGARCH model (see [TS] mgarch vcc), which specify GARCH-like processes for the conditional correlations. The conditional correlation MGARCH models are more parsimonious than the diagonal vech MGARCH model (see [TS] mgarch dvech).

Quick start

Fit constant conditional correlation multivariate GARCH with first- and second-order ARCH components for dependent variables y1 and y2 using tsset data
    mgarch ccc (y1 y2), arch(1 2)

Add regressors x1 and x2 and first-order GARCH component
    mgarch ccc (y1 y2 = x1 x2), arch(1 2) garch(1)

Add z1 to the model for the conditional heteroskedasticity
    mgarch ccc (y1 y2 = x1 x2), arch(1 2) garch(1) het(z1)

Menu

Statistics > Multivariate time series > Multivariate GARCH

Syntax

mgarch ccc eq [eq ... eq] [if] [in] [, options]

where each eq has the form

    (depvars = [indepvars] [, eqoptions])

options                     Description
Model
  arch(numlist)             ARCH terms for all equations
  garch(numlist)            GARCH terms for all equations
  het(varlist)              include varlist in the specification of the conditional variance for all equations
  distribution(dist [#])    use dist distribution for errors; may be gaussian (synonym normal) or t; default is gaussian
  unconcentrated            perform optimization on unconcentrated log likelihood
  constraints(numlist)      apply linear constraints
SE/Robust
  vce(vcetype)              vcetype may be oim or robust
Reporting
  level(#)                  set confidence level; default is level(95)
  nocnsreport               do not display constraints
  display_options           control columns and column formats, row spacing, line width, display of omitted variables and base and empty cells, and factor-variable labeling
Maximization
  maximize_options          control the maximization process; seldom used
  from(matname)             initial values for the coefficients; seldom used
  coeflegend                display legend instead of statistics

eqoptions          Description
  noconstant       suppress constant term in the mean equation
  arch(numlist)    ARCH terms
  garch(numlist)   GARCH terms
  het(varlist)     include varlist in the specification of the conditional variance

You must tsset your data before using mgarch ccc; see [TS] tsset.
indepvars and varlist may contain factor variables; see [U] 11.4.3 Factor variables.
depvars, indepvars, and varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, fp, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.

Options

Model

arch(numlist) specifies the ARCH terms for all equations in the model. By default, no ARCH terms are specified.

garch(numlist) specifies the GARCH terms for all equations in the model. By default, no GARCH terms are specified.

het(varlist) specifies that varlist be included in the specification of the conditional variance for all equations. This varlist enters the variance specification collectively as multiplicative heteroskedasticity.

distribution(dist [#]) specifies the assumed distribution for the errors. dist may be gaussian, normal, or t.

gaussian and normal are synonyms; each causes mgarch ccc to assume that the errors come from a multivariate normal distribution. # cannot be specified with either of them.

t causes mgarch ccc to assume that the errors follow a multivariate Student t distribution, and the degree-of-freedom parameter is estimated along with the other parameters of the model. If distribution(t #) is specified, then mgarch ccc uses a multivariate Student t distribution with # degrees of freedom. # must be greater than 2.

unconcentrated specifies that optimization be performed on the unconcentrated log likelihood. The default is to start with the concentrated log likelihood.

constraints(numlist) specifies linear constraints to apply to the parameter estimates.

SE/Robust

vce(vcetype) specifies the estimator for the variance–covariance matrix of the estimator.

vce(oim), the default, specifies to use the observed information matrix (OIM) estimator.

vce(robust) specifies to use the Huber/White/sandwich estimator.

Reporting

level(#); see [R] estimation options.

nocnsreport; see [R] estimation options.

display_options: noci, nopvalues, noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.

Maximization

maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace, gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#), nrtolerance(#), nonrtolerance, and from(matname); see [R] maximize for all options except from(), and see below for information on from(). These options are seldom used.

from(matname) specifies initial values for the coefficients. from(b0) causes mgarch ccc to begin the optimization algorithm with the values in b0. b0 must be a row vector, and the number of columns must equal the number of parameters in the model.

The following option is available with mgarch ccc but is not shown in the dialog box:

coeflegend; see [R] estimation options.

Eqoptions

noconstant suppresses the constant term in the mean equation.

arch(numlist) specifies the ARCH terms in the equation. By default, no ARCH terms are specified. This option may not be specified with model-level arch().

garch(numlist) specifies the GARCH terms in the equation. By default, no GARCH terms are specified. This option may not be specified with model-level garch().

het(varlist) specifies that varlist be included in the specification of the conditional variance. This varlist enters the variance specification collectively as multiplicative heteroskedasticity. This option may not be specified with model-level het().
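The equation-level options let different dependent variables carry different conditional-variance specifications within one command. A minimal sketch of the syntax above, with hypothetical tsset variables y1, y2, x1, and z1 (our illustration, not an example from the manual):

. mgarch ccc (y1 = x1, arch(1) garch(1)) (y2 = , noconstant arch(1/2) het(z1))

Here y1 gets a GARCH(1,1) conditional variance, while y2 gets two ARCH terms with multiplicative heteroskedasticity in z1.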
Remarks and examples

We assume that you have already read [TS] mgarch, which provides an introduction to MGARCH models and the methods implemented in mgarch ccc.

MGARCH models are dynamic multivariate regression models in which the conditional variances and covariances of the errors follow an autoregressive-moving-average structure. The CCC MGARCH model uses a nonlinear combination of univariate GARCH models in which the cross-equation weights are time invariant to model the conditional covariance matrix of the disturbances.

As discussed in [TS] mgarch, MGARCH models differ in the parsimony and flexibility of their specifications for a time-varying conditional covariance matrix of the disturbances, denoted by H_t. In the conditional correlation family of MGARCH models, the diagonal elements of H_t are modeled as univariate GARCH models, whereas the off-diagonal elements are modeled as nonlinear functions of the diagonal terms. In the CCC MGARCH model,

    h_{ij,t} = ρ_{ij} √(h_{ii,t} h_{jj,t})

where the diagonal elements h_{ii,t} and h_{jj,t} follow univariate GARCH processes and ρ_{ij} is a time-invariant weight interpreted as a conditional correlation.

In the dynamic conditional correlation (DCC) and varying conditional correlation (VCC) MGARCH models discussed in [TS] mgarch dcc and [TS] mgarch vcc, the ρ_{ij} are allowed to vary over time. Although the conditional-correlation structure provides a useful trade-off between parsimony and flexibility in the DCC MGARCH and VCC MGARCH models, the time-invariant parameterization used in the CCC MGARCH model is generally viewed as too restrictive for many applications; see Silvennoinen and Teräsvirta (2009). The baseline CCC MGARCH estimates are frequently compared with DCC MGARCH and VCC MGARCH estimates.

Technical note

Formally, the CCC MGARCH model derived by Bollerslev (1990) can be written as

    y_t = C x_t + ε_t
    ε_t = H_t^{1/2} ν_t
    H_t = D_t^{1/2} R D_t^{1/2}

where

    y_t is an m × 1 vector of dependent variables;
    C is an m × k matrix of parameters;
    x_t is a k × 1 vector of independent variables, which may contain lags of y_t;
    H_t^{1/2} is the Cholesky factor of the time-varying conditional covariance matrix H_t;
    ν_t is an m × 1 vector of normal, independent, and identically distributed innovations;
    D_t is a diagonal matrix of conditional variances,

        D_t = diag(σ²_{1,t}, σ²_{2,t}, ..., σ²_{m,t})

    in which each σ²_{i,t} evolves according to a univariate GARCH model of the form

        σ²_{i,t} = s_i + Σ_{j=1}^{p_i} α_j ε²_{i,t−j} + Σ_{j=1}^{q_i} β_j σ²_{i,t−j}

    by default, or

        σ²_{i,t} = exp(γ_i z_{i,t}) + Σ_{j=1}^{p_i} α_j ε²_{i,t−j} + Σ_{j=1}^{q_i} β_j σ²_{i,t−j}

    when the het() option is specified, where γ_i is a 1 × p vector of parameters, z_i is a p × 1 vector of independent variables including a constant term, the α_j's are ARCH parameters, and the β_j's are GARCH parameters; and

    R is a matrix of time-invariant unconditional correlations of the standardized residuals D_t^{−1/2} ε_t,

        R = ( 1       ρ_{12}   ···   ρ_{1m} )
            ( ρ_{12}  1        ···   ρ_{2m} )
            ( ···     ···      ···   ···    )
            ( ρ_{1m}  ρ_{2m}   ···   1      )

This model is known as the constant conditional correlation MGARCH model because R is time invariant.

Some examples

Example 1: Model with common covariates

We have daily data on the stock returns of three car manufacturers—Toyota, Nissan, and Honda, from January 2, 2003, to December 31, 2010—in the variables toyota, nissan, and honda. We model the conditional means of the returns as a first-order vector autoregressive process and the conditional covariances as a CCC MGARCH process in which the variance of each disturbance term follows a GARCH(1,1) process.

We specify the noconstant option, because the returns have mean zero. The estimated constants in the variance equations are near zero in this example because of how the data are scaled.
We model the conditional means of the returns as a first-order vector autoregressive process and the conditional covariances as a CCC MGARCH process in which the variance of each disturbance term follows a GARCH(1,1) process. We specify the noconstant option, because the returns have mean zero. The estimated constants in the variance equations are near zero in this example because of how the data are scaled. mgarch ccc — Constant conditional correlation multivariate GARCH models . use http://www.stata-press.com/data/r14/stocks (Data from Yahoo! Finance) . mgarch ccc (toyota nissan honda = L.toyota L.nissan L.honda, noconstant), > arch(1) garch(1) Calculating starting values.... Optimizing concentrated log likelihood (setting technique to bhhh) Iteration 0: log likelihood = 16898.994 Iteration 1: log likelihood = 17008.914 Iteration 2: log likelihood = 17156.946 Iteration 3: log likelihood = 17249.527 Iteration 4: log likelihood = 17287.251 Iteration 5: log likelihood = 17313.5 Iteration 6: log likelihood = 17335.087 Iteration 7: log likelihood = 17356.534 Iteration 8: log likelihood = 17376.051 Iteration 9: log likelihood = 17400.035 (switching technique to nr) Iteration 10: log likelihood = 17423.634 Iteration 11: log likelihood = 17440.261 Iteration 12: log likelihood = 17446.381 Iteration 13: log likelihood = 17447.614 Iteration 14: log likelihood = 17447.645 Iteration 15: log likelihood = 17447.645 Optimizing unconcentrated log likelihood Iteration 0: log likelihood = 17447.645 Iteration 1: log likelihood = 17447.651 Iteration 2: log likelihood = 17447.651 Constant conditional correlation MGARCH model Sample: 1 - 2015 Distribution: Gaussian Log likelihood = 17447.65 Coef. Std. Err. z Number of obs Wald chi2(9) Prob > chi2 P>|z| = = = 2,014 17.46 0.0420 [95% Conf. Interval] toyota toyota L1. -.0537817 .0353211 -1.52 0.128 -.1230098 .0154463 nissan L1. .026686 .024841 1.07 0.283 -.0220015 .0753734 honda L1. -.0043073 .0302761 -0.14 0.887 -.0636473 .0550327 ARCH_toyota arch L1. .0615321 .0087313 7.05 0.000 .0444191 .0786452 garch L1. .9213798 .0110412 83.45 0.000 .8997395 .9430201 _cons 4.42e-06 1.12e-06 3.93 0.000 2.21e-06 6.62e-06 353 354 mgarch ccc — Constant conditional correlation multivariate GARCH models nissan toyota L1. -.0232321 .0400563 -0.58 0.562 -.1017411 .0552769 nissan L1. -.0299552 .0309362 -0.97 0.333 -.0905891 .0306787 honda L1. .0369229 .0360532 1.02 0.306 -.0337401 .1075859 ARCH_nissan arch L1. .0740294 .0119353 6.20 0.000 .0506366 .0974222 garch L1. .9102548 .0142328 63.95 0.000 .882359 .9381506 _cons 6.36e-06 1.76e-06 3.61 0.000 2.91e-06 9.81e-06 toyota L1. -.0378616 .036792 -1.03 0.303 -.1099727 .0342495 nissan L1. .0551649 .0272559 2.02 0.043 .0017443 .1085855 honda L1. -.0431919 .0331268 -1.30 0.192 -.1081193 .0217354 ARCH_honda arch L1. .0433036 .0070224 6.17 0.000 .0295399 .0570674 garch L1. .939117 .010131 92.70 0.000 .9192605 .9589735 _cons 5.02e-06 1.31e-06 3.83 0.000 2.45e-06 7.59e-06 .6532264 .0128035 51.02 0.000 .628132 .6783208 .7185412 .0108132 66.45 0.000 .6973477 .7397346 .6298972 .0135336 46.54 0.000 .6033717 .6564226 honda corr(toyota, nissan) corr(toyota, honda) corr(nissan, honda) The iteration log has three parts: the dots from the search for initial values, the iteration log from optimizing the concentrated log likelihood, and the iteration log from maximizing the unconcentrated log likelihood. A detailed discussion of the optimization methods can be found in Methods and formulas. 
The header describes the estimation sample and reports a Wald test against the null hypothesis that all the coefficients on the independent variables in the mean equations are zero. Here the null hypothesis is rejected at the 5% level.

The output table first presents results for the mean or variance parameters used to model each dependent variable. Subsequently, the output table presents results for the conditional correlation parameters. For example, the conditional correlation between the standardized residuals for Toyota and Nissan is estimated to be 0.65.

The output above indicates that we may not need all the vector autoregressive parameters, but that each of the univariate ARCH, univariate GARCH, and conditional correlation parameters are statistically significant. That the estimated conditional correlation parameters are positive and significant indicates that the returns on these stocks rise or fall together.

That the conditional correlations are time invariant is a restrictive assumption. The DCC MGARCH model and the VCC MGARCH model nest the CCC MGARCH model. When we test the time-invariance assumption with Wald tests on the parameters of these more general models in [TS] mgarch dcc and [TS] mgarch vcc, we reject the null hypothesis that these conditional correlations are time invariant.

Example 2: Model with covariates that differ by equation

We improve the previous example by removing the insignificant parameters from the model. To remove these parameters, we specify the honda equation separately from the toyota and nissan equations:

. mgarch ccc (toyota nissan = , noconstant) (honda = L.nissan, noconstant),
> arch(1) garch(1)
Calculating starting values....
Optimizing concentrated log likelihood
(setting technique to bhhh)
Iteration 0:   log likelihood =   16886.88
Iteration 1:   log likelihood =  16974.779
Iteration 2:   log likelihood =  17147.893
Iteration 3:   log likelihood =  17247.473
Iteration 4:   log likelihood =  17285.549
Iteration 5:   log likelihood =  17311.153
Iteration 6:   log likelihood =  17333.588
Iteration 7:   log likelihood =  17353.717
Iteration 8:   log likelihood =  17374.895
Iteration 9:   log likelihood =  17400.669
(switching technique to nr)
Iteration 10:  log likelihood =  17425.661
Iteration 11:  log likelihood =  17436.789
Iteration 12:  log likelihood =   17439.74
Iteration 13:  log likelihood =  17439.865
Iteration 14:  log likelihood =  17439.866
Optimizing unconcentrated log likelihood
Iteration 0:   log likelihood =  17439.865
Iteration 1:   log likelihood =  17439.872
Iteration 2:   log likelihood =  17439.872

Constant conditional correlation MGARCH model
Sample: 1 - 2015                              Number of obs   =      2,014
Distribution: Gaussian                        Wald chi2(1)    =       1.81
Log likelihood = 17439.87                     Prob > chi2     =     0.1781

                    Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ARCH_toyota
        arch
         L1.    .0619604   .0087942     7.05   0.000      .044724    .0791968
       garch
         L1.    .9208961   .0110995    82.97   0.000     .8991414    .9426508
       _cons    4.43e-06   1.13e-06     3.94   0.000     2.23e-06    6.64e-06
ARCH_nissan
        arch
         L1.    .0773095    .012328     6.27   0.000     .0531471    .1014719
       garch
         L1.     .906088   .0147303    61.51   0.000     .8772171    .9349589
       _cons    6.77e-06   1.85e-06     3.66   0.000     3.14e-06    .0000104
honda
      nissan
         L1.    .0186628   .0138575     1.35   0.178    -.0084975    .0458231
ARCH_honda
        arch
         L1.    .0433741    .006996     6.20   0.000     .0296622    .0570861
       garch
         L1.    .9391094   .0100707    93.25   0.000     .9193712    .9588476
       _cons    5.02e-06   1.31e-06     3.83   0.000     2.45e-06    7.60e-06
corr(toyota, nissan)
                 .652299   .0128271    50.85   0.000     .6271583    .6774396
corr(toyota, honda)
                .7189531   .0108005    66.57   0.000     .6977845    .7401218
corr(nissan, honda)
                 .628435   .0135653    46.33   0.000     .6018475    .6550225

It turns out that the coefficient on L1.nissan in the honda equation is now statistically insignificant. We could further improve the model by removing L1.nissan from the model. As expected, removing the insignificant parameters from conditional mean equations had almost no effect on the estimated conditional variance parameters.

There is no mean equation for Toyota or Nissan. In [TS] mgarch ccc postestimation, we discuss prediction from models without covariates.

Example 3: Model with constraints

Here we fit a bivariate CCC MGARCH model for the Toyota and Nissan shares. We believe that the shares of these car manufacturers follow the same process, so we impose the constraints that the ARCH and the GARCH coefficients are the same for the two companies.

. constraint 1 _b[ARCH_toyota:L.arch] = _b[ARCH_nissan:L.arch]
. constraint 2 _b[ARCH_toyota:L.garch] = _b[ARCH_nissan:L.garch]
. mgarch ccc (toyota nissan = , noconstant), arch(1) garch(1) constraints(1 2)
Calculating starting values....
Optimizing concentrated log likelihood
(setting technique to bhhh)
Iteration 0:   log likelihood =  10317.225
Iteration 1:   log likelihood =  10630.464
Iteration 2:   log likelihood =  10865.964
Iteration 3:   log likelihood =  11063.329
(output omitted)
Iteration 8:   log likelihood =  11273.962
Iteration 9:   log likelihood =  11274.409
(switching technique to nr)
Iteration 10:  log likelihood =  11274.494
Iteration 11:  log likelihood =  11274.499
Iteration 12:  log likelihood =  11274.499
Optimizing unconcentrated log likelihood
Iteration 0:   log likelihood =  11274.499
Iteration 1:   log likelihood =  11274.501
Iteration 2:   log likelihood =  11274.501

Constant conditional correlation MGARCH model
Sample: 1 - 2015                              Number of obs   =      2,015
Distribution: Gaussian                        Wald chi2(.)    =          .
Log likelihood = 11274.5                      Prob > chi2     =          .
 ( 1)  [ARCH_toyota]L.arch - [ARCH_nissan]L.arch = 0
 ( 2)  [ARCH_toyota]L.garch - [ARCH_nissan]L.garch = 0

                    Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ARCH_toyota
        arch
         L1.    .0742678   .0095464     7.78   0.000     .0555572    .0929785
       garch
         L1.    .9131674   .0111558    81.86   0.000     .8913024    .9350323
       _cons    3.77e-06   1.02e-06     3.71   0.000     1.78e-06    5.77e-06
ARCH_nissan
        arch
         L1.    .0742678   .0095464     7.78   0.000     .0555572    .0929785
       garch
         L1.    .9131674   .0111558    81.86   0.000     .8913024    .9350323
       _cons    5.30e-06   1.36e-06     3.89   0.000     2.63e-06    7.97e-06
corr(toyota, nissan)
                 .651389   .0128482    50.70   0.000     .6262071    .6765709

We could test our constraints by fitting the unconstrained model and performing a likelihood-ratio test. The results indicate that the restricted model is preferable.

Example 4: Model with a GARCH term

In this example, we have data on fictional stock returns for the Acme and Anvil corporations and we believe that the movement of the two stocks is governed by different processes. We specify one ARCH and one GARCH term for the conditional variance equation for Acme and two ARCH terms for the conditional variance equation for Anvil. In addition, we include the lagged value of the stock return for Apex, the main subsidiary of Anvil corporation, in the variance equation of Anvil.
Example 4: Model with a GARCH term

In this example, we have data on fictional stock returns for the Acme and Anvil corporations, and we believe that the movement of the two stocks is governed by different processes. We specify one ARCH and one GARCH term for the conditional variance equation for Acme and two ARCH terms for the conditional variance equation for Anvil. In addition, we include the lagged value of the stock return for Apex, the main subsidiary of Anvil corporation, in the variance equation of Anvil. For Acme, we have data on the changes in an index of futures prices of products related to those produced by Acme in afrelated. For Anvil, we have data on the changes in an index of futures prices of inputs used by Anvil in afinputs.

. use http://www.stata-press.com/data/r14/acmeh
. mgarch ccc (acme = afrelated, noconstant arch(1) garch(1))
> (anvil = afinputs, arch(1/2) het(L.apex))
Calculating starting values....
Optimizing concentrated log likelihood
(setting technique to bhhh)
Iteration 0:   log likelihood = -12996.245
Iteration 1:   log likelihood = -12609.982
Iteration 2:   log likelihood = -12563.103
Iteration 3:   log likelihood =  -12554.73
Iteration 4:   log likelihood = -12554.542
Iteration 5:   log likelihood = -12554.534
Iteration 6:   log likelihood = -12554.534
Iteration 7:   log likelihood = -12554.534
Optimizing unconcentrated log likelihood
Iteration 0:   log likelihood = -12554.534
Iteration 1:   log likelihood = -12554.533

Constant conditional correlation MGARCH model
Sample: 1 - 2500                                Number of obs   =      2,499
Distribution: Gaussian                          Wald chi2(2)    =    2212.30
Log likelihood = -12554.53                      Prob > chi2     =     0.0000

                    Coef.   Std. Err.      z   P>|z|    [95% Conf. Interval]
acme
   afrelated     .9175148   .0651088   14.09   0.000     .7899039    1.045126
ARCH_acme
    arch L1.     .0798719   .0169526    4.71   0.000     .0466455    .1130983
   garch L1.     .7336823   .0601569   12.20   0.000     .6157768    .8515877
       _cons     2.880836    .760206    3.79   0.000      1.39086    4.370812
anvil
    afinputs    -1.015561   .0226437  -44.85   0.000    -1.059942     -.97118
       _cons     .0703606   .0211689    3.32   0.001     .0288703    .1118508
ARCH_anvil
    arch L1.     .4893288   .0286012   17.11   0.000     .4332714    .5453862
    arch L2.     .2782296   .0208172   13.37   0.000     .2374287    .3190305
    apex L1.     1.894972   .0616293   30.75   0.000     1.774181    2.015763
       _cons     .1034111   .0735512    1.41   0.160    -.0407466    .2475688
corr(acme,anvil)
                -.5354047   .0143275  -37.37   0.000     -.563486   -.5073234

The results indicate that increases in the futures prices for related products lead to higher returns on the Acme stock, and increased input prices lead to lower returns on the Anvil stock. In the conditional variance equation for Anvil, the coefficient on L1.apex is positive and significant, which indicates that an increase in the return on the Apex stock leads to more variability in the return on the Anvil stock. That the estimated conditional correlation between the two returns is -0.54 indicates that these returns tend to move in opposite directions; in other words, an increase in the return for the Acme stock tends to be associated with a decrease in the return for the Anvil stock, and vice versa.
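If fat-tailed errors are a concern, the same specification can be refit assuming multivariate Student t errors, in which case the degree-of-freedom parameter is estimated along with the other parameters. This is a sketch only (output omitted), and it assumes that mgarch ccc accepts the same distribution() option documented for mgarch dcc below:

. mgarch ccc (acme = afrelated, noconstant arch(1) garch(1))
> (anvil = afinputs, arch(1/2) het(L.apex)), distribution(t)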
Stored results

mgarch ccc stores the following in e():

Scalars
  e(N)               number of observations
  e(k)               number of parameters
  e(k_aux)           number of auxiliary parameters
  e(k_extra)         number of extra estimates added to _b
  e(k_eq)            number of equations in e(b)
  e(k_dv)            number of dependent variables
  e(df_m)            model degrees of freedom
  e(ll)              log likelihood
  e(chi2)            χ2
  e(p)               significance
  e(estdf)           1 if distribution parameter was estimated, 0 otherwise
  e(usr)             user-provided distribution parameter
  e(tmin)            minimum time in sample
  e(tmax)            maximum time in sample
  e(N_gaps)          number of gaps
  e(rank)            rank of e(V)
  e(ic)              number of iterations
  e(rc)              return code
  e(converged)       1 if converged, 0 otherwise

Macros
  e(cmd)             mgarch
  e(model)           ccc
  e(cmdline)         command as typed
  e(depvar)          names of dependent variables
  e(covariates)      list of covariates
  e(dv_eqs)          dependent variables with mean equations
  e(indeps)          independent variables in each equation
  e(tvar)            time variable
  e(title)           title in estimation output
  e(chi2type)        Wald; type of model χ2 test
  e(vce)             vcetype specified in vce()
  e(vcetype)         title used to label Std. Err.
  e(tmins)           formatted minimum time
  e(tmaxs)           formatted maximum time
  e(dist)            distribution for error term: gaussian or t
  e(arch)            specified ARCH terms
  e(garch)           specified GARCH terms
  e(technique)       maximization technique
  e(properties)      b V
  e(estat_cmd)       program used to implement estat
  e(predict)         program used to implement predict
  e(marginsok)       predictions allowed by margins
  e(marginsnotok)    predictions disallowed by margins
  e(marginsdefault)  default predict() specification for margins
  e(asbalanced)      factor variables fvset as asbalanced
  e(asobserved)      factor variables fvset as asobserved

Matrices
  e(b)               coefficient vector
  e(Cns)             constraints matrix
  e(ilog)            iteration log (up to 20 iterations)
  e(gradient)        gradient vector
  e(hessian)         Hessian matrix
  e(V)               variance–covariance matrix of the estimators
  e(pinfo)           parameter information, used by predict

Functions
  e(sample)          marks estimation sample

Methods and formulas

mgarch ccc estimates the parameters of the CCC MGARCH model by maximum likelihood. The unconcentrated log-likelihood function based on the multivariate normal distribution for observation t is

  l_t = -0.5\,m\,\log(2\pi) - 0.5\,\log\{\det(R)\} - \log\{\det(D_t^{1/2})\} - 0.5\,\tilde{\epsilon}_t R^{-1} \tilde{\epsilon}_t'    (1)

where \tilde{\epsilon}_t = D_t^{-1/2}\epsilon_t is an m × 1 vector of standardized residuals and \epsilon_t = y_t - Cx_t. The log-likelihood function is \sum_{t=1}^{T} l_t.

If we assume that \nu_t follow a multivariate t distribution with degrees of freedom (df) greater than 2, then the unconcentrated log-likelihood function for observation t is

  l_t = \log\Gamma\!\left(\frac{df+m}{2}\right) - \log\Gamma\!\left(\frac{df}{2}\right) - \frac{m}{2}\log\{(df-2)\pi\}
        - 0.5\,\log\{\det(R)\} - \log\{\det(D_t^{1/2})\} - \frac{df+m}{2}\log\!\left(1 + \frac{\tilde{\epsilon}_t R^{-1}\tilde{\epsilon}_t'}{df-2}\right)    (2)

The correlation matrix R can be concentrated out of (1) and (2) by defining the (i, j)th element of R as

  \hat{\rho}_{ij} = \left(\sum_{t=1}^{T}\tilde{\epsilon}_{it}\tilde{\epsilon}_{jt}\right) \left(\sum_{t=1}^{T}\tilde{\epsilon}_{it}^2\right)^{-1/2} \left(\sum_{t=1}^{T}\tilde{\epsilon}_{jt}^2\right)^{-1/2}

mgarch ccc starts the optimization process with the concentrated log-likelihood function.

The starting values for the parameters in the mean equations and the initial residuals \hat{\epsilon}_t are obtained by least-squares regression. The starting values for the parameters in the variance equations are obtained by a procedure proposed by Gourieroux and Monfort (1997, sec. 6.2.2). If the optimization is started with the unconcentrated log likelihood, then the initial values for the parameters in R are calculated from the standardized residuals \tilde{\epsilon}_t.
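For intuition, the concentration step amounts to taking uncentered cross-products of the standardized residuals. The following Mata sketch computes \hat{\rho}_{ij} for all pairs at once; it assumes E is a T × m matrix already holding the standardized residuals (the name E is hypothetical, not something mgarch ccc creates):

. mata:
: // E assumed: T x m matrix of standardized residuals
: CP = cross(E, E)           // E'E, the uncentered cross-products
: d  = sqrt(diagonal(CP))    // sqrt of each column's sum of squares
: R  = CP :/ (d*d')          // (i,j) element is rho-hat_ij from the formula above
: end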
GARCH estimators require initial values that can be plugged in for \epsilon_{t-i}\epsilon_{t-i}' and H_{t-j} when t - i < 1 and t - j < 1. mgarch ccc substitutes an estimator of the unconditional covariance of the disturbances,

  \hat{\Sigma} = T^{-1}\sum_{t=1}^{T}\hat{\epsilon}_t\hat{\epsilon}_t'    (3)

for \epsilon_{t-i}\epsilon_{t-i}' when t - i < 1 and for H_{t-j} when t - j < 1, where \hat{\epsilon}_t is the vector of residuals calculated using the estimated parameters.

mgarch ccc requires a sample size at least equal to the number of parameters in the model plus twice the number of equations.

mgarch ccc uses numerical derivatives in maximizing the log-likelihood function.

References

Bollerslev, T. 1990. Modelling the coherence in short-run nominal exchange rates: A multivariate generalized ARCH model. Review of Economics and Statistics 72: 498–505.

Gourieroux, C. S., and A. Monfort. 1997. Time Series and Dynamic Models. Trans. ed. G. M. Gallo. Cambridge: Cambridge University Press.

Silvennoinen, A., and T. Teräsvirta. 2009. Multivariate GARCH models. In Handbook of Financial Time Series, ed. T. G. Andersen, R. A. Davis, J.-P. Kreiß, and T. Mikosch, 201–229. Berlin: Springer.

Also see

[TS] mgarch ccc postestimation — Postestimation tools for mgarch ccc
[TS] mgarch — Multivariate GARCH models
[TS] tsset — Declare data to be time-series data
[TS] arch — Autoregressive conditional heteroskedasticity (ARCH) family of estimators
[TS] var — Vector autoregressive models
[U] 20 Estimation and postestimation commands

Title

mgarch ccc postestimation — Postestimation tools for mgarch ccc

Postestimation commands     predict     margins     Remarks and examples
Methods and formulas     Also see

Postestimation commands

The following standard postestimation commands are available after mgarch ccc:

Command            Description
contrast           contrasts and ANOVA-style joint tests of estimates
estat ic           Akaike's and Schwarz's Bayesian information criteria (AIC and BIC)
estat summarize    summary statistics for the estimation sample
estat vce          variance–covariance matrix of the estimators (VCE)
estimates          cataloging estimation results
forecast           dynamic forecasts and simulations
lincom             point estimates, standard errors, testing, and inference for linear combinations of coefficients
lrtest             likelihood-ratio test
margins            marginal means, predictive margins, marginal effects, and average marginal effects
marginsplot        graph the results from margins (profile plots, interaction plots, etc.)
nlcom              point estimates, standard errors, testing, and inference for nonlinear combinations of coefficients
predict            predictions, residuals, influence statistics, and other diagnostic measures
predictnl          point estimates, standard errors, testing, and inference for generalized predictions
pwcompare          pairwise comparisons of estimates
test               Wald tests of simple and composite linear hypotheses
testnl             Wald tests of nonlinear hypotheses
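For instance, after fitting any of the models in [TS] mgarch ccc, information criteria and estimation-sample summaries are one command away. A quick sketch, assuming an mgarch ccc fit is the active estimation result (output omitted):

. estat ic          // Akaike's and Schwarz's Bayesian information criteria
. estat summarize   // summary statistics for the estimation sample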
predict

Description for predict

predict creates a new variable containing predictions such as linear predictions and conditional variances, covariances, and correlations. All predictions are available as static one-step-ahead predictions or as dynamic multistep predictions, and you can control when dynamic predictions begin.

Menu for predict

Statistics > Postestimation

Syntax for predict

  predict [type] { stub* | newvarlist } [if] [in] [, statistic options]

statistic        Description
Main
  xb             linear prediction; the default
  residuals      residuals
  variance       conditional variances and covariances
  correlation    conditional correlations

These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for the estimation sample.

options                    Description
Options
  equation(eqnames)        names of equations for which predictions are made
  dynamic(time_constant)   begin dynamic forecast at specified time

Options for predict

Main

xb, the default, calculates the linear predictions of the dependent variables.

residuals calculates the residuals.

variance predicts the conditional variances and conditional covariances.

correlation predicts the conditional correlations.

Options

equation(eqnames) specifies the equation for which the predictions are calculated. Use this option to predict a statistic for a particular equation. Equation names, such as equation(income), are used to identify equations.

One equation name may be specified when predicting the dependent variable, the residuals, or the conditional variance. For example, specifying equation(income) causes predict to predict income, and specifying variance equation(income) causes predict to predict the conditional variance of income.

Two equations may be specified when predicting a conditional variance or covariance. For example, specifying equation(income, consumption) variance causes predict to predict the conditional covariance of income and consumption.

dynamic(time_constant) specifies when predict starts producing dynamic forecasts. The specified time_constant must be in the scale of the time variable specified in tsset, and the time_constant must be inside a sample for which observations on the dependent variables are available. For example, dynamic(tq(2008q4)) causes dynamic predictions to begin in the fourth quarter of 2008, assuming that your time variable is quarterly; see [D] datetime. If the model contains exogenous variables, they must be present for the whole predicted sample. dynamic() may not be specified with residuals.

margins

Description for margins

margins estimates margins of response for linear predictions and conditional variances, covariances, and correlations. All predictions are available as static one-step-ahead predictions or as dynamic multistep predictions, and you can control when dynamic predictions begin.

Menu for margins

Statistics > Postestimation

Syntax for margins

  margins [marginlist] [, options]
  margins [marginlist], predict(statistic ...) [predict(statistic ...) ...] [options]

statistic        Description
  default        linear predictions for each equation
  xb             linear prediction for a specified equation
  variance       conditional variances and covariances
  correlation    conditional correlations
  residuals      not allowed with margins

xb defaults to the first equation. Statistics not allowed with margins are functions of stochastic quantities other than e(b). For the full syntax, see [R] margins.

Remarks and examples

We assume that you have already read [TS] mgarch ccc. In this entry, we use predict after mgarch ccc to make in-sample and out-of-sample forecasts; a brief static-prediction sketch appears below, followed by a dynamic-forecast example.
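For instance, static one-step-ahead conditional variances, covariances, and correlations can be created for every equation pair with the stub* syntax. A sketch, assuming an mgarch ccc fit is in memory; the stub names v and c are illustrative:

. predict v*, variance      // conditional variances and covariances
. predict c*, correlation   // conditional correlations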
Example 1: Dynamic forecasts

In this example, we obtain dynamic forecasts for the Toyota, Nissan, and Honda stock returns modeled in example 2 of [TS] mgarch ccc. In the output below, we reestimate the parameters of the model, use tsappend (see [TS] tsappend) to extend the data, and use predict to obtain in-sample one-step-ahead forecasts and dynamic forecasts of the conditional variances of the returns. We graph the forecasts below.

. use http://www.stata-press.com/data/r14/stocks
(Data from Yahoo! Finance)
. quietly mgarch ccc (toyota nissan = , noconstant)
> (honda = L.nissan, noconstant), arch(1) garch(1)
. tsappend, add(50)
. predict H*, variance dynamic(2016)

[Figure: time-series plot over 01jan2009–01jan2011 of the variance predictions (toyota,toyota), (nissan,nissan), and (honda,honda), each dynamic(2016), with a vertical line marking the start of the dynamic forecasts.]

Recent in-sample one-step-ahead forecasts are plotted to the left of the vertical line in the above graph, and the dynamic out-of-sample forecasts appear to the right of the vertical line. The graph shows the tail end of the huge increase in return volatility that took place in 2008 and 2009. It also shows that the dynamic forecasts quickly converge.

Methods and formulas

All one-step predictions are obtained by substituting the parameter estimates into the model. The estimated unconditional variance matrix of the disturbances, \hat{\Sigma}, is the initial value for the ARCH and GARCH terms. The postestimation routines recompute \hat{\Sigma} using the prediction sample, the parameter estimates stored in e(b), and (3) in Methods and formulas of [TS] mgarch ccc. For observations in which the residuals are missing, the estimated unconditional variance matrix of the disturbances is used in place of the outer product of the residuals.

Dynamic predictions of the dependent variables use previously predicted values beginning in the period specified by dynamic().

Dynamic variance predictions are implemented by substituting \hat{\Sigma} for the outer product of the residuals beginning in the period specified in dynamic().

Also see

[TS] mgarch ccc — Constant conditional correlation multivariate GARCH models
[U] 20 Estimation and postestimation commands

Title

mgarch dcc — Dynamic conditional correlation multivariate GARCH models

Description     Quick start     Menu     Syntax     Options
Remarks and examples     Stored results     Methods and formulas     References     Also see

Description

mgarch dcc estimates the parameters of dynamic conditional correlation (DCC) multivariate generalized autoregressive conditionally heteroskedastic (MGARCH) models in which the conditional variances are modeled as univariate generalized autoregressive conditionally heteroskedastic (GARCH) models and the conditional covariances are modeled as nonlinear functions of the conditional variances. The conditional quasicorrelation parameters that weight the nonlinear combinations of the conditional variances follow the GARCH-like process specified in Engle (2002).

The DCC MGARCH model is about as flexible as the closely related varying conditional correlation MGARCH model (see [TS] mgarch vcc), more flexible than the constant conditional correlation MGARCH model (see [TS] mgarch ccc), and more parsimonious than the diagonal vech MGARCH model (see [TS] mgarch dvech).
Quick start

Fit dynamic conditional correlation multivariate GARCH with first- and second-order ARCH components for dependent variables y1 and y2 using tsset data
    mgarch dcc (y1 y2), arch(1 2)

Add regressors x1 and x2 and first-order GARCH component
    mgarch dcc (y1 y2 = x1 x2), arch(1 2) garch(1)

Add z1 to the model for the conditional heteroskedasticity
    mgarch dcc (y1 y2 = x1 x2), arch(1 2) garch(1) het(z1)
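One further variation in the same style, based on the distribution() option documented below (the choice of 7 degrees of freedom is purely illustrative): assume multivariate Student t rather than Gaussian errors
    mgarch dcc (y1 y2 = x1 x2), arch(1 2) garch(1) distribution(t 7)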
Menu

Statistics > Multivariate time series > Multivariate GARCH

Syntax

  mgarch dcc eq [eq ... eq] [if] [in] [, options]

where each eq has the form

  (depvars = [indepvars] [, eqoptions])

options                     Description
Model
  arch(numlist)             ARCH terms for all equations
  garch(numlist)            GARCH terms for all equations
  het(varlist)              include varlist in the specification of the conditional variance for all equations
  distribution(dist [#])    use dist distribution for errors [may be gaussian (synonym normal) or t; default is gaussian]
  constraints(numlist)      apply linear constraints
SE/Robust
  vce(vcetype)              vcetype may be oim or robust
Reporting
  level(#)                  set confidence level; default is level(95)
  nocnsreport               do not display constraints
  display_options           control columns and column formats, row spacing, line width, display of omitted variables and base and empty cells, and factor-variable labeling
Maximization
  maximize_options          control the maximization process; seldom used
  from(matname)             initial values for the coefficients; seldom used
  coeflegend                display legend instead of statistics

eqoptions                   Description
  noconstant                suppress constant term in the mean equation
  arch(numlist)             ARCH terms
  garch(numlist)            GARCH terms
  het(varlist)              include varlist in the specification of the conditional variance

You must tsset your data before using mgarch dcc; see [TS] tsset.
indepvars and varlist may contain factor variables; see [U] 11.4.3 Factor variables.
depvars, indepvars, and varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, fp, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.

Options

Model

arch(numlist) specifies the ARCH terms for all equations in the model. By default, no ARCH terms are specified.

garch(numlist) specifies the GARCH terms for all equations in the model. By default, no GARCH terms are specified.

het(varlist) specifies that varlist be included in the specification of the conditional variance for all equations. This varlist enters the variance specification collectively as multiplicative heteroskedasticity.

distribution(dist [#]) specifies the assumed distribution for the errors. dist may be gaussian, normal, or t.

  gaussian and normal are synonyms; each causes mgarch dcc to assume that the errors come from a multivariate normal distribution. # may not be specified with either of them.

  t causes mgarch dcc to assume that the errors follow a multivariate Student t distribution, and the degree-of-freedom parameter is estimated along with the other parameters of the model. If distribution(t #) is specified, then mgarch dcc uses a multivariate Student t distribution with # degrees of freedom. # must be greater than 2.

constraints(numlist) specifies linear constraints to apply to the parameter estimates.

SE/Robust

vce(vcetype) specifies the estimator for the variance–covariance matrix of the estimator.

  vce(oim), the default, specifies to use the observed information matrix (OIM) estimator.

  vce(robust) specifies to use the Huber/White/sandwich estimator.

Reporting

level(#); see [R] estimation options.

nocnsreport; see [R] estimation options.

display_options: noci, nopvalues, noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.

Maximization

maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace, gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#), nrtolerance(#), nonrtolerance, and from(matname); see [R] maximize for all options except from(), and see below for information on from(). These options are seldom used.

from(matname) specifies initial values for the coefficients. from(b0) causes mgarch dcc to begin the optimization algorithm with the values in b0. b0 must be a row vector, and the number of columns must equal the number of parameters in the model.

The following option is available with mgarch dcc but is not shown in the dialog box:

coeflegend; see [R] estimation options.

Eqoptions

noconstant suppresses the constant term in the mean equation.

arch(numlist) specifies the ARCH terms in the equation. By default, no ARCH terms are specified. This option may not be specified with model-level arch().

garch(numlist) specifies the GARCH terms in the equation. By default, no GARCH terms are specified. This option may not be specified with model-level garch().

het(varlist) specifies that varlist be included in the specification of the conditional variance. This varlist enters the variance specification collectively as multiplicative heteroskedasticity. This option may not be specified with model-level het().

Remarks and examples

We assume that you have already read [TS] mgarch, which provides an introduction to MGARCH models and the methods implemented in mgarch dcc.

MGARCH models are dynamic multivariate regression models in which the conditional variances and covariances of the errors follow an autoregressive-moving-average structure. The DCC MGARCH model uses a nonlinear combination of univariate GARCH models with time-varying cross-equation weights to model the conditional covariance matrix of the errors.

As discussed in [TS] mgarch, MGARCH models differ in the parsimony and flexibility of their specifications for a time-varying conditional covariance matrix of the disturbances, denoted by H_t. In the conditional correlation family of MGARCH models, the diagonal elements of H_t are modeled as univariate GARCH models, whereas the off-diagonal elements are modeled as nonlinear functions of the diagonal terms. In the DCC MGARCH model,

  h_{ij,t} = \rho_{ij,t}\sqrt{h_{ii,t}\,h_{jj,t}}

where the diagonal elements h_{ii,t} and h_{jj,t} follow univariate GARCH processes and \rho_{ij,t} follows the dynamic process specified in Engle (2002) and discussed below.

Because \rho_{ij,t} varies with time, this model is known as the DCC GARCH model.
Technical note

The DCC GARCH model proposed by Engle (2002) can be written as

  y_t = Cx_t + \epsilon_t
  \epsilon_t = H_t^{1/2}\nu_t
  H_t = D_t^{1/2} R_t D_t^{1/2}
  R_t = \mathrm{diag}(Q_t)^{-1/2}\, Q_t\, \mathrm{diag}(Q_t)^{-1/2}
  Q_t = (1 - \lambda_1 - \lambda_2)R + \lambda_1\tilde{\epsilon}_{t-1}\tilde{\epsilon}_{t-1}' + \lambda_2 Q_{t-1}    (1)

where

  y_t is an m × 1 vector of dependent variables;

  C is an m × k matrix of parameters;

  x_t is a k × 1 vector of independent variables, which may contain lags of y_t;

  H_t^{1/2} is the Cholesky factor of the time-varying conditional covariance matrix H_t;

  \nu_t is an m × 1 vector of normal, independent, and identically distributed innovations;

  D_t is a diagonal matrix of conditional variances,

    D_t = \begin{pmatrix} \sigma_{1,t}^2 & 0 & \cdots & 0 \\ 0 & \sigma_{2,t}^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{m,t}^2 \end{pmatrix}

  in which each \sigma_{i,t}^2 evolves according to a univariate GARCH model of the form

    \sigma_{i,t}^2 = s_i + \sum_{j=1}^{p_i}\alpha_j\epsilon_{i,t-j}^2 + \sum_{j=1}^{q_i}\beta_j\sigma_{i,t-j}^2

  by default, or

    \sigma_{i,t}^2 = \exp(\gamma_i z_{i,t}) + \sum_{j=1}^{p_i}\alpha_j\epsilon_{i,t-j}^2 + \sum_{j=1}^{q_i}\beta_j\sigma_{i,t-j}^2

  when the het() option is specified, where \gamma_i is a 1 × p vector of parameters, z_i is a p × 1 vector of independent variables including a constant term, the \alpha_j's are ARCH parameters, and the \beta_j's are GARCH parameters;

  R_t is a matrix of conditional quasicorrelations,

    R_t = \begin{pmatrix} 1 & \rho_{12,t} & \cdots & \rho_{1m,t} \\ \rho_{12,t} & 1 & \cdots & \rho_{2m,t} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{1m,t} & \rho_{2m,t} & \cdots & 1 \end{pmatrix}

  \tilde{\epsilon}_t is an m × 1 vector of standardized residuals, D_t^{-1/2}\epsilon_t; and

  \lambda_1 and \lambda_2 are parameters that govern the dynamics of conditional quasicorrelations. \lambda_1 and \lambda_2 are nonnegative and satisfy 0 ≤ \lambda_1 + \lambda_2 < 1.

When Q_t is stationary, the R matrix in (1) is a weighted average of the unconditional covariance matrix of the standardized residuals \tilde{\epsilon}_t, denoted by \bar{R}, and the unconditional mean of Q_t, denoted by \bar{Q}. Because \bar{R} ≠ \bar{Q}, as shown by Aielli (2009), R is neither the unconditional correlation matrix nor the unconditional mean of Q_t. For this reason, the parameters in R are known as quasicorrelations; see Aielli (2009) and Engle (2009) for discussions.
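To make the normalization in (1) concrete, consider m = 2. Writing q_{ij,t} for the elements of Q_t, the single off-diagonal element of R_t reduces to

  \rho_{12,t} = \frac{q_{12,t}}{\sqrt{q_{11,t}\, q_{22,t}}}

so the GARCH-like recursion operates on Q_t, while the rescaling by diag(Q_t)^{-1/2} keeps each quasicorrelation in [-1, 1]. This is a restatement of (1), not an additional model assumption.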
Some examples

Example 1: Model with common covariates

We have daily data on the stock returns of three car manufacturers—Toyota, Nissan, and Honda, from January 2, 2003, to December 31, 2010—in the variables toyota, nissan, and honda. We model the conditional means of the returns as a first-order vector autoregressive process and the conditional covariances as a DCC MGARCH process in which the variance of each disturbance term follows a GARCH(1,1) process.

. use http://www.stata-press.com/data/r14/stocks
(Data from Yahoo! Finance)
. mgarch dcc (toyota nissan honda = L.toyota L.nissan L.honda, noconstant),
> arch(1) garch(1)
Calculating starting values....
Optimizing log likelihood
(setting technique to bhhh)
Iteration 0:   log likelihood = 16902.435
Iteration 1:   log likelihood = 17005.448
Iteration 2:   log likelihood = 17157.958
Iteration 3:   log likelihood = 17267.363
Iteration 4:   log likelihood =  17318.29
Iteration 5:   log likelihood = 17353.029
Iteration 6:   log likelihood = 17369.115
Iteration 7:   log likelihood = 17388.035
Iteration 8:   log likelihood = 17401.254
Iteration 9:   log likelihood = 17435.556
(switching technique to nr)
Iteration 10:  log likelihood = 17451.739
Iteration 11:  log likelihood = 17476.882
Iteration 12:  log likelihood = 17478.382
Iteration 13:  log likelihood = 17483.858
Iteration 14:  log likelihood = 17484.886
Iteration 15:  log likelihood =  17484.95
Iteration 16:  log likelihood =  17484.95
Refining estimates
Iteration 0:   log likelihood =  17484.95
Iteration 1:   log likelihood =  17484.95

Dynamic conditional correlation MGARCH model
Sample: 1 - 2015                                Number of obs   =      2,014
Distribution: Gaussian                          Wald chi2(9)    =      19.54
Log likelihood = 17484.95                       Prob > chi2     =     0.0210

                    Coef.   Std. Err.      z   P>|z|    [95% Conf. Interval]
toyota
  toyota L1.    -.0510867   .0339825   -1.50   0.133    -.1176911    .0155177
  nissan L1.     .0297829   .0247455    1.20   0.229    -.0187173    .0782832
   honda L1.    -.0162824   .0300323   -0.54   0.588    -.0751447    .0425799
ARCH_toyota
    arch L1.     .0608223   .0086687    7.02   0.000      .043832    .0778127
   garch L1.     .9222203   .0111055   83.04   0.000     .9004539    .9439866
       _cons     4.47e-06   1.15e-06    3.90   0.000     2.22e-06    6.72e-06
nissan
  toyota L1.    -.0056722   .0389348   -0.15   0.884    -.0819829    .0706386
  nissan L1.    -.0287097   .0309379   -0.93   0.353    -.0893468    .0319275
   honda L1.      .015498   .0358802    0.43   0.666    -.0548259    .0858218
ARCH_nissan
    arch L1.     .0844244   .0128192    6.59   0.000     .0592992    .1095496
   garch L1.       .89942   .0151125   59.51   0.000        .8698      .92904
       _cons     7.21e-06   1.93e-06    3.74   0.000     3.43e-06     .000011
honda
  toyota L1.    -.0272415   .0361819   -0.75   0.452    -.0981566    .0436737
  nissan L1.     .0617491   .0271378    2.28   0.023     .0085599    .1149382
   honda L1.     -.063507   .0332918   -1.91   0.056    -.1287578    .0017437
ARCH_honda
    arch L1.     .0490134   .0073695    6.65   0.000     .0345693    .0634574
   garch L1.     .9331125   .0103686   89.99   0.000     .9127905    .9534346
       _cons     5.35e-06   1.35e-06    3.95   0.000     2.69e-06    8.00e-06
corr(toyota,nissan)
                 .6689537   .0168019   39.81   0.000     .6360226    .7018849
corr(toyota,honda)
                 .7259623   .0140155   51.80   0.000     .6984925    .7534321
corr(nissan,honda)
                 .6335651   .0180409   35.12   0.000     .5982056    .6689247
Adjustment
     lambda1     .0315281   .0088382    3.57   0.000     .0142054    .0488507
     lambda2     .8704093   .0613336   14.19   0.000     .7501977    .9906209

The iteration log has three parts: the dots from the search for initial values, the iteration log from optimizing the log likelihood, and the iteration log from the refining step. A detailed discussion of the optimization methods is in Methods and formulas.

The header describes the estimation sample and reports a Wald test against the null hypothesis that all the coefficients on the independent variables in the mean equations are zero. Here the null hypothesis is rejected at the 5% level.

The output table first presents results for the mean or variance parameters used to model each dependent variable. Subsequently, the output table presents results for the conditional quasicorrelations. For example, the conditional quasicorrelation between the standardized residuals for Toyota and Nissan is estimated to be 0.67.
Finally, the output table presents results for the adjustment parameters λ1 and λ2. In the example at hand, the estimates for both λ1 and λ2 are statistically significant.

The DCC MGARCH model reduces to the CCC MGARCH model when λ1 = λ2 = 0. The output below shows that a Wald test rejects the null hypothesis that λ1 = λ2 = 0 at all conventional levels.

. test _b[Adjustment:lambda1] = _b[Adjustment:lambda2] = 0
 ( 1)  [Adjustment]lambda1 - [Adjustment]lambda2 = 0
 ( 2)  [Adjustment]lambda1 = 0
           chi2(  2) =  1102.27
         Prob > chi2 =    0.0000

These results indicate that the assumption of time-invariant conditional correlations maintained in the CCC MGARCH model is too restrictive for these data.

Example 2: Model with covariates that differ by equation

We improve the previous example by removing the insignificant parameters from the model. To remove these parameters, we specify the honda equation separately from the toyota and nissan equations:

. mgarch dcc (toyota nissan = , noconstant) (honda = L.nissan, noconstant),
> arch(1) garch(1)
Calculating starting values....
Optimizing log likelihood
(setting technique to bhhh)
Iteration 0:   log likelihood = 16884.502
Iteration 1:   log likelihood = 16970.755
Iteration 2:   log likelihood = 17140.318
Iteration 3:   log likelihood = 17237.807
Iteration 4:   log likelihood =  17306.12
Iteration 5:   log likelihood = 17342.533
Iteration 6:   log likelihood = 17363.511
Iteration 7:   log likelihood = 17392.501
Iteration 8:   log likelihood = 17407.242
Iteration 9:   log likelihood = 17448.702
(switching technique to nr)
Iteration 10:  log likelihood = 17472.199
Iteration 11:  log likelihood = 17475.842
Iteration 12:  log likelihood = 17476.345
Iteration 13:  log likelihood =  17476.35
Iteration 14:  log likelihood =  17476.35
Refining estimates
Iteration 0:   log likelihood =  17476.35
Iteration 1:   log likelihood =  17476.35

Dynamic conditional correlation MGARCH model
Sample: 1 - 2015                                Number of obs   =      2,014
Distribution: Gaussian                          Wald chi2(1)    =       2.21
Log likelihood = 17476.35                       Prob > chi2     =     0.1374

                    Coef.   Std. Err.      z   P>|z|    [95% Conf. Interval]
ARCH_toyota
    arch L1.     .0608188   .0086675    7.02   0.000     .0438308    .0778067
   garch L1.     .9219957   .0111066   83.01   0.000     .9002271    .9437643
       _cons     4.49e-06   1.14e-06    3.95   0.000     2.27e-06    6.72e-06
ARCH_nissan
    arch L1.     .0876161     .01302    6.73   0.000     .0620974    .1131349
   garch L1.     .8950964   .0152908   58.54   0.000      .865127    .9250658
       _cons     7.69e-06   1.99e-06    3.86   0.000     3.79e-06    .0000116
honda
  nissan L1.      .019978   .0134488    1.49   0.137    -.0063811    .0463371
ARCH_honda
    arch L1.     .0488799   .0073767    6.63   0.000     .0344218     .063338
   garch L1.     .9330047   .0103944   89.76   0.000      .912632    .9533774
       _cons     5.42e-06   1.36e-06    3.98   0.000     2.75e-06    8.08e-06
corr(toyota,nissan)
                 .6668433   .0163209   40.86   0.000     .6348548    .6988317
corr(toyota,honda)
                 .7258101   .0137072   52.95   0.000     .6989446    .7526757
corr(nissan,honda)
                 .6313515   .0175454   35.98   0.000     .5969631    .6657399
Adjustment
     lambda1     .0324493   .0074013    4.38   0.000     .0179429    .0469556
     lambda2     .8574681   .0476274   18.00   0.000     .7641202     .950816

It turns out that the coefficient on L1.nissan in the honda equation is now statistically insignificant. We could further improve the model by also removing L1.nissan; a sketch appears below.

There is no mean equation for Toyota or Nissan. In [TS] mgarch dcc postestimation, we discuss prediction from models without covariates.
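A sketch of that further simplification, which drops the one remaining mean-equation covariate (output omitted; the data from example 1 are assumed to be in memory):

. mgarch dcc (toyota nissan honda = , noconstant), arch(1) garch(1)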
Example 3: Model with constraints

Here we fit a bivariate DCC MGARCH model for the Toyota and Nissan shares. We believe that the shares of these car manufacturers follow the same process, so we impose the constraints that the ARCH coefficients are the same for the two companies and that the GARCH coefficients are also the same.

. constraint 1 _b[ARCH_toyota:L.arch] = _b[ARCH_nissan:L.arch]
. constraint 2 _b[ARCH_toyota:L.garch] = _b[ARCH_nissan:L.garch]
. mgarch dcc (toyota nissan = , noconstant), arch(1) garch(1) constraints(1 2)
Calculating starting values....
Optimizing log likelihood
(setting technique to bhhh)
Iteration 0:   log likelihood = 10307.609
Iteration 1:   log likelihood = 10656.153
Iteration 2:   log likelihood = 10862.137
Iteration 3:   log likelihood = 10987.457
Iteration 4:   log likelihood = 11062.347
Iteration 5:   log likelihood = 11135.207
Iteration 6:   log likelihood = 11245.619
Iteration 7:   log likelihood =  11253.56
Iteration 8:   log likelihood =     11294
Iteration 9:   log likelihood = 11296.364
(switching technique to nr)
Iteration 10:  log likelihood =  11296.76
Iteration 11:  log likelihood = 11297.087
Iteration 12:  log likelihood = 11297.091
Iteration 13:  log likelihood = 11297.091
Refining estimates
Iteration 0:   log likelihood = 11297.091
Iteration 1:   log likelihood = 11297.091

Dynamic conditional correlation MGARCH model
Sample: 1 - 2015                                Number of obs   =      2,015
Distribution: Gaussian                          Wald chi2(.)    =          .
Log likelihood = 11297.09                       Prob > chi2     =          .
 ( 1)  [ARCH_toyota]L.arch - [ARCH_nissan]L.arch = 0
 ( 2)  [ARCH_toyota]L.garch - [ARCH_nissan]L.garch = 0

                    Coef.   Std. Err.      z   P>|z|    [95% Conf. Interval]
ARCH_toyota
    arch L1.      .080889   .0103227    7.84   0.000      .060657    .1011211
   garch L1.     .9060711   .0119107   76.07   0.000     .8827267    .9294156
       _cons     4.21e-06   1.10e-06    3.83   0.000     2.05e-06    6.36e-06
ARCH_nissan
    arch L1.      .080889   .0103227    7.84   0.000      .060657    .1011211
   garch L1.     .9060711   .0119107   76.07   0.000     .8827267    .9294156
       _cons     5.92e-06   1.47e-06    4.03   0.000     3.04e-06    8.80e-06
corr(toyota,nissan)
                 .6646283   .0187793   35.39   0.000     .6278215    .7014351
Adjustment
     lambda1     .0446559   .0123017    3.63   0.000      .020545    .0687668
     lambda2     .8686054   .0510884   17.00   0.000     .7684739     .968737

We could test our constraints by fitting the unconstrained model and performing a likelihood-ratio test. The results indicate that the restricted model is preferable.

Example 4: Model with a GARCH term

In this example, we have data on fictional stock returns for the Acme and Anvil corporations, and we believe that the movement of the two stocks is governed by different processes. We specify one ARCH and one GARCH term for the conditional variance equation for Acme and two ARCH terms for the conditional variance equation for Anvil. In addition, we include the lagged value of the stock return for Apex, the main subsidiary of Anvil corporation, in the variance equation of Anvil. For Acme, we have data on the changes in an index of futures prices of products related to those produced by Acme in afrelated. For Anvil, we have data on the changes in an index of futures prices of inputs used by Anvil in afinputs.

. use http://www.stata-press.com/data/r14/acmeh
. mgarch dcc (acme = afrelated, noconstant arch(1) garch(1))
> (anvil = afinputs, arch(1/2) het(L.apex))
Calculating starting values....
Optimizing log likelihood
(setting technique to bhhh)
Iteration 0:   log likelihood = -13260.522
(output omitted)
Iteration 9:   log likelihood = -12362.876
(switching technique to nr)
Iteration 10:  log likelihood = -12362.876
Refining estimates
Iteration 0:   log likelihood = -12362.876
Iteration 1:   log likelihood = -12362.876

Dynamic conditional correlation MGARCH model
Sample: 1 - 2500                                Number of obs   =      2,499
Distribution: Gaussian                          Wald chi2(2)    =    2596.18
Log likelihood = -12362.88                      Prob > chi2     =     0.0000

                    Coef.   Std. Err.      z   P>|z|    [95% Conf. Interval]
acme
   afrelated      .950805   .0557082   17.07   0.000      .841619    1.059991
ARCH_acme
    arch L1.     .1063295    .015716    6.77   0.000     .0755266    .1371324
   garch L1.     .7556294   .0391568   19.30   0.000     .6788836    .8323752
       _cons     2.197566    .458343    4.79   0.000      1.29923    3.095901
anvil
    afinputs    -1.015657   .0209959  -48.37   0.000    -1.056808   -.9745054
       _cons     .0808653    .019445    4.16   0.000     .0427538    .1189767
ARCH_anvil
    arch L1.     .5261675   .0281586   18.69   0.000     .4709777    .5813572
    arch L2.     .2866454   .0196504   14.59   0.000     .2481314    .3251595
    apex L1.     1.953173   .0594862   32.83   0.000     1.836582    2.069763
       _cons    -.0062964   .0710842   -0.09   0.929    -.1456188    .1330261
corr(acme,anvil)
                -.5600358   .0326358  -17.16   0.000    -.6240008   -.4960708
Adjustment
     lambda1     .1904321   .0154449   12.33   0.000     .1601607    .2207035
     lambda2     .7147267   .0226204   31.60   0.000     .6703916    .7590618

The results indicate that increases in the futures prices for related products lead to higher returns on the Acme stock, and increased input prices lead to lower returns on the Anvil stock. In the conditional variance equation for Anvil, the coefficient on L1.apex is positive and significant, which indicates that an increase in the return on the Apex stock leads to more variability in the return on the Anvil stock.

Stored results

mgarch dcc stores the following in e():

Scalars
  e(N)               number of observations
  e(k)               number of parameters
  e(k_aux)           number of auxiliary parameters
  e(k_extra)         number of extra estimates added to _b
  e(k_eq)            number of equations in e(b)
  e(k_dv)            number of dependent variables
  e(df_m)            model degrees of freedom
  e(ll)              log likelihood
  e(chi2)            χ2
  e(p)               significance
  e(estdf)           1 if distribution parameter was estimated, 0 otherwise
  e(usr)             user-provided distribution parameter
  e(tmin)            minimum time in sample
  e(tmax)            maximum time in sample
  e(N_gaps)          number of gaps
  e(rank)            rank of e(V)
  e(ic)              number of iterations
  e(rc)              return code
  e(converged)       1 if converged, 0 otherwise

Macros
  e(cmd)             mgarch
  e(model)           dcc
  e(cmdline)         command as typed
  e(depvar)          names of dependent variables
  e(covariates)      list of covariates
  e(dv_eqs)          dependent variables with mean equations
  e(indeps)          independent variables in each equation
  e(tvar)            time variable
  e(title)           title in estimation output
  e(chi2type)        Wald; type of model χ2 test
  e(vce)             vcetype specified in vce()
  e(vcetype)         title used to label Std. Err.
  e(tmins)           formatted minimum time
  e(tmaxs)           formatted maximum time
  e(dist)            distribution for error term: gaussian or t
  e(arch)            specified ARCH terms
  e(garch)           specified GARCH terms
  e(technique)       maximization technique
  e(properties)      b V
  e(estat_cmd)       program used to implement estat
  e(predict)         program used to implement predict
  e(marginsok)       predictions allowed by margins
  e(marginsnotok)    predictions disallowed by margins
  e(marginsdefault)  default predict() specification for margins
  e(asbalanced)      factor variables fvset as asbalanced
  e(asobserved)      factor variables fvset as asobserved

Matrices
  e(b)               coefficient vector
  e(Cns)             constraints matrix
  e(ilog)            iteration log (up to 20 iterations)
  e(gradient)        gradient vector
  e(hessian)         Hessian matrix
  e(V)               variance–covariance matrix of the estimators
  e(pinfo)           parameter information, used by predict

Functions
  e(sample)          marks estimation sample
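These stored results can be inspected directly. A quick sketch, assuming an mgarch dcc fit is the active estimation result (output omitted):

. ereturn list        // display all e() results
. display e(ll)       // for example, the log likelihood
. matrix list e(b)    // the coefficient vector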
Methods and formulas

mgarch dcc estimates the parameters of the DCC MGARCH model by maximum likelihood. The log-likelihood function based on the multivariate normal distribution for observation t is

  l_t = -0.5\,m\,\log(2\pi) - 0.5\,\log\{\det(R_t)\} - \log\{\det(D_t^{1/2})\} - 0.5\,\tilde{\epsilon}_t R_t^{-1}\tilde{\epsilon}_t'

where \tilde{\epsilon}_t = D_t^{-1/2}\epsilon_t is an m × 1 vector of standardized residuals and \epsilon_t = y_t - Cx_t. The log-likelihood function is \sum_{t=1}^{T} l_t.

If we assume that \nu_t follow a multivariate t distribution with degrees of freedom (df) greater than 2, then the log-likelihood function for observation t is

  l_t = \log\Gamma\!\left(\frac{df+m}{2}\right) - \log\Gamma\!\left(\frac{df}{2}\right) - \frac{m}{2}\log\{(df-2)\pi\}
        - 0.5\,\log\{\det(R_t)\} - \log\{\det(D_t^{1/2})\} - \frac{df+m}{2}\log\!\left(1 + \frac{\tilde{\epsilon}_t R_t^{-1}\tilde{\epsilon}_t'}{df-2}\right)

The starting values for the parameters in the mean equations and the initial residuals \hat{\epsilon}_t are obtained by least-squares regression. The starting values for the parameters in the variance equations are obtained by a procedure proposed by Gourieroux and Monfort (1997, sec. 6.2.2). The starting values for the quasicorrelation parameters are calculated from the standardized residuals \tilde{\epsilon}_t. Given the starting values for the mean and variance equations, the starting values for the parameters λ1 and λ2 are obtained from a grid search performed on the log likelihood.

The initial optimization step is performed in the unconstrained space. Once the maximum is found, we impose the constraints λ1 ≥ 0, λ2 ≥ 0, and 0 ≤ λ1 + λ2 < 1, and maximize the log likelihood in the constrained space. This step is reported in the iteration log as the refining step.

GARCH estimators require initial values that can be plugged in for \epsilon_{t-i}\epsilon_{t-i}' and H_{t-j} when t - i < 1 and t - j < 1. mgarch dcc substitutes an estimator of the unconditional covariance of the disturbances,

  \hat{\Sigma} = T^{-1}\sum_{t=1}^{T}\hat{\epsilon}_t\hat{\epsilon}_t'    (2)

for \epsilon_{t-i}\epsilon_{t-i}' when t - i < 1 and for H_{t-j} when t - j < 1, where \hat{\epsilon}_t is the vector of residuals calculated using the estimated parameters.

mgarch dcc uses numerical derivatives in maximizing the log-likelihood function. A sketch of supplying starting values manually appears below.
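When refitting a model, a previously saved coefficient vector can seed the optimizer through the from() option documented above. A sketch, assuming a comparable mgarch dcc fit is the active estimation result; the matrix name b0 is illustrative, and the new specification must define exactly the same parameters:

. matrix b0 = e(b)      // save estimates from an earlier, comparable fit
. mgarch dcc (toyota nissan = , noconstant), arch(1) garch(1) from(b0)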
References

Aielli, G. P. 2009. Dynamic Conditional Correlations: On Properties and Estimation. Working paper, Dipartimento di Statistica, University of Florence, Florence, Italy.

Engle, R. F. 2002. Dynamic conditional correlation: A simple class of multivariate generalized autoregressive conditional heteroskedasticity models. Journal of Business & Economic Statistics 20: 339–350.

———. 2009. Anticipating Correlations: A New Paradigm for Risk Management. Princeton, NJ: Princeton University Press.

Gourieroux, C. S., and A. Monfort. 1997. Time Series and Dynamic Models. Trans. ed. G. M. Gallo. Cambridge: Cambridge University Press.

Also see

[TS] mgarch dcc postestimation — Postestimation tools for mgarch dcc
[TS] mgarch — Multivariate GARCH models
[TS] tsset — Declare data to be time-series data
[TS] arch — Autoregressive conditional heteroskedasticity (ARCH) family of estimators
[TS] var — Vector autoregressive models
[U] 20 Estimation and postestimation commands

Title

mgarch dcc postestimation — Postestimation tools for mgarch dcc

Postestimation commands     predict     margins     Remarks and examples
Methods and formulas     Also see

Postestimation commands

The following standard postestimation commands are available after mgarch dcc:

Command            Description
contrast           contrasts and ANOVA-style joint tests of estimates
estat ic           Akaike's and Schwarz's Bayesian information criteria (AIC and BIC)
estat summarize    summary statistics for the estimation sample
estat vce          variance–covariance matrix of the estimators (VCE)
estimates          cataloging estimation results
forecast           dynamic forecasts and simulations
lincom             point estimates, standard errors, testing, and inference for linear combinations of coefficients
lrtest             likelihood-ratio test
margins            marginal means, predictive margins, marginal effects, and average marginal effects
marginsplot        graph the results from margins (profile plots, interaction plots, etc.)
nlcom              point estimates, standard errors, testing, and inference for nonlinear combinations of coefficients
predict            predictions, residuals, influence statistics, and other diagnostic measures
predictnl          point estimates, standard errors, testing, and inference for generalized predictions
pwcompare          pairwise comparisons of estimates
test               Wald tests of simple and composite linear hypotheses
testnl             Wald tests of nonlinear hypotheses

predict

Description for predict

predict creates a new variable containing predictions such as linear predictions and conditional variances, covariances, and correlations. All predictions are available as static one-step-ahead predictions or as dynamic multistep predictions, and you can control when dynamic predictions begin.

Menu for predict

Statistics > Postestimation

Syntax for predict

  predict [type] { stub* | newvarlist } [if] [in] [, statistic options]

statistic        Description
Main
  xb             linear prediction; the default
  residuals      residuals
  variance       conditional variances and covariances
  correlation    conditional correlations

These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for the estimation sample.

options                    Description
Options
  equation(eqnames)        names of equations for which predictions are made
  dynamic(time_constant)   begin dynamic forecast at specified time

Options for predict

Main

xb, the default, calculates the linear predictions of the dependent variables.

residuals calculates the residuals.

variance predicts the conditional variances and conditional covariances.

correlation predicts the conditional correlations.

Options

equation(eqnames) specifies the equation for which the predictions are calculated. Use this option to predict a statistic for a particular equation. Equation names, such as equation(income), are used to identify equations.

One equation name may be specified when predicting the dependent variable, the residuals, or the conditional variance. For example, specifying equation(income) causes predict to predict income, and specifying variance equation(income) causes predict to predict the conditional variance of income.

Two equations may be specified when predicting a conditional variance or covariance. For example, specifying equation(income, consumption) variance causes predict to predict the conditional covariance of income and consumption.

dynamic(time_constant) specifies when predict starts producing dynamic forecasts. The specified time_constant must be in the scale of the time variable specified in tsset, and the time_constant must be inside a sample for which observations on the dependent variables are available. For example, dynamic(tq(2008q4)) causes dynamic predictions to begin in the fourth quarter of 2008, assuming that your time variable is quarterly; see [D] datetime. If the model contains exogenous variables, they must be present for the whole predicted sample. dynamic() may not be specified with residuals.
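For example, after the stock-returns model in [TS] mgarch dcc, the conditional covariance of a specific pair of returns could be requested as follows (a sketch; the new variable name covtn is illustrative):

. predict covtn, variance equation(toyota, nissan)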
margins

Description for margins

margins estimates margins of response for linear predictions and conditional variances, covariances, and correlations. All predictions are available as static one-step-ahead predictions or as dynamic multistep predictions, and you can control when dynamic predictions begin.

Menu for margins

Statistics > Postestimation

Syntax for margins

  margins [marginlist] [, options]
  margins [marginlist], predict(statistic ...) [predict(statistic ...) ...] [options]

statistic        Description
  default        linear predictions for each equation
  xb             linear prediction for a specified equation
  variance       conditional variances and covariances
  correlation    conditional correlations
  residuals      not allowed with margins

xb defaults to the first equation. Statistics not allowed with margins are functions of stochastic quantities other than e(b). For the full syntax, see [R] margins.

Remarks and examples

We assume that you have already read [TS] mgarch dcc. In this entry, we use predict after mgarch dcc to make in-sample and out-of-sample forecasts.

Example 1: Dynamic forecasts

In this example, we obtain dynamic forecasts for the Toyota, Nissan, and Honda stock returns modeled in example 2 of [TS] mgarch dcc. In the output below, we reestimate the parameters of the model, use tsappend (see [TS] tsappend) to extend the data, and use predict to obtain in-sample one-step-ahead forecasts and dynamic forecasts of the conditional variances of the returns. We graph the forecasts below.

. use http://www.stata-press.com/data/r14/stocks
(Data from Yahoo! Finance)
. quietly mgarch dcc (toyota nissan = , noconstant)
> (honda = L.nissan, noconstant), arch(1) garch(1)
. tsappend, add(50)
. predict H*, variance dynamic(2016)

[Figure: time-series plot over 01jan2009–01jan2011 of the variance predictions (toyota,toyota), (nissan,nissan), and (honda,honda), each dynamic(2016), with a vertical line marking the start of the dynamic forecasts.]

Recent in-sample one-step-ahead forecasts are plotted to the left of the vertical line in the above graph, and the dynamic out-of-sample forecasts appear to the right of the vertical line. The graph shows the tail end of the huge increase in return volatility that took place in 2008 and 2009. It also shows that the dynamic forecasts quickly converge. A sketch of the graph command follows.
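The graph itself can be drawn with tsline. This sketch assumes that predict's stub* convention named the variance predictions H_toyota_toyota, H_nissan_nissan, and H_honda_honda; those names are not shown in the output above, so treat them as assumptions and check your data with describe first:

. tsline H_toyota_toyota H_nissan_nissan H_honda_honda

A vertical line at the start of the dynamic forecasts can be added with the tline() option of tsline.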
Methods and formulas

All one-step predictions are obtained by substituting the parameter estimates into the model. The estimated unconditional variance matrix of the disturbances, \hat{\Sigma}, is the initial value for the ARCH and GARCH terms. The postestimation routines recompute \hat{\Sigma} using the prediction sample, the parameter estimates stored in e(b), and (2) in Methods and formulas of [TS] mgarch dcc. For observations in which the residuals are missing, the estimated unconditional variance matrix of the disturbances is used in place of the outer product of the residuals.

Dynamic predictions of the dependent variables use previously predicted values beginning in the period specified by dynamic().

Dynamic variance predictions are implemented by substituting \hat{\Sigma} for the outer product of the residuals beginning in the period specified in dynamic().

Also see

[TS] mgarch dcc — Dynamic conditional correlation multivariate GARCH models
[U] 20 Estimation and postestimation commands

Title

mgarch dvech — Diagonal vech multivariate GARCH models

Description     Quick start     Menu     Syntax     Options
Remarks and examples     Stored results     Methods and formulas     References     Also see

Description

mgarch dvech estimates the parameters of diagonal vech (DVECH) multivariate generalized autoregressive conditionally heteroskedastic (MGARCH) models in which each element of the conditional covariance matrix is parameterized as a linear function of its own past and past shocks.

DVECH MGARCH models are less parsimonious than the conditional correlation models discussed in [TS] mgarch ccc, [TS] mgarch dcc, and [TS] mgarch vcc because the number of parameters in DVECH MGARCH models increases more rapidly with the number of series modeled.

Quick start

Fit diagonal vech multivariate GARCH with first- and second-order ARCH components for dependent variables y1 and y2 using tsset data
    mgarch dvech (y1 y2), arch(1 2)

Add regressors x1 and x2 and first-order GARCH component
    mgarch dvech (y1 y2 = x1 x2), arch(1 2) garch(1)

Menu

Statistics > Multivariate time series > Multivariate GARCH
Syntax

  mgarch dvech eq [eq ... eq] [if] [in] [, options]

where each eq has the form

  (depvars = [indepvars] [, noconstant])

options                        Description
Model
  arch(numlist)                ARCH terms
  garch(numlist)               GARCH terms
  distribution(dist [#])       use dist distribution for errors (may be gaussian, normal, or t; default is gaussian)
  constraints(numlist)         apply linear constraints
SE/Robust
  vce(vcetype)                 vcetype may be oim or robust
Reporting
  level(#)                     set confidence level; default is level(95)
  nocnsreport                  do not display constraints
  display_options              control columns and column formats, row spacing, line width, display of omitted variables and base and empty cells, and factor-variable labeling
Maximization
  maximize_options             control the maximization process; seldom used
  from(matname)                initial values for the coefficients; seldom used
  svtechnique(algorithm_spec)  starting-value maximization algorithm
  sviterate(#)                 number of starting-value iterations; default is sviterate(25)
  coeflegend                   display legend instead of statistics

You must tsset your data before using mgarch dvech; see [TS] tsset.
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
depvars and indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, fp, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.

Options

Model

noconstant suppresses the constant term(s).

arch(numlist) specifies the ARCH terms in the model. By default, no ARCH terms are specified.

garch(numlist) specifies the GARCH terms in the model. By default, no GARCH terms are specified.

distribution(dist [#]) specifies the assumed distribution for the errors. dist may be gaussian, normal, or t.

  gaussian and normal are synonyms; each causes mgarch dvech to assume that the errors come from a multivariate normal distribution. # cannot be specified with either of them.

  t causes mgarch dvech to assume that the errors follow a multivariate Student t distribution, and the degree-of-freedom parameter is estimated along with the other parameters of the model. If distribution(t #) is specified, then mgarch dvech uses a multivariate Student t distribution with # degrees of freedom. # must be greater than 2.

constraints(numlist) specifies linear constraints to apply to the parameter estimates.

SE/Robust

vce(vcetype) specifies the estimator for the variance–covariance matrix of the estimator.

  vce(oim), the default, specifies to use the observed information matrix (OIM) estimator.

  vce(robust) specifies to use the Huber/White/sandwich estimator.

Reporting

level(#); see [R] estimation options.

nocnsreport; see [R] estimation options.

display_options: noci, nopvalues, noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.

Maximization

maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace, gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#), nrtolerance(#), nonrtolerance, and from(matname); see [R] maximize for all options except from(), and see below for information on from(). These options are seldom used.

from(matname) specifies initial values for the coefficients. from(b0) causes mgarch dvech to begin the optimization algorithm with the values in b0. b0 must be a row vector, and the number of columns must equal the number of parameters in the model.

svtechnique(algorithm_spec) and sviterate(#) specify options for the starting-value search process.

  svtechnique(algorithm_spec) specifies the algorithm used to search for initial values. The syntax for algorithm_spec is the same as for the technique() option; see [R] maximize. svtechnique(bhhh 5 nr 16000) is the default. This option may not be specified with from().

  sviterate(#) specifies the maximum number of iterations that the search algorithm may perform. The default is sviterate(25). This option may not be specified with from().

The following option is available with mgarch dvech but is not shown in the dialog box:

coeflegend; see [R] estimation options.

Remarks and examples

We assume that you have already read [TS] mgarch, which provides an introduction to MGARCH models and the methods implemented in mgarch dvech.
The DVECH MGARCH model parameterizes each element of the current conditional covariance matrix as a linear function of its own past and past shocks. As discussed in [TS] mgarch, MGARCH models differ in the parsimony and flexibility of their specifications for a time-varying conditional covariance matrix of the disturbances, denoted by Ht . In a DVECH MGARCH model with one ARCH term and one GARCH term, the (i, j)th element of conditional covariance matrix is modeled by hij,t = sij + aij i,t−1 j,t−1 + bij hij,t−1 where sij , aij , and bij are parameters and t−1 is the vector of errors from the previous period. This expression shows the linear form in which each element of the current conditional covariance matrix is a function of its own past and past shocks. Technical note The general vech MGARCH model developed by Bollerslev, Engle, and Wooldridge (1988) can be written as yt = Cxt + t t = 1/2 Ht νt p X ht = s + (1) (2) Ai vech(t−i 0t−i ) + i=1 q X Bj ht−j (3) j=1 where yt is an m × 1 vector of dependent variables; C is an m × k matrix of parameters; xt is a k × 1 vector of independent variables, which may contain lags of yt ; 1/2 Ht is the Cholesky factor of the time-varying conditional covariance matrix Ht ; νt is an m × 1 vector of independent and identically distributed innovations; ht = vech(Ht ); the vech() function stacks the lower diagonal elements of a symmetric matrix into a column vector, for example,  vech 1 2 2 3  = (1, 2, 3)0 s is an m(m + 1)/2 × 1 vector of parameters; each Ai is an {m(m + 1)/2} × {m(m + 1)/2} matrix of parameters; and each Bj is an {m(m + 1)/2} × {m(m + 1)/2} matrix of parameters. Bollerslev, Engle, and Wooldridge (1988) argued that the general-vech MGARCH model in (1)–(3) was too flexible to fit to data, so they proposed restricting the matrices Ai and Bj to be diagonal matrices. It is for this restriction that the model is known as a diagonal vech MGARCH model. The diagonal vech MGARCH model can also be expressed by replacing (3) with 392 mgarch dvech — Diagonal vech multivariate GARCH models Ht = S + p X Ai t−i 0t−i + i=1 q X Bj Ht−j (30 ) j=1 where S is an m × m symmetric parameter matrix; each Ai is an m × m symmetric parameter matrix; is the elementwise or Hadamard product; and each Bj is an m × m symmetric parameter matrix. In (30 ), A and B are symmetric but not diagonal matrices because we used the Hadamard product. The matrices are diagonal in the vech representation of (3) but not in the Hadamard-product representation of (30 ). The Hadamard-product representation in (30 ) clarifies that each element in Ht depends on its past values and the past values of the corresponding ARCH terms. Although this representation does not allow cross-covariance effects, it is still quite flexible. The rapid rate at which the number of parameters grows with m, p, or q is one aspect of the model’s flexibility. Some examples Example 1: Model with common covariates We have data on a secondary market rate of a six-month U.S. Treasury bill, tbill, and on Moody’s seasoned AAA corporate bond yield, bond. We model the first-differences of tbill and the first-differences of bond as a VAR(1) with an ARCH(1) term. . use http://www.stata-press.com/data/r14/irates4 (St. Louis Fed (FRED) financial data) . 
Some examples

Example 1: Model with common covariates

We have data on a secondary market rate of a six-month U.S. Treasury bill, tbill, and on Moody's seasoned AAA corporate bond yield, bond. We model the first-differences of tbill and the first-differences of bond as a VAR(1) with an ARCH(1) term.

. use http://www.stata-press.com/data/r14/irates4
(St. Louis Fed (FRED) financial data)
. mgarch dvech (D.bond D.tbill = LD.bond LD.tbill), arch(1)
Getting starting values
(setting technique to bhhh)
Iteration 0:   log likelihood =  3569.2723
Iteration 1:   log likelihood =  3708.4561
(output omitted)
Iteration 6:   log likelihood =  4183.8853
Iteration 7:   log likelihood =  4184.2424
(switching technique to nr)
Iteration 8:   log likelihood =  4184.4141
Iteration 9:   log likelihood =  4184.5973
Iteration 10:  log likelihood =  4184.5975
Estimating parameters
(setting technique to bhhh)
Iteration 0:   log likelihood =  4184.5975
Iteration 1:   log likelihood =  4200.6303
Iteration 2:   log likelihood =  4208.5342
Iteration 3:   log likelihood =   4212.426
Iteration 4:   log likelihood =  4215.2373
(switching technique to nr)
Iteration 5:   log likelihood =  4217.0676
Iteration 6:   log likelihood =  4221.5706
Iteration 7:   log likelihood =  4221.6576
Iteration 8:   log likelihood =  4221.6577

Diagonal vech MGARCH model
Sample: 3 - 2456                                Number of obs   =       2,454
Distribution: Gaussian                          Wald chi2(4)    =     1183.52
Log likelihood = 4221.658                       Prob > chi2     =      0.0000

                    Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
D.bond
  bond
    LD.          .2967674   .0247149   12.01   0.000      .2483271    .3452077
  tbill
    LD.          .0947949   .0098683    9.61   0.000      .0754533    .1141364
  _cons          .0003991     .00143    0.28   0.780     -.0024036    .0032019
D.tbill
  bond
    LD.          .0108373   .0301501    0.36   0.719     -.0482558    .0699304
  tbill
    LD.          .4344747   .0176497   24.62   0.000      .3998819    .4690675
  _cons          .0011611   .0021033    0.55   0.581     -.0029612    .0052835
Sigma0
  1_1            .004894    .0002006   24.40   0.000      .0045008    .0052871
  2_1            .0040986   .0002396   17.10   0.000      .0036289    .0045683
  2_2            .0115149   .0005227   22.03   0.000      .0104904    .0125395
L.ARCH
  1_1            .4514942   .0456835    9.88   0.000      .3619562    .5410323
  2_1            .2518879    .036736    6.86   0.000      .1798866    .3238893
  2_2            .843368    .0608055   13.87   0.000      .7241914    .9625446

The output has three parts: an iteration log, a header, and an output table. The iteration log has two parts: the first part reports the iterations from the process of searching for starting values, and the second part reports the iterations from maximizing the log-likelihood function.

The header describes the estimation sample and reports a Wald test against the null hypothesis that all the coefficients on the independent variables in each equation are zero. Here the null hypothesis is rejected at all conventional levels.

The output table reports point estimates, standard errors, tests against zero, and confidence intervals for the estimated coefficients, the estimated elements of S, and any estimated elements of A or B. Here the output indicates that in the equation for D.tbill, neither the coefficient on LD.bond nor the constant is statistically significant. The elements of S are reported in the Sigma0 equation. The estimate of S[1,1] is 0.005, and the estimate of S[2,1] is 0.004. The ARCH term results are reported in the L.ARCH equation. In the L.ARCH equation, 1_1 is the coefficient on the ARCH term for the conditional variance of the first dependent variable, 2_1 is the coefficient on the ARCH term for the conditional covariance between the first and second dependent variables, and 2_2 is the coefficient on the ARCH term for the conditional variance of the second dependent variable.
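One natural follow-up is a joint Wald test of the two insignificant terms. The sketch below is hedged: the exact equation name used to address the coefficients should be read off a coeflegend run rather than assumed, so the second command is illustrative only:

. mgarch dvech (D.bond D.tbill = LD.bond LD.tbill), arch(1) coeflegend
. test [D.tbill]LD.bond [D.tbill]_cons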
Example 2: Model with covariates that differ by equation

We improve the previous example by removing the insignificant parameters from the model:

. mgarch dvech (D.bond = LD.bond LD.tbill, noconstant)
>     (D.tbill = LD.tbill, noconstant), arch(1)
Getting starting values
(setting technique to bhhh)
Iteration 0:   log likelihood =  3566.8824
Iteration 1:   log likelihood =  3701.6181
Iteration 2:   log likelihood =  3952.8048
Iteration 3:   log likelihood =  4076.5164
Iteration 4:   log likelihood =  4166.6842
Iteration 5:   log likelihood =  4180.2998
Iteration 6:   log likelihood =  4182.4545
Iteration 7:   log likelihood =  4182.9563
(switching technique to nr)
Iteration 8:   log likelihood =  4183.0293
Iteration 9:   log likelihood =  4183.1112
Iteration 10:  log likelihood =  4183.1113
Estimating parameters
(setting technique to bhhh)
Iteration 0:   log likelihood =  4183.1113
Iteration 1:   log likelihood =  4202.0304
Iteration 2:   log likelihood =  4210.2929
Iteration 3:   log likelihood =  4215.7798
Iteration 4:   log likelihood =  4217.7755
(switching technique to nr)
Iteration 5:   log likelihood =  4219.0078
Iteration 6:   log likelihood =  4221.4197
Iteration 7:   log likelihood =   4221.433
Iteration 8:   log likelihood =   4221.433

Diagonal vech MGARCH model
Sample: 3 - 2456                                Number of obs   =       2,454
Distribution: Gaussian                          Wald chi2(3)    =     1197.76
Log likelihood = 4221.433                       Prob > chi2     =      0.0000

                    Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
D.bond
  bond
    LD.          .2941649   .0234734   12.53   0.000      .2481579    .3401718
  tbill
    LD.          .0953158   .0098077    9.72   0.000       .076093    .1145386
D.tbill
  tbill
    LD.          .4385945   .0136672   32.09   0.000      .4118072    .4653817
Sigma0
  1_1            .0048922   .0002005   24.40   0.000      .0044993    .0052851
  2_1            .0040949   .0002394   17.10   0.000      .0036256    .0045641
  2_2            .0115043   .0005184   22.19   0.000      .0104883    .0125203
L.ARCH
  1_1            .4519233    .045671    9.90   0.000      .3624099    .5414368
  2_1            .2515474   .0366701    6.86   0.000      .1796752    .3234195
  2_2            .8437212   .0600839   14.04   0.000      .7259589    .9614836

We specified each equation separately to remove the insignificant parameters. All the parameter estimates are statistically significant.

Example 3: Model with constraints

Here we analyze some fictional weekly data on the percentages of bad widgets found in the factories of Acme Inc. and Anvil Inc. We model the levels as a first-order autoregressive process. We believe that the adaptive management style in these companies causes the variances to follow a diagonal vech MGARCH process with one ARCH term and one GARCH term. Furthermore, these close competitors follow essentially the same process, so we impose the constraints that the ARCH coefficients are the same for the two companies and that the GARCH coefficients are also the same.

Imposing these constraints yields

. use http://www.stata-press.com/data/r14/acme
. constraint 1 [L.ARCH]1_1 = [L.ARCH]2_2
. constraint 2 [L.GARCH]1_1 = [L.GARCH]2_2
.
mgarch dvech (acme = L.acme) (anvil = L.anvil), arch(1) garch(1) > constraints(1 2) Getting starting values (setting technique to bhhh) Iteration 0: log likelihood = -6087.0665 (not concave) Iteration 1: log likelihood = -6022.2046 Iteration 2: log likelihood = -5986.6152 Iteration 3: log likelihood = -5976.5739 Iteration 4: log likelihood = -5974.4342 Iteration 5: log likelihood = -5974.4046 Iteration 6: log likelihood = -5974.4036 Iteration 7: log likelihood = -5974.4035 Estimating parameters (setting technique to bhhh) Iteration 0: log likelihood = -5974.4035 Iteration 1: log likelihood = -5973.812 Iteration 2: log likelihood = -5973.8004 Iteration 3: log likelihood = -5973.7999 Iteration 4: log likelihood = -5973.7999 396 mgarch dvech — Diagonal vech multivariate GARCH models Diagonal vech MGARCH model Sample: 1969w35 - 1998w25 Distribution: Gaussian Log likelihood = -5973.8 ( 1) [L.ARCH]1_1 - [L.ARCH]2_2 = 0 ( 2) [L.GARCH]1_1 - [L.GARCH]2_2 = 0 Coef. Std. Err. Number of obs Wald chi2(2) Prob > chi2 z = = = 1,499 272.47 0.0000 P>|z| [95% Conf. Interval] acme acme L1. .3365278 .0255134 13.19 0.000 .2865225 .3865331 _cons 1.124611 .060085 18.72 0.000 1.006847 1.242376 anvil L1. .3151955 .0263287 11.97 0.000 .2635922 .3667988 _cons 1.215786 .0642052 18.94 0.000 1.089947 1.341626 1_1 2_1 2_2 1.889237 .4599576 2.063113 .2168733 .1139843 .2454633 8.71 4.04 8.40 0.000 0.000 0.000 1.464173 .2365525 1.582014 2.314301 .6833626 2.544213 1_1 2_1 2_2 .2813443 .181877 .2813443 .0299124 .0335393 .0299124 9.41 5.42 9.41 0.000 0.000 0.000 .222717 .1161412 .222717 .3399716 .2476128 .3399716 1_1 2_1 2_2 .1487581 .085404 .1487581 .0697531 .1446524 .0697531 2.13 0.59 2.13 0.033 0.555 0.033 .0120445 -.1981094 .0120445 .2854716 .3689175 .2854716 anvil Sigma0 L.ARCH L.GARCH We could test our constraints by fitting the unconstrained model and performing either a Wald or a likelihood-ratio test. The results indicate that we might further restrict the time-invariant components of the conditional variances to be the same across companies. Example 4: Model with a GARCH term Some models of financial data include no covariates or constant terms. For example, in modeling fictional data on the stock returns of Acme Inc. and Anvil Inc., we found it best not to include any covariates or constant terms. We include two ARCH terms and one GARCH term to model the conditional variances. mgarch dvech — Diagonal vech multivariate GARCH models . use http://www.stata-press.com/data/r14/aacmer . mgarch dvech (acme anvil = , noconstant), arch(1/2) garch(1) Getting starting values (setting technique to bhhh) Iteration 0: log likelihood = -18417.243 (not concave) Iteration 1: log likelihood = -18215.005 Iteration 2: log likelihood = -18199.691 Iteration 3: log likelihood = -18136.699 Iteration 4: log likelihood = -18084.256 Iteration 5: log likelihood = -17993.662 Iteration 6: log likelihood = -17731.1 Iteration 7: log likelihood = -17629.505 (switching technique to nr) Iteration 8: log likelihood = -17548.172 Iteration 9: log likelihood = -17544.987 Iteration 10: log likelihood = -17544.937 Iteration 11: log likelihood = -17544.937 Estimating parameters (setting technique to bhhh) Iteration 0: log likelihood = -17544.937 Iteration 1: log likelihood = -17544.937 Diagonal vech MGARCH model Sample: 1 - 5000 Number of obs Distribution: Gaussian Wald chi2(.) Log likelihood = -17544.94 Prob > chi2 Coef. Std. Err. z = = = 397 5,000 . . P>|z| [95% Conf. 
Interval] Sigma0 1_1 2_1 2_2 1.026283 .4300997 1.019753 .0823348 .0590294 .0837146 12.46 7.29 12.18 0.000 0.000 0.000 .8649096 .3144042 .8556751 1.187656 .5457952 1.18383 1_1 2_1 2_2 .2878739 .1036685 .2034196 .02157 .0161446 .019855 13.35 6.42 10.25 0.000 0.000 0.000 .2455975 .0720256 .1645044 .3301504 .1353114 .2423347 1_1 2_1 2_2 .1837825 .0884425 .2025718 .0274555 .02208 .0272639 6.69 4.01 7.43 0.000 0.000 0.000 .1299706 .0451665 .1491355 .2375943 .1317185 .256008 1_1 2_1 2_2 .0782467 .2888104 .201618 .053944 .0818303 .0470584 1.45 3.53 4.28 0.147 0.000 0.000 -.0274816 .1284261 .1093853 .183975 .4491948 .2938508 L.ARCH L2.ARCH L.GARCH The model test is omitted from the output, because there are no covariates in the model. The univariate tests indicate that the included parameters fit the data well. In [TS] mgarch dvech postestimation, we discuss prediction from models without covariates. 398 mgarch dvech — Diagonal vech multivariate GARCH models Stored results mgarch dvech stores the following in e(): Scalars e(N) e(k) e(k extra) e(k eq) e(k dv) e(df m) e(ll) e(chi2) e(p) e(estdf) e(usr) e(tmin) e(tmax) e(N gaps) e(rank) e(ic) e(rc) e(converged) Macros e(cmd) e(model) e(cmdline) e(depvar) e(covariates) e(dv eqs) e(indeps) e(tvar) e(title) e(chi2type) e(vce) e(vcetype) e(tmins) e(tmaxs) e(dist) e(arch) e(garch) e(svtechnique) e(technique) e(properties) e(estat cmd) e(predict) e(marginsok) e(marginsnotok) e(marginsdefault) e(asbalanced) e(asobserved) number of observations number of parameters number of extra estimates added to number of equations in e(b) number of dependent variables model degrees of freedom log likelihood b χ2 significance 1 if distribution parameter was estimated, 0 otherwise user-provided distribution parameter minimum time in sample maximum time in sample number of gaps rank of e(V) number of iterations return code 1 if converged, 0 otherwise mgarch dvech command as typed names of dependent variables list of covariates dependent variables with mean equations independent variables in each equation time variable title in estimation output Wald; type of model χ2 test vcetype specified in vce() title used to label Std. Err. 
 e(tmins)            formatted minimum time
 e(tmaxs)            formatted maximum time
 e(dist)             distribution for error term: gaussian or t
 e(arch)             specified ARCH terms
 e(garch)            specified GARCH terms
 e(svtechnique)      maximization technique(s) for starting values
 e(technique)        maximization technique
 e(properties)       b V
 e(estat_cmd)        program used to implement estat
 e(predict)          program used to implement predict
 e(marginsok)        predictions allowed by margins
 e(marginsnotok)     predictions disallowed by margins
 e(marginsdefault)   default predict() specification for margins
 e(asbalanced)       factor variables fvset as asbalanced
 e(asobserved)       factor variables fvset as asobserved

Matrices
 e(b)                coefficient vector
 e(Cns)              constraints matrix
 e(ilog)             iteration log (up to 20 iterations)
 e(gradient)         gradient vector
 e(hessian)          Hessian matrix
 e(A)                estimates of A matrices
 e(B)                estimates of B matrices
 e(S)                estimates of Sigma0 matrix
 e(Sigma)            Sigma hat
 e(V)                variance–covariance matrix of the estimators
 e(pinfo)            parameter information, used by predict

Functions
 e(sample)           marks estimation sample

Methods and formulas

Recall that the diagonal vech MGARCH model can be written as

$$y_t = C x_t + \epsilon_t$$
$$\epsilon_t = H_t^{1/2}\nu_t$$
$$H_t = S + \sum_{i=1}^{p} A_i \odot \epsilon_{t-i}\epsilon_{t-i}' + \sum_{j=1}^{q} B_j \odot H_{t-j}$$

where

    $y_t$ is an $m \times 1$ vector of dependent variables;
    $C$ is an $m \times k$ matrix of parameters;
    $x_t$ is a $k \times 1$ vector of independent variables, which may contain lags of $y_t$;
    $H_t^{1/2}$ is the Cholesky factor of the time-varying conditional covariance matrix $H_t$;
    $\nu_t$ is an $m \times 1$ vector of normal, independent, and identically distributed innovations;
    $S$ is an $m \times m$ symmetric matrix of parameters;
    each $A_i$ is an $m \times m$ symmetric matrix of parameters;
    $\odot$ is the elementwise or Hadamard product; and
    each $B_j$ is an $m \times m$ symmetric matrix of parameters.

mgarch dvech estimates the parameters by maximum likelihood. The log-likelihood function based on the multivariate normal distribution for observation $t$ is

$$l_t = -0.5\,m\log(2\pi) - 0.5\log\{\det(H_t)\} - 0.5\,\epsilon_t' H_t^{-1}\epsilon_t$$

where $\epsilon_t = y_t - C x_t$. The log-likelihood function is $\sum_{t=1}^{T} l_t$.

If we assume that $\nu_t$ follow a multivariate t distribution with degrees of freedom (df) greater than 2, then the log-likelihood function for observation $t$ is

$$l_t = \log\Gamma\!\left(\frac{\mathrm{df}+m}{2}\right) - \log\Gamma\!\left(\frac{\mathrm{df}}{2}\right) - \frac{m}{2}\log\{(\mathrm{df}-2)\pi\} - 0.5\log\{\det(H_t)\} - \frac{\mathrm{df}+m}{2}\log\!\left(1 + \frac{\epsilon_t' H_t^{-1}\epsilon_t}{\mathrm{df}-2}\right)$$

mgarch dvech ensures that $H_t$ is positive definite for each $t$.

By default, mgarch dvech performs an iterative search for starting values. mgarch dvech estimates starting values for $C$ by seemingly unrelated regression, uses these estimates to compute residuals $\widehat{\epsilon}_t$, plugs $\widehat{\epsilon}_t$ into the above log-likelihood function, and optimizes this log-likelihood function over the parameters in $H_t$. This starting-value method plugs in consistent estimates of the parameters for the conditional means of the dependent variables and then iteratively searches for the variance parameters that maximize the log-likelihood function. Lütkepohl (2005, chap. 16) discusses this method as an estimator for the variance parameters.

GARCH estimators require initial values that can be plugged in for $\epsilon_{t-i}\epsilon_{t-i}'$ and $H_{t-j}$ when $t-i < 1$ and $t-j < 1$. mgarch dvech substitutes an estimator of the unconditional covariance of the disturbances,

$$\widehat{\Sigma} = T^{-1}\sum_{t=1}^{T}\widehat{\epsilon}_t\,\widehat{\epsilon}_t' \tag{4}$$

for $\epsilon_{t-i}\epsilon_{t-i}'$ when $t-i < 1$ and for $H_{t-j}$ when $t-j < 1$, where $\widehat{\epsilon}_t$ is the vector of residuals calculated using the estimated parameters.

mgarch dvech uses analytic first and second derivatives in maximizing the log-likelihood function based on the multivariate normal distribution.
mgarch dvech uses numerical derivatives in maximizing the log-likelihood function based on the multivariate t distribution. References Bollerslev, T., R. F. Engle, and J. M. Wooldridge. 1988. A capital asset pricing model with time-varying covariances. Journal of Political Economy 96: 116–131. Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer. Also see [TS] mgarch dvech postestimation — Postestimation tools for mgarch dvech [TS] mgarch — Multivariate GARCH models [TS] tsset — Declare data to be time-series data [TS] arch — Autoregressive conditional heteroskedasticity (ARCH) family of estimators [TS] var — Vector autoregressive models [U] 20 Estimation and postestimation commands Title mgarch dvech postestimation — Postestimation tools for mgarch dvech Postestimation commands Methods and formulas predict Also see margins Remarks and examples Postestimation commands The following standard postestimation commands are available after mgarch dvech: Command Description contrast estat ic estat summarize estat vce estimates forecast lincom contrasts and ANOVA-style joint tests of estimates Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC) summary statistics for the estimation sample variance–covariance matrix of the estimators (VCE) cataloging estimation results dynamic forecasts and simulations point estimates, standard errors, testing, and inference for linear combinations of coefficients likelihood-ratio test marginal means, predictive margins, marginal effects, and average marginal effects graph the results from margins (profile plots, interaction plots, etc.) point estimates, standard errors, testing, and inference for nonlinear combinations of coefficients predictions, residuals, influence statistics, and other diagnostic measures point estimates, standard errors, testing, and inference for generalized predictions pairwise comparisons of estimates Wald tests of simple and composite linear hypotheses Wald tests of nonlinear hypotheses lrtest margins marginsplot nlcom predict predictnl pwcompare test testnl 401 402 mgarch dvech postestimation — Postestimation tools for mgarch dvech predict Description for predict predict creates a new variable containing predictions such as linear predictions and conditional variances and covariances. All predictions are available as static one-step-ahead predictions or as dynamic multistep predictions, and you can control when dynamic predictions begin. Menu for predict Statistics > Postestimation Syntax for predict   predict type statistic { stub* | newvarlist }  if   in   , statistic options  Description Main xb residuals variance linear prediction; the default residuals conditional variances and covariances These statistics are available both in and out of sample; type predict the estimation sample. . . . if e(sample) . . . if wanted only for Description options Options equation(eqnames) names of equations for which predictions are made dynamic(time constant) begin dynamic forecast at specified time Options for predict   Main xb, the default, calculates the linear predictions of the dependent variables. residuals calculates the residuals. variance predicts the conditional variances and conditional covariances.   Options equation(eqnames) specifies the equation for which the predictions are calculated. Use this option to predict a statistic for a particular equation. Equation names, such as equation(income), are used to identify equations. 
One equation name may be specified when predicting the dependent variable, the residuals, or the conditional variance. For example, specifying equation(income) causes predict to predict income, and specifying variance equation(income) causes predict to predict the conditional variance of income.

Two equations may be specified when predicting a conditional variance or covariance. For example, specifying equation(income, consumption) variance causes predict to predict the conditional covariance of income and consumption.

dynamic(time_constant) specifies when predict starts producing dynamic forecasts. The specified time_constant must be in the scale of the time variable specified in tsset, and the time_constant must be inside a sample for which observations on the dependent variables are available. For example, dynamic(tq(2008q4)) causes dynamic predictions to begin in the fourth quarter of 2008, assuming that your time variable is quarterly; see [D] datetime. If the model contains exogenous variables, they must be present for the whole predicted sample.

dynamic() may not be specified with residuals.

margins

Description for margins

margins estimates margins of response for linear predictions and conditional variances and covariances. All predictions are available as static one-step-ahead predictions or as dynamic multistep predictions, and you can control when dynamic predictions begin.

Menu for margins

Statistics > Postestimation

Syntax for margins

    margins [marginlist] [, options]
    margins [marginlist], predict(statistic ...) [predict(statistic ...) ...] [options]

statistic       Description
default         linear predictions for each equation
xb              linear prediction for a specified equation
variance        conditional variances and covariances
correlation     conditional correlations
residuals       not allowed with margins

xb defaults to the first equation. Statistics not allowed with margins are functions of stochastic quantities other than e(b). For the full syntax, see [R] margins.

Remarks and examples

We assume that you have already read [TS] mgarch dvech. In this entry, we illustrate some of the features of predict after using mgarch dvech to estimate the parameters of diagonal vech MGARCH models.

Example 1: Dynamic forecasts

In example 3 of [TS] mgarch dvech, we obtained dynamic predictions for the Acme Inc. and Anvil Inc. fictional widget data.

. use http://www.stata-press.com/data/r14/acme
. constraint 1 [L.ARCH]1_1 = [L.ARCH]2_2
. constraint 2 [L.GARCH]1_1 = [L.GARCH]2_2
. mgarch dvech (acme = L.acme) (anvil = L.anvil), arch(1) garch(1)
>     constraints(1 2)
(output omitted)

Now we use tsappend (see [TS] tsappend) to extend the data, use predict to obtain the dynamic predictions, and graph the predictions.

. tsappend, add(12)
. predict H*, variance dynamic(tw(1998w26))
. tsline H_acme_acme H_anvil_anvil if t>=tw(1995w25), legend(rows(2))

[Graph omitted: tsline plot of the two series against the weekly date, 1995w26–1998w26, y axis from 2 to 10; legend: "variance prediction (acme, acme) dynamic(tw(1998w26))" and "variance prediction (anvil, anvil) dynamic(tw(1998w26))".]

The graph shows that the in-sample predictions are similar for the conditional variances of Acme Inc. and Anvil Inc. and that the dynamic forecasts converge to similar levels. It also shows that the ARCH and GARCH parameters cause substantial time-varying volatility. The predicted conditional variance of acme ranges from lows of just over 2 to highs above 10.
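Because predict H*, variance creates the conditional covariance series along with the variances, the implied conditional correlation can also be sketched. The covariance variable name below follows the stub* convention used above but is an assumption; confirm the exact name with describe first:

. describe H*
. generate rho_aa = H_anvil_acme / sqrt(H_acme_acme * H_anvil_anvil)
. tsline rho_aa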
Example 2: Predicting in-sample conditional variances

In this example, we obtain the in-sample predicted conditional variances of the returns for the fictional Acme Inc., which we modeled in example 4 of [TS] mgarch dvech. First, we reestimate the parameters of the model.

. use http://www.stata-press.com/data/r14/aacmer, clear
. mgarch dvech (acme anvil = , noconstant), arch(1/2) garch(1)
Getting starting values
(setting technique to bhhh)
Iteration 0:   log likelihood = -18417.243  (not concave)
Iteration 1:   log likelihood = -18215.005
Iteration 2:   log likelihood = -18199.691
Iteration 3:   log likelihood = -18136.699
Iteration 4:   log likelihood = -18084.256
Iteration 5:   log likelihood = -17993.662
Iteration 6:   log likelihood =   -17731.1
Iteration 7:   log likelihood = -17629.505
(switching technique to nr)
Iteration 8:   log likelihood = -17548.172
Iteration 9:   log likelihood = -17544.987
Iteration 10:  log likelihood = -17544.937
Iteration 11:  log likelihood = -17544.937
Estimating parameters
(setting technique to bhhh)
Iteration 0:   log likelihood = -17544.937
Iteration 1:   log likelihood = -17544.937

Diagonal vech MGARCH model
Sample: 1 - 5000                                Number of obs   =       5,000
Distribution: Gaussian                          Wald chi2(.)    =           .
Log likelihood = -17544.94                      Prob > chi2     =           .

                    Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
Sigma0
  1_1            1.026283   .0823348   12.46   0.000      .8649096    1.187656
  2_1            .4300997   .0590294    7.29   0.000      .3144042    .5457952
  2_2            1.019753   .0837146   12.18   0.000      .8556751     1.18383
L.ARCH
  1_1            .2878739     .02157   13.35   0.000      .2455975    .3301504
  2_1            .1036685   .0161446    6.42   0.000      .0720256    .1353114
  2_2            .2034196    .019855   10.25   0.000      .1645044    .2423347
L2.ARCH
  1_1            .1837825   .0274555    6.69   0.000      .1299706    .2375943
  2_1            .0884425     .02208    4.01   0.000      .0451665    .1317185
  2_2            .2025718   .0272639    7.43   0.000      .1491355     .256008
L.GARCH
  1_1            .0782467    .053944    1.45   0.147     -.0274816     .183975
  2_1            .2888104   .0818303    3.53   0.000      .1284261    .4491948
  2_2             .201618   .0470584    4.28   0.000      .1093853    .2938508

Now we use predict to obtain the in-sample conditional variances of acme and use tsline (see [TS] tsline) to graph the results.

. predict h_acme, variance eq(acme, acme)
. tsline h_acme

[Graph omitted: tsline plot of h_acme, labeled "variance prediction (acme, acme)", y axis from 0 to 50, against t from 0 to 5000.]

The graph shows that the predicted conditional variances vary substantially over time, as the parameter estimates indicated.

Because there are no covariates in the model for acme, specifying xb puts a prediction of 0 in each observation, and specifying residuals puts the value of the dependent variable into the prediction.

Methods and formulas

All one-step predictions are obtained by substituting the parameter estimates into the model. The estimated unconditional variance matrix of the disturbances, $\widehat{\Sigma}$, is the initial value for the ARCH and GARCH terms. The postestimation routines recompute $\widehat{\Sigma}$ using the prediction sample, the parameter estimates stored in e(b), and (4) in Methods and formulas of [TS] mgarch dvech. For observations in which the residuals are missing, the estimated unconditional variance matrix of the disturbances is used in place of the outer product of the residuals.

Dynamic predictions of the dependent variables use previously predicted values beginning in the period specified by dynamic().

Dynamic variance predictions are implemented by substituting $\widehat{\Sigma}$ for the outer product of the residuals beginning in the period specified by dynamic().
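As a closing illustration of the residuals statistic described under predict above, the residuals and variance predictions can be combined into standardized residuals for informal diagnostic checking. This sketch continues example 2; the new variable names are hypothetical:

. predict eps_acme, residuals equation(acme)
. generate z_acme = eps_acme / sqrt(h_acme)
. summarize z_acme, detail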
Also see

[TS] mgarch dvech — Diagonal vech multivariate GARCH models
[U] 20 Estimation and postestimation commands

Title

mgarch vcc — Varying conditional correlation multivariate GARCH models

Description   Quick start   Menu   Syntax   Options   Remarks and examples   Stored results   Methods and formulas   References   Also see

Description

mgarch vcc estimates the parameters of varying conditional correlation (VCC) multivariate generalized autoregressive conditionally heteroskedastic (MGARCH) models in which the conditional variances are modeled as univariate generalized autoregressive conditionally heteroskedastic (GARCH) models and the conditional covariances are modeled as nonlinear functions of the conditional variances. The conditional correlation parameters that weight the nonlinear combinations of the conditional variances follow the GARCH-like process specified in Tse and Tsui (2002).

The VCC MGARCH model is about as flexible as the closely related dynamic conditional correlation MGARCH model (see [TS] mgarch dcc), more flexible than the constant conditional correlation MGARCH model (see [TS] mgarch ccc), and more parsimonious than the diagonal vech model (see [TS] mgarch dvech).

Quick start

Fit varying conditional correlation multivariate GARCH with first- and second-order ARCH components for dependent variables y1 and y2 using tsset data
    mgarch vcc (y1 y2), arch(1 2)

Add regressors x1 and x2 and first-order GARCH component
    mgarch vcc (y1 y2 = x1 x2), arch(1 2) garch(1)

Add z1 to the model for the conditional heteroskedasticity
    mgarch vcc (y1 y2 = x1 x2), arch(1 2) garch(1) het(z1)

Menu

Statistics > Multivariate time series > Multivariate GARCH

Syntax

    mgarch vcc eq [eq ... eq] [if] [in] [, options]

where each eq has the form

    (depvars = [indepvars] [, eqoptions])

options                  Description
Model
  arch(numlist)          ARCH terms for all equations
  garch(numlist)         GARCH terms for all equations
  het(varlist)           include varlist in the specification of the conditional variance for all equations
  distribution(dist [#]) use dist distribution for errors [may be gaussian (synonym normal) or t; default is gaussian]
  constraints(numlist)   apply linear constraints
SE/Robust
  vce(vcetype)           vcetype may be oim or robust
Reporting
  level(#)               set confidence level; default is level(95)
  nocnsreport            do not display constraints
  display_options        control columns and column formats, row spacing, line width, display of omitted variables and base and empty cells, and factor-variable labeling
Maximization
  maximize_options       control the maximization process; seldom used
  from(matname)          initial values for the coefficients; seldom used
  coeflegend             display legend instead of statistics

eqoptions                Description
Model
  noconstant             suppress constant term in the mean equation
  arch(numlist)          ARCH terms
  garch(numlist)         GARCH terms
  het(varlist)           include varlist in the specification of the conditional variance

You must tsset your data before using mgarch vcc; see [TS] tsset.
indepvars and varlist may contain factor variables; see [U] 11.4.3 Factor variables.
depvars, indepvars, and varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, fp, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
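Before the option details, a quick hedged sketch of the error-distribution choice with hypothetical variables y1, y2, and x1: distribution(t) estimates the degree-of-freedom parameter along with the model, while distribution(t 5) fixes it at 5:

. mgarch vcc (y1 y2 = x1), arch(1) garch(1) distribution(t)
. mgarch vcc (y1 y2 = x1), arch(1) garch(1) distribution(t 5)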
Options

Model

arch(numlist) specifies the ARCH terms for all equations in the model. By default, no ARCH terms are specified.

garch(numlist) specifies the GARCH terms for all equations in the model. By default, no GARCH terms are specified.

het(varlist) specifies that varlist be included in the model in the specification of the conditional variance for all equations. This varlist enters the variance specification collectively as multiplicative heteroskedasticity.

distribution(dist [#]) specifies the assumed distribution for the errors. dist may be gaussian, normal, or t.

    gaussian and normal are synonyms; each causes mgarch vcc to assume that the errors come from a multivariate normal distribution. # may not be specified with either of them.

    t causes mgarch vcc to assume that the errors follow a multivariate Student t distribution, and the degree-of-freedom parameter is estimated along with the other parameters of the model. If distribution(t #) is specified, then mgarch vcc uses a multivariate Student t distribution with # degrees of freedom. # must be greater than 2.

constraints(numlist) specifies linear constraints to apply to the parameter estimates.

SE/Robust

vce(vcetype) specifies the estimator for the variance–covariance matrix of the estimator.

    vce(oim), the default, specifies to use the observed information matrix (OIM) estimator.

    vce(robust) specifies to use the Huber/White/sandwich estimator.

Reporting

level(#); see [R] estimation options.

nocnsreport; see [R] estimation options.

display_options: noci, nopvalues, noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.

Maximization

maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace, gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#), nrtolerance(#), nonrtolerance, and from(matname); see [R] maximize for all options except from(), and see below for information on from(). These options are seldom used.

from(matname) specifies initial values for the coefficients. from(b0) causes mgarch vcc to begin the optimization algorithm with the values in b0. b0 must be a row vector, and the number of columns must equal the number of parameters in the model.

The following option is available with mgarch vcc but is not shown in the dialog box:

coeflegend; see [R] estimation options.

Eqoptions

noconstant suppresses the constant term in the mean equation.

arch(numlist) specifies the ARCH terms in the equation. By default, no ARCH terms are specified. This option may not be specified with model-level arch().

garch(numlist) specifies the GARCH terms in the equation. By default, no GARCH terms are specified. This option may not be specified with model-level garch().

het(varlist) specifies that varlist be included in the specification of the conditional variance. This varlist enters the variance specification collectively as multiplicative heteroskedasticity. This option may not be specified with model-level het().
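The equation-level options just described can be mixed freely across equations; a minimal hedged sketch with hypothetical variables y1, y2, x1, x2, and z1 (example 4 below uses the same pattern with real variables):

. mgarch vcc (y1 = x1, arch(1) garch(1)) (y2 = x2, arch(1/2) het(L.z1))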
Remarks and examples

We assume that you have already read [TS] mgarch, which provides an introduction to MGARCH models and the methods implemented in mgarch vcc.

MGARCH models are dynamic multivariate regression models in which the conditional variances and covariances of the errors follow an autoregressive-moving-average structure. The VCC MGARCH model uses a nonlinear combination of univariate GARCH models with time-varying cross-equation weights to model the conditional covariance matrix of the errors.

As discussed in [TS] mgarch, MGARCH models differ in the parsimony and flexibility of their specifications for a time-varying conditional covariance matrix of the disturbances, denoted by $H_t$. In the conditional correlation family of MGARCH models, the diagonal elements of $H_t$ are modeled as univariate GARCH models, whereas the off-diagonal elements are modeled as nonlinear functions of the diagonal terms. In the VCC MGARCH model,

$$h_{ij,t} = \rho_{ij,t}\sqrt{h_{ii,t}\,h_{jj,t}}$$

where the diagonal elements $h_{ii,t}$ and $h_{jj,t}$ follow univariate GARCH processes and $\rho_{ij,t}$ follows the dynamic process specified in Tse and Tsui (2002) and discussed below. Because the $\rho_{ij,t}$ vary with time, this model is known as the VCC GARCH model.

Technical note

The VCC GARCH model proposed by Tse and Tsui (2002) can be written as

$$y_t = C x_t + \epsilon_t$$
$$\epsilon_t = H_t^{1/2}\nu_t$$
$$H_t = D_t^{1/2} R_t D_t^{1/2}$$
$$R_t = (1 - \lambda_1 - \lambda_2)R + \lambda_1 \Psi_{t-1} + \lambda_2 R_{t-1} \tag{1}$$

where

    $y_t$ is an $m \times 1$ vector of dependent variables;
    $C$ is an $m \times k$ matrix of parameters;
    $x_t$ is a $k \times 1$ vector of independent variables, which may contain lags of $y_t$;
    $H_t^{1/2}$ is the Cholesky factor of the time-varying conditional covariance matrix $H_t$;
    $\nu_t$ is an $m \times 1$ vector of independent and identically distributed innovations;
    $D_t$ is a diagonal matrix of conditional variances,

$$D_t = \begin{pmatrix} \sigma_{1,t}^2 & 0 & \cdots & 0 \\ 0 & \sigma_{2,t}^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{m,t}^2 \end{pmatrix}$$

    in which each $\sigma_{i,t}^2$ evolves according to a univariate GARCH model of the form

$$\sigma_{i,t}^2 = s_i + \sum_{j=1}^{p_i}\alpha_j\,\epsilon_{i,t-j}^2 + \sum_{j=1}^{q_i}\beta_j\,\sigma_{i,t-j}^2$$

    by default, or

$$\sigma_{i,t}^2 = \exp(\gamma_i z_{i,t}) + \sum_{j=1}^{p_i}\alpha_j\,\epsilon_{i,t-j}^2 + \sum_{j=1}^{q_i}\beta_j\,\sigma_{i,t-j}^2$$

    when the het() option is specified, where $\gamma_i$ is a $1 \times p$ vector of parameters, $z_i$ is a $p \times 1$ vector of independent variables including a constant term, the $\alpha_j$'s are ARCH parameters, and the $\beta_j$'s are GARCH parameters;

    $R_t$ is a matrix of conditional correlations,

$$R_t = \begin{pmatrix} 1 & \rho_{12,t} & \cdots & \rho_{1m,t} \\ \rho_{12,t} & 1 & \cdots & \rho_{2m,t} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{1m,t} & \rho_{2m,t} & \cdots & 1 \end{pmatrix}$$

    $R$ is the matrix of means to which the dynamic process in (1) reverts;
    $\Psi_t$ is the rolling estimator of the correlation matrix of $\widetilde{\epsilon}_t$, which uses the previous $m+1$ observations; and
    $\lambda_1$ and $\lambda_2$ are parameters that govern the dynamics of conditional correlations. $\lambda_1$ and $\lambda_2$ are nonnegative and satisfy $0 \leq \lambda_1 + \lambda_2 < 1$.

To differentiate this model from Engle (2002), Tse and Tsui (2002) call their model a VCC MGARCH model.

Some examples

Example 1: Model with common covariates

We have daily data on the stock returns of three car manufacturers—Toyota, Nissan, and Honda, from January 2, 2003, to December 31, 2010—in the variables toyota, nissan, and honda. We model the conditional means of the returns as a first-order vector autoregressive process and the conditional covariances as a VCC MGARCH process in which the variance of each disturbance term follows a GARCH(1,1) process.

. use http://www.stata-press.com/data/r14/stocks
(Data from Yahoo! Finance)
. mgarch vcc (toyota nissan honda = L.toyota L.nissan L.honda, noconstant),
>     arch(1) garch(1)
Calculating starting values....
Optimizing log likelihood (setting technique to bhhh) Iteration 0: log likelihood = 16901.2 Iteration 1: log likelihood = 17028.644 Iteration 2: log likelihood = 17145.905 Iteration 3: log likelihood = 17251.485 Iteration 4: log likelihood = 17306.115 Iteration 5: log likelihood = 17332.59 Iteration 6: log likelihood = 17353.617 Iteration 7: log likelihood = 17374.86 Iteration 8: log likelihood = 17398.526 Iteration 9: log likelihood = 17418.748 (switching technique to nr) Iteration 10: log likelihood = 17442.552 Iteration 11: log likelihood = 17455.64 Iteration 12: log likelihood = 17463.593 Iteration 13: log likelihood = 17463.922 Iteration 14: log likelihood = 17463.925 Iteration 15: log likelihood = 17463.925 Refining estimates Iteration 0: log likelihood = 17463.925 Iteration 1: log likelihood = 17463.925 Varying conditional correlation MGARCH model Sample: 1 - 2015 Distribution: Gaussian Log likelihood = 17463.92 Coef. Number of obs Wald chi2(9) Prob > chi2 Std. Err. z P>|z| = = = 2,014 17.67 0.0392 [95% Conf. Interval] toyota toyota L1. -.0565645 .0335696 -1.68 0.092 -.1223597 .0092307 nissan L1. .0248101 .0252701 0.98 0.326 -.0247184 .0743385 honda L1. .0035836 .0298895 0.12 0.905 -.0549986 .0621659 ARCH_toyota arch L1. .0602807 .0086799 6.94 0.000 .0432684 .0772929 garch L1. .922469 .0110316 83.62 0.000 .9008474 .9440906 _cons 4.38e-06 1.12e-06 3.91 0.000 2.18e-06 6.58e-06 mgarch vcc — Varying conditional correlation multivariate GARCH models nissan toyota L1. -.0196399 .0387112 -0.51 0.612 -.0955124 .0562326 nissan L1. -.0306663 .031051 -0.99 0.323 -.091525 .0301925 honda L1. .038315 .0354691 1.08 0.280 -.0312031 .1078331 ARCH_nissan arch L1. .0774228 .0119642 6.47 0.000 .0539733 .1008723 garch L1. .9076856 .0139339 65.14 0.000 .8803756 .9349955 _cons 6.20e-06 1.70e-06 3.65 0.000 2.87e-06 9.53e-06 toyota L1. -.0358293 .0340492 -1.05 0.293 -.1025645 .030906 nissan L1. .0544071 .0276156 1.97 0.049 .0002815 .1085327 honda L1. -.0424383 .0326249 -1.30 0.193 -.106382 .0215054 ARCH_honda arch L1. .0458673 .0072714 6.31 0.000 .0316157 .0601189 garch L1. .9369252 .0101755 92.08 0.000 .9169815 .9568689 _cons 4.99e-06 1.29e-06 3.85 0.000 2.45e-06 7.52e-06 .6643028 .0151086 43.97 0.000 .6346905 .6939151 .7302093 .0126361 57.79 0.000 .705443 .7549755 .6347321 .0159738 39.74 0.000 .603424 .6660401 .0277374 .8255525 .0086942 .0755881 3.19 10.92 0.001 0.000 .010697 .6774025 .0447778 .9737026 413 honda corr(toyota, nissan) corr(toyota, honda) corr(nissan, honda) Adjustment lambda1 lambda2 The output has three parts: an iteration log, a header, and an output table. The iteration log has three parts: the dots from the search for initial values, the iteration log from optimizing the log likelihood, and the iteration log from the refining step. A detailed discussion of the optimization methods is in Methods and formulas. The header describes the estimation sample and reports a Wald test against the null hypothesis that all the coefficients on the independent variables in the mean equations are zero. Here the null hypothesis is rejected at the 5% level. 414 mgarch vcc — Varying conditional correlation multivariate GARCH models The output table first presents results for the mean or variance parameters used to model each dependent variable. Subsequently, the output table presents results for the parameters in R. For example, the estimate of the mean of the process that associates Toyota and Nissan is 0.66. Finally, the output table presents results for the adjustment parameters λ1 and λ2 . 
In the example at hand, the estimates for both λ1 and λ2 are statistically significant. The VCC MGARCH model reduces to the CCC MGARCH model when λ1 = λ2 = 0. The output below shows that a Wald test rejects the null hypothesis that λ1 = λ2 = 0 at all conventional levels. . test _b[Adjustment:lambda1] = _b[Adjustment:lambda2] = 0 ( 1) [Adjustment]lambda1 - [Adjustment]lambda2 = 0 ( 2) [Adjustment]lambda1 = 0 chi2( 2) = 482.80 Prob > chi2 = 0.0000 These results indicate that the assumption of time-invariant conditional correlations maintained in the CCC MGARCH model is too restrictive for these data. Example 2: Model with covariates that differ by equation We improve the previous example by removing the insignificant parameters from the model. To accomplish that, we specify the honda equation separately from the toyota and nissan equations: . mgarch vcc (toyota nissan = , noconstant) (honda = L.nissan, noconstant), > arch(1) garch(1) Calculating starting values.... Optimizing log likelihood (setting technique to bhhh) Iteration 0: log likelihood = 16889.43 Iteration 1: log likelihood = 17002.567 Iteration 2: log likelihood = 17134.525 Iteration 3: log likelihood = 17233.192 Iteration 4: log likelihood = 17295.342 Iteration 5: log likelihood = 17326.347 Iteration 6: log likelihood = 17348.063 Iteration 7: log likelihood = 17363.988 Iteration 8: log likelihood = 17387.216 Iteration 9: log likelihood = 17404.734 (switching technique to nr) Iteration 10: log likelihood = 17438.432 (not concave) Iteration 11: log likelihood = 17450.002 Iteration 12: log likelihood = 17455.443 Iteration 13: log likelihood = 17455.971 Iteration 14: log likelihood = 17455.98 Iteration 15: log likelihood = 17455.98 Refining estimates Iteration 0: log likelihood = 17455.98 Iteration 1: log likelihood = 17455.98 (backed up) mgarch vcc — Varying conditional correlation multivariate GARCH models Varying conditional correlation MGARCH model Sample: 1 - 2015 Distribution: Gaussian Log likelihood = 17455.98 Coef. Number of obs Wald chi2(1) Prob > chi2 = = = 415 2,014 1.62 0.2032 Std. Err. z P>|z| [95% Conf. Interval] ARCH_toyota arch L1. .0609064 .0087784 6.94 0.000 .0437011 .0781117 garch L1. .921703 .0111493 82.67 0.000 .8998508 .9435552 _cons 4.42e-06 1.13e-06 3.91 0.000 2.20e-06 6.64e-06 ARCH_nissan arch L1. .0806598 .0123529 6.53 0.000 .0564486 .104871 garch L1. .9035239 .014421 62.65 0.000 .8752592 .9317886 _cons 6.61e-06 1.79e-06 3.70 0.000 3.11e-06 .0000101 nissan L1. .0175565 .0137982 1.27 0.203 -.0094874 .0446005 ARCH_honda arch L1. .0461398 .0073048 6.32 0.000 .0318225 .060457 garch L1. .9366096 .0102021 91.81 0.000 .9166139 .9566053 _cons 5.03e-06 1.31e-06 3.85 0.000 2.47e-06 7.59e-06 .6635251 .0150293 44.15 0.000 .6340682 .692982 .7299703 .0124828 58.48 0.000 .7055045 .754436 .6338207 .0158681 39.94 0.000 .6027198 .6649217 .0285319 .8113923 .0092448 .0854955 3.09 9.49 0.002 0.000 .0104124 .6438242 .0466514 .9789604 honda corr(toyota, nissan) corr(toyota, honda) corr(nissan, honda) Adjustment lambda1 lambda2 It turns out that the coefficient on L1.nissan in the honda equation is now statistically insignificant. We could further improve the model by removing L1.nissan from the model. There is no mean equation for Toyota or Nissan. In [TS] mgarch vcc postestimation, we discuss prediction from models without covariates. 416 mgarch vcc — Varying conditional correlation multivariate GARCH models Example 3: Model with constraints Here we fit a bivariate VCC MGARCH model for the Toyota and Nissan shares. 
We believe that the shares of these car manufacturers follow the same process, so we impose the constraints that the ARCH coefficients are the same for the two companies and that the GARCH coefficients are also the same. . constraint 1 _b[ARCH_toyota:L.arch] = _b[ARCH_nissan:L.arch] . constraint 2 _b[ARCH_toyota:L.garch] = _b[ARCH_nissan:L.garch] . mgarch vcc (toyota nissan = , noconstant), arch(1) garch(1) constraints(1 2) Calculating starting values.... Optimizing log likelihood (setting technique to bhhh) Iteration 0: log likelihood = 10326.298 Iteration 1: log likelihood = 10680.73 Iteration 2: log likelihood = 10881.388 Iteration 3: log likelihood = 11043.345 Iteration 4: log likelihood = 11122.459 Iteration 5: log likelihood = 11202.411 Iteration 6: log likelihood = 11253.657 Iteration 7: log likelihood = 11276.325 Iteration 8: log likelihood = 11279.823 Iteration 9: log likelihood = 11281.704 (switching technique to nr) Iteration 10: log likelihood = 11282.313 Iteration 11: log likelihood = 11282.46 Iteration 12: log likelihood = 11282.461 mgarch vcc — Varying conditional correlation multivariate GARCH models 417 Refining estimates Iteration 0: log likelihood = 11282.461 Iteration 1: log likelihood = 11282.461 (backed up) Varying conditional correlation MGARCH model Sample: 1 - 2015 Distribution: Gaussian Log likelihood = 11282.46 ( 1) ( 2) Number of obs Wald chi2(.) Prob > chi2 = = = 2,015 . . [ARCH_toyota]L.arch - [ARCH_nissan]L.arch = 0 [ARCH_toyota]L.garch - [ARCH_nissan]L.garch = 0 Coef. Std. Err. z P>|z| [95% Conf. Interval] ARCH_toyota arch L1. .0797459 .0101634 7.85 0.000 .059826 .0996659 garch L1. .9063808 .0118211 76.67 0.000 .883212 .9295497 _cons 4.24e-06 1.10e-06 3.85 0.000 2.08e-06 6.40e-06 ARCH_nissan arch L1. .0797459 .0101634 7.85 0.000 .059826 .0996659 garch L1. .9063808 .0118211 76.67 0.000 .883212 .9295497 _cons 5.91e-06 1.47e-06 4.03 0.000 3.03e-06 8.79e-06 .6720056 .0162585 41.33 0.000 .6401394 .7038718 .0343012 .7945548 .0128097 .101067 2.68 7.86 0.007 0.000 .0091945 .596467 .0594078 .9926425 corr(toyota, nissan) Adjustment lambda1 lambda2 We could test our constraints by fitting the unconstrained model and performing a likelihood-ratio test. The results indicate that the restricted model is preferable. Example 4: Model with a GARCH term In this example, we have data on fictional stock returns for the Acme and Anvil corporations, and we believe that the movement of the two stocks is governed by different processes. We specify one ARCH and one GARCH term for the conditional variance equation for Acme and two ARCH terms for the conditional variance equation for Anvil. In addition, we include the lagged value of the stock return for Apex, the main subsidiary of Anvil corporation, in the variance equation of Anvil. For Acme, we have data on the changes in an index of futures prices of products related to those produced by Acme in afrelated. For Anvil, we have data on the changes in an index of futures prices of inputs used by Anvil in afinputs. . use http://www.stata-press.com/data/r14/acmeh . mgarch vcc (acme = afrelated, noconstant arch(1) garch(1)) > (anvil = afinputs, arch(1/2) het(L.apex)) Calculating starting values.... 
418 mgarch vcc — Varying conditional correlation multivariate GARCH models Optimizing log likelihood (setting technique to bhhh) Iteration 0: log likelihood Iteration 1: log likelihood Iteration 2: log likelihood Iteration 3: log likelihood Iteration 4: log likelihood Iteration 5: log likelihood Iteration 6: log likelihood Iteration 7: log likelihood Iteration 8: log likelihood Iteration 9: log likelihood (switching technique to nr) Iteration 10: log likelihood Iteration 11: log likelihood Refining estimates Iteration 0: log likelihood Iteration 1: log likelihood = = = = = = = = = = -13252.793 -12859.124 -12522.14 -12406.487 -12304.275 -12273.103 -12256.104 -12254.55 -12254.482 -12254.478 = -12254.478 = -12254.478 = -12254.478 = -12254.478 Varying conditional correlation MGARCH model Sample: 1 - 2500 Distribution: Gaussian Log likelihood = -12254.48 Coef. Std. Err. z Number of obs Wald chi2(2) Prob > chi2 = = = 2,499 5226.19 0.0000 P>|z| [95% Conf. Interval] acme afrelated .9672465 .0510066 18.96 0.000 .8672753 1.067218 ARCH_acme arch L1. .0949142 .0147302 6.44 0.000 .0660435 .1237849 garch L1. .7689442 .038885 19.77 0.000 .6927309 .8451574 _cons 2.129468 .464916 4.58 0.000 1.218249 3.040687 anvil afinputs _cons -1.018629 .1015986 .0145027 .0177952 -70.24 5.71 0.000 0.000 -1.047053 .0667205 -.9902037 .1364766 ARCH_anvil arch L1. L2. .4990272 .2839812 .0243531 .0181966 20.49 15.61 0.000 0.000 .4512959 .2483165 .5467584 .3196459 apex L1. 1.897144 .0558791 33.95 0.000 1.787623 2.006665 _cons .0682724 .0662257 1.03 0.303 -.0615276 .1980724 -.6574256 .0294259 -22.34 0.000 -.7150994 -.5997518 .2375029 .6492072 .0179114 .0254493 13.26 25.51 0.000 0.000 .2023971 .5993274 .2726086 .6990869 corr(acme, anvil) Adjustment lambda1 lambda2 mgarch vcc — Varying conditional correlation multivariate GARCH models 419 The results indicate that increases in the futures prices for related products lead to higher returns on the Acme stock, and increased input prices lead to lower returns on the Anvil stock. In the conditional variance equation for Anvil, the coefficient on L1.apex is positive and significant, which indicates that an increase in the return on the Apex stock leads to more variability in the return on the Anvil stock. Stored results mgarch vcc stores the following in e(): Scalars e(N) e(k) e(k aux) e(k extra) e(k eq) e(k dv) e(df m) e(ll) e(chi2) e(p) e(estdf) e(usr) e(tmin) e(tmax) e(N gaps) e(rank) e(ic) e(rc) e(converged) Macros e(cmd) e(model) e(cmdline) e(depvar) e(covariates) e(dv eqs) e(indeps) e(tvar) e(title) e(chi2type) e(vce) e(vcetype) e(tmins) e(tmaxs) e(dist) e(arch) e(garch) e(technique) e(properties) e(estat cmd) e(predict) e(marginsok) e(marginsnotok) e(marginsdefault) e(asbalanced) e(asobserved) number of observations number of parameters number of auxiliary parameters number of extra estimates added to number of equations in e(b) number of dependent variables model degrees of freedom log likelihood b χ2 significance 1 if distribution parameter was estimated, 0 otherwise user-provided distribution parameter minimum time in sample maximum time in sample number of gaps rank of e(V) number of iterations return code 1 if converged, 0 otherwise mgarch vcc command as typed names of dependent variables list of covariates dependent variables with mean equations independent variables in each equation time variable title in estimation output Wald; type of model χ2 test vcetype specified in vce() title used to label Std. Err. 
 e(tmins)            formatted minimum time
 e(tmaxs)            formatted maximum time
 e(dist)             distribution for error term: gaussian or t
 e(arch)             specified ARCH terms
 e(garch)            specified GARCH terms
 e(technique)        maximization technique
 e(properties)       b V
 e(estat_cmd)        program used to implement estat
 e(predict)          program used to implement predict
 e(marginsok)        predictions allowed by margins
 e(marginsnotok)     predictions disallowed by margins
 e(marginsdefault)   default predict() specification for margins
 e(asbalanced)       factor variables fvset as asbalanced
 e(asobserved)       factor variables fvset as asobserved

Matrices
 e(b)                coefficient vector
 e(Cns)              constraints matrix
 e(ilog)             iteration log (up to 20 iterations)
 e(gradient)         gradient vector
 e(hessian)          Hessian matrix
 e(V)                variance–covariance matrix of the estimators
 e(pinfo)            parameter information, used by predict

Functions
 e(sample)           marks estimation sample

Methods and formulas

mgarch vcc estimates the parameters of the varying conditional correlation MGARCH model by maximum likelihood. The log-likelihood function based on the multivariate normal distribution for observation $t$ is

$$l_t = -0.5\,m\log(2\pi) - 0.5\log\{\det(R_t)\} - \log\!\left\{\det\!\left(D_t^{1/2}\right)\right\} - 0.5\,\widetilde{\epsilon}_t R_t^{-1}\widetilde{\epsilon}_t'$$

where $\widetilde{\epsilon}_t = D_t^{-1/2}\epsilon_t$ is an $m \times 1$ vector of standardized residuals and $\epsilon_t = y_t - C x_t$. The log-likelihood function is $\sum_{t=1}^{T} l_t$.

If we assume that $\nu_t$ follow a multivariate t distribution with degrees of freedom (df) greater than 2, then the log-likelihood function for observation $t$ is

$$l_t = \log\Gamma\!\left(\frac{\mathrm{df}+m}{2}\right) - \log\Gamma\!\left(\frac{\mathrm{df}}{2}\right) - \frac{m}{2}\log\{(\mathrm{df}-2)\pi\} - 0.5\log\{\det(R_t)\} - \log\!\left\{\det\!\left(D_t^{1/2}\right)\right\} - \frac{\mathrm{df}+m}{2}\log\!\left(1 + \frac{\widetilde{\epsilon}_t R_t^{-1}\widetilde{\epsilon}_t'}{\mathrm{df}-2}\right)$$

The starting values for the parameters in the mean equations and the initial residuals $\widehat{\epsilon}_t$ are obtained by least-squares regression. The starting values for the parameters in the variance equations are obtained by a procedure proposed by Gourieroux and Monfort (1997, sec. 6.2.2). The starting values for the parameters in $R$ are calculated from the standardized residuals $\widetilde{\epsilon}_t$. Given the starting values for the mean and variance equations, the starting values for the parameters $\lambda_1$ and $\lambda_2$ are obtained from a grid search performed on the log likelihood.

The initial optimization step is performed in the unconstrained space. Once the maximum is found, we impose the constraints $\lambda_1 \geq 0$, $\lambda_2 \geq 0$, and $0 \leq \lambda_1 + \lambda_2 < 1$, and maximize the log likelihood in the constrained space. This step is reported in the iteration log as the refining step.

GARCH estimators require initial values that can be plugged in for $\epsilon_{t-i}\epsilon_{t-i}'$ and $H_{t-j}$ when $t-i < 1$ and $t-j < 1$. mgarch vcc substitutes an estimator of the unconditional covariance of the disturbances,

$$\widehat{\Sigma} = T^{-1}\sum_{t=1}^{T}\widehat{\epsilon}_t\,\widehat{\epsilon}_t' \tag{2}$$

for $\epsilon_{t-i}\epsilon_{t-i}'$ when $t-i < 1$ and for $H_{t-j}$ when $t-j < 1$, where $\widehat{\epsilon}_t$ is the vector of residuals calculated using the estimated parameters.

mgarch vcc uses numerical derivatives in maximizing the log-likelihood function.

References

Engle, R. F. 2002. Dynamic conditional correlation: A simple class of multivariate generalized autoregressive conditional heteroskedasticity models. Journal of Business & Economic Statistics 20: 339–350.

Gourieroux, C. S., and A. Monfort. 1997. Time Series and Dynamic Models. Trans. ed. G. M. Gallo. Cambridge: Cambridge University Press.

Tse, Y. K., and A. K. C. Tsui. 2002. A multivariate generalized autoregressive conditional heteroscedasticity model with time-varying correlations. Journal of Business & Economic Statistics 20: 351–362.
Also see [TS] mgarch vcc postestimation — Postestimation tools for mgarch vcc [TS] mgarch — Multivariate GARCH models [TS] tsset — Declare data to be time-series data [TS] arch — Autoregressive conditional heteroskedasticity (ARCH) family of estimators [TS] var — Vector autoregressive models [U] 20 Estimation and postestimation commands Title mgarch vcc postestimation — Postestimation tools for mgarch vcc Postestimation commands Methods and formulas predict Also see margins Remarks and examples Postestimation commands The following standard postestimation commands are available after mgarch vcc: Command Description contrast estat ic estat summarize estat vce estimates forecast lincom contrasts and ANOVA-style joint tests of estimates Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC) summary statistics for the estimation sample variance–covariance matrix of the estimators (VCE) cataloging estimation results dynamic forecasts and simulations point estimates, standard errors, testing, and inference for linear combinations of coefficients likelihood-ratio test marginal means, predictive margins, marginal effects, and average marginal effects graph the results from margins (profile plots, interaction plots, etc.) point estimates, standard errors, testing, and inference for nonlinear combinations of coefficients predictions, residuals, influence statistics, and other diagnostic measures point estimates, standard errors, testing, and inference for generalized predictions pairwise comparisons of estimates Wald tests of simple and composite linear hypotheses Wald tests of nonlinear hypotheses lrtest margins marginsplot nlcom predict predictnl pwcompare test testnl 422 mgarch vcc postestimation — Postestimation tools for mgarch vcc 423 predict Description for predict predict creates a new variable containing predictions such as linear predictions and conditional variances, covariances, and correlations. All predictions are available as static one-step-ahead predictions or as dynamic multistep predictions, and you can control when dynamic predictions begin. Menu for predict Statistics > Postestimation Syntax for predict   predict type statistic { stub* | newvarlist }  if   in   , statistic options  Description Main xb residuals variance correlation linear prediction; the default residuals conditional variances and covariances conditional correlations These statistics are available both in and out of sample; type predict the estimation sample. . . . if e(sample) . . . if wanted only for Description options Options equation(eqnames) names of equations for which predictions are made dynamic(time constant) begin dynamic forecast at specified time Options for predict  Main  xb, the default, calculates the linear predictions of the dependent variables. residuals calculates the residuals. variance predicts the conditional variances and conditional covariances. correlation predicts the conditional correlations.   Options equation(eqnames) specifies the equation for which the predictions are calculated. Use this option to predict a statistic for a particular equation. Equation names, such as equation(income), are used to identify equations. 424 mgarch vcc postestimation — Postestimation tools for mgarch vcc One equation name may be specified when predicting the dependent variable, the residuals, or the conditional variance. 
For example, specifying equation(income) causes predict to predict income, and specifying variance equation(income) causes predict to predict the conditional variance of income.

Two equations may be specified when predicting a conditional variance or covariance. For example, specifying equation(income, consumption) variance causes predict to predict the conditional covariance of income and consumption.

dynamic(time_constant) specifies when predict starts producing dynamic forecasts. The specified time_constant must be in the scale of the time variable specified in tsset, and the time_constant must be inside a sample for which observations on the dependent variables are available. For example, dynamic(tq(2008q4)) causes dynamic predictions to begin in the fourth quarter of 2008, assuming that your time variable is quarterly; see [D] datetime. If the model contains exogenous variables, they must be present for the whole predicted sample.

dynamic() may not be specified with residuals.

margins

Description for margins

margins estimates margins of response for linear predictions and conditional variances, covariances, and correlations. All predictions are available as static one-step-ahead predictions or as dynamic multistep predictions, and you can control when dynamic predictions begin.

Menu for margins

Statistics > Postestimation

Syntax for margins

    margins [marginlist] [, options]
    margins [marginlist], predict(statistic ...) [predict(statistic ...) ...] [options]

statistic       Description
default         linear predictions for each equation
xb              linear prediction for a specified equation
variance        conditional variances and covariances
correlation     conditional correlations
residuals       not allowed with margins

xb defaults to the first equation. Statistics not allowed with margins are functions of stochastic quantities other than e(b). For the full syntax, see [R] margins.

Remarks and examples

We assume that you have already read [TS] mgarch vcc. In this entry, we use predict after mgarch vcc to make in-sample and out-of-sample forecasts.

Example 1: Dynamic forecasts

In this example, we obtain dynamic forecasts for the Toyota, Nissan, and Honda stock returns modeled in example 2 of [TS] mgarch vcc. In the output below, we reestimate the parameters of the model, use tsappend (see [TS] tsappend) to extend the data, and use predict to obtain in-sample one-step-ahead forecasts and dynamic forecasts of the conditional variances of the returns. We graph the forecasts below.

. use http://www.stata-press.com/data/r14/stocks
(Data from Yahoo! Finance)
. quietly mgarch vcc (toyota nissan = , noconstant)
>     (honda = L.nissan, noconstant), arch(1) garch(1)
. tsappend, add(50)
. predict H*, variance dynamic(2016)

[Graph omitted: time-series plot of the variance predictions, y axis from 0 to .003, against the daily date, 01jan2009–01jan2011; legend: "Variance prediction (toyota,toyota), dynamic(2016)", "Variance prediction (nissan,nissan), dynamic(2016)", and "Variance prediction (honda,honda), dynamic(2016)".]

Recent in-sample one-step-ahead forecasts are plotted to the left of the vertical line in the above graph, and the dynamic out-of-sample forecasts appear to the right of the vertical line. The graph shows the tail end of the huge increase in return volatility that took place in 2008 and 2009. It also shows that the dynamic forecasts quickly converge.
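The correlation statistic documented above makes it just as easy to examine the fitted time-varying conditional correlations. A short sketch continuing this example, where the equation() pair syntax is assumed to parallel the variance case and the new variable name is arbitrary:

. predict rho_tn, correlation equation(toyota, nissan)
. tsline rho_tn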
Methods and formulas
All one-step predictions are obtained by substituting the parameter estimates into the model. The estimated unconditional variance matrix of the disturbances, Σ̂, is the initial value for the ARCH and GARCH terms. The postestimation routines recompute Σ̂ using the prediction sample, the parameter estimates stored in e(b), and (2) in Methods and formulas of [TS] mgarch vcc.
For observations in which the residuals are missing, the estimated unconditional variance matrix of the disturbances is used in place of the outer product of the residuals.
Dynamic predictions of the dependent variables use previously predicted values beginning in the period specified by dynamic().
Dynamic variance predictions are implemented by substituting Σ̂ for the outer product of the residuals beginning in the period specified in dynamic().

Also see
[TS] mgarch vcc — Varying conditional correlation multivariate GARCH models
[U] 20 Estimation and postestimation commands

Title
mswitch — Markov-switching regression models

Description   Quick start   Menu   Syntax   Options   Remarks and examples   Stored results   Methods and formulas   References   Also see

Description
mswitch fits dynamic regression models that exhibit different dynamics across unobserved states using state-dependent parameters to accommodate structural breaks or other multiple-state phenomena. These models are known as Markov-switching models because the transitions between the unobserved states follow a Markov chain. Two models are available: Markov-switching dynamic regression (MSDR) models, which allow a quick adjustment after the process changes state, and Markov-switching autoregression (MSAR) models, which allow a more gradual adjustment.

Quick start
MSDR model for the dependent variable y with two state-dependent intercepts using tsset data
    mswitch dr y
Same as above
    mswitch dr y, states(2)
As above, but with three states and switching coefficients on x
    mswitch dr y, switch(x) states(3)
MSDR model with two state-dependent intercepts and variance parameters
    mswitch dr y, varswitch
MSAR model with two state-dependent intercepts and an autoregression (AR) term
    mswitch ar y, ar(1)
As above, but with switching AR coefficients
    mswitch ar y, ar(1) arswitch

Menu
Statistics > Time series > Markov-switching model

Syntax
Markov-switching dynamic regression
    mswitch dr depvar [nonswitch varlist] [if] [in] [, options]
Markov-switching AR
    mswitch ar depvar [nonswitch varlist] [if] [in], ar(numlist) [options msar options]

nonswitch varlist is a list of variables with state-invariant coefficients.
options Description Main states(#)    specify number of states; default is states(2) switch( varlist , noconstant ) specify variables with switching coefficients; by default, the constant term is state dependent unless switch(, noconstant) is specified allow a state-invariant constant term; may be specified only constant with switch(, noconstant) varswitch specify state-dependent variance parameters; by default, the variance parameter is constant across all states p0(type) specify initial unconditional probabilities where type is one of transition, fixed, or smoothed; the default is p0(transition) constraints(numlist) apply specified linear constraints SE/Robust vce(vcetype) vcetype may be oim or robust Reporting level(#) nocnsreport display options set confidence level; default is level(95) do not display constraints control columns and column formats, row spacing, line width, display of omitted variables and base and empty cells, and factor-variable labeling EM options emiterate(#) emlog emdots specify the number of expectation-maximization (EM) iterations; default is emiterate(10) show EM iteration log show EM iterations as dots Maximization maximize options control the maximization process coeflegend display legend instead of statistics mswitch — Markov-switching regression models msar options Model ∗ ar(numlist) arswitch ∗ 429 Description specify the number of AR terms specify state-dependent AR coefficients ar(numlist) is required. You must tsset your data before using mswitch; see [TS] tsset. varlist and nonswitch varlist may contain factor variables; see [U] 11.4.3 Factor variables. depvar, nonswitch varlist, and varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists. by, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands. coeflegend does not appear in the dialog box. See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands. Options   Model ar(numlist) specifies the number of AR terms. This option may be specified only with command mswitch ar. ar() is required to fit AR models. arswitch specifies that the AR coefficients vary over the states. arswitch may be specified only with option ar().   Main states(#) specifies the number of states. The default is states(2).    switch( varlist , noconstant ) specifies variables whose coefficients vary over the states. By default, the constant term is state dependent and is included in the regression model. You may suppress the constant term by specifying switch(, noconstant). constant specifies that a state-invariant constant term be included in the model. This option may be specified only with switch(, noconstant). varswitch specifies that the variance parameters are state dependent. The default is constant variance across all states. p0(type) is rarely used. This option specifies the method for obtaining values for the unconditional transition probabilities. type is one of transition, fixed, or smoothed. The default is p0(transition), which specifies that the values be computed using the matrix of conditional transition probabilities. Type fixed specifies that each unconditional probability is 1/k , where k is the number of states. Type smoothed specifies that the unconditional probabilities be estimated as extra parameters of the model. constraints(numlist) specifies the linear constraints to be applied to the parameter estimates.   
SE/Robust vce(vcetype) specifies the type of standard error reported, which includes types that are derived from asymptotic theory (oim) and that are robust to some kinds of misspecification (robust); see [R] vce option. 430  mswitch — Markov-switching regression models  Reporting level(#), nocnsreport; see [R] estimation options. display options: noci, nopvalues, noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(% fmt), pformat(% fmt), sformat(% fmt), and nolstretch; see [R] estimation options.   EM options emiterate(#), emlog, and emdots control the EM iterations that take place before estimation switches to a quasi-Newton method. EM is used to obtain starting values. emiterate(#) specifies the number of EM iterations; the default is emiterate(10). emlog specifies that the EM iteration log be shown. The default is to not display the EM iteration log. emdots specifies that the EM iterations be shown as dots. The default is to not display the dots.   Maximization   maximize options: difficult, technique(algorithm spec), iterate(#), no log, trace, gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#), nrtolerance(#), nonrtolerance, and from(matname); see [R] maximize for all options except from(), and see below for information on from(). from(matname) specifies initial values for the maximization process. If from() is specified, the initial values are used in the EM step to improve the likelihood unless emiterate(0) is also specified. The coefficients obtained at the end of the EM iterations serve as initial values for the quasi-Newton method. matname must be a row vector. The number of columns must equal the number of parameters in the model, and the values must be in the same order as the parameters in e(b). The following option is available with mswitch but is not shown in the dialog box: coeflegend; see [R] estimation options. Remarks and examples mswitch fits Markov-switching models in which the parameters vary over states. The states are unobserved and follow a Markov process. mswitch dr fits MSDR models that allow a quick adjustment after a state change and are often used to model monthly and higher-frequency data. mswitch ar fits MSAR models that allow a more gradual adjustment after a state change and are often used to model quarterly and lower-frequency data. Estimation is by maximum likelihood. You must tsset your data before using mswitch; see [TS] tsset. Remarks are presented under the following headings: Introduction Markov-switching dynamic regression Markov-switching AR If you are new to Markov-switching models, we recommend that you begin with Introduction. A more technical discussion and examples are presented in the model-specific sections. mswitch — Markov-switching regression models 431 Introduction Markov-switching models are widely applied in the social sciences. For example, in economics, the growth rate of Gross Domestic Product is modeled as a switching process to capture the asymmetrical behavior observed over expansions and recessions (Hamilton 1989). Other economic examples include modeling interest rates (Garcia and Perron 1996) and exchange rates (Engel and Hamilton 1990). In finance, Kim, Nelson, and Startz (1998) model monthly stock returns, while Guidolin (2011b, 2011a) provide many applications of these models to returns, portfolio choice, and asset pricing. 
In political science, Jones, Kim, and Startz (2010) model Democratic and Republican partisan states in the United States Congress. These models are also used in health sciences. For example, in psychology, Markov-switching models have been applied to daily data on manic and depressive states for individuals with rapid-cycling bipolar disorder (Hamaker, Grasman, and Kamphuis 2010). In epidemiology, Lu, Zeng, and Chen (2010) and Martinez-Beneito et al. (2008) model the incidence rate of infectious disease in epidemic and nonepidemic states.
The Markov-switching regression model was initially developed in Quandt (1972) and Goldfeld and Quandt (1973). In a seminal paper, Hamilton (1989) extended Markov-switching regressions for AR processes and provided a nonlinear filter for estimation. Hamilton (1993) and Hamilton (1994, chap. 22) provide excellent introductions to Markov-switching regression models.
Markov-switching models are used for series that are believed to transition over a finite set of unobserved states, allowing the process to evolve differently in each state. The transitions occur according to a Markov process. The time of transition from one state to another and the duration between changes in state are random. For example, these models can be used to understand the process that governs the time at which economic growth transitions between expansion and recession and the duration of each period.
Consider a series y_t, where t = 1, 2, ..., T, whose evolution is characterized by two states, as in the models below

    State 1:  y_t = µ_1 + ε_t
    State 2:  y_t = µ_2 + ε_t

where µ_1 and µ_2 are the intercept terms in state 1 and state 2, respectively, and ε_t is a white noise error with variance σ². The two states model shifts in the intercept term. If the timing of switches is known, the above model can be expressed as

    y_t = s_t µ_1 + (1 − s_t) µ_2 + ε_t

where s_t is 1 if the process is in state 1 and 0 otherwise. The above model is a regression with dummy variables and could be estimated with ordinary least squares using, for example, regress. However, in the case of interest, we never know in which state the process is; that is to say, s_t is not observed.
Markov-switching regression models allow the parameters to vary over the unobserved states. In the simplest case, we can express this model as an MSDR model with a state-dependent intercept term

    y_t = µ_{s_t} + ε_t

where µ_{s_t} is the parameter of interest; µ_{s_t} = µ_1 when s_t = 1, and µ_{s_t} = µ_2 when s_t = 2.
Although one never knows with certainty in which state the process lies, the probabilities of being in each state can be estimated. For a Markov process, the transition probabilities are of greater interest. One-step transition probabilities are given by p_{s_t, s_{t+1}}, so for a two-state process, p_11 denotes the probability of staying in state 1 in the next period given that the process is in state 1 in the current period. Likewise, p_22 denotes the probability of staying in state 2. Values closer to 1 indicate a more persistent process; in other words, the process is expected to stay in a given state for a long time.
Estimation of Markov-switching models proceeds by predicting the probabilities of the unobserved state and updating the likelihood at each period, akin to the Kalman filter.
However, while the Kalman filter is concerned with making linear updates on continuous latent variables, the filter developed in Hamilton (1989) is a nonlinear algorithm that estimates the probabilities that a discrete, latent variable is in one of several states. Also see Hamilton (1990) for estimation of the model parameters by an EM algorithm, which is also a robust method to find reasonable starting values. Markov-switching dynamic regression In this section, we use a series of examples to describe MSDR models and the mswitch dr command. MSDR models allow a quick adjustment after the process changes state. These models are often used to model monthly and higher-frequency data. When the process is in state s at time t, a general specification of the MSDR model is written as yt = µst + xt α + zt βst + εs where yt is the dependent variable, µs is the state-dependent intercept, xt is a vector of exogenous variables with state-invariant coefficients α, zt is a vector of exogenous variables with state-dependent coefficients βs , and εs is an independent and identically distributed (i.i.d.) normal error with mean 0 and state-dependent variance σs2 . xt and zt may contain lags of yt . MSDR models allow states to switch according to a Markov process as described in Markov-switching regression models under Methods and formulas. In the default model fit by mswitch dr, s = 2 and a constant σ 2 is assumed (σ12 = σ22 = σ 2 ), so three parameters, µ1 , µ2 , and σ 2 , are estimated. There is no xt or zt . The number of states may be increased with option state(). To include xt , you specify a varlist after the command name, and to include zt , you specify option switch(). The assumption of constant variances over states may be relaxed with option varswitch. A more complete discussion of the MSDR model is provided in Specification of Markov-switching models under Methods and formulas. Example 1: MSDR model with switching intercepts Suppose we wish to model the federal funds rate. One potential model is a constant-only model rt = µst + εt where rt is the federal funds rate, st is the state, and µst is the mean in each state. In usmacro.dta, we have data for the series from the third quarter of 1954 to the fourth quarter of 2010 from the Federal Reserve Economic Database, a macroeconomic database provided by the Federal Reserve Bank of Saint Louis. The data are plotted below. mswitch — Markov-switching regression models 433 0 5 federal funds rate 10 15 20 US Federal Funds Rate 1950q1 1960q1 1970q1 1980q1 1990q1 quarterly time variable 2000q1 2010q1 We note that the decades of 1970s and 1980s were characterized by periods of high interest rates while the rest of the sample displays moderate levels. Thus, a two-state model seems reasonable. st ∈ (1, 2) is the state; µ1 is the mean in the moderaterate state; and µ2 is the mean in high-rate state. We can use mswitch dr with dependent variable fedfunds to estimate the parameters of the model. . use http://www.stata-press.com/data/r14/usmacro (Federal Reserve Economic Data - St. Louis Fed) . mswitch dr fedfunds Performing EM optimization: Performing gradient-based optimization: Iteration 0: log likelihood = -508.66031 Iteration 1: log likelihood = -508.6382 Iteration 2: log likelihood = -508.63592 Iteration 3: log likelihood = -508.63592 Markov-switching dynamic regression Sample: 1954q3 - 2010q4 No. of obs Number of states = 2 AIC Unconditional probabilities: transition HQIC SBIC Log likelihood = -508.63592 fedfunds Coef. Std. Err. 
z = = = = 226 4.5455 4.5760 4.6211 P>|z| [95% Conf. Interval] State1 _cons 3.70877 .1767083 20.99 0.000 3.362428 4.055112 _cons 9.556793 .2999889 31.86 0.000 8.968826 10.14476 sigma 2.107562 .1008692 1.918851 2.314831 p11 .9820939 .0104002 .9450805 .9943119 p21 .0503587 .0268434 .0173432 .1374344 State2 The header reports the sample size, fit statistics, the number of states, and the method used for computing the unconditional state probabilities. The EM algorithm was used to find the starting values 434 mswitch — Markov-switching regression models for the quasi-Newton optimizer, and we see that it took three iterations for the model to converge. Finally, the header reports that the transition method was used to compute the unconditional state probabilities as a function of the transition probabilities; see Methods and formulas. The estimation table reports results for each state-dependent mean and the constant error variance. Below that, the table displays the elements of the first k − 1 rows of the transition matrix, where k is the number of states. State 1 is the moderate-rate state and has a mean interest rate of 3.71%. State 2 is the high-rate state and has a mean interest rate of 9.56%. p11 is the estimated probability of staying in state 1 in the next period given that the process is in state 1 in the current period. The estimate of 0.98 implies that state 1 is highly persistent. Similarly, p21 is the probability of transitioning to state 1 from state 2. The probability of staying in state 2 is therefore 1 − 0.05 = 0.95, which implies that state 2 is also highly persistent. Note that it is arbitrary which state is called 1 or 2. Changing the initial values for the iterations, for example, can change the state labels for a given model-data combination. The transition probabilities will get swapped in accordance with the change in labels. Technical note As mentioned in Introduction, a model with one state is equivalent to linear regression. To estimate a one-state constant-only model for the data in example 1, you type . mswitch dr fedfunds, states(1) This is equivalent to typing . arima fedfunds, technique(nr) or . regress fedfunds The commands produce nearly identical parameter estimates for the coefficients. Example 2: MSDR model with switching intercepts and coefficients Continuing example 1, we specify a more complex model that includes the lagged value of the interest rate and allows its coefficient to switch as well. The respecified model is rt = µst + φst rt−1 + εt mswitch — Markov-switching regression models 435 We estimate the switching coefficient by including the switch() option. . mswitch dr fedfunds, switch(L.fedfunds) Performing EM optimization: Performing gradient-based optimization: Iteration 0: log likelihood = -265.37725 Iteration 1: log likelihood = -264.74265 Iteration 2: log likelihood = -264.71073 Iteration 3: log likelihood = -264.71069 Iteration 4: log likelihood = -264.71069 Markov-switching dynamic regression Sample: 1954q4 - 2010q4 Number of states = 2 Unconditional probabilities: transition No. of obs AIC HQIC SBIC = = = = 225 2.4152 2.4581 2.5215 Log likelihood = -264.71069 fedfunds Coef. Std. Err. State1 fedfunds L1. .7631424 .0337234 _cons .724457 State2 fedfunds L1. z P>|z| [95% Conf. 
Interval] 22.63 0.000 .6970457 .8292392 .2886657 2.51 0.012 .1586826 1.290231 1.061174 .0185031 57.35 0.000 1.024908 1.097439 _cons -.0988764 .1183838 -0.84 0.404 -.3309043 .1331515 sigma .6915759 .0358644 .6247373 .7655653 p11 .6378175 .1202616 .3883032 .830089 p21 .1306295 .0495924 .0600137 .2612432 The output indicates that the coefficients on the lagged dependent variable in the two states are significant. Also, we favor this model over the constant-only model because the SBIC for this model, 2.52, is lower than the SBIC for the constant-only model, 4.62. Example 3: Changing the number of states for an MSDR model Continuing example 2, we now specify a Taylor-rule model with two and three states and select the preferred number of states. Taylor-rule models specify that the interest rate depends on its own lag, the current value of inflation, and the output gap. In our dataset, ogap is the output gap and inflation is inflation. This model does not have a constant term, so we specify suboption noconstant in switch() after the variables with switching coefficients. 436 mswitch — Markov-switching regression models First, we fit a two-state MSDR Taylor-rule model with fedfunds as the interest rate. . mswitch dr fedfunds, switch(L.fedfunds ogap inflation, noconstant) Performing EM optimization: Performing gradient-based optimization: Iteration 0: log likelihood = -237.14148 Iteration 1: log likelihood = -237.11346 Iteration 2: log likelihood = -237.11345 Markov-switching dynamic regression Sample: 1955q3 - 2010q4 No. of obs = Number of states = 2 AIC = Unconditional probabilities: transition HQIC = SBIC = Log likelihood = -237.11345 fedfunds Coef. Std. Err. State1 fedfunds L1. .8649288 .0420639 ogap inflation .152642 .0119843 State2 fedfunds L1. z 222 2.2172 2.2729 2.3552 P>|z| [95% Conf. Interval] 20.56 0.000 .7824851 .9473726 .0417278 .0495214 3.66 0.24 0.000 0.809 .0708571 -.0850758 .2344269 .1090444 .9187939 .0204739 44.88 0.000 .8786659 .9589219 ogap inflation .0421467 .2011764 .0196035 .0271228 2.15 7.42 0.032 0.000 .0037245 .1480165 .080569 .2543362 sigma .5777446 .0312415 .5196455 .6423395 p11 .6172176 .099633 .4136784 .786556 p21 .129722 .0400303 .0692423 .2299753 The results indicate that inflation does not significantly affect fedfunds in state 1 but that it does in state 2. mswitch — Markov-switching regression models Would a model with three states be better than the above two-state model? . mswitch dr fedfunds, switch(L.fedfunds ogap inflation, noconstant) states(3) Performing EM optimization: Performing gradient-based optimization: Iteration Iteration Iteration Iteration Iteration Iteration Iteration Iteration Iteration Iteration Iteration Iteration Iteration Iteration 0: 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: log log log log log log log log log log log log log log likelihood likelihood likelihood likelihood likelihood likelihood likelihood likelihood likelihood likelihood likelihood likelihood likelihood likelihood = = = = = = = = = = = = = = -196.56927 -195.93129 -194.39917 -193.81305 -193.17561 -192.70636 -192.55488 -192.51034 -192.50204 -192.49988 -192.49942 -192.49931 -192.49928 -192.49928 (not concave) (not concave) Markov-switching dynamic regression Sample: 1955q3 - 2010q4 Number of states = 3 Unconditional probabilities: transition No. of obs AIC HQIC SBIC = = = = 222 1.8784 1.9774 2.1236 Log likelihood = -192.49928 fedfunds Coef. Std. Err. State1 fedfunds L1. .8268613 .0351457 ogap inflation .1966035 .0517806 State2 fedfunds L1. z P>|z| [95% Conf. 
Interval] 23.53 0.000 .7579769 .8957456 .0337338 .0394383 5.83 1.31 0.000 0.189 .1304864 -.025517 .2627206 .1290782 .9526463 .0153751 61.96 0.000 .9225117 .9827809 ogap inflation .0403391 .1319626 .0148812 .0206491 2.71 6.39 0.007 0.000 .0111725 .0914911 .0695057 .1724341 State3 fedfunds L1. .4703446 .0812193 5.79 0.000 .3111577 .6295315 ogap inflation -.0212463 .9021315 .0421034 .083461 -0.50 10.81 0.614 0.000 -.1037674 .7385508 .0612748 1.065712 sigma .4250504 .0234556 .3814772 .4736006 p11 p12 .5772811 .3029914 .0920555 .0876632 .3946655 .16156 .7409643 .4951202 p21 p22 .1036127 .8963872 .0306354 .0306354 .0570974 .8192421 .1807579 .9429025 p31 p32 .1364478 .2521249 .1227456 .1924899 .0200977 .0435989 .5489975 .7137206 437 438 mswitch — Markov-switching regression models We favor the three-state model over the two-state model because it has the lower SBIC. The three states, in this case, can be thought of as representing low, moderate, and high-interest rate states. The results for the three-state model indicate that inflation does not affect the interest rate in state 1, but it does affect the interest rate in states 2 and 3. The results also indicate that when the coefficient on inflation is large and significant in state 3, the output gap coefficient is not significant. Technical note In some situations, the quasi-Newton optimization will not converge, which implies that the parameters of the specified model are not identified by the data. These convergence problems most frequently arise when attempting to fit a model with too many states. Example 4: Switching variances All examples thus far have assumed a constant variance across states. In some cases, we may wish to relax this assumption. For example, in the snp500 dataset we have weekly data on the absolute returns of the S&P 500 index from the period 2004w17 to 2014w18, which we present below. The graph indicates that there were high-volatility periods in 2008 to 2009 and in late 2011. It would be unreasonable to assume that the variance in this high-volatility state is equal to the variance in the low-volatility state. 0 2 absolute returns 4 6 8 10 Absolute Returns of S&P 500 index 2004w1 2006w1 2008w1 2010w1 weekly time variable 2012w1 2014w1 mswitch — Markov-switching regression models 439 Below we fit areturns, the absolute returns, with an MSDR model in which the coefficients on the lagged dependent variable and the variances differ by state. . use http://www.stata-press.com/data/r14/snp500 (Federal Reserve Economic Data - St. Louis Fed) . mswitch dr areturns, switch(L.areturns) varswitch Performing EM optimization: Performing gradient-based optimization: Iteration 0: log likelihood = -753.27687 Iteration 1: log likelihood = -746.54052 Iteration 2: log likelihood = -745.80829 Iteration 3: log likelihood = -745.7977 Iteration 4: log likelihood = -745.7977 Markov-switching dynamic regression Sample: 2004w19 - 2014w18 No. of obs Number of states = 2 AIC Unconditional probabilities: transition HQIC SBIC Log likelihood = -745.7977 areturns Coef. State1 areturns L1. = = = = 520 2.8992 2.9249 2.9647 Std. Err. z P>|z| [95% Conf. Interval] .0790744 .0301862 2.62 0.009 .0199105 .1382384 _cons .7641424 .0782852 9.76 0.000 .6107062 .9175785 State2 areturns L1. 
.527953 .0857841 6.15 0.000 .3598193 .6960867 _cons 1.972771 .2784204 7.09 0.000 1.427077 2.518465 sigma1 .5895792 .0517753 .4963544 .7003135 sigma2 1.605333 .1262679 1.375985 1.872908 p11 .7530865 .0634387 .6097999 .856167 p21 .6825357 .0662574 .5414358 .7965346 The estimated standard deviations, reported in sigma1 and sigma2, indicate that state 1 corresponds to the low-volatility periods and that state 2 corresponds to the high-volatility periods. Example 5: An MSDR model of population health We can apply these same methods to noneconomic data that exhibit similar periods of high and low volatility. For example, in public health and epidemiology, the process that determines the incidence of disease over time may evolve with changes in health practices. In mumpspc.dta, we have monthly data on the number of new mumps cases and the interpolated population in New York City between January 1928 to December 1972. The mumpspc variable represents the number of new mumps cases per 10,000 residents. We apply seasonal differencing to the population-adjusted mumpspc variable using time-series operators, and we plot the resulting series; see [U] 11.4.4 Time-series varlists. 440 mswitch — Markov-switching regression models −2 Seasonally differenced mumps cases −1 0 1 2 Population−adjusted number of new mumps cases in NYC 1930m1 1940m1 1950m1 1960m1 monthly time variable 1970m1 The data clearly show periods of high and low volatility. We fit a two-state MSDR model to the seasonally differenced dependent variable with state-dependent variances and state-dependent coefficients on the lagged dependent variable. . use http://www.stata-press.com/data/r14/mumpspc (Mumps data from Hipel and Mcleod (1994) with interpolated population) . mswitch dr S12.mumpspc, varswitch switch(LS12.mumpspc, noconstant) Performing EM optimization: Performing gradient-based optimization: Iteration 0: log likelihood = 110.9372 (not concave) Iteration 1: log likelihood = 120.68028 Iteration 2: log likelihood = 121.86307 Iteration 3: log likelihood = 131.3275 Iteration 4: log likelihood = 131.72159 Iteration 5: log likelihood = 131.7225 Iteration 6: log likelihood = 131.7225 Markov-switching dynamic regression Sample: 1929m2 - 1972m6 No. of obs = 521 Number of states = 2 AIC = -0.4826 Unconditional probabilities: transition HQIC = -0.4634 SBIC = -0.4336 Log likelihood = 131.7225 S12.mumpspc Coef. Std. Err. State1 mumpspc LS12. .420275 .0167461 State2 mumpspc LS12. .9847369 .0258383 sigma1 .0562405 sigma2 z P>|z| [95% Conf. Interval] 25.10 0.000 .3874533 .4530968 38.11 0.000 .9340947 1.035379 .0050954 .0470901 .067169 .2611362 .0111191 .2402278 .2838644 p11 .762733 .0362619 .6846007 .8264175 p21 .1473767 .0257599 .1036675 .2052939 mswitch — Markov-switching regression models 441 The output indicates that there are two distinct states; state 1 is the low-volatility state and state 2 is the high-volatility state. While lagged seasonally differenced number of mumps cases are a significant predictor of current seasonally differenced cases, the estimates differ between states. Both states are persistent. Markov-switching AR In this section, we use a series of examples to describe MSAR models and the mswitch ar command. MSAR models allow a gradual adjustment after the process changes state. These models are often used to model quarterly and lower-frequency data. 
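If it is unclear whether the quick-adjustment (MSDR) or gradual-adjustment (MSAR) specification fits a series better, one informal check is to fit both and compare information criteria. A minimal sketch, assuming a tsset series y (the variable name is hypothetical):

    . quietly mswitch dr y
    . estimates store msdr
    . quietly mswitch ar y, ar(1)
    . estimates store msar
    . estimates stats msdr msar    // compare AIC/BIC; note the estimation
                                   //   samples differ by the AR lags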
An MSAR model with two state-dependent AR terms for the dependent variable that is in state s at time t is yt = µst + xt α + zt βst + φ1,st (yt−1 − µst−1 − xt−1 α − zt−1 βst−1 ) + φ2,st (yt−2 − µst−2 − xt−2 α − zt−2 βst−2 ) + εst where yt is the dependent variable at time t; µst is the state-dependent intercept; xt are covariates whose coefficients α are state invariant; zt are covariates whose coefficients βst are state-dependent; φ1,st is the first AR term in state st ; φ2,st is the second AR term in state st ; and εst is the i.i.d. normal error with mean 0 and state-dependent variance. As in MSDR models, xt and zt may contain lags of yt . Note that µst−1 is the intercept corresponding to the state that the process was in the previous period and that µst−2 is the intercept corresponding to the state that the process was in at t − 2. Similarly, βst−1 is the coefficient vector on zt−1 corresponding to the state that the process was in the previous period, and βst−2 is the coefficient vector on zt−2 corresponding to the state that the process was in at t − 2. In the default model fit by mswitch ar, s = 2 and a constant σ 2 is assumed (σ12 = σ22 = σ 2 ). In the simplest form, a single AR term is specified and the coefficient is common to both states, so four parameters, µ1 , µ2 , φ, and σ 2 , are estimated. There is no xt or zt . The number of AR terms may be increased by specifying a numlist in ar(). To allow the estimated parameters for the AR terms to vary across states, you specify option arswitch. The number of states may be increased with option state(). To include xt , you specify a varlist after the command name, and to include zt , you specify option switch(). The assumption of constant variances over states may be relaxed with option varswitch. MSAR models allow states to switch according to a Markov process, as described in Methods and formulas under Markov-switching regression models. A more complete discussion of the MSAR model is provided in Specification of Markov-switching models under Methods and formulas. 442 mswitch — Markov-switching regression models Example 6: MSAR model with switching intercepts Hamilton (1989) and Hamilton (1994, chap. 22) fit an MSAR to the growth of quarterly U.S. real gross national product using data from 1952q1 to 1984q4. We replicate those results here using rgnp.dta. . use http://www.stata-press.com/data/r14/rgnp (Data from Hamilton (1989)) . mswitch ar rgnp, ar(1/4) Performing EM optimization: Performing gradient-based optimization: Iteration 0: log likelihood = -182.54411 Iteration 1: log likelihood = -182.12714 Iteration 2: log likelihood = -181.68653 Iteration 3: log likelihood = -181.42342 Iteration 4: log likelihood = -181.26492 Iteration 5: log likelihood = -181.26339 Iteration 6: log likelihood = -181.26339 (not concave) (not concave) Markov-switching autoregression Sample: 1952q2 - 1984q4 Number of states = 2 Unconditional probabilities: transition No. of obs AIC HQIC SBIC = = = = 131 2.9048 2.9851 3.1023 Log likelihood = -181.26339 rgnp Coef. Std. Err. z P>|z| [95% Conf. Interval] ar L1. L2. L3. L4. 
.0134871 -.0575211 -.2469833 -.2129214 .1199942 .137663 .1069103 .1105311 0.11 -0.42 -2.31 -1.93 0.911 0.676 0.021 0.054 -.2216971 -.3273357 -.4565235 -.4295583 .2486713 .2122934 -.037443 .0037155 _cons -.3588127 .2645396 -1.36 0.175 -.8773007 .1596754 _cons 1.163517 .0745187 15.61 0.000 1.017463 1.309571 sigma .7690048 .0667396 .6487179 .9115957 p11 .754671 .0965189 .5254555 .8952432 p21 .0959153 .0377362 .0432569 .1993222 rgnp State1 State2 The output indicates that the average growth rate of U.S. real gross national product during expansions is 1.16% and during recessions is −0.36%, with each state being persistent. Example 7: Switching AR coefficients Continuing example 6, we now fit an MSAR with state-dependent AR coefficients to the same dataset. We include only two AR terms in each state to better estimate the parameters. mswitch — Markov-switching regression models . mswitch ar rgnp, ar(1/2) arswitch Performing EM optimization: Performing gradient-based optimization: Iteration 0: log likelihood = -179.68471 Iteration 1: log likelihood = -179.56238 Iteration 2: log likelihood = -179.32917 Iteration 3: log likelihood = -179.32356 Iteration 4: log likelihood = -179.32354 Iteration 5: log likelihood = -179.32354 Markov-switching autoregression Sample: 1951q4 - 1984q4 Number of states = 2 Unconditional probabilities: transition No. of obs AIC HQIC SBIC = = = = 443 133 2.8319 2.9114 3.0275 Log likelihood = -179.32354 rgnp Coef. Std. Err. z P>|z| [95% Conf. Interval] ar L1. L2. .3710719 .7002937 .1754383 .187409 2.12 3.74 0.034 0.000 .0272191 .3329787 .7149246 1.067609 _cons -.0055216 .2057086 -0.03 0.979 -.408703 .3976599 ar L1. L2. .4621503 -.3206652 .1652473 .1295937 2.80 -2.47 0.005 0.013 .1382714 -.5746642 .7860291 -.0666661 _cons 1.195482 .1225987 9.75 0.000 .9551925 1.435771 sigma .6677098 .0719638 .5405648 .8247604 p11 .3812383 .1424841 .1586724 .6680876 p21 .3564492 .0994742 .1914324 .5644178 State1 State2 The results indicate that state 1 has negative average growth that is different than the positive average growth in state 2. The AR coefficients for state 1 indicate that shocks will die out very slowly, while the AR coefficients for state 2 indicate that shocks will die out moderately quickly. In other words, shocks in a recession last a long time, while shocks in an expansion die out moderately quickly. Example 8: Markov-switching regression model with constraints mswitch can fit models subject to constraints. To facilitate the optimization, mswitch estimates a logit transform of the transition probabilities (see Methods and formulas) and a log transformation of the variance parameter. Therefore, all constraints must be specified to the transformed parameter. In example 6, the estimated transition probability of staying in state 1 was about 0.75. In this example, we constrain that probability to be 0.75 and estimate the remaining parameters. For this case, the transformed value is q = −ln(0.75/0.25) = −1.0986123. We use the constraint command to define this constraint; see [R] constraint. 444 mswitch — Markov-switching regression models . constraint 1 [p11]_cons = -1.0986123 . 
mswitch ar rgnp, ar(1/4) constraints(1) Performing EM optimization: Performing gradient-based optimization: Iteration Iteration Iteration Iteration Iteration Iteration Iteration 0: 1: 2: 3: 4: 5: 6: log log log log log log log likelihood likelihood likelihood likelihood likelihood likelihood likelihood = = = = = = = -182.86708 -182.05084 -181.79995 -181.29355 -181.26463 -181.26456 -181.26456 (not concave) Markov-switching autoregression Sample: 1952q2 - 1984q4 Number of states = 2 Unconditional probabilities: transition No. of obs AIC HQIC SBIC = = = = 131 2.8895 2.9609 3.0651 Log likelihood = -181.26456 ( 1) [p11]_cons = -1.098612 rgnp Coef. Std. Err. z P>|z| [95% Conf. Interval] ar L1. L2. L3. L4. .0133924 -.0591073 -.2473259 -.2130605 .1196067 .133834 .1067244 .1106088 0.11 -0.44 -2.32 -1.93 0.911 0.659 0.020 0.054 -.2210324 -.3214172 -.456502 -.4298498 .2478172 .2032026 -.0381499 .0037288 _cons -.3648129 .23039 -1.58 0.113 -.816369 .0867432 _cons 1.163125 .0738402 15.75 0.000 1.018401 1.307849 sigma .7682327 .0644585 .6517376 .9055508 p11 .75 p21 .0962226 .0439668 .1977399 rgnp State1 State2 (constrained) .037246 The point estimates are similar to those reported in example 6 while the standard errors reported here are slightly smaller. Note that an MSAR model with no AR terms is equivalent to estimating an MSDR model, so typing . mswitch ar rgnp, ar(0) is the same as typing . mswitch dr rgnp Technical note Achieving convergence in Markov-switching models can be difficult due to the existence of multiple local minima. Furthermore, a model with switching variance is able to generate a likelihood function that is unbounded when µ = yi and σ 2 → 0 (Frühwirth-Schnatter 2006, chap. 6). Four methods for mswitch — Markov-switching regression models 445 overcoming convergence problems are 1) selecting an alternate optimization algorithm by using the technique() option; 2) using alternative starting values by specifying the from() option; 3) using starting values obtained by estimating the parameters of a restricted version of the model of interest; and 4) transforming the variables to be on the same scale. Stored results mswitch stores the following in e(): Scalars e(N) e(N gaps) e(k) e(k eq) e(k aux) e(states) e(ll) e(rank) e(aic) e(hqic) e(sbic) e(tmin) e(tmax) e(emiter) Macros e(cmd) e(cmdline) e(eqnames) e(depvar) e(switchvars) e(nonswitchvars) e(model) e(title) e(tsfmt) e(timevar) e(tmins) e(tmaxs) e(vce) e(vcetype) e(technique) e(p0) e(varswitch) e(arswitch) e(ar) e(properties) e(estat cmd) e(predict) e(marginsnotok) e(asbalanced) e(asobserved) Matrices e(b) e(Cns) e(bf) e(V) e(V modelbased) e(initvals) e(uncprob) Functions e(sample) number of observations number of gaps number of parameters number of equations in e(b) number of auxiliary parameters number of states log likelihood rank of e(V) Akaike information criterion Hannan–Quinn information criterion Schwarz–Bayesian information criterion minimum time maximum time number of EM iterations mswitch command as typed names of equations name of dependent variable list of switching variables list of nonswitching variables dr or ar title in estimation output format for the current time variable time variable specified in tsset formatted minimum time formatted maximum time vcetype specified in vce() title use to label Std. Err. 
maximization technique; unconditional probabilities; varswitch, if specified; arswitch, if specified; list of AR lags, if ar() is specified; b V; program used to implement estat; program used to implement predict; predictions disallowed by margins; factor variables fvset as asbalanced; factor variables fvset as asobserved; coefficient vector; constraints matrix; constrained coefficient vector; variance–covariance matrix of the estimators; model-based variance; matrix of initial values; matrix of unconditional probabilities; marks estimation sample

Methods and formulas
Methods and formulas are presented under the following headings:
    Markov-switching regression models
    Markov chains
    Specification of Markov-switching models
    Markov-switching dynamic regression
    Markov-switching AR
    Likelihood function with latent states
    Smoothed probabilities
    Unconditional probabilities

Markov-switching regression models
Consider the evolution of y_t, where t = 1, 2, ..., T, that is characterized by two states or regimes, as in the models below

    State 1:  y_t = µ_1 + φ y_{t−1} + ε_t
    State 2:  y_t = µ_2 + φ y_{t−1} + ε_t

where µ_1 and µ_2 are the intercept terms in state 1 and state 2, respectively; φ is the AR parameter; and ε_t is a white noise error with variance σ². The two states model abrupt shifts in the intercept term. If the timing of switches is known, the above model can be expressed as

    y_t = s_t µ_1 + (1 − s_t) µ_2 + φ y_{t−1} + ε_t

where s_t is 1 if the process is in state 1 and 0 otherwise. Estimation in this case can be performed using standard procedures. In the case of interest, we never know in which state the process is; that is to say, s_t is not observed. Markov-switching regression models specify that the unobserved s_t follows a Markov chain. In the simplest case, we can express this model as a state-dependent intercept term for k states

    y_t = µ_{s_t} + φ y_{t−1} + ε_t

where µ_{s_t} = µ_1 when s_t = 1, µ_{s_t} = µ_2 when s_t = 2, ..., and µ_{s_t} = µ_k when s_t = k.
The conditional density of y_t is assumed to depend only on the realization of the current state s_t and is given by f(y_t | s_t = i, y_{t−1}; θ), where θ is a vector of parameters. There are k conditional densities for k states, and estimation of θ is performed by updating the conditional likelihood using a nonlinear filter.

Markov chains
s_t is an irreducible, aperiodic Markov chain starting from its ergodic distribution π = (π_1, ..., π_k); see Hamilton (1994, chap. 22). The probability that s_t is equal to j ∈ (1, ..., k) depends only on the most recent realization, s_{t−1}, and is given by

    Pr(s_t = j | s_{t−1} = i) = p_ij

All possible transitions from one state to the other can be collected in a k × k transition matrix

        [ p_11  ...  p_k1 ]
    P = [ p_12  ...  p_k2 ]
        [  :          :   ]
        [ p_1k  ...  p_kk ]

which governs the evolution of the Markov chain. All elements of P are nonnegative, and each column sums to 1. For an excellent introduction to the properties of Markov chains, refer to Hamilton (1994, chap. 22) and Frühwirth-Schnatter (2006, chap. 10). For a deeper treatment, see Karlin and Taylor (1975, chap. 2 and 3).
We estimate the probabilities using the following parameterization:

    p_ij = exp(−q_ij) / {1 + exp(−q_i1) + ... + exp(−q_i,k−1)}
    p_ik = 1 / {1 + exp(−q_i1) + ... + exp(−q_i,k−1)}

where j ∈ (1, ..., k − 1) and q_ij is the transformed parameter, which may be computed as

    q_ij = − log(p_ij / p_ik)
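This transform is what the constraint in example 8 targets. For instance, the transformed value corresponding to a transition probability of 0.75 in a two-state model can be checked directly (using example 8's numbers):

    . display -ln(.75/.25)
    -1.0986123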
Specification of Markov-switching models
Consider an AR(1) model given by

    y_t = µ + φ y_{t−1} + ε_t

This model can be rewritten in terms of an AR(1) error specification as

    y_t = ν + e_t
    e_t = ρ e_{t−1} + ε_t

which can be written as the single equation

    y_t = ν + ρ (y_{t−1} − ν) + ε_t

such that φ = ρ and µ = ν(1 − ρ). This result, however, does not hold in the case of Markov-switching regression models, as seen below in a simple two-state case where the constant term is state dependent. Consider the following models:

    Model I:   y_t = µ_{s_t} + φ y_{t−1} + ε_t
    Model II:  y_t = µ_{s_t} + φ (y_{t−1} − µ_{s_{t−1}}) + ε_t

Model I is also referred to as an MSDR model or a Markov-switching intercept model (Krolzig 1997). It may consist of other switching parameters, but for simplicity, we consider only the switching-intercept case. The evolution of y_t depends on the realization of the switching intercept at time t. The discrete latent state s_t that governs the value of the intercept at time t has transition matrix

    P = [ p_11  p_21 ]
        [ p_12  p_22 ]

This specification allows for two possible intercepts at any given time t.
By contrast, the evolution of y_t in model II depends on the value of the switching mean at its current state and its lagged value. Model II is also referred to as an MSAR or Markov-switching mean model (Krolzig 1997). At any given time t, there are four possible values of the intercept, given by

    µ_1 − ρ µ_1    µ_2 − ρ µ_1    µ_1 − ρ µ_2    µ_2 − ρ µ_2

which implies that models I and II do not yield equivalent representations as compared with the AR(1) model with no switching. The MA(∞) representation shown below better illustrates the different dynamics of y_t obtained as a result of these specifications:

    Model I:   y_t = Σ_{i=0}^∞ φ^i µ_{s_{t−i}} + Σ_{i=0}^∞ φ^i ε_{t−i}
    Model II:  y_t = µ_{s_t} + Σ_{i=0}^∞ φ^i ε_{t−i}

In model I, the effect of a one-time change in state accumulates over time, similar to a permanent shift in the error term ε_t. In model II, the effect of a one-time change in state is the same for all time periods. Also see Hamilton (1993).
Model II allows y_t to depend on lagged values of the state s_{t−1}, which in turn leads to four conditional densities. We define a new state variable s*_t such that s*_t is a four-state Markov chain and y_t depends only on the current state, as

    s*_t = 1  if s_t = 1 and s_{t−1} = 1
    s*_t = 2  if s_t = 2 and s_{t−1} = 1
    s*_t = 3  if s_t = 1 and s_{t−1} = 2
    s*_t = 4  if s_t = 2 and s_{t−1} = 2

The corresponding 4 × 4 transition matrix is

    P = [ p_11   0    p_11   0   ]
        [ p_12   0    p_12   0   ]
        [  0    p_21   0    p_21 ]
        [  0    p_22   0    p_22 ]

The conditional density of y_t is given by f(y_t | s*_t = i, y_{t−1}; θ) for i = 1, ..., 4. Also see Hamilton (1994, chap. 22).
More generally, for MSAR models, s*_t is a k^(p+1)-state Markov chain, where p is the number of lagged states. Because MSAR models require larger state vectors, they are often used with low-frequency data and smaller AR lags. However, the state vector in MSDR models does not depend on the AR lags and can thus be used to accommodate high-frequency data and higher AR lags.

Markov-switching dynamic regression
A general specification of the MSDR model is written as

    y_t = µ_s + x_t α + z_t β_s + ε_s

where y_t is the dependent variable, µ_s is the state-dependent intercept, x_t is a vector of exogenous variables with state-invariant coefficients α, z_t is a vector of exogenous variables with state-dependent coefficients β_s, and ε_s is an i.i.d. normal error with mean 0 and state-dependent variance σ_s². x_t and z_t may contain lags of y_t.
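In mswitch dr terms, x_t corresponds to the nonswitch varlist, z_t to the variables named in switch(), and a state-dependent σ_s² to option varswitch. A schematic sketch, with hypothetical variables x1, z1, and z2:

    . mswitch dr y x1, switch(z1 z2) varswitch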
Markov-switching AR
A general specification of the MSAR model is written as

    y_t = µ_{s_t} + x_t α + z_t β_{s_t} + Σ_{i=1}^p φ_{i,s_t} (y_{t−i} − µ_{s_{t−i}} − x_{t−i} α − z_{t−i} β_{s_{t−i}}) + ε_{s_t}

where y_t is the dependent variable at time t; µ_{s_t} is the state-dependent intercept; x_t are covariates whose coefficients α are state invariant; and z_t are covariates whose coefficients β_{s_t} are state dependent. As in MSDR models, x_t and z_t may contain lags of y_t. φ_{i,s_t} is the ith AR term in state s_t. Note that µ_{s_{t−i}} is the intercept corresponding to the state that the process was in at period t − i. Similarly, β_{s_{t−i}} is the coefficient vector on z_{t−i} corresponding to the state that the process was in at period t − i. ε_{s_t} is the i.i.d. normal error with mean 0 and state-dependent variance. This representation clarifies that the demeaned, lagged errors depend on the state previously occupied by the process. This dependence is not present in the MSDR model.

Likelihood function with latent states
The conditional density of y_t is given by f(y_t | s_t = i, y_{t−1}; θ) for i = 1, ..., k. The marginal density of y_t is obtained by weighting the conditional densities by their respective probabilities. This is written as follows:

    f(y_t | θ) = Σ_{i=1}^k f(y_t | s_t = i, y_{t−1}; θ) Pr(s_t = i; θ)

Let η_t denote the k × 1 vector of conditional densities, given by

    η_t = [ f(y_t | s_t = 1, y_{t−1}; θ), f(y_t | s_t = 2, y_{t−1}; θ), ..., f(y_t | s_t = k, y_{t−1}; θ) ]′

Constructing the likelihood function requires estimating the probability that s_t takes on a specific value using the data through time t and model parameters θ. Let Pr(s_t = i | y_t; θ) denote the conditional probability of observing s_t = i based on data until time t. Then

    Pr(s_t = i | y_t; θ) = f(y_t | s_t = i, y_{t−1}; θ) Pr(s_t = i | y_{t−1}; θ) / f(y_t | y_{t−1}; θ)

where f(y_t | y_{t−1}; θ) is the likelihood of y_t and Pr(s_t = i | y_{t−1}; θ) is the forecasted probability of s_t = i given observations through time t − 1. Then

    Pr(s_t = i | y_{t−1}; θ) = Σ_{j=1}^k Pr(s_t = i | s_{t−1} = j, y_{t−1}; θ) Pr(s_{t−1} = j | y_{t−1}; θ)

Let ξ_{t|t} and ξ_{t|t−1} denote the k × 1 vectors of conditional probabilities Pr(s_t = i | y_t; θ) and Pr(s_t = i | y_{t−1}; θ). The likelihood is then obtained by iterating on the following equations (Hamilton 1994, chap. 22):

    ξ_{t|t} = (ξ_{t|t−1} ⊙ η_t) / {1′ (ξ_{t|t−1} ⊙ η_t)}
    ξ_{t+1|t} = P ξ_{t|t}

where 1 is a k × 1 vector of 1s and ⊙ denotes element-by-element multiplication. The log-likelihood function is obtained as

    L(θ) = Σ_{t=1}^T log f(y_t | y_{t−1}; θ)

where f(y_t | y_{t−1}; θ) = 1′ (ξ_{t|t−1} ⊙ η_t).

Smoothed probabilities
Let ξ_{t|T}, where t < T, denote the k × 1 vector of conditional probabilities Pr(s_t = i | y_T; θ), which represents the probability of s_t = i using observations available through time T. The smoothed probabilities are calculated using an algorithm developed in Kim (1994),

    ξ_{t|T} = ξ_{t|t} ⊙ {P′ (ξ_{t+1|T} (÷) ξ_{t+1|t})}

where (÷) denotes element-by-element division. The smoothed probabilities are obtained by iterating backward from t = T − 1, T − 2, ..., 1.
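The filter recursion above is straightforward to prototype. The following Mata sketch is ours, not Stata's internal code; the function name hfilter_step is hypothetical. It performs one step of the Hamilton filter, taking ξ_{t|t−1} (xi), the vector of conditional densities η_t (eta), and the transition matrix P, returning the likelihood contribution f(y_t | y_{t−1}; θ) and replacing xi_next with ξ_{t+1|t} (Mata passes arguments by reference):

    mata:
    // One step of the Hamilton filter: xi is xi_{t|t-1} (k x 1), eta holds the
    // k conditional densities f(y_t | s_t = i, y_{t-1}), and P is the k x k
    // transition matrix whose columns sum to 1. On exit, xi_next is xi_{t+1|t};
    // the returned scalar is the likelihood contribution f(y_t | y_{t-1}).
    real scalar hfilter_step(real colvector xi, real colvector eta,
                             real matrix P, real colvector xi_next)
    {
        real colvector num
        real scalar    f

        num     = xi :* eta       // xi_{t|t-1} (element by element) eta_t
        f       = sum(num)        // 1'(xi_{t|t-1} .* eta_t)
        xi_next = P*(num:/f)      // xi_{t+1|t} = P xi_{t|t}
        return(f)
    }
    end

A call such as f = hfilter_step(xi, eta, P, xinext) returns the period-t likelihood contribution; looping over t and accumulating ln(f) reproduces the log likelihood L(θ) above.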
Unconditional probabilities
The log-likelihood function has a recursive structure that starts from the unconditional state probabilities ξ_{1|0}. These unconditional state probabilities are unknown. There are three standard ways to obtain them.
By default, or by option p0(transition), the unconditional state probabilities are estimated from the conditional transition probabilities and the Markov structure of the model. Specifically, the vector of unconditional state probabilities is obtained as

    π = (A′A)^{−1} A′ e_{k+1}

where A is a (k + 1) × k matrix given by

    A = [ I_k − P ]
        [   1′    ]

I_k denotes a k × k identity matrix, and e_k denotes the kth column of I_k.
Sometimes researchers prefer to estimate unconditional state probabilities by adding k − 1 additional parameters to the model. This method is seldom used because it requires enough observations to estimate the additional parameters. mswitch uses this method when option p0(smoothed) is specified.
Sometimes researchers prefer to set the unconditional state probabilities to 1/k. mswitch uses this method when option p0(fixed) is specified.

References
Engel, C., and J. D. Hamilton. 1990. Long swings in the dollar: Are they in the data and do markets know it? American Economic Review 80: 689–713.
Frühwirth-Schnatter, S. 2006. Finite Mixture and Markov Switching Models. New York: Springer.
Garcia, R., and P. Perron. 1996. An analysis of the real interest rate under regime shifts. Review of Economics and Statistics 78: 111–125.
Goldfeld, S. M., and R. E. Quandt. 1973. A Markov model for switching regressions. Journal of Econometrics 1: 3–15.
Guidolin, M. 2011a. Markov switching in portfolio choice and asset pricing models: A survey. In Advances in Econometrics: Vol. 27B—Missing Data Methods: Time-series Methods and Applications, ed. D. M. Drukker, 87–178. Bingley, UK: Emerald.
———. 2011b. Markov switching models in empirical finance. In Advances in Econometrics: Vol. 27B—Missing Data Methods: Time-series Methods and Applications, ed. D. M. Drukker, 1–86. Bingley, UK: Emerald.
Hamaker, E. L., R. P. P. P. Grasman, and J. H. Kamphuis. 2010. Regime-switching models to study psychological processes. In Individual Pathways of Change: Statistical Models for Analyzing Learning and Development, ed. P. C. Molenaar and K. M. Newell, 155–168. Washington, DC: American Psychological Association.
Hamilton, J. D. 1989. A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica 57: 357–384.
———. 1990. Analysis of time series subject to changes in regime. Journal of Econometrics 45: 39–70.
———. 1993. Estimation, inference and forecasting of time series subject to changes in regime. In Handbook of Statistics 11: Econometrics, ed. G. S. Maddala, C. R. Rao, and H. D. Vinod, 231–260. San Diego, CA: Elsevier.
———. 1994. Time Series Analysis. Princeton: Princeton University Press.
Jones, B. D., C.-J. Kim, and R. Startz. 2010. Does congress realign or smoothly adjust? A discrete switching model of congressional partisan regimes. Statistical Methodology 7: 254–276.
Karlin, S., and H. M. Taylor. 1975. A First Course in Stochastic Processes. 2nd ed. San Diego, CA: Elsevier.
Kim, C.-J. 1994. Dynamic linear models with Markov-switching. Journal of Econometrics 60: 1–22.
Kim, C.-J., C. R. Nelson, and R. Startz. 1998. Testing for mean reversion in heteroskedastic data based on Gibbs-sampling-augmented randomization. Journal of Empirical Finance 5: 131–154.
Krolzig, H.-M. 1997. Markov-Switching Vector Autoregressions: Modelling, Statistical Inference, and Application to Business Cycle Analysis. New York: Springer.
Lu, H.-M., D. Zeng, and H. Chen. 2010. Prospective infectious disease outbreak detection using Markov switching models. IEEE Transactions on Knowledge and Data Engineering 22: 565–577.
Martinez-Beneito, M. A., D. Conesa, A. López-Quilez, and A. López-Maside. 2008. Bayesian Markov switching models for the early detection of influenza epidemics. Statistics in Medicine 27: 4455–4468.
Quandt, R. E. 1972. A new approach to estimating switching regressions. Journal of the American Statistical Association 67: 306–310.

Also see
[TS] mswitch postestimation — Postestimation tools for mswitch
[TS] arima — ARIMA, ARMAX, and other dynamic regression models
[TS] sspace — State-space models
[TS] tsset — Declare data to be time-series data
[TS] ucm — Unobserved-components model
[U] 20 Estimation and postestimation commands

Title
mswitch postestimation — Postestimation tools for mswitch

Postestimation commands   predict   estat   Remarks and examples   Stored results   Methods and formulas   References   Also see

Postestimation commands
The following postestimation commands are of special interest after mswitch:

Command            Description
estat transition   display transition probabilities in a table
estat duration     display expected duration of states in a table

The following standard postestimation commands are also available:

Command          Description
contrast         contrasts and ANOVA-style joint tests of estimates
estat ic         Akaike's and Schwarz's Bayesian information criteria (AIC and BIC)
estat summarize  summary statistics for the estimation sample
estat vce        variance–covariance matrix of the estimators (VCE)
estimates        cataloging estimation results
forecast         dynamic forecasts and simulations
lincom           point estimates, standard errors, testing, and inference for linear
                   combinations of coefficients
lrtest           likelihood-ratio test
nlcom            point estimates, standard errors, testing, and inference for nonlinear
                   combinations of coefficients
predict          predictions, residuals, influence statistics, and other diagnostic measures
predictnl        point estimates, standard errors, testing, and inference for generalized
                   predictions
pwcompare        pairwise comparisons of estimates
test             Wald tests of simple and composite linear hypotheses
testnl           Wald tests of nonlinear hypotheses

predict

Description for predict
predict creates new variables containing predictions such as predicted values, probabilities, residuals, and standardized residuals.

Menu for predict
Statistics > Postestimation

Syntax for predict
    predict [type] {stub* | newvarlist} [if] [in] [, statistic options]

statistic   Description
Main
yhat        predicted values; the default
xb          equation-specific predicted values; default is predicted values for the first equation
pr          compute probabilities of being in a given state; default is one-step-ahead probabilities
residuals   residuals
rstandard   standardized residuals

These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for the estimation sample.

options                   Description
Options
smethod(method)           method for predicting unobserved states; specify one of onestep,
                            filter, or smooth; default is smethod(onestep)
rmse(stub* | newvarlist)  put estimated root mean squared errors of predicted statistics in
                            new variables
dynamic(time constant)    begin dynamic forecast at specified time
equation(eqnames)         names of equations for which predictions are to be made

method    Description
onestep   predict using past information
filter    predict using past and contemporaneous information
smooth    predict using all sample information

Options for predict

Main
yhat, xb, pr, residuals, and rstandard specify the statistic to be predicted.
yhat, the default, calculates the weighted and state-specific linear predictions of the observed variables. xb calculates the equation-specific linear predictions of the observed variables. pr calculates the probabilities of being in a given state. residuals calculates the residuals in the equations for observable variables. rstandard calculates the standardized residuals, which are the residuals normalized to have unit variances.   Options smethod(method) specifies the method for predicting the unobserved states; smethod(onestep), smethod(filter), and smethod(smooth) allow different amounts of information on the dependent variables to be used in predicting the states at each time period. smethod() may not be specified with xb. smethod(onestep), the default, causes predict to estimate the states at each time period using previous information on the dependent variables. The nonlinear filter is performed on previous periods, but only the one-step predictions are made for the current period. smethod(filter) causes predict to estimate the states at each time period using previous and contemporaneous data by using the nonlinear filter. The filtering is performed on previous periods and the current period. smethod(smooth) causes predict to estimate the states at each time period using all sample data by using the smoothing algorithm. rmse(stub* | newvarlist) puts the root mean squared errors of the predicted statistics into the specified new variables. The root mean squared errors measure the variances due to the disturbances but do not account for estimation error. dynamic(time constant) specifies when predict starts producing dynamic forecasts. The specified time constant must be in the scale of the time variable specified in tsset, and the time constant must be inside a sample for which observations on the dependent variables are available. For example, dynamic(tq(2014q4)) causes dynamic predictions to begin in the fourth quarter of 2014, assuming that the time variable is quarterly; see [D] datetime. If the model contains exogenous variables, they must be present for the whole predicted sample. dynamic() may not be specified with xb, pr, residuals, or rstandard. equation(eqnames) specifies the equations for which the predictions are to be calculated. If you do not specify equation() or stub*, the results are the same as if you had specified the name of the first equation for the predicted statistic. equation() may be specified with xb only. You specify a list of equation names, such as equation(income consumption) or equation(factor1 factor2), to identify the equations. equation() may not be specified with stub*. 456 mswitch postestimation — Postestimation tools for mswitch estat Description for estat estat transition displays all of the transition probabilities in tabular form. estat duration computes the expected duration that the process spends in each state and displays the results in a table. Menu for estat Statistics > Postestimation Syntax for estat Display transition probabilities in a table   estat transition , level(#) Display expected duration of states in a table   estat duration , level(#) Option for estat level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals. 
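To make the estat syntax concrete, a short session sketch (the model is the constant-only fedfunds fit from [TS] mswitch; output omitted):

    . use http://www.stata-press.com/data/r14/usmacro
    . quietly mswitch dr fedfunds
    . estat transition            // transition probabilities with confidence intervals
    . estat duration, level(90)   // expected state durations, with 90% CIs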
Remarks and examples

Remarks are presented under the following headings:
    One-step predictions
    Dynamic predictions
    Model fit and state predictions

We assume that you have already read [TS] mswitch. In this entry, we illustrate some of the features of predict after using mswitch to estimate the parameters of a Markov-switching model.
All the predictions after mswitch depend on the unobserved states, which are estimated recursively using a nonlinear filter. Changing the sample can alter the state estimates, which can change all other predictions.

One-step predictions

One-step predictions in a Markov-switching model are the forecasted values of the dependent variable using one-step-ahead predicted probabilities.

Example 1: One-step predictions for a series

In example 3 of [TS] mswitch, we estimated the parameters of a Markov-switching dynamic regression for the federal funds rate fedfunds as a function of its lag, the output gap ogap, and inflation.

. use http://www.stata-press.com/data/r14/usmacro
(Federal Reserve Economic Data - St. Louis Fed)
. mswitch dr fedfunds, switch(L.fedfunds ogap inflation, noconstant)
(output omitted)

We obtain the one-step predictions for the dependent variable using the default settings for predict. The predictions are stored in the new variable fedf.

. predict fedf
(option yhat assumed; predicted values)
(4 missing values generated)

Next, we graph the actual values, fedfunds, and predicted values, fedf, using tsline. We change the label for fedf to "Predicted values"; see [TS] tsline.

. tsline fedfunds fedf, legend(label(2 "Predicted values"))
(figure: Federal funds rate and Predicted values over date (quarters), 1950q1–2010q1)

The graph shows that one-step-ahead predictions account for large swings in the federal funds rate.

Example 2: State-specific one-step predictions

Continuing example 1, we may also wish to obtain state-specific predictions. This allows us to compare the predictions obtained for different states. Note that this time, we specify fedf* rather than fedf so that predict generates two state-specific predictions with the prefix fedf instead of a single weighted prediction. Also note that the predicted values obtained in example 1 are the weighted average of the state-specific predictions, the weights being the one-step-ahead probabilities.

. predict fedf*
(option yhat assumed; predicted values)
(4 missing values generated)
. tsline fedf1 fedf2, legend(label(1 "State 1 predictions")
>     label(2 "State 2 predictions"))
(figure: State 1 predictions and State 2 predictions over date (quarters), 1950q1–2010q1)

The graph shows that, as expected, the predicted values of fedfunds are higher in state 2, the high-interest rate state, than in state 1, the moderate-interest rate state.
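Because the single-variable yhat prediction is the probability-weighted average of the state-specific predictions, the relationship can be checked directly (a minimal sketch; prob1, prob2, and fedf_check are illustrative names, and predict with the pr option and a stub is assumed to create one one-step-ahead probability variable per state):

. predict prob*, pr
. generate double fedf_check = prob1*fedf1 + prob2*fedf2
. summarize fedf fedf_check

Up to rounding, fedf and fedf_check should agree.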
Dynamic predictions

Dynamic predictions are out-of-sample forecasted values of the dependent variable using one-step-ahead probabilities.

Example 3: Dynamic predictions for Markov-switching autoregression

In example 6 of [TS] mswitch, we estimated the parameters of a Markov-switching autoregression for the U.S. real gross national product as a function of its own lags.

. use http://www.stata-press.com/data/r14/rgnp, clear
(Data from Hamilton (1989))
. mswitch ar rgnp, ar(1/4)
(output omitted)

To obtain dynamic predictions, we use predict with the dynamic() option. The dynamic() option requires that all exogenous variables be present for the whole predicted sample. In this example, we have not specified any exogenous variables, so we do not check for that. However, we do need to have time values available for the predictions. So before submitting our predict command, we use tsappend to extend the dataset by eight periods. Within dynamic(), we specify that dynamic predictions will begin in the first quarter of 1985, and we use the convenience function tq() to convert 1985q1 into a numeric date that Stata understands; see [FN] Date and time functions.

. tsappend, add(8)
. predict rgnp_f, dynamic(tq(1985q1))
(option yhat assumed; predicted values)
(13 missing values generated)

We again use tsline to plot the in- and out-of-sample predictions. We restrict the range to quarters 1982q1 to 1986q4 using function tin().

. tsline rgnp_f if tin(1982q1,1986q4), ytitle("Out-of-sample predictions")
>     tline(1985q1)
(figure: Out-of-sample predictions over date (quarters), 1982q1–1987q1, with a vertical line at 1985q1)

The vertical line shows where our out-of-sample prediction begins.

Model fit and state predictions

Example 4: Assessing model fit

In this example, we examine the model fit by comparing the fitted values of U.S. real gross national product and the residuals with the actual data. The fitted values are obtained using smoothed probabilities that consider all sample information.

. predict yhat, smethod(smooth)
(option yhat assumed; predicted values)
(13 missing values generated)
. predict res, residuals smethod(smooth)
(13 missing values generated)
. tsline yhat res rgnp, legend(label(1 "Fitted values") label(2 "Residuals"))
(figure: Fitted values, Residuals, and US real gross national product over date (quarters), 1950q1–1985q1)

We see in the graph above that we do not obtain a good fit; the residuals account for much of the variation in the dependent variable.

Example 5: Filtered probabilities

Continuing example 4, recall that the states in the model correspond to recession periods and expansion periods for the U.S. economy. State 1 was the recession state. Here we compare the predicted probability of being in state 1 with the National Bureau of Economic Research recession periods stored in the indicator variable recession. To obtain the filtered probabilities, typically used to predict state probabilities, we specify options pr and smethod(filter) with predict.

. predict fprob, pr smethod(filter)
(13 missing values generated)
. tsline fprob recession
(figure: filter probabilities and NBER recession indicator over date (quarters), 1950q1–1985q1)

The predictions of recession and expansion states fit well with the NBER dates. Thus, it appears that while our model does not have good fit, it does a good job of predicting the probability of being in a given state. We could also have specified smethod(smooth) to obtain better estimates of the state probability using all sample information.
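To see how much that additional information matters, the smoothed probabilities can be computed and plotted next to the filtered ones (a minimal sketch; sprob is an illustrative name):

. predict sprob, pr smethod(smooth)
. tsline fprob sprob recession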
Example 6: Expected duration

Rather than predicting which state the series is in at a point in time, we may wish to know the average time it spends in a given state. We can compute the expected duration of the process being in a given state and show the result in a table using estat duration. Continuing example 5, we can calculate the average length of recession periods and expansion periods for the U.S. economy.

. estat duration

Number of obs = 131

Expected Duration   Estimate   Std. Err.   [95% Conf. Interval]
State1              4.076159   1.603668    2.107284    9.545915
State2              10.42587   4.101872    5.017004    23.11771

The table indicates that state 1, the recession state, will typically persist for about 4 quarters and state 2, the expansion state, will persist for about 10 quarters.

Stored results
estat transition stores the following in r():
Scalars
    r(level)     confidence level of confidence intervals
Macros
    r(label#)    label for transition probability
Matrices
    r(prob)      vector of transition probabilities
    r(se)        vector of standard errors of transition probabilities
    r(ci#)       vector of confidence interval (lower and upper) for transition probability

estat duration stores the following in r():
Scalars
    r(d#)        expected duration for state #
    r(se#)       standard error of expected duration for state #
    r(level)     confidence level of confidence intervals
Macros
    r(label#)    label for state #
Matrices
    r(ci#)       vector of confidence interval (lower and upper) for expected duration for state #

Methods and formulas
Forecasting a Markov-switching model requires estimating the probability of the process being at any given state at each time period. The forecasts are then computed by weighting the state-specific forecasts by the estimated probabilities. Refer to Hamilton (1993) and Davidson (2004) for more details on forecasting Markov-switching regression models.
By default and with the smethod(filter) option, predict estimates the probability of being at a state in each period by applying a nonlinear filter on all previous periods and the current period. (See Methods and formulas of [TS] mswitch for the filter equations.) With the smethod(smooth) option, predict estimates the probabilities by applying a smoothing algorithm (Kim 1994) using all the sample information. With the smethod(onestep) option, predict estimates the probabilities using information from all previous periods to make one-step-ahead predictions.
The dependent variable is predicted by averaging the state-specific forecasts by the estimated probabilities. The residuals are computed as the difference between the predicted and the realized dependent variable. The standardized residuals are the residuals normalized to have unit variances.
Using an if or in qualifier to alter the prediction sample can change the estimate of the unobserved states in the period prior to beginning the dynamic predictions and hence alter the dynamic predictions. The initial values for the probabilities are obtained by calculating the ergodic probabilities from the transition matrix.
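For a transition matrix P whose rows sum to one, the ergodic probabilities solve pi'P = pi' with the elements of pi summing to one; one standard closed form is pi = (I - P' + J)^(-1)*1, where J is a matrix of ones and 1 is a vector of ones. A minimal Mata sketch, using an illustrative two-state transition matrix rather than estimates from a fitted model:

. mata:
: P   = (0.9, 0.1 \ 0.25, 0.75)                  // illustrative transition matrix
: erg = luinv(I(2) - P' + J(2,2,1)) * J(2,1,1)   // ergodic probabilities
: erg
: end

For this P, the sketch returns approximately (0.714, 0.286), which can be verified against the defining equations by hand.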
References
Davidson, J. 2004. Forecasting Markov-switching dynamic, conditionally heteroscedastic processes. Statistics and Probability Letters 68: 137–147.
Hamilton, J. D. 1993. Estimation, inference and forecasting of time series subject to changes in regime. In Handbook of Statistics 11: Econometrics, ed. G. S. Maddala, C. R. Rao, and H. D. Vinod, 231–260. San Diego, CA: Elsevier.
Kim, C.-J. 1994. Dynamic linear models with Markov-switching. Journal of Econometrics 60: 1–22.

Also see
[TS] mswitch — Markov-switching regression models
[U] 20 Estimation and postestimation commands

Title
newey — Regression with Newey–West standard errors
Description   Quick start   Menu   Syntax   Options   Remarks and examples   Stored results   Methods and formulas   References   Also see

Description
newey produces Newey–West standard errors for coefficients estimated by OLS regression. The error structure is assumed to be heteroskedastic and possibly autocorrelated up to some lag.

Quick start
OLS regression of y on x1 and x2 with Newey–West standard errors robust to heteroskedasticity and first-order autocorrelation using tsset data
    newey y x1 x2, lag(1)
With heteroskedasticity-robust standard errors
    newey y x1 x2, lag(0)

Menu
Statistics > Time series > Regression with Newey-West std. errors

Syntax
newey depvar [indepvars] [if] [in] [weight], lag(#) [options]

options            Description
Model
* lag(#)           set maximum lag order of autocorrelation
noconstant         suppress constant term
Reporting
level(#)           set confidence level; default is level(95)
display options    control columns and column formats, row spacing, line width, display of omitted variables and base and empty cells, and factor-variable labeling
coeflegend         display legend instead of statistics

* lag(#) is required.
You must tsset your data before using newey; see [TS] tsset.
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
depvar and indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
aweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.

Options

Model
lag(#) specifies the maximum lag to be considered in the autocorrelation structure. If you specify lag(0), the output is the same as regress, vce(robust). lag() is required.
noconstant; see [R] estimation options.

Reporting
level(#); see [R] estimation options.
display options: noci, nopvalues, noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.

The following option is available with newey but is not shown in the dialog box:
coeflegend; see [R] estimation options.

Remarks and examples
The Huber/White/sandwich robust variance estimator (see White [1980]) produces consistent standard errors for OLS regression coefficient estimates in the presence of heteroskedasticity. The Newey–West (1987) variance estimator is an extension that produces consistent estimates when there is autocorrelation in addition to possible heteroskedasticity. The Newey–West variance estimator handles autocorrelation up to and including a lag of m, where m is specified in the lag() option. Thus, it assumes that any autocorrelation at lags greater than m can be ignored.
If lag(0) is specified, the variance estimates produced by newey are simply the Huber/White/sandwich robust variances estimates calculated by regress, vce(robust); see [R] regress.

Example 1
newey, lag(0) is equivalent to regress, vce(robust):

. use http://www.stata-press.com/data/r14/auto
(1978 Automobile Data)
. regress price weight displ, vce(robust)

Linear regression                               Number of obs =        74
                                                F(2, 71)      =     14.44
                                                Prob > F      =    0.0000
                                                R-squared     =    0.2909
                                                Root MSE      =    2518.4

                             Robust
price          Coef.       Std. Err.      t     P>|t|     [95% Conf. Interval]
weight         1.823366    .7808755     2.34    0.022      .2663445    3.380387
displacement   2.087054    7.436967     0.28    0.780     -12.74184    16.91595
_cons          247.907     1129.602     0.22    0.827     -2004.455    2500.269

. generate t = _n
. tsset t
        time variable:  t, 1 to 74
                delta:  1 unit
. newey price weight displ, lag(0)

Regression with Newey-West standard errors      Number of obs =        74
maximum lag: 0                                  F(  2,    71) =     14.44
                                                Prob > F      =    0.0000

                           Newey-West
price          Coef.       Std. Err.      t     P>|t|     [95% Conf. Interval]
weight         1.823366    .7808755     2.34    0.022      .2663445    3.380387
displacement   2.087054    7.436967     0.28    0.780     -12.74184    16.91595
_cons          247.907     1129.602     0.22    0.827     -2004.455    2500.269

Because newey requires the dataset to be tsset, we generated a dummy time variable t, which in this example played no role in the estimation.

Example 2
Say that we have time-series measurements on variables usr and idle and now wish to fit an OLS model but obtain Newey–West standard errors allowing for a lag of up to 3:

. use http://www.stata-press.com/data/r14/idle2, clear
. tsset time
        time variable:  time, 1 to 30
                delta:  1 unit
. newey usr idle, lag(3)

Regression with Newey-West standard errors      Number of obs =        30
maximum lag: 3                                  F(  1,    28) =     10.90
                                                Prob > F      =    0.0026

                        Newey-West
usr        Coef.       Std. Err.      t     P>|t|     [95% Conf. Interval]
idle       -.2281501   .0690927    -3.30    0.003     -.3696801      -.08662
_cons      23.13483    6.327031     3.66    0.001      10.17449     36.09516

Stored results
newey stores the following in e():
Scalars
    e(N)        number of observations
    e(df_m)     model degrees of freedom
    e(df_r)     residual degrees of freedom
    e(F)        F statistic
    e(lag)      maximum lag
    e(rank)     rank of e(V)
Macros
    e(cmd)            newey
    e(cmdline)        command as typed
    e(depvar)         name of dependent variable
    e(wtype)          weight type
    e(wexp)           weight expression
    e(title)          title in estimation output
    e(vcetype)        title used to label Std. Err.
    e(properties)     b V
    e(estat_cmd)      program used to implement estat
    e(predict)        program used to implement predict
    e(asbalanced)     factor variables fvset as asbalanced
    e(asobserved)     factor variables fvset as asobserved
Matrices
    e(b)       coefficient vector
    e(Cns)     constraints matrix
    e(V)       variance–covariance matrix of the estimators
Functions
    e(sample)    marks estimation sample

Methods and formulas
newey calculates the estimates

$$ \hat\beta_{\text{OLS}} = (X'X)^{-1}X'y $$

$$ \widehat{\text{Var}}(\hat\beta_{\text{OLS}}) = (X'X)^{-1}X'\hat\Omega X(X'X)^{-1} $$

That is, the coefficient estimates are simply those of OLS linear regression. For lag(0) (no autocorrelation), the variance estimates are calculated using the White formulation:

$$ X'\hat\Omega X = X'\hat\Omega_0 X = \frac{n}{n-k}\sum_i \hat e_i^2\, x_i' x_i $$

Here $\hat e_i = y_i - x_i\hat\beta_{\text{OLS}}$, where $x_i$ is the $i$th row of the $X$ matrix, $n$ is the number of observations, and $k$ is the number of predictors in the model, including the constant if there is one. The above formula is the same as that used by regress, vce(robust) with the regression-like formula (the default) for the multiplier $q_c$; see Methods and formulas of [R] regress.
For lag(m), m > 0, the variance estimates are calculated using the Newey–West (1987) formulation

$$ X'\hat\Omega X = X'\hat\Omega_0 X + \frac{n}{n-k}\sum_{l=1}^{m}\left(1-\frac{l}{m+1}\right)\sum_{t=l+1}^{n}\hat e_t \hat e_{t-l}\left(x_t' x_{t-l} + x_{t-l}' x_t\right) $$

where $x_t$ is the row of the $X$ matrix observed at time $t$.
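newey leaves the choice of m to you. One common rule of thumb, the one pperron uses by default for its Newey–West lags (see [TS] pperron later in this manual), is int{4(T/100)^(2/9)}; it can be computed and passed to lag() (a sketch, not an official recommendation):

. use http://www.stata-press.com/data/r14/idle2, clear
. tsset time
. quietly count if !missing(usr, idle)
. local m = int(4*(r(N)/100)^(2/9))
. newey usr idle, lag(`m')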
Whitney K. Newey (1954– ) earned degrees in economics at Brigham Young University and MIT. After a period at Princeton, he returned to MIT as a professor in 1990. His interests in theoretical and applied econometrics include bootstrapping, nonparametric estimation of models, semiparametric models, and choosing the number of instrumental variables.

Kenneth D. West (1953– ) earned a bachelor's degree in economics and mathematics at Wesleyan University and then a PhD in economics at MIT. After a period at Princeton, he joined the University of Wisconsin in 1988. His interests include empirical macroeconomics and time-series econometrics.

References
Hardin, J. W. 1997. sg72: Newey–West standard errors for probit, logit, and poisson models. Stata Technical Bulletin 39: 32–35. Reprinted in Stata Technical Bulletin Reprints, vol. 7, pp. 182–186. College Station, TX: Stata Press.
Newey, W. K., and K. D. West. 1987. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55: 703–708.
Wang, Q., and N. Wu. 2012. Long-run covariance and its applications in cointegration regression. Stata Journal 12: 515–542.
White, H. L., Jr. 1980. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48: 817–838.

Also see
[TS] newey postestimation — Postestimation tools for newey
[TS] arima — ARIMA, ARMAX, and other dynamic regression models
[TS] forecast — Econometric model forecasting
[TS] tsset — Declare data to be time-series data
[R] regress — Linear regression
[U] 20 Estimation and postestimation commands

Title
newey postestimation — Postestimation tools for newey
Postestimation commands   predict   margins   Remarks and examples   Also see

Postestimation commands
The following postestimation commands are available after newey:

Command           Description
contrast          contrasts and ANOVA-style joint tests of estimates
estat summarize   summary statistics for the estimation sample
estat vce         variance–covariance matrix of the estimators (VCE)
estimates         cataloging estimation results
forecast          dynamic forecasts and simulations
lincom            point estimates, standard errors, testing, and inference for linear combinations of coefficients
linktest          link test for model specification
margins           marginal means, predictive margins, marginal effects, and average marginal effects
marginsplot       graph the results from margins (profile plots, interaction plots, etc.)
nlcom             point estimates, standard errors, testing, and inference for nonlinear combinations of coefficients
predict           predictions, residuals, influence statistics, and other diagnostic measures
predictnl         point estimates, standard errors, testing, and inference for generalized predictions
pwcompare         pairwise comparisons of estimates
test              Wald tests of simple and composite linear hypotheses
testnl            Wald tests of nonlinear hypotheses

predict

Description for predict
predict creates a new variable containing predictions such as linear predictions and residuals.

Menu for predict
Statistics > Postestimation

Syntax for predict
predict [type] newvar [if] [in] [, statistic]

statistic   Description
Main
xb          linear prediction; the default
stdp        standard error of the linear prediction
residuals   residuals

These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for the estimation sample.

Options for predict
Main
xb, the default, calculates the linear prediction.
stdp calculates the standard error of the linear prediction.
residuals calculates the residuals.

margins

Description for margins
margins estimates margins of response for linear predictions.

Menu for margins
Statistics > Postestimation

Syntax for margins
margins [marginlist] [, options]
margins [marginlist], predict(statistic ...) [options]

statistic   Description
xb          linear prediction; the default
stdp        not allowed with margins
residuals   not allowed with margins

Statistics not allowed with margins are functions of stochastic quantities other than e(b). For the full syntax, see [R] margins.

Remarks and examples

Example 1
We use the test command after newey to illustrate the importance of accounting for the presence of serial correlation in the error term. The dataset contains daily stock returns of three car manufacturers from January 2, 2003, to December 31, 2010, in the variables toyota, nissan, and honda.
We fit a model for the Nissan stock returns on the Honda and Toyota stock returns, and we use estat bgodfrey to test for serial correlation of order one:

. use http://www.stata-press.com/data/r14/stocks
(Data from Yahoo! Finance)
. regress nissan honda toyota
(output omitted)
. estat bgodfrey

Breusch-Godfrey LM test for autocorrelation
    lags(p)     chi2      df     Prob > chi2
    1           6.415     1      0.0113
H0: no serial correlation

The result implies that the error term is serially correlated; therefore, we should rather fit the model with newey. But let's use the outcome from regress to conduct a test for the statistical significance of a particular linear combination of the two coefficients in the regression:

. test 1.15*honda+toyota = 1
 ( 1)  1.15*honda + toyota = 1
       F(  1,  2012) =    5.52
            Prob > F =    0.0189

We reject the null hypothesis that the linear combination is valid. Let's see if the conclusion remains the same when we fit the model with newey, obtaining the Newey–West standard errors for the OLS coefficient estimates.

. newey nissan honda toyota, lag(1)
(output omitted)
. test 1.15*honda+toyota = 1
 ( 1)  1.15*honda + toyota = 1
       F(  1,  2012) =    2.57
            Prob > F =    0.1088

The conclusion would be the opposite, which illustrates the importance of using the proper estimator for the standard errors.
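The same linear combination can also be examined with lincom, which reports a point estimate and Newey–West standard error for it rather than only a Wald test (a minimal sketch continuing the setup above; the reported confidence interval can be compared against 1):

. newey nissan honda toyota, lag(1)
. lincom 1.15*honda + toyota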
Example 2
We want to produce forecasts based on dynamic regressions for each of the three stocks. We will treat the stock returns for toyota as a leading indicator for the two other stocks. We also check for autocorrelation with the Breusch–Godfrey test.

. use http://www.stata-press.com/data/r14/stocks
(Data from Yahoo! Finance)
. regress toyota l(1/2).toyota
(output omitted)
. estat bgodfrey

Breusch-Godfrey LM test for autocorrelation
    lags(p)     chi2      df     Prob > chi2
    1           4.373     1      0.0365
H0: no serial correlation

. regress nissan l(1/2).nissan l.toyota
(output omitted)
. estat bgodfrey

Breusch-Godfrey LM test for autocorrelation
    lags(p)     chi2      df     Prob > chi2
    1           0.099     1      0.7536
H0: no serial correlation

. regress honda l(1/2).honda l.toyota
(output omitted)
. estat bgodfrey

Breusch-Godfrey LM test for autocorrelation
    lags(p)     chi2      df     Prob > chi2
    1           0.923     1      0.3367
H0: no serial correlation

The first result indicates that we should consider using newey to fit the model for toyota. The point forecasts would not actually be affected because newey produces the same OLS coefficient estimates reported by regress. However, if we were interested in obtaining measures of uncertainty surrounding the point forecasts, we should then use the results from newey for that first equation.
Let's illustrate the use of forecast with newey for the first equation and regress for the two other equations. We first declare the forecast model:

. forecast create stocksmodel
Forecast model stocksmodel started.

Then we refit the equations and add them to the forecast model:

. quietly newey toyota l(1/2).toyota, lag(1)
. estimates store eq_toyota
. forecast estimates eq_toyota
Added estimation results from newey.
Forecast model stocksmodel now contains 1 endogenous variable.
. quietly regress nissan l(1/2).nissan l.toyota
. estimates store eq_nissan
. forecast estimates eq_nissan
Added estimation results from regress.
Forecast model stocksmodel now contains 2 endogenous variables.
. quietly regress honda l(1/2).honda l.toyota
. estimates store eq_honda
. forecast estimates eq_honda
Added estimation results from regress.
Forecast model stocksmodel now contains 3 endogenous variables.

We use tsappend to add the number of periods for the forecast, and then we obtain the predicted values with forecast solve:

. tsappend, add(7)
. forecast solve, prefix(stk_)
Computing dynamic forecasts for model stocksmodel.
Starting period: 2016
Ending period:   2022
Forecast prefix: stk_
2016: ............
2017: ...........
2018: ...........
2019: ..........
2020: .........
2021: ........
2022: ........
Forecast 3 variables spanning 7 periods.

The graph below shows several interesting results. First, the stock returns of the competitor (toyota) do not seem to be a leading indicator for the stock returns of the two other companies (otherwise, the patterns for the movements in nissan and honda would be following the recent past movements in toyota). You can actually fit the models above for nissan and honda to confirm that the coefficient estimate for the first lag of toyota is not significant in either of the two equations. Second, immediately after the second forecasted period, there is basically no variation in the predictions, which indicates the very short-run predicting influence of past history on the forecasts of the three stock returns.

(figure: Current and forecasted stock returns for Toyota, Nissan, and Honda; the dynamic forecast starts at 01Jan2011)

Also see
[TS] newey — Regression with Newey–West standard errors
[U] 20 Estimation and postestimation commands

Title
pergram — Periodogram
Description   Quick start   Menu   Syntax   Options   Remarks and examples   Methods and formulas   References   Also see

Description
pergram plots the log-standardized periodogram for a dense time series.

Quick start
Plot periodogram of y using tsset data
    pergram y
As above, but generate variable newvar to hold raw periodogram values
    pergram y, generate(newvar)
As above, but suppress display of periodogram graph
    pergram y, generate(newvar) nograph

Menu
Statistics > Time series > Graphs > Periodogram

Syntax
pergram varname [if] [in] [, options]

options                 Description
Main
generate(newvar)        generate newvar to contain the raw periodogram values
Plot
cline options           affect rendition of the plotted points connected by lines
marker options          change look of markers (color, size, etc.)
marker label options    add marker labels; change look or position
Add plots
addplot(plot)           add other plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
twoway options          any options other than by() documented in [G-3] twoway options
nograph                 suppress the graph

You must tsset your data before using pergram; see [TS] tsset. Also, the time series must be dense (nonmissing with no gaps in the time variable) in the specified sample.
varname may contain time-series operators; see [U] 11.4.4 Time-series varlists.
nograph does not appear in the dialog box.

Options

Main
generate(newvar) specifies a new variable to contain the raw periodogram values. The generated graph log-transforms and scales the values by the sample variance and then truncates them to the [−6, 6] interval before graphing them.

Plot
cline options affect the rendition of the plotted points connected by lines; see [G-3] cline options.
marker options specify the look of markers. This look includes the marker symbol, the marker size, and its color and outline; see [G-3] marker options.
marker label options specify if and how the markers are to be labeled; see [G-3] marker label options.

Add plots
addplot(plot) adds specified plots to the generated graph; see [G-3] addplot option.

Y axis, X axis, Titles, Legend, Overall
twoway options are any of the options documented in [G-3] twoway options, excluding by(). These include options for titling the graph (see [G-3] title options) and for saving the graph to disk (see [G-3] saving option).

The following option is available with pergram but is not shown in the dialog box:
nograph prevents pergram from constructing a graph.

Remarks and examples
A good discussion of the periodogram is provided in Chatfield (2004), Hamilton (1994), and Newton (1988). Chatfield is also a good introductory reference for time-series analysis. Another classic reference is Box, Jenkins, and Reinsel (2008). pergram produces a scatterplot in which the points of the scatterplot are connected. The points themselves represent the log-standardized periodogram, and the connections between points represent the (continuous) log-standardized sample spectral density.
In the following examples, we present the periodograms with an interpretation of the main features of the plots.

Example 1
We have time-series data consisting of 144 observations on the monthly number of international airline passengers (in thousands) between 1949 and 1960 (Box, Jenkins, and Reinsel 2008, Series G). We can graph the raw series and the log periodogram for these data by typing

. use http://www.stata-press.com/data/r14/air2
(TIMESLAB: Airline passengers)
. scatter air time, m(o) c(l)
(figure: Airline Passengers (1949–1960), raw series plotted against time in months)

. pergram air
(figure: Airline Passengers (1949–1960), log periodogram; sample spectral density function evaluated at the natural frequencies, 0 to 0.50)

The periodogram highlights the annual cycle together with the harmonics. Notice the peak at a frequency of about 0.08 cycles per month (cpm). The period is the reciprocal of frequency, and the reciprocal of 0.08 cpm is approximately 12 months per cycle. The similarity in shape of each group of 12 observations reveals the annual cycle. The magnitude of the cycle is increasing, resulting in the peaks in the periodogram at the harmonics of the principal annual cycle.
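The dominant frequencies can also be pulled out numerically by saving the raw periodogram ordinates with generate() and nograph (a sketch in the spirit of example 4 below; ord and omega are illustrative names):

. pergram air, generate(ord) nograph
. generate double omega = (_n-1)/144
. gsort -ord
. list omega ord in 1/5
. sort time

The frequencies attached to the largest ordinates identify the cycles that dominate the series.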
Example 2
This example uses 215 observations on the annual number of sunspots from 1749 to 1963 (Box and Jenkins 1976, Series E). The graph of the raw series and the log periodogram for these data are given as

. use http://www.stata-press.com/data/r14/sunspot
(TIMESLAB: Wolfer sunspot data)
. scatter spot time, m(o) c(l)
(figure: Number of sunspots, raw series plotted against year, 1750–1950)

. pergram spot
(figure: Number of sunspots, log periodogram; sample spectral density function evaluated at the natural frequencies)

The periodogram peaks at a frequency of slightly less than 0.10 cycles per year, indicating a 10- to 12-year cycle in sunspot activity.

Example 3
Here we examine the number of trapped Canadian lynx from 1821 through 1934 (Newton 1988, 587). The raw series and the log periodogram are given as

. use http://www.stata-press.com/data/r14/lynx2
(TIMESLAB: Canadian lynx)
. scatter lynx time, m(o) c(l)
(figure: Number of lynx trapped, raw series plotted against time)

. pergram lynx
(figure: Number of lynx trapped, log periodogram; sample spectral density function evaluated at the natural frequencies)

The periodogram indicates that there is a cycle with a duration of about 10 years for these data but that it is otherwise random.

Example 4
To more clearly highlight what the periodogram depicts, we present the result of analyzing a time series of the sum of four sinusoids (of different periods). The periodogram should be able to decompose the time series into four different sinusoids whose periods may be determined from the plot.

. use http://www.stata-press.com/data/r14/cos4
(TIMESLAB: Sum of 4 Cosines)
. scatter sumfc time, m(o) c(l)
(figure: Sum of 4 cosines, raw series plotted against time)

. pergram sumfc, gen(ordinate)
(figure: Sum of 4 cosines, log periodogram; sample spectral density function evaluated at the natural frequencies)

The periodogram clearly shows the four contributions to the original time series. From the plot, we can see that the periods of the summands were 3, 6, 12, and 36, although you can confirm this by using

. generate double omega = (_n-1)/144
. generate double period = 1/omega
(1 missing value generated)
. list period omega if ordinate> 1e-5 & omega <=.5

          period       omega
  5.          36   .02777778
 13.          12   .08333333
 25.           6   .16666667
 49.           3   .33333333

Methods and formulas
We use the notation of Newton (1988) in the following discussion.
A time series of interest is decomposed into a unique set of sinusoids of various frequencies and amplitudes. A plot of the sinusoidal amplitudes (ordinates) versus the frequencies for the sinusoidal decomposition of a time series gives us the spectral density of the time series. If we calculate the sinusoidal amplitudes for a discrete set of "natural" frequencies (1/n, 2/n, ..., q/n), we obtain the periodogram.
Let x(1), ..., x(n) be a time series, and let $\omega_k = (k-1)/n$ denote the natural frequencies for $k = 1, \ldots, (n/2)+1$. Define

$$ C_k^2 = \frac{1}{n^2}\left|\sum_{t=1}^{n} x(t)\, e^{2\pi i (t-1)\omega_k}\right|^2 $$

A plot of $nC_k^2$ versus $\omega_k$ is then called the periodogram.
The sample spectral density is defined for a continuous frequency $\omega$ as

$$ \hat f(\omega) = \begin{cases} \dfrac{1}{n}\left|\displaystyle\sum_{t=1}^{n} x(t)\, e^{2\pi i (t-1)\omega}\right|^2 & \text{if } \omega \in [0, .5] \\[1ex] \hat f(1-\omega) & \text{if } \omega \in [.5, 1] \end{cases} $$

The periodogram (and sample spectral density) is symmetric about $\omega = 0.5$. Further standardize the periodogram such that

$$ \frac{1}{n}\sum_{k=2}^{n} \frac{nC_k^2}{\hat\sigma^2} = 1 $$

where $\hat\sigma^2$ is the sample variance of the time series so that the average value of the ordinate is one.
Once the amplitudes are standardized, we may then take the natural log of the values and produce the log periodogram. In doing so, we truncate the graph at ±6. We drop the word "log" and simply refer to the "log periodogram" as the "periodogram" in text.

References
Box, G. E. P., and G. M. Jenkins. 1976. Time Series Analysis: Forecasting and Control. Oakland, CA: Holden–Day.
Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. 2008. Time Series Analysis: Forecasting and Control. 4th ed. Hoboken, NJ: Wiley.
Chatfield, C. 2004. The Analysis of Time Series: An Introduction. 6th ed. Boca Raton, FL: Chapman & Hall/CRC.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Newton, H. J. 1988. TIMESLAB: A Time Series Analysis Laboratory. Belmont, CA: Wadsworth.

Also see
[TS] tsset — Declare data to be time-series data
[TS] corrgram — Tabulate and graph autocorrelations
[TS] cumsp — Cumulative spectral distribution
[TS] wntestb — Bartlett's periodogram-based test for white noise

Title
pperron — Phillips–Perron unit-root test
Description   Quick start   Menu   Syntax   Options   Remarks and examples   Stored results   Methods and formulas   References   Also see

Description
pperron performs the Phillips–Perron (1988) test that a variable has a unit root. The null hypothesis is that the variable contains a unit root, and the alternative is that the variable was generated by a stationary process. pperron uses Newey–West (1987) standard errors to account for serial correlation, whereas the augmented Dickey–Fuller test implemented in dfuller (see [TS] dfuller) uses additional lags of the first-differenced variable.

Quick start
Phillips–Perron unit-root test for y using tsset data
    pperron y
As above, and include a trend in the specification
    pperron y, trend
As above, but use 10 lags when calculating Newey–West standard errors
    pperron y, trend lags(10)
As above, but without a trend or constant in the specification
    pperron y, lags(10) noconstant

Menu
Statistics > Time series > Tests > Phillips-Perron unit-root test

Syntax
pperron varname [if] [in] [, options]

options       Description
Main
noconstant    suppress constant term
trend         include trend term in regression
regress       display regression table
lags(#)       use # Newey–West lags

You must tsset your data before using pperron; see [TS] tsset.
varname may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Options

Main
noconstant suppresses the constant term (intercept) in the model.
trend specifies that a trend term be included in the associated regression. This option may not be specified if noconstant is specified.
regress specifies that the associated regression table appear in the output. By default, the regression table is not produced.
lags(#) specifies the number of Newey–West lags to use in calculating the standard error. The default is to use $\text{int}\{4(T/100)^{2/9}\}$ lags.
Remarks and examples
As noted in [TS] dfuller, the Dickey–Fuller test involves fitting the regression model

$$ y_t = \alpha + \rho y_{t-1} + \delta t + u_t \tag{1} $$

by ordinary least squares (OLS), but serial correlation will present a problem. To account for this, the augmented Dickey–Fuller test's regression includes lags of the first differences of $y_t$.
The Phillips–Perron test involves fitting (1), and the results are used to calculate the test statistics. Phillips and Perron (1988) proposed two alternative statistics, which pperron presents. Phillips and Perron's test statistics can be viewed as Dickey–Fuller statistics that have been made robust to serial correlation by using the Newey–West (1987) heteroskedasticity- and autocorrelation-consistent covariance matrix estimator.
Hamilton (1994, chap. 17) and [TS] dfuller discuss four different cases into which unit-root tests can be classified. The Phillips–Perron test applies to cases one, two, and four but not to case three. Cases one and two assume that the variable has a unit root without drift under the null hypothesis, the only difference being whether the constant term α is included in regression (1). Case four assumes that the variable has a random walk, with or without drift, under the null hypothesis. Case three, which assumes that the variable has a random walk with drift under the null hypothesis, is just a special case of case four, so the fact that the Phillips–Perron test does not apply is not restrictive. The table below summarizes the relevant cases:

Case   Process under null hypothesis        Regression restrictions   dfuller option
1      Random walk without drift            α = 0, δ = 0              noconstant
2      Random walk without drift            δ = 0                     (default)
4      Random walk with or without drift    (none)                    trend

The critical values for the Phillips–Perron test are the same as those for the augmented Dickey–Fuller test. See Hamilton (1994, chap. 17) for more information.

Example 1
Here we use the international airline passengers dataset (Box, Jenkins, and Reinsel 2008, Series G). This dataset has 144 observations on the monthly number of international airline passengers from 1949 through 1960. Because the data exhibit a clear upward trend over time, we will use the trend option.

. use http://www.stata-press.com/data/r14/air2
(TIMESLAB: Airline passengers)
. pperron air, lags(4) trend regress

Phillips-Perron test for unit root        Number of obs   =       143
                                          Newey-West lags =         4

                          Interpolated Dickey-Fuller
          Test        1% Critical   5% Critical   10% Critical
          Statistic   Value         Value         Value
Z(rho)    -46.405     -27.687       -20.872       -17.643
Z(t)      -5.049      -4.026        -3.444        -3.144

MacKinnon approximate p-value for Z(t) = 0.0002

air        Coef.       Std. Err.      t     P>|t|     [95% Conf. Interval]
air
  L1.      .7318116    .0578092    12.66    0.000      .6175196     .8461035
_trend     .7107559    .1670563     4.25    0.000      .3804767     1.041035
_cons      25.95168    7.325951     3.54    0.001      11.46788     40.43547

Just as in the example in [TS] dfuller, we reject the null hypothesis of a unit root at all common significance levels. The interpolated critical values for Z(t) differ slightly from those shown in the example in [TS] dfuller because the sample sizes are different: with the augmented Dickey–Fuller regression we lose observations because of the inclusion of lagged difference terms as regressors.

Stored results
pperron stores the following in r():
Scalars
    r(N)        number of observations
    r(lags)     number of lagged differences used
    r(pval)     MacKinnon approximate p-value (not included if noconstant specified)
    r(Zt)       Phillips–Perron τ test statistic
    r(Zrho)     Phillips–Perron ρ test statistic

Methods and formulas
In the OLS estimation of an AR(1) process with Gaussian errors,

$$ y_i = \rho y_{i-1} + \epsilon_i $$

where $\epsilon_i$ are independent and identically distributed as $N(0,\sigma^2)$ and $y_0 = 0$, the OLS estimate (based on an $n$-observation time series) of the autocorrelation parameter $\rho$ is given by

$$ \hat\rho_n = \frac{\sum_{i=1}^{n} y_{i-1}y_i}{\sum_{i=1}^{n} y_i^2} $$

If $|\rho| < 1$, then $\sqrt{n}(\hat\rho_n - \rho) \to N(0, 1-\rho^2)$. If this result were valid for when $\rho = 1$, then the resulting distribution would have a variance of zero. When $\rho = 1$, the OLS estimate $\hat\rho$ still converges to one, though we need to find a nondegenerate distribution so that we can test $H_0\colon \rho = 1$. See Hamilton (1994, chap. 17).
The Phillips–Perron test involves fitting the regression

$$ y_i = \alpha + \rho y_{i-1} + \epsilon_i $$

where we may exclude the constant or include a trend term. There are two statistics, $Z_\rho$ and $Z_\tau$, calculated as

$$ Z_\rho = n(\hat\rho_n - 1) - \frac{1}{2}\,\frac{n^2\hat\sigma^2}{s_n^2}\left(\hat\lambda_n^2 - \hat\gamma_{0,n}\right) $$

$$ Z_\tau = \sqrt{\frac{\hat\gamma_{0,n}}{\hat\lambda_n^2}}\;\frac{\hat\rho_n - 1}{\hat\sigma} - \frac{1}{2}\left(\hat\lambda_n^2 - \hat\gamma_{0,n}\right)\frac{1}{\hat\lambda_n}\,\frac{n\hat\sigma}{s_n} $$

$$ \hat\gamma_{j,n} = \frac{1}{n}\sum_{i=j+1}^{n}\hat u_i\hat u_{i-j} \qquad
\hat\lambda_n^2 = \hat\gamma_{0,n} + 2\sum_{j=1}^{q}\left(1 - \frac{j}{q+1}\right)\hat\gamma_{j,n} \qquad
s_n^2 = \frac{1}{n-k}\sum_{i=1}^{n}\hat u_i^2 $$

where $u_i$ is the OLS residual, $k$ is the number of covariates in the regression, $q$ is the number of Newey–West lags to use in calculating $\hat\lambda_n^2$, and $\hat\sigma$ is the OLS standard error of $\hat\rho$.
The critical values, which have the same distribution as the Dickey–Fuller statistic (see Dickey and Fuller 1979) included in the output, are linearly interpolated from the table of values that appear in Fuller (1996), and the MacKinnon approximate p-values use the regression surface published in MacKinnon (1994).
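In practice, the Phillips–Perron and augmented Dickey–Fuller tests are often run side by side because they handle serial correlation differently; note that lags() means Newey–West lags for pperron but lagged first differences for dfuller (a minimal sketch reusing the data from example 1):

. use http://www.stata-press.com/data/r14/air2, clear
. pperron air, lags(4) trend
. dfuller air, lags(4) trend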
Peter Charles Bonest Phillips (1948– ) was born in Weymouth, England, and earned degrees in economics at the University of Auckland in New Zealand, and the London School of Economics. After periods at the Universities of Essex and Birmingham, Phillips moved to Yale in 1979. He also holds appointments at the University of Auckland and the University of York. His main research interests are in econometric theory, financial econometrics, time-series and panel-data econometrics, and applied macroeconomics.

Pierre Perron (1959– ) was born in Québec, Canada, and earned degrees at McGill, Queen's, and Yale in economics. After posts at Princeton and the Université de Montréal, he joined Boston University in 1997. His research interests include time-series analysis, econometrics, and applied macroeconomics.

References
Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. 2008. Time Series Analysis: Forecasting and Control. 4th ed. Hoboken, NJ: Wiley.
Dickey, D. A., and W. A. Fuller. 1979. Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association 74: 427–431.
Fuller, W. A. 1996. Introduction to Statistical Time Series. 2nd ed. New York: Wiley.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
MacKinnon, J. G. 1994. Approximate asymptotic distribution functions for unit-root and cointegration tests. Journal of Business and Economic Statistics 12: 167–176.
Newey, W. K., and K. D. West. 1987. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55: 703–708.
Phillips, P. C. B., and P. Perron. 1988. Testing for a unit root in time series regression. Biometrika 75: 335–346.

Also see
[TS] tsset — Declare data to be time-series data
[TS] dfgls — DF-GLS unit-root test
[TS] dfuller — Augmented Dickey–Fuller unit-root test
[XT] xtunitroot — Panel-data unit-root tests

Title
prais — Prais–Winsten and Cochrane–Orcutt regression
Description   Quick start   Menu   Syntax   Options   Remarks and examples   Stored results   Methods and formulas   Acknowledgment   References   Also see

Description
prais uses the generalized least-squares method to estimate the parameters in a linear regression model in which the errors are serially correlated. Specifically, the errors are assumed to follow a first-order autoregressive process.

Quick start
Prais–Winsten regression of y on x estimating the autocorrelation parameter by a single-lag OLS regression of residuals using tsset data
    prais y x
As above, but estimate the autocorrelation parameter using a single-lead OLS of residuals
    prais y x, rhotype(freg)
As above, but estimate the autocorrelation parameter using autocorrelation of residuals
    prais y x, rhotype(tscorr)
Cochrane–Orcutt regression of y on x with first-order serial correlation
    prais y x, corc

Menu
Statistics > Time series > Prais-Winsten regression

Syntax
prais depvar [indepvars] [if] [in] [, options]

options             Description
Model
rhotype(regress)    base ρ on single-lag OLS of residuals; the default
rhotype(freg)       base ρ on single-lead OLS of residuals
rhotype(tscorr)     base ρ on autocorrelation of residuals
rhotype(dw)         base ρ on autocorrelation based on Durbin–Watson
rhotype(theil)      base ρ on adjusted autocorrelation
rhotype(nagar)      base ρ on adjusted Durbin–Watson
corc                use Cochrane–Orcutt transformation
ssesearch           search for ρ that minimizes SSE
twostep             stop after the first iteration
noconstant          suppress constant term
hascons             has user-defined constant
savespace           conserve memory during estimation
SE/Robust
vce(vcetype)        vcetype may be ols, robust, cluster clustvar, hc2, or hc3
Reporting
level(#)            set confidence level; default is level(95)
nodw                do not report the Durbin–Watson statistic
display options     control columns and column formats, row spacing, line width, display of omitted variables and base and empty cells, and factor-variable labeling
Optimization
optimize options    control the optimization process; seldom used
coeflegend          display legend instead of statistics

You must tsset your data before using prais; see [TS] tsset.
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
depvar and indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, fp, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Options

Model
rhotype(rhomethod) selects a specific computation for the autocorrelation parameter ρ, where rhomethod can be

    regress    $\rho_{\rm reg} = \beta$ from the residual regression $\epsilon_t = \beta\epsilon_{t-1}$
    freg       $\rho_{\rm freg} = \beta$ from the residual regression $\epsilon_t = \beta\epsilon_{t+1}$
    tscorr     $\rho_{\rm tscorr} = \epsilon'\epsilon_{t-1}/\epsilon'\epsilon$, where $\epsilon$ is the vector of residuals
    dw         $\rho_{\rm dw} = 1 - {\rm dw}/2$, where dw is the Durbin–Watson d statistic
    theil      $\rho_{\rm theil} = \rho_{\rm tscorr}(N-k)/N$
    nagar      $\rho_{\rm nagar} = (\rho_{\rm dw}N^2 + k^2)/(N^2 - k^2)$

The prais estimator can use any consistent estimate of ρ to transform the equation, and each of these estimates meets that requirement. The default is regress, which produces the minimum sum-of-squares solution (ssesearch option) for the Cochrane–Orcutt transformation; none of these computations will produce the minimum sum-of-squares solution for the full Prais–Winsten transformation. See Judge et al. (1985) for a discussion of each estimate of ρ.

corc specifies that the Cochrane–Orcutt transformation be used to estimate the equation. With this option, the Prais–Winsten transformation of the first observation is not performed, and the first observation is dropped when estimating the transformed equation; see Methods and formulas below.

ssesearch specifies that a search be performed for the value of ρ that minimizes the sum-of-squared errors of the transformed equation (Cochrane–Orcutt or Prais–Winsten transformation). The search method is a combination of quadratic and modified bisection searches using golden sections.

twostep specifies that prais stop on the first iteration after the equation is transformed by ρ, the two-step efficient estimator. Although iterating these estimators to convergence is customary, they are efficient at each step.

noconstant; see [R] estimation options.

hascons indicates that a user-defined constant, or a set of variables that in linear combination forms a constant, has been included in the regression. For some computational concerns, see the discussion in [R] regress.

savespace specifies that prais attempt to save as much space as possible by retaining only those variables required for estimation. The original data are restored after estimation. This option is rarely used and should be used only if there is insufficient space to fit a model without the option.

SE/Robust
vce(vcetype) specifies the estimator for the variance–covariance matrix of the estimator; see [R] vce option.
vce(ols), the default, uses the standard variance estimator for ordinary least-squares regression.
vce(robust) specifies to use the Huber/White/sandwich estimator. vce(cluster clustvar) specifies to use the intragroup correlation estimator.
vce(hc2) and vce(hc3) specify an alternative bias correction for the vce(robust) variance calculation; for more information, see [R] regress. You may specify only one of vce(hc2), vce(hc3), or vce(robust).
All estimates from prais are conditional on the estimated value of ρ. Robust variance estimates here are robust only to heteroskedasticity and are not generally robust to misspecification of the functional form or omitted variables. The estimation of the functional form is intertwined with the estimation of ρ, and all estimates are conditional on ρ. Thus estimates cannot be robust to misspecification of functional form.
For these reasons, it is probably best to interpret vce(robust) in the spirit of White's (1980) original paper on estimation of heteroskedastic-consistent covariance matrices.

Reporting
level(#); see [R] estimation options.
nodw suppresses reporting of the Durbin–Watson statistic.
display options: noci, nopvalues, noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.

Optimization
optimize options: iterate(#), [no]log, tolerance(#). iterate() specifies the maximum number of iterations. log/nolog specifies whether to show the iteration log. tolerance() specifies the tolerance for the coefficient vector; tolerance(1e-6) is the default. These options are seldom used.

The following option is available with prais but is not shown in the dialog box:
coeflegend; see [R] estimation options.

Remarks and examples
prais fits a linear regression of depvar on indepvars that is corrected for first-order serially correlated residuals by using the Prais–Winsten (1954) transformed regression estimator, the Cochrane–Orcutt (1949) transformed regression estimator, or a version of the search method suggested by Hildreth and Lu (1960). Davidson and MacKinnon (1993) provide theoretical details on the three methods (see pages 333–335 for the latter two and pages 343–351 for Prais–Winsten). See Becketti (2013) for more examples showing how to use prais.
The most common autocorrelated error process is the first-order autoregressive process. Under this assumption, the linear regression model can be written as

$$ y_t = x_t\beta + u_t $$

where the errors satisfy

$$ u_t = \rho\, u_{t-1} + e_t $$

and the $e_t$ are independent and identically distributed as $N(0,\sigma^2)$. The covariance matrix $\Psi$ of the error term $u$ can then be written as

$$ \Psi = \frac{1}{1-\rho^2}
\begin{pmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{T-1} \\
\rho & 1 & \rho & \cdots & \rho^{T-2} \\
\rho^2 & \rho & 1 & \cdots & \rho^{T-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho^{T-1} & \rho^{T-2} & \rho^{T-3} & \cdots & 1
\end{pmatrix} $$

The Prais–Winsten estimator is a generalized least-squares (GLS) estimator. The Prais–Winsten method (as described in Judge et al. 1985) is derived from the AR(1) model for the error term described above. Whereas the Cochrane–Orcutt method uses a lag definition and loses the first observation in the iterative method, the Prais–Winsten method preserves that first observation. In small samples, this can be a significant advantage.

Technical note
To fit a model with autocorrelated errors, you must specify your data as time series and have (or create) a variable denoting the time at which an observation was collected. The data for the regression should be equally spaced in time.

Example 1
Say that we wish to fit a time-series model of usr on idle but are concerned that the residuals may be serially correlated. We will declare the variable t to represent time by typing

. use http://www.stata-press.com/data/r14/idle
. tsset t
        time variable:  t, 1 to 30
                delta:  1 unit

We can obtain Cochrane–Orcutt estimates by specifying the corc option:

. prais usr idle, corc
Iteration 0:  rho = 0.0000
Iteration 1:  rho = 0.3518
(output omitted)
Iteration 13: rho = 0.5708

Cochrane-Orcutt AR(1) regression -- iterated estimates

Source     SS           df    MS            Number of obs =        29
Model      40.1309584    1    40.1309584    F(1, 27)      =      6.49
Residual   166.898474   27    6.18142498    Prob > F      =    0.0168
Total      207.029433   28    7.39390831    R-squared     =    0.1938
                                            Adj R-squared =    0.1640
                                            Root MSE      =    2.4862

usr       Coef.       Std. Err.      t     P>|t|     [95% Conf. Interval]
idle      -.1254511   .0492356    -2.55    0.017     -.2264742     -.024428
_cons     14.54641    4.272299     3.40    0.002       5.78038     23.31245
rho       .5707918

Durbin-Watson statistic (original)    1.295766
Durbin-Watson statistic (transformed) 1.466222

The fitted model is

$$ \texttt{usr}_t = -0.1255\,\texttt{idle}_t + 14.55 + u_t \qquad\text{and}\qquad u_t = 0.5708\, u_{t-1} + e_t $$

We can also fit the model with the Prais–Winsten method,

. prais usr idle
Iteration 0:  rho = 0.0000
Iteration 1:  rho = 0.3518
(output omitted)
Iteration 14: rho = 0.5535

Prais-Winsten AR(1) regression -- iterated estimates

Source     SS           df    MS            Number of obs =        30
Model      43.0076941    1    43.0076941    F(1, 28)      =      7.12
Residual   169.165739   28    6.04163354    Prob > F      =    0.0125
Total      212.173433   29    7.31632528    R-squared     =    0.2027
                                            Adj R-squared =    0.1742
                                            Root MSE      =     2.458

usr       Coef.       Std. Err.      t     P>|t|     [95% Conf. Interval]
idle      -.1356522   .0472195    -2.87    0.008     -.2323769    -.0389275
_cons     15.20415    4.160391     3.65    0.001      6.681978     23.72633
rho       .5535476

Durbin-Watson statistic (original)    1.295766
Durbin-Watson statistic (transformed) 1.476004

where the Prais–Winsten fitted model is

$$ \texttt{usr}_t = -0.1357\,\texttt{idle}_t + 15.20 + u_t \qquad\text{and}\qquad u_t = 0.5535\, u_{t-1} + e_t $$

As the results indicate, for these data there is little difference between the Cochrane–Orcutt and Prais–Winsten estimators, whereas the OLS estimate of the slope parameter is substantially different.
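For reference, the uncorrected OLS estimates whose slope the text says differs substantially can be displayed with a single command:

. regress usr idle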
We correct for autocorrelation with the ssesearch option of prais to search for the value of ρ that minimizes the sum-of-squared residuals of the Cochrane–Orcutt transformed equation. Normally, the default Prais–Winsten transformation would be used with such a small dataset, but the less-efficient Cochrane–Orcutt transformation allows us to demonstrate an aspect of the estimator's convergence.

. prais csales isales, corc ssesearch
Iteration 1:   rho = 0.8944 , criterion = -.07298558
Iteration 2:   rho = 0.8944 , criterion = -.07298558
 (output omitted )
Iteration 15:  rho = 0.9588 , criterion = -.07167037
Cochrane-Orcutt AR(1) regression -- SSE search estimates

      Source |       SS           df       MS      Number of obs   =        19
-------------+----------------------------------   F(1, 17)        =    553.14
       Model |  2.33199178         1  2.33199178   Prob > F        =    0.0000
    Residual |  .071670369        17  .004215904   R-squared       =    0.9702
-------------+----------------------------------   Adj R-squared   =    0.9684
       Total |  2.40366215        18  .133536786   Root MSE        =    .06493

      csales |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      isales |   .1605233   .0068253    23.52   0.000     .1461233    .1749234
       _cons |   1.738946   1.432674     1.21   0.241    -1.283732    4.761624
-------------+----------------------------------------------------------------
         rho |   .9588209

Durbin-Watson statistic (original)    0.734728
Durbin-Watson statistic (transformed) 1.724419

We noted in Options that, with the default computation of ρ, the Cochrane–Orcutt method produces an estimate of ρ that minimizes the sum-of-squared residuals — the same criterion as the ssesearch option. Given that the two methods produce the same results, why would the search method ever be preferred? It turns out that the back-and-forth iterations used by Cochrane–Orcutt may have difficulty converging if the value of ρ is large. Using the same data, the Cochrane–Orcutt iterative procedure requires more than 350 iterations to converge, and a tighter tolerance must be specified to prevent premature convergence:

. prais csales isales, corc tol(1e-9) iterate(500)
Iteration 0:    rho = 0.0000
Iteration 1:    rho = 0.6312
Iteration 2:    rho = 0.6866
 (output omitted )
Iteration 377:  rho = 0.9588
Iteration 378:  rho = 0.9588
Iteration 379:  rho = 0.9588
Cochrane-Orcutt AR(1) regression -- iterated estimates

      Source |       SS           df       MS      Number of obs   =        19
-------------+----------------------------------   F(1, 17)        =    553.14
       Model |  2.33199171         1  2.33199171   Prob > F        =    0.0000
    Residual |  .071670369        17  .004215904   R-squared       =    0.9702
-------------+----------------------------------   Adj R-squared   =    0.9684
       Total |  2.40366208        18  .133536782   Root MSE        =    .06493

      csales |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      isales |   .1605233   .0068253    23.52   0.000     .1461233    .1749234
       _cons |   1.738946   1.432674     1.21   0.241    -1.283732    4.761625
-------------+----------------------------------------------------------------
         rho |   .9588209

Durbin-Watson statistic (original)    0.734728
Durbin-Watson statistic (transformed) 1.724419

Once convergence is achieved, the two methods produce identical results.
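Before turning to the stored results, a small simulation can make the payoff from prais concrete. The sketch below is ours, not one of the manual's examples: all variable names and parameter values are illustrative assumptions.

. * Sketch: simulate a regression with AR(1) errors and compare OLS with prais.
. clear
. set obs 200
. set seed 12345
. generate t = _n
. tsset t
. generate x = rnormal()
. generate u = rnormal()                  // rough initial value for the error
. replace u = 0.7*L.u + rnormal() in 2/l  // build the AR(1) errors recursively
. generate y = 1 + 0.5*x + u              // true slope is 0.5
. regress y x                             // OLS: consistent, but SEs ignore rho
. prais y x                               // Prais-Winsten GLS for AR(1) errors
. display e(rho)                          // estimated autocorrelation parameter

Because replace works through the observations in order, L.u refers to the error already constructed for the previous period, which is what makes the recursive construction work.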
Stored results

prais stores the following in e():

Scalars
    e(N)               number of observations
    e(N_gaps)          number of gaps
    e(mss)             model sum of squares
    e(df_m)            model degrees of freedom
    e(rss)             residual sum of squares
    e(df_r)            residual degrees of freedom
    e(r2)              R-squared
    e(r2_a)            adjusted R-squared
    e(F)               F statistic
    e(rmse)            root mean squared error
    e(ll)              log likelihood
    e(N_clust)         number of clusters
    e(rho)             autocorrelation parameter ρ
    e(dw)              Durbin–Watson d statistic for transformed regression
    e(dw_0)            Durbin–Watson d statistic of untransformed regression
    e(rank)            rank of e(V)
    e(tol)             target tolerance
    e(max_ic)          maximum number of iterations
    e(ic)              number of iterations

Macros
    e(cmd)             prais
    e(cmdline)         command as typed
    e(depvar)          name of dependent variable
    e(title)           title in estimation output
    e(clustvar)        name of cluster variable
    e(cons)            noconstant or not reported
    e(method)          twostep, iterated, or SSE search
    e(tranmeth)        corc or prais
    e(rhotype)         method specified in rhotype() option
    e(vce)             vcetype specified in vce()
    e(vcetype)         title used to label Std. Err.
    e(properties)      b V
    e(predict)         program used to implement predict
    e(marginsok)       predictions allowed by margins
    e(asbalanced)      factor variables fvset as asbalanced
    e(asobserved)      factor variables fvset as asobserved

Matrices
    e(b)               coefficient vector
    e(V)               variance–covariance matrix of the estimators
    e(V_modelbased)    model-based variance

Functions
    e(sample)          marks estimation sample

Methods and formulas

Consider the command 'prais y x z'. The 0th iteration is obtained by estimating a, b, and c from the standard linear regression:

    y_t = a x_t + b z_t + c + u_t

An estimate of the correlation in the residuals is then obtained. By default, prais uses the auxiliary regression:

    u_t = ρ u_{t−1} + e_t

This can be changed to any computation noted in the rhotype() option.

Next we apply a Cochrane–Orcutt transformation (1) for observations t = 2, ..., n

$$ y_t - \rho y_{t-1} = a(x_t - \rho x_{t-1}) + b(z_t - \rho z_{t-1}) + c(1-\rho) + v_t \qquad (1) $$

and the transformation (1′) for t = 1

$$ \sqrt{1-\rho^2}\,y_1 = a\left(\sqrt{1-\rho^2}\,x_1\right) + b\left(\sqrt{1-\rho^2}\,z_1\right) + c\sqrt{1-\rho^2} + \sqrt{1-\rho^2}\,v_1 \qquad (1') $$

Thus the differences between the Cochrane–Orcutt and the Prais–Winsten methods are that the latter uses (1′) in addition to (1), whereas the former uses only (1), necessarily decreasing the sample size by one.
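To make (1) and (1′) concrete, the following sketch (ours; it assumes a tsset dataset with variables y, x, and z and takes the converged ρ from e(rho)) applies the transformations by hand and runs one GLS step. The coefficients from the final regress should be close to those reported by prais.

. * Sketch: apply (1) and (1') by hand at a given rho (variable names are ours).
. quietly prais y x z
. scalar r = e(rho)
. generate double ystar = y - r*L.y          // (1), defined for t = 2,...,n
. generate double xstar = x - r*L.x
. generate double zstar = z - r*L.z
. generate double cstar = 1 - r              // transformed constant
. replace ystar = sqrt(1 - r^2)*y in 1       // (1'), the Prais-Winsten step
. replace xstar = sqrt(1 - r^2)*x in 1
. replace zstar = sqrt(1 - r^2)*z in 1
. replace cstar = sqrt(1 - r^2)   in 1
. regress ystar xstar zstar cstar, noconstant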
Equations (1) and (1′) are used to transform the data and obtain new estimates of a, b, and c. When the twostep option is specified, the estimation process stops at this point and reports these estimates. Under the default behavior of iterating to convergence, this process is repeated until the change in the estimate of ρ is within a specified tolerance.

The new estimates are used to produce fitted values

$$ \hat{y}_t = \hat{a}x_t + \hat{b}z_t + \hat{c} $$

and then ρ is reestimated using, by default, the regression defined by

$$ y_t - \hat{y}_t = \rho(y_{t-1} - \hat{y}_{t-1}) + u_t \qquad (2) $$

We then reestimate (1) by using the new estimate of ρ and continue to iterate between (1) and (2) until the estimate of ρ converges. Convergence is declared after iterate() iterations or when the absolute difference in the estimated correlation between two iterations is less than tol(); see [R] maximize. Sargan (1964) has shown that this process will always converge.

Under the ssesearch option, a combined quadratic and bisection search using golden sections searches for the value of ρ that minimizes the sum-of-squared residuals from the transformed equation. The transformation may be either the Cochrane–Orcutt ((1) only) or the Prais–Winsten ((1) and (1′)).

All reported statistics are based on the ρ-transformed variables, and ρ is assumed to be estimated without error. See Judge et al. (1985) for details.

The Durbin–Watson d statistic reported by prais and estat dwatson is

$$ d = \frac{\sum_{j=1}^{n-1}(u_{j+1}-u_j)^2}{\sum_{j=1}^{n}u_j^2} $$

where u_j represents the residual of the jth observation.
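The d statistic is easy to verify by hand. A sketch (ours; it assumes a tsset dataset and an OLS fit of y on x) that should reproduce estat dwatson:

. * Sketch: compute the d statistic from OLS residuals (variable names are ours).
. quietly regress y x
. predict double uhat, residuals
. generate double du2 = (F.uhat - uhat)^2    // (u_{j+1} - u_j)^2; missing at t = n
. summarize du2, meanonly
. scalar num = r(sum)
. generate double u2 = uhat^2
. summarize u2, meanonly
. display "Durbin-Watson d = " num/r(sum)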
This command supports the Huber/White/sandwich estimator of the variance and its clustered version using vce(robust) and vce(cluster clustvar), respectively. See [P] _robust, particularly Introduction and Methods and formulas.

All estimates from prais are conditional on the estimated value of ρ. Robust variance estimates here are robust only to heteroskedasticity and are not generally robust to misspecification of the functional form or omitted variables. The estimation of the functional form is intertwined with the estimation of ρ, and all estimates are conditional on ρ. Thus estimates cannot be robust to misspecification of functional form. For these reasons, it is probably best to interpret vce(robust) in the spirit of White's (1980) original paper on estimation of heteroskedasticity-consistent covariance matrices.

Acknowledgment

We thank Richard Dickens of the Centre for Economic Performance at the London School of Economics and Political Science for testing and assistance with an early version of this command.

Sigbert Jon Prais (1928–2014) was born in Frankfurt and moved to Britain in 1934 as a refugee. After earning degrees at the universities of Birmingham and Cambridge and serving in various posts in research and industry, he settled at the National Institute of Economic and Social Research. Prais's interests extended widely across economics, including studies of the influence of education on economic progress.

Christopher Blake Winsten (1923–2005) was born in Welwyn Garden City, England, the son of the writer Stephen Winsten and the painter and sculptress Clare Blake. He was educated at the University of Cambridge and worked with the Cowles Commission at the University of Chicago and at the universities of Oxford, London (Imperial College), and Essex, making many contributions to economics and statistics, including the Prais–Winsten transformation and joint authorship of a celebrated monograph on transportation economics.

Donald Cochrane (1917–1983) was an Australian economist and econometrician. He was born in Melbourne and earned degrees at Melbourne and Cambridge. After wartime service in the Royal Australian Air Force, he held chairs at Melbourne and Monash, being active also in work for various international organizations and national committees.

Guy Henderson Orcutt (1917–2006) was born in Michigan and earned degrees in physics and economics at the University of Michigan. He worked at Harvard, the University of Wisconsin, and Yale. He contributed to econometrics and economics in several fields, most distinctively in developing microanalytical models of economic behavior.

References

Becketti, S. 2013. Introduction to Time Series Using Stata. College Station, TX: Stata Press.
Cochrane, D., and G. H. Orcutt. 1949. Application of least squares regression to relationships containing auto-correlated error terms. Journal of the American Statistical Association 44: 32–61.
Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University Press.
Durbin, J., and G. S. Watson. 1950. Testing for serial correlation in least squares regression. I. Biometrika 37: 409–428.
——. 1951. Testing for serial correlation in least squares regression. II. Biometrika 38: 159–177.
Hardin, J. W. 1995. sts10: Prais–Winsten regression. Stata Technical Bulletin 25: 26–29. Reprinted in Stata Technical Bulletin Reprints, vol. 5, pp. 234–237. College Station, TX: Stata Press.
Harvey, A. C. 1990. The Econometric Analysis of Time Series. 2nd ed. Cambridge, MA: MIT Press.
Hildreth, C., and J. Y. Lu. 1960. Demand relations with autocorrelated disturbances. Reprinted in Agricultural Experiment Station Technical Bulletin, No. 276. East Lansing, MI: Michigan State University Press.
Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lütkepohl, and T.-C. Lee. 1985. The Theory and Practice of Econometrics. 2nd ed. New York: Wiley.
King, M. L., and D. E. A. Giles, ed. 1987. Specification Analysis in the Linear Model: Essays in Honor of Donald Cochrane. London: Routledge & Kegan Paul.
Kmenta, J. 1997. Elements of Econometrics. 2nd ed. Ann Arbor: University of Michigan Press.
Prais, S. J., and C. B. Winsten. 1954. Trend estimators and serial correlation. Working paper 383, Cowles Commission. http://cowles.econ.yale.edu/P/ccdp/st/s-0383.pdf.
Sargan, J. D. 1964. Wages and prices in the United Kingdom: A study in econometric methodology. In Econometric Analysis for National Economic Planning, ed. P. E. Hart, G. Mills, and J. K. Whitaker, 25–64. London: Butterworths.
Theil, H. 1971. Principles of Econometrics. New York: Wiley.
White, H. L., Jr. 1980. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48: 817–838.
Wooldridge, J. M. 2013. Introductory Econometrics: A Modern Approach. 5th ed. Mason, OH: South-Western.
Zellner, A. 1990. Guy H. Orcutt: Contributions to economic statistics. Journal of Economic Behavior and Organization 14: 43–51.

Also see

[TS] prais postestimation — Postestimation tools for prais
[TS] tsset — Declare data to be time-series data
[TS] arima — ARIMA, ARMAX, and other dynamic regression models
[TS] mswitch — Markov-switching regression models
[R] regress — Linear regression
[R] regress postestimation time series — Postestimation tools for regress with time series
[U] 20 Estimation and postestimation commands

Title

prais postestimation — Postestimation tools for prais

Postestimation commands   predict   margins   Also see

Postestimation commands

The following standard postestimation commands are available after prais:

Command            Description
--------------------------------------------------------------------------
contrast           contrasts and ANOVA-style joint tests of estimates
estat ic           Akaike's and Schwarz's Bayesian information criteria
                     (AIC and BIC)
estat summarize    summary statistics for the estimation sample
estat vce          variance–covariance matrix of the estimators (VCE)
estimates          cataloging estimation results
forecast           dynamic forecasts and simulations
lincom             point estimates, standard errors, testing, and inference
                     for linear combinations of coefficients
linktest           link test for model specification
lrtest             likelihood-ratio test
margins            marginal means, predictive margins, marginal effects, and
                     average marginal effects
marginsplot        graph the results from margins (profile plots,
                     interaction plots, etc.)
nlcom              point estimates, standard errors, testing, and inference
                     for nonlinear combinations of coefficients
predict            predictions, residuals, influence statistics, and other
                     diagnostic measures
predictnl          point estimates, standard errors, testing, and inference
                     for generalized predictions
pwcompare          pairwise comparisons of estimates
test               Wald tests of simple and composite linear hypotheses
testnl             Wald tests of nonlinear hypotheses
--------------------------------------------------------------------------
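For instance, a brief sketch (ours) exercising a few of these commands after the prais fit from example 2; the tested value 0.16 is purely illustrative, and the output is omitted:

. quietly prais csales isales
. estat ic               // AIC and BIC for the fitted model
. estat vce              // variance-covariance matrix of the estimates
. test isales = 0.16     // Wald test of an illustrative linear hypothesis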
predict

Description for predict

predict creates a new variable containing predictions such as linear predictions and residuals.

Menu for predict

Statistics > Postestimation

Syntax for predict

    predict [type] newvar [if] [in] [, statistic]

statistic      Description
-----------------------------------------------------------
Main
  xb           linear prediction; the default
  stdp         standard error of the linear prediction
  residuals    residuals
-----------------------------------------------------------
These statistics are available both in and out of sample; type
predict ... if e(sample) ... if wanted only for the estimation sample.

Options for predict

Main

xb, the default, calculates the fitted values, the prediction of x_j b for the specified equation. This is the linear predictor from the fitted regression model; it does not apply the estimate of ρ to prior residuals.

stdp calculates the standard error of the prediction for the specified equation, that is, the standard error of the predicted expected value or mean for the observation's covariate pattern. The standard error of the prediction is also referred to as the standard error of the fitted value. As computed for prais, this is strictly the standard error from the variance in the estimates of the parameters of the linear model and assumes that ρ is estimated without error.

residuals calculates the residuals from the linear prediction.

margins

Description for margins

margins estimates margins of response for linear predictions.

Menu for margins

Statistics > Postestimation

Syntax for margins

    margins [marginlist] [, options]
    margins [marginlist], predict(statistic ...) [options]

statistic      Description
-----------------------------------------------------------
  xb           linear prediction; the default
  stdp         not allowed with margins
  residuals    not allowed with margins
-----------------------------------------------------------
Statistics not allowed with margins are functions of stochastic quantities other than e(b). For the full syntax, see [R] margins.

Also see

[TS] prais — Prais–Winsten and Cochrane–Orcutt regression
[U] 20 Estimation and postestimation commands
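Tying this entry together, a short sketch (ours) of predict after prais, using the dataset from example 1 of [TS] prais; the new variable names are ours:

. quietly prais usr idle
. predict double fitted               // xb, the linear prediction (the default)
. predict double sefit, stdp          // standard error of the linear prediction
. predict double resid, residuals     // residuals from the linear prediction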
Title

psdensity — Parametric spectral density estimation after arima, arfima, and ucm

Description   Quick start   Menu   Syntax   Options   Remarks and examples   Methods and formulas   References   Also see

Description

psdensity estimates the spectral density of a stationary process using the parameters of a previously estimated parametric model. psdensity works after arfima, arima, and ucm; see [TS] arfima, [TS] arima, and [TS] ucm.

Quick start

Obtain spectral density values spden and corresponding frequencies sfreq
    psdensity spden sfreq

Obtain power spectrum values pspec and corresponding frequencies pfreq
    psdensity pspec pfreq, pspectrum

As above, but limit the frequency range to between 0 and 1
    psdensity pspec pfreq, pspectrum range(0 1)

After ucm, obtain spectral density values for the second stochastic cycle cpden and corresponding frequencies cfreq
    psdensity cpden cfreq, cycle(2)

Menu

Statistics > Time series > Postestimation > Parametric spectral density

Syntax

    psdensity [type] newvarsd newvarf [if] [in] [, options]

where newvarsd is the name of the new variable that will contain the estimated spectral density and newvarf is the name of the new variable that will contain the frequencies at which the spectral density estimate is computed.

options        Description
---------------------------------------------------------------------------
pspectrum      estimate the power spectrum rather than the spectral density
range(a b)     limit the frequency range to [a, b)
cycle(#)       estimate the spectral density from the specified stochastic
                 cycle; only allowed after ucm
smemory        estimate the spectral density of the short-memory component
                 of the ARFIMA process; only allowed after arfima
---------------------------------------------------------------------------

Options

pspectrum causes psdensity to estimate the power spectrum rather than the spectral density. The power spectrum is equal to the spectral density times the variance of the process.

range(a b) limits the frequency range. By default, the spectral density is computed over [0, π). Specifying range(a b) causes the spectral density to be computed over [a, b). We require that 0 ≤ a < b < π.

cycle(#) causes psdensity to estimate the spectral density from the specified stochastic cycle after ucm. By default, the spectral density from the first stochastic cycle is estimated. cycle(#) must specify an integer that corresponds to a cycle in the model fit by ucm.

smemory causes psdensity to ignore the ARFIMA fractional integration parameter. The spectral density computed is for the short-memory ARMA component of the model.

Remarks and examples

Remarks are presented under the following headings:
    The frequency-domain approach to time series
    Some ARMA examples

The frequency-domain approach to time series

A stationary process can be decomposed into random components that occur at the frequencies ω ∈ [0, π]. The spectral density of a stationary process describes the relative importance of these random components. psdensity uses the estimated parameters of a parametric model to estimate the spectral density of a stationary process.

We need some concepts from the frequency-domain approach to time-series analysis to interpret estimated spectral densities. Here we provide a simple, intuitive explanation. More technical presentations can be found in Priestley (1981), Harvey (1989, 1993), Hamilton (1994), Fuller (1996), and Wei (2006).

In the time domain, the dependent variable evolves over time because of random shocks. The autocovariances γ_j, j ∈ {0, 1, ..., ∞}, of a covariance-stationary process y_t specify its variance and dependence structure, and the autocorrelations ρ_j, j ∈ {1, 2, ..., ∞}, provide a scale-free measure of its dependence structure. The autocorrelation at lag j specifies whether realizations at time t and realizations at time t − j are positively related, unrelated, or negatively related.
In the frequency domain, the dependent variable is generated by an infinite number of random components that occur at the frequencies ω ∈ [0, π]. The spectral density specifies the relative importance of these random components. The area under the spectral density in the interval (ω, ω + dω) is the fraction of the variance of the process that can be attributed to the random components that occur at the frequencies in the interval (ω, ω + dω).

The spectral density and the autocorrelations provide the same information about the dependence structure, albeit in different domains. The spectral density can be written as a weighted average of the autocorrelations of y_t, and it can be inverted to retrieve the autocorrelations as a function of the spectral density.

Like autocorrelations, the spectral density is normalized by γ₀, the variance of y_t. Multiplying the spectral density by γ₀ yields the power spectrum of y_t, which changes with the units of y_t.

A peak in the spectral density around frequency ω implies that the random components around ω make an important contribution to the variance of y_t. A random variable primarily generated by low-frequency components will tend to have more runs above or below its mean than an independent and identically distributed (i.i.d.) random variable, and its plot may look smoother than the plot of the i.i.d. variable. A random variable primarily generated by high-frequency components will tend to have fewer runs above or below its mean than an i.i.d. random variable, and its plot may look more jagged than the plot of the i.i.d. variable.

Technical note

A more formal specification of the spectral density allows us to be more specific about how the spectral density specifies the relative importance of the random components. If y_t is a covariance-stationary process with absolutely summable autocovariances, its spectrum is given by

$$ g_y(\omega) = \frac{1}{2\pi}\gamma_0 + \frac{1}{\pi}\sum_{k=1}^{\infty}\gamma_k\cos(\omega k) \qquad (1) $$

where g_y(ω) is the spectrum of y_t at frequency ω and γ_k is the kth autocovariance of y_t. Taking the inverse Fourier transform of each side of (1) yields

$$ \gamma_k = \int_{-\pi}^{\pi} g_y(\omega)e^{i\omega k}\,d\omega \qquad (2) $$

where i is the imaginary number i = √−1.

Evaluating (2) at k = 0 yields

$$ \gamma_0 = \int_{-\pi}^{\pi} g_y(\omega)\,d\omega $$

which means that the variance of y_t can be decomposed in terms of the spectrum g_y(ω). In particular, g_y(ω)dω is the contribution to the variance of y_t attributable to the random components in the interval (ω, ω + dω).

The spectrum depends on the units in which y_t is measured, because it depends on γ₀. Dividing both sides of (1) by γ₀ gives us the scale-free spectral density of y_t:

$$ f_y(\omega) = \frac{1}{2\pi} + \frac{1}{\pi}\sum_{k=1}^{\infty}\rho_k\cos(\omega k) $$

By construction,

$$ \int_{-\pi}^{\pi} f_y(\omega)\,d\omega = 1 $$

so f_y(ω)dω is the fraction of the variance of y_t attributable to the random components in the interval (ω, ω + dω).
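For the AR(1) process y_t = α y_{t−1} + ε_t, summing the series above gives the closed form f_y(ω) = (1 − α²)/{2π(1 − 2α cos ω + α²)}. The sketch below (ours; the grid size and α = 0.5 are illustrative assumptions) evaluates this directly, and its plot can be compared with what psdensity produces from a fitted AR(1):

. * Sketch: closed-form AR(1) spectral density on a frequency grid (names ours).
. clear
. set obs 500
. scalar alpha = 0.5
. generate double omega = _pi*(_n - 1)/(_N - 1)    // frequency grid on [0, pi]
. generate double fy = (1 - alpha^2)/(2*_pi*(1 - 2*alpha*cos(omega) + alpha^2))
. line fy omega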
Some ARMA examples

In this section, we estimate and interpret the spectral densities implied by the estimated ARMA parameters. The examples illustrate some of the essential relationships between covariance-stationary processes, the parameters of ARMA models, and the spectral densities implied by the ARMA-model parameters.

See [TS] ucm for a discussion of unobserved-components models and the stochastic-cycle model derived by Harvey (1989) for stationary processes. The stochastic-cycle model has a different parameterization of the spectral density, and it tends to produce spectral densities that look more like probability densities than ARMA models do. See Remarks and examples in [TS] ucm for an introduction to these models, some examples, and some comparisons between the stochastic-cycle model and ARMA models.

Example 1

Let's consider the changes in the number of manufacturing employees in the United States, which we plot below.

. use http://www.stata-press.com/data/r14/manemp2
(FRED data: Number of manufacturing employees in U.S.)
. tsline D.manemp, yline(-0.206)

(figure omitted: time-series plot of the change in the number of manufacturing employees, monthly, 1950m1 to 2010m1, with a horizontal line at the sample mean)

We added a horizontal line at the sample mean of −0.0206 to highlight that there appear to be more runs above or below the mean than we would expect in data generated by an i.i.d. process.

As a first pass at modeling this dependence, we use arima to estimate the parameters of a first-order autoregressive (AR(1)) model. Formally, the AR(1) model is given by

    y_t = α y_{t−1} + ε_t

where y_t is the dependent variable, α is the autoregressive coefficient, and ε_t is an i.i.d. error term. See [TS] arima for an introduction to ARMA modeling and the arima command.

. arima D.manemp, ar(1) noconstant
(setting optimization to BHHH)
Iteration 0:   log likelihood = -870.64844
Iteration 1:   log likelihood = -870.64794
Iteration 2:   log likelihood = -870.64789
Iteration 3:   log likelihood = -870.64787
Iteration 4:   log likelihood = -870.64786
(switching optimization to BFGS)
Iteration 5:   log likelihood = -870.64786
Iteration 6:   log likelihood = -870.64786

ARIMA regression
Sample: 1950m2 - 2011m2                         Number of obs     =        733
                                                Wald chi2(1)      =     730.51
Log likelihood = -870.6479                      Prob > chi2       =     0.0000

             |                 OPG
    D.manemp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |   .5179561   .0191638    27.03   0.000     .4803959    .5555164
-------------+----------------------------------------------------------------
      /sigma |   .7934554   .0080636    98.40   0.000      .777651    .8092598
Note: The test of the variance against zero is one sided, and the two-sided
      confidence interval is truncated at zero.

The statistically significant estimate of 0.518 for the autoregressive coefficient indicates that there is an important amount of positive autocorrelation in this series.

The spectral density of a covariance-stationary process is symmetric around 0. Following convention, psdensity estimates the spectral density over the interval [0, π) at the points given in Methods and formulas.

Now we use psdensity to estimate the spectral density of the process implied by the estimated ARMA parameters. We specify the names of two new variables in the call to psdensity. The first new variable will contain the estimated spectral density. The second new variable will contain the frequencies at which the spectral density is estimated.

. psdensity psden1 omega
. line psden1 omega

(figure omitted: the estimated ARMA spectral density plotted against frequency over [0, π))

The above graph is typical of a spectral density of an AR(1) process with a positive coefficient. The curve is highest at frequency 0, and it tapers off toward zero or a positive asymptote. The estimated spectral density is telling us that the low-frequency random components are the most important random components of an AR(1) process with a positive autoregressive coefficient.

The closer the α is to 1, the more important are the low-frequency components relative to the high-frequency components.
To illustrate this point, we plot the spectral densities implied by AR(1) models with α = 0.1 and α = 0.9.

(figure omitted: side-by-side spectral densities for α = 0.1 and α = 0.9, plotted against frequency)

As α gets closer to 1, the plot of the spectral density gets closer to being a spike at frequency 0, implying that only the lowest-frequency components are important.

Example 2

Now let's consider a dataset for which the estimated coefficient from an AR(1) model is negative. Below we plot the changes in initial claims for unemployment insurance in the United States.

. use http://www.stata-press.com/data/r14/icsa1, clear
. tsline D.icsa, yline(0.08)

(figure omitted: time-series plot of the change in initial claims, weekly, 1970 to 2010, with a horizontal line at the sample mean)

The plot looks a little more jagged than we would expect from an i.i.d. process, but it is hard to tell. Below we estimate the AR(1) coefficient.

. arima D.icsa, ar(1) noconstant
(setting optimization to BHHH)
Iteration 0:   log likelihood = -9934.0659
Iteration 1:   log likelihood = -9934.0657
Iteration 2:   log likelihood = -9934.0657

ARIMA regression
Sample: 14jan1967 - 19feb2011                   Number of obs     =       2302
                                                Wald chi2(1)      =     666.06
Log likelihood = -9934.066                      Prob > chi2       =     0.0000

             |                 OPG
      D.icsa |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |  -.2756024   .0106789   -25.81   0.000    -.2965326   -.2546722
-------------+----------------------------------------------------------------
      /sigma |   18.10988   .1176556   153.92   0.000     17.87928    18.34048
Note: The test of the variance against zero is one sided, and the two-sided
      confidence interval is truncated at zero.

The estimated coefficient is negative and statistically significant.

The spectral density implied by the estimated parameters is

. psdensity psden2 omega2
. line psden2 omega2

(figure omitted: the estimated ARMA spectral density plotted against frequency over [0, π))

The above graph is typical of a spectral density of an AR(1) process with a negative coefficient. The curve is lowest at frequency 0, and it monotonically increases to its highest point, which occurs when the frequency is π.

When the coefficient of an AR(1) model is negative, the high-frequency random components are the most important random components of the process. The closer the α is to −1, the more important are the high-frequency components relative to the low-frequency components. To illustrate this point, we plot the spectral densities implied by AR(1) models with α = −0.1 and α = −0.9.

(figure omitted: side-by-side spectral densities for α = −0.1 and α = −0.9, plotted against frequency)

As α gets closer to −1, the plot of the spectral density shifts toward becoming a spike at frequency π, implying that only the highest-frequency components are important.

For examples of psdensity after arfima and ucm, see [TS] arfima and [TS] ucm.

Methods and formulas

Methods and formulas are presented under the following headings:
    Introduction
    Spectral density after arima or arfima
    Spectral density after ucm

Introduction

The spectral density f(ω) is estimated at the values ω ∈ {ω₁, ω₂, ..., ω_N} using one of the formulas given below. Given a sample of size N, after accounting for any if or in restrictions, the N values of ω are given by ω_i = π(i − 1)/(N − 1) for i ∈ {1, 2, ..., N}.
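That grid is easy to construct by hand for the N observations in memory; a two-line sketch (the variable name is ours):

. generate double omegagrid = _pi*(_n - 1)/(_N - 1)  // N equally spaced frequencies starting at 0
. summarize omegagrid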
In the rare case in which the dataset in memory has insufficient observations for the desired resolution of the estimated spectral density, you may use tsappend or set obs (see [TS] tsappend or [D] obs) to increase the number of observations in the current dataset.

You may use an if restriction or an in restriction to restrict the observations to handle panel data or to compute the estimates for a subset of the observations.

Spectral density after arima or arfima

Let φ_k and θ_k denote the p autoregressive and q moving-average parameters of an ARMA model, respectively. Box, Jenkins, and Reinsel (2008) show that the spectral density implied by the ARMA parameters is

$$ f_{\text{ARMA}}(\omega;\boldsymbol{\phi},\boldsymbol{\theta},\sigma_\epsilon^2,\gamma_0) = \frac{\sigma_\epsilon^2\left|1+\theta_1 e^{-i\omega}+\theta_2 e^{-i2\omega}+\cdots+\theta_q e^{-iq\omega}\right|^2}{2\pi\gamma_0\left|1-\phi_1 e^{-i\omega}-\phi_2 e^{-i2\omega}-\cdots-\phi_p e^{-ip\omega}\right|^2} $$

where ω ∈ [0, π], σ_ε² is the variance of the idiosyncratic error, and γ₀ is the variance of the dependent variable. We estimate γ₀ using the arima parameter estimates.

The spectral density for the ARFIMA model is

$$ f_{\text{ARFIMA}}(\omega;\boldsymbol{\phi},\boldsymbol{\theta},d,\sigma_\epsilon^2,\gamma_0) = \left|1-e^{i\omega}\right|^{-2d} f_{\text{ARMA}}(\omega;\boldsymbol{\phi},\boldsymbol{\theta},\sigma_\epsilon^2,\gamma_0) $$

where d, −1/2 < d < 1/2, is the fractional integration parameter. The spectral density goes to infinity as the frequency approaches 0 for 0 < d < 1/2, and it is zero at frequency 0 for −1/2 < d < 0.

The smemory option causes psdensity to perform the estimation with d = 0, which is equivalent to estimating the spectral density of the fractionally differenced series.

The power spectrum omits scaling by γ₀.

Spectral density after ucm

The spectral density of an order-k stochastic cycle with frequency λ and damping ρ is (Trimbur 2006)

$$ f(\omega;\rho,\lambda,\sigma_\kappa^2) = \frac{(1-\rho^2)^{2k-1}\,\sigma_\kappa^2}{\sum_{i=0}^{k-1}\binom{k-1}{i}^2\rho^{2i}} \times \frac{\sum_{j=0}^{k}\sum_{i=0}^{k}(-1)^{j+i}\binom{k}{j}\binom{k}{i}\rho^{j+i}\cos\{\lambda(j-i)\}\cos\{\omega(j-i)\}}{2\pi\left\{1+4\rho^2\cos^2\lambda+\rho^4-4\rho(1+\rho^2)\cos\lambda\cos\omega+2\rho^2\cos 2\omega\right\}^{k}} $$

where σ_κ² is the variance of the cycle error term. The variance of the cycle is

$$ \sigma_\omega^2 = \frac{\sum_{i=0}^{k-1}\binom{k-1}{i}^2\rho^{2i}}{(1-\rho^2)^{2k-1}}\,\sigma_\kappa^2 $$

and the power spectrum omits scaling by σ_ω².

References

Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. 2008. Time Series Analysis: Forecasting and Control. 4th ed. Hoboken, NJ: Wiley.
Fuller, W. A. 1996. Introduction to Statistical Time Series. 2nd ed. New York: Wiley.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Harvey, A. C. 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge University Press.
——. 1993. Time Series Models. 2nd ed. Cambridge, MA: MIT Press.
Priestley, M. B. 1981. Spectral Analysis and Time Series. London: Academic Press.
Trimbur, T. M. 2006. Properties of higher order stochastic cycles. Journal of Time Series Analysis 27: 1–17.
Wei, W. W. S. 2006. Time Series Analysis: Univariate and Multivariate Methods. 2nd ed. Boston: Pearson.

Also see

[TS] arfima — Autoregressive fractionally integrated moving-average models
[TS] arima — ARIMA, ARMAX, and other dynamic regression models
[TS] ucm — Unobserved-components model

Title

rolling — Rolling-window and recursive estimation

Description   Quick start   Menu   Syntax   Options   Remarks and examples   Stored results   Acknowledgment   References   Also see

Description

rolling executes a command on each of a series of windows of observations and stores the results. rolling can perform what are commonly called rolling regressions, recursive regressions, and reverse recursive regressions.
However, rolling is not limited to just linear regression analysis: any command that stores results in e() or r() can be used with rolling.

Quick start

Fit an AR(1) model for y with a 20-period rolling window using tsset data
    rolling, window(20): arima y, ar(1)

Recursive rolling-window estimation with a fixed starting period
    rolling, window(20) recursive: arima y, ar(1)

As above, but specify that estimation start in 1990 and end in 2011
    rolling, window(20) recursive start(1990) end(2011): arima y, ar(1)

Reverse recursive rolling-window estimation with the last period fixed
    rolling, window(20) rrecursive start(1990) end(2011): arima y, ar(1)

Save results from a 20-period rolling-window estimation to new dataset mydata.dta
    rolling, window(20) saving(mydata): arima y, ar(1)

Note: Any command that accepts the rolling prefix may be substituted for arima above.

Menu

Statistics > Time series > Rolling-window and recursive estimation

Syntax

    rolling [exp_list] [if] [in] [, options]: command

options                   Description
---------------------------------------------------------------------------
Main
* window(#)               number of consecutive data points in each sample
  recursive               use recursive samples
  rrecursive              use reverse recursive samples
Options
  clear                   replace data in memory with results
  saving(filename, ...)   save results to filename; save statistics in
                            double precision; save results to filename
                            every # replications
  stepsize(#)             number of periods to advance window
  start(time_constant)    period at which rolling is to start
  end(time_constant)      period at which rolling is to end
  keep(varname[, start])  save varname along with results; optionally, use
                            value at left edge of window
Reporting
  nodots                  suppress replication dots
  noisily                 display any output from command
  trace                   trace command's execution
Advanced
  reject(exp)             identify invalid results
---------------------------------------------------------------------------
* window(#) is required.
You must tsset your data before using rolling; see [TS] tsset.

command is any command that follows standard Stata syntax and allows the if qualifier. The by prefix cannot be part of command. aweights are allowed in command if command accepts aweights; see [U] 11.1.6 weight.

exp_list contains    (name: elist)
                     elist
                     eexp
elist contains       newvar = (exp)
                     (exp)
eexp is              specname
                     [eqno]specname
specname is          _b
                     _b[]
                     _se
                     _se[]
eqno is              ##
                     name

exp is a standard Stata expression; see [U] 13 Functions and expressions.

Distinguish between [ ], which are to be typed, and [ ], which indicate optional arguments.

Options

Main

window(#) defines the window size used each time command is executed. The window size refers to calendar periods, not the number of observations. If there are missing data (for example, because of weekends), the actual number of observations used by command may be less than window(#). window(#) is required.

recursive specifies that a recursive analysis be done. The starting period is held fixed, the ending period advances, and the window size grows.

rrecursive specifies that a reverse recursive analysis be done. Here the ending period is held fixed, the starting period advances, and the window size shrinks.

Options

clear specifies that Stata replace the data in memory with the collected statistics even though the current data in memory have not been saved to disk.

saving(filename[, suboptions]) creates a Stata data file (.dta file) consisting of (for each statistic in exp_list) a variable containing the window replicates.

    double specifies that the results for each replication be saved as doubles, meaning 8-byte reals.
    By default, they are saved as floats, meaning 4-byte reals.

    every(#) specifies that results be written to disk every #th replication. every() should be specified in conjunction with saving() only when command takes a long time for each replication. This will allow recovery of partial results should your computer crash. See [P] postfile.

stepsize(#) specifies the number of periods the window is to be advanced each time command is executed.

start(time_constant) specifies the date on which rolling is to start. start() may be specified as an integer or as a date literal.

end(time_constant) specifies the date on which rolling is to end. end() may be specified as an integer or as a date literal.

keep(varname[, start]) specifies a variable to be posted along with the results. The value posted is the value that corresponds to the right edge of the window. Specifying the start() option requests that the value corresponding to the left edge of the window be posted instead. This option is often used to record calendar dates.

Reporting

nodots suppresses display of the replication dot for each window on which command is executed. By default, one dot character is printed for each window. A red 'x' is printed if command returns with an error or if any value in exp_list is missing.

noisily causes the output of command to be displayed for each window on which command is executed. This option implies the nodots option.

trace causes a trace of the execution of command to be displayed. This option implies the noisily and nodots options.

Advanced

reject(exp) identifies an expression that indicates when results should be rejected. When exp is true, the saved statistics are set to missing values.

Remarks and examples

rolling is a moving sampler that collects statistics from command after executing command on subsets of the data in memory. Typing

. rolling exp_list, window(50) clear: command

executes command on sample windows of span 50. That is, rolling will first execute command by using periods 1–50 of the dataset, and then using periods 2–51, 3–52, and so on.

command defines the statistical command to be executed. Most Stata commands and user-written programs can be used with rolling, as long as they follow standard Stata syntax and allow the if qualifier; see [U] 11 Language syntax. The by prefix cannot be part of command.

exp_list specifies the statistics to be collected from the execution of command. If no expressions are given, exp_list assumes a default of _b if command stores results in e() and of all the scalars if command stores results in r() and not in e(). Otherwise, not specifying an expression in exp_list is an error.

Suppose that you have data collected at 100 consecutive points in time, numbered 1–100, and you wish to perform a rolling regression with a window size of 20 periods. Typing

. rolling _b, window(20) clear: regress depvar indepvar

causes Stata to regress depvar on indepvar using periods 1–20, store the regression coefficients (_b), run the regression using periods 2–21, and so on, finishing with a regression using periods 81–100 (the last 20 periods).

The stepsize() option specifies how far ahead the window is moved each time. For example, if you specify step(2), then command is executed on periods 1–20, and then 3–22, 5–24, etc. By default, rolling replaces the dataset in memory with the computed statistics unless the saving() option is specified, in which case the computed statistics are saved in the filename specified.
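For instance, a sketch combining stepsize() with window() and saving(); here depvar, indepvar, and the filename are placeholders of ours:

. rolling _b _se, window(20) stepsize(5) saving(myresults, replace): regress depvar indepvar

With stepsize(5), command runs on periods 1–20, 6–25, 11–30, and so on, and the replicates are written to myresults.dta rather than replacing the data in memory.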
If the dataset in memory has been changed since it was last saved and you do not specify saving(), you must use clear.

rolling can also perform recursive and reverse recursive analyses. In a recursive analysis, the starting date is held fixed, and the window size grows as the ending date is advanced. In a reverse recursive analysis, the ending date is held fixed, and the window size shrinks as the starting date is advanced.

Example 1

We have data on the daily returns to IBM stock (ibm), the S&P 500 (spx), and short-term interest rates (irx), and we want to create a series containing the beta of IBM by using the previous 200 trading days at each date. We will also record the standard errors, so that we can obtain 95% confidence intervals for the betas. See, for example, Stock and Watson (2011, 118) for more information on estimating betas. We type

. use http://www.stata-press.com/data/r14/ibm
(Source: Yahoo! Finance)
. tsset t
        time variable:  t, 1 to 494
                delta:  1 unit
. generate ibmadj = ibm - irx
(1 missing value generated)
. generate spxadj = spx - irx
(1 missing value generated)
. rolling _b _se, window(200) saving(betas, replace) keep(date):
> regress ibmadj spxadj
(running regress on estimation sample)
Rolling replications (295)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
..................................................   100
..................................................   150
..................................................   200
..................................................   250
.............................................
file betas.dta saved

Our dataset has both a time variable t that runs consecutively and a date variable date that measures the calendar date and therefore has gaps at weekends and holidays. Had we used the date variable as our time variable, rolling would have used windows consisting of 200 calendar days instead of 200 trading days, and each window would not have exactly 200 observations. We used the keep(date) option so that we could refer to the date variable when working with the results dataset.

We can list a portion of the dataset created by rolling to see what it contains:

. use betas, clear
(rolling: regress)
. sort date
. list in 1/3, abbreviate(10)

     +------------------------------------------------------------------------+
     | start   end        date   _b_spxadj     _b_cons   _se_spxadj   _se_cons |
     |------------------------------------------------------------------------|
  1. |     1   200   16oct2003    1.043422   -.0181504     .0658531   .0748295 |
  2. |     2   201   17oct2003    1.039024   -.0126876     .0656893    .074609 |
  3. |     3   202   20oct2003    1.038371   -.0235616     .0654591   .0743851 |
     +------------------------------------------------------------------------+

The variables start and end indicate the first and last observations used each time that rolling called regress, and the date variable contains the calendar date corresponding to the period represented by end. The remaining variables are the estimated coefficients and standard errors from the regression. In our example, _b_spxadj contains the estimated betas, and _b_cons contains the estimated alphas. The variables _se_spxadj and _se_cons have the corresponding standard errors.

Finally, we compute the confidence intervals for the betas and examine how they have changed over time:

. generate lower = _b_spxadj - 1.96*_se_spxadj
. generate upper = _b_spxadj + 1.96*_se_spxadj
. twoway (line _b_spxadj date) (rline lower upper date) if date>=td(1oct2003),
> ytitle("Beta")

(figure omitted: _b[spxadj] with the lower/upper 95% confidence band, plotted against date from 01oct2003 through 01jan2005)

As 2004 progressed, IBM's stock returns were less influenced by returns in the broader market. Beginning in June of 2004, IBM's beta became significantly different from unity at the 95% confidence level, as indicated by the fact that the confidence interval does not contain one from then onward.
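If you want these estimates alongside the original data, say, to compare the betas with other daily series, one approach is to merge on the calendar date recorded by keep(date). A sketch (ours; it assumes date uniquely identifies observations in both datasets):

. * Sketch: merge the rolling estimates back into the original dataset.
. use http://www.stata-press.com/data/r14/ibm, clear
. merge 1:1 date using betas, keep(master match) nogenerate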
In addition to rolling-window analyses, rolling can also perform recursive ones. Suppose again that you have data collected at 100 consecutive points in time, and now you type

. rolling _b, window(20) recursive clear: regress depvar indepvar

Stata will first regress depvar on indepvar by using observations 1–20, store the coefficients, run the regression using observations 1–21, observations 1–22, and so on, finishing with a regression using all 100 observations. Unlike a rolling regression, in which case the number of observations is held constant and the starting and ending points are shifted, a recursive regression holds the starting point fixed and increases the number of observations. Recursive analyses are often used in forecasting situations. As time goes by, more information becomes available that can be used in making forecasts. See Kmenta (1997, 423–424).

Example 2

Using the same dataset, we type

. use http://www.stata-press.com/data/r14/ibm, clear
(Source: Yahoo! Finance)
. tsset t
        time variable:  t, 1 to 494
                delta:  1 unit
. generate ibmadj = ibm - irx
(1 missing value generated)
. generate spxadj = spx - irx
(1 missing value generated)
. rolling _b _se, recursive window(200) clear: regress ibmadj spxadj
 (output omitted )
. list in 1/3, abbrev(10)

     +----------------------------------------------------------------+
     | start   end   _b_spxadj     _b_cons   _se_spxadj      _se_cons |
     |----------------------------------------------------------------|
  1. |     1   200    1.043422   -.0181504     .0658531      .0748295 |
  2. |     1   201    1.039024   -.0126876     .0656893       .074609 |
  3. |     1   202    1.037687    -.016475     .0655896      .0743481 |
     +----------------------------------------------------------------+

Here the starting period remains fixed and the window grows larger.

In a reverse recursive analysis, the ending date is held fixed, and the window size becomes smaller as the starting date is advanced. For example, with a dataset that has observations numbered 1–100, typing

. rolling _b, window(20) reverse recursive clear: regress depvar indepvar

creates a dataset in which the first observation has the results based on periods 1–100, the second observation has the results based on 2–100, the third having 3–100, and so on, up to the last observation having results based on periods 81–100 (the last 20 observations).

Example 3

Using the data on stock returns, we want to build a model in which we predict today's IBM stock return on the basis of yesterday's returns on IBM and the S&P 500. That is, letting i_t and s_t denote the returns to IBM and the S&P 500 on date t, we want to fit the regression model

$$ i_t = \beta_0 + \beta_1 i_{t-1} + \beta_2 s_{t-1} + \epsilon_t $$

where ε_t is a regression error term, and then compute

$$ \hat{i}_{t+1} = \hat{\beta}_0 + \hat{\beta}_1 i_t + \hat{\beta}_2 s_t $$

We will use recursive regression because we suspect that the more data we have to fit the regression model, the better the model will predict returns. We will use at least 20 periods in fitting the regression.

. use http://www.stata-press.com/data/r14/ibm, clear
(Source: Yahoo! Finance)
. tsset t
        time variable:  t, 1 to 494
                delta:  1 unit

One alternative would be to use rolling with the recursive option to fit the regressions, collect the coefficients, and then compute the predicted values afterward. However, we will instead write a short program that computes the forecasts automatically and then use rolling, recursive on that program. The program must accept an if expression so that rolling can indicate to the program which observations are to be used.
Our program is

program myforecast, rclass
        syntax [if]
        regress ibm L.ibm L.spx `if'
        // Find last time period of estimation sample and
        // make forecast for period just after that
        summ t if e(sample)
        local last = r(max)
        local fcast = _b[_cons] + _b[L.ibm]*ibm[`last'] + ///
                _b[L.spx]*spx[`last']
        return scalar forecast = `fcast'
        // Next period's actual return
        // Will return missing value for final period
        return scalar actual = ibm[`last'+1]
end

Now we call rolling:

. rolling actual=r(actual) forecast=r(forecast), recursive window(20): myforecast
 (output omitted )
. corr actual forecast
(obs=474)

             |   actual forecast
-------------+------------------
      actual |   1.0000
    forecast |  -0.0957   1.0000

Our model does not work too well: the correlation between actual returns and our forecasts is negative.

Stored results

rolling sets no r- or e-class macros. The results from the command used with rolling, depending on the last window of data used, are available after rolling has finished.

Acknowledgment

We thank Christopher F. Baum of the Department of Economics at Boston College and author of the Stata Press books An Introduction to Modern Econometrics Using Stata and An Introduction to Stata Programming for an earlier rolling regression command.

References

Kmenta, J. 1997. Elements of Econometrics. 2nd ed. Ann Arbor: University of Michigan Press.
Stock, J. H., and M. W. Watson. 2011. Introduction to Econometrics. 3rd ed. Boston: Addison–Wesley.

Also see

[D] statsby — Collect statistics for a command across a by list
[R] stored results — Stored results

Title

sspace — State-space models

Description   Quick start   Menu   Syntax   Options   Remarks and examples   Stored results   Methods and formulas   References   Also see

Description

sspace estimates the parameters of linear state-space models by maximum likelihood. Linear state-space models are very flexible, and many linear time-series models can be written as linear state-space models.

sspace uses two forms of the Kalman filter to recursively obtain conditional means and variances of both the unobserved states and the measured dependent variables that are used to compute the likelihood.

The covariance-form syntax and the error-form syntax of sspace reflect the two different forms in which researchers specify state-space models. Choose the syntax that is easier for you; the two forms are isomorphic.

Quick start

AR(1) model for y with unobserved state u modeled as a lag of itself in the state equation and with the coefficient on u constrained to 1 in the observation equation
    constraint 1 [y]u = 1
    sspace (u L.u, state noconstant) (y u, noerror), constraints(1)

Dynamic-factor model of the first difference of y1, y2, and y3 as linear functions of an unobserved factor that follows a first-order autoregressive process
    constraint 1 [y1]u = 1
    sspace (u L.u, state noconstant) (d.y1 u) (d.y2 u) (d.y3 u), nolog

Menu

Statistics > Multivariate time series > State-space models

Syntax

Covariance-form syntax

    sspace state_ceq [state_ceq ... state_ceq] obs_ceq [obs_ceq ... obs_ceq] [if] [in] [, options]

where each state_ceq is of the form

    (statevar [lagged_statevars] [indepvars] [, state noerror noconstant])

and each obs_ceq is of the form

    (depvar [statevars] [indepvars] [, noerror noconstant])
Error-form syntax

    sspace state_efeq [state_efeq ... state_efeq] obs_efeq [obs_efeq ... obs_efeq] [if] [in] [, options]

where each state_efeq is of the form

    (statevar [lagged_statevars] [indepvars] [state_errors] [, state noconstant])

and each obs_efeq is of the form

    (depvar [statevars] [indepvars] [obs_errors] [, noconstant])

statevar is the name of an unobserved state, not a variable. If there happens to be a variable of the same name, the variable is ignored and plays no role in the estimation.

lagged_statevars is a list of lagged statevars. Only first lags are allowed.

state_errors is a list of state-equation errors that enter a state equation. Each state error has the form e.statevar, where statevar is the name of a state in the model.

obs_errors is a list of observation-equation errors that enter an equation for an observed variable. Each error has the form e.depvar, where depvar is an observed dependent variable in the model.

equation-level options   Description
---------------------------------------------------------------------------
Model
  state                  specifies that the equation is a state equation
  noerror                specifies that there is no error term in the equation
  noconstant             suppresses the constant term from the equation
---------------------------------------------------------------------------

options                    Description
---------------------------------------------------------------------------
Model
  covstate(covform)        specifies the covariance structure for the errors
                             in the state variables
  covobserved(covform)     specifies the covariance structure for the errors
                             in the observed dependent variables
  constraints(constraints) apply specified linear constraints
SE/Robust
  vce(vcetype)             vcetype may be oim or robust
Reporting
  level(#)                 set confidence level; default is level(95)
  nocnsreport              do not display constraints
  display_options          control columns and column formats, row spacing,
                             display of omitted variables and base and empty
                             cells, and factor-variable labeling
Maximization
  maximize_options         control the maximization process; seldom used
Advanced
  method(method)           specify the method for calculating the log
                             likelihood; seldom used
  coeflegend               display legend instead of statistics
---------------------------------------------------------------------------

covform        Description
---------------------------------------------------------------------------
identity       identity matrix; the default for the error-form syntax
dscalar        diagonal scalar matrix
diagonal       diagonal matrix; the default for the covariance-form syntax
unstructured   symmetric, positive-definite matrix; not allowed with the
                 error-form syntax
---------------------------------------------------------------------------

method         Description
---------------------------------------------------------------------------
hybrid         use the stationary Kalman filter and the De Jong diffuse
                 Kalman filter; the default
dejong         use the stationary De Jong Kalman filter and the De Jong
                 diffuse Kalman filter
kdiffuse       use the stationary Kalman filter and the nonstationary
                 large-κ diffuse Kalman filter; seldom used
---------------------------------------------------------------------------

You must tsset your data before using sspace; see [TS] tsset.
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
indepvars and depvar may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.

Options

Equation-level options

Model

state specifies that the equation is a state equation.

noerror specifies that there is no error term in the equation. noerror may not be specified in the error-form syntax.

noconstant suppresses the constant term from the equation.

Options

Model

covstate(covform) specifies the covariance structure for the state errors.

    covstate(identity) specifies a covariance matrix equal to an identity matrix, and it is the default for the error-form syntax.
    covstate(dscalar) specifies a covariance matrix equal to σ²_state times an identity matrix.

    covstate(diagonal) specifies a diagonal covariance matrix, and it is the default for the covariance-form syntax.

    covstate(unstructured) specifies a symmetric, positive-definite covariance matrix with parameters for all variances and covariances. covstate(unstructured) may not be specified with the error-form syntax.

covobserved(covform) specifies the covariance structure for the observation errors.

    covobserved(identity) specifies a covariance matrix equal to an identity matrix, and it is the default for the error-form syntax.

    covobserved(dscalar) specifies a covariance matrix equal to σ²_observed times an identity matrix.

    covobserved(diagonal) specifies a diagonal covariance matrix, and it is the default for the covariance-form syntax.

    covobserved(unstructured) specifies a symmetric, positive-definite covariance matrix with parameters for all variances and covariances. covobserved(unstructured) may not be specified with the error-form syntax.

constraints(constraints); see [R] estimation options.

SE/Robust

vce(vcetype) specifies the estimator for the variance–covariance matrix of the estimator.

    vce(oim), the default, causes sspace to use the observed information matrix estimator.

    vce(robust) causes sspace to use the Huber/White/sandwich estimator.

Reporting

level(#), nocnsreport; see [R] estimation options.

display_options: noci, nopvalues, noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), and sformat(%fmt); see [R] estimation options.

Maximization

maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace, gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#), nrtolerance(#), and from(matname); see [R] maximize for all options except from(), and see below for information on from(). These options are seldom used.

    from(matname) specifies initial values for the maximization process. from(b0) causes sspace to begin the maximization algorithm with the values in b0. b0 must be a row vector; the number of columns must equal the number of parameters in the model; and the values in b0 must be in the same order as the parameters in e(b).

Advanced

method(method) specifies how to compute the log likelihood. This option is seldom used.

    method(hybrid), the default, uses the Kalman filter with model-based initial values for the states when the model is stationary and uses the De Jong (1988, 1991) diffuse Kalman filter when the model is nonstationary.

    method(dejong) uses the Kalman filter with the De Jong (1988) method for estimating the initial values for the states when the model is stationary and uses the De Jong (1988, 1991) diffuse Kalman filter when the model is nonstationary.

    method(kdiffuse) is a seldom used method that uses the Kalman filter with model-based initial values for the states when the model is stationary and uses the large-κ diffuse Kalman filter when the model is nonstationary.

The following option is available with sspace but is not shown in the dialog box:

coeflegend; see [R] estimation options.
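As an illustration of vce(robust), the following sketch refits the AR(1) specification from the Quick start with the Huber/White/sandwich VCE, which, as noted in Remarks and examples below, yields the quasi-maximum likelihood estimator. Whether this model suits your data is, of course, an assumption.

. * Sketch: the Quick-start AR(1) model with robust (QML) standard errors.
. constraint 1 [y]u = 1
. sspace (u L.u, state noconstant) (y u, noerror), constraints(1) vce(robust)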
Remarks and examples

Remarks are presented under the following headings:

    An introduction to state-space models
    Some stationary state-space models
    Some nonstationary state-space models

An introduction to state-space models

Many linear time-series models can be written as linear state-space models, including vector autoregressive moving-average (VARMA) models, dynamic-factor (DF) models, and structural time-series (STS) models. The solutions to some stochastic dynamic-programming problems can also be written in the form of linear state-space models. We can estimate the parameters of a linear state-space model by maximum likelihood (ML). The Kalman filter or a diffuse Kalman filter is used to write the likelihood function in prediction-error form, assuming normally distributed errors. The quasi-maximum likelihood (QML) estimator, which drops the normality assumption, is consistent and asymptotically normal when the model is stationary. Chang, Miller, and Park (2009) establish consistency and asymptotic normality of the QML estimator for a class of nonstationary state-space models. The QML estimator differs from the ML estimator only in the VCE; specify the vce(robust) option to obtain the QML estimator.

Hamilton (1994a, 1994b), Harvey (1989), and Brockwell and Davis (1991) provide good introductions to state-space models. Anderson and Moore's (1979) text is a classic reference; they produced many results used subsequently. Caines (1988) and Hannan and Deistler (1988) provide excellent, more advanced, treatments.

sspace estimates linear state-space models with time-invariant coefficient matrices, which cover the models listed above and many others. sspace can estimate parameters from state-space models of the form

    z_t = A z_{t−1} + B x_t + C ε_t
    y_t = D z_t + F w_t + G ν_t

where

    z_t is an m × 1 vector of unobserved state variables;
    x_t is a k_x × 1 vector of exogenous variables;
    ε_t is a q × 1 vector of state-error terms, (q ≤ m);
    y_t is an n × 1 vector of observed endogenous variables;
    w_t is a k_w × 1 vector of exogenous variables;
    ν_t is an r × 1 vector of observation-error terms, (r ≤ n); and
    A, B, C, D, F, and G are parameter matrices.

The equations for z_t are known as the state equations, and the equations for y_t are known as the observation equations.

The error terms are assumed to be zero mean, normally distributed, serially uncorrelated, and uncorrelated with each other;

    ε_t ~ N(0, Q)
    ν_t ~ N(0, R)
    E[ε_t ε_s′] = 0 for all s ≠ t
    E[ε_t ν_s′] = 0 for all s and t

The state-space form is used to derive the log likelihood of the observed endogenous variables conditional on their own past and any exogenous variables. When the model is stationary, a method for recursively predicting the current values of the states and the endogenous variables, known as the Kalman filter, is used to obtain the prediction-error form of the log-likelihood function. When the model is nonstationary, a diffuse Kalman filter is used. How the Kalman filter and the diffuse Kalman filter initialize their recursive computations depends on the method() option; see Methods and formulas.

The linear state-space models with time-invariant coefficient matrices defined above can be specified in the covariance-form syntax and the error-form syntax. The covariance-form syntax requires that C and G be selection matrices, but places no restrictions on Q or R. In contrast, the error-form syntax places no restrictions on C or G but requires that Q and R be either diagonal, diagonal-scalar, or identity matrices.
Some models are more easily specified in the covariance-form syntax, while others are more easily specified in the error-form syntax. Choose the syntax that is easiest for your application.

Some stationary state-space models

Example 1: An AR(1) model

Following Hamilton (1994b, 373–374), we can write the first-order autoregressive (AR(1)) model

    y_t − μ = α(y_{t−1} − μ) + ε_t

as a state-space model with the observation equation

    y_t = μ + u_t

and the state equation

    u_t = α u_{t−1} + ε_t

where the unobserved state is u_t = y_t − μ.

Here we fit this model to data on the capacity utilization rate. The variable lncaputil contains data on the natural log of the capacity utilization rate for the manufacturing sector of the U.S. economy. We treat the series as first-difference stationary and fit its first difference to an AR(1) process. Here we estimate the parameters of the above state-space form of the AR(1) model:

. use http://www.stata-press.com/data/r14/manufac
(St. Louis Fed (FRED) manufacturing data)
. constraint 1 [D.lncaputil]u = 1
. sspace (u L.u, state noconstant) (D.lncaputil u, noerror), constraints(1)
searching for initial values ...........
(setting technique to bhhh)
Iteration 0:   log likelihood =  1515.8693
Iteration 1:   log likelihood =  1516.4187
 (output omitted )
Refining estimates:
Iteration 0:   log likelihood =    1516.44
Iteration 1:   log likelihood =    1516.44
State-space model
Sample: 1972m2 - 2008m12                        Number of obs     =        443
                                                Wald chi2(1)      =      61.73
Log likelihood =   1516.44                      Prob > chi2       =     0.0000
 ( 1)  [D.lncaputil]u = 1

                              OIM
   lncaputil      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
 u
           u
         L1.   .3523983   .0448539     7.86   0.000     .2644862    .4403104
 D.lncaputil
           u          1  (constrained)
       _cons  -.0003558   .0005781    -0.62   0.538     -.001489    .0007773
      var(u)   .0000622   4.18e-06    14.88   0.000      .000054    .0000704

Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.

The iteration log has three parts: the dots from the search for initial values, the log from finding the maximum, and the log from a refining step. Here is a description of the logic behind each part:

1. The quality of the initial values affects the speed and robustness of the optimization algorithm. sspace takes a few iterations in a nonlinear least-squares (NLS) algorithm to find good initial values and reports a dot for each NLS iteration.

2. This iteration log is the standard method by which Stata reports the search for the maximum likelihood estimates of the parameters in a nonlinear model.

3. Some of the parameters are transformed in the maximization process that sspace reports in part 2. After a maximum candidate is found in part 2, sspace looks for a maximum in the unconstrained space, checks that the Hessian of the log-likelihood function is of full rank, and reports these iterations as the refining step.

The header in the output describes the estimation sample, reports the log-likelihood function at the maximum, and gives the results of a Wald test against the null hypothesis that the coefficients on all the independent variables, state variables, and lagged state variables are zero. In this example, the null hypothesis that the coefficient on L1.u is zero is rejected at all conventional levels.

The estimation table reports results for the state equations, the observation equations, and the variance–covariance parameters.
The estimated autoregressive coefficient of 0.3524 indicates that there is persistence in the first differences of the log of the manufacturing capacity utilization rate. The estimated mean of the differenced series is −0.0004, which is smaller in magnitude than its standard error, indicating that there is no deterministic linear trend in the series.

Typing

. arima D.lncaputil, ar(1) technique(nr)
 (output omitted )

produces nearly identical parameter estimates and standard errors for the mean and the autoregressive parameter. Because sspace estimates the variance of the state error while arima estimates the standard deviation, calculations are required to obtain the same results. The different parameterization of the variance parameter can cause small numerical differences.
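For instance, the state-error standard deviation implied by the sspace fit can be recovered by hand; this minimal illustration simply takes the square root of the var(u) estimate reported above and should match the standard deviation that arima reports, up to the small numerical differences just described:

. display sqrt(.0000622)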
Technical note

In some situations, the second part of the iteration log terminates but the refining step never converges. Only when the refining step converges does the maximization algorithm find interpretable estimates. If the refining step iterates without convergence, the parameters of the specified model are not identified by the data. (See Rothenberg [1971], Drukker and Wiggins [2004], and Davidson and MacKinnon [1993, sec. 5.2] for discussions of identification.)

Example 2: An ARMA(1,1) model

Following Harvey (1993, 95–96), we can write a zero-mean, first-order, autoregressive moving-average (ARMA(1,1)) model

    y_t = α y_{t−1} + θ ε_{t−1} + ε_t                              (1)

as a state-space model with state equations

    ( y_t   )   ( α  1 ) ( y_{t−1}   )   ( 1 )
    ( θε_t  ) = ( 0  0 ) ( θε_{t−1}  ) + ( θ ) ε_t                 (2)

and observation equation

    y_t = ( 1  0 ) ( y_t  )
                   ( θε_t )                                        (3)

The unobserved states in this model are u_{1t} = y_t and u_{2t} = θε_t. We set the process mean to zero because economic theory and the previous example suggest that we should do so. Below we estimate the parameters in the state-space model by using the error-form syntax:

. constraint 2 [u1]L.u2 = 1
. constraint 3 [u1]e.u1 = 1
. constraint 4 [D.lncaputil]u1 = 1
. sspace (u1 L.u1 L.u2 e.u1, state noconstant) (u2 e.u1, state noconstant)
>        (D.lncaputil u1, noconstant), constraints(2/4) covstate(diagonal)
searching for initial values ..........
(setting technique to bhhh)
Iteration 0:   log likelihood =  1478.5361
Iteration 1:   log likelihood =  1490.5202
 (output omitted )
Refining estimates:
Iteration 0:   log likelihood =   1531.255
Iteration 1:   log likelihood =   1531.255
State-space model
Sample: 1972m2 - 2008m12                        Number of obs     =        443
                                                Wald chi2(2)      =     333.84
Log likelihood =  1531.255                      Prob > chi2       =     0.0000
 ( 1)  [u1]L.u2 = 1
 ( 2)  [u1]e.u1 = 1
 ( 3)  [D.lncaputil]u1 = 1

                              OIM
   lncaputil      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
 u1
          u1
         L1.   .8056815   .0522661    15.41   0.000     .7032418    .9081212
          u2
         L1.          1  (constrained)
        e.u1          1  (constrained)
 u2
        e.u1  -.5188453   .0701985    -7.39   0.000    -.6564317   -.3812588
 D.lncaputil
          u1          1  (constrained)
     var(u1)   .0000582   3.91e-06    14.88   0.000     .0000505    .0000659

Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.

The command in the above output specifies two state equations, one observation equation, and two options. The first state equation defines u_{1t} and the second defines u_{2t} according to (2) above. The observation equation defines the process for D.lncaputil according to the one specified in (3) above. Several coefficients in (2) and (3) are set to 1, and constraints 2–4 place these restrictions on the model.

The estimated coefficient on L.u1 in equation u1, 0.806, is the estimate of α in (2), which is the autoregressive coefficient in the ARMA model in (1). The estimated coefficient on e.u1 in equation u2, −0.519, is the estimate of θ, which is the moving-average term in the ARMA model in (1).

This example highlights a difference between the error-form syntax and the covariance-form syntax. The error-form syntax used in this example includes only explicitly included errors. In contrast, the covariance-form syntax includes an error term in each equation, unless the noerror option is specified.

The default for covstate() also differs between the error-form syntax and the covariance-form syntax. Because the coefficients on the errors in the error-form syntax are frequently used to estimate the standard deviation of the errors, covstate(identity) is the default for the error-form syntax. In contrast, unit variances are less common in the covariance-form syntax, for which covstate(diagonal) is the default. In this example, we specified covstate(diagonal) to estimate a nonunitary variance for the state.

Typing

. arima D.lncaputil, noconstant ar(1) ma(1) technique(nr)
 (output omitted )

produces nearly identical results. As in the AR(1) example above, arima estimates the standard deviation of the error term, while sspace estimates the variance. Although they are theoretically equivalent, the different parameterizations give rise to small numerical differences in the other parameters.

Example 3: A VAR(1) model

The variable lnhours contains data on the log of manufacturing hours, which we treat as first-difference stationary. We have a theory in which the process driving the changes in the log utilization rate affects the changes in the log of hours, but changes in the log hours do not affect changes in the log utilization rate. In line with this theory, we estimate the parameters of a lower triangular, first-order vector autoregressive (VAR(1)) process

    ( Δlncaputil_t )   ( α1  0  ) ( Δlncaputil_{t−1} )   ( ε_{1t} )
    ( Δlnhours_t   ) = ( α2  α3 ) ( Δlnhours_{t−1}   ) + ( ε_{2t} )        (4)

where Δy_t = y_t − y_{t−1}, ε_t = (ε_{1t}, ε_{2t})′, and Var(ε) = Σ. We can write this VAR(1) process as a state-space model with state equations

    ( u_{1t} )   ( α1  0  ) ( u_{1(t−1)} )   ( ε_{1t} )
    ( u_{2t} ) = ( α2  α3 ) ( u_{2(t−1)} ) + ( ε_{2t} )                    (5)

with Var(ε) = Σ and observation equations

    ( Δlncaputil_t )   ( u_{1t} )
    ( Δlnhours_t   ) = ( u_{2t} )

Below we estimate the parameters of the state-space model:

. constraint 5 [D.lncaputil]u1 = 1
. constraint 6 [D.lnhours]u2 = 1
. sspace (u1 L.u1, state noconstant)
>        (u2 L.u1 L.u2, state noconstant)
>        (D.lncaputil u1, noconstant noerror)
>        (D.lnhours u2, noconstant noerror),
>        constraints(5/6) covstate(unstructured)
searching for initial values ..........
(setting technique to bhhh)
Iteration 0:   log likelihood =  2789.6095
Iteration 1:   log likelihood =  2957.8299
 (output omitted )
Refining estimates:
Iteration 0:   log likelihood =  3211.7532
Iteration 1:   log likelihood =  3211.7532
State-space model
Sample: 1972m2 - 2008m12                        Number of obs     =        443
                                                Wald chi2(3)      =     166.87
Log likelihood =  3211.7532                     Prob > chi2       =     0.0000
 ( 1)  [D.lncaputil]u1 = 1
 ( 2)  [D.lnhours]u2 = 1

                              OIM
                  Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
 u1
          u1
         L1.    .353257   .0448456     7.88   0.000     .2653612    .4411528
 u2
          u1
         L1.   .1286218   .0394742     3.26   0.001     .0512537    .2059899
          u2
         L1.  -.3707083   .0434255    -8.54   0.000    -.4558208   -.2855959
 D.lncaputil
          u1          1  (constrained)
 D.lnhours
          u2          1  (constrained)
     var(u1)   .0000623   4.19e-06    14.88   0.000     .0000541    .0000705
  cov(u1,u2)    .000026   2.67e-06     9.75   0.000     .0000208    .0000312
     var(u2)   .0000386   2.61e-06    14.76   0.000     .0000335    .0000437

Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.
Specifying covstate(unstructured) caused sspace to estimate the off-diagonal element of Σ. The output indicates that this parameter, cov(u2,u1):_cons, is small but statistically significant.

The estimated coefficient on L.u1 in equation u1, 0.353, is the estimate of α1 in (5). The estimated coefficient on L.u1 in equation u2, 0.129, is the estimate of α2 in (5). The estimated coefficient on L.u2 in equation u2, −0.371, is the estimate of α3 in (5).

For the VAR(1) model in (4), the estimated autoregressive coefficient for D.lncaputil is similar to the corresponding estimate in the univariate results in example 1. The estimated effect of LD.lncaputil on D.lnhours is 0.129, the estimated autoregressive coefficient of D.lnhours is −0.371, and both are statistically significant.

These estimates can be compared with those produced by typing

. constraint 101 [D_lncaputil]LD.lnhours = 0
. var D.lncaputil D.lnhours, lags(1) noconstant constraints(101)
 (output omitted )
. matrix list e(Sigma)
 (output omitted )

The var estimates are not the same as the sspace estimates because the generalized least-squares estimator implemented in var is only asymptotically equivalent to the ML estimator implemented in sspace, but the point estimates are similar. The comparison is useful for pedagogical purposes because the var estimator is relatively simple.

Some problems require constraining a covariance term to zero. If we wanted to constrain cov(u2,u1):_cons to zero, we could type

. constraint 7 [cov(u2,u1)]_cons = 0
. sspace (u1 L.u1, state noconstant)
>        (u2 L.u1 L.u2, state noconstant)
>        (D.lncaputil u1, noconstant noerror)
>        (D.lnhours u2, noconstant noerror),
>        constraints(5/7) covstate(unstructured)
 (output omitted )

Example 4: A VARMA(1,1) model

We now extend the previous example by modeling D.lncaputil and D.lnhours as a first-order vector autoregressive moving-average (VARMA(1,1)) process. Building on the previous examples, we allow the lag of D.lncaputil to affect D.lnhours, but we do not allow the lag of D.lnhours to affect D.lncaputil. Previous univariate analysis revealed that D.lnhours is better modeled as an autoregressive process than as an ARMA(1,1) process. As a result, we estimate the parameters of

    ( Δlncaputil_t )   ( α1  0  ) ( Δlncaputil_{t−1} )   ( θ1  0 ) ( ε_{1(t−1)} )   ( ε_{1t} )
    ( Δlnhours_t   ) = ( α2  α3 ) ( Δlnhours_{t−1}   ) + ( 0   0 ) ( ε_{2(t−1)} ) + ( ε_{2t} )

We can write this VARMA(1,1) process as a state-space model with state equations

    ( s_{1t} )   ( α1  1  0  ) ( s_{1(t−1)} )   ( 1   0 )
    ( s_{2t} ) = ( 0   0  0  ) ( s_{2(t−1)} ) + ( θ1  0 ) ( ε_{1t} )
    ( s_{3t} )   ( α2  0  α3 ) ( s_{3(t−1)} )   ( 0   1 ) ( ε_{2t} )

where the states are

    ( s_{1t} )   ( Δlncaputil_t )
    ( s_{2t} ) = ( θ1 ε_{1t}    )
    ( s_{3t} )   ( Δlnhours_t   )

and we simplify the problem by assuming that

    Var( ε_{1t} )   ( σ1²   0  )
       ( ε_{2t} ) = ( 0    σ2² )

Below we estimate the parameters of this model by using sspace:

. constraint 7  [u1]L.u2 = 1
. constraint 8  [u1]e.u1 = 1
. constraint 9  [u3]e.u3 = 1
. constraint 10 [D.lncaputil]u1 = 1
. constraint 11 [D.lnhours]u3 = 1
. sspace (u1 L.u1 L.u2 e.u1, state noconstant)
>        (u2 e.u1, state noconstant)
>        (u3 L.u1 L.u3 e.u3, state noconstant)
>        (D.lncaputil u1, noconstant)
>        (D.lnhours u3, noconstant),
>        constraints(7/11) technique(nr) covstate(diagonal)
searching for initial values ..........
 (output omitted )
Refining estimates:
Iteration 0:   log likelihood =  3156.0564
Iteration 1:   log likelihood =  3156.0564
State-space model
Sample: 1972m2 - 2008m12                        Number of obs     =        443
                                                Wald chi2(4)      =     427.55
Log likelihood =  3156.0564                     Prob > chi2       =     0.0000
 ( 1)  [u1]L.u2 = 1
 ( 2)  [u1]e.u1 = 1
 ( 3)  [u3]e.u3 = 1
 ( 4)  [D.lncaputil]u1 = 1
 ( 5)  [D.lnhours]u3 = 1

                              OIM
                  Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
 u1
          u1
         L1.   .8058031   .0522493    15.42   0.000     .7033964    .9082098
          u2
         L1.          1  (constrained)
        e.u1          1  (constrained)
 u2
        e.u1   -.518907   .0701848    -7.39   0.000    -.6564667   -.3813474
 u3
          u1
         L1.   .1734868   .0405156     4.28   0.000     .0940776     .252896
          u3
         L1.  -.4809376   .0498574    -9.65   0.000    -.5786563   -.3832188
        e.u3          1  (constrained)
 D.lncaputil
          u1          1  (constrained)
 D.lnhours
          u3          1  (constrained)
     var(u1)   .0000582   3.91e-06    14.88   0.000     .0000505    .0000659
     var(u3)   .0000382   2.56e-06    14.88   0.000     .0000331    .0000432

Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.

The estimates of the parameters in the model for D.lncaputil are similar to those in the univariate model fit in example 2. The estimates of the parameters in the model for D.lnhours indicate that the lag of D.lncaputil has a positive effect on D.lnhours.

Technical note

The technique(nr) option facilitates convergence in example 4. Fitting state-space models is notoriously difficult, and convergence problems are common. Four methods for overcoming convergence problems are 1) selecting an alternate optimization algorithm by using the technique() option, 2) using alternative starting values by specifying the from() option, 3) using starting values obtained by estimating the parameters of a restricted version of the model of interest, or 4) putting the variables on the same scale.

Example 5: A dynamic-factor model

Stock and Watson (1989, 1991) wrote a simple macroeconomic model as a dynamic-factor model, estimated the parameters by ML, and extracted an economic indicator. In this example, we estimate the parameters of a dynamic-factor model. In [TS] sspace postestimation, we extend this example and extract an economic indicator for the differenced series.

We have data on an industrial-production index, ipman; an aggregate weekly hours index, hours; and aggregate unemployment, unemp. income is real disposable income divided by 100. We rescaled real disposable income to avoid convergence problems.

We postulate a latent factor that follows an AR(2) process. Each measured variable is then related to the current value of that latent variable by a parameter. The state-space form of our model is

    ( f_t     )   ( θ1  θ2 ) ( f_{t−1} )   ( ν_t )
    ( f_{t−1} ) = ( 1   0  ) ( f_{t−2} ) + ( 0   )

    ( Δipman_t  )   ( γ1 )       ( ε_{1t} )
    ( Δincome_t ) = ( γ2 ) f_t + ( ε_{2t} )
    ( Δhours_t  )   ( γ3 )       ( ε_{3t} )
    ( Δunemp_t  )   ( γ4 )       ( ε_{4t} )

where

    Var( (ε_{1t}, ε_{2t}, ε_{3t}, ε_{4t})′ ) = diag(σ1², σ2², σ3², σ4²)

The parameter estimates are

. use http://www.stata-press.com/data/r14/dfex
(St. Louis Fed (FRED) macro data)
. constraint 12 [lf]L.f = 1
. sspace (f L.f L.lf, state noconstant)
>        (lf L.f, state noconstant noerror)
>        (D.ipman f, noconstant)
>        (D.income f, noconstant)
>        (D.hours f, noconstant)
>        (D.unemp f, noconstant),
>        covstate(identity) constraints(12)
searching for initial values .................
(setting technique to bhhh)
Iteration 0:   log likelihood = -674.18497
Iteration 1:   log likelihood = -667.23913
 (output omitted )
Refining estimates:
Iteration 0:   log likelihood = -662.09507
Iteration 1:   log likelihood = -662.09507
State-space model
Sample: 1972m2 - 2008m11                        Number of obs     =        442
                                                Wald chi2(6)      =     751.95
Log likelihood = -662.09507                     Prob > chi2       =     0.0000
 ( 1)  [lf]L.f = 1

                               OIM
                   Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
 f
            f
          L1.   .2651932   .0568663     4.66   0.000     .1537372    .3766491
           lf
          L1.   .4820398   .0624635     7.72   0.000     .3596136     .604466
 lf
            f
          L1.          1  (constrained)
 D.ipman
            f   .3502249   .0287389    12.19   0.000     .2938976    .4065522
 D.income
            f   .0746338   .0217319     3.43   0.001     .0320401    .1172276
 D.hours
            f   .2177469   .0186769    11.66   0.000     .1811407     .254353
 D.unemp
            f  -.0676016   .0071022    -9.52   0.000    -.0815217   -.0536816
 var(D.ipman)   .1383158   .0167086     8.28   0.000     .1055675    .1710641
 var(D.income)  .2773808   .0188302    14.73   0.000     .2404743    .3142873
 var(D.hours)   .0911446   .0080847    11.27   0.000     .0752988    .1069903
 var(D.unemp)   .0237232   .0017932    13.23   0.000     .0202086    .0272378

Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.

The output indicates that the unobserved factor is quite persistent and that it is a significant predictor for each of the observed variables.

These models are frequently used to forecast the dependent variables and to estimate the unobserved factors. We present some illustrative examples in [TS] sspace postestimation. The dfactor command estimates the parameters of dynamic-factor models; see [TS] dfactor.

Some nonstationary state-space models

Example 6: A local-level model

Harvey (1989) advocates the use of STS models. These models parameterize the trends and seasonal components of a set of time series. The simplest STS model is the local-level model, which is given by

    y_t = μ_t + ε_t

where

    μ_t = μ_{t−1} + ν_t

The model is called a local-level model because the level of the series is modeled as a random walk plus an idiosyncratic noise term. (The model is also known as the random-walk-plus-noise model.) The local-level model is nonstationary because of the random-walk component. When the variance of the idiosyncratic disturbance ε_t is zero and the variance of the level disturbance ν_t is not zero, the local-level model reduces to a random walk. When the variance of the level disturbance ν_t is zero and the variance of the idiosyncratic disturbance ε_t is not zero, μ_t = μ_{t−1} = μ and the local-level model reduces to

    y_t = μ + ε_t

which is a simple regression with a time-invariant mean. The parameter μ is not estimated in the state-space formulation below.

In this example, we fit weekly levels of the Standard and Poor's 500 Index to a local-level model. Because this model is already in state-space form, we fit close by typing

. use http://www.stata-press.com/data/r14/sp500w
. constraint 13 [z]L.z = 1
. constraint 14 [close]z = 1
. sspace (z L.z, state noconstant) (close z, noconstant), constraints(13 14)
searching for initial values ..........
(setting technique to bhhh)
Iteration 0:   log likelihood =  -12582.89
Iteration 1:   log likelihood = -12577.146
 (output omitted )
Refining estimates:
Iteration 0:   log likelihood =  -12576.99
Iteration 1:   log likelihood =  -12576.99
State-space model
Sample: 1 - 3093                                Number of obs     =      3,093
Log likelihood =  -12576.99
 ( 1)  [z]L.z = 1
 ( 2)  [close]z = 1

                              OIM
       close      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
 z
           z
         L1.          1  (constrained)
 close
           z          1  (constrained)
      var(z)   170.3456   7.584909    22.46   0.000     155.4794    185.2117
  var(close)   15.24858   3.392457     4.49   0.000     8.599486    21.89767
Note: Model is not stationary.
Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.

The results indicate that both components have nonzero variances. The output footer informs us that the model is nonstationary at the estimated parameter values.

Technical note

In the previous example, we estimated the parameters of a nonstationary state-space model. The model is nonstationary because one of the eigenvalues of the A matrix has unit modulus. That all the coefficients in the A matrix are fixed is also important. See Lütkepohl (2005, 636–637) for why the ML estimator for the parameters of a state-space model that is nonstationary because of eigenvalues with unit moduli from a fixed A matrix is still consistent and asymptotically normal.

Example 7: A local linear-trend model

In another basic STS model, known as the local linear-trend model, both the level and the slope of a linear time trend are random walks. Here are the state equations and the observation equation for a local linear-trend model for the level of industrial production contained in variable ipman:

    ( μ_t )   ( 1  1 ) ( μ_{t−1} )   ( ν_{1t} )
    ( β_t ) = ( 0  1 ) ( β_{t−1} ) + ( ν_{2t} )

    ipman_t = μ_t + ε_t

The estimated parameters are

. use http://www.stata-press.com/data/r14/dfex
(St. Louis Fed (FRED) macro data)
. constraint 15 [f1]L.f1 = 1
. constraint 16 [f1]L.f2 = 1
. constraint 17 [f2]L.f2 = 1
. constraint 18 [ipman]f1 = 1
. sspace (f1 L.f1 L.f2, state noconstant)
>        (f2 L.f2, state noconstant)
>        (ipman f1, noconstant), constraints(15/18)
searching for initial values ..........
(setting technique to bhhh)
Iteration 0:   log likelihood = -362.93861
Iteration 1:   log likelihood = -362.12048
 (output omitted )
Refining estimates:
Iteration 0:   log likelihood =  -359.1266
Iteration 1:   log likelihood =  -359.1266
State-space model
Sample: 1972m1 - 2008m11                        Number of obs     =        443
Log likelihood =  -359.1266
 ( 1)  [f1]L.f1 = 1
 ( 2)  [f1]L.f2 = 1
 ( 3)  [f2]L.f2 = 1
 ( 4)  [ipman]f1 = 1

                              OIM
       ipman      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
 f1
          f1
         L1.          1  (constrained)
          f2
         L1.          1  (constrained)
 f2
          f2
         L1.          1  (constrained)
 ipman
          f1          1  (constrained)
     var(f1)   .1473071   .0407156     3.62   0.000      .067506    .2271082
     var(f2)   .0178752   .0065743     2.72   0.003     .0049898    .0307606
  var(ipman)   .0354429   .0148186     2.39   0.008     .0063989    .0644868

Note: Model is not stationary.
Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.

There is little evidence that either of the variance parameters is zero. The fit obtained indicates that we could now proceed with specification testing and checks to see how well this model forecasts these data.
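After any of these fits, the system matrices and a stationarity flag are saved in e(), as documented under Stored results below. As a small illustration, continuing hypothetically from the local linear-trend fit above, one could inspect the estimated transition matrix and the stationarity indicator:

. matrix list e(A)        // the estimated A matrix of the state equations
. display e(stationary)   // 0 here, because this fitted model is nonstationary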
Stored results

sspace stores the following in e():

Scalars
  e(N)               number of observations
  e(k)               number of parameters
  e(k_aux)           number of auxiliary parameters
  e(k_eq)            number of equations in e(b)
  e(k_dv)            number of dependent variables
  e(k_obser)         number of observation equations
  e(k_state)         number of state equations
  e(k_obser_err)     number of observation-error terms
  e(k_state_err)     number of state-error terms
  e(df_m)            model degrees of freedom
  e(ll)              log likelihood
  e(chi2)            χ²
  e(p)               significance
  e(tmin)            minimum time in sample
  e(tmax)            maximum time in sample
  e(stationary)      1 if the estimated parameters indicate a stationary model, 0 otherwise
  e(rank)            rank of VCE
  e(ic)              number of iterations
  e(rc)              return code
  e(converged)       1 if converged, 0 otherwise

Macros
  e(cmd)                  sspace
  e(cmdline)              command as typed
  e(depvar)               unoperated names of dependent variables in observation equations
  e(obser_deps)           names of dependent variables in observation equations
  e(state_deps)           names of dependent variables in state equations
  e(covariates)           list of covariates
  e(indeps)               independent variables
  e(tvar)                 variable denoting time within groups
  e(eqnames)              names of equations
  e(title)                title in estimation output
  e(tmins)                formatted minimum time
  e(tmaxs)                formatted maximum time
  e(R_structure)          structure of observed-variable-error covariance matrix
  e(Q_structure)          structure of state-error covariance matrix
  e(chi2type)             Wald; type of model χ² test
  e(vce)                  vcetype specified in vce()
  e(vcetype)              title used to label Std. Err.
  e(opt)                  type of optimization
  e(method)               likelihood method
  e(initial_values)       type of initial values
  e(technique)            maximization technique
  e(tech_steps)           iterations taken in maximization technique
  e(datasignature)        the checksum
  e(datasignaturevars)    variables used in calculation of checksum
  e(properties)           b V
  e(estat_cmd)            program used to implement estat
  e(predict)              program used to implement predict
  e(marginsok)            predictions allowed by margins
  e(marginsnotok)         predictions disallowed by margins
  e(asbalanced)           factor variables fvset as asbalanced
  e(asobserved)           factor variables fvset as asobserved

Matrices
  e(b)                parameter vector
  e(Cns)              constraints matrix
  e(ilog)             iteration log (up to 20 iterations)
  e(gradient)         gradient vector
  e(gamma)            mapping from parameter vector to state-space matrices
  e(A)                estimated A matrix
  e(B)                estimated B matrix
  e(C)                estimated C matrix
  e(D)                estimated D matrix
  e(F)                estimated F matrix
  e(G)                estimated G matrix
  e(chol_R)           Cholesky factor of estimated R matrix
  e(chol_Q)           Cholesky factor of estimated Q matrix
  e(chol_Sz0)         Cholesky factor of initial state covariance matrix
  e(z0)               initial state vector augmented with a matrix identifying
                      nonstationary components
  e(d)                additional term in diffuse initial state vector, if
                      nonstationary model
  e(T)                inner part of quadratic form for initial state covariance in a
                      partially nonstationary model
  e(M)                outer part of quadratic form for initial state covariance in a
                      partially nonstationary model
  e(V)                variance–covariance matrix of the estimators
  e(V_modelbased)     model-based variance

Functions
  e(sample)           marks estimation sample

Methods and formulas

Recall that our notation for linear state-space models with time-invariant coefficient matrices is

    z_t = A z_{t−1} + B x_t + C ε_t
    y_t = D z_t + F w_t + G ν_t

where

    z_t is an m × 1 vector of unobserved state variables;
    x_t is a k_x × 1 vector of exogenous variables;
    ε_t is a q × 1 vector of state-error terms, (q ≤ m);
    y_t is an n × 1 vector of observed endogenous variables;
    w_t is a k_w × 1 vector of exogenous variables;
    ν_t is an r × 1 vector of observation-error terms, (r ≤ n); and
    A, B, C, D, F, and G are parameter matrices.
The equations for z_t are known as the state equations, and the equations for y_t are known as the observation equations. The error terms are assumed to be zero mean, normally distributed, serially uncorrelated, and uncorrelated with each other;

    ε_t ~ N(0, Q)
    ν_t ~ N(0, R)
    E[ε_t ε_s′] = 0 for all s ≠ t
    E[ε_t ν_s′] = 0 for all s and t

sspace estimates the parameters of linear state-space models by maximum likelihood. The Kalman filter is a method for recursively obtaining linear, least-squares forecasts of y_t conditional on past information. These forecasts are used to construct the log likelihood, assuming normality and stationarity. When the model is nonstationary, a diffuse Kalman filter is used.

Hamilton (1994a; 1994b, 389) shows that the QML estimator, obtained when the normality assumption is dropped, is consistent and asymptotically normal, although the variance–covariance matrix of the estimator (VCE) must be estimated by the Huber/White/sandwich estimator. Hamilton's discussion applies to stationary models, and specifying vce(robust) produces a consistent estimator of the VCE when the errors are not normal.

Methods for computing the log likelihood differ in how they calculate initial values for the Kalman filter when the model is stationary, how they compute a diffuse Kalman filter when the model is nonstationary, and whether terms for initial states are included. sspace offers the method(hybrid), method(dejong), and method(kdiffuse) options for computing the log likelihood. All three methods handle both stationary and nonstationary models.

method(hybrid), the default, uses the initial values for the states implied by stationarity to initialize the Kalman filter when the model is stationary. Hamilton (1994b, 378) discusses this method of computing initial values for the states and derives a log-likelihood function that does not include terms for the initial states. When the model is nonstationary, method(hybrid) uses the De Jong (1988, 1991) diffuse Kalman filter and log-likelihood function, which includes terms for the initial states.

method(dejong) uses the stationary De Jong (1988) method when the model is stationary and the De Jong (1988, 1991) diffuse Kalman filter when the model is nonstationary. The stationary De Jong (1988) method estimates initial values for the Kalman filter as part of the log-likelihood computation, as in De Jong (1988).

method(kdiffuse) implements the seldom-used large-κ diffuse approximation to the diffuse Kalman filter when the model is nonstationary and uses initial values for the states implied by stationarity when the model is stationary. The log likelihood does not include terms for the initial states in either case. We recommend that you do not use method(kdiffuse) except to replicate older results computed using this method.

De Jong (1988, 1991) and De Jong and Chu-Chun-Lin (1994) derive the log likelihood and a diffuse Kalman filter for handling nonstationary data. De Jong (1988) replaces the stationarity assumption with a time-immemorial assumption, which he uses to derive the log-likelihood function, an initial state vector, and a covariance of the initial state vector when the model is nonstationary. By default, and when method(hybrid) or method(dejong) is specified, sspace uses the diffuse Kalman filter given in definition 5 of De Jong and Chu-Chun-Lin (1994).
This method uses theorem 3 of De Jong and Chu-Chun-Lin (1994) to compute the covariance of the initial states. When using this method, sspace saves the matrices from their theorem 3 in e(), although the names are changed: e(Z) is their U1, e(T) is their U2, e(A) is their T, and e(M) is their M. See De Jong (1988, 1991) and De Jong and Chu-Chun-Lin (1994) for the details of the De Jong diffuse Kalman filter.

Practical estimation and inference require that the maximum likelihood estimator be consistent and normally distributed in large samples. These statistical properties of the maximum likelihood estimator are well established when the model is stationary; see Caines (1988, chap. 5 and 7), Hamilton (1994b, 388–389), and Hannan and Deistler (1988, chap. 4). When the model is nonstationary, additional assumptions must hold for the maximum likelihood estimator to be consistent and asymptotically normal; see Harvey (1989, sec. 3.4), Lütkepohl (2005, 636–637), and Schneider (1988). Chang, Miller, and Park (2009) show that the ML and the QML estimators are consistent and asymptotically normal for a class of nonstationary state-space models.

We now give an intuitive version of the Kalman filter. sspace uses theoretically equivalent, but numerically more stable, methods. For each time t, the Kalman filter produces the conditional expected state vector z_{t|t} and the conditional covariance matrix Ω_{t|t}; both are conditional on information up to and including time t. Using the model and previous-period results, for each t we begin with

    z_{t|t−1} = A z_{t−1|t−1} + B x_t
    Ω_{t|t−1} = A Ω_{t−1|t−1} A′ + C Q C′                          (6)
    y_{t|t−1} = D z_{t|t−1} + F w_t

The residuals and the mean squared error (MSE) matrix of the forecast error are

    ν̃_{t|t} = y_t − y_{t|t−1}
    Σ_{t|t} = D Ω_{t|t−1} D′ + G R G′                              (7)

In the last steps, we update the conditional expected state vector and the conditional covariance with the time t information:

    z_{t|t} = z_{t|t−1} + Ω_{t|t−1} D′ Σ_{t|t}⁻¹ ν̃_{t|t}
    Ω_{t|t} = Ω_{t|t−1} − Ω_{t|t−1} D′ Σ_{t|t}⁻¹ D Ω_{t|t−1}       (8)

Equations (6)–(8) are the Kalman filter. The equations denoted by (6) are the one-step predictions. The one-step predictions do not use contemporaneous values of y_t; only past values of y_t, past values of the exogenous x_t, and contemporaneous values of x_t are used. Equations (7) and (8) form the update step of the Kalman filter; they incorporate the contemporaneous dependent-variable information into the predicted states.

The Kalman filter requires initial values for the states and a covariance matrix for the initial states to start off the recursive process. Hamilton (1994b) discusses how to compute initial values for the Kalman filter assuming stationarity. This method is used by default when the model is stationary. De Jong (1988) discusses how to estimate initial values by maximum likelihood; this method is used when method(dejong) is specified.

Letting δ be the vector of parameters in the model, Lütkepohl (2005) and Harvey (1989) show that the log-likelihood function for the parameters of a stationary model is given by

    lnL(δ) = −0.5 { nT ln(2π) + Σ_{t=1}^{T} ln(|Σ_{t|t−1}|) + Σ_{t=1}^{T} e_t′ Σ_{t|t−1}⁻¹ e_t }

where e_t = (y_t − y_{t|t−1}) depends on δ, and Σ also depends on δ.

The variance–covariance matrix of the estimator (VCE) is estimated by the observed information matrix (OIM) estimator by default. Specifying vce(robust) causes sspace to use the Huber/White/sandwich estimator. Both estimators of the VCE are standard and documented in Hamilton (1994b).
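To make the recursion in (6)–(8) concrete, here is a schematic Mata sketch of one filter iteration for a model with no exogenous variables (B and F omitted). It is an illustration only, not the numerically stabilized code that sspace actually uses, and the function name and all matrix names are hypothetical. For a fitted model, the corresponding inputs could be built from the stored matrices e(A), e(C), e(D), and e(G), with Q and R recovered from their Cholesky factors e(chol_Q) and e(chol_R).

mata:
// One Kalman filter iteration: on input, z and P hold z_{t-1|t-1} and
// Omega_{t-1|t-1}; on output, they hold z_{t|t} and Omega_{t|t}.
void kfilter_step(real matrix A, real matrix C, real matrix Q,
                  real matrix D, real matrix G, real matrix R,
                  real colvector y, real colvector z, real matrix P)
{
    real colvector zp, nu
    real matrix    Pp, S, K

    zp = A*z                  // one-step state prediction, first line of (6)
    Pp = A*P*A' + C*Q*C'      // its covariance, second line of (6)
    nu = y - D*zp             // prediction error, (7)
    S  = D*Pp*D' + G*R*G'     // MSE matrix of the forecast error, (7)
    K  = Pp*D'*invsym(S)      // gain applied in the update step
    z  = zp + K*nu            // updated state, first line of (8)
    P  = Pp - K*D*Pp          // updated covariance, second line of (8)
}
end

Iterating this step over t = 1, ..., T and accumulating ln(|S|) and nu'*invsym(S)*nu term by term would yield the stationary log likelihood in the display above, up to the constant and the choice of initial values.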
Hamilton (1994b), Hannan and Deistler (1988), and Caines (1988) show that the ML estimator is consistent and asymptotically normal when the model is stationary. Schneider (1988) establishes consistency and asymptotic normality when the model is nonstationary because A has some eigenvalues with modulus 1 and there are no unknown parameters in A.

Not all state-space models are identified, as discussed in Hamilton (1994b) and Lütkepohl (2005). sspace checks for local identification at the optimum. sspace will not declare convergence unless the Hessian is full rank. This check for local identifiability is due to Rothenberg (1971).

Specifying method(dejong) causes sspace to maximize the log-likelihood function given in section 2 (vii) of De Jong (1988). This log-likelihood function includes the initial states as parameters to be estimated. We use some of the methods in Casals, Sotoca, and Jerez (1999) for computing the De Jong (1988) log-likelihood function.

References

Anderson, B. D. O., and J. B. Moore. 1979. Optimal Filtering. Englewood Cliffs, NJ: Prentice Hall.
Brockwell, P. J., and R. A. Davis. 1991. Time Series: Theory and Methods. 2nd ed. New York: Springer.
Caines, P. E. 1988. Linear Stochastic Systems. New York: Wiley.
Casals, J., S. Sotoca, and M. Jerez. 1999. A fast and stable method to compute the likelihood of time invariant state-space models. Economics Letters 65: 329–337.
Chang, Y., J. I. Miller, and J. Y. Park. 2009. Extracting a common stochastic trend: Theory with some applications. Journal of Econometrics 150: 231–247.
Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University Press.
De Jong, P. 1988. The likelihood for a state space model. Biometrika 75: 165–169.
———. 1991. The diffuse Kalman filter. Annals of Statistics 19: 1073–1083.
De Jong, P., and S. Chu-Chun-Lin. 1994. Stationary and non-stationary state space models. Journal of Time Series Analysis 15: 151–166.
Drukker, D. M., and V. L. Wiggins. 2004. Verifying the solution from a nonlinear solver: A case study: Comment. American Economic Review 94: 397–399.
Hamilton, J. D. 1994a. State-space models. In Vol. 4 of Handbook of Econometrics, ed. R. F. Engle and D. L. McFadden, 3039–3080. Amsterdam: Elsevier.
———. 1994b. Time Series Analysis. Princeton: Princeton University Press.
Hannan, E. J., and M. Deistler. 1988. The Statistical Theory of Linear Systems. New York: Wiley.
Harvey, A. C. 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge University Press.
———. 1993. Time Series Models. 2nd ed. Cambridge, MA: MIT Press.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.
Rothenberg, T. J. 1971. Identification in parametric models. Econometrica 39: 577–591.
Schneider, W. 1988. Analytical uses of Kalman filtering in econometrics: A survey. Statistical Papers 29: 3–33.
Stock, J. H., and M. W. Watson. 1989. New indexes of coincident and leading economic indicators. In NBER Macroeconomics Annual 1989, ed. O. J. Blanchard and S. Fischer, vol. 4, 351–394. Cambridge, MA: MIT Press.
———. 1991. A probability model of the coincident economic indicators. In Leading Economic Indicators: New Approaches and Forecasting Records, ed. K. Lahiri and G. H. Moore, 63–89. Cambridge: Cambridge University Press.
Also see

[TS] sspace postestimation — Postestimation tools for sspace
[TS] arima — ARIMA, ARMAX, and other dynamic regression models
[TS] dfactor — Dynamic-factor models
[TS] tsset — Declare data to be time-series data
[TS] ucm — Unobserved-components model
[TS] var — Vector autoregressive models
[U] 20 Estimation and postestimation commands

Title

sspace postestimation — Postestimation tools for sspace

Postestimation commands    predict    Remarks and examples    Methods and formulas
References    Also see

Postestimation commands

The following standard postestimation commands are available after sspace:

Command            Description
-------------------------------------------------------------------------
estat ic           Akaike's and Schwarz's Bayesian information criteria (AIC and BIC)
estat summarize    summary statistics for the estimation sample
estat vce          variance–covariance matrix of the estimators (VCE)
estimates          cataloging estimation results
forecast           dynamic forecasts and simulations
lincom             point estimates, standard errors, testing and inference for linear
                   combinations of coefficients
lrtest             likelihood-ratio test
nlcom              point estimates, standard errors, testing and inference for
                   nonlinear combinations of coefficients
predict            predictions, residuals, influence statistics, and other diagnostic
                   measures
predictnl          point estimates, standard errors, testing, and inference for
                   generalized predictions
test               Wald tests of simple and composite linear hypotheses
testnl             Wald tests of nonlinear hypotheses
-------------------------------------------------------------------------

predict

Description for predict

predict creates a new variable containing predictions such as expected values. The root mean squared error is available for all predictions. All predictions are also available as static one-step-ahead predictions or as dynamic multistep predictions, and you can control when dynamic predictions begin.

Menu for predict

Statistics > Postestimation

Syntax for predict

    predict [type] { stub* | newvarlist } [if] [in] [, statistic options]

statistic      Description
-------------------------------------------------------------------------
Main
  xb           observable variables
  states       latent state variables
  residuals    residuals
  rstandard    standardized residuals
-------------------------------------------------------------------------
These statistics are available both in and out of sample; type
predict ... if e(sample) ... if wanted only for the estimation sample.

options                     Description
-------------------------------------------------------------------------
Options
  equation(eqnames)         name(s) of equation(s) for which predictions are
                            to be made
  rmse(stub* | newvarlist)  put estimated root mean squared errors of predicted
                            statistics in new variables
  dynamic(time_constant)    begin dynamic forecast at specified time
Advanced
  smethod(method)           method for predicting unobserved states
-------------------------------------------------------------------------

method      Description
-------------------------------------------------------------------------
onestep     predict using past information
smooth      predict using all sample information
filter      predict using past and contemporaneous information
-------------------------------------------------------------------------

Options for predict

Main

xb, states, residuals, and rstandard specify the statistic to be predicted.

xb, the default, calculates the linear predictions of the observed variables.

states calculates the linear predictions of the latent state variables.

residuals calculates the residuals in the equations for observable variables. residuals may not be specified with dynamic().

rstandard calculates the standardized residuals, which are the residuals normalized to be uncorrelated and to have unit variances. rstandard may not be specified with smethod(filter), smethod(smooth), or dynamic().

Options

equation(eqnames) specifies the equation(s) for which the predictions are to be calculated.
If you do not specify equation() or stub*, the results are the same as if you had specified the name of the first equation for the predicted statistic.

You specify a list of equation names, such as equation(income consumption) or equation(factor1 factor2), to identify the equations. Specify names of state equations when predicting states and names of observable equations in all other cases.

equation() may not be specified with stub*.

rmse(stub* | newvarlist) puts the root mean squared errors of the predicted statistics into the specified new variables. The root mean squared errors measure the variances due to the disturbances but do not account for estimation error.

dynamic(time_constant) specifies when predict starts producing dynamic forecasts. The specified time_constant must be in the scale of the time variable specified in tsset, and the time_constant must be inside a sample for which observations on the dependent variables are available. For example, dynamic(tq(2008q4)) causes dynamic predictions to begin in the fourth quarter of 2008, assuming that your time variable is quarterly; see [D] datetime. If the model contains exogenous variables, they must be present for the whole predicted sample. dynamic() may not be specified with rstandard, residuals, or smethod(smooth).

Advanced

smethod(method) specifies the method for predicting the unobserved states; smethod(onestep), smethod(filter), and smethod(smooth) cause different amounts of information on the dependent variables to be used in predicting the states at each time period.

smethod(onestep), the default, causes predict to estimate the states at each time period using previous information on the dependent variables. The Kalman filter is performed on previous periods, but only the one-step predictions are made for the current period.

smethod(smooth) causes predict to estimate the states at each time period using all the sample data by the Kalman smoother. smethod(smooth) may not be specified with rstandard.

smethod(filter) causes predict to estimate the states at each time period using previous and contemporaneous data by the Kalman filter. The Kalman filter is performed on previous periods and the current period. smethod(filter) may be specified only with states.

Remarks and examples

We assume that you have already read [TS] sspace. In this entry, we illustrate some of the features of predict after using sspace to estimate the parameters of a state-space model.

All the predictions after sspace depend on the unobserved states, which are estimated recursively. Changing the sample can alter the state estimates, which can change all other predictions.

Example 1: One-step predictions

In example 5 of [TS] sspace, we estimated the parameters of the dynamic-factor model

    ( f_t     )   ( θ1  θ2 ) ( f_{t−1} )   ( ν_t )
    ( f_{t−1} ) = ( 1   0  ) ( f_{t−2} ) + ( 0   )

    ( Δipman_t  )   ( γ1 )       ( ε_{1t} )
    ( Δincome_t ) = ( γ2 ) f_t + ( ε_{2t} )
    ( Δhours_t  )   ( γ3 )       ( ε_{3t} )
    ( Δunemp_t  )   ( γ4 )       ( ε_{4t} )

where

    Var( (ε_{1t}, ε_{2t}, ε_{3t}, ε_{4t})′ ) = diag(σ1², σ2², σ3², σ4²)

by typing

. use http://www.stata-press.com/data/r14/dfex
(St. Louis Fed (FRED) macro data)
. constraint 1 [lf]L.f = 1
. sspace (f L.f L.lf, state noconstant)
>        (lf L.f, state noconstant noerror)
>        (D.ipman f, noconstant)
>        (D.income f, noconstant)
>        (D.hours f, noconstant)
>        (D.unemp f, noconstant),
>        covstate(identity) constraints(1)
 (output omitted )
sspace (f L.f L.lf, state noconstant) > (lf L.f, state noconstant noerror) > (D.ipman f, noconstant) > (D.income f, noconstant) > (D.hours f, noconstant) > (D.unemp f, noconstant), > covstate(identity) constraints(1) (output omitted ) 0 0 σ32 0  0 0   0 σ42 sspace postestimation — Postestimation tools for sspace 549 Below we obtain the one-step predictions for each of the four dependent variables in the model, and then we graph the actual and predicted ipman: −4 −2 0 2 . predict dep* (option xb assumed; fitted values) . tsline D.ipman dep1, lcolor(gs10) xtitle("") legend(rows(2)) 1970m1 1980m1 1990m1 2000m1 2010m1 Industrial production; manufacturing (NAICS), D xb prediction, D.ipman, onestep The graph shows that the one-step predictions account for only a small part of the swings in the realized ipman. Example 2: Out-of-sample, dynamic predictions We use the estimates from the previous example to make out-of-sample predictions. After using tsappend to extend the dataset by six periods, we use predict with the dynamic() option and graph the result. . tsappend, add(6) . predict Dipman_f, dynamic(tm(2008m12)) equation(D.ipman) . tsline D.ipman Dipman_f if month>=tm(2008m1), xtitle("") legend(rows(2)) sspace postestimation — Postestimation tools for sspace −6 −4 −2 0 2 550 2008m1 2008m4 2008m7 2008m10 2009m1 2009m4 Industrial production; manufacturing (NAICS), D xb prediction, D.ipman, dynamic(tm(2008m12)) The model predicts that the changes in industrial production will remain negative for the forecast horizon, although they increase toward zero. Example 3: Estimating an unobserved factor In this example, we want to estimate the unobserved factor instead of predicting a dependent variable. Specifying smethod(smooth) causes predict to use all sample information in estimating the states by the Kalman smoother. Below we estimate the unobserved factor by using the estimation sample, and we graph ipman and the estimated factor: −6 −4 −2 0 2 4 . predict fac if e(sample), states smethod(smooth) equation(f) . tsline D.ipman fac, xtitle("") legend(rows(2)) 1970m1 1980m1 1990m1 2000m1 Industrial production; manufacturing (NAICS), D states, f, smooth 2010m1 sspace postestimation — Postestimation tools for sspace 551 Example 4: Calculating residuals The residuals and the standardized residuals are frequently used to review the specification of the model. Below we calculate the standardized residuals for each of the series and display them in a combined graph: . predict sres1-sres4 if e(sample), rstandard . tsline sres1, xtitle("") name(sres1) . tsline sres2, xtitle("") name(sres2) . tsline sres3, xtitle("") name(sres3) . tsline sres4, xtitle("") name(sres4) rstandard, D.hours, onestep −4 −2 0 2 4 6 1970m1 1980m1 1990m1 2000m1 2010m1 1970m1 1980m1 1990m1 2000m1 2010m1 rstandard, D.income, onestep −5 0 5 10 1970m1 1980m1 1990m1 2000m1 2010m1 rstandard, D.unemp, onestep −4 −2 0 2 4 rstandard, D.ipman, onestep −10 −5 0 5 . graph combine sres1 sres2 sres3 sres4, name(combined) 1970m1 1980m1 1990m1 2000m1 2010m1 Methods and formulas Estimating the unobserved states is key to predicting the dependent variables. By default and with the smethod(onestep) option, predict estimates the states in each period by applying the Kalman filter to all previous periods and only making the one-step predictions to the current period. (See Methods and formulas of [TS] sspace for the Kalman filter equations.) 
With the smethod(filter) option, predict estimates the states in each period by applying the Kalman filter on all previous periods and the current period. The computational difference between smethod(onestep) and smethod(filter) is that smethod(filter) performs the update step on the current period while smethod(onestep) does not. The statistical difference between smethod(onestep) and smethod(filter) is that smethod(filter) uses contemporaneous information on the dependent variables while smethod(onestep) does not.

As noted in [TS] sspace, sspace has both a stationary and a diffuse Kalman filter. predict uses the same Kalman filter used for estimation.

With the smethod(smooth) option, predict estimates the states in each period using all the sample information by applying the Kalman smoother. predict uses the Harvey (1989, sec. 3.6.2) fixed-interval smoother with model-based initial values to estimate the states when the estimated parameters imply a stationary model. De Jong (1989) provides a computationally efficient method. Hamilton (1994) discusses the model-based initial values for stationary state-space models. When the model is nonstationary, the De Jong (1989) diffuse Kalman smoother is used to predict the states. The smoothed estimates of the states are subsequently used to predict the dependent variables.

The dependent variables are predicted by plugging in the estimated states. The residuals are calculated as the differences between the predicted and the realized dependent variables. The root mean squared errors are the square roots of the diagonal elements of the mean squared error matrices that are computed by the Kalman filter. The standardized residuals are the residuals normalized by the Cholesky factor of their mean squared error produced by the Kalman filter.

predict uses the Harvey (1989, sec. 3.5) methods to compute the dynamic forecasts and the root mean squared errors. Let τ be the period at which the dynamic forecasts begin; τ must either be in the specified sample or be in the period immediately following the specified sample.

The dynamic forecasts depend on the predicted states in the period τ − 1, which predict obtains by running the Kalman filter or the diffuse Kalman filter on the previous sample observations. The states in the periods prior to starting the dynamic predictions may be estimated using smethod(onestep) or smethod(smooth). Using an if or in qualifier to alter the prediction sample can change the estimate of the unobserved states in the period prior to beginning the dynamic predictions and hence alter the dynamic predictions. The initial states are estimated using e(b) and the prediction sample.

References

De Jong, P. 1988. The likelihood for a state space model. Biometrika 75: 165–169.
———. 1989. Smoothing and interpolation with the state-space model. Journal of the American Statistical Association 84: 1085–1088.
———. 1991. The diffuse Kalman filter. Annals of Statistics 19: 1073–1083.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Harvey, A. C. 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge University Press.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.
Also see

[TS] sspace — State-space models
[TS] dfactor — Dynamic-factor models
[TS] dfactor postestimation — Postestimation tools for dfactor
[U] 20 Estimation and postestimation commands

Title

tsappend — Add observations to a time-series dataset

Description    Quick start    Menu    Syntax
Options    Remarks and examples    Stored results    Also see

Description

tsappend appends observations to a time-series dataset or to a panel dataset. tsappend uses and updates the information set by tsset or xtset. Any gaps in the dataset are removed.

Quick start

Add 10 time periods to tsset data
    tsappend, add(10)

Incorporate additional months to data up to the third month of 1999
    tsappend, last(1999m3) tsfmt(tm)

Add 2 time periods to the panel identified by pvar = 333 after xtset pvar tvar
    tsappend, add(2) panel(333)

Menu

Statistics > Time series > Setup and utilities > Add observations to time-series dataset

Syntax

    tsappend , { add(#) | last(date | clock) tsfmt(string) } [options]

options                 Description
-------------------------------------------------------------------------
* add(#)                add # observations
* last(date | clock)    add observations at date or clock
* tsfmt(string)         use time-series function string with last(date | clock)
  panel(panel_id)       add observations to panel panel_id
-------------------------------------------------------------------------
* Either add(#) is required, or last(date | clock) and tsfmt(string) are required.

You must tsset or xtset your data before using tsappend; see [TS] tsset and [XT] xtset.

Options

add(#) specifies the number of observations to add.

last(date | clock) and tsfmt(string) must be specified together and are an alternative to add().

last(date | clock) specifies the date or the date and time of the last observation to add.

tsfmt(string) specifies the name of the Stata time-series function to use in converting the date specified in last() to an integer. The function names are tc (clock), tC (Clock), td (daily), tw (weekly), tm (monthly), tq (quarterly), and th (half-yearly).

For clock times, the last time added (if any) will be earlier than the time requested in last(date | clock) if last() is not a multiple of delta units from the last time in the data.

For instance, you might specify last(17may2007) tsfmt(td), last(2001m1) tsfmt(tm), or last(17may2007 15:30:00) tsfmt(tc).

panel(panel_id) specifies that observations be added only to panels with the ID specified in panel().

Remarks and examples

Remarks are presented under the following headings:

    Introduction
    Using tsappend with time-series data
    Using tsappend with panel data

Introduction

tsappend adds observations to a time-series dataset or to a panel dataset. You must tsset or xtset your data before using tsappend. tsappend simultaneously removes any gaps from the dataset.

There are two ways to use tsappend: you can specify the add(#) option to request that # observations be added, or you can specify the last(date | clock) option to request that observations be appended until the date specified is reached. If you specify last(), you must also specify tsfmt(). tsfmt() specifies the Stata time-series date function that converts the date held in last() to an integer.

tsappend works with time series of panel data. With panel data, tsappend adds the requested observations to all the panels, unless the panel() option is also specified.

Using tsappend with time-series data

tsappend can be useful for appending observations when dynamically predicting a time series.
Consider an example in which tsappend adds the extra observations before dynamically predicting from an AR(1) regression:

. use http://www.stata-press.com/data/r14/tsappend1
. regress y l.y

      Source |       SS           df       MS      Number of obs   =       479
-------------+----------------------------------   F(1, 477)       =    119.29
       Model |  115.349555         1  115.349555   Prob > F        =    0.0000
    Residual |  461.241577       477  .966963473   R-squared       =    0.2001
-------------+----------------------------------   Adj R-squared   =    0.1984
       Total |  576.591132       478   1.2062576   Root MSE        =    .98334

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           y |
         L1. |   .4493507   .0411417    10.92   0.000     .3685093    .5301921
       _cons |   11.11877   .8314581    13.37   0.000     9.484993    12.75254
------------------------------------------------------------------------------

. matrix b = e(b)
. matrix colnames b = L.xb one
. tsset
        time variable:  t2, 1960m2 to 2000m1
                delta:  1 month
. tsappend, add(12)
. tsset
        time variable:  t2, 1960m2 to 2001m1
                delta:  1 month
. predict xb if t2<=tm(2000m2)
(option xb assumed; fitted values)
(12 missing values generated)
. generate one=1
. matrix score xb=b if t2>=tm(2000m2), replace

The calls to tsset before and after tsappend were made without a time variable; thus both commands display how the data are currently tsset. The results from the first tsset command show that we have monthly data and that our time variable, t2, starts at 1960m2 and ends at 2000m1.

tsappend with the add(12) option used these results to add 12 months to the dataset. The results of the second tsset command show that this new year of data has been added, as shown by the ending date now being 2001m1. We could have skipped these calls to tsset, but they are shown here to illustrate how tsappend uses and updates the time-series settings of the dataset.

We then used predict and matrix score to obtain the dynamic predictions, which allows us to produce the following graph:

. line y xb t2 if t2>=tm(1995m1), ytitle("") xtitle("time")
  (figure omitted: line plot of y and the fitted values against time, 1995m1 to 2001m1)

In the call to tsappend, instead of saying that we wanted to add 12 observations, we could have specified that we wanted to fill in observations through the first month of 2001:

. use http://www.stata-press.com/data/r14/tsappend1, clear
. tsset
        time variable:  t2, 1960m2 to 2000m1
                delta:  1 month
. tsappend, last(2001m1) tsfmt(tm)
. tsset
        time variable:  t2, 1960m2 to 2001m1
                delta:  1 month

We specified the tm() function in the tsfmt() option. [FN] Date and time functions contains a list of time-series functions for translating date literals to integers. Because we have monthly data, [FN] Date and time functions tells us to use the tm() function, so we specified the tsfmt(tm) option.

The following table shows the most common types of time-series data, their formats, the appropriate translation functions, and the corresponding options for tsappend:

    Description    Format    Function    Option
    -----------------------------------------------
    time           %tc       tc()        tsfmt(tc)
    time           %tC       tC()        tsfmt(tC)
    daily          %td       td()        tsfmt(td)
    weekly         %tw       tw()        tsfmt(tw)
    monthly        %tm       tm()        tsfmt(tm)
    quarterly      %tq       tq()        tsfmt(tq)
    half-yearly    %th       th()        tsfmt(th)
    yearly         %ty       ty()        tsfmt(ty)
    -----------------------------------------------

Using tsappend with panel data
tsappend's actions on panel data are similar to its actions on time-series data, except that tsappend performs those actions on each time series within the panels. To work within panels, a panel variable must have been specified with tsset or xtset. It does not matter which command you use; the two are equivalent.

If the end dates vary over panels, last() and add() will produce different results. add(#) always adds # observations to each panel.
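For instance, a minimal sketch, where pvar and tvar stand in for your panel and time variables:

. xtset pvar tvar
. tsappend, add(2) panel(333)         // extend only panel 333 by 2 periods
. tsappend, last(2000m7) tsfmt(tm)    // bring every panel up to 2000m7

The first call leaves the other panels' end dates unchanged; the second imposes a common end date, as the examples below illustrate.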
If the data end at different periods before tsappend, add() is used, the data will still end at different periods after tsappend, add(). In contrast, tsappend, last() tsfmt() will cause all the panels to end on the specified last date. If the beginning dates differ across panels, using tsappend, last() tsfmt() to provide a uniform ending date will not create balanced panels because the number of observations per panel will still differ. Consider the panel data summarized in the output below: . use http://www.stata-press.com/data/r14/tsappend3, clear . xtdescribe id: 1, 2, ..., 3 n = t2: 1998m1, 1998m2, ..., 2000m1 T = Delta(t2) = 1 month Span(t2) = 25 periods (id*t2 uniquely identifies each observation) Distribution of T_i: min 5% 25% 50% 75% 13 13 13 20 24 Freq. Percent Cum. Pattern 1 1 1 33.33 33.33 33.33 33.33 66.67 100.00 3 100.00 . by id: summarize t2 3 25 95% 24 max 24 ............1111111111111 1111.11111111111111111111 11111111111111111111..... XXXXXXXXXXXXXXXXXXXXXXXXX -> id = 1 Variable Obs Mean Std. Dev. Min Max t2 13 474 3.89444 468 480 -> id = 2 Variable Obs Mean Std. Dev. Min Max t2 20 465.5 5.91608 456 475 -> id = 3 Variable Obs Mean Std. Dev. Min Max t2 24 468.3333 7.322786 456 480 The output from xtdescribe and summarize on these data tells us that one panel starts later than the other, that another panel ends before the other two, and that the remaining panel has a gap in the time variable but otherwise spans the entire time frame. 558 tsappend — Add observations to a time-series dataset Now consider the data after a call to tsappend, add(6): . tsappend, add(6) . xtdescribe id: 1, 2, ..., 3 t2: 1998m1, 1998m2, ..., 2000m7 Delta(t2) = 1 month Span(t2) = 31 periods (id*t2 uniquely identifies each observation) Distribution of T_i: Percent min 19 Cum. 1 1 1 33.33 33.33 33.33 33.33 66.67 100.00 3 100.00 Freq. 5% 25% 19 19 Pattern 50% 26 n = T = 75% 31 3 31 95% 31 max 31 ............1111111111111111111 11111111111111111111111111..... 1111111111111111111111111111111 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX . by id: summarize t2 -> id = 1 Variable Obs Mean Std. Dev. Min Max t2 19 477 5.627314 468 486 -> id = 2 Variable Obs Mean Std. Dev. Min Max t2 26 468.5 7.648529 456 481 -> id = 3 Variable Obs Mean Std. Dev. Min Max t2 31 471 9.092121 456 486 This output from xtdescribe and summarize after the call to tsappend shows that the call to tsappend, add(6) added 6 observations to each panel and filled in the gap in the time variable in the second panel. tsappend, add() did not cause a uniform end date over the panels. The following output illustrates the contrast between tsappend, add() and tsappend, last() tsfmt() with panel data that end at different dates. The output from xtdescribe and summarize shows that the call to tsappend, last() tsfmt() filled in the gap in t2 and caused all the panels to end at the specified end date. The output also shows that the panels remain unbalanced because one panel has a later entry date than the other two. tsappend — Add observations to a time-series dataset . use http://www.stata-press.com/data/r14/tsappend2, clear . tsappend, last(2000m7) tsfmt(tm) . xtdescribe id: 1, 2, ..., 3 n = t2: 1998m1, 1998m2, ..., 2000m7 T = Delta(t2) = 1 month Span(t2) = 31 periods (id*t2 uniquely identifies each observation) Distribution of T_i: min 5% 25% 50% 75% 19 19 19 31 31 Pattern Freq. Percent Cum. 2 1 66.67 33.33 66.67 100.00 3 100.00 . 
by id: summarize t2 3 31 95% 31 1111111111111111111111111111111 ............1111111111111111111 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX -> id = 1 Variable Obs Mean Std. Dev. Min Max t2 19 477 5.627314 468 486 Variable Obs Mean Std. Dev. Min Max t2 31 471 9.092121 456 486 -> id = 3 Variable Obs Mean Std. Dev. Min Max t2 31 471 9.092121 456 486 -> id = 2 Stored results tsappend stores the following in r(): Scalars r(add) number of observations added Also see [TS] tsset — Declare data to be time-series data [XT] xtset — Declare data to be panel data max 31 559 Title tsfill — Fill in gaps in time variable Description Option Quick start Remarks and examples Menu Also see Syntax Description tsfill is used to fill in gaps in time-series data and gaps in panel data with new observations, which contain missing values. tsfill is not needed to obtain correct lags, leads, and differences when gaps exist in a series because Stata’s time-series operators handle gaps automatically. Quick start Add new observations with missing values for missing time periods in a time-series dataset that has been tsset tsfill Add new observations with missing values to eliminate gaps in a panel dataset that has been xtset tsfill As above, but making the panel strongly balanced tsfill, full Menu Statistics > Time series > Setup and utilities > Fill in gaps in time variable 560 tsfill — Fill in gaps in time variable 561 Syntax tsfill  , full  You must tsset or xtset your data before using tsfill; see [TS] tsset and [XT] xtset. Option full is for use with panel data only. With panel data, tsfill by default fills in observations for each panel according to the minimum and maximum values of timevar for the panel. Thus if the first panel spanned the times 5–20 and the second panel the times 1–15, after tsfill they would still span the same periods; observations would be created to fill in any missing times from 5–20 in the first panel and from 1–15 in the second. If full is specified, observations are created so that both panels span the time 1–20, the overall minimum and maximum of timevar across panels. Remarks and examples Remarks are presented under the following headings: Introduction Using tsfill with time-series data Using tsfill with panel data Video example Introduction tsfill is used after tsset or xtset to fill gaps in time-series data and gaps in panel data with new observations. Each new observation contains the appropriate values of the time variable, timevar, and, when specified, the panel variable, panelvar, and missing values for all other variables in the dataset. For instance, perhaps observations for timevar = 1, 3, 5, 6, . . . , 22 exist. tsfill would create observations for timevar = 2 and timevar = 4 containing all missing values. tsfill is intended as an intermediate step in a data management process. For example, you may wish to use tsfill with time-series data if you plan to interpolate missing values or with panel data if you intend to impute missing values. You do not need to use tsfill to correctly create variables with lags, leads, and differencing, because Stata’s time-series operators handle gaps in the series for you; see [U] 11.4.4 Time-series varlists. These operators consider timevar, not the observation number. For example, suppose we have data on GNP in the years 1989–1991 and 1993–1995. Referring to L.gnp to obtain lagged gnp values would correctly produce a missing value of lagged gnp for timevar = 1989 and timevar = 1993 even if missing values were not explicitly created using tsfill. 
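A small sketch makes this concrete; the GNP figures below are made up for illustration:

. clear
. input year gnp
  1989 100
  1990 104
  1991 107
  1993 112
  1994 115
  1995 119
  end
. tsset year
. generate gnp_lag = L.gnp

Here gnp_lag is missing for 1989 and for 1993, because the data contain no observations for 1988 or 1992, even though tsfill was never used to create an explicit observation for 1992.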
562 tsfill — Fill in gaps in time variable Using tsfill with time-series data You have monthly data, with gaps: . use http://www.stata-press.com/data/r14/tsfillxmpl . tsset time variable: mdate, 1995m7 to 1996m3, but with gaps delta: 1 month . list mdate income mdate income 1. 2. 3. 4. 5. 1995m7 1995m8 1995m11 1995m12 1996m1 1153 1181 1236 1297 1265 6. 1996m3 1282 You can fill in the gaps by interpolation easily with tsfill and ipolate. tsfill creates the missing observations: . tsfill . list mdate income mdate income 1. 2. 3. 4. 5. 1995m7 1995m8 1995m9 1995m10 1995m11 1153 1181 . . 1236 6. 7. 8. 9. 1995m12 1996m1 1996m2 1996m3 1297 1265 . 1282 ← new ← new ← new We can now use ipolate (see [D] ipolate) to fill them in: . ipolate income mdate, gen(ipinc) . list mdate income ipinc mdate income ipinc 1. 2. 3. 4. 5. 1995m7 1995m8 1995m9 1995m10 1995m11 1153 1181 . . 1236 1153 1181 1199.3333 1217.6667 1236 6. 7. 8. 9. 1995m12 1996m1 1996m2 1996m3 1297 1265 . 1282 1297 1265 1273.5 1282 tsfill — Fill in gaps in time variable 563 Using tsfill with panel data You have the following panel dataset: . use http://www.stata-press.com/data/r14/tsfillxmpl2, clear . tsset panel variable: edlevel (unbalanced) time variable: year, 1988 to 1992, but with a gap delta: 1 unit . list edlevel year income edlevel year income 1. 2. 3. 4. 5. 1 1 1 1 2 1988 1989 1990 1991 1989 14500 14750 14950 15100 22100 6. 7. 2 2 1990 1992 22200 22800 Just as with nonpanel time-series datasets, you can use tsfill to fill in the gaps within each panel: . tsfill . list edlevel year income edlevel year income 1. 2. 3. 4. 5. 1 1 1 1 2 1988 1989 1990 1991 1989 14500 14750 14950 15100 22100 6. 7. 8. 2 2 2 1990 1991 1992 22200 . 22800 ← new You could instead use tsfill to produce fully balanced panels with the full option: . tsfill, full . list edlevel year income, sep(0) 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. edlevel year income 1 1 1 1 1 2 2 2 2 2 1988 1989 1990 1991 1992 1988 1989 1990 1991 1992 14500 14750 14950 15100 . . 22100 22200 . 22800 ← new ← new ← new 564 tsfill — Fill in gaps in time variable Video example Time series, part 1: Formatting dates, tsset, tsreport, and tsfill Also see [TS] tsappend — Add observations to a time-series dataset [TS] tsset — Declare data to be time-series data [XT] xtset — Declare data to be panel data Title tsfilter — Filter a time-series, keeping only selected periodicities Description Acknowledgments Syntax References Remarks and examples Also see Methods and formulas Description tsfilter separates a time series into trend and cyclical components. The trend component may contain a deterministic or a stochastic trend. The stationary cyclical component is driven by stochastic cycles at the specified periods. Syntax Filter one variable tsfilter filter  type  newvar = varname  if   in   , options  Filter multiple variables, unique names         tsfilter filter type newvarlist = varlist if in , options Filter multiple variables, common name stub         tsfilter filter type stub* = varlist if in , options filter Name See bk bw cf hp Baxter–King Butterworth Christiano–Fitzgerald Hodrick–Prescott [TS] [TS] [TS] [TS] tsfilter tsfilter tsfilter tsfilter bk bw cf hp You must tsset or xtset your data before using tsfilter; see [TS] tsset and [XT] xtset. varname and varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists. options differ across the filters and are documented in each filter’s manual entry. 
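For example, each of the following is a valid call under the syntax diagrams above (a sketch; y, y1, y2, and y3 stand in for tsset variables):

. tsfilter hp cycle = y
. tsfilter bk cyc1 cyc2 = y1 y2, minperiod(6) maxperiod(32)
. tsfilter cf c_* = y1 y2 y3, drift

The first form filters one variable, the second names each new variable explicitly, and the third uses a common name stub.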
Remarks and examples
The time-series filters implemented in tsfilter separate a time series $y_t$ into trend and cyclical components:

$$y_t = \tau_t + c_t$$

where $\tau_t$ is the trend component and $c_t$ is the cyclical component. $\tau_t$ may be nonstationary; it may contain a deterministic or a stochastic trend, as discussed below.

The primary objective of the methods implemented in tsfilter is to estimate $c_t$, a stationary cyclical component that is driven by stochastic cycles within a specified range of periods. The trend component $\tau_t$ is calculated by the difference $\tau_t = y_t - c_t$.

Although the filters implemented in tsfilter have been widely applied by macroeconomists, they are general time-series methods and may be of interest to other researchers.

Remarks are presented under the following headings:
    An example dataset
    A baseline method: Symmetric moving-average (SMA) filters
    An overview of filtering in the frequency domain
    SMA revisited: The Baxter–King filter
    Filtering a random walk: The Christiano–Fitzgerald filter
    A one-parameter high-pass filter: The Hodrick–Prescott filter
    A two-parameter high-pass filter: The Butterworth filter

An example dataset
Time series are frequently filtered to remove unwanted characteristics, such as trends and seasonal components, or to estimate components driven by stochastic cycles from a specific range of periods. Although the filters implemented in tsfilter can be used for both purposes, their primary purpose is the latter, and we restrict our discussion to that use.

We explain the methods implemented in tsfilter by estimating the business-cycle component of a macroeconomic variable, because they are frequently used for this purpose. We estimate the business-cycle component of the natural log of an index of the industrial production of the United States, which is plotted below.

Example 1: A trending time series

. use http://www.stata-press.com/data/r14/ipq
(Federal Reserve Economic Data, St. Louis Fed)
. tsline ip_ln
  (figure omitted: log of industrial production plotted against the quarterly time variable, 1920q1 to 2010q1)

The above graph shows that ip_ln contains a trend component. Time series may contain deterministic trends or stochastic trends. A polynomial function of time is the most common deterministic time trend. An integrated process is the most common stochastic trend. An integrated process is a random variable that must be differenced one or more times to be stationary; see Hamilton (1994) for a discussion. The different filters implemented in tsfilter allow for different orders of deterministic time trends or integrated processes.

We now illustrate the four methods implemented in tsfilter, each of which will remove the trend and estimate the business-cycle component. Burns and Mitchell (1946) defined oscillations in business data with recurring periods between 1.5 and 8 years to be business-cycle fluctuations; we use their commonly accepted definition.

A baseline method: Symmetric moving-average (SMA) filters
Symmetric moving-average (SMA) filters form a baseline method for estimating a cyclical component because of their properties and simplicity. An SMA filter of a time series $y_t$, $t \in \{1, \ldots, T\}$, is the data transform defined by

$$y^*_t = \sum_{j=-q}^{q} \alpha_j\, y_{t-j}$$

for each $t \in \{q+1, \ldots, T-q\}$, where $\alpha_{-j} = \alpha_j$ for $j \in \{-q, \ldots, q\}$.
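For instance, the simplest case is $q = 1$; a sketch with weights (1/4, 1/2, 1/4), which sum to one and therefore smooth the series rather than detrend it (y stands in for any tsset variable):

. generate y_sma = .25*L.y + .5*y + .25*F.y

The first and last observations of y_sma are missing because the filter needs one lag and one lead.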
Although the original series has $T$ observations, the filtered series has only $T - 2q$ observations, where $q$ is known as the order of the SMA filter.

SMA filters with weights that sum to zero remove deterministic and stochastic trends of order 2 or less, as shown by Fuller (1996) and Baxter and King (1999).

Example 2: A trend-removing SMA filter

This trend-removal property of SMA filters with coefficients that sum to zero may surprise some readers. For illustration purposes, we filter ip_ln by the filter

$$-0.2\,\texttt{ip\_ln}_{t-2} - 0.2\,\texttt{ip\_ln}_{t-1} + 0.8\,\texttt{ip\_ln}_{t} - 0.2\,\texttt{ip\_ln}_{t+1} - 0.2\,\texttt{ip\_ln}_{t+2}$$

and plot the filtered series. We do not even need tsfilter to implement this second-order SMA filter; we can use generate.

. generate ip_sma = -.2*L2.ip_ln-.2*L.ip_ln+.8*ip_ln-.2*F.ip_ln-.2*F2.ip_ln
(4 missing values generated)
. tsline ip_sma
  (figure omitted: ip_sma plotted against the quarterly time variable, 1920q1 to 2010q1)

The filter has removed the trend. There is no good reason why we chose that particular SMA filter. Baxter and King (1999) derived a class of SMA filters with coefficients that sum to zero and get as close as possible to keeping only the specified cyclical component.

An overview of filtering in the frequency domain
We need some concepts from the frequency-domain approach to time-series analysis to motivate how Baxter and King (1999) defined "as close as possible". These concepts also motivate the other filters in tsfilter. The intuitive explanation presented here glosses over many technical details discussed by Priestley (1981), Hamilton (1994), Fuller (1996), and Wei (2006).

As with much time-series analysis, the basic results are for covariance-stationary processes, with additional results handling some nonstationary cases. We present some useful results for covariance-stationary processes and discuss how to handle nonstationary series below.

The autocovariances $\gamma_j$, $j \in \{0, 1, \ldots, \infty\}$, of a covariance-stationary process $y_t$ specify its variance and dependence structure. In the frequency-domain approach to time-series analysis, $y_t$ and the autocovariances are specified in terms of independent stochastic cycles that occur at frequencies $\omega \in [-\pi, \pi]$. The spectral density function $f_y(\omega)$ specifies the contribution of stochastic cycles at each frequency $\omega$ relative to the variance of $y_t$, which is denoted by $\sigma_y^2$.

The variance and the autocovariances can be expressed as an integral of the spectral density function. Formally,

$$\gamma_j = \int_{-\pi}^{\pi} e^{i\omega j} f_y(\omega)\, d\omega \tag{1}$$

where $i$ is the imaginary number $i = \sqrt{-1}$.

Equation (1) can be manipulated to show what fraction of the variance of $y_t$ is attributable to stochastic cycles in a specified range of frequencies. Hamilton (1994, 156) discusses this point in more detail. Equation (1) implies that if $f_y(\omega) = 0$ for $\omega \in [\omega_1, \omega_2]$, then stochastic cycles at these frequencies contribute zero to the variance and autocovariances of $y_t$.

The goal of time-series filters is to transform the original series into a new series $y^*_t$ for which the spectral density function of the filtered series, $f_{y^*}(\omega)$, is zero for unwanted frequencies and equal to $f_y(\omega)$ for desired frequencies. A linear filter of $y_t$ can be written as

$$y^*_t = \sum_{j=-\infty}^{\infty} \alpha_j\, y_{t-j} = \alpha(L)\, y_t$$

where we let $y_t$ be an infinitely long series, as required by some of the results below. To see the impact of the filter on the components of $y_t$ at each frequency $\omega$, we need an expression for $f_{y^*}(\omega)$ in terms of $f_y(\omega)$ and the filter weights $\alpha_j$.
Wei (2006, 282) shows that, for each $\omega$,

$$f_{y^*}(\omega) = |\alpha(e^{i\omega})|^2 f_y(\omega) \tag{2}$$

where $|\alpha(e^{i\omega})|$ is known as the gain of the filter. Equation (2) makes explicit that the squared gain function $|\alpha(e^{i\omega})|^2$ converts the spectral density of the original series, $f_y(\omega)$, into the spectral density of the filtered series, $f_{y^*}(\omega)$. In particular, (2) says that, for each frequency $\omega$, the spectral density of the filtered series is the product of the square of the gain of the filter and the spectral density of the original series.

As we will see in the examples below, the gain function provides a crucial interpretation of what a filter is doing. We want a filter for which $f_{y^*}(\omega) = 0$ for unwanted frequencies and for which $f_{y^*}(\omega) = f_y(\omega)$ for desired frequencies. So we seek a filter for which the gain is 0 for unwanted frequencies and for which the gain is 1 for desired frequencies.

In practice, we cannot find such an ideal filter exactly, because the constraints an ideal filter places on filter coefficients cannot be satisfied for time series with only a finite number of observations. The expansive literature on filters is a result of the trade-offs involved in designing implementable filters that approximate the ideal filter.

Ideally, filters pass or block the stochastic cycles at specified frequencies by having a gain of 1 or 0. Band-pass filters, such as the Baxter–King (BK) and the Christiano–Fitzgerald (CF) filters, pass through stochastic cycles in the specified range of frequencies and block all the other stochastic cycles. High-pass filters, such as the Hodrick–Prescott (HP) and Butterworth filters, only allow the stochastic cycles at or above a specified frequency to pass through and block the lower-frequency stochastic cycles. For band-pass filters, let $[\omega_0, \omega_1]$ be the set of desired frequencies, with all other frequencies being undesired. For high-pass filters, let $\omega_0$ be the cutoff frequency, with only those frequencies $\omega \ge \omega_0$ being desired.

SMA revisited: The Baxter–King filter
We now return to the class of SMA filters with coefficients that sum to zero and get as close as possible to keeping only the specified cyclical component, as derived by Baxter and King (1999).

For an infinitely long series, there is an ideal band-pass filter for which the gain function is 1 for $\omega \in [\omega_0, \omega_1]$ and 0 for all other frequencies. It just so happens that this ideal band-pass filter is an SMA filter with coefficients that sum to zero. Baxter and King (1999) derive the coefficients of this ideal band-pass filter and then define the BK filter to be the SMA filter with $2q + 1$ terms that are as close as possible to those of the ideal filter. There is a trade-off in choosing $q$: larger values of $q$ cause the gain of the BK filter to be closer to the gain of the ideal filter, but larger values also increase the number of missing observations in the filtered series.

Although the mathematics of the frequency-domain approach to time-series analysis is in terms of stochastic cycles at frequencies $\omega \in [-\pi, \pi]$, applied work is generally in terms of periods $p$, where $p = 2\pi/\omega$. So the options for the tsfilter subcommands are in terms of periods.

Example 3: A BK estimate of the business-cycle component

Below we use tsfilter bk, which implements the BK filter, to estimate the business-cycle component composed of stochastic cycles between 6 and 32 periods, and then we graph the estimated component.
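As an aside, the periods of 6 and 32 map directly into the natural-frequency cutoffs that reappear with pergram below; a quick sketch of the conversion (the natural frequency is 1/p):

. display 1/32
.03125
. display 1/6
.16666667

These are the two values passed to pergram's xline() option in the output that follows.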
570 tsfilter — Filter a time-series, keeping only selected periodicities −.3 ip_ln cyclical component from bk filter −.2 −.1 0 .1 .2 . tsfilter bk ip_bk = ip_ln, minperiod(6) maxperiod(32) . tsline ip_bk 1920q1 1930q1 1940q1 1950q1 1960q1 1970q1 1980q1 1990q1 2000q1 2010q1 quarterly time variable The above graph tells us what the estimated business-cycle component looks like, but it presents no evidence as to how well we have estimated the component. A periodogram is better for this purpose. A periodogram is an estimator of a transform of the spectral density function; see [TS] pergram for details. Below we plot the periodogram for the BK estimate of the business-cycle component. pergram displays the results in natural frequencies, which are the standard frequencies divided by 2π . We use the xline() option to draw vertical lines at the lower natural-frequency cutoff (1/32 = 0.03125) and the upper natural-frequency cutoff (1/6 ≈ 0.16667). 0.00 2.00 4.00 6.00 Sample spectral density function 0.00 0.10 0.20 0.30 Frequency 0.40 0.50 −6.00 −4.00 −2.00 ip_ln cyclical component from bk filter Log Periodogram −6.00 −4.00 −2.00 0.00 2.00 4.00 6.00 . pergram ip_bk, xline(0.03125 0.16667) Evaluated at the natural frequencies If the filter completely removed the stochastic cycles corresponding to the unwanted frequencies, the periodogram would be a flat line at the minimum value of −6 outside the range identified by the vertical lines. That the periodogram takes on values greater than −6 outside the specified range indicates the inability of the BK filter to pass through only stochastic cycles at frequencies inside the specified band. tsfilter — Filter a time-series, keeping only selected periodicities 571 We can also evaluate the BK filter by plotting its gain function against the gain function of an ideal filter. In the output below, we reestimate the business-cycle component to store the gain of the BK filter for the specified parameters. (The coefficients and the gain of the BK filter are completely determined by the specified minimum period, the maximum period, and the order of the SMA filter.) We label the variable bkgain for the graph below. . drop ip_bk . tsfilter bk ip_bk = ip_ln, minperiod(6) maxperiod(32) gain(bkgain abk) . label variable bkgain "BK filter" Below we generate ideal, the gain function of the ideal band-pass filter at the frequencies f. Then we plot the gain of the ideal filter and the gain of the BK filter. . generate f = _pi*(_n-1)/_N . generate ideal = cond(f<_pi/16, 0, cond(f<_pi/3, 1,0)) . label variable ideal "Ideal filter" 0 .5 1 . twoway line ideal f || line bkgain abk 0 1 2 Ideal filter 3 BK filter The graph reveals that the gain of the BK filter deviates markedly from the square-wave gain of the ideal filter. Increasing the symmetric moving average via the smaorder() option will cause the gain of the BK filter to more closely approximate the gain of the ideal filter at the cost of lost observations in the filtered series. Filtering a random walk: The Christiano–Fitzgerald filter Although Baxter and King (1999) minimized the error between the coefficients in their filter and the ideal band-pass filter, Christiano and Fitzgerald (2003) minimized the mean squared error between the estimated component and the true component, assuming that the raw series is a random-walk process. Christiano and Fitzgerald (2003) give three important reasons for using their filter: 1. The true dependence structure of the data affects which filter is optimal. 2. 
Many economic time series are well approximated by random-walk processes.
3. Their filter does a good job passing through stochastic cycles of desired frequencies and blocking stochastic cycles from unwanted frequencies on a range of processes that are close to being a random-walk process.

The CF filter obtains its optimality properties at the cost of an additional parameter that must be estimated and a loss of robustness. The CF filter is optimal for a random-walk process. If the true process is a random walk with drift, then the drift term must be estimated and removed; see [TS] tsfilter cf for details. The CF filter is not symmetric, so it will not remove second-order deterministic or second-order integrated processes.

tsfilter cf also implements another filter that Christiano and Fitzgerald (2003) derived that is an SMA filter with coefficients that sum to zero. This filter is designed to be as close as possible to the random-walk optimal filter under the constraint that it be an SMA filter with coefficients that sum to zero; see [TS] tsfilter cf for details.

Technical note
A random-walk process is a first-order integrated process; it must be differenced once to produce a stationary process. Formally, a random-walk process is given by $y_t = y_{t-1} + \epsilon_t$, where $\epsilon_t$ is a zero-mean stationary random variable. A random-walk-plus-drift process is given by $\widetilde{y}_t = \mu + \widetilde{y}_{t-1} + \epsilon_t$, where $\epsilon_t$ is a zero-mean stationary random variable.

Example 4: A CF estimate of the business-cycle component

In this example, we use the CF filter to estimate the business-cycle component, and we plot the periodogram of the CF estimates. We specify the drift option because ip_ln is well approximated by a random-walk-plus-drift process.

. tsfilter cf ip_cf = ip_ln, minperiod(6) maxperiod(32) drift
. pergram ip_cf, xline(0.03125 0.16667)
  (figure omitted: log periodogram of the ip_ln cyclical component from the CF filter, evaluated at the natural frequencies)

The periodogram of the CF estimates of the business-cycle component indicates that the CF filter did a better job than the BK filter of passing through only the desired stochastic cycles. Given that ip_ln is well approximated by a random-walk-plus-drift process, the relative performance of the CF filter is not surprising.

As with the BK filter, plotting the gain of the CF filter and the gain of the ideal filter gives an impression of how well the filter isolates the specified components. In the output below, we reestimate the business-cycle component, using the gain() option to store the gain of the CF filter, and we plot the gain functions.

. drop ip_cf
. tsfilter cf ip_cf = ip_ln, minperiod(6) maxperiod(32) drift gain(cfgain acf)
. label variable cfgain "CF filter"
. twoway line ideal f || line cfgain acf
  (figure omitted: gain functions of the ideal filter and the CF filter)

Comparing this graph with the graph of the BK gain function reveals that the CF filter is closer to the gain of the ideal filter than is the BK filter. The graph also reveals that the gain of the CF filter oscillates above and below 1 for desired frequencies.

The choice between the BK or the CF filter is one between robustness or efficiency.
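Whether the random-walk-plus-drift approximation is reasonable can be checked informally before committing to the CF filter; a sketch: if ip_ln is close to a random walk with drift, its first difference should look like a constant plus weakly correlated noise.

. summarize D.ip_ln           // the mean estimates the drift term
. corrgram D.ip_ln, lags(8)   // look for small autocorrelations

See [TS] corrgram for the autocorrelation table and Q statistics.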
The BK filter handles a broader class of stochastic processes, but the CF filter produces a better estimate of $c_t$ if $y_t$ is close to a random-walk process or a random-walk-plus-drift process.

A one-parameter high-pass filter: The Hodrick–Prescott filter
Hodrick and Prescott (1997) motivated the Hodrick–Prescott (HP) filter as a trend-removal technique that could be applied to data that came from a wide class of data-generating processes. In their view, the technique specified a trend in the data, and the data were filtered by removing the trend. The smoothness of the trend depends on a parameter $\lambda$. The trend becomes smoother as $\lambda \rightarrow \infty$. Hodrick and Prescott (1997) recommended setting $\lambda$ to 1,600 for quarterly data.

King and Rebelo (1993) showed that removing a trend estimated by the HP filter is equivalent to a high-pass filter. They derived the gain function of this high-pass filter and showed that the filter would make integrated processes of order 4 or less stationary, making the HP filter comparable with the band-pass filters discussed above.

Example 5: An HP estimate of the business-cycle component

We begin by applying the HP high-pass filter to ip_ln and plotting the periodogram of the estimated business-cycle component. We specify the gain() option because we will use the gain of the filter in the next example.

. tsfilter hp ip_hp = ip_ln, gain(hpg1600 ahp1600)
. label variable hpg1600 "HP(1600) filter"
. pergram ip_hp, xline(0.03125)
  (figure omitted: log periodogram of the ip_ln cyclical component from the HP filter, evaluated at the natural frequencies)

Because the HP filter is a high-pass filter, the high-frequency stochastic cycles corresponding to those periods below 6 remain in the estimated component. Of more concern is the presence of the low-frequency stochastic cycles that the filter should remove. We address this issue in the example below.

Example 6: Choosing the parameters for the HP filter

Hodrick and Prescott (1997) argued that the smoothing parameter $\lambda$ should be 1,600 on the basis of a heuristic argument that specified values for the variance of the cyclical component and the variance of the second difference of the trend component, both recorded at quarterly frequencies.

In this example, we choose the smoothing parameter to be 677.13, which sets the gain of the filter to 0.5 at the frequency corresponding to 32 periods, as explained in the technical note below. We then plot the periodogram of the filtered series.

. tsfilter hp ip_hp2 = ip_ln, smooth(677.13) gain(hpg677 ahp677)
. label variable hpg677 "HP(677) filter"
. pergram ip_hp2, xline(0.03125)
  (figure omitted: log periodogram of the ip_ln cyclical component from the HP filter, evaluated at the natural frequencies)

Although the periodogram looks better than the periodogram with the default smoothing, the HP filter still did not zero out the low-frequency stochastic cycles as well as the CF filter did. We take another look at this issue by plotting the gain functions for these filters along with the gain function from the ideal band-pass filter.
. twoway line ideal f || line hpg677 ahp677
  (figure omitted: gain functions of the ideal filter and the HP(677) filter)

Comparing the gain graphs reveals that the gain of the CF filter is closest to the gain of the ideal filter. Both the BK and the HP filters allow some low-frequency stochastic cycles to pass through. The plot also illustrates that the HP filter is a high-pass filter because its gain is 1 for those stochastic cycles at frequencies above 6 periods, whereas the other gain functions go to zero.

Technical note
Conventionally, economists have used $\lambda = 1600$, which Hodrick and Prescott (1997) recommended for quarterly data. Ravn and Uhlig (2002) derived values for $\lambda$ at monthly and annual frequencies that are rescalings of the conventional $\lambda = 1600$ for quarterly data. These heuristic values are the default values; see [TS] tsfilter hp for details.

In the filter literature, filter parameters are set as functions of the cutoff frequency; see Pollock (2000, 324), for instance. This method finds the filter parameter that sets the gain of the filter equal to 1/2 at the cutoff frequency. Applying this method to selecting $\lambda$ at the cutoff frequency of 32 periods requires solving

$$\frac{1}{2} = \frac{4\lambda\{1 - \cos(2\pi/32)\}^2}{1 + 4\lambda\{1 - \cos(2\pi/32)\}^2}$$

for $\lambda$, which gives $\lambda = 1/[4\{1 - \cos(2\pi/32)\}^2] \approx 677.13$, the value used in the previous example.

The gain function of the HP filter is a function of the parameter $\lambda$, and $\lambda$ sets both the location of the cutoff frequency and the slope of the gain function. The graph below illustrates this dependence by plotting the gain function of the HP filter for $\lambda$ set to 10, 677.13, and 1,600, along with the gain function for the ideal band-pass filter with cutoff periods of 32 periods and 6 periods.

  (figure omitted: gain functions of the ideal filter and the HP(10), HP(677), and HP(1600) filters)

A two-parameter high-pass filter: The Butterworth filter
Engineers have used Butterworth filters for a long time because they are "maximally flat". The gain functions of these filters are as close as possible to being a flat line at 0 for the unwanted periods and a flat line at 1 for the desired periods; see Butterworth (1930) and Bianchi and Sorrentino (2007, 17–20).

Pollock (2000) showed that Butterworth filters can be derived from some axioms that specify properties we would like a filter to have. Although the Butterworth and BK filters share the properties of symmetry and phase neutrality, the coefficients of Butterworth filters do not need to sum to zero. (Phase-neutral filters do not shift the signal forward or backward in time; see Pollock [1999].) Although the BK filter relies on the detrending properties of SMA filters with coefficients that sum to zero, Pollock (2000) shows that Butterworth filters have detrending properties that depend on the filters' parameters.

tsfilter bw implements the high-pass Butterworth filter using the computational method that Pollock (2000) derived. This filter has two parameters: the cutoff period and the order of the filter, denoted by $m$. The cutoff period sets the location where the gain function starts to filter out the high-period (low-frequency) stochastic cycles, and $m$ sets the slope of the gain function for a given cutoff period. For a given cutoff period, the slope of the gain function at the cutoff period increases with $m$. For a given $m$, the slope of the gain function at the cutoff period increases with the cutoff period.
We cannot obtain a vertical slope at the cutoff frequency, which is the ideal, because the computation becomes unstable; see Pollock (2000). The m for which the computation becomes unstable depends on the cutoff period. Pollock (2000) and Gómez (1999) argue that the additional flexibility produced by the additional parameter makes the high-pass Butterworth filter a better filter than the HP filter for estimating the cyclical components. Pollock (2000) shows that the high-pass Butterworth filter can estimate the desired components of the dth difference of a dth-order integrated process as long as m ≥ d. Example 7: A Butterworth filter that removes low-frequency components Below we use tsfilter bw to estimate the components driven by stochastic cycles greater than 32 periods using Butterworth filters of order 2 and order 6. We also compute, label, and plot the gain functions for each filter. tsfilter bw ip_bw1 = ip_ln, gain(bwgain1 abw1) maxperiod(32) order(2) label variable bwgain1 "BW 2" tsfilter bw ip_bw6 = ip_ln, gain(bwgain6 abw6) maxperiod(32) order(6) label variable bwgain6 "BW 6" twoway line ideal f || line bwgain1 abw1 || line bwgain6 abw6 0 .2 .4 .6 .8 1 . . . . . 0 1 2 Ideal filter BW 6 3 BW 2 The graph illustrates that the slope of the gain function increases with the order of the filter. The graph below provides another perspective by plotting the gain function from the ideal band-pass filter on a graph with plots of the gain functions from the Butterworth filter of order 6, the CF filter, and the HP(677) filter. 578 tsfilter — Filter a time-series, keeping only selected periodicities 0 .25 .5 .75 1 1.25 . twoway line ideal f || line bwgain6 abw6 || line cfgain acf > || line hpg677 ahp677 0 1 2 Ideal filter CF filter BW 6 HP(677) filter 3 Although the slope of the gain function from the CF filter is closer to being vertical at the cutoff frequency, the gain function of the Butterworth filter does not oscillate above and below 1 after it first reaches the value of 1. The flatness of the Butterworth filter below and above the cutoff frequency is not an accident; it is one of the filter’s properties. Example 8: A Butterworth filter that removes high-frequency components In the previous example, we used the Butterworth filter of order 6 to remove low-frequency stochastic cycles, and we saved the results in ip bw6. The Butterworth filter did not address the high-frequency stochastic cycles below 6 periods because it is a high-pass filter. We remove those high-frequency stochastic cycles in this example by keeping the trend produced by refiltering the previously filtered series. This example uses a common trick: keeping the trend produced by a high-pass filter turns that high-pass filter into a low-pass filter. Because we want to remove the high-frequency stochastic cycles still in the previously filtered series ip bw6, we need a low-pass filter. So we keep the trend produced by refiltering the previously filtered series. In the output below, we apply a Butterworth filter of order 20 to the previously filtered series ip bw6. We explain why we used order 20 in the next example. We specify the trend() option to keep the low-frequency components from these filters. Then we compute and graph the periodogram for the trend variable. 
. tsfilter bw ip_bwu20 = ip_bw6, gain(bwg20 fbw20) maxperiod(6) order(20)
> trend(ip_bwb)
. label variable bwg20 "BW upper filter 20"
. pergram ip_bwb, xline(0.03125 0.16667)
  (figure omitted: log periodogram of the ip_bw6 trend component from the BW filter, evaluated at the natural frequencies)

The periodogram reveals that the two-pass process has passed the original series ip_ln through a band-pass filter. It also reveals that the two-pass process did a reasonable job of filtering out the stochastic cycles corresponding to the unwanted frequencies.

Example 9: Choosing the order of a Butterworth filter

In the previous example, when the cutoff period was 6, we set the order of the Butterworth filter to 20. In contrast, in example 7, when the cutoff period was 32, we set the order of the Butterworth filter to 6. We had to increase the filter order because the slope of the gain function of the Butterworth filter is increasing with the cutoff period. We needed a larger filter order to get an acceptable slope at the lower cutoff period.

We illustrate this point in the output below. We apply Butterworth filters of orders 2 and 6 to the previously filtered series ip_bw6, we compute the gain functions, we label the gain variables, and then we plot the gain functions from the ideal filter and the Butterworth filters.

. tsfilter bw ip_bwu1 = ip_bw6, gain(bwg1 fbw1) maxperiod(6) order(2)
. label variable bwg1 "BW upper filter 2"
. tsfilter bw ip_bwu6 = ip_bw6, gain(bwg6 fbw6) maxperiod(6) order(6)
. label variable bwg6 "BW upper filter 6"
. twoway line ideal f || line bwg1 fbw1 || line bwg6 fbw6 || line bwg20 fbw20
  (figure omitted: gain functions of the ideal filter and the BW upper filters of orders 2, 6, and 20)

Because the cutoff period is 6, the gain functions for $m = 2$ and $m = 6$ are much flatter than the gain functions for $m = 2$ and $m = 6$ in example 7, when the cutoff period was 32. The gain function for $m = 20$ is reasonably close to vertical, so we used it in example 8. We mentioned above that, for any given cutoff period, the computation eventually becomes unstable for larger values of $m$. For instance, when the cutoff period is 32, $m = 20$ is not numerically feasible.

Example 10: Comparing the Butterworth and CF estimates

As a conclusion, we plot the business-cycle components estimated by the CF filter and by the two passes of Butterworth filters. The shaded areas identify recessions. The two estimates are close, but the differences could be important. Which estimate is better depends on whether the oscillations around 1 in the graph of the CF gain function (the second graph of example 7) cause more problems than the nonvertical slopes at the cutoff periods that occur in the BW 6 gain function of that same graph and the BW upper filter 20 gain function graphed above.

  (figure omitted: Butterworth-filter and CF-filter estimates of the business-cycle component plotted against the quarterly time variable, 1920q1 to 2010q1, with recessions shaded)

There is a long tradition in economics of using models to estimate components. Instead of comparing filters by their gain functions, some authors compare filters by finding underlying models for which the filter parameters are the model parameters.
For instance, Harvey and Jaeger (1993), Gómez (1999, 2001), Pollock (2000, 2006), and Harvey and Trimbur (2003) derive models that correspond to the HP or the Butterworth filter. Some of these references also compare components estimated by filters with components estimated by making predictions from estimated models. In effect, these references point out that arima, dfactor, sspace, and ucm (see [TS] arima, [TS] dfactor, [TS] sspace, and [TS] ucm) implement alternative methods to component estimation.

Methods and formulas
All filters work with both time-series data and panel data when there are many observations on each panel. When used with panel data, the calculations are performed separately within each panel.

For these filters, the default minimum and maximum periods of oscillation correspond to the boundaries used by economists (Burns and Mitchell 1946) for business cycles. Burns and Mitchell defined business cycles as oscillations in business data with recurring periods between 1.5 and 8 years. Their definition continues to be cited by economists investigating correlations between business cycles.

If $y_t$ is a time series, then the cyclical component is

$$c_t = B(L)y_t = \sum_{j=-\infty}^{\infty} b_j\, y_{t-j}$$

where $b_j$ are the coefficients of the impulse–response sequence of some ideal filter. The impulse–response sequence is the inverse Fourier transform of either a square wave or a step function, depending upon whether the filter is a band-pass or a high-pass filter, respectively.

In finite sequences, it is necessary to approximate this calculation with a finite impulse–response sequence $\widehat{b}_j$:

$$\widehat{c}_t = \widehat{B}_t(L)y_t = \sum_{j=-n_1}^{n_2} \widehat{b}_j\, y_{t-j}$$

The infinite-order impulse–response sequences for the filters implemented in tsfilter are symmetric and time-invariant.

In the frequency domain, the relationships between the true cyclical component and its finite estimates respectively are

$$c(\omega) = B(\omega)y(\omega) \qquad \text{and} \qquad \widehat{c}(\omega) = \widehat{B}(\omega)y(\omega)$$

where $B(\omega)$ and $\widehat{B}(\omega)$ are the frequency transfer functions of the filters $B$ and $\widehat{B}$.

The frequency transfer function for $B(\omega)$ can be expressed in polar form as

$$B(\omega) = |B(\omega)|\exp\{i\theta(\omega)\}$$

where $|B(\omega)|$ is the filter's gain function and $\theta(\omega)$ is the filter's phase function. The gain function determines whether the amplitude of the stochastic cycle is increased or decreased at a particular frequency. The phase function determines how a cycle at a particular frequency is shifted forward or backward in time.

In this form, it can be shown that the spectrum of the cyclical component, $f_c(\omega)$, is related to the spectrum of the $y_t$ series by the squared gain:

$$f_c(\omega) = |B(\omega)|^2 f_y(\omega)$$

Each of the four filters in tsfilter has an option for returning an estimate of the gain function together with its associated scaled frequency $a = \omega/\pi$, where $0 \le \omega \le \pi$. These are consistent estimates of $|B(\omega)|$, the gain from the ideal linear filter.

The band-pass filters implemented in tsfilter, the BK and CF filters, use a square wave as the ideal transfer function:

$$B(\omega) = \begin{cases} 1 & \text{if } |\omega| \in [\omega_l, \omega_h] \\ 0 & \text{if } |\omega| \notin [\omega_l, \omega_h] \end{cases}$$

The high-pass filters, the Hodrick–Prescott and Butterworth filters, use a step function as the ideal transfer function:

$$B(\omega) = \begin{cases} 1 & \text{if } |\omega| \ge \omega_h \\ 0 & \text{if } |\omega| < \omega_h \end{cases}$$

Acknowledgments
We thank Christopher F.
Baum of the Department of Economics at Boston College and author of the Stata Press books An Introduction to Modern Econometrics Using Stata and An Introduction to Stata Programming for his previous implementations of these filters: Baxter–King (bking), Christiano– Fitzgerald (cfitzrw), Hodrick–Prescott (hprescott), and Butterworth (butterworth). tsfilter — Filter a time-series, keeping only selected periodicities 583 We also thank D. S. G. Pollock of the Department of Economics at the University of Leicester, UK, for his helpful responses to our questions about Butterworth filters and the methods that he has developed. References Baxter, M., and R. G. King. 1999. Measuring business cycles: Approximate band-pass filters for economic time series. Review of Economics and Statistics 81: 575–593. Bianchi, G., and R. Sorrentino. 2007. Electronic Filter Simulation and Design. New York: McGraw–Hill. Burns, A. F., and W. C. Mitchell. 1946. Measuring Business Cycles. New York: National Bureau of Economic Research. Butterworth, S. 1930. On the theory of filter amplifiers. Experimental Wireless and the Wireless Engineer 7: 536–541. Christiano, L. J., and T. J. Fitzgerald. 2003. The band pass filter. International Economic Review 44: 435–465. Fuller, W. A. 1996. Introduction to Statistical Time Series. 2nd ed. New York: Wiley. Gómez, V. 1999. Three equivalent methods for filtering finite nonstationary time series. Journal of Business and Economic Statistics 17: 109–116. . 2001. The use of Butterworth filters for trend and cycle estimation in economic time series. Journal of Business and Economic Statistics 19: 365–373. Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press. Harvey, A. C., and A. Jaeger. 1993. Detrending, stylized facts and the business cycle. Journal of Applied Econometrics 8: 231–247. Harvey, A. C., and T. M. Trimbur. 2003. General model-based filters for extracting cycles and trends in economic time series. The Review of Economics and Statistics 85: 244–255. Hodrick, R. J., and E. C. Prescott. 1997. Postwar U.S. business cycles: An empirical investigation. Journal of Money, Credit, and Banking 29: 1–16. King, R. G., and S. T. Rebelo. 1993. Low frequency filtering and real business cycles. Journal of Economic Dynamics and Control 17: 207–231. Leser, C. E. V. 1961. A simple method of trend construction. Journal of the Royal Statistical Society, Series B 23: 91–107. Pollock, D. S. G. 1999. A Handbook of Time-Series Analysis, Signal Processing and Dynamics. London: Academic Press. . 2000. Trend estimation and de-trending via rational square-wave filters. Journal of Econometrics 99: 317–334. . 2006. Econometric methods of signal extraction. Computational Statistics & Data Analysis 50: 2268–2292. Priestley, M. B. 1981. Spectral Analysis and Time Series. London: Academic Press. Ravn, M. O., and H. Uhlig. 2002. On adjusting the Hodrick–Prescott filter for the frequency of observations. Review of Economics and Statistics 84: 371–376. Schmidt, T. J. 1994. sts5: Detrending with the Hodrick–Prescott filter. Stata Technical Bulletin 17: 22–24. Reprinted in Stata Technical Bulletin Reprints, vol. 3, pp. 216–219. College Station, TX: Stata Press. Wei, W. W. S. 2006. Time Series Analysis: Univariate and Multivariate Methods. 2nd ed. Boston: Pearson. 
Also see [TS] tsset — Declare data to be time-series data [XT] xtset — Declare data to be panel data [TS] tssmooth — Smooth and forecast univariate time-series data Title tsfilter bk — Baxter–King time-series filter Description Options References Quick start Remarks and examples Also see Menu Stored results Syntax Methods and formulas Description tsfilter bk uses the Baxter and King (1999) band-pass filter to separate a time series into trend and cyclical components. The trend component may contain a deterministic or a stochastic trend. The stationary cyclical component is driven by stochastic cycles at the specified periods. See [TS] tsfilter for an introduction to the methods implemented in tsfilter bk. Quick start Use the Baxter–King filter for y and obtain cyclical component ct using tsset data tsfilter bk ct=y As above, but filter out stochastic cycles with periods smaller than 5 and those with periods larger than 20 tsfilter bk ct=y, minperiod(5) maxperiod(20) As above, and save the trend component in the variable trendvar tsfilter bk ct=y, minperiod(5) maxperiod(20) trend(trendvar) Save gain in gain and angular frequency in angle tsfilter bk ct=y, gain(gain angle) Use the Baxter–King filter for variables y1, y2, and y3 to obtain cyclical components with prefix cycl tsfilter bk cycl*=y1 y2 y3 Note: The above commands can also be used to apply the filter separately to each panel of a panel dataset when a panelvar has been specified by using tsset or xtset. Menu Statistics > Time series > Filters for cyclical components > 584 Baxter-King tsfilter bk — Baxter–King time-series filter 585 Syntax Filter one variable tsfilter bk  type  newvar = varname  if   in   , options  Filter multiple variables, unique names         tsfilter bk type newvarlist = varlist if in , options Filter multiple variables, common name stub         tsfilter bk type stub* = varlist if in , options Description options Main minperiod(#) maxperiod(#) smaorder(#) stationary filter out stochastic cycles at periods smaller than # filter out stochastic cycles at periods larger than # number of observations in each direction that contribute to each filtered value use calculations for a stationary time series Trend trend(newvar | newvarlist | stub*) save the trend component(s) in new variable(s) Gain gain(gainvar anglevar) save the gain and angular frequency You must tsset or xtset your data before using tsfilter; see [TS] tsset and [XT] xtset. varname and varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists. Options   Main minperiod(#) filters out stochastic cycles at periods smaller than #, where # must be at least 2 and less than maxperiod(). By default, if the units of the time variable are set to daily, weekly, monthly, quarterly, or half-yearly, then # is set to the number of periods equivalent to 1.5 years; yearly data use minperiod(2); otherwise, the default value is minperiod(6). maxperiod(#) filters out stochastic cycles at periods larger than #, where # must be greater than minperiod(). By default, if the units of the time variable are set to daily, weekly, monthly, quarterly, half-yearly, or yearly, then # is set to the number of periods equivalent to 8 years; otherwise, the default value is maxperiod(32). smaorder(#) sets the order of the symmetric moving average, denoted by q . The order is an integer that specifies the number of observations in each direction used in calculating the symmetric moving average estimate of the cyclical component. 
This number must be an integer greater than zero and less than (T − 1)/2. The estimate for the cyclical component for the tth observation, yt , is based upon the 2q + 1 values yt−q , yt−q+1 , . . . , yt , yt+1 , . . . , yt+q . By default, if the units of the time variable are set to daily, weekly, monthly, quarterly, half-yearly, or yearly, then # is set to the number of periods equivalent to 3 years; otherwise, the default value is smaorder(12). 586 tsfilter bk — Baxter–King time-series filter stationary modifies the filter calculations to those appropriate for a stationary series. By default, the series is assumed nonstationary.   Trend trend(newvar | newvarlist | stub*) saves the trend component(s) in the new variable(s) specified by newvar, newvarlist, or stub*.   Gain gain(gainvar anglevar) saves the gain in gainvar and its associated angular frequency in anglevar. Gains are calculated at the N angular frequencies that uniformly partition the interval (0, π], where N is the sample size. Remarks and examples We assume that you have already read [TS] tsfilter, which provides an introduction to filtering and the methods implemented in tsfilter bk, more examples using tsfilter bk, and a comparison of the four filters implemented by tsfilter. In particular, an understanding of gain functions as presented in [TS] tsfilter is required to understand these remarks. tsfilter bk uses the Baxter–King (BK) band-pass filter to separate a time-series yt into trend and cyclical components: yt = τt + ct where τt is the trend component and ct is the cyclical component. τt may be nonstationary; it may contain a deterministic or a stochastic trend, as discussed below. The primary objective is to estimate ct , a stationary cyclical component that is driven by stochastic cycles within a specified range of periods. The trend component τt is calculated by the difference τt = yt − ct . Although the BK band-pass filter implemented in tsfilter bk has been widely applied by macroeconomists, it is a general time-series method and may be of interest to other researchers. Symmetric moving-average (SMA) filters with coefficients that sum to zero remove stochastic and deterministic trends of first and second order; see Fuller (1996), Baxter and King (1995), and Baxter and King (1999). For an infinitely long series, there is an ideal band-pass filter for which the gain function is 1 for ω ∈ [ω0 , ω1 ] and 0 for all other frequencies; see [TS] tsfilter for an introduction to gain functions. It just so happens that this ideal band-pass filter is an SMA filter with coefficients that sum to zero. Baxter and King (1999) derive the coefficients of this ideal band-pass filter and then define the BK filter to be the SMA filter with 2q + 1 terms that are as close as possible to those of the ideal filter. There is a trade-off in choosing q : larger values of q cause the gain of the BK filter to be closer to the gain of the ideal filter, but they also increase the number of missing observations in the filtered series. The smaorder() option specifies q . The default value of smaorder() is the number of periods equivalent to 3 years, following the Baxter and King (1999) recommendation. Although the mathematics of the frequency-domain approach to time-series analysis is in terms of stochastic cycles at frequencies ω ∈ [−π, π], applied work is generally in terms of periods p, where p = 2π/ω . So tsfilter bk has the minperiod() and maxperiod() options to specify the desired range of stochastic cycles. 
Among economists, the BK filter is commonly used for investigating business cycles. Burns and Mitchell (1946) defined business cycles as stochastic cycles in business data corresponding to periods between 1.5 and 8 years. The default values for minperiod() and maxperiod() are the Burns–Mitchell values of 1.5 and 8 years, scaled to the frequency of the dataset. The calculations of the default values assume that the time variable is formatted as daily, weekly, monthly, quarterly, half-yearly, or yearly; see [D] format.

For each variable, the band-pass BK filter estimate of $c_t$ is put in the corresponding new variable, and when the trend() option is specified, the estimate of $\tau_t$ is put in the corresponding new variable.

tsfilter bk automatically detects panel data from the information provided when the dataset was tsset or xtset. All calculations are done separately on each panel. Missing values at the beginning and end of the sample are excluded from the sample. The sample may not contain gaps.

Baxter and King (1999) derived their method for nonstationary time series, but they noted that a small modification makes it applicable to stationary time series. Imposing the condition that the filter coefficients sum to zero is what makes their method applicable to nonstationary time series; dropping this condition yields a filter for stationary time series. Specifying the stationary option causes tsfilter bk to use coefficients calculated without the constraint that they sum to zero.

Example 1: Estimating a business-cycle component

In this and the subsequent examples, we use tsfilter bk to estimate the business-cycle component of the natural log of real gross domestic product (GDP) of the United States. Our sample of quarterly data goes from 1952q1 to 2010q4. Below we read in and plot the data.

. use http://www.stata-press.com/data/r14/gdp2
(Federal Reserve Economic Data, St. Louis Fed)
. tsline gdp_ln

[Figure omitted: time-series line plot of the natural log of real GDP, 1950q1–2010q1]

The series is nonstationary and is thus a candidate for the BK filter.

Below we use tsfilter bk to filter gdp_ln, and we use pergram (see [TS] pergram) to compute and to plot the periodogram of the estimated cyclical component.

. tsfilter bk gdp_bk = gdp_ln
. pergram gdp_bk, xline(.03125 .16667)

[Figure omitted: log periodogram of the gdp_ln cyclical component from the bk filter, evaluated at the natural frequencies]

Because our sample is of quarterly data, tsfilter bk used the default values of minperiod(6), maxperiod(32), and smaorder(12). The minimum and maximum periods are the Burns and Mitchell (1946) business-cycle periods for quarterly data. The default of smaorder(12) was recommended by Baxter and King (1999) for quarterly data.

In the periodogram, we added vertical lines at the natural frequencies corresponding to the conventional Burns and Mitchell (1946) values for business-cycle components. pergram displays the results in natural frequencies, which are the standard frequencies divided by 2π. We use the xline() option to draw vertical lines at the lower natural-frequency cutoff (1/32 = 0.03125) and the upper natural-frequency cutoff (1/6 ≈ 0.16667).
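Before interpreting the periodogram, note that the defaults tsfilter bk applied can be confirmed from its stored results, provided they are retrieved before another r-class command, such as pergram, overwrites them. This is a check of ours rather than part of the original example; gdp_tmp is a throwaway variable name:

. quietly tsfilter bk gdp_tmp = gdp_ln
. display r(minperiod) " " r(maxperiod) " " r(smaorder)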
If the filter completely removed the stochastic cycles at the unwanted frequencies, the periodogram would be a flat line at the minimum value of −6 outside the range identified by the vertical lines.

The periodogram reveals that the default value of smaorder(12) did not do a good job of filtering out the high-periodicity stochastic cycles, because there are too many points above −6.00 to the left of the left-hand vertical line. It also reveals that the filter did not remove enough low-periodicity stochastic cycles, because there are too many points above −6.00 to the right of the right-hand vertical line. We address these problems in the next example.

Example 2: Changing the order of the filter

In this example, we change the symmetric moving average of the filter via the smaorder() option so that it will remove more of the unwanted stochastic cycles. As mentioned, larger values of q cause the gain of the BK filter to be closer to the gain of the ideal filter, but larger values also increase the number of missing observations in the filtered series.

In the output below, we estimate the business-cycle component and compute the gain functions when the SMA-order of the filter is 12 and when it is 20. We also generate ideal, the gain function of the ideal band-pass filter at the frequencies f. Then we plot the gain functions from all three filters.

. tsfilter bk gdp_bk12 = gdp_ln, gain(g12 a12)
. label variable g12 "BK SMA-order 12"
. tsfilter bk gdp_bk20 = gdp_ln, gain(g20 a20) smaorder(20)
. label variable g20 "BK SMA-order 20"
. generate f = _pi*(_n-1)/_N
. generate ideal = cond(f<_pi/16, 0, cond(f<_pi/3, 1, 0))
. label variable ideal "Ideal filter"
. twoway line ideal f || line g12 a12 || line g20 a20

[Figure omitted: gain functions of the ideal filter and the BK filters of SMA-order 12 and SMA-order 20]

As discussed in [TS] tsfilter, the gain function of the ideal filter is a square wave with a value of 0 at the frequencies corresponding to unwanted frequencies and a value of 1 at the desired frequencies. The vertical lines in the gain function of the ideal filter occur at the frequencies π/16, corresponding to 32 periods, and at π/3, corresponding to 6 periods. (Given that $p = 2\pi/\omega$, where p is the period corresponding to the frequency ω, the frequency is given by $2\pi/p$.)

The difference between the gain function of the filter with SMA-order 12 and the gain function of the ideal band-pass filter is the root of the issues mentioned at the end of example 1. The filter with SMA-order 20 is closer to the gain function of the ideal band-pass filter at the cost of 16 more missing values in the filtered series.

Below we compute and graph the periodogram of the series filtered with SMA-order 20.

. pergram gdp_bk20, xline(.03125 .16667)

[Figure omitted: log periodogram of the gdp_ln cyclical component from the bk filter, evaluated at the natural frequencies]

The above periodogram indicates that the filter of SMA-order 20 removed more of the stochastic cycles at the unwanted periodicities than did the filter of SMA-order 12. Whether removing the stochastic cycles at the unwanted periodicities is worth losing more observations in the filtered series is a judgment call.
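The cost side of this judgment call can be made concrete. The sketch below is ours, not part of the original example; it counts the missing values each choice of q induces (the first and last q estimates are lost, as noted under Methods and formulas) and inspects the stored weights, which must be retrieved immediately after tsfilter bk runs. gdp_chk is a throwaway name:

. count if missing(gdp_bk12)    // expect 2*12 = 24 missing values
. count if missing(gdp_bk20)    // expect 2*20 = 40 missing values
. quietly tsfilter bk gdp_chk = gdp_ln, smaorder(20)
. matrix list r(filter)         // (q+1) x 1 vector of SMA filter weights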
Below we plot the estimated business-cycle component with recessions identified by the shaded areas.

[Figure omitted: gdp_ln cyclical component from the bk filter, 1957q3–2005q3, with recessions shaded]

Stored results

tsfilter bk stores the following in r():

Scalars
  r(smaorder)     order of the symmetric moving average
  r(minperiod)    minimum period of stochastic cycles
  r(maxperiod)    maximum period of stochastic cycles
Macros
  r(varlist)      original time-series variables
  r(filterlist)   variables containing estimates of the cyclical components
  r(trendlist)    variables containing estimates of the trend components, if trend() was specified
  r(method)       Baxter-King
  r(stationary)   yes or no, indicating whether the calculations assumed the series was or was not stationary
  r(unit)         units of time variable set using tsset or xtset
Matrices
  r(filter)       (q+1) x 1 matrix of filter weights, where q is the order of the symmetric moving average

Methods and formulas

Baxter and King (1999) showed that there is an infinite-order SMA filter with coefficients that sum to zero that can extract the specified components from a nonstationary time series. The components are specified in terms of the minimum and maximum periods of the stochastic cycles that drive these components in the frequency domain. This ideal filter is not feasible, because the constraints imposed on the filter can only be satisfied using an infinite number of coefficients, so Baxter and King (1999) derived a finite approximation to this ideal filter.

The infinite-order, ideal band-pass filter obtains the cyclical component with the calculation

$$c_t = \sum_{j=-\infty}^{\infty} b_j y_{t-j}$$

Letting $p_l$ and $p_h$ be the minimum and maximum periods of the stochastic cycles of interest, the weights $b_j$ in this calculation are given by

$$b_j = \begin{cases} \pi^{-1}(\omega_h - \omega_l) & \text{if } j = 0 \\ (j\pi)^{-1}\{\sin(j\omega_h) - \sin(j\omega_l)\} & \text{if } j \neq 0 \end{cases}$$

where $\omega_l = 2\pi/p_h$ and $\omega_h = 2\pi/p_l$ are the lower and higher cutoff frequencies, respectively.

For the default case of nonstationary time series with finite length, the ideal band-pass filter cannot be used without modification. Baxter and King (1999) derived modified weights for a finite-order SMA filter with coefficients that sum to zero. As a result, Baxter and King (1999) estimate $c_t$ by

$$c_t = \sum_{j=-q}^{+q} \hat{b}_j y_{t-j}$$

The coefficients $\hat{b}_j$ in this calculation are equal to $\hat{b}_j = b_j - \bar{b}_q$, where $\hat{b}_{-j} = \hat{b}_j$ and $\bar{b}_q$ is the mean of the ideal coefficients truncated at $\pm q$:

$$\bar{b}_q = (2q+1)^{-1} \sum_{j=-q}^{q} b_j$$

Note that $\sum_{j=-q}^{+q} \hat{b}_j = 0$ and that the first and last q values of the cyclical component cannot be estimated using this filter.

If the stationary option is set, the BK filter sets the coefficients to the ideal coefficients, that is, $\hat{b}_j = b_j$. For these weights, $\hat{b}_j = \hat{b}_{-j}$, and although $\sum_{j=-\infty}^{\infty} b_j = 0$, for small q, $\sum_{j=-q}^{q} \hat{b}_j \neq 0$.

References

Baxter, M., and R. G. King. 1995. Measuring business cycles: Approximate band-pass filters for economic time series. NBER Working Paper No. 5022, National Bureau of Economic Research. http://www.nber.org/papers/w5022.
———. 1999. Measuring business cycles: Approximate band-pass filters for economic time series. Review of Economics and Statistics 81: 575–593.
Burns, A. F., and W. C. Mitchell. 1946. Measuring Business Cycles. New York: National Bureau of Economic Research.
Fuller, W. A. 1996. Introduction to Statistical Time Series. 2nd ed. New York: Wiley.
Pollock, D. S. G. 1999.
A Handbook of Time-Series Analysis, Signal Processing and Dynamics. London: Academic Press.
———. 2006. Econometric methods of signal extraction. Computational Statistics & Data Analysis 50: 2268–2292.

Also see
[TS] tsset — Declare data to be time-series data
[XT] xtset — Declare data to be panel data
[TS] tsfilter — Filter a time series, keeping only selected periodicities
[D] format — Set variables' output format
[TS] tssmooth — Smooth and forecast univariate time-series data

Title
tsfilter bw — Butterworth time-series filter

Description   Quick start   Menu   Syntax   Options   Remarks and examples   Stored results   Methods and formulas   References   Also see

Description

tsfilter bw uses the Butterworth high-pass filter to separate a time series into trend and cyclical components. The trend component may contain a deterministic or a stochastic trend. The stationary cyclical component is driven by stochastic cycles at the specified periods.

See [TS] tsfilter for an introduction to the methods implemented in tsfilter bw.

Quick start

Use the Butterworth filter for y and obtain cyclical component ct using tsset data
    tsfilter bw ct=y

As above, but filter out periods larger than 20
    tsfilter bw ct=y, maxperiod(20)

As above, and save the trend component in the variable trendvar
    tsfilter bw ct=y, maxperiod(20) trend(trendvar)

As above, and save gain and angular frequency with the names gain and angle
    tsfilter bw ct=y, maxperiod(20) trend(trendvar) gain(gain angle)

Set the order of the filter to 4 and save the trend component in the variable trendvar
    tsfilter bw ct=y, order(4) trend(trendvar)

Use the Butterworth filter for variables y1, y2, and y3 to obtain cyclical components with prefix cycl
    tsfilter bw cycl* = y1 y2 y3

Note: The above commands can also be used to apply the filter separately to each panel of a panel dataset when a panelvar has been specified by using tsset or xtset.

Menu

Statistics > Time series > Filters for cyclical components > Butterworth

Syntax

Filter one variable
    tsfilter bw [type] newvar = varname [if] [in] [, options]

Filter multiple variables, unique names
    tsfilter bw [type] newvarlist = varlist [if] [in] [, options]

Filter multiple variables, common name stub
    tsfilter bw [type] stub* = varlist [if] [in] [, options]

options                               Description
-------------------------------------------------------------------------------
Main
  maxperiod(#)                        filter out stochastic cycles at periods larger than #
  order(#)                            set the order of the filter; default is order(2)
Trend
  trend(newvar | newvarlist | stub*)  save the trend component(s) in new variable(s)
Gain
  gain(gainvar anglevar)              save the gain and angular frequency
-------------------------------------------------------------------------------
You must tsset or xtset your data before using tsfilter; see [TS] tsset and [XT] xtset.
varname and varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Options

Main

maxperiod(#) filters out stochastic cycles at periods larger than #, where # must be greater than 2. By default, if the units of the time variable are set to daily, weekly, monthly, quarterly, half-yearly, or yearly, then # is set to the number of periods equivalent to 8 years; otherwise, the default value is maxperiod(32).

order(#) sets the order of the Butterworth filter, which must be an integer. The default is order(2).

Trend

trend(newvar | newvarlist | stub*) saves the trend component(s) in the new variable(s) specified by newvar, newvarlist, or stub*.

Gain

gain(gainvar anglevar) saves the gain in gainvar and its associated angular frequency in anglevar.
Gains are calculated at the N angular frequencies that uniformly partition the interval (0, π], where N is the sample size.

Remarks and examples

We assume that you have already read [TS] tsfilter, which provides an introduction to filtering and the methods implemented in tsfilter bw, more examples using tsfilter bw, and a comparison of the four filters implemented by tsfilter. In particular, an understanding of gain functions as presented in [TS] tsfilter is required to understand these remarks.

tsfilter bw uses the Butterworth high-pass filter to separate a time series $y_t$ into trend and cyclical components:

$$y_t = \tau_t + c_t$$

where $\tau_t$ is the trend component and $c_t$ is the cyclical component. $\tau_t$ may be nonstationary; it may contain a deterministic or a stochastic trend, as discussed below.

The primary objective is to estimate $c_t$, a stationary cyclical component that is driven by stochastic cycles within a specified range of periods. The trend component $\tau_t$ is calculated by the difference $\tau_t = y_t - c_t$.

Although the Butterworth high-pass filter implemented in tsfilter bw has been widely applied by macroeconomists and engineers, it is a general time-series method and may be of interest to other researchers.

Engineers have used Butterworth filters for a long time because they are "maximally flat". The gain functions of these filters are as close as possible to being a flat line at 0 for the unwanted periods and a flat line at 1 for the desired periods; see Butterworth (1930) and Bianchi and Sorrentino (2007, 17–20). (See [TS] tsfilter for an introduction to gain functions.)

The high-pass Butterworth filter is a two-parameter filter. The maxperiod() option specifies the maximum period; the stochastic cycles of all higher periodicities are filtered out. The maxperiod() option sets the location of the cutoff period in the gain function. The order() option specifies the order of the filter, which determines the slope of the gain function at the cutoff frequency.

For a given cutoff period, the slope of the gain function at the cutoff period increases with filter order. For a given filter order, the slope of the gain function at the cutoff period increases with the cutoff period.

We cannot obtain a vertical slope at the cutoff frequency, which is the ideal, because the computation becomes unstable; see Pollock (2000). The filter order for which the computation becomes unstable depends on the cutoff period.

Among economists, the high-pass Butterworth filter is commonly used for investigating business cycles. Burns and Mitchell (1946) defined business cycles as stochastic cycles in business data corresponding to periods between 1.5 and 8 years. For this reason, the default value for maxperiod() is the number of periods in 8 years, if the time variable is formatted as daily, weekly, monthly, quarterly, half-yearly, or yearly; see [D] format. The default value for maxperiod() is 32 for all other time formats.

For each variable, the high-pass Butterworth filter estimate of $c_t$ is put in the corresponding new variable, and when the trend() option is specified, the estimate of $\tau_t$ is put in the corresponding new variable.

tsfilter bw automatically detects panel data from the information provided when the dataset was tsset or xtset. All calculations are done separately on each panel. Missing values at the beginning and end of the sample are excluded from the sample. The sample may not contain gaps.
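The effect of order() on the gain can be previewed from the gain formula given under Methods and formulas, $\psi(\omega) = [1 + \{\tan(\omega_c/2)/\tan(\omega/2)\}^{2m}]^{-1}$. At the cutoff frequency itself the gain is 1/2 for every order, and larger orders steepen the transition around it. The arithmetic check below is ours, not part of the manual's examples, and assumes the quarterly cutoff $\omega_c = 2\pi/32$:

. scalar wc = 2*_pi/32
. display 1/(1 + (tan(wc/2)/tan(wc/2))^(2*2))        // gain at the cutoff, order 2: .5
. display 1/(1 + (tan(wc/2)/tan(1.1*wc/2))^(2*2))    // just inside the pass band, order 2
. display 1/(1 + (tan(wc/2)/tan(1.1*wc/2))^(2*8))    // same frequency, order 8: closer to 1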
Example 1: Estimating a business-cycle component

In this and the subsequent examples, we use tsfilter bw to estimate the business-cycle component of the natural log of the real gross domestic product (GDP) of the United States. Our sample of quarterly data goes from 1952q1 to 2010q4. Below we read in and plot the data.

. use http://www.stata-press.com/data/r14/gdp2
(Federal Reserve Economic Data, St. Louis Fed)
. tsline gdp_ln

[Figure omitted: time-series line plot of the natural log of real GDP, 1950q1–2010q1]

The series is nonstationary. Pollock (2000) shows that the high-pass Butterworth filter can estimate the components driven by the stochastic cycles at the specified frequencies when the original series is nonstationary.

Below we use tsfilter bw to filter gdp_ln and use pergram (see [TS] pergram) to compute and to plot the periodogram of the estimated cyclical component.

. tsfilter bw gdp_bw = gdp_ln
. pergram gdp_bw, xline(.03125 .16667)

[Figure omitted: log periodogram of the gdp_ln cyclical component from the bw filter, evaluated at the natural frequencies]

tsfilter bw used the default value of maxperiod(32) because our sample is of quarterly data. In the periodogram, we added vertical lines at the natural frequencies corresponding to the conventional Burns and Mitchell (1946) values for business-cycle components. pergram displays the results in natural frequencies, which are the standard frequencies divided by 2π. We use the xline() option to draw vertical lines at the lower natural-frequency cutoff (1/32 = 0.03125) and the upper natural-frequency cutoff (1/6 ≈ 0.16667).

If the filter completely removed the stochastic cycles at the unwanted frequencies, the periodogram would be a flat line at the minimum value of −6 outside the range identified by the vertical lines.

The periodogram reveals two issues. First, it indicates that the default value of order(2) did not do a good job of filtering out the high-periodicity stochastic cycles, because there are too many points above −6.00 to the left of the left-hand vertical line. Second, it reveals the high-pass nature of the filter, because none of the low-period (high-frequency) stochastic cycles have been filtered out. We cope with these two issues in the remaining examples.

Example 2: Changing the order of the filter

In this example, we change the order of the filter so that it will remove more of the unwanted low-frequency stochastic cycles. As previously mentioned, increasing the order of the filter increases the slope of the gain function at the cutoff period.

For orders 2 and 8, we compute the filtered series, compute the gain functions, and label the gain variables. We also generate ideal, the gain function of the ideal band-pass filter at the frequencies f. Then we plot the gain function of the ideal band-pass filter and the gain functions of the high-pass Butterworth filters of orders 2 and 8.

. tsfilter bw gdp_bw2 = gdp_ln, gain(g1 a1)
. label variable g1 "BW order 2"
. tsfilter bw gdp_bw8 = gdp_ln, gain(g8 a8) order(8)
. label variable g8 "BW order 8"
. generate f = _pi*(_n-1)/_N
. generate ideal = cond(f<_pi/16, 0, cond(f<_pi/3, 1, 0))
. label variable ideal "Ideal filter"
. twoway line ideal f || line g1 a1 || line g8 a8
[Figure omitted: gain functions of the ideal filter and the Butterworth filters of orders 2 and 8]

As discussed in [TS] tsfilter, the gain function of the ideal filter is a square wave with a value of 0 at the frequencies corresponding to unwanted frequencies and a value of 1 at the desired frequencies. The vertical lines in the gain function of the ideal filter occur at π/16, corresponding to 32 periods, and at π/3, corresponding to 6 periods. (Given that $p = 2\pi/\omega$, where p is the period corresponding to frequency ω, the frequency is given by $2\pi/p$.)

The distance between the gain function of the filter with order 2 and the gain function of the ideal band-pass filter at π/16 is the root of the first issue mentioned at the end of example 1. The filter with order 8 is much closer to the gain function of the ideal band-pass filter at π/16 than is the filter with order 2. That both gain functions are 1 to the right of the vertical line at π/3 reveals the high-pass nature of the filter.

Example 3: Removing the high-frequency component

In this example, we use a common trick to resolve the second issue mentioned at the end of example 1. Keeping the trend produced by a high-pass filter turns that high-pass filter into a low-pass filter. Because we want to remove the high-frequency stochastic cycles still in the previously filtered series gdp_bw8, we need to run gdp_bw8 through a low-pass filter. So we keep the trend produced by refiltering the previously filtered series.

To determine an order for the filter, we run the filter with order(8), then with order(15), and then we plot the gain functions along with the gain function of the ideal filter.

. tsfilter bw gdp_bwn8 = gdp_bw8, gain(gc8 ac8) order(8) maxperiod(6)
>      trend(gdp_bwc8)
. label variable gc8 "BW order 8"
. tsfilter bw gdp_bwn15 = gdp_bw8, gain(gc15 ac15) order(15) maxperiod(6)
>      trend(gdp_bwc15)
. label variable gc15 "BW order 15"
. twoway line ideal f || line gc8 ac8 || line gc15 ac15

[Figure omitted: gain functions of the ideal filter and the Butterworth filters of orders 8 and 15]

We specified much higher orders for the filter in this example because the cutoff period is 6 instead of 32. (As previously mentioned, holding the order of the filter constant, the slope of the gain function at the cutoff period decreases when the period decreases.) The above graph indicates that the filter with order(15) is reasonably close to the gain function of the ideal filter.

Now we compute and plot the periodogram of the estimated business-cycle component.

. pergram gdp_bwc15, xline(.03125 .16667)

[Figure omitted: log periodogram of the gdp_bw8 trend component from the bw filter, evaluated at the natural frequencies]

The graph indicates that the above applications of the Butterworth filter did a reasonable job of filtering out the high-periodicity stochastic cycles but that the low-periodicity stochastic cycles have not been completely removed.

Below we plot the estimated business-cycle component with recessions identified by the shaded areas.
[Figure omitted: gdp_bw8 trend component from the bw filter, 1950q1–2010q1, with recessions shaded]

Stored results

tsfilter bw stores the following in r():

Scalars
  r(order)        order of the filter
  r(maxperiod)    maximum period of stochastic cycles
Macros
  r(varlist)      original time-series variables
  r(filterlist)   variables containing estimates of the cyclical components
  r(trendlist)    variables containing estimates of the trend components, if trend() was specified
  r(method)       Butterworth
  r(unit)         units of time variable set using tsset or xtset

Methods and formulas

tsfilter bw uses the computational methods described in Pollock (2000) to implement the filter.

Pollock (2000) shows that the gain of the Butterworth high-pass filter is given by

$$\psi(\omega) = \left[\, 1 + \left\{\frac{\tan(\omega_c/2)}{\tan(\omega/2)}\right\}^{2m} \right]^{-1}$$

where m is the order of the filter, $\omega_c = 2\pi/p_h$ is the cutoff frequency, and $p_h$ is the maximum period.

Here is an outline of the computational procedure that Pollock (2000) derived. Pollock (2000) showed that the Butterworth filter corresponds to a particular model. Actually, his model is more general than the Butterworth filter, but tsfilter bw restricts the computations to the case in which the model corresponds to the Butterworth filter. The model represents the series to be filtered, $y_t$, in terms of zero-mean, covariance stationary, and independent and identically distributed shocks $\nu_t$ and $\varepsilon_t$:

$$y_t = \frac{(1+L)^m}{(1-L)^m}\,\nu_t + \varepsilon_t$$

From this model, Pollock (2000) shows that the optimal estimate for the cyclical component is given by

$$c = \lambda Q (\Omega_L + \lambda \Omega_H)^{-1} Q' y$$

where $\mathrm{Var}\{Q'(y - c)\} = \sigma_\nu^2 \Omega_L$ and $\mathrm{Var}\{Q'c\} = \sigma_\varepsilon^2 \Omega_H$. Here $\Omega_L$ and $\Omega_H$ are symmetric Toeplitz matrices with 2m + 1 nonzero diagonal bands and generating functions $(1+z)^m(1+z^{-1})^m$ and $(1-z)^m(1-z^{-1})^m$, respectively.

The parameter λ in this expression is a function of $p_h$ (the maximum period of stochastic cycles filtered out) and the order of the filter:

$$\lambda = \{\tan(\pi/p_h)\}^{-2m}$$

The matrix $Q'$ in this expression is a function of the coefficients in the polynomial $(1-L)^d = 1 + \delta_1 L + \cdots + \delta_d L^d$:

$$Q' = \begin{bmatrix}
\delta_d & \cdots & \delta_1 & 1 & 0 & \cdots & \cdots & \cdots & 0\\
0 & \delta_d & \cdots & \delta_1 & 1 & 0 & \cdots & \cdots & 0\\
\vdots & \ddots & \ddots & & \ddots & \ddots & \ddots & & \vdots\\
0 & \cdots & \cdots & 0 & \delta_d & \cdots & \delta_1 & 1 & 0\\
0 & \cdots & \cdots & \cdots & 0 & \delta_d & \cdots & \delta_1 & 1
\end{bmatrix}_{(T-d)\times T}$$

It can be shown that $\Omega_H = Q'Q$ and $\Omega_L = |\Omega_H|$, which simplifies the calculation of the cyclical component to

$$c = \lambda Q\{|Q'Q| + \lambda (Q'Q)\}^{-1} Q' y$$

References

Bianchi, G., and R. Sorrentino. 2007. Electronic Filter Simulation and Design. New York: McGraw–Hill.
Burns, A. F., and W. C. Mitchell. 1946. Measuring Business Cycles. New York: National Bureau of Economic Research.
Butterworth, S. 1930. On the theory of filter amplifiers. Experimental Wireless and the Wireless Engineer 7: 536–541.
Pollock, D. S. G. 1999. A Handbook of Time-Series Analysis, Signal Processing and Dynamics. London: Academic Press.
———. 2000. Trend estimation and de-trending via rational square-wave filters. Journal of Econometrics 99: 317–334.
———. 2006. Econometric methods of signal extraction. Computational Statistics & Data Analysis 50: 2268–2292.
Also see
[TS] tsset — Declare data to be time-series data
[XT] xtset — Declare data to be panel data
[TS] tsfilter — Filter a time series, keeping only selected periodicities
[D] format — Set variables' output format
[TS] tssmooth — Smooth and forecast univariate time-series data

Title
tsfilter cf — Christiano–Fitzgerald time-series filter

Description   Quick start   Menu   Syntax   Options   Remarks and examples   Stored results   Methods and formulas   References   Also see

Description

tsfilter cf uses the Christiano and Fitzgerald (2003) band-pass filter to separate a time series into trend and cyclical components. The trend component may contain a deterministic or a stochastic trend. The stationary cyclical component is driven by stochastic cycles at the specified periods.

See [TS] tsfilter for an introduction to the methods implemented in tsfilter cf.

Quick start

Use the Christiano–Fitzgerald filter for y to obtain cyclical component ct using tsset data
    tsfilter cf ct=y

As above, but filter out periods smaller than 5 and larger than 20
    tsfilter cf ct=y, minperiod(5) maxperiod(20)

As above, and save the trend component in the variable trendvar
    tsfilter cf ct=y, minperiod(5) maxperiod(20) trend(trendvar)

As above, and save gain and angular frequency with the names gain and angle
    tsfilter cf ct=y, minperiod(5) maxperiod(20) trend(trendvar) ///
         gain(gain angle)

Use the Christiano–Fitzgerald filter for variables y1, y2, and y3 to obtain cyclical components with prefix cycl
    tsfilter cf cycl*=y1 y2 y3

Note: The above commands can also be used to apply the filter separately to each panel of a panel dataset when a panelvar has been specified by using tsset or xtset.

Menu

Statistics > Time series > Filters for cyclical components > Christiano-Fitzgerald

Syntax

Filter one variable
    tsfilter cf [type] newvar = varname [if] [in] [, options]

Filter multiple variables, unique names
    tsfilter cf [type] newvarlist = varlist [if] [in] [, options]

Filter multiple variables, common name stub
    tsfilter cf [type] stub* = varlist [if] [in] [, options]

options                               Description
-------------------------------------------------------------------------------
Main
  minperiod(#)                        filter out stochastic cycles at periods smaller than #
  maxperiod(#)                        filter out stochastic cycles at periods larger than #
  smaorder(#)                         number of observations in each direction that contribute to each filtered value
  stationary                          use calculations for a stationary time series
  drift                               remove drift from the time series
Trend
  trend(newvar | newvarlist | stub*)  save the trend component(s) in new variable(s)
Gain
  gain(gainvar anglevar)              save the gain and angular frequency
-------------------------------------------------------------------------------
You must tsset or xtset your data before using tsfilter; see [TS] tsset and [XT] xtset.
varname and varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Options

Main

minperiod(#) filters out stochastic cycles at periods smaller than #, where # must be at least 2 and less than maxperiod(). By default, if the units of the time variable are set to daily, weekly, monthly, quarterly, or half-yearly, then # is set to the number of periods equivalent to 1.5 years; yearly data use minperiod(2); otherwise, the default value is minperiod(6).

maxperiod(#) filters out stochastic cycles at periods larger than #, where # must be greater than minperiod(). By default, if the units of the time variable are set to daily, weekly, monthly, quarterly, half-yearly, or yearly, then # is set to the number of periods equivalent to 8 years; otherwise, the default value is maxperiod(32).
smaorder(#) sets the order of the symmetric moving average, denoted by q. By default, smaorder() is not set, which invokes the asymmetric calculations for the Christiano–Fitzgerald filter. The order is an integer that specifies the number of observations in each direction used in calculating the symmetric moving average estimate of the cyclical component. This number must be an integer greater than zero and less than (T − 1)/2. The estimate of the cyclical component for the tth observation, $y_t$, is based upon the 2q + 1 values $y_{t-q}, y_{t-q+1}, \ldots, y_t, y_{t+1}, \ldots, y_{t+q}$.

stationary modifies the filter calculations to those appropriate for a stationary series. By default, the series is assumed nonstationary.

drift removes drift using the approach described in Christiano and Fitzgerald (2003). By default, drift is not removed.

Trend

trend(newvar | newvarlist | stub*) saves the trend component(s) in the new variable(s) specified by newvar, newvarlist, or stub*.

Gain

gain(gainvar anglevar) saves the gain in gainvar and its associated angular frequency in anglevar. Gains are calculated at the N angular frequencies that uniformly partition the interval (0, π], where N is the sample size.

Remarks and examples

We assume that you have already read [TS] tsfilter, which provides an introduction to filtering and the methods implemented in tsfilter cf, more examples using tsfilter cf, and a comparison of the four filters implemented by tsfilter. In particular, an understanding of gain functions as presented in [TS] tsfilter is required to understand these remarks.

tsfilter cf uses the Christiano–Fitzgerald (CF) band-pass filter to separate a time series $y_t$ into trend and cyclical components

$$y_t = \tau_t + c_t$$

where $\tau_t$ is the trend component and $c_t$ is the cyclical component. $\tau_t$ may be nonstationary; it may contain a deterministic or a stochastic trend, as discussed below.

The primary objective is to estimate $c_t$, a stationary cyclical component that is driven by stochastic cycles at a specified range of periods. The trend component $\tau_t$ is calculated by the difference $\tau_t = y_t - c_t$.

Although the CF band-pass filter implemented in tsfilter cf has been widely applied by macroeconomists, it is a general time-series method and may be of interest to other researchers.

As discussed by Christiano and Fitzgerald (2003) and in [TS] tsfilter, if one had an infinitely long series, one could apply an ideal band-pass filter that perfectly separates out cyclical components driven by stochastic cycles at the specified periodicities. In finite samples, it is not possible to exactly satisfy the conditions that a filter must fulfill to perfectly separate out the specified stochastic cycles; the expansive filter literature reflects the trade-offs involved in choosing a finite-length filter to separate out the specified stochastic cycles.

Christiano and Fitzgerald (2003) derive a finite-length CF band-pass filter that minimizes the mean squared error between the filtered series and the series filtered by an ideal band-pass filter that perfectly separates out components driven by stochastic cycles at the specified periodicities. Christiano and Fitzgerald (2003) place two important restrictions on the mean squared error problem that their filter solves. First, the CF filter is restricted to be a linear filter.
Second, $y_t$ is assumed to be a random-walk process; in other words, $y_t = y_{t-1} + \epsilon_t$, where $\epsilon_t$ is independent and identically distributed with mean zero and finite variance. The CF filter is the best linear predictor of the series filtered by the ideal band-pass filter when $y_t$ is a random walk.

Christiano and Fitzgerald (2003) make four points in support of the random-walk assumption. First, the mean squared error problem solved by their filter requires that the process for $y_t$ be specified. Second, they provide a method for removing drift so that their filter handles cases in which $y_t$ is a random walk with drift. Third, many economic time series are well approximated by a random-walk-plus-drift process. (We add that many time series encountered in applied statistics are well approximated by a random-walk-plus-drift process.) Fourth, they provide simulation evidence that their filter performs well when the process generating $y_t$ is not a random-walk-plus-drift process but is close to being one.

Comparing the CF filter with the Baxter–King (BK) filter provides some intuition and explains the smaorder() option in tsfilter cf. As discussed in [TS] tsfilter and Baxter and King (1999), symmetric moving-average (SMA) filters with coefficients that sum to zero can extract the components driven by stochastic cycles at specified periodicities when the series to be filtered has a deterministic or stochastic trend of order 1 or 2.

The coefficients of the finite-length BK filter are as close as possible to the coefficients of an ideal SMA band-pass filter under the constraints that the BK coefficients are symmetric and sum to zero. The coefficients of the CF filter are neither symmetric nor do they sum to zero, but the CF filter was designed to filter out the specified periodicities when $y_t$ has a first-order stochastic trend.

To be robust to second-order trends, Christiano and Fitzgerald (2003) derive a constrained version of the CF filter. The coefficients of the constrained filter are constrained to be symmetric and to sum to zero. Subject to these constraints, the coefficients of the constrained CF filter minimize the mean squared error between the filtered series and the series filtered by an ideal band-pass filter that perfectly separates out the components. Christiano and Fitzgerald (2003) note that the higher-order detrending properties of this constrained filter come at the cost of lost efficiency. If the constraints are binding, the constrained filter cannot predict the series filtered by the ideal filter as well as the unconstrained filter can. Specifying the smaorder() option causes tsfilter cf to compute the SMA-constrained CF filter.

The choice between the BK and the CF filters is one between robustness and efficiency. The BK filter handles a broader class of stochastic processes than does the CF filter, but the CF filter produces a better estimate of $c_t$ if $y_t$ is close to a random-walk process or a random-walk-plus-drift process.

Among economists, the CF filter is commonly used for investigating business cycles. Burns and Mitchell (1946) defined business cycles as stochastic cycles in business data corresponding to periods between 1.5 and 8 years. The default values for minperiod() and maxperiod() are the Burns–Mitchell values of 1.5 and 8 years, scaled to the frequency of the dataset.
The calculations of the default values assume that the time variable is formatted as daily, weekly, monthly, quarterly, half-yearly, or yearly; see [D] format.

When $y_t$ is assumed to be a random-walk-plus-drift process instead of a random-walk process, specify the drift option, which removes the linear drift in the series before applying the filter. Drift is removed by transforming the original series to a new series by using the calculation

$$z_t = y_t - \frac{(t-1)(y_T - y_1)}{T-1}$$

The cyclical component $c_t$ is calculated from the drift-adjusted series $z_t$. The trend component $\tau_t$ is calculated by $\tau_t = y_t - c_t$.

By default, the CF filter assumes the series is nonstationary. If the series is stationary, the stationary option is used to change the calculations to those appropriate for a stationary series.

For each variable, the CF filter estimate of $c_t$ is put in the corresponding new variable, and when the trend() option is specified, the estimate of $\tau_t$ is put in the corresponding new variable.

tsfilter cf automatically detects panel data from the information provided when the dataset was tsset or xtset. All calculations are done separately on each panel. Missing values at the beginning and end of the sample are excluded from the sample. The sample may not contain gaps.

Example 1: Estimating a business-cycle component

In this and the subsequent examples, we use tsfilter cf to estimate the business-cycle component of the natural log of real gross domestic product (GDP) of the United States. Our sample of quarterly data goes from 1952q1 to 2010q4. Below we read in and plot the data.

. use http://www.stata-press.com/data/r14/gdp2
(Federal Reserve Economic Data, St. Louis Fed)
. tsline gdp_ln

[Figure omitted: time-series line plot of the natural log of real GDP, 1950q1–2010q1]

The series looks like it might be generated by a random-walk-plus-drift process and is thus a candidate for the CF filter.

Below we use tsfilter cf to filter gdp_ln, and we use pergram (see [TS] pergram) to compute and to plot the periodogram of the estimated cyclical component.

. tsfilter cf gdp_cf = gdp_ln
. pergram gdp_cf, xline(.03125 .16667)

[Figure omitted: log periodogram of the gdp_ln cyclical component from the cf filter, evaluated at the natural frequencies]

Because our sample is of quarterly data, tsfilter cf used the default values of minperiod(6) and maxperiod(32). The minimum and maximum periods are the Burns and Mitchell (1946) business-cycle periods for quarterly data.

In the periodogram, we added vertical lines at the natural frequencies corresponding to the conventional Burns and Mitchell (1946) values for business-cycle components. pergram displays the results in natural frequencies, which are the standard frequencies divided by 2π. We use the xline() option to draw vertical lines at the lower natural-frequency cutoff (1/32 = 0.03125) and the upper natural-frequency cutoff (1/6 ≈ 0.16667).

If the filter completely removed the stochastic cycles at the unwanted frequencies, the periodogram would be a flat line at the minimum value of −6 outside the range identified by the vertical lines. The periodogram reveals that the CF filter did a reasonable job of filtering out the unwanted stochastic cycles.
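The drift option described above can also be replicated by hand, which may help in understanding what it does. The sketch below is ours, not part of the original example; it assumes a single panel sorted by the time variable, and z and gdp_cfz are throwaway names of ours:

. scalar y1 = gdp_ln[1]
. scalar yT = gdp_ln[_N]
. generate z = gdp_ln - (_n - 1)*(yT - y1)/(_N - 1)   // drift-adjusted series z_t
. tsfilter cf gdp_cfz = z   // cyclical component should match tsfilter cf ..., drift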
Below we plot the estimated business-cycle component with recessions identified by the shaded areas.

[Figure omitted: gdp_ln cyclical component from the cf filter, 1950q1–2010q1, with recessions shaded]

Stored results

tsfilter cf stores the following in r():

Scalars
  r(smaorder)     order of the symmetric moving average, if specified
  r(minperiod)    minimum period of stochastic cycles
  r(maxperiod)    maximum period of stochastic cycles
Macros
  r(varlist)      original time-series variables
  r(filterlist)   variables containing estimates of the cyclical components
  r(trendlist)    variables containing estimates of the trend components, if trend() was specified
  r(method)       Christiano-Fitzgerald
  r(symmetric)    yes or no, indicating whether the symmetric version of the filter was or was not used
  r(drift)        yes or no, indicating whether drift was or was not removed before filtering
  r(stationary)   yes or no, indicating whether the calculations assumed the series was or was not stationary
  r(unit)         units of time variable set using tsset or xtset
Matrices
  r(filter)       (q+1) x 1 matrix of weights $(\hat b_0, \hat b_1, \ldots, \hat b_q)'$, where q is the order of the symmetric moving average, and the weights are the Christiano–Fitzgerald coefficients; only returned when smaorder() is used to set q

Methods and formulas

For an infinitely long series, there is an ideal band-pass filter that extracts the cyclical component by using the calculation

$$c_t = \sum_{j=-\infty}^{\infty} b_j y_{t-j}$$

If $p_l$ and $p_h$ are the minimum and maximum periods of the stochastic cycles of interest, the weights $b_j$ in the ideal band-pass filter are given by

$$b_j = \begin{cases} \pi^{-1}(\omega_h - \omega_l) & \text{if } j = 0 \\ (j\pi)^{-1}\{\sin(j\omega_h) - \sin(j\omega_l)\} & \text{if } j \neq 0 \end{cases}$$

where $\omega_l = 2\pi/p_h$ and $\omega_h = 2\pi/p_l$ are the lower and higher cutoff frequencies, respectively.

Because our time series has finite length, the ideal band-pass filter cannot be computed exactly. Christiano and Fitzgerald (2003) derive the finite-length CF band-pass filter that minimizes the mean squared error between the filtered series and the series filtered by an ideal band-pass filter that perfectly separates out the components. This filter is not symmetric, nor do the coefficients sum to zero. The formula for calculating the value of cyclical component $c_t$ for $t = 2, 3, \ldots, T-1$ using the asymmetric version of the CF filter can be expressed as

$$c_t = b_0 y_t + \sum_{j=1}^{T-t-1} b_j y_{t+j} + \tilde b_{T-t}\, y_T + \sum_{j=1}^{t-2} b_j y_{t-j} + \tilde b_{t-1}\, y_1$$

where $b_0, b_1, \ldots$ are the weights used by the ideal band-pass filter. $\tilde b_{T-t}$ and $\tilde b_{t-1}$ are linear functions of the ideal weights used in this calculation. The CF filter uses two different calculations for $\tilde b_t$ depending upon whether the series is assumed to be stationary or nonstationary.

For the default nonstationary case with $1 < t < T$, Christiano and Fitzgerald (2003) set $\tilde b_{T-t}$ and $\tilde b_{t-1}$ to

$$\tilde b_{T-t} = -\frac{1}{2} b_0 - \sum_{j=1}^{T-t-1} b_j \qquad\text{and}\qquad \tilde b_{t-1} = -\frac{1}{2} b_0 - \sum_{j=1}^{t-2} b_j$$

which forces the weights to sum to zero.

For the nonstationary case, when $t = 1$ or $t = T$, the two endpoints ($c_1$ and $c_T$) use only one modified weight, $\tilde b_{T-1}$:

$$c_1 = \frac{1}{2} b_0 y_1 + \sum_{j=1}^{T-2} b_j y_{j+1} + \tilde b_{T-1}\, y_T \qquad\text{and}\qquad c_T = \frac{1}{2} b_0 y_T + \sum_{j=1}^{T-2} b_j y_{T-j} + \tilde b_{T-1}\, y_1$$

When the stationary option is used to invoke the stationary calculations, all weights are set to the ideal filter weight, that is, $\tilde b_j = b_j$.

If the smaorder() option is set, the symmetric version of the CF filter is used.
This option specifies the length of the symmetric moving average, denoted by q. The symmetric calculations for $c_t$ are similar to those used by the BK filter:

$$c_t = \hat b_q \{L^{-q}(y_t) + L^{q}(y_t)\} + \sum_{j=-q+1}^{q-1} b_j L^j(y_t)$$

where, for the default nonstationary calculations, $\hat b_q = -\frac{1}{2} b_0 - \sum_{j=1}^{q-1} b_j$. If the smaorder() and stationary options are set, then $\hat b_q$ is set equal to the ideal weight $b_q$.

References

Baxter, M., and R. G. King. 1999. Measuring business cycles: Approximate band-pass filters for economic time series. Review of Economics and Statistics 81: 575–593.
Burns, A. F., and W. C. Mitchell. 1946. Measuring Business Cycles. New York: National Bureau of Economic Research.
Christiano, L. J., and T. J. Fitzgerald. 2003. The band pass filter. International Economic Review 44: 435–465.
Pollock, D. S. G. 1999. A Handbook of Time-Series Analysis, Signal Processing and Dynamics. London: Academic Press.
———. 2006. Econometric methods of signal extraction. Computational Statistics & Data Analysis 50: 2268–2292.

Also see
[TS] tsset — Declare data to be time-series data
[XT] xtset — Declare data to be panel data
[TS] tsfilter — Filter a time series, keeping only selected periodicities
[D] format — Set variables' output format
[TS] tssmooth — Smooth and forecast univariate time-series data

Title
tsfilter hp — Hodrick–Prescott time-series filter

Description   Quick start   Menu   Syntax   Options   Remarks and examples   Stored results   Methods and formulas   References   Also see

Description

tsfilter hp uses the Hodrick–Prescott high-pass filter to separate a time series into trend and cyclical components. The trend component may contain a deterministic or a stochastic trend. The smoothing parameter determines the periods of the stochastic cycles that drive the stationary cyclical component.

See [TS] tsfilter for an introduction to the methods implemented in tsfilter hp.

Quick start

Use the Hodrick–Prescott filter for y to obtain cyclical component ct using tsset data
    tsfilter hp ct=y

As above, and save the trend component in the variable trendvar
    tsfilter hp ct=y, trend(trendvar)

As above, and save gain and angular frequency with the names gain and angle
    tsfilter hp ct=y, trend(trendvar) gain(gain angle)

As above, but set the Hodrick–Prescott smoothing parameter to be 1700
    tsfilter hp ct=y, trend(trendvar) gain(gain angle) smooth(1700)

Use the Hodrick–Prescott filter for variables y1, y2, and y3 to obtain cyclical components with prefix cycl
    tsfilter hp cycl*=y1 y2 y3

Note: The above commands can also be used to apply the filter separately to each panel of a panel dataset when a panelvar has been specified by using tsset or xtset.

Menu

Statistics > Time series > Filters for cyclical components > Hodrick-Prescott

Syntax

Filter one variable
    tsfilter hp [type] newvar = varname [if] [in] [, options]

Filter multiple variables, unique names
    tsfilter hp [type] newvarlist = varlist [if] [in] [, options]

Filter multiple variables, common name stub
    tsfilter hp [type] stub* = varlist [if] [in] [, options]

options                               Description
-------------------------------------------------------------------------------
Main
  smooth(#)                           smoothing parameter for the Hodrick–Prescott filter
Trend
  trend(newvar | newvarlist | stub*)  save the trend component(s) in new variable(s)
Gain
  gain(gainvar anglevar)              save the gain and angular frequency
-------------------------------------------------------------------------------
You must tsset or xtset your data before using tsfilter; see [TS] tsset and [XT] xtset.
varname and varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Options

Main

smooth(#) sets the smoothing parameter for the Hodrick–Prescott filter. By default, if the units of the time variable are set to daily, weekly, monthly, quarterly, half-yearly, or yearly, then the Ravn–Uhlig rule is used to set the smoothing parameter; otherwise, the default value is smooth(1600). The Ravn–Uhlig rule sets # to $1600 p_q^4$, where $p_q$ is the number of periods per quarter. The smoothing parameter must be greater than 0.

Trend

trend(newvar | newvarlist | stub*) saves the trend component(s) in the new variable(s) specified by newvar, newvarlist, or stub*.

Gain

gain(gainvar anglevar) saves the gain in gainvar and its associated angular frequency in anglevar. Gains are calculated at the N angular frequencies that uniformly partition the interval (0, π], where N is the sample size.

Remarks and examples

We assume that you have already read [TS] tsfilter, which provides an introduction to filtering and the methods implemented in tsfilter hp, more examples using tsfilter hp, and a comparison of the four filters implemented by tsfilter. In particular, an understanding of gain functions as presented in [TS] tsfilter is required to understand these remarks.

tsfilter hp uses the Hodrick–Prescott (HP) high-pass filter to separate a time series $y_t$ into trend and cyclical components

$$y_t = \tau_t + c_t$$

where $\tau_t$ is the trend component and $c_t$ is the cyclical component. $\tau_t$ may be nonstationary; it may contain a deterministic or a stochastic trend, as discussed below.

The primary objective is to estimate $c_t$, a stationary cyclical component that is driven by stochastic cycles at a range of periods. The trend component $\tau_t$ is calculated by the difference $\tau_t = y_t - c_t$.

Although the HP high-pass filter implemented in tsfilter hp has been widely applied by macroeconomists, it is a general time-series method and may be of interest to other researchers.

Hodrick and Prescott (1997) motivated the HP filter as a trend-removal technique that could be applied to data that came from a wide class of data-generating processes. In their view, the technique specified a trend in the data, and the data were filtered by removing the trend. The smoothness of the trend depends on a parameter λ. The trend becomes smoother as λ → ∞, and Hodrick and Prescott (1997) recommended setting λ to 1,600 for quarterly data.

King and Rebelo (1993) showed that removing a trend estimated by the HP filter is equivalent to a high-pass filter. They derived the gain function of this high-pass filter and showed that the filter would make integrated processes of order 4 or less stationary, making the HP filter comparable to the other filters implemented in tsfilter.

Example 1: Estimating a business-cycle component

In this and the subsequent examples, we use tsfilter hp to estimate the business-cycle component of the natural log of real gross domestic product (GDP) of the United States. Our sample of quarterly data goes from 1952q1 to 2010q4. Below we read in and plot the data.

. use http://www.stata-press.com/data/r14/gdp2
(Federal Reserve Economic Data, St. Louis Fed)
. tsline gdp_ln

[Figure omitted: time-series line plot of the natural log of real GDP, 1950q1–2010q1]

The series is nonstationary and is thus a candidate for the HP filter.
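Because the data are quarterly, smooth() will default to 1,600; at other frequencies the Ravn–Uhlig rule stated under Options applies. The arithmetic below is ours, not part of the manual's examples, and matches the rescaled defaults listed under Methods and formulas:

. display 1600*1^4        // quarterly: pq = 1, the classic 1,600
. display 1600*3^4        // monthly: pq = 3, giving 129,600
. display 1600*(1/2)^4    // half-yearly: pq = 1/2, giving 100
. display 1600*(1/4)^4    // yearly: pq = 1/4, giving 6.25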
Below we use tsfilter hp to filter gdp_ln, and we use pergram (see [TS] pergram) to compute and to plot the periodogram of the estimated cyclical component.

. tsfilter hp gdp_hp = gdp_ln
. pergram gdp_hp, xline(.03125 .16667)

[Figure omitted: log periodogram of the gdp_ln cyclical component from the hp filter, evaluated at the natural frequencies]

Because our sample is of quarterly data, tsfilter hp used the default value for the smoothing parameter of 1,600.

In the periodogram, we added vertical lines at the natural frequencies corresponding to the conventional Burns and Mitchell (1946) values for business-cycle components of 32 periods and 6 periods. pergram displays the results in natural frequencies, which are the standard frequencies divided by 2π. We use the xline() option to draw vertical lines at the lower natural-frequency cutoff (1/32 = 0.03125) and the upper natural-frequency cutoff (1/6 ≈ 0.16667).

If the filter completely removed the stochastic cycles at the unwanted frequencies, the periodogram would be a flat line at the minimum value of −6 outside the range identified by the vertical lines.

The periodogram reveals a high-periodicity issue and a low-periodicity issue. The points above −6.00 to the left of the left-hand vertical line in the periodogram reveal that the filter did not do a good job of filtering out the high-periodicity stochastic cycles with the default smoothing parameter of 1,600. That there is no tendency of the points to the right of the right-hand vertical line to be smoothed toward −6.00 reveals that the HP filter did not remove any of the low-periodicity stochastic cycles. This result is not surprising, because the HP filter is a high-pass filter. In the next example, we address the high-periodicity issue. See [TS] tsfilter and [TS] tsfilter bw for how to turn a high-pass filter into a band-pass filter.

Example 2: Choosing the filter parameters

In the filter literature, filter parameters are set as functions of the cutoff frequency; see Pollock (2000, 324), for instance. This method finds the filter parameter that sets the gain of the filter equal to 1/2 at the cutoff frequency. In a technical note in [TS] tsfilter, we showed that applying this method to selecting λ at the cutoff frequency of 32 periods suggests setting λ ≈ 677.13.
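That value can be verified from the gain function given under Methods and formulas: setting $\psi(\omega_c) = 1/2$ at $\omega_c = 2\pi/32$ and solving for λ gives $\lambda = 1/[4\{1 - \cos(\omega_c)\}^2]$. The check below is our own arithmetic, not part of the original example:

. display .25/(1 - cos(2*_pi/32))^2    // approximately 677.13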
−.04 gdp_ln cyclical component from hp filter −.02 0 .02 .04 Below we plot the estimated business-cycle component with recessions identified by the shaded areas. 1950q1 1960q1 1970q1 1980q1 1990q1 2000q1 2010q1 quarterly time variable gdp_ln cyclical component from hp filter tsfilter hp automatically detects panel data from the information provided when the dataset was tsset or xtset. All calculations are done separately on each panel. Missing values at the beginning and end of the sample are excluded from the sample. The sample may not contain gaps. 616 tsfilter hp — Hodrick–Prescott time-series filter Stored results tsfilter hp stores the following in r(): Scalars r(smooth) Macros r(varlist) r(filterlist) r(trendlist) r(method) r(unit) smoothing parameter λ original time-series variables variables containing estimates of the cyclical components variables containing estimates of the trend components, if trend() was specified Hodrick-Prescott units of time variable set using tsset or xtset Methods and formulas Formally, the filter is defined as the solution to the following optimization problem for τt minτt X T 2 (yt − τt ) + λ T −1 X {(τt+1 − τt ) − (τt − τt−1 )} 2  t=2 t=1 where the smoothing parameter λ is set fixed to a value. If λ = 0, the solution degenerates to τt = yt , in which case the filter excludes all frequencies, that is, ct = 0. On the other extreme, as λ → ∞, the solution approaches the least-squares fit to the line τt = β0 + β1 t; see Hodrick and Prescott (1997) for a discussion. For a fixed λ, it can be shown that the cyclical component c0 = (c1 , c2 , . . . , cT ) is calculated by c = (IT − M−1 )y where y is the column vector y0 = (y1 , y2 , . . . , yT ), IT is the T × T identity matrix, and M is the T × T matrix:  (1 + λ) −2λ λ 0 0 0 ... 0  −2λ (1 + 5λ) −4λ λ 0 0 ... 0     λ −4λ (1 + 6λ) −4λ λ 0 ... 0     0 λ −4λ (1 + 6λ) −4λ λ ... 0    .. .. .. .. .. .. ..   . . . . .   . . . . . M=  .. .. .. .. .. ..  . . . . . . 0 0     0 ... λ −4λ (1 + 6λ) −4λ λ 0     0 ... 0 λ −4λ (1 + 6λ) −4λ λ     0 ... 0 0 λ −4λ (1 + 5λ) −2λ  0 ... 0 0 0 λ −2λ (1 + λ)  The gain of the HP filter is given by (see King and Rebelo [1993], Maravall and del Rio [2007], or Harvey and Trimbur [2008]) ψ(ω) = 4λ{1 − cos(ω)}2 1 + 4λ{1 − cos(ω)}2 tsfilter hp — Hodrick–Prescott time-series filter 617 As discussed in [TS] tsfilter, there are two approaches to selecting λ. One method, based on the heuristic argument of Hodrick and Prescott (1997), is used to compute the default values for λ. The method sets λ to 1,600 for quarterly data and to the rescaled values worked out by Ravn and Uhlig (2002). The rescaled default values for λ are 6.25 for yearly data, 100 for half-yearly data, 129,600 for monthly data, 1600 × 124 for weekly data, and 1600 × (365/4)4 for daily data. The second method for selecting λ uses the recommendations of Pollock (2000, 324), who uses the gain function of the filter to identify a value for λ. Additional literature critiques the HP filter by pointing out that the HP filter corresponds to a specific model. Harvey and Trimbur (2008) show that the cyclical component estimated by the HP filter is equivalent to one estimated by a particular unobserved-components model. Harvey and Jaeger (1993), Gómez (1999), Pollock (2000), and Gómez (2001) also show this result and provide interesting comparisons of estimating ct by filtering and model-based methods. References Burns, A. F., and W. C. Mitchell. 1946. Measuring Business Cycles. 
References

Burns, A. F., and W. C. Mitchell. 1946. Measuring Business Cycles. New York: National Bureau of Economic Research.
Gómez, V. 1999. Three equivalent methods for filtering finite nonstationary time series. Journal of Business and Economic Statistics 17: 109–116.
. 2001. The use of Butterworth filters for trend and cycle estimation in economic time series. Journal of Business and Economic Statistics 19: 365–373.
Harvey, A. C., and A. Jaeger. 1993. Detrending, stylized facts and the business cycle. Journal of Applied Econometrics 8: 231–247.
Harvey, A. C., and T. M. Trimbur. 2008. Trend estimation and the Hodrick–Prescott filter. Journal of the Japanese Statistical Society 38: 41–49.
Hodrick, R. J., and E. C. Prescott. 1997. Postwar U.S. business cycles: An empirical investigation. Journal of Money, Credit, and Banking 29: 1–16.
King, R. G., and S. T. Rebelo. 1993. Low frequency filtering and real business cycles. Journal of Economic Dynamics and Control 17: 207–231.
Leser, C. E. V. 1961. A simple method of trend construction. Journal of the Royal Statistical Society, Series B 23: 91–107.
Maravall, A., and A. del Rio. 2007. Temporal aggregation, systematic sampling, and the Hodrick–Prescott filter. Working Paper No. 0728, Banco de España. http://www.bde.es/webbde/Secciones/Publicaciones/PublicacionesSeriadas/DocumentosTrabajo/07/Fic/dt0728e.pdf.
Pollock, D. S. G. 1999. A Handbook of Time-Series Analysis, Signal Processing and Dynamics. London: Academic Press.
. 2000. Trend estimation and de-trending via rational square-wave filters. Journal of Econometrics 99: 317–334.
. 2006. Econometric methods of signal extraction. Computational Statistics & Data Analysis 50: 2268–2292.
Ravn, M. O., and H. Uhlig. 2002. On adjusting the Hodrick–Prescott filter for the frequency of observations. Review of Economics and Statistics 84: 371–376.

Also see
[TS] tsset — Declare data to be time-series data
[XT] xtset — Declare data to be panel data
[TS] tsfilter — Filter a time-series, keeping only selected periodicities
[D] format — Set variables' output format
[TS] tssmooth — Smooth and forecast univariate time-series data

Title

tsline — Plot time-series data

Description    Quick start    Menu    Syntax
Options    Remarks and examples    References    Also see

Description

tsline draws line plots for time-series data.

tsrline draws a range plot with lines for time-series data.

Quick start

Line plot for the time series y1 using tsset data
    tsline y1
Add plots of time series y2 and y3
    tsline y1 y2 y3
Range plot with lines for the lower and upper values of time series y1 stored in y1_lower and y1_upper, respectively
    tsrline y1_lower y1_upper
Overlay a range plot of the lower and upper values of time series y1 stored in y1_lower and y1_upper, respectively, on a plot of y1
    tsline y1 || tsrline y1_lower y1_upper

Menu

Statistics > Time series > Graphs > Line plots

Syntax

Time-series line plot
    twoway tsline varlist [if] [in] [, tsline_options]

Time-series range plot with lines
    twoway tsrline y1 y2 [if] [in] [, tsrline_options]

where the time variable is assumed set by tsset (see [TS] tsset), and varlist has the interpretation y1 y2 ... yk.
tsline_options            Description
Plots
  scatter_options         any options documented in [G-2] graph twoway scatter, with the exception of marker_options, marker_placement_options, and marker_label_options, which will be ignored if specified
Y axis, Time axis, Titles, Legend, Overall, By
  twoway_options          any options documented in [G-3] twoway_options

tsrline_options           Description
Plots
  rline_options           any options documented in [G-2] graph twoway rline
Y axis, Time axis, Titles, Legend, Overall, By
  twoway_options          any options documented in [G-3] twoway_options

Options

Plots

scatter_options are any of the options allowed by the graph twoway scatter command, except that marker_options, marker_placement_options, and marker_label_options will be ignored if specified; see [G-2] graph twoway scatter.

rline_options are any of the options allowed by the graph twoway rline command; see [G-2] graph twoway rline.

Y axis, Time axis, Titles, Legend, Overall, By

twoway_options are any of the options documented in [G-3] twoway_options. These include options for titling the graph (see [G-3] title_options), options for saving the graph to disk (see [G-3] saving_option), and the by() option, which will allow you to simultaneously plot different subsets of the data (see [G-3] by_option). Also see the recast() option discussed in [G-3] advanced_options for information on how to plot spikes, bars, etc., instead of lines.

Remarks and examples

Remarks are presented under the following headings:
    Basic examples
    Advanced example
    Video example

Basic examples

Example 1: A time-series line plot

We simulated two separate time series (each of 200 observations) and placed them in a Stata dataset, tsline1.dta. The first series simulates an AR(2) process with φ1 = 0.8 and φ2 = 0.2; the second series simulates an MA(2) process with θ1 = 0.8 and θ2 = 0.2. We use tsline to graph these two series.

. use http://www.stata-press.com/data/r14/tsline1
. tsset lags
        time variable:  lags, 0 to 199
                delta:  1 unit
. tsline ar ma

(figure: line plot of the simulated AR(.8,.2) and MA(.8,.2) series against lags)

Example 2: Using options to highlight information

Suppose that we kept a calorie log for an entire calendar year. At the end of the year, we would have a dataset (for example, tsline2.dta) that contains the number of calories consumed for 365 days. We could then use tsset to identify the date variable and tsline to plot calories versus time. Knowing that we tend to eat a little more food on Thanksgiving and Christmas day, we use the ttick() and ttext() options to point out these days on the time axis.

. use http://www.stata-press.com/data/r14/tsline2
. tsset day
        time variable:  day, 01jan2002 to 31dec2002
                delta:  1 day
. tsline calories, ttick(28nov2002 25dec2002, tpos(in))
>        ttext(3470 28nov2002 "thanks" 3470 25dec2002 "x-mas", orient(vert))

(figure: calories consumed from 01jan2002 to 01jan2003, with inward tick marks and vertical "thanks" and "x-mas" labels at the two holidays)

Options associated with the time axis allow dates (and times) to be specified in place of numeric date (and time) values. For instance, we used ttick(28nov2002 25dec2002, tpos(in)) to place tick marks at the specified dates. This works similarly for tlabel(), tmlabel(), and tmtick().
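For instance, the same date syntax works in tlabel() itself; a variant of the graph above, which we do not draw here, could label the two holidays directly:

. tsline calories, tlabel(28nov2002 25dec2002)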
Suppose that we wanted to place vertical lines for the previously mentioned holidays. We could specify the dates in the tline() option as follows:

. tsline calories, tline(28nov2002 25dec2002)

(figure: calories consumed from 01jan2002 to 01jan2003 with vertical lines at 28nov2002 and 25dec2002)

Example 3: Formatting the time axis

We could also modify the format of the time axis so that the labeled ticks display only the day in the year:

. tsline calories, tlabel(, format(%tdmd)) ttitle("Date (2002)")

(figure: calories consumed with the time axis labeled Jan1, Apr1, Jul1, Oct1, and Jan1)

Advanced example

tsline and tsrline are both commands and plottypes as defined in [G-2] graph twoway. Thus the syntax for tsline is

. graph twoway tsline ...
. twoway tsline ...
. tsline ...

and similarly for tsrline. Being plottypes, these commands may be combined with other plottypes in the twoway family, as in,

. twoway (tsrline ...) (tsline ...) (lfit ...) ...

which can equivalently be written as

. tsrline ... || tsline ... || lfit ... || ...

Example 4: Combining line and range plots

In the first plot of example 2, we were uncertain of the exact values we logged, so we also gave a range for each day. Here is a plot of the summer months.

. tsrline lcalories ucalories if tin(1may2002,31aug2002)
>        || tsline calories if tin(1may2002,31aug2002), ytitle(Calories)

(figure: the calorie range as a range plot with lines and the calories-consumed line overlaid, 01may2002 to 01sep2002)

Video example

Time series, part 2: Line graphs and tin()

References

Cox, N. J. 2006. Speaking Stata: Graphs for all seasons. Stata Journal 6: 397–419.
. 2009. Stata tip 76: Separating seasonal time series. Stata Journal 9: 321–326.
. 2012. Speaking Stata: Transforming the time axis. Stata Journal 12: 332–341.

Also see
[TS] tsset — Declare data to be time-series data
[G-2] graph twoway — Twoway graphs
[XT] xtline — Panel-data line plots

Title

tsreport — Report time-series aspects of a dataset or estimation sample

Description    Quick start    Menu    Syntax
Options    Remarks and examples    Stored results    Also see

Description

tsreport reports time gaps in a dataset or in a subset of variables. By default, tsreport reports periods in which no information is recorded in the dataset; the time variable does not include these periods. When you specify varlist, tsreport reports periods in which either no information is recorded in the dataset or the time variable is present, but one or more variables in varlist contain a missing value.

Quick start

Report time gaps in a tsset time-series dataset
    tsreport
Report time gaps for the variable y
    tsreport y
As above, and report the beginning and ending times of each gap
    tsreport y, detail
Report time gaps, ignoring panel changes, using tsset or xtset data
    tsreport, panel

Menu

Statistics > Time series > Setup and utilities > Report time-series aspects of dataset

Syntax

tsreport [varlist] [if] [in] [, options]

options       Description
Main
  detail      list periods for each gap
  casewise    treat a period as a gap if any of the specified variables are missing
  panel       do not count panel changes as gaps

varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Options

Main

detail reports the beginning and ending times of each gap.

casewise specifies that a period for which any of the specified variables are missing be counted as a gap. By default, gaps are reported for each variable individually.

panel specifies that panel changes not be counted as gaps.
Whether panel changes are counted as gaps usually depends on how the calling command handles panels.

Remarks and examples

Remarks are presented under the following headings:
    Basic examples
    Video example

Basic examples

Time-series commands sometimes require that observations be on a fixed time interval with no gaps, or the command may not function properly. tsreport provides a tool for reporting the gaps in a sample.

Example 1: A simple panel-data example

The following monthly panel data have two panels and a missing month (March) in the second panel:

. use http://www.stata-press.com/data/r14/tsrptxmpl
. list edlevel month income in 1/6, sep(0)

       edlevel    month   income
  1.         1   1998m1      687
  2.         1   1998m2      783
  3.         1   1998m3      790
  4.         2   1998m1     1435
  5.         2   1998m2     1522
  6.         2   1998m4     1532

Invoking tsreport gives us the following report:

. tsreport
Panel variable:     edlevel
Time variable:      month

        Starting period  =  1998m1
        Ending period    =  1998m4
        Observations     =       6
        Number of gaps   =       2
        (Gap count includes panel changes)

Two gaps are reported in the sample. We know the second panel is missing the month of March, but where is the second gap? The note at the bottom of the output is telling us something about panel changes. Let's use the detail option to get more information:

. tsreport, detail
Panel variable:     edlevel
Time variable:      month

        Starting period  =  1998m1
        Ending period    =  1998m4
        Observations     =       6
        Number of gaps   =       2
        (Gap count includes panel changes)

Gap report

      Obs.      edlevel      Start        End    N. Obs.
    3     4           1     1998m4          .          .
    5     6           2     1998m3     1998m3          1

We now see what is happening. tsreport is counting the change from the first panel to the second panel as a gap. Look at the output from the list command above. The value of month in observation 4 is not one month later than the value of month in observation 3, so tsreport reports a gap. (If we are programmers writing a procedure that does not account for panels, a change from one panel to the next represents a break in the time series just as a gap in the data does.) For the second gap, tsreport indicates that just one observation is missing because we are only missing the month of March. This gap is between observations 5 and 6 of the data.

In other cases, we may not care about changes in panels and not want them counted as gaps. We can use the panel option to specify that tsreport should ignore panel changes:

. tsreport, detail panel
Panel variable:     edlevel
Time variable:      month

        Starting period  =  1998m1
        Ending period    =  1998m4
        Observations     =       6
        Number of gaps   =       1

Gap report

      Obs.      edlevel      Start        End    N. Obs.
    5     6           2     1998m3     1998m3          1

tsreport now indicates there is just one gap, corresponding to March for the second panel.

Example 2: Variables with missing data

We asked two large hotels in Las Vegas to record the prices they were quoting people who called to make reservations. Because these prices change frequently in response to promotions and market conditions, we asked the hotels to record their prices hourly. Unfortunately, the managers did not consider us a top priority, so we are missing some data. Our dataset looks like this:

. use http://www.stata-press.com/data/r14/hotelprice
. list, sep(0)
                      hour   price1   price2
  1.   13feb2007 08:00:00      140      245
  2.   13feb2007 09:00:00      155      250
  3.   13feb2007 10:00:00        .      250
  4.   13feb2007 11:00:00      155      250
  5.   13feb2007 12:00:00      160      255
  6.   13feb2007 13:00:00        .        .
  7.   13feb2007 14:00:00      165      255
  8.   13feb2007 15:00:00      170      260
  9.   13feb2007 16:00:00      175      265
 10.   13feb2007 17:00:00      180        .
 11.   13feb2007 20:00:00      190      270

First, let's invoke tsreport without specifying price1 or price2. We will specify the detail option so that we can see the periods corresponding to the gap or gaps reported:

. tsreport, detail
Time variable:      hour

        Starting period  =  13feb2007 08:00:00
        Ending period    =  13feb2007 20:00:00
        Observations     =                  11
        Number of gaps   =                   1

Gap report

      Obs.                   Start                   End    N. Obs.
   10    11     13feb2007 18:00:00   13feb2007 19:00:00          2

One gap is reported, lasting two periods. We have no data corresponding to 6:00 p.m. and 7:00 p.m. on February 13, 2007.

What about observations 3, 6, and 10? We are missing data on one or both of the price variables for those observations, but the time variable itself is present for those observations. By default, tsreport defines gaps as periods in which no information, not even the time variable itself, is recorded. If we instead want to obtain information about when one or more variables are missing information, then we specify those variables in our call to tsreport. Here we specify price1, first without the detail option:

. tsreport price1

Gap summary report

                                                              Number of
  Variable                  Start                   End     Obs.    Gaps
  price1      13feb2007 08:00:00   13feb2007 20:00:00          9       3

The output indicates that we have data on price1 from 8:00 a.m. to 8:00 p.m. However, we only have 9 observations on price1 during that span because we have 3 gaps in the data. Let's specify the detail option to find out where:

. tsreport price1, detail
Variable:       price1
Time variable:  hour

        Starting period  =  13feb2007 08:00:00
        Ending period    =  13feb2007 20:00:00
        Observations     =                   9
        Number of gaps   =                   3

Gap report

      Obs.                   Start                   End    N. Obs.
    2     4     13feb2007 10:00:00   13feb2007 10:00:00          1
    5     7     13feb2007 13:00:00   13feb2007 13:00:00          1
   10    11     13feb2007 18:00:00   13feb2007 19:00:00          2

The three gaps correspond to observations 3 and 6, for which price1 is missing, as well as the two-period gap in the evening when not even the time variable is recorded in the dataset.

When you specify multiple variables with tsreport, by default, it summarizes gaps in each variable separately. Apart from combining the information into one table, typing

. tsreport price1 price2

is almost the same as typing

. tsreport price1
. tsreport price2

The only difference between the two methods is that the former stores results for both variables in r-class macros for later use, whereas if you were to type the latter two commands in succession, r-class macros would only contain results for price2.

In many types of analyses, including linear regression, you can only use observations for which all the variables contain nonmissing data. Similarly, you can have tsreport report as gaps periods in which any of the specified variables contain missing values. To do that, you use the casewise option.
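Because tsreport posts its results in r() (see Stored results below), a do-file can test for gaps before running an analysis that requires complete data. The fragment below is an illustrative sketch only; r(N_gaps) is the stored gap count:

    tsreport price1 price2, casewise
    if r(N_gaps) > 0 {
            display as error "sample has " r(N_gaps) " gap(s); consider tsfill"
    }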
Example 3: Casewise analyses

Continuing with our hotel data, we specify both price1 and price2 in the variable list of tsreport. We request casewise analysis, and we specify the detail option to get information on each gap tsreport finds.

. tsreport price1 price2, casewise detail
Variables:      price1 and price2
Time variable:  hour

        Starting period  =  13feb2007 08:00:00
        Ending period    =  13feb2007 20:00:00
        Observations     =                   8
        Number of gaps   =                   3

Gap report

      Obs.                   Start                   End    N. Obs.
    2     4     13feb2007 10:00:00   13feb2007 10:00:00          1
    5     7     13feb2007 13:00:00   13feb2007 13:00:00          1
    9    11     13feb2007 17:00:00   13feb2007 19:00:00          3

The first gap reported by tsreport corresponds to observation 3, when price1 is missing, and the second gap corresponds to observation 6, when both price1 and price2 are missing. The third gap spans 3 observations: the 5:00 p.m. observation is missing for price2, and as we discovered earlier, not even the time variable is present at 6:00 p.m. and 7:00 p.m.

Video example

Time series, part 1: Formatting dates, tsset, tsreport, and tsfill

Stored results

tsreport, when no varlist is specified or when casewise is specified, stores the following in r():

Scalars
    r(N_gaps)    number of gaps
    r(N_obs)     number of observations
    r(start)     first time in series
    r(end)       last time in series
Macros
    r(tsfmt)     %fmt of time variable
Matrices
    r(table)     matrix containing start and end times of each gap, if detail is specified

tsreport, when a varlist is specified and casewise is not specified, stores the following in r():

Scalars
    r(N_gaps#)   number of gaps for variable #
    r(N_obs#)    number of observations for variable #
    r(start#)    first time in series for variable #
    r(end#)      last time in series for variable #
Macros
    r(tsfmt)     %fmt of time variable
    r(var#)      name of variable #
Matrices
    r(table#)    matrix containing start and end times of each gap for variable #, if detail is specified

When k variables are specified in varlist, # ranges from 1 to k.

Also see
[TS] tsset — Declare data to be time-series data

Title

tsrevar — Time-series operator programming command

Description    Quick start    Syntax    Options
Remarks and examples    Stored results    Also see

Description

tsrevar, substitute takes a varlist that might contain op.varname combinations and substitutes equivalent temporary variables for the combinations.

tsrevar, list creates no new variables. It returns in r(varlist) the list of base variables corresponding to varlist.

Quick start

Create temporary variables containing the first lag and difference of y using tsset data and store the temporary variable names in r(varlist)
    tsrevar l.y d.y
Store the name of the base variable, y, in r(varlist) and do not create any temporary variables
    tsrevar l.y d.y, list

Syntax

tsrevar varlist [if] [in] [, substitute list]

You must tsset your data before using tsrevar; see [TS] tsset.

Options

substitute specifies that tsrevar resolve op.varname combinations by creating temporary variables as described above. substitute is the default action taken by tsrevar; you do not need to specify the option.

list specifies that tsrevar return a list of base variable names.

Remarks and examples

tsrevar substitutes temporary variables for any op.varname combinations in a variable list. For instance, the original varlist might be "gnp L.gnp r", and tsrevar, substitute would create newvar = L.gnp and create the equivalent varlist "gnp newvar r". This new varlist could then be used with commands that do not otherwise support time-series operators, or it could be used in a program to make execution faster at the expense of using more memory.
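As a sketch of that programming use, a wrapper might evaluate the operators once with tsrevar and then work with ordinary variables. The program name and its body here are purely illustrative:

program varsum
        version 14
        syntax varlist(ts)
        tsrevar `varlist'
        // r(varlist) now names ordinary (temporary) variables holding
        // the evaluated op.varname combinations
        summarize `r(varlist)'
end

Typing varsum gnp l.gnp d.gnp would then summarize gnp together with temporary variables equal to l.gnp and d.gnp.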
tsrevar, substitute might create no new variables, one new variable, or many new variables, depending on the number of op.varname combinations appearing in varlist. Any new variables created are temporary. The new, equivalent varlist is returned in r(varlist). The new varlist corresponds one to one with the original varlist.

tsrevar, list returns in r(varlist) the list of base variable names of varlist with the time-series operators removed. tsrevar, list creates no new variables. For instance, if the original varlist were "gnp l.gnp l2.gnp r l.cd", then r(varlist) would contain "gnp r cd". This is useful for programmers who might want to create programs to keep only the variables corresponding to varlist.

Example 1

. use http://www.stata-press.com/data/r14/tsrevarex
. tsrevar l.gnp d.gnp r

creates two temporary variables containing the values for l.gnp and d.gnp. The variable r appears in the new variable list but does not require a temporary variable. The resulting variable list is

. display "`r(varlist)'"
__00014P __00014Q r

(Your temporary variable names may be different, but that is of no consequence.) We can see the results by listing the new variables alongside the original value of gnp.

. list gnp `r(varlist)' in 1/5

       gnp   __00014P   __00014Q     r
  1.   128          .          .   3.2
  2.   135        128          7   3.8
  3.   132        135         -3   2.6
  4.   138        132          6   3.9
  5.   145        138          7   4.2

Temporary variables automatically vanish when the program concludes.

If we had needed only the base variable names, we could have specified

. tsrevar l.gnp d.gnp r, list
. display "`r(varlist)'"
gnp r

The order of the list will probably differ from that of the original list; base variables are listed only once and are listed in the order that they appear in the dataset.

Technical note

tsrevar, substitute avoids creating duplicate variables. Consider

. tsrevar gnp l.gnp r cd l.cd l.gnp

l.gnp appears twice in the varlist. tsrevar will create only one new variable for l.gnp and use that new variable twice in the resulting r(varlist). Moreover, tsrevar will even do this across multiple calls:

. tsrevar gnp l.gnp cd l.cd
. tsrevar cpi l.gnp

l.gnp appears in two separate calls. At the first call, tsrevar creates a temporary variable corresponding to l.gnp. At the second call, tsrevar remembers what it has done and uses that same temporary variable for l.gnp again.

Stored results

tsrevar stores the following in r():

Macros
    r(varlist)   the modified variable list or list of base variable names

Also see
[P] syntax — Parse Stata syntax
[P] unab — Unabbreviate variable list
[U] 11 Language syntax
[U] 11.4.4 Time-series varlists
[U] 18 Programming Stata

Title

tsset — Declare data to be time-series data

Description    Quick start    Menu    Syntax
Options    Remarks and examples    Stored results    References    Also see

Description

tsset manages the time-series settings of a dataset.

tsset timevar declares the data in memory to be a time series. This allows you to use Stata's time-series operators and to analyze your data with the ts commands.

tsset panelvar timevar declares the data to be panel data, also known as cross-sectional time-series data, which contain one time series for each value of panelvar. This allows you to also analyze your data with the xt commands without having to xtset your data.

tsset without arguments displays how the data are currently set and sorts the data on timevar or panelvar timevar.
tsset, clear is a rarely used programmer's command to declare that the data are no longer a time series.

Quick start

Declare data to be a time series with time variable tvar
    tsset tvar
As above, but specify that tvar records time for a weekly time series
    tsset tvar, weekly
As above, but specify that observations occur every two weeks
    tsset tvar, weekly delta(2)
Declare a panel dataset with panel identifier pvar and time variable tvar
    tsset pvar tvar
As above, but specify that observations on each panel are made daily
    tsset pvar tvar, daily
As above, but specify that observations on each panel are made every three days
    tsset pvar tvar, daily delta(3 days)
Display current time-series settings and sort data by pvar and tvar if they are sorted differently
    tsset

Menu

Statistics > Time series > Setup and utilities > Declare dataset to be time-series data

Syntax

Declare data to be time series
    tsset timevar [, options]
    tsset panelvar timevar [, options]

Display how data are currently tsset
    tsset

Clear time-series settings
    tsset, clear

In the declare syntax, panelvar identifies the panels and timevar identifies the times.

options        Description
Main
  unitoptions  specify units of timevar
Delta
  deltaoption  specify period of timevar

  noquery      suppress summary calculations and output

noquery is not shown in the dialog box.

unitoptions    Description
(default)      timevar's units to be obtained from timevar's display format
clocktime      timevar is %tc: 0 = 1jan1960 00:00:00.000, 1 = 1jan1960 00:00:00.001, ...
daily          timevar is %td: 0 = 1jan1960, 1 = 2jan1960, ...
weekly         timevar is %tw: 0 = 1960w1, 1 = 1960w2, ...
monthly        timevar is %tm: 0 = 1960m1, 1 = 1960m2, ...
quarterly      timevar is %tq: 0 = 1960q1, 1 = 1960q2, ...
halfyearly     timevar is %th: 0 = 1960h1, 1 = 1960h2, ...
yearly         timevar is %ty: 1960 = 1960, 1961 = 1961, ...
generic        timevar is %tg: 0 = ?, 1 = ?, ...
format(%fmt)   specify timevar's format and then apply default rule

In all cases, negative timevar values are allowed.

deltaoption specifies the period between observations in timevar units and may be specified as

deltaoption           Example
delta(#)              delta(1) or delta(2)
delta((exp))          delta((7*24))
delta(# units)        delta(7 days) or delta(15 minutes) or delta(7 days 15 minutes)
delta((exp) units)    delta((2+3) weeks)

Allowed units for %tc and %tC timevars are

    seconds    second    secs    sec
    minutes    minute    mins    min
    hours      hour
    days       day
    weeks      week

and for all other %t timevars, units specified must match the frequency of the data; for example, for %ty, units must be year or years.

Options

Main

unitoptions clocktime, daily, weekly, monthly, quarterly, halfyearly, yearly, generic, and format(%fmt) specify the units in which timevar is recorded.

timevar will usually be a %t variable; see [D] datetime. If timevar already has a %t display format assigned to it, you do not need to specify a unitoption; tsset will obtain the units from the format. If you have not yet bothered to assign the appropriate %t format, however, you can use the unitoptions to tell tsset the units. Then tsset will set timevar's display format for you. Thus, the unitoptions are convenience options; they allow you to skip formatting the time variable.
The following all have the same net result:

    Alternative 1     Alternative 2        Alternative 3
    format t %td      (t not formatted)    (t not formatted)
    tsset t           tsset t, daily       tsset t, format(%td)

timevar is not required to be a %t variable; it can be any variable of your own concocting so long as it takes on only integer values. In such cases, it is called generic and considered to be %tg. Specifying the unitoption generic or attaching a special format to timevar, however, is not necessary because tsset will assume that the variable is generic if it has any numerical format other than a %t format (or if it has a %tg format).

clear—used in tsset, clear—makes Stata forget that the data ever were tsset. This is a rarely used programmer's option.

Delta

delta() specifies the period of timevar and is commonly used when timevar is %tc. delta() is only sometimes used with the other %t formats or with generic time variables.

If delta() is not specified, delta(1) is assumed. This means that at timevar = 5, the previous time is timevar = 5 − 1 = 4 and the next time would be timevar = 5 + 1 = 6. Lag and lead operators, for instance, would work this way. This would be assumed regardless of the units of timevar.

If you specified delta(2), then at timevar = 5, the previous time would be timevar = 5 − 2 = 3 and the next time would be timevar = 5 + 2 = 7. Lag and lead operators would work this way. In the observation with timevar = 5, L.price would be the value of price in the observation for which timevar = 3, and F.price would be the value of price in the observation for which timevar = 7. If you then add an observation with timevar = 4, the operators will still work appropriately; that is, at timevar = 5, L.price will still have the value of price at timevar = 3.

There are two aspects of timevar: its units and its periodicity. The unitoptions set the units. delta() sets the periodicity.

We mentioned that delta() is commonly used with %tc timevars because Stata's %tc variables have units of milliseconds. If delta() is not specified and in some model you refer to L.price, you will be referring to the value of price 1 ms ago. Few people have data with periodicity of a millisecond. Perhaps your data are hourly. You could specify delta(3600000). Or you could specify delta((60*60*1000)), because delta() will allow expressions if you include an extra pair of parentheses. Or you could specify delta(1 hour). They all mean the same thing: timevar has periodicity of 3,600,000 ms. In an observation for which timevar = 1,489,572,000,000 (corresponding to 15mar2007 10:00:00), L.price would be the observation for which timevar = 1,489,572,000,000 − 3,600,000 = 1,489,568,400,000 (corresponding to 15mar2007 9:00:00).

When you tsset the data and specify delta(), tsset verifies that all the observations follow the specified periodicity. For instance, if you specified delta(2), then timevar could contain any subset of {..., −4, −2, 0, 2, 4, ...} or it could contain any subset of {..., −3, −1, 1, 3, ...}. If timevar contained a mix of values, tsset would issue an error message. If you also specify panelvar—you type tsset panelvar timevar, delta(2)—the check is made on each panel independently. One panel might contain timevar values from one set and the next, another, and that would be fine.
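A quick way to confirm a delta() setting is to inspect r(tdelta) afterward; a small illustration, assuming an hourly %tc time variable t like the one just described:

. tsset t, delta(1 hour)
. display r(tdelta)
3600000

that is, 1 hour expressed in milliseconds.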
The following option is available with tsset but is not shown in the dialog box:

noquery prevents tsset from performing most of its summary calculations and suppresses output. With this option, only the following results are posted: r(tdelta), r(panelvar), r(timevar), r(tsfmt), r(unit), and r(unit1).

Remarks and examples

Remarks are presented under the following headings:
    Overview
    Panel data
    Video example

Overview

tsset sets timevar so that Stata's time-series operators are understood in varlists and expressions. The time-series operators are

Operator   Meaning
L.         lag x_{t-1}
L2.        2-period lag x_{t-2}
...
F.         lead x_{t+1}
F2.        2-period lead x_{t+2}
...
D.         difference x_t − x_{t-1}
D2.        difference of difference x_t − x_{t-1} − (x_{t-1} − x_{t-2}) = x_t − 2x_{t-1} + x_{t-2}
...
S.         "seasonal" difference x_t − x_{t-1}
S2.        lag-2 (seasonal) difference x_t − x_{t-2}
...

Time-series operators may be repeated and combined. L3.gnp refers to the third lag of variable gnp, as do LLL.gnp, LL2.gnp, and L2L.gnp. LF.gnp is the same as gnp. DS12.gnp refers to the one-period difference of the 12-period difference. LDS12.gnp refers to the same concept, lagged once.

D1. = S1., but D2. ≠ S2., D3. ≠ S3., and so on. D2. refers to the difference of the difference. S2. refers to the two-period difference. If you wanted the difference of the difference of the 12-period difference of gnp, you would write D2S12.gnp.

Operators may be typed in uppercase or lowercase. Most users would type d2s12.gnp instead of D2S12.gnp. You may type operators however you wish; Stata internally converts operators to their canonical form. If you typed ld2ls12d.gnp, Stata would present the operated variable as L2D3S12.gnp.

Stata also understands operator(numlist). to mean a set of operated variables. For instance, typing L(1/3).gnp in a varlist is the same as typing "L.gnp L2.gnp L3.gnp". The operators can also be applied to a list of variables by enclosing the variables in parentheses; for example,

. list year L(1/3).(gnp cpi)

       year    L.gnp   L2.gnp   L3.gnp   L.cpi   L2.cpi   L3.cpi
  1.   1989        .        .        .       .        .        .
  2.   1990   5452.8        .        .     100        .        .
  3.   1991   5764.9   5452.8        .     105      100        .
  4.   1992   5932.4   5764.9   5452.8     108      105      100
       (output omitted)
  8.   1996   7330.1   6892.2   6519.1     122      119      112

In operator#., making # zero returns the variable itself. L0.gnp is gnp. Thus, you can type list year l(0/3).gnp to mean list year gnp L.gnp L2.gnp L3.gnp.

The parenthetical notation may be used with any operator. Typing D(1/3).gnp would return the first through third differences.

The parenthetical notation may be used in operator lists with multiple operators, such as L(0/3)D2S12.gnp. Operator lists may include up to one set of parentheses, and the parentheses may enclose a numlist; see [U] 11.1.8 numlist.

Before you can use these time-series operators, however, the dataset must satisfy two requirements:

1. the dataset must be tsset and
2. the dataset must be sorted by timevar or, if it is a cross-sectional time-series dataset, by panelvar timevar.

tsset handles both requirements. As you use Stata, however, you may later use a command that re-sorts the data, and if you do, the time-series operators will not work:

. tsset time
 (output omitted)
. regress y x l.x
 (output omitted)
. (you continue to use Stata and, sometime later:)
. regress y x l.x
not sorted
r(5);

Then typing tsset without arguments will reestablish the sort order:

. tsset
 (output omitted)
. regress y x l.x
 (output omitted)

Here typing tsset is the same as typing sort time. Had we previously tsset country time, however, typing tsset would be the same as typing sort country time. You can type the sort command or type tsset without arguments; it makes no difference.
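As an aside, the operator identities noted above are easy to check with the operators themselves; a small sketch, assuming a tsset dataset containing a variable gnp:

. assert D.gnp == S.gnp
. list gnp D2.gnp S2.gnp in 1/5

The assert confirms that D1. and S1. agree in every observation, whereas the listing will show that D2. and S2. generally differ.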
There are two syntaxes for setting your data:

    tsset timevar
    tsset panelvar timevar

In both, timevar must contain integer values. If panelvar is specified, it too must contain integer values, and the dataset is declared to be a cross-section of time series, such as a collection of time series for different countries. Such datasets can be analyzed with xt commands as well as ts commands. If you tsset panelvar timevar, you do not need to xtset the data to use the xt commands.

If you save the data after typing tsset, the data will be remembered to be time series, and you will not have to tsset the data again.

Example 1: Numeric time variable

You have monthly data on personal income. Variable t records the time of an observation, but there is nothing special about the name of the variable. There is nothing special about the values of the variable, either. t is not required to be a %tm variable—perhaps you do not even know what that means. t is just a numeric variable containing integer values that represent the month, and we will imagine that t takes on the values 1, 2, ..., 9, although it could just as well be −3, −2, ..., 5, or 1,023, 1,024, ..., 1,031. What is important is that the values are dense: adjacent months have a time value that differs by 1.

. use http://www.stata-press.com/data/r14/tssetxmpl
. list t income

       t   income
  1.   1     1153
  2.   2     1181
       (output omitted)
  9.   9     1282

. tsset t
        time variable:  t, 1 to 9
                delta:  1 unit

. regress income l.income
 (output omitted)

Example 2: Adjusting the starting date

In the example above, that t started at 1 was not important. As we said, the t variable could just as well be recorded −3, −2, ..., 5, or 1,023, 1,024, ..., 1,031. What is important is that the difference in t between observations be delta() when there are no gaps.

Although how time is measured makes no difference, Stata has formats to display time nicely if it is recorded in certain ways; you can learn about the formats by seeing [D] datetime. Stata likes time variables in which 1jan1960 is recorded as 0. In our previous example, if t = 1 corresponds to July 1995, then we could make a variable that fits Stata's preference by typing

. generate newt = tm(1995m7) + t - 1

tm() is the function that returns a month equivalent; tm(1995m7) evaluates to the constant 426, meaning 426 months after January 1960. We now have variable newt containing

. list t newt income

       t   newt   income
  1.   1    426     1153
  2.   2    427     1181
  3.   3    428     1208
       (output omitted)
  9.   9    434     1282

If we put a %tm format on newt, it will display more cleanly:

. format newt %tm
. list t newt income

       t     newt   income
  1.   1   1995m7     1153
  2.   2   1995m8     1181
  3.   3   1995m9     1208
       (output omitted)
  9.   9   1996m3     1282

We could now tsset newt rather than t:

. tsset newt
        time variable:  newt, 1995m7 to 1996m3
                delta:  1 month

Technical note

In addition to monthly, Stata understands clock times (to the millisecond level) as well as daily, weekly, quarterly, half-yearly, and yearly data. See [D] datetime for a description of these capabilities.

Let's reconsider the previous example, but rather than monthly, let's assume the data are daily, weekly, etc. The only thing to know is that, corresponding to function tm(), there are functions td(), tw(), tq(), th(), and ty() and that, corresponding to format %tm, there are formats %td, %tw, %tq, %th, and %ty.
Here is what we would have typed had our data been on a different time scale:

Daily: if your t variable had t = 1 corresponding to 15mar1993:
    . generate newt = td(15mar1993) + t - 1
    . tsset newt, daily

Weekly: if your t variable had t = 1 corresponding to 1994w1:
    . generate newt = tw(1994w1) + t - 1
    . tsset newt, weekly

Monthly: if your t variable had t = 1 corresponding to 2004m7:
    . generate newt = tm(2004m7) + t - 1
    . tsset newt, monthly

Quarterly: if your t variable had t = 1 corresponding to 1994q1:
    . generate newt = tq(1994q1) + t - 1
    . tsset newt, quarterly

Half-yearly: if your t variable had t = 1 corresponding to 1921h2:
    . generate newt = th(1921h2) + t - 1
    . tsset newt, halfyearly

Yearly: if your t variable had t = 1 corresponding to 1842:
    . generate newt = 1842 + t - 1
    . tsset newt, yearly

In each example above, we subtracted one from our time variable in constructing the new time variable newt because we assumed that our starting time value was 1. For the quarterly example, if our starting time value were 5 and that corresponded to 1994q1, we would type

. generate newt = tq(1994q1) + t - 5

Had our initial time value been t = 742 and that corresponded to 1994q1, we would have typed

. generate newt = tq(1994q1) + t - 742

Example 3: Time-series data but no time variable

Perhaps we have the same time-series data but no time variable:

. use http://www.stata-press.com/data/r14/tssetxmpl2, clear
. list income

       income
  1.     1153
  2.     1181
  3.     1208
  4.     1272
  5.     1236
  6.     1297
  7.     1265
  8.     1230
  9.     1282

Say that we know that the first observation corresponds to July 1995 and continues without gaps. We can create a monthly time variable and format it by typing

. generate t = tm(1995m7) + _n - 1
. format t %tm

We can now tsset our dataset and list it:

. tsset t
        time variable:  t, 1995m7 to 1996m3
                delta:  1 month

. list t income

            t   income
  1.   1995m7     1153
  2.   1995m8     1181
  3.   1995m9     1208
       (output omitted)
  9.   1996m3     1282

Example 4: Time variable as a string

Your data might include a time variable that is encoded into a string. In the example below, each monthly observation is identified by string variable yrmo containing the month and year of the observation, sometimes with punctuation between:

. use http://www.stata-press.com/data/r14/tssetxmpl, clear
. list yrmo income

          yrmo   income
  1.    7/1995     1153
  2.    8/1995     1181
  3.    9-1995     1208
  4.   10,1995     1272
  5.   11 1995     1236
  6.   12 1995     1297
  7.    1/1996     1265
  8.    2.1996     1230
  9.   3- 1996     1282

The first step is to convert the string to a numeric representation. Doing so is easy using the monthly() function; see [D] datetime.

. generate mdate = monthly(yrmo, "MY")
. list yrmo mdate income

          yrmo   mdate   income
  1.    7/1995     426     1153
  2.    8/1995     427     1181
  3.    9-1995     428     1208
       (output omitted)
  9.   3- 1996     434     1282

Our new variable, mdate, contains the number of months from January 1960. Now that we have numeric variable mdate, we can tsset the data:

. format mdate %tm
. tsset mdate
        time variable:  mdate, 1995m7 to 1996m3
                delta:  1 month

In fact, we can combine the two and type

. tsset mdate, format(%tm)
        time variable:  mdate, 1995m7 to 1996m3
                delta:  1 month

or type

. tsset mdate, monthly
        time variable:  mdate, 1995m7 to 1996m3
                delta:  1 month

In all cases, we obtain

. list yrmo mdate income

          yrmo     mdate   income
  1.    7/1995    1995m7     1153
  2.    8/1995    1995m8     1181
  3.    9-1995    1995m9     1208
  4.   10,1995   1995m10     1272
  5.   11 1995   1995m11     1236
  6.   12 1995   1995m12     1297
  7.    1/1996    1996m1     1265
  8.    2.1996    1996m2     1230
  9.   3- 1996    1996m3     1282

Stata can translate many different date formats, including strings like 12jan2009; January 12, 2009; 12-01-2009; 01/12/2009; 01/12/09; 12jan2009 8:14; 12-01-2009 13:12; 01/12/09 1:12 pm; Wed Jan 31 13:03:25 CST 2009; 1998q1; and more. See [D] datetime.
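The translation is easy to spot-check interactively; for example, the value 429 below is October 1995 counted in months from January 1960:

. display monthly("10,1995", "MY")
429
. display %tm monthly("10,1995", "MY")
1995m10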
Example 5: Time-series data with gaps

Gaps in the time series cause no difficulties:

. use http://www.stata-press.com/data/r14/tssetxmpl3, clear
. list yrmo income

          yrmo   income
  1.    7/1995     1153
  2.    8/1995     1181
  3.   11 1995     1236
  4.   12 1995     1297
  5.    1/1996     1265
  6.   3- 1996     1282

. generate mdate = monthly(yrmo, "MY")
. tsset mdate, monthly
        time variable:  mdate, 1995m7 to 1996m3, but with gaps
                delta:  1 month

Once the dataset has been tsset, we can use the time-series operators. The D operator specifies first differences:

. list mdate income d.income

         mdate   income   D.income
  1.    1995m7     1153          .
  2.    1995m8     1181         28
  3.   1995m11     1236          .
  4.   1995m12     1297         61
  5.    1996m1     1265        -32
  6.    1996m3     1282          .

We can use the operators in an expression or varlist context; we do not have to create a new variable to hold D.income. We can use D.income with the list command, with regress or any other Stata command that allows time-series varlists.

Example 6: Clock times

We have data from a large hotel in Las Vegas that changes the reservation prices for its rooms hourly. A piece of the data looks like

. use http://www.stata-press.com/data/r14/tssetxmpl4, clear
. list in 1/5

                  time   price
  1.  02.13.2007 08:00     140
  2.  02.13.2007 09:00     155
  3.  02.13.2007 10:00     160
  4.  02.13.2007 11:00     155
  5.  02.13.2007 12:00     160

Variable time is a string variable. The first step in making this dataset a time-series dataset is to translate the string to a numeric variable:

. generate double t = clock(time, "MDY hm")
. list in 1/5

                  time   price           t
  1.  02.13.2007 08:00     140   1.487e+12
  2.  02.13.2007 09:00     155   1.487e+12
  3.  02.13.2007 10:00     160   1.487e+12
  4.  02.13.2007 11:00     155   1.487e+12
  5.  02.13.2007 12:00     160   1.487e+12

See [D] datetime for an explanation of what is going on here. clock() is the function that converts strings to datetime (%tc) values. We typed clock(time, "MDY hm") to convert string variable time, and we told clock() that the values in time were in the order month, day, year, hour, and minute.

We stored new variable t as a double because time values are large, and doing so is required to prevent rounding. Even so, the resulting values 1.487e+12 look rounded, but that is only because of the default display format for new variables. We can see the values better if we change the format:

. format t %20.0gc
. list in 1/5

                  time   price                   t
  1.  02.13.2007 08:00     140   1,486,972,800,000
  2.  02.13.2007 09:00     155   1,486,976,400,000
  3.  02.13.2007 10:00     160   1,486,980,000,000
  4.  02.13.2007 11:00     155   1,486,983,600,000
  5.  02.13.2007 12:00     160   1,486,987,200,000

Even better would be to change the format to %tc—Stata's clock-time format:

. format t %tc
. list in 1/5

                  time   price                    t
  1.  02.13.2007 08:00     140   13feb2007 08:00:00
  2.  02.13.2007 09:00     155   13feb2007 09:00:00
  3.  02.13.2007 10:00     160   13feb2007 10:00:00
  4.  02.13.2007 11:00     155   13feb2007 11:00:00
  5.  02.13.2007 12:00     160   13feb2007 12:00:00

We could drop variable time. New variable t contains the same information as time, and t is better because it is a Stata time variable, the most important property of which being that it is numeric rather than string. We can tsset it.
Here, however, we also need to specify the period with tsset's delta() option. Stata's time variables are numeric, but they record milliseconds since 01jan1960 00:00:00. By default, tsset uses delta(1), and that means the time-series operators would not work as we want them to work. For instance, L.price would look back only 1 ms (and find nothing). We want L.price to look back 1 hour (3,600,000 ms):

. tsset t, delta(1 hour)
        time variable:  t, 13feb2007 08:00:00.000 to 13feb2007 14:00:00.000
                delta:  1 hour

. list t price l.price in 1/5

                        t   price   L.price
  1.  13feb2007 08:00:00      140         .
  2.  13feb2007 09:00:00      155       140
  3.  13feb2007 10:00:00      160       155
  4.  13feb2007 11:00:00      155       160
  5.  13feb2007 12:00:00      160       155

Example 7: Clock times must be double

In the previous example, it was of vital importance that when we generated the %tc variable t,

. generate double t = clock(time, "MDY hm")

we generated it as a double. Let's see what would have happened had we forgotten and just typed generate t = clock(time, "MDY hm"). Let's go back and start with the same original data:

. use http://www.stata-press.com/data/r14/tssetxmpl4, clear
. list in 1/5

                  time   price
  1.  02.13.2007 08:00     140
  2.  02.13.2007 09:00     155
  3.  02.13.2007 10:00     160
  4.  02.13.2007 11:00     155
  5.  02.13.2007 12:00     160

Remember, variable time is a string variable, and we need to translate it to numeric. So we translate, but this time we forget to make the new variable a double:

. generate t = clock(time, "MDY hm")
. list in 1/5

                  time   price          t
  1.  02.13.2007 08:00     140   1.49e+12
  2.  02.13.2007 09:00     155   1.49e+12
  3.  02.13.2007 10:00     160   1.49e+12
  4.  02.13.2007 11:00     155   1.49e+12
  5.  02.13.2007 12:00     160   1.49e+12

We see the first difference—t now lists as 1.49e+12 rather than 1.487e+12 as it did previously—but this is nothing that would catch our attention. We would not even know that the value is different. Let's continue.

We next put a %20.0gc format on t to better see the numerical values. In fact, that is not something we would usually do in an analysis. We did that in the example to emphasize to you that the t values were really big numbers. We will repeat the exercise just to be complete, but in real analysis, we would not bother.

. format t %20.0gc
. list in 1/5

                  time   price                   t
  1.  02.13.2007 08:00     140   1,486,972,780,544
  2.  02.13.2007 09:00     155   1,486,976,450,560
  3.  02.13.2007 10:00     160   1,486,979,989,504
  4.  02.13.2007 11:00     155   1,486,983,659,520
  5.  02.13.2007 12:00     160   1,486,987,198,464

Okay, we see big numbers in t. Let's continue.

Next we put a %tc format on t, and that is something we would usually do, and you should always do. You should also list a bit of the data, as we did:

. format t %tc
. list in 1/5

                  time   price                    t
  1.  02.13.2007 08:00     140   13feb2007 07:59:40
  2.  02.13.2007 09:00     155   13feb2007 09:00:50
  3.  02.13.2007 10:00     160   13feb2007 09:59:49
  4.  02.13.2007 11:00     155   13feb2007 11:00:59
  5.  02.13.2007 12:00     160   13feb2007 11:59:58

By now, you should see a problem: the translated datetime values are off by a second or two. That was caused by rounding. Dates and times should be the same, not approximately the same, and when you see a difference like this, you should say to yourself, "The translation is off a little. Why is that?" and then you should think, "Of course, rounding. I bet that I did not create t as a double."

Let us assume, however, that you do not do this. You instead plow ahead:
. tsset t, delta(1 hour)
time values with period less than delta() found
r(451);

And that is what will happen when you forget to create t as a double. The rounding will cause uneven periods, and tsset will complain.

By the way, it is only important that clock times (%tc and %tC variables) be stored as doubles. The other date values %td, %tw, %tm, %tq, %th, and %ty are small enough that they can safely be stored as floats, although storing them as doubles does no harm.

Technical note

Stata provides two clock-time formats, %tc and %tC. %tC provides a clock with leap seconds. Leap seconds are occasionally inserted to account for randomness of the earth's rotation, which gradually slows. Unlike the extra day inserted in leap years, the timing of when leap seconds will be inserted cannot be foretold. The authorities in charge of such matters announce a leap second approximately 6 months before insertion. Leap seconds are inserted at the end of the day, and the leap second is called 23:59:60 (that is, 11:59:60 p.m.), which is then followed by the usual 00:00:00 (12:00:00 a.m.). Most nonastronomers find these leap seconds vexing. The added seconds cause problems because of their lack of predictability—knowing how many seconds there will be between 01jan2012 and 01jan2013 is not possible—and because there are not necessarily 24 hours in a day. If you use a leap second–adjusted clock, most days have 24 hours, but a few have 24 hours and 1 second. You must look at a table to find out.

From a time-series analysis point of view, the nonconstant day causes the most problems. Let's say that you have data on blood pressure, taken hourly at 1:00, 2:00, ..., and that you have tsset your data with delta(1 hour). On most days, L24.bp would be blood pressure at the same time yesterday. If the previous day had a leap second, however, and your data were recorded using a leap second–adjusted clock, there would be no observation L24.bp because 86,400 seconds before the current reading does not correspond to an on-the-hour time; 86,401 seconds before the current reading corresponds to yesterday's time. Thus, whenever possible, using Stata's %tc encoding rather than %tC is better. When times are recorded by computers using leap second–adjusted clocks, however, avoiding %tC is not possible. For performing most time-series analysis, the recommended procedure is to map the %tC values to %tc and then tsset those. You must ask yourself whether the process you are studying is based on the clock—the nurse does something at 2 o'clock every day—or the true passage of time—the emitter spits out an electron every 86,400,000 ms.

When dealing with computer-recorded times, first find out whether the computer (and its time-recording software) use a leap second–adjusted clock. If it does, translate that to a %tC value. Then use function cofC() to convert to a %tc value and tsset that. If variable T contains the %tC value,

. generate double t = cofC(T)
. format t %tc
. tsset t, delta(...)

Function cofC() moves leap seconds forward: 23:59:60 becomes 00:00:00 of the next day.

Panel data

Example 8: Time-series data for multiple groups

Assume that we have a time series on average annual income and that we have the series for two groups: individuals who have not completed high school (edlevel = 1) and individuals who have (edlevel = 2).

. use http://www.stata-press.com/data/r14/tssetxmpl5, clear
. list edlevel year income, sep(0)
       edlevel   year   income
  1.         1   1988    14500
  2.         1   1989    14750
  3.         1   1990    14950
  4.         1   1991    15100
  5.         2   1989    22100
  6.         2   1990    22200
  7.         2   1992    22800

We declare the data to be a panel by typing

. tsset edlevel year, yearly
       panel variable:  edlevel (unbalanced)
        time variable:  year, 1988 to 1992, but with a gap
                delta:  1 year

Having tsset the data, we can now use time-series operators. The difference operator, for example, can be used to list annual changes in income:

. list edlevel year income d.income, sep(0)

       edlevel   year   income   D.income
  1.         1   1988    14500          .
  2.         1   1989    14750        250
  3.         1   1990    14950        200
  4.         1   1991    15100        150
  5.         2   1989    22100          .
  6.         2   1990    22200        100
  7.         2   1992    22800          .

We see that in addition to producing missing values due to missing times, the difference operator correctly produced a missing value at the start of each panel. Once we have tsset our panel data, we can use time-series operators and be assured that they will handle missing time periods and panel changes correctly.

Video example

Time series, part 1: Formatting dates, tsset, tsreport, and tsfill

Stored results

tsset stores the following in r():

Scalars
    r(imin)       minimum panel ID
    r(imax)       maximum panel ID
    r(tmin)       minimum time
    r(tmax)       maximum time
    r(tdelta)     delta
Macros
    r(panelvar)   name of panel variable
    r(timevar)    name of time variable
    r(tdeltas)    formatted delta
    r(tmins)      formatted minimum time
    r(tmaxs)      formatted maximum time
    r(tsfmt)      %fmt of time variable
    r(unit)       units of time variable: Clock, clock, daily, weekly, monthly, quarterly, halfyearly, yearly, or generic
    r(unit1)      units of time variable: C, c, d, w, m, q, h, y, or ""
    r(balanced)   unbalanced, weakly balanced, or strongly balanced; a set of panels is strongly balanced if all panels have the same time values, weakly balanced if all panels have the same number of time values, and unbalanced otherwise

References

Baum, C. F. 2000. sts17: Compacting time series data. Stata Technical Bulletin 57: 44–45. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 369–370. College Station, TX: Stata Press.
Cox, N. J. 2010. Stata tip 68: Week assumptions. Stata Journal 10: 682–685.
. 2012. Stata tip 111: More on working with weeks. Stata Journal 12: 565–569.

Also see
[TS] tsfill — Fill in gaps in time variable

Title

tssmooth — Smooth and forecast univariate time-series data

Description    Syntax    Remarks and examples    References    Also see

Description

tssmooth creates new variable newvar and fills it in by passing the specified expression (usually a variable name) through the requested smoother.

Syntax

tssmooth smoother [type] newvar = exp [if] [in] [, ...]

Smoother category                smoother
Moving average
    with uniform weights         ma
    with specified weights       ma
Recursive
    exponential                  exponential
    double exponential           dexponential
    nonseasonal Holt–Winters     hwinters
    seasonal Holt–Winters        shwinters
Nonlinear filter                 nl

See [TS] tssmooth ma, [TS] tssmooth exponential, [TS] tssmooth dexponential, [TS] tssmooth hwinters, [TS] tssmooth shwinters, and [TS] tssmooth nl.
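For example, a 5-period moving average with uniform weights could be obtained with the ma smoother; y here is a hypothetical tsset variable, and window(2 1 2) requests 2 lags, the current observation, and 2 leads:

. tssmooth ma y_ma = y, window(2 1 2)

The remaining smoothers follow the same newvar = exp pattern with their own options; see their manual entries.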
Remarks and examples

The recursive smoothers may also be used for forecasting univariate time series; indeed, the Holt–Winters methods are used almost exclusively for this. All can perform dynamic out-of-sample forecasts, and the smoothing parameters may be chosen to minimize the in-sample sum-of-squared prediction errors.

The moving-average and nonlinear smoothers are generally used to extract the trend—or signal—from a time series while omitting the high-frequency or noise components.

All smoothers work both with time-series data and panel data. When used with panel data, the calculation is performed separately within each panel.

Several texts provide good introductions to the methods available in tssmooth. Chatfield (2004) discusses how these methods fit into time-series analysis in general. Abraham and Ledolter (1983); Montgomery, Johnson, and Gardiner (1990); Bowerman, O'Connell, and Koehler (2005); and Chatfield (2001) discuss using these methods for modern time-series forecasting. Becketti (2013) includes a Stata-centric discussion of these techniques. As he emphasizes, these methods often work as well as more complicated methods and are easier to explain to lay audiences. Do not dismiss these techniques as being too simplistic or inferior.

References

Abraham, B., and J. Ledolter. 1983. Statistical Methods for Forecasting. New York: Wiley.
Becketti, S. 2013. Introduction to Time Series Using Stata. College Station, TX: Stata Press.
Bowerman, B. L., R. T. O'Connell, and A. B. Koehler. 2005. Forecasting, Time Series, and Regression: An Applied Approach. 4th ed. Pacific Grove, CA: Brooks/Cole.
Chatfield, C. 2001. Time-Series Forecasting. London: Chapman & Hall/CRC.
. 2004. The Analysis of Time Series: An Introduction. 6th ed. Boca Raton, FL: Chapman & Hall/CRC.
Chatfield, C., and M. Yar. 1988. Holt-Winters forecasting: Some practical issues. Statistician 37: 129–140.
Holt, C. C. 2004. Forecasting seasonals and trends by exponentially weighted moving averages. International Journal of Forecasting 20: 5–10.
Montgomery, D. C., L. A. Johnson, and J. S. Gardiner. 1990. Forecasting and Time Series Analysis. 2nd ed. New York: McGraw–Hill.
Winters, P. R. 1960. Forecasting sales by exponentially weighted moving averages. Management Science 6: 324–342.

Also see
[TS] tsset — Declare data to be time-series data
[TS] arima — ARIMA, ARMAX, and other dynamic regression models
[TS] sspace — State-space models
[TS] tsfilter — Filter a time-series, keeping only selected periodicities
[R] smooth — Robust nonlinear smoother

Title

tssmooth dexponential — Double-exponential smoothing

Description    Quick start    Menu    Syntax
Options    Remarks and examples    Stored results    Methods and formulas    References    Also see

Description

tssmooth dexponential models the trend of a variable whose difference between changes from the previous values is serially correlated. More precisely, it models a variable whose second difference follows a low-order, moving-average process.

Quick start

Create smooth using a double-exponential smoother over y with tsset data
    tssmooth dexponential smooth=y
As above, but forecast 10 periods out of sample
    tssmooth dexponential smooth=y, forecast(10)
As above, but use 111 and 112 as the initial values for the recursion
    tssmooth dexponential smooth=y, forecast(10) s0(111 112)
As above, but use 0.5 as the smoothing parameter
    tssmooth dexponential smooth=y, forecast(10) s0(111 112) parms(.5)

Note: The above commands can also be used to apply the smoother separately to each panel of a panel dataset when a panelvar has been specified using tsset or xtset.
Menu
Statistics > Time series > Smoothers/univariate forecasters > Double-exponential smoothing

Syntax

        tssmooth dexponential [type] newvar = exp [if] [in] [, options]

    options          Description
    ------------------------------------------------------------------------
    Main
      replace        replace newvar if it already exists
      parms(#alpha)  use #alpha as smoothing parameter
      samp0(#)       use # observations to obtain initial values for recursions
      s0(#1 #2)      use #1 and #2 as initial values for recursions
      forecast(#)    use # periods for the out-of-sample forecast
    ------------------------------------------------------------------------

You must tsset your data before using tssmooth dexponential; see [TS] tsset.
exp may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Options

[Main]

replace replaces newvar if it already exists.

parms(#alpha) specifies the parameter α for the double-exponential smoothers; 0 < #alpha < 1. If parms(#alpha) is not specified, the smoothing parameter is chosen to minimize the in-sample sum-of-squared forecast errors.

samp0(#) and s0(#1 #2) are mutually exclusive ways of specifying the initial values for the recursion. By default, initial values are obtained by fitting a linear regression with a time trend, using the first half of the observations in the dataset; see Remarks and examples.
    samp0(#) specifies that the first # observations be used in that regression.
    s0(#1 #2) specifies that #1 and #2 be used as initial values.

forecast(#) specifies the number of periods for the out-of-sample prediction; 0 ≤ # ≤ 500. The default is forecast(0), which is equivalent to not performing an out-of-sample forecast.

Remarks and examples

The double-exponential smoothing procedure is designed for series that can be locally approximated as

        $\hat{x}_t = m_t + b_t t$

where $\hat{x}_t$ is the smoothed or predicted value of the series $x$, and the terms $m_t$ and $b_t$ change over time. Abraham and Ledolter (1983); Bowerman, O'Connell, and Koehler (2005); and Montgomery, Johnson, and Gardiner (1990) all provide good introductions to double-exponential smoothing. Chatfield (2001, 2004) provides helpful discussions of how double-exponential smoothing relates to modern time-series methods.

The double-exponential method has been used both as a smoother and as a prediction method. [TS] tssmooth exponential shows that the single-exponential smoothed series is given by

        $S_t = \alpha x_t + (1-\alpha) S_{t-1}$

where $\alpha$ is the smoothing constant and $x_t$ is the original series. The double-exponential smoother is obtained by smoothing the smoothed series,

        $S_t^{[2]} = \alpha S_t + (1-\alpha) S_{t-1}^{[2]}$

Values of $S_0$ and $S_0^{[2]}$ are necessary to begin the process. Per Montgomery, Johnson, and Gardiner (1990), the default method is to obtain $S_0$ and $S_0^{[2]}$ from a regression of the first $N_{pre}$ values of $x_t$ on $\tilde{t} = (1, \ldots, N_{pre} - t_0)'$. By default, $N_{pre}$ is equal to one-half the number of observations in the sample. $N_{pre}$ can be specified using the samp0() option. The values of $S_0$ and $S_0^{[2]}$ can also be specified using the option s0().

Example 1: Smoothing a locally trending series

Suppose that we had some data on the monthly sales of a book and that we wanted to smooth this series. The graph below illustrates that this series is locally trending over time, so we would not want to use single-exponential smoothing.

        (graph omitted: Monthly book sales; Sales plotted against Time)

The following example illustrates that double-exponential smoothing is simply smoothing the smoothed series.
Because the starting values are treated as time-zero values, we actually lose 2 observations when smoothing the smoothed series.

        . use http://www.stata-press.com/data/r14/sales2
        . tssmooth exponential double sm1=sales, p(.7) s0(1031)
        exponential coefficient      =     0.7000
        sum-of-squared residuals     =      13923
        root mean squared error      =     13.192

        . tssmooth exponential double sm2=sm1, p(.7) s0(1031)
        exponential coefficient      =     0.7000
        sum-of-squared residuals     =     7698.6
        root mean squared error      =     9.8098

        . tssmooth dexponential double sm2b=sales, p(.7) s0(1031 1031)
        double-exponential coefficient  =     0.7000
        sum-of-squared residuals        =     3724.4
        root mean squared error         =     6.8231

        . generate double sm2c = f2.sm2
        (2 missing values generated)

        . list sm2b sm2c in 1/10

                    sm2b        sm2c
          1.        1031        1031
          2.   1028.3834   1028.3834
          3.   1030.6306   1030.6306
          4.   1017.8182   1017.8182
          5.    1022.938    1022.938
          6.   1026.0752   1026.0752
          7.   1041.8587   1041.8587
          8.   1042.8341   1042.8341
          9.   1035.9571   1035.9571
         10.   1030.6651   1030.6651

The double-exponential method can also be viewed as a forecasting mechanism. The exponential forecast method is a constrained version of the Holt–Winters method implemented in [TS] tssmooth hwinters (as discussed by Gardner [1985] and Chatfield [2001]). Chatfield (2001) also notes that the double-exponential method arises when the underlying model is an ARIMA(0,2,2) with equal roots.

This method produces predictions $\hat{x}_t$ for $t = t_1, \ldots, T +$ forecast(). These predictions are obtained as a function of the smoothed series and the smoothed-smoothed series. For $t \in [t_0, T]$,

        $\hat{x}_t = \left(2 + \frac{\alpha}{1-\alpha}\right) S_t - \left(1 + \frac{\alpha}{1-\alpha}\right) S_t^{[2]}$

where $S_t$ and $S_t^{[2]}$ are as given above.

The out-of-sample predictions are obtained as a function of the constant term, the linear term of the smoothed series at the last observation in the sample, and time. The constant term is $a_T = 2S_T - S_T^{[2]}$, and the linear term is $b_T = \frac{\alpha}{1-\alpha}\left(S_T - S_T^{[2]}\right)$. The τ-step-ahead out-of-sample prediction is given by

        $\hat{x}_{T+\tau} = a_T + \tau b_T$

Example 2: Forecasting a locally trending series

Specifying the forecast option puts the double-exponential forecast into the new variable instead of the double-exponential smoothed series. The code given below uses the smoothed series sm1 and sm2 that were generated above to illustrate how the double-exponential forecasts are computed.

        . tssmooth dexponential double f1=sales, p(.7) s0(1031 1031) forecast(4)
        double-exponential coefficient  =     0.7000
        sum-of-squared residuals        =      20737
        root mean squared error         =       16.1

        . generate double xhat = (2 + .7/.3)*sm1 - (1 + .7/.3)*f.sm2
        (5 missing values generated)

        . list xhat f1 in 1/10

                    xhat          f1
          1.        1031        1031
          2.        1031        1031
          3.    1023.524    1023.524
          4.   1034.8039   1034.8039
          5.    994.0237    994.0237
          6.   1032.4463   1032.4463
          7.   1031.9015   1031.9015
          8.   1071.1709   1071.1709
          9.   1044.6454   1044.6454
         10.   1023.1855   1023.1855
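The stored results make it easy to check the τ-step-ahead rule by hand: after tssmooth dexponential, r(constant) and r(linear) hold $a_T$ and $b_T$. A minimal sketch, continuing with the data above (the variable name f1b is ours, not the manual's):

        . quietly tssmooth dexponential double f1b=sales, p(.7) s0(1031 1031) forecast(4)
        . display "a_T = " r(constant) "   b_T = " r(linear)
        . display "2-step-ahead forecast: " r(constant) + 2*r(linear)

The second display should reproduce the value that f1b takes on two periods past the end of the estimation sample.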
Example 3: Choosing an optimal parameter to forecast

Generally, when you are forecasting, you do not know the smoothing parameter. tssmooth dexponential computes the double-exponential forecasts of a series and obtains the optimal smoothing parameter by finding the smoothing parameter that minimizes the in-sample sum-of-squared forecast errors.

        . tssmooth dexponential f2=sales, forecast(4)
        computing optimal double-exponential coefficient (0,1)
        optimal double-exponential coefficient =      0.3631
        sum-of-squared residuals               =   16075.805
        root mean squared error                =   14.175598

The following graph describes the fit that we obtained by applying the double-exponential forecast method to our sales data. The out-of-sample dynamic predictions are not constant, as in the single-exponential case.

        . line f2 sales t, title("Double exponential forecast with optimal alpha")
        >     ytitle(Sales) xtitle(time)

        (graph omitted: Double exponential forecast with optimal alpha; Sales plotted against time; legend: dexpc(0.3631) = sales, sales)

tssmooth dexponential automatically detects panel data from the information provided when the dataset was tsset. The starting values are chosen separately for each series. If the smoothing parameter is chosen to minimize the sum-of-squared prediction errors, the optimization is performed separately on each panel. The stored results contain the results from the last panel.

Missing values at the beginning of the sample are excluded from the sample. After at least one value has been found, missing values are filled in using the one-step-ahead predictions from the previous period.

Stored results

tssmooth dexponential stores the following in r():

Scalars
    r(N)           number of observations
    r(alpha)       α smoothing parameter
    r(rss)         sum-of-squared errors
    r(rmse)        root mean squared error
    r(N_pre)       number of observations used in calculating starting values, if starting values calculated
    r(s2_0)        initial value for linear term, that is, $S_0^{[2]}$
    r(s1_0)        initial value for constant term, that is, $S_0$
    r(linear)      final value of linear term
    r(constant)    final value of constant term
    r(period)      period, if filter is seasonal
Macros
    r(method)      smoothing method
    r(exp)         expression specified
    r(timevar)     time variable specified in tsset
    r(panelvar)    panel variable specified in tsset

Methods and formulas

A truncated description of the specified double-exponential filter is used to label the new variable. See [D] label for more information on labels. An untruncated description of the specified double-exponential filter is saved in the characteristic tssmooth for the new variable. See [P] char for more information on characteristics.

The updating equations for the smoothing and forecasting versions are as given previously.

The starting values for both the smoothing and forecasting versions of double-exponential are obtained using the same method, which begins with the model

        $x_t = \beta_0 + \beta_1 t$

where $x_t$ is the series to be smoothed and $t$ is a time variable that has been normalized to equal 1 in the first period included in the sample. The regression coefficient estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ are obtained via OLS. The sample is determined by the option samp0(). By default, samp0() includes the first half of the observations. Given the estimates $\hat{\beta}_0$ and $\hat{\beta}_1$, the starting values are

        $S_0 = \hat{\beta}_0 - \{(1-\alpha)/\alpha\}\hat{\beta}_1$
        $S_0^{[2]} = \hat{\beta}_0 - 2\{(1-\alpha)/\alpha\}\hat{\beta}_1$
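These starting values are straightforward to reproduce by hand. A minimal sketch, assuming the sales2 data are still in memory, that the time variable t equals 1 in the first sample period, and that α = 0.7; we take the floor of N/2 for the default half-sample, and the macro name Npre is ours:

        . quietly count
        . local Npre = floor(r(N)/2)
        . quietly regress sales t if _n <= `Npre'
        . display "S0     = " _b[_cons] - ((1-.7)/.7)*_b[t]
        . display "S0^[2] = " _b[_cons] - 2*((1-.7)/.7)*_b[t]

If t does not start at 1, generate a normalized trend variable first; otherwise the regression intercept will not correspond to $\hat{\beta}_0$ as defined above.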
References
Abraham, B., and J. Ledolter. 1983. Statistical Methods for Forecasting. New York: Wiley.
Bowerman, B. L., R. T. O'Connell, and A. B. Koehler. 2005. Forecasting, Time Series, and Regression: An Applied Approach. 4th ed. Pacific Grove, CA: Brooks/Cole.
Chatfield, C. 2001. Time-Series Forecasting. London: Chapman & Hall/CRC.
Chatfield, C. 2004. The Analysis of Time Series: An Introduction. 6th ed. Boca Raton, FL: Chapman & Hall/CRC.
Chatfield, C., and M. Yar. 1988. Holt-Winters forecasting: Some practical issues. Statistician 37: 129–140.
Gardner, E. S., Jr. 1985. Exponential smoothing: The state of the art. Journal of Forecasting 4: 1–28.
Holt, C. C. 2004. Forecasting seasonals and trends by exponentially weighted moving averages. International Journal of Forecasting 20: 5–10.
Montgomery, D. C., L. A. Johnson, and J. S. Gardiner. 1990. Forecasting and Time Series Analysis. 2nd ed. New York: McGraw–Hill.
Winters, P. R. 1960. Forecasting sales by exponentially weighted moving averages. Management Science 6: 324–342.

Also see
[TS] tsset — Declare data to be time-series data
[TS] tssmooth — Smooth and forecast univariate time-series data

Title
tssmooth exponential — Single-exponential smoothing

Description   Quick start   Menu   Syntax   Options   Remarks and examples   Stored results   Methods and formulas   References   Also see

Description

tssmooth exponential models the trend of a variable whose change from the previous value is serially correlated. More precisely, it models a variable whose first difference follows a low-order, moving-average process.

Quick start

Create smooth using a single-exponential smoother over y with tsset data
        tssmooth exponential smooth=y

As above, but forecast 10 periods out of sample
        tssmooth exponential smooth=y, forecast(10)

As above, but use 111 as the initial value for the recursion
        tssmooth exponential smooth=y, forecast(10) s0(111)

As above, but use 0.5 as the smoothing parameter
        tssmooth exponential smooth=y, forecast(10) s0(111) parms(.5)

Note: The above commands can also be used to apply the smoother separately to each panel of a panel dataset when a panelvar has been specified using tsset or xtset.

Menu
Statistics > Time series > Smoothers/univariate forecasters > Single-exponential smoothing

Syntax

        tssmooth exponential [type] newvar = exp [if] [in] [, options]

    options          Description
    ------------------------------------------------------------------------
    Main
      replace        replace newvar if it already exists
      parms(#alpha)  use #alpha as smoothing parameter
      samp0(#)       use # observations to obtain initial value for recursion
      s0(#)          use # as initial value for recursion
      forecast(#)    use # periods for the out-of-sample forecast
    ------------------------------------------------------------------------

You must tsset your data before using tssmooth exponential; see [TS] tsset.
exp may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Options

[Main]

replace replaces newvar if it already exists.

parms(#alpha) specifies the parameter α for the exponential smoother; 0 < #alpha < 1. If parms(#alpha) is not specified, the smoothing parameter is chosen to minimize the in-sample sum-of-squared forecast errors.

samp0(#) and s0(#) are mutually exclusive ways of specifying the initial value for the recursion.
    samp0(#) specifies that the initial value be obtained by calculating the mean over the first # observations of the sample.
    s0(#) specifies the initial value to be used.
    If neither option is specified, the default is to use the mean calculated over the first half of the sample.

forecast(#) gives the number of observations for the out-of-sample prediction; 0 ≤ # ≤ 500. The default is forecast(0) and is equivalent to not forecasting out of sample.

Remarks and examples

Remarks are presented under the following headings:
    Introduction
    Examples
    Treatment of missing values

Introduction

Exponential smoothing can be viewed either as an adaptive-forecasting algorithm or, equivalently, as a geometrically weighted moving-average filter.
Exponential smoothing is most appropriate when used with time-series data that exhibit no linear or higher-order trends but that do exhibit low-velocity, aperiodic variation in the mean. Abraham and Ledolter (1983), Bowerman, O'Connell, and Koehler (2005), and Montgomery, Johnson, and Gardiner (1990) all provide good introductions to single-exponential smoothing. Chatfield (2001, 2004) discusses how single-exponential smoothing relates to modern time-series methods. For example, simple exponential smoothing produces optimal forecasts for several underlying models, including ARIMA(0,1,1) and the random-walk-plus-noise state-space model. (See Chatfield [2001, sec. 4.3.1].)

The exponential filter with smoothing parameter α creates the series $S_t$, where

        $S_t = \alpha x_t + (1-\alpha) S_{t-1}$   for $t = 1, \ldots, T$

and $S_0$ is the initial value. This is the adaptive forecast-updating form of the exponential smoother. This implies that

        $S_T = \alpha \sum_{k=0}^{T-1} (1-\alpha)^k x_{T-k} + (1-\alpha)^T S_0$

which is the weighted moving-average representation, with geometrically declining weights.

The choice of the smoothing constant α determines how quickly the smoothed series or forecast will adjust to changes in the mean of the unfiltered series. For small values of α, the response will be slow because more weight is placed on the previous estimate of the mean of the unfiltered series, whereas larger values of α will put more emphasis on the most recently observed value of the unfiltered series.

Examples

Example 1: Smoothing a series for specified parameters

Let's consider some examples using sales data. Here we forecast sales for three periods with a smoothing parameter of 0.4:

        . use http://www.stata-press.com/data/r14/sales1
        . tssmooth exponential sm1=sales, parms(.4) forecast(3)
        exponential coefficient      =     0.4000
        sum-of-squared residuals     =       8345
        root mean squared error      =     12.919

To compare our forecast with the actual data, we graph the series and the forecasted series over time.

        . line sm1 sales t, title("Single exponential forecast")
        >     ytitle(Sales) xtitle(Time)

        (graph omitted: Single exponential forecast; Sales plotted against Time; legend: exp parms(0.4000) = sales, sales)

The graph indicates that our forecasted series may not be adjusting rapidly enough to the changes in the actual series. The smoothing parameter α controls the rate at which the forecast adjusts. Smaller values of α adjust the forecasts more slowly. Thus we suspect that our chosen value of 0.4 is too small. One way to investigate this suspicion is to ask tssmooth exponential to choose the smoothing parameter that minimizes the sum-of-squared forecast errors.

        . tssmooth exponential sm2=sales, forecast(3)
        computing optimal exponential coefficient (0,1)
        optimal exponential coefficient =      0.7815
        sum-of-squared residuals        =   6727.7056
        root mean squared error         =   11.599746

The output suggests that the value of α = 0.4 is too small. The graph below indicates that the new forecast tracks the series much more closely than the previous forecast.

        . line sm2 sales t, title("Single exponential forecast with optimal alpha")
        >     ytitle(sales) xtitle(Time)

        (graph omitted: Single exponential forecast with optimal alpha; Sales plotted against Time; legend: parms(0.7815) = sales, sales)

We noted above that simple exponential forecasts are optimal for an ARIMA(0,1,1) model. (See [TS] arima for fitting ARIMA models in Stata.)
Chatfield (2001, 90) gives the following useful derivation that relates the MA coefficient in an ARIMA(0,1,1) model to the smoothing parameter in single-exponential smoothing. An ARIMA(0,1,1) is given by

        $x_t - x_{t-1} = \epsilon_t + \theta \epsilon_{t-1}$

where $\epsilon_t$ is an independent and identically distributed white-noise error term. Thus given $\hat{\theta}$, an estimate of $\theta$, an optimal one-step prediction of $x_{t+1}$ is $\hat{x}_{t+1} = x_t + \hat{\theta}\hat{\epsilon}_t$. Because $\epsilon_t$ is not observable, it can be replaced by $\hat{\epsilon}_t = x_t - \hat{x}_{t-1}$, yielding

        $\hat{x}_{t+1} = x_t + \hat{\theta}(x_t - \hat{x}_{t-1})$

Letting $\hat{\alpha} = 1 + \hat{\theta}$ and doing more rearranging implies that

        $\hat{x}_{t+1} = (1 + \hat{\theta})x_t - \hat{\theta}\hat{x}_{t-1}$
        $\hat{x}_{t+1} = \hat{\alpha} x_t + (1 - \hat{\alpha})\hat{x}_{t-1}$

Example 2: Comparing ARIMA to exponential smoothing

Let's compare the estimate of the optimal smoothing parameter of 0.7815 with the one we could obtain using [TS] arima. Below we fit an ARIMA(0,1,1) to the sales data and then recover the implied estimate of α. The two estimates of α are quite close, given the large estimated standard error of $\hat{\theta}$.

        . arima sales, arima(0,1,1)
        (setting optimization to BHHH)
        Iteration 0:   log likelihood = -189.91037
        Iteration 1:   log likelihood = -189.62405
        Iteration 2:   log likelihood = -189.60468
        Iteration 3:   log likelihood = -189.60352
        Iteration 4:   log likelihood = -189.60343
        (switching optimization to BFGS)
        Iteration 5:   log likelihood = -189.60342

        ARIMA regression

        Sample: 2 - 50                               Number of obs   =        49
                                                     Wald chi2(1)    =      1.41
        Log likelihood = -189.6034                   Prob > chi2     =    0.2347

        --------------------------------------------------------------------------
                     |                OPG
             D.sales |     Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
        -------------+------------------------------------------------------------
        sales        |
               _cons |  .5025469   1.382727    0.36   0.716   -2.207548   3.212641
        -------------+------------------------------------------------------------
        ARMA         |
                  ma |
                 L1. | -.1986561   .1671699   -1.19   0.235   -.5263031   .1289908
        -------------+------------------------------------------------------------
              /sigma |  11.58992   1.240607    9.34   0.000    9.158378   14.02147
        --------------------------------------------------------------------------
        Note: The test of the variance against zero is one sided, and the
              two-sided confidence interval is truncated at zero.

        . di 1 + _b[ARMA:L.ma]
        .80134387

Example 3: Handling panel data

tssmooth exponential automatically detects panel data. Suppose that we had sales figures for five companies in long form. Running tssmooth exponential on the variable that contains all five series puts the smoothed series and the predictions in one variable in long form. When the smoothing parameter is chosen to minimize the squared prediction error, an optimal value for the smoothing parameter is chosen separately for each panel.

        . use http://www.stata-press.com/data/r14/sales_cert, clear
        . tsset
               panel variable:  id (strongly balanced)
                time variable:  t, 1 to 100
                        delta:  1 unit

        . tssmooth exponential sm5=sales, forecast(3)
        -> id = 1
        computing optimal exponential coefficient (0,1)
        optimal exponential coefficient =      0.8702
        sum-of-squared residuals        =   16070.567
        root mean squared error         =   12.676974

        -> id = 2
        computing optimal exponential coefficient (0,1)
        optimal exponential coefficient =      0.7003
        sum-of-squared residuals        =   20792.393
        root mean squared error         =   14.419568

        -> id = 3
        computing optimal exponential coefficient (0,1)
        optimal exponential coefficient =      0.6927
        sum-of-squared residuals        =      21629
        root mean squared error         =   14.706801

        -> id = 4
        computing optimal exponential coefficient (0,1)
        optimal exponential coefficient =      0.3866
        sum-of-squared residuals        =   22321.334
        root mean squared error         =   14.940326

        -> id = 5
        computing optimal exponential coefficient (0,1)
        optimal exponential coefficient =      0.4540
        sum-of-squared residuals        =   20714.095
        root mean squared error         =   14.392392

tssmooth exponential computed starting values and chose an optimal α for each panel individually.
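Only the last panel's values remain in r() afterward (as noted for tssmooth dexponential, the stored results contain the results from the last panel). If you want the optimal α for every panel, one approach is to loop over the panels yourself. A minimal sketch in do-file form, continuing with these data; the names ids and smtmp are ours:

        levelsof id, local(ids)
        foreach i of local ids {
            preserve
            quietly keep if id == `i'
            quietly tssmooth exponential smtmp = sales
            display "panel `i': optimal alpha = " %6.4f r(alpha)
            restore
        }

Each pass optimizes α on a single panel, so the displayed values should match the per-panel coefficients reported above.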
Treatment of missing values

Missing values in the middle of the data are filled in with the one-step-ahead prediction using the previous values. Missing values at the beginning or end of the data are treated as if the observations were not there. tssmooth exponential treats observations excluded from the sample by if and in just as if they were missing.

Example 4: Handling missing data in the middle of a sample

Here the 28th observation is missing. The prediction for the 29th observation is repeated in the new series.

        . use http://www.stata-press.com/data/r14/sales1, clear
        . tssmooth exponential sm1=sales, parms(.7) forecast(3)
        (output omitted )
        . generate sales2=sales if t!=28
        (4 missing values generated)
        . tssmooth exponential sm3=sales2, parms(.7) forecast(3)
        exponential coefficient      =     0.7000
        sum-of-squared residuals     =     6842.4
        root mean squared error      =     11.817

        . list t sales2 sm3 if t>25 & t<31

                t   sales2       sm3
         26.   26   1011.5    1007.5
         27.   27   1028.3    1010.3
         28.   28        .    1022.9
         29.   29   1028.4    1022.9
         30.   30   1054.8   1026.75

Because the data for t = 28 are missing, the prediction for period 28 has been used in its place. This implies that the updating equation for period 29 is

        $S_{29} = \alpha S_{28} + (1-\alpha) S_{28} = S_{28}$

which explains why the prediction for t = 28 is repeated. Because this is a single-exponential procedure, the loss of that one observation will not be noticed several periods later.

        . generate diff = sm3-sm1 if t>28
        (28 missing values generated)
        . list t diff if t>28 & t<39

                t        diff
         29.   29        -3.5
         30.   30   -1.050049
         31.   31   -.3150635
         32.   32   -.0946045
         33.   33   -.0283203
         34.   34   -.0085449
         35.   35   -.0025635
         36.   36   -.0008545
         37.   37   -.0003662
         38.   38   -.0001221

Example 5: Handling missing data at the beginning and end of a sample

Now consider an example in which there are data missing at the beginning and end of the sample.

        . generate sales3=sales if t>2 & t<49
        (7 missing values generated)
        . tssmooth exponential sm4=sales3, parms(.7) forecast(3)
        exponential coefficient      =     0.7000
        sum-of-squared residuals     =     6215.3
        root mean squared error      =     11.624

        . list t sales sales3 sm4 if t<5 | t>45

                t    sales   sales3        sm4
          1.    1     1031        .          .
          2.    2   1022.1        .          .
          3.    3   1005.6   1005.6   1016.787
          4.    4     1025     1025   1008.956
         46.   46   1055.2   1055.2     1057.2
         47.   47   1056.8   1056.8     1055.8
         48.   48   1034.5   1034.5     1056.5
         49.   49   1041.1        .     1041.1
         50.   50   1056.1        .     1041.1
         51.   51        .        .     1041.1
         52.   52        .        .     1041.1
         53.   53        .        .     1041.1

The output above illustrates that missing values at the beginning or end of the sample cause the sample to be truncated. The new series begins with nonmissing data and begins predicting immediately after it stops. One period after the actual data concludes, the exponential forecast becomes a constant. After the actual end of the data, the forecast at period t is substituted for the missing data. This also illustrates why the forecasted series is a constant.
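The missing-value rule is easy to confirm by running the recursion by hand. A minimal sketch, continuing example 4 (sales2 and sm3 are still in memory; the name fchk is ours). Because replace works down the dataset, each observation can use the value computed just above it:

        . generate double fchk = sm3[1] in 1
        . quietly replace fchk = cond(missing(L.sales2), L.fchk, .7*L.sales2 + .3*L.fchk) in 2/l
        . list t sales2 sm3 fchk if t>25 & t<31

The cond() term encodes the rule exactly: when last period's value is missing, the previous prediction is carried forward; otherwise the usual update is applied. The fchk column should reproduce sm3, including the repeated prediction at t = 28 and t = 29.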
Stored results

tssmooth exponential stores the following in r():

Scalars
    r(N)           number of observations
    r(alpha)       α smoothing parameter
    r(rss)         sum-of-squared prediction errors
    r(rmse)        root mean squared error
    r(N_pre)       number of observations used in calculating starting values
    r(s1_0)        initial value for $S_t$
Macros
    r(method)      smoothing method
    r(exp)         expression specified
    r(timevar)     time variable specified in tsset
    r(panelvar)    panel variable specified in tsset

Methods and formulas

The formulas for deriving smoothed series are as given in the text. When the value of α is not specified, an optimal value is found that minimizes the mean squared forecast error. A method of bisection is used to find the solution to this optimization problem.

A truncated description of the specified exponential filter is used to label the new variable. See [D] label for more information about labels. An untruncated description of the specified exponential filter is saved in the characteristic tssmooth for the new variable. See [P] char for more information about characteristics.

References
Abraham, B., and J. Ledolter. 1983. Statistical Methods for Forecasting. New York: Wiley.
Bowerman, B. L., R. T. O'Connell, and A. B. Koehler. 2005. Forecasting, Time Series, and Regression: An Applied Approach. 4th ed. Pacific Grove, CA: Brooks/Cole.
Chatfield, C. 2001. Time-Series Forecasting. London: Chapman & Hall/CRC.
Chatfield, C. 2004. The Analysis of Time Series: An Introduction. 6th ed. Boca Raton, FL: Chapman & Hall/CRC.
Chatfield, C., and M. Yar. 1988. Holt-Winters forecasting: Some practical issues. Statistician 37: 129–140.
Holt, C. C. 2004. Forecasting seasonals and trends by exponentially weighted moving averages. International Journal of Forecasting 20: 5–10.
Montgomery, D. C., L. A. Johnson, and J. S. Gardiner. 1990. Forecasting and Time Series Analysis. 2nd ed. New York: McGraw–Hill.
Winters, P. R. 1960. Forecasting sales by exponentially weighted moving averages. Management Science 6: 324–342.

Also see
[TS] tsset — Declare data to be time-series data
[TS] tssmooth — Smooth and forecast univariate time-series data

Title
tssmooth hwinters — Holt–Winters nonseasonal smoothing

Description   Quick start   Menu   Syntax   Options   Remarks and examples   Stored results   Methods and formulas   Acknowledgment   References   Also see

Description

tssmooth hwinters is used in smoothing or forecasting a series that can be modeled as a linear trend in which the intercept and the coefficient on time vary over time.

Quick start

Create smooth using Holt–Winters nonseasonal smoothing over y with tsset data
        tssmooth hwinters smooth=y

As above, but forecast 10 periods out of sample
        tssmooth hwinters smooth=y, forecast(10)

As above, but use 111 and 112 as the initial values for the recursion
        tssmooth hwinters smooth=y, forecast(10) s0(111 112)

As above, but use 0.5 and 0.3 as the smoothing parameters
        tssmooth hwinters smooth=y, forecast(10) s0(111 112) parms(.5 .3)

Note: The above commands can also be used to apply the smoother separately to each panel of a panel dataset when a panelvar has been specified using tsset or xtset.

Menu
Statistics > Time series > Smoothers/univariate forecasters > Holt-Winters nonseasonal smoothing

Syntax

        tssmooth hwinters [type] newvar = exp [if] [in] [, options]

    options               Description
    ------------------------------------------------------------------------
    Main
      replace             replace newvar if it already exists
      parms(#alpha #beta) use #alpha and #beta as smoothing parameters
      samp0(#)            use # observations to obtain initial values for recursion
      s0(#cons #lt)       use #cons and #lt as initial values for recursion
      forecast(#)         use # periods for the out-of-sample forecast
    Options
      diff                alternative initial-value specification; see Options
    Maximization
      maximize_options    control the maximization process; seldom used
      from(#alpha #beta)  use #alpha and #beta as starting values for the parameters
    ------------------------------------------------------------------------

You must tsset your data before using tssmooth hwinters; see [TS] tsset.
exp may contain time-series operators; see [U] 11.4.4 Time-series varlists.
Options

[Main]

replace replaces newvar if it already exists.

parms(#alpha #beta), 0 ≤ #alpha ≤ 1 and 0 ≤ #beta ≤ 1, specifies the parameters. If parms() is not specified, the values are chosen by an iterative process to minimize the in-sample sum-of-squared prediction errors.
    If you experience difficulty converging (many iterations and "not concave" messages), try using from() to provide better starting values.

samp0(#) and s0(#cons #lt) specify how the initial values #cons and #lt for the recursion are obtained. By default, initial values are obtained by fitting a linear regression with a time trend using the first half of the observations in the dataset.
    samp0(#) specifies that the first # observations be used in that regression.
    s0(#cons #lt) specifies that #cons and #lt be used as initial values.

forecast(#) specifies the number of periods for the out-of-sample prediction; 0 ≤ # ≤ 500. The default is forecast(0), which is equivalent to not performing an out-of-sample forecast.

[Options]

diff specifies that the linear term be obtained by averaging the first difference of exp_t and that the intercept be obtained as the difference of exp in the first observation and the mean of D.exp_t. If the diff option is not specified, a linear regression of exp_t on a constant and t is fit.

[Maximization]

maximize_options controls the process for solving for the optimal α and β when parms() is not specified.
    maximize_options: nodifficult, technique(algorithm_spec), iterate(#), [no]log, trace, gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#), nrtolerance(#), and nonrtolerance; see [R] maximize. These options are seldom used.

from(#alpha #beta), 0 < #alpha < 1 and 0 < #beta < 1, specifies starting values from which the optimal values of α and β will be obtained. If from() is not specified, from(.5 .5) is used.

Remarks and examples

The Holt–Winters method forecasts series of the form

        $\hat{x}_{t+1} = a_t + b_t t$

where $\hat{x}_t$ is the forecast of the original series $x_t$, $a_t$ is a mean that drifts over time, and $b_t$ is a coefficient on time that also drifts. In fact, as Gardner (1985) has noted, the Holt–Winters method produces optimal forecasts for an ARIMA(0,2,2) model and some local linear models. See [TS] arima and the references in that entry for ARIMA models, and see Harvey (1989) for a discussion of the local linear model and its relationship to the Holt–Winters method. Abraham and Ledolter (1983), Bowerman, O'Connell, and Koehler (2005), and Montgomery, Johnson, and Gardiner (1990) all provide good introductions to the Holt–Winters method. Chatfield (2001, 2004) provides helpful discussions of how this method relates to modern time-series analysis.

The Holt–Winters method can be viewed as an extension of double-exponential smoothing with two parameters, which may be explicitly set or chosen to minimize the in-sample sum-of-squared forecast errors. In the latter case, as discussed in Methods and formulas, the smoothing parameters are chosen to minimize the in-sample sum-of-squared forecast errors plus a penalty term that helps to achieve convergence when one of the parameters is too close to the boundary.
Given the series $x_t$, the smoothing parameters α and β, and the starting values $a_0$ and $b_0$, the updating equations are

        $a_t = \alpha x_t + (1-\alpha)(a_{t-1} + b_{t-1})$
        $b_t = \beta(a_t - a_{t-1}) + (1-\beta)b_{t-1}$

After computing the series of constant and linear terms, $a_t$ and $b_t$, respectively, the τ-step-ahead prediction of $x_t$ is given by

        $\hat{x}_{t+\tau} = a_t + b_t \tau$

Example 1: Smoothing a series for specified parameters

Below we show how to use tssmooth hwinters with specified smoothing parameters. This example also shows that the Holt–Winters method can closely follow a series in which both the mean and the time coefficient drift over time.

Suppose that we have data on the monthly sales of a book and that we want to forecast this series with the Holt–Winters method.

        . use http://www.stata-press.com/data/r14/bsales
        . tssmooth hwinters hw1=sales, parms(.7 .3) forecast(3)
        Specified weights:
                             alpha = 0.7000
                              beta = 0.3000
        sum-of-squared residuals  =   2301.046
        root mean squared error   =   6.192799

        . line sales hw1 t, title("Holt-Winters forecast with alpha=.7 and beta=.3")
        >     ytitle(Sales) xtitle(Time)

        (graph omitted: Holt-Winters forecast with alpha=.7 and beta=.3; Sales plotted against Time; legend: sales, hw parms(0.700 0.300) = sales)

The graph indicates that the forecasts are for linearly decreasing sales. Given $a_T$ and $b_T$, the out-of-sample predictions are linear functions of time. In this example, the slope appears to be too steep, probably because of our choice of α and β.

Example 2: Choosing the initial values

The graph in the previous example illustrates that the starting values for the linear and constant series can affect the in-sample fit of the predicted series for the first few observations. The previous example used the default method for obtaining the initial values for the recursion. The output below illustrates that, for some problems, the difference-based initial values provide a better in-sample fit for the first few observations. However, the difference-based initial values do not always outperform the regression-based initial values. Furthermore, as shown in the output below, for series of reasonable length, the predictions produced are nearly identical.

        . tssmooth hwinters hw2=sales, parms(.7 .3) forecast(3) diff
        Specified weights:
                             alpha = 0.7000
                              beta = 0.3000
        sum-of-squared residuals  =   2261.173
        root mean squared error   =    6.13891

        . list hw1 hw2 if _n<6 | _n>57

                    hw1        hw2
          1.   93.31973   97.80807
          2.   98.40002   98.11447
          3.   100.8845    99.2267
          4.   98.50404   96.78276
          5.   93.62408    92.2452
         58.   116.5771   116.5771
         59.   119.2146   119.2146
         60.   119.2608   119.2608
         61.   111.0299   111.0299
         62.   109.2815   109.2815
         63.   107.5331   107.5331

When the smoothing parameters are chosen to minimize the in-sample sum-of-squared forecast errors, changing the initial values can affect the choice of the optimal α and β. When changing the initial values results in different optimal values for α and β, the predictions will also differ.

When the Holt–Winters model fits the data well, finding the optimal smoothing parameters generally proceeds well. When the model fits poorly, finding the α and β that minimize the in-sample sum-of-squared forecast errors can be difficult.
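The updating equations themselves are simple enough to reproduce by hand, which makes explicit what the recursion computes. A minimal sketch in do-file form, continuing with the bsales data and α = 0.7, β = 0.3; the names at, bt, and hwchk are ours, and we take the starting values from r(s1_0) (constant term) and r(s2_0) (linear term):

        quietly tssmooth hwinters hw1b=sales, parms(.7 .3)
        local a0 = r(s1_0)                               // starting constant term a_0
        local b0 = r(s2_0)                               // starting linear term b_0
        generate double at = .7*sales + .3*(`a0' + `b0') in 1
        generate double bt = .3*(at - `a0') + .7*`b0' in 1
        forvalues i = 2/`=_N' {
            quietly replace at = .7*sales + .3*(L.at + L.bt) in `i'
            quietly replace bt = .3*(at - L.at) + .7*L.bt in `i'
        }
        generate double hwchk = L.at + L.bt              // one-step-ahead prediction
        list hw1 hwchk in 2/6

If, as the definitions above suggest, the smoothed variable holds the one-step-ahead predictions $a_{t-1} + b_{t-1}$, hwchk should reproduce hw1 up to rounding for these observations.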
Example 3: Forecasting with optimal parameters

In this example, we forecast the book sales data using the α and β that minimize the in-sample squared forecast errors.

        . tssmooth hwinters hw3=sales, forecast(3)
        computing optimal weights
        Iteration 0:  penalized RSS = -2632.2073  (not concave)
        Iteration 1:  penalized RSS = -1982.8431
        Iteration 2:  penalized RSS = -1976.4236
        Iteration 3:  penalized RSS = -1975.9172
        Iteration 4:  penalized RSS = -1975.9036
        Iteration 5:  penalized RSS = -1975.9036
        Optimal weights:
                             alpha = 0.8209
                              beta = 0.0067
        penalized sum-of-squared residuals = 1975.904
        sum-of-squared residuals           = 1975.904
        root mean squared error            = 5.738617

The following graph contains the data and the forecast using the optimal α and β. Comparing this graph with the one above illustrates how different choices of α and β can lead to very different forecasts. Instead of linearly decreasing sales, the new forecast is for linearly increasing sales.

        . line sales hw3 t, title("Holt-Winters forecast with optimal alpha and beta")
        >     ytitle(Sales) xtitle(Time)

        (graph omitted: Holt-Winters forecast with optimal alpha and beta; Sales plotted against Time; legend: sales, hw parms(0.821 0.007) = sales)

Stored results

tssmooth hwinters stores the following in r():

Scalars
    r(N)           number of observations
    r(alpha)       α smoothing parameter
    r(beta)        β smoothing parameter
    r(rss)         sum-of-squared errors
    r(prss)        penalized sum-of-squared errors, if parms() not specified
    r(rmse)        root mean squared error
    r(N_pre)       number of observations used in calculating starting values
    r(s2_0)        initial value for linear term
    r(s1_0)        initial value for constant term
    r(linear)      final value of linear term
    r(constant)    final value of constant term
Macros
    r(method)      smoothing method
    r(exp)         expression specified
    r(timevar)     time variable specified in tsset
    r(panelvar)    panel variable specified in tsset

Methods and formulas

A truncated description of the specified Holt–Winters filter is used to label the new variable. See [D] label for more information on labels. An untruncated description of the specified Holt–Winters filter is saved in the characteristic named tssmooth for the new variable. See [P] char for more information on characteristics.

Given the series $x_t$; the smoothing parameters α and β; and the starting values $a_0$ and $b_0$, the updating equations are

        $a_t = \alpha x_t + (1-\alpha)(a_{t-1} + b_{t-1})$
        $b_t = \beta(a_t - a_{t-1}) + (1-\beta)b_{t-1}$

By default, the initial values are found by fitting a linear regression with a time trend. The time variable in this regression is normalized to equal one in the first period included in the sample. By default, one-half of the data is used in this regression, but this sample can be changed using samp0(). $a_0$ is then set to the estimate of the constant, and $b_0$ is set to the estimate of the coefficient on the time trend. Specifying the diff option sets $b_0$ to the mean of D.x and $a_0$ to $x_1 - b_0$. s0() can also be used to specify the initial values directly.

Sometimes, one or both of the optimal parameters may lie on the boundary of [0, 1]. To keep the estimates inside [0, 1], tssmooth hwinters parameterizes the objective function in terms of their inverse logits, that is, in terms of exp(α)/{1 + exp(α)} and exp(β)/{1 + exp(β)}. When one of these parameters is actually on the boundary, this can complicate the optimization. For this reason, tssmooth hwinters optimizes a penalized sum-of-squared forecast errors. Let $\hat{x}_t(\tilde{\alpha}, \tilde{\beta})$ be the forecast for the series $x_t$, given the choices of $\tilde{\alpha}$ and $\tilde{\beta}$.
Then the in-sample penalized sum-of-squared prediction errors is

        $P = \sum_{t=1}^{T} \Big[ \{x_t - \hat{x}_t(\tilde{\alpha}, \tilde{\beta})\}^2 + I_{\{|f(\tilde{\alpha})|>12\}} \{|f(\tilde{\alpha})| - 12\}^2 + I_{\{|f(\tilde{\beta})|>12\}} \{|f(\tilde{\beta})| - 12\}^2 \Big]$

where $f(x) = \ln\{x/(1-x)\}$. The penalty term is zero unless one of the parameters is close to the boundary. When one of the parameters is close to the boundary, the penalty term will help to obtain convergence.

Acknowledgment

We thank Nicholas J. Cox of the Department of Geography at Durham University, UK, and coeditor of the Stata Journal and author of Speaking Stata Graphics for his helpful comments.

References
Abraham, B., and J. Ledolter. 1983. Statistical Methods for Forecasting. New York: Wiley.
Bowerman, B. L., R. T. O'Connell, and A. B. Koehler. 2005. Forecasting, Time Series, and Regression: An Applied Approach. 4th ed. Pacific Grove, CA: Brooks/Cole.
Chatfield, C. 2001. Time-Series Forecasting. London: Chapman & Hall/CRC.
Chatfield, C. 2004. The Analysis of Time Series: An Introduction. 6th ed. Boca Raton, FL: Chapman & Hall/CRC.
Chatfield, C., and M. Yar. 1988. Holt-Winters forecasting: Some practical issues. Statistician 37: 129–140.
Gardner, E. S., Jr. 1985. Exponential smoothing: The state of the art. Journal of Forecasting 4: 1–28.
Harvey, A. C. 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge University Press.
Holt, C. C. 2004. Forecasting seasonals and trends by exponentially weighted moving averages. International Journal of Forecasting 20: 5–10.
Montgomery, D. C., L. A. Johnson, and J. S. Gardiner. 1990. Forecasting and Time Series Analysis. 2nd ed. New York: McGraw–Hill.
Winters, P. R. 1960. Forecasting sales by exponentially weighted moving averages. Management Science 6: 324–342.

Also see
[TS] tsset — Declare data to be time-series data
[TS] tssmooth — Smooth and forecast univariate time-series data

Title
tssmooth ma — Moving-average filter

Description   Quick start   Menu   Syntax   Options   Remarks and examples   Stored results   Methods and formulas   Reference   Also see

Description

tssmooth ma creates a new series in which each observation is an average of nearby observations in the original series. The moving average may be calculated with uniform or user-specified weights. Missing periods are excluded from calculations.

Quick start

Create ma1 using a second-order moving average of y1 with tsset data
        tssmooth ma ma1 = y1, window(2)

Also include the current observation in the average
        tssmooth ma ma1 = y1, window(2 1)

Also include 4 forward terms in the average
        tssmooth ma ma1 = y1, window(2 1 4)

Create ma2 using a moving average of y2 with weight 2 for the first lag of y2, 3 for its current value, 5 for its first forward value, and 4 for its second forward value
        tssmooth ma ma2 = y2, weights(2 <3> 5 4)

Note: The above commands can also be used to apply the smoother separately to each panel of a panel dataset when a panelvar has been specified using tsset or xtset.

Menu
Statistics > Time series > Smoothers/univariate forecasters > Moving-average filter

Syntax

Moving average with uniform weights

        tssmooth ma [type] newvar = exp [if] [in] , window(#l [#c [#f]]) [replace]

Moving average with specified weights

        tssmooth ma [type] newvar = exp [if] [in] , weights([numlist_l] <#c> [numlist_f]) [replace]

You must tsset your data before using tssmooth ma; see [TS] tsset.
exp may contain time-series operators; see [U] 11.4.4 Time-series varlists.
Options

window(#l [#c [#f]]) describes the span of the uniformly weighted moving average.
    #l specifies the number of lagged terms to be included, 0 ≤ #l ≤ one-half the number of observations in the sample.
    #c is optional and specifies whether to include the current observation in the filter. A 0 indicates exclusion and 1, inclusion. The current observation is excluded by default.
    #f is optional and specifies the number of forward terms to be included, 0 ≤ #f ≤ one-half the number of observations in the sample.

weights([numlist_l] <#c> [numlist_f]) is required for the weighted moving average and describes the span of the moving average, as well as the weights to be applied to each term in the average. The middle term literally is surrounded by < and >, so you might type weights(1/2 <3> 2/1).
    numlist_l is optional and specifies the weights to be applied to the lagged terms when computing the moving average.
    #c is required and specifies the weight to be applied to the current term.
    numlist_f is optional and specifies the weights to be applied to the forward terms when computing the moving average.
    The number of elements in each numlist is limited to one-half the number of observations in the sample.

replace replaces newvar if it already exists.

Remarks and examples

Remarks are presented under the following headings:
    Overview
    Video example

Overview

Moving averages are simple linear filters of the form

        $\hat{x}_t = \frac{\sum_{i=-l}^{f} w_i x_{t+i}}{\sum_{i=-l}^{f} w_i}$

where
    $\hat{x}_t$ is the moving average
    $x_t$ is the variable or expression to be smoothed
    $w_i$ are the weights being applied to the terms in the filter
    $l$ is the longest lag in the span of the filter
    $f$ is the longest lead in the span of the filter

Moving averages are used primarily to reduce noise in time-series data. Using moving averages to isolate signals is problematic, however, because the moving averages themselves are serially correlated, even when the underlying data series is not. Still, Chatfield (2004) discusses moving-average filters and provides several specific moving-average filters for extracting certain trends.

Example 1: A symmetric moving-average filter with uniform weights

Suppose that we have a time series of sales data, and we want to separate the data into two components: signal and noise. To eliminate the noise, we apply a moving-average filter. In this example, we use a symmetric moving average with a span of 5. This means that we will average the first two lagged values, the current value, and the first two forward terms of the series, with each term in the average receiving a weight of 1.

        . use http://www.stata-press.com/data/r14/sales1
        . tsset
                time variable:  t, 1 to 50
                        delta:  1 unit
        . tssmooth ma sm1 = sales, window(2 1 2)
        The smoother applied was
            (1/5)*[x(t-2) + x(t-1) + 1*x(t) + x(t+1) + x(t+2)]; x(t)= sales

We would like to smooth our series so that there is no autocorrelation in the noise. Below we compute the noise as the difference between the smoothed series and the series itself. Then we use ac (see [TS] corrgram) to check for autocorrelation in the noise.

        . generate noise = sales-sm1
        . ac noise

        (graph omitted: autocorrelations of noise by lag, with Bartlett's formula for MA(q) 95% confidence bands)
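You can verify what window(2 1 2) computes by building the same average explicitly with time-series operators. A minimal sketch, continuing example 1; the name sm1chk is ours. The two series should agree wherever the hand-built average is nonmissing, because the formula above and tssmooth ma coincide when no term in the window is missing:

        . generate double sm1chk = (L2.sales + L.sales + sales + F.sales + F2.sales)/5
        . assert abs(sm1 - sm1chk) < .01 if !missing(sm1, sm1chk)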
We are going to specify a shorter filter in which the weights decline as the observations get farther away from the current observation. The weighted moving-average filter requires that we supply the weights to apply to each element with the weights() option. In specifying the weights, we implicitly specify the span of the filter. Below we use the filter x bt = (1/9)(1xt−2 + 2xt−1 + 3xt + 2xt+1 + 1xt+2 ) In what follows, 1/2 does not mean one-half, it means the numlist 1 2: . tssmooth ma sm2 = sales, weights( 1/2 <3> 2/1) The smoother applied was (1/9)*[1*x(t-2) + 2*x(t-1) + 3*x(t) + 2*x(t+1) + 1*x(t+2)]; x(t)= sales . generate noise2 = sales-sm2 We compute the noise and use ac to check for autocorrelation. tssmooth ma — Moving-average filter 681 −0.40 Autocorrelations of noise2 −0.20 0.00 0.20 0.40 . ac noise2 0 5 10 15 20 25 Lag Bartlett’s formula for MA(q) 95% confidence bands The graph shows no significant evidence of autocorrelation in the noise from the second filter. Technical note tssmooth ma gives any missing observations a coefficient of zero in both the uniformly weighted and weighted moving-average filters. This simply means that missing values or missing periods are excluded from the moving average. Sample restrictions, via if and in, cause the expression smoothed by tssmooth ma to be missing for the excluded observations. Thus sample restrictions have the same effect as missing values in a variable that is filtered in the expression. Also, gaps in the data that are longer than the span of the filter will generate missing values in the filtered series. Because the first l observations and the last f observations will be outside the span of the filter, those observations will be set to missing in the moving-average series. Video example Time series, part 6: Moving-average smoothers using tssmooth 682 tssmooth ma — Moving-average filter Stored results tssmooth ma stores the following in r(): Scalars r(N) r(w0) r(wlead#) r(wlag#) Macros r(method) r(exp) r(timevar) r(panelvar) number of observations weight on the current observation weight on lead #, if leads are specified weight on lag #, if lags are specified smoothing method expression specified time variable specified in tsset panel variable specified in tsset Methods and formulas The formula for moving averages is the same as previously given. A truncated description of the specified moving-average filter labels the new variable. See [D] label for more information on labels. An untruncated description of the specified moving-average filter is saved in the characteristic tssmooth for the new variable. See [P] char for more information on characteristics. Reference Chatfield, C. 2004. The Analysis of Time Series: An Introduction. 6th ed. Boca Raton, FL: Chapman & Hall/CRC. Also see [TS] tsset — Declare data to be time-series data [TS] tssmooth — Smooth and forecast univariate time-series data Title tssmooth nl — Nonlinear filter Description Options Also see Quick start Remarks and examples Menu Stored results Syntax Methods and formulas Description tssmooth nl uses nonlinear smoothers to identify the underlying trend in a series. 
Quick start

Create nly as a running median smoother of y of span 3 using tsset data
        tssmooth nl nly=y, smoother(3)

As above, but use a Hanning linear smoother
        tssmooth nl nly=y, smoother(H)

As above, but smooth over y and then over the part of y that is not smooth, and add the smooth components of the two steps
        tssmooth nl nly=y, smoother(H, twice)

Note: The above commands can also be used to apply the smoother separately to each panel of a panel dataset when a panelvar has been specified using tsset or xtset.

Menu
Statistics > Time series > Smoothers/univariate forecasters > Nonlinear filter

Syntax

        tssmooth nl [type] newvar = exp [if] [in] , smoother(smoother[, twice]) [replace]

where smoother is specified as Sm [Sm [...]] and Sm is one of

        {1|2|3|4|5|6|7|8|9}[R]
        3[R]S[S|R][S|R][...]
        E
        H

The numbers specified in smoother represent the span of a running median smoother. For example, a number 3 specifies that each value be replaced by the median of the point and the two adjacent data values. The letter H indicates that a Hanning linear smoother, which is a span-3 smoother with binomial weights, be applied.

The letters E, S, and R are three refinements that can be combined with the running median and Hanning smoothers. First, the end points of a smooth can be given special treatment. This is specified by the E operator. Second, smoothing by 3, the span-3 running median, tends to produce flat-topped hills and valleys. The splitting operator, S, "splits" these repeated values, applies the end-point operator to them, and then "rejoins" the series. Third, it is sometimes useful to repeat an odd-span median smoother or the splitting operator until the smooth no longer changes. Following a digit or an S with an R specifies this type of repetition.

Finally, the twice operator specifies that after smoothing, the smoother be reapplied to the resulting rough, and any recovered signal be added back to the original smooth.

Letters may be specified in lowercase, if preferred. Examples of smoother[, twice] include

        3RSSH    3RSSH,twice    4253H    4253H,twice    43RSR2H,twice
        3rssh    3rssh,twice    4253h    4253h,twice    43rsr2h,twice

You must tsset your data before using tssmooth nl; see [TS] tsset.
exp may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Options

[Main]

smoother(smoother[, twice]) is required; it specifies the nonlinear smoother to be used.

replace replaces newvar if it already exists.

Remarks and examples

tssmooth nl works as a front end to smooth. See [R] smooth for details.

Stored results

tssmooth nl stores the following in r():

Scalars
    r(N)           number of observations
Macros
    r(method)      nl
    r(smoother)    specified smoother
    r(timevar)     time variable specified in tsset
    r(panelvar)    panel variable specified in tsset

Methods and formulas

The methods are documented in [R] smooth.

A truncated description of the specified nonlinear filter labels the new variable. See [D] label for more information on labels. An untruncated description of the specified nonlinear filter is saved in the characteristic tssmooth for the new variable. See [P] char for more information on characteristics.
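Because tssmooth nl hands the computation to smooth, running both on the same gap-free series should yield identical results. A quick sketch of that equivalence, assuming the sales1 data; the names nlsm and rsm are ours, and the smooth syntax is as documented in [R] smooth:

        . use http://www.stata-press.com/data/r14/sales1, clear
        . tssmooth nl nlsm=sales, smoother(4253h, twice)
        . smooth 4253h,twice sales, gen(rsm)
        . assert abs(nlsm - rsm) < .001 if !missing(nlsm, rsm)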
Also see
[TS] tsset — Declare data to be time-series data
[TS] tssmooth — Smooth and forecast univariate time-series data

Title
tssmooth shwinters — Holt–Winters seasonal smoothing

Description   Quick start   Menu   Syntax   Options   Remarks and examples   Stored results   Methods and formulas   Acknowledgment   References   Also see

Description

tssmooth shwinters performs the seasonal Holt–Winters method on a user-specified expression, which is usually just a variable name, and generates a new variable containing the forecasted series.

Quick start

Create smooth using Holt–Winters seasonal smoothing over y with tsset data
        tssmooth shwinters smooth=y

As above, but forecast 10 periods out of sample
        tssmooth shwinters smooth=y, forecast(10)

As above, but use 111 and 112 as the initial values for the recursion
        tssmooth shwinters smooth=y, forecast(10) s0(111 112)

As above, but use 0.5, 0.3, and 0.7 as the smoothing parameters
        tssmooth shwinters smooth=y, forecast(10) s0(111 112) ///
                parms(.5 .3 .7)

As above, but normalize seasonal values
        tssmooth shwinters smooth=y, forecast(10) s0(111 112) ///
                parms(.5 .3 .7) normalize

Note: The above commands can also be used to apply the smoother separately to each panel of a panel dataset when a panelvar has been specified using tsset or xtset.

Menu
Statistics > Time series > Smoothers/univariate forecasters > Holt-Winters seasonal smoothing

Syntax

        tssmooth shwinters [type] newvar = exp [if] [in] [, options]

    options                      Description
    ------------------------------------------------------------------------
    Main
      replace                    replace newvar if it already exists
      parms(#alpha #beta #gamma) use #alpha, #beta, and #gamma as smoothing parameters
      samp0(#)                   use # observations to obtain initial values for recursion
      s0(#cons #lt)              use #cons and #lt as initial values for recursion
      forecast(#)                use # periods for the out-of-sample forecast
      period(#)                  use # for period of the seasonality
      additive                   use additive seasonal Holt–Winters method
    Options
      sn0_0(varname)             use initial seasonal values in varname
      sn0_v(newvar)              store estimated initial values for seasonal terms in newvar
      snt_v(newvar)              store final year's estimated seasonal terms in newvar
      normalize                  normalize seasonal values
      altstarts                  use alternative method for computing the starting values
    Maximization
      maximize_options           control the maximization process; seldom used
      from(#alpha #beta #gamma)  use #alpha, #beta, and #gamma as starting values for the parameters
    ------------------------------------------------------------------------

You must tsset your data before using tssmooth shwinters; see [TS] tsset.
exp may contain time-series operators; see [U] 11.4.4 Time-series varlists.

Options

[Main]

replace replaces newvar if it already exists.

parms(#alpha #beta #gamma), 0 ≤ #alpha ≤ 1, 0 ≤ #beta ≤ 1, and 0 ≤ #gamma ≤ 1, specifies the parameters. If parms() is not specified, the values are chosen by an iterative process to minimize the in-sample sum-of-squared prediction errors.
    If you experience difficulty converging (many iterations and "not concave" messages), try using from() to provide better starting values.

samp0(#) and s0(#cons #lt) have to do with how the initial values #cons and #lt for the recursion are obtained.
    s0(#cons #lt) specifies the initial values to be used.
    samp0(#) specifies that the initial values be obtained using the first # observations of the sample. This calculation is described under Methods and formulas and depends on whether the altstarts and additive options are also specified.
    If neither option is specified, the first half of the sample is used to obtain initial values.

forecast(#) specifies the number of periods for the out-of-sample prediction; 0 ≤ # ≤ 500.
The default is forecast(0), which is equivalent to not performing an out-of-sample forecast.

period(#) specifies the period of the seasonality. If period() is not specified, the seasonality is obtained from the tsset options daily, weekly, ..., yearly; see [TS] tsset. If you did not specify one of those options when you tsset the data, you must specify the period() option. For instance, if your data are quarterly and you did not specify tsset's quarterly option, you must now specify period(4).
    By default, seasonal values are calculated, but you may specify the initial seasonal values to be used via the sn0_0(varname) option. The first period() observations of varname are to contain the initial seasonal values.

additive uses the additive seasonal Holt–Winters method instead of the default multiplicative seasonal Holt–Winters method.

[Options]

sn0_0(varname) specifies the initial seasonal values to use. varname must contain a complete year's worth of seasonal values, beginning with the first observation in the estimation sample. For example, if you have monthly data, the first 12 observations of varname must contain nonmissing data. sn0_0() cannot be used with sn0_v().

sn0_v(newvar) stores in newvar the initial seasonal values after they have been estimated. sn0_v() cannot be used with sn0_0().

snt_v(newvar) stores in newvar the seasonal values for the final year's worth of data.

normalize specifies that the seasonal values be normalized. In the multiplicative model, they are normalized to sum to one. In the additive model, the seasonal values are normalized to sum to zero.

altstarts uses an alternative method to compute the starting values for the constant, the linear, and the seasonal terms. The default and the alternative methods are described in Methods and formulas. altstarts may not be specified with s0().

[Maximization]

maximize_options controls the process for solving for the optimal α, β, and γ when the parms() option is not specified.
    maximize_options: nodifficult, technique(algorithm_spec), iterate(#), [no]log, trace, gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#), nrtolerance(#), and nonrtolerance; see [R] maximize. These options are seldom used.

from(#alpha #beta #gamma), 0 < #alpha < 1, 0 < #beta < 1, and 0 < #gamma < 1, specifies starting values from which the optimal values of α, β, and γ will be obtained. If from() is not specified, from(.5 .5 .5) is used.
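To make the period() requirement concrete: if the data were declared with a generic time variable, the command has no way to infer the season length, so it must be supplied by hand. A minimal sketch for quarterly data whose time variable is just 1, 2, 3, ...; the names y, t, and smq are ours:

        . tsset t
        . tssmooth shwinters smq = y, period(4)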
When applied to panel data, each series is smoothed separately, and the starting values are computed separately for each panel. If the smoothing parameters are chosen to minimize the in-sample sum-of-squared forecast errors, the optimization is performed separately on each panel.

When there are missing values at the beginning of the series, the sample begins with the first nonmissing observation. Missing values after the first nonmissing observation are filled in with forecasted values.

Holt–Winters seasonal multiplicative method

This method forecasts seasonal time series in which the amplitude of the seasonal component grows with the series. Chatfield (2001) notes that there are some nonlinear state-space models whose optimal prediction equations correspond to the multiplicative Holt–Winters method. This procedure is best applied to data that could be described by

    x_{t+j} = (\mu_t + \beta j) S_{t+j} + \epsilon_{t+j}

where x_t is the series, μ_t is the time-varying mean at time t, β is a parameter, S_t is the seasonal component at time t, and ε_t is an idiosyncratic error. See Methods and formulas for the updating equations.

Example 1: Forecasting from the multiplicative model

We have quarterly data on turkey sales by a new producer in the 1990s. The data have a strong seasonal component and an upward trend. We use the multiplicative Holt–Winters method to forecast sales for the year 2000. Because we have already tsset our data to the quarterly format, we do not need to specify the period() option.

. use http://www.stata-press.com/data/r14/turksales
. tssmooth shwinters shw1 = sales, forecast(4)
computing optimal weights
Iteration 0:  penalized RSS = -189.34609  (not concave)
Iteration 1:  penalized RSS = -108.68038
Iteration 2:  penalized RSS = -106.99574
Iteration 3:  penalized RSS = -106.16725
Iteration 4:  penalized RSS = -106.14094
Iteration 5:  penalized RSS = -106.14093
Iteration 6:  penalized RSS = -106.14093
Optimal weights:
    alpha = 0.1310
    beta  = 0.1428
    gamma = 0.2999
penalized sum-of-squared residuals = 106.1409
          sum-of-squared residuals = 106.1409
           root mean squared error = 1.628964

The graph below describes the fit and the forecast that was obtained.

. line sales shw1 t, title("Multiplicative Holt-Winters forecast")
>     xtitle(Time) ytitle(Sales)

[Figure omitted: line plot of sales and shw parms(0.131 0.143 0.300) = sales against Time, 1990q1–2000q1; Sales axis runs from 95 to 115.]

Holt–Winters seasonal additive method

This method is similar to the previous one, but the seasonal effect is assumed to be additive rather than multiplicative. This method forecasts series that can be described by the equation

    x_{t+j} = (\mu_t + \beta j) + S_{t+j} + \epsilon_{t+j}

See Methods and formulas for the updating equations.

Example 2: Forecasting from the additive model

In this example, we fit the data from the previous example to the additive model to forecast sales in the coming year. We use the snt_v() option to save the last year's seasonal terms in the new variable seas.

. tssmooth shwinters shwa = sales, forecast(4) snt_v(seas) normalize additive
computing optimal weights
Iteration 0:  penalized RSS = -190.90242  (not concave)
Iteration 1:  penalized RSS = -108.8357
Iteration 2:  penalized RSS = -108.25359
Iteration 3:  penalized RSS = -107.68187
Iteration 4:  penalized RSS = -107.66444
Iteration 5:  penalized RSS = -107.66442
Iteration 6:  penalized RSS = -107.66442
Optimal weights:
    alpha = 0.1219
    beta  = 0.1580
    gamma = 0.3340
penalized sum-of-squared residuals = 107.6644
          sum-of-squared residuals = 107.6644
           root mean squared error = 1.640613

The output reveals that the multiplicative model has a better in-sample fit, and the graph below shows that the forecast from the multiplicative model is higher than that of the additive model.

. line shw1 shwa t if t>=tq(2000q1), title("Multiplicative and additive"
>     "Holt-Winters forecasts") xtitle("Time") ytitle("Sales") legend(cols(1))

[Figure omitted: line plot of shw parms(0.131 0.143 0.300) = sales and shw-add parms(0.122 0.158 0.334) = sales against Time, 2000q1–2001q1; Sales axis runs from 108 to 113.]

To check whether the estimated seasonal components are intuitively sound, we list the last year's seasonal components.

. list t seas if seas < .

             t          seas
  37.   1999q1    -2.7533393
  38.   1999q2    -.91752573
  39.   1999q3     1.8082417
  40.   1999q4     1.8626233

The output indicates that the signs of the estimated seasonal components agree with our intuition.

Stored results

tssmooth shwinters stores the following in r():

Scalars
  r(N)           number of observations
  r(alpha)       α smoothing parameter
  r(beta)        β smoothing parameter
  r(gamma)       γ smoothing parameter
  r(prss)        penalized sum-of-squared errors
  r(rss)         sum-of-squared errors
  r(rmse)        root mean squared error
  r(N_pre)       number of seasons used in calculating starting values
  r(s2_0)        initial value for linear term
  r(s1_0)        initial value for constant term
  r(linear)      final value of linear term
  r(constant)    final value of constant term
  r(period)      period, if filter is seasonal

Macros
  r(method)      shwinters, additive or shwinters, multiplicative
  r(normalize)   normalize, if specified
  r(exp)         expression specified
  r(timevar)     time variable specified in tsset
  r(panelvar)    panel variable specified in tsset

Methods and formulas

A truncated description of the specified seasonal Holt–Winters filter labels the new variable. See [D] label for more information on labels. An untruncated description of the specified seasonal Holt–Winters filter is saved in the characteristic named tssmooth for the new variable. See [P] char for more information on characteristics.

When the parms() option is not specified, the smoothing parameters are chosen to minimize the in-sample sum of penalized squared-forecast errors. Sometimes, one or more of the three optimal parameters lies on the boundary [0, 1]. To keep the estimates inside [0, 1], tssmooth shwinters parameterizes the objective function in terms of their inverse logits, that is, in terms of exp(α)/{1 + exp(α)}, exp(β)/{1 + exp(β)}, and exp(γ)/{1 + exp(γ)}. When one of these parameters is actually on the boundary, this can complicate the optimization. For this reason, tssmooth shwinters optimizes a penalized sum-of-squared forecast errors. Let \hat{x}_t(\tilde\alpha, \tilde\beta, \tilde\gamma) be the forecast for the series x_t given the choices of \tilde\alpha, \tilde\beta, and \tilde\gamma.
Then the in-sample penalized sum-of-squared prediction errors is

    P = \sum_{t=1}^{T} \Bigl[ \{x_t - \hat{x}_t(\tilde\alpha, \tilde\beta, \tilde\gamma)\}^2
          + I_{|f(\tilde\alpha)|>12} \{|f(\tilde\alpha)| - 12\}^2
          + I_{|f(\tilde\beta)|>12} \{|f(\tilde\beta)| - 12\}^2
          + I_{|f(\tilde\gamma)|>12} \{|f(\tilde\gamma)| - 12\}^2 \Bigr]

where f(x) = \ln\{x/(1 - x)\}. The penalty term is zero unless one of the parameters is close to the boundary. When one of the parameters is close to the boundary, the penalty term will help to obtain convergence.

Holt–Winters seasonal multiplicative procedure

As with the other recursive methods in tssmooth, there are three aspects to implementing the Holt–Winters seasonal multiplicative procedure: the forecasting equation, the initial values, and the updating equations. Unlike in the other methods, the data are now assumed to be seasonal with period L.

Given the estimates a(t), b(t), and s(t + τ − L), a τ-step-ahead point forecast of x_t, denoted by \hat{y}_{t+\tau}, is

    \hat{y}_{t+\tau} = \{a(t) + b(t)\tau\}\, s(t + \tau - L)

Given the smoothing parameters α, β, and γ, the updating equations are

    a(t) = \alpha\,\frac{x_t}{s(t-L)} + (1 - \alpha)\{a(t-1) + b(t-1)\}

    b(t) = \beta\{a(t) - a(t-1)\} + (1 - \beta)\, b(t-1)

and

    s(t) = \gamma\,\frac{x_t}{a(t)} + (1 - \gamma)\, s(t-L)

To restrict the seasonal terms to sum to 1 over each year, specify the normalize option.

The updating equations require the L + 2 initial values a(0), b(0), s(1 − L), s(2 − L), ..., s(0). Two methods calculate the initial values with the first m years, each of which contains L seasons. By default, m is set to the number of seasons in half the sample.

The initial value of the trend component, b(0), can be estimated by

    b(0) = \frac{\bar{x}_m - \bar{x}_1}{(m-1)L}

where \bar{x}_m is the average level of x_t in year m and \bar{x}_1 is the average level of x_t in the first year.

The initial value for the linear term, a(0), is then calculated as

    a(0) = \bar{x}_1 - \frac{L}{2}\, b(0)

To calculate the initial values for the seasons 1, 2, ..., L, we first calculate the deviation-adjusted values,

    S(t) = \frac{x_t}{\bar{x}_i - \{(L+1)/2 - j\}\, b(0)}

where i is the year that corresponds to time t, j is the season that corresponds to time t, and \bar{x}_i is the average level of x_t in year i.

Next, for each season l = 1, 2, ..., L, we define \bar{s}_l as the average S(t) over the years. That is,

    \bar{s}_l = \frac{1}{m} \sum_{k=0}^{m-1} S_{l+kL}    for l = 1, 2, ..., L

Then the initial seasonal estimates are

    s_{0l} = \bar{s}_l \left( \frac{L}{\sum_{l=1}^{L} \bar{s}_l} \right)    for l = 1, 2, ..., L

and these values are used to fill in s(1 − L), ..., s(0).

If the altstarts option is specified, the starting values are computed based on a regression with seasonal indicator variables. Specifically, the series x_t is regressed on a time variable normalized to equal one in the first period in the sample and on a constant. Then b(0) is set to the estimated coefficient on the time variable, and a(0) is set to the estimated constant term. To calculate the seasonal starting values, x_t is regressed on a set of L seasonal dummy variables. The lth seasonal starting value is set to (1/\mu)\hat{\beta}_l, where \mu is the mean of x_t and \hat{\beta}_l is the estimated coefficient on the lth seasonal dummy variable. The sample used in both regressions and the mean computation is restricted to include the first samp0() years. By default, samp0() includes half the data.

Technical note

If there are missing values in the first few years, a small value of m can cause the starting-value methods for the seasonal terms to fail. Here you should either specify a larger value of m by using samp0() or directly specify the seasonal starting values by using the sn0_0() option.
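To make the starting-value options concrete, here is a minimal sketch on constructed quarterly-style data; the variable names, seasonal factors, and data-generating process are hypothetical:

. clear
. set obs 48
. set seed 1
. generate t = _n                        // generic time index, so period() is required
. tsset t
. generate y = (100 + .5*t)*(1 + .1*cos(2*_pi*t/4)) + rnormal()
. * let the first 12 observations (3 years of 4 seasons) set the starting values
. tssmooth shwinters sm1 = y, period(4) samp0(12)
. * or supply a full year of initial seasonal values directly
. generate s0 = .
. replace s0 = 1.10 in 1
. replace s0 = .95  in 2
. replace s0 = .90  in 3
. replace s0 = 1.05 in 4
. tssmooth shwinters sm2 = y, period(4) sn0_0(s0)

For the default multiplicative method, the supplied seasonal values are multiplicative factors, so values near 1 that average to 1 over the year are natural choices.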
Holt–Winters seasonal additive procedure

This procedure is similar to the previous one, except that the data are assumed to be described by

    x_t = (\beta_0 + \beta_1 t) + s_t + \epsilon_t

As in the multiplicative case, there are three smoothing parameters, α, β, and γ, which can either be set or chosen to minimize the in-sample sum-of-squared forecast errors.

The updating equations are

    a(t) = \alpha\{x_t - s(t-L)\} + (1 - \alpha)\{a(t-1) + b(t-1)\}

    b(t) = \beta\{a(t) - a(t-1)\} + (1 - \beta)\, b(t-1)

and

    s(t) = \gamma\{x_t - a(t)\} + (1 - \gamma)\, s(t-L)

To restrict the seasonal terms to sum to 0 over each year, specify the normalize option.

A τ-step-ahead forecast, denoted by \hat{x}_{t+\tau}, is given by

    \hat{x}_{t+\tau} = a(t) + b(t)\tau + s(t + \tau - L)

As in the multiplicative case, there are two methods for setting the initial values. The default method is to obtain the initial values for a(0), b(0), s(1 − L), ..., s(0) from the regression

    x_t = a(0) + b(0)t + \beta_{s,1-L} D_1 + \beta_{s,2-L} D_2 + \cdots + \beta_{s,0} D_L + e_t

where the D_1, ..., D_L are dummy variables with D_i = 1 if t corresponds to season i and D_i = 0 otherwise.

When altstarts is specified, an alternative method is used that regresses the x_t series on a time variable that has been normalized to equal one in the first period in the sample and on a constant term. b(0) is set to the estimated coefficient on the time variable, and a(0) is set to the estimated constant term. Then the demeaned series \tilde{x}_t = x_t - \mu is created, where μ is the mean of the x_t. The \tilde{x}_t are regressed on L seasonal dummy variables. The lth seasonal starting value is then set to \hat{\beta}_l, where \hat{\beta}_l is the estimated coefficient on the lth seasonal dummy variable. The sample in both the regression and the mean calculation is restricted to include the first samp0() years, where, by default, samp0() includes half the data.

Acknowledgment

We thank Nicholas J. Cox of the Department of Geography at Durham University, UK, coeditor of the Stata Journal and author of Speaking Stata Graphics, for his helpful comments.

References

Abraham, B., and J. Ledolter. 1983. Statistical Methods for Forecasting. New York: Wiley.
Bowerman, B. L., R. T. O'Connell, and A. B. Koehler. 2005. Forecasting, Time Series, and Regression: An Applied Approach. 4th ed. Pacific Grove, CA: Brooks/Cole.
Chatfield, C. 2001. Time-Series Forecasting. London: Chapman & Hall/CRC.
Chatfield, C. 2004. The Analysis of Time Series: An Introduction. 6th ed. Boca Raton, FL: Chapman & Hall/CRC.
Chatfield, C., and M. Yar. 1988. Holt-Winters forecasting: Some practical issues. Statistician 37: 129–140.
Holt, C. C. 2004. Forecasting seasonals and trends by exponentially weighted moving averages. International Journal of Forecasting 20: 5–10.
Montgomery, D. C., L. A. Johnson, and J. S. Gardiner. 1990. Forecasting and Time Series Analysis. 2nd ed. New York: McGraw–Hill.
Winters, P. R. 1960. Forecasting sales by exponentially weighted moving averages. Management Science 6: 324–342.

Also see

[TS] tsset — Declare data to be time-series data
[TS] tssmooth — Smooth and forecast univariate time-series data

Title

ucm — Unobserved-components model

Description     Quick start     Menu     Syntax     Options
Remarks and examples     Stored results     Methods and formulas
References     Also see

Description

Unobserved-components models (UCMs) decompose a time series into trend, seasonal, cyclical, and idiosyncratic components and allow for exogenous variables. ucm estimates the parameters of UCMs by maximum likelihood. All the components are optional.
The trend component may be first-order deterministic, or it may be first-order or second-order stochastic. The seasonal component is stochastic; the seasonal effects at each time period sum to a zero-mean, finite-variance random variable. The cyclical component is modeled by the stochastic-cycle model derived by Harvey (1989).

Quick start

Random-walk model for y using tsset data
    ucm y

Add a cyclical component of order 2
    ucm y, cycle(2)

Add a seasonal component arising every 3 periods
    ucm y, cycle(2) seasonal(3)

Random-walk model for y with a drift component and a cyclical component of order 1
    ucm y, model(rwdrift) cycle(1)

Smooth-trend model for y with cyclical and seasonal components of order 2
    ucm y, model(strend) cycle(2) seasonal(2)

Menu

Statistics > Time series > Unobserved-components model

Syntax

    ucm depvar [indepvars] [if] [in] [, options]

    options                    Description
    ------------------------------------------------------------------------
    Model
      model(model)             specify trend and idiosyncratic components
      seasonal(#)              include a seasonal component with a period of
                                 # time units
      cycle(# [, frequency(#f)])
                               include a cycle component of order # and
                                 optionally set the initial frequency to #f,
                                 0 < #f < π; cycle() may be specified up to
                                 three times
      constraints(constraints) apply specified linear constraints
      collinear                keep collinear variables
    SE/Robust
      vce(vcetype)             vcetype may be oim or robust
    Reporting
      level(#)                 set confidence level; default is level(95)
      nocnsreport              do not display constraints
      display_options          control columns and column formats, row
                                 spacing, display of omitted variables and
                                 base and empty cells, and factor-variable
                                 labeling
    Maximization
      maximize_options         control the maximization process
      coeflegend               display legend instead of statistics
    ------------------------------------------------------------------------

    model            Description
    ------------------------------------------------------------------------
    rwalk            random-walk model; the default
    none             no trend or idiosyncratic component
    ntrend           no trend component but include idiosyncratic component
    dconstant        deterministic constant with idiosyncratic component
    llevel           local-level model
    dtrend           deterministic-trend model with idiosyncratic component
    lldtrend         local-level model with deterministic trend
    rwdrift          random-walk-with-drift model
    lltrend          local-linear-trend model
    strend           smooth-trend model
    rtrend           random-trend model
    ------------------------------------------------------------------------

    You must tsset your data before using ucm; see [TS] tsset.
    indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
    indepvars and depvar may contain time-series operators; see [U] 11.4.4 Time-series varlists.
    by, fp, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
    coeflegend does not appear in the dialog box.
    See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.

Options

Model

model(model) specifies the trend and idiosyncratic components. The default is model(rwalk). The available models are listed in Syntax and discussed in detail in Models for the trend and idiosyncratic components under Remarks and examples below.

seasonal(#) adds a stochastic-seasonal component to the model. # is the period of the season, that is, the number of time-series observations required for the period to complete.

cycle(#) adds a stochastic-cycle component of order # to the model. The order # must be 1, 2, or 3. Multiple cycles are added by repeating the cycle(#) option, with up to three cycles allowed.

cycle(#, frequency(#f)) specifies #f as the initial value for the central-frequency parameter in the stochastic-cycle component of order #. #f must be in the interval (0, π).
constraints(constraints), collinear; see [R] estimation options.

SE/Robust

vce(vcetype) specifies the estimator for the variance–covariance matrix of the estimator. vce(oim), the default, causes ucm to use the observed information matrix estimator. vce(robust) causes ucm to use the Huber/White/sandwich estimator.

Reporting

level(#), nocnsreport; see [R] estimation options.

display_options: noci, nopvalues, noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), and sformat(%fmt); see [R] estimation options.

Maximization

maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace, gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#), nrtolerance(#), and from(matname); see [R] maximize for all options except from(), and see below for information on from().

from(matname) specifies initial values for the maximization process. from(b0) causes ucm to begin the maximization algorithm with the values in b0. b0 must be a row vector; the number of columns must equal the number of parameters in the model; and the values in b0 must be in the same order as the parameters in e(b). If your model fails to converge, try using the difficult option. Also see the technical note below example 5.

The following option is available with ucm but is not shown in the dialog box:

coeflegend; see [R] estimation options.

Remarks and examples

Remarks are presented under the following headings:
    An introduction to UCMs
    A random-walk model example
    Frequency-domain concepts used in the stochastic-cycle model
    Another random-walk model example
    Comparing UCM and ARIMA
    A local-level model example
    Comparing UCM and ARIMA, revisited
    Models for the trend and idiosyncratic components
    Seasonal component

An introduction to UCMs

UCMs decompose a time series into trend, seasonal, cyclical, and idiosyncratic components and allow for exogenous variables. Formally, UCMs can be written as

    y_t = \tau_t + \gamma_t + \psi_t + \beta x_t + \epsilon_t        (1)

where y_t is the dependent variable, τ_t is the trend component, γ_t is the seasonal component, ψ_t is the cyclical component, β is a vector of fixed parameters, x_t is a vector of exogenous variables, and ε_t is the idiosyncratic component.

By placing restrictions on τ_t and ε_t, Harvey (1989) derived a series of models for the trend and the idiosyncratic components. These models are briefly described in Syntax and are further discussed in Models for the trend and idiosyncratic components. To these models, Harvey (1989) added models for the seasonal and cyclical components, and he also allowed for the presence of exogenous variables. It is rare that a UCM contains all the allowed components. For instance, the seasonal component is rarely needed when modeling deseasonalized data.

Harvey (1989) and Durbin and Koopman (2012) show that UCMs can be written as state-space models that allow the parameters of a UCM to be estimated by maximum likelihood. In fact, ucm uses sspace (see [TS] sspace) to perform the estimation calculations; see Methods and formulas for details.

After estimating the parameters, predict can produce in-sample predictions or out-of-sample forecasts; see [TS] ucm postestimation. After estimating the parameters of a UCM that contains a cyclical component, estat period converts the estimated central frequency to an estimated central period, and psdensity estimates the spectral density implied by the model; see [TS] ucm postestimation and the examples below.
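The terms of (1) map directly onto ucm options: model() chooses the specification of the trend τ_t and the idiosyncratic term ε_t, seasonal() adds γ_t, cycle() adds ψ_t, and any indepvars enter through βx_t. As a minimal sketch (the variable names y and x are hypothetical, assumed to be in tsset monthly data):

. ucm y x, model(lltrend) seasonal(12) cycle(1)
. ucm y, cycle(1) cycle(2, frequency(.3))    // up to three cycles, with optional starting frequencies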
We illustrate the basic approach of analyzing data with UCMs, and then we discuss the details of the different trend models in Models for the trend and idiosyncratic components. Although the methods implemented in ucm have been widely applied by economists, they are general time-series techniques and may be of interest to researchers from other disciplines. In example 8, we analyze monthly data on the reported cases of mumps in New York City.

A random-walk model example

Example 1

We begin by plotting monthly data on the U.S. civilian unemployment rate.

. use http://www.stata-press.com/data/r14/unrate
. tsline unrate, name(unrate)

[Figure omitted: time-series plot of the Civilian Unemployment Rate (2 to 10) by Month, 1950m1–2010m1.]

This series looks like it might be well approximated by a random-walk model. Formally, a random-walk model is given by

    y_t = \mu_t
    \mu_t = \mu_{t-1} + \eta_t

The random walk is so frequently applied, at least as a starting model, that it is the default model for ucm. In the output below, we fit the random-walk model to the unemployment data.

. ucm unrate
searching for initial values ..........
(setting technique to bhhh)
Iteration 0:  log likelihood = 84.272992
Iteration 1:  log likelihood = 84.394942
Iteration 2:  log likelihood = 84.400923
Iteration 3:  log likelihood = 84.401282
Iteration 4:  log likelihood = 84.401305
(switching technique to nr)
Iteration 5:  log likelihood = 84.401306
Refining estimates:
Iteration 0:  log likelihood = 84.401306
Iteration 1:  log likelihood = 84.401307

Unobserved-components model
Components: random walk
Sample: 1948m1 - 2011m1                      Number of obs = 757
Log likelihood = 84.401307

                             OIM
  unrate         Coef.    Std. Err.      z     P>|z|    [95% Conf. Interval]
  var(level)   .0467196    .002403    19.44    0.000     .0420098    .0514294

Note: Model is not stationary.
Note: Tests of variances against zero are one sided, and the two-sided confidence intervals are truncated at zero.

The output indicates that the model is nonstationary, as all random-walk models are. We consider a richer model in the next example.

Example 2

We suspect that there should be a stationary cyclical component that produces serially correlated shocks around the random-walk trend. Harvey (1989) derived a stochastic-cycle model for these stationary cyclical components.

The stochastic-cycle model has three parameters: the frequency at which the random components are centered, a damping factor that parameterizes the dispersion of the random components around the central frequency, and the variance of the stochastic-cycle process that acts as a scale factor.

Fitting this model to the unemployment data yields

. ucm unrate, cycle(1)
searching for initial values ....................
(setting technique to bhhh)
Iteration 0:  log likelihood = 84.273579
Iteration 1:  log likelihood = 87.852115
Iteration 2:  log likelihood = 88.253422
Iteration 3:  log likelihood = 89.191311
Iteration 4:  log likelihood = 94.675898
(switching technique to nr)
Iteration 5:  log likelihood = 98.394691  (not concave)
Iteration 6:  log likelihood = 98.983093
Iteration 7:  log likelihood = 99.983635
Iteration 8:  log likelihood = 104.8309
Iteration 9:  log likelihood = 114.27142
Iteration 10: log likelihood = 116.4741
Iteration 11: log likelihood = 118.45816
Iteration 12: log likelihood = 118.88056
Iteration 13: log likelihood = 118.88421
Iteration 14: log likelihood = 118.88421
Refining estimates:
Iteration 0:  log likelihood = 118.88421
Iteration 1:  log likelihood = 118.88421

Unobserved-components model
Components: random walk, order 1 cycle
Sample: 1948m1 - 2011m1                      Number of obs =      757
                                             Wald chi2(2)  = 26650.81
Log likelihood = 118.88421                   Prob > chi2   =   0.0000

                              OIM
  unrate          Coef.    Std. Err.      z     P>|z|    [95% Conf. Interval]
  frequency    .0933466    .0103609     9.01    0.000     .0730397    .1136535
  damping      .9820003    .0061121   160.66    0.000     .9700207    .9939798
  var(level)   .0143786    .0051392     2.80    0.003     .004306     .0244511
  var(cycle1)  .0270339    .0054343     4.97    0.000     .0163829    .0376848

Note: Model is not stationary.
Note: Tests of variances against zero are one sided, and the two-sided confidence intervals are truncated at zero.

The estimated central frequency for the cyclical component is small, implying that the cyclical component is centered on low-frequency components. The high damping factor indicates that all the components from this cyclical component are close to the estimated central frequency. The estimated variance of the stochastic-cycle process is small but significant.

We use estat period to convert the estimate of the central frequency to an estimated central period.

. estat period

  cycle1          Coef.    Std. Err.    [95% Conf. Interval]
  period       67.31029    7.471004      52.6674     81.95319
  frequency    .0933466    .0103609     .0730397     .1136535
  damping      .9820003    .0061121     .9700207     .9939798

Note: Cycle time unit is monthly.

Because we have monthly data, the estimated central period of 67.31 implies that the cyclical component is composed of random components that occur around a central periodicity of about 5.61 years. This estimate falls within the conventional Burns and Mitchell (1946) definition of business-cycle shocks occurring between 1.5 and 8 years.

We can convert the estimated parameters of the cyclical component to an estimated spectral density of the cyclical component, as described by Harvey (1989). The spectral density of the cyclical component describes the relative importance of the random components at different frequencies; see Frequency-domain concepts used in the stochastic-cycle model for details. We use psdensity (see [TS] psdensity) to obtain the spectral density of the cyclical component implied by the estimated parameters, and we use twoway line (see [G-2] graph twoway line) to plot the estimated spectral density.

. psdensity sdensity omega
. line sdensity omega

[Figure omitted: plot of the UCM cycle 1 spectral density (0 to 8) against Frequency (0 to 3).]

The estimated spectral density shows that the cyclical component is composed of random components that are tightly distributed at the low-frequency peak.

Frequency-domain concepts used in the stochastic-cycle model

The parameters of the stochastic-cycle model are easiest to interpret in the frequency domain. We now provide a review of the useful concepts from the frequency domain.
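Before working through the concepts, note that the simulated AR(1) series used for illustration below can be generated in a few lines. This is a minimal sketch with hypothetical variable names, using base Stata only:

. clear
. set obs 200
. set seed 42
. generate t = _n
. tsset t
. generate ypos = rnormal() in 1            // phi = 0.8: smooth, low-frequency series
. replace  ypos =  .8*L.ypos + rnormal() in 2/L
. generate yneg = rnormal() in 1            // phi = -0.8: jagged, high-frequency series
. replace  yneg = -.8*L.yneg + rnormal() in 2/L
. tsline ypos yneg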
Crucial to understanding the stochastic-cycle model is the frequency-domain concept that a stationary process can be decomposed into random components that occur at the frequencies in the interval [0, π].

We need some concepts from the frequency-domain approach to interpret the parameters in the stochastic-cycle model of the cyclical component. Here we provide a simple, intuitive explanation. More technical presentations can be found in Priestley (1981), Harvey (1989, 1993), Hamilton (1994), Fuller (1996), and Wei (2006).

As with much time-series analysis, the basic results are for covariance-stationary processes, with additional results handling some nonstationary cases. We present some useful results for covariance-stationary processes. These results provide what we need to interpret the stochastic-cycle model for the stationary cyclical component.

The autocovariances γ_j, j ∈ {0, 1, ..., ∞}, of a covariance-stationary process y_t specify its variance and dependence structure. In the frequency-domain approach to time-series analysis, the spectral density describes the importance of the random components that occur at frequency ω relative to the components that occur at other frequencies. The frequency-domain approach focuses on the relative contributions of random components that occur at the frequencies [0, π].

The spectral density can be written as a weighted average of the autocorrelations of y_t. Like autocorrelations, the spectral density is normalized by γ_0, the variance of y_t. Multiplying the spectral density by γ_0 yields the power spectrum of y_t.

In an independent and identically distributed (i.i.d.) process, the components at all frequencies are equally important, so the spectral density is a flat line. In common parlance, we speak of high-frequency noise making a series look more jagged and of low-frequency components causing smoother plots. More formally, we say that a process composed primarily of high-frequency components will have fewer runs above or below the mean than an i.i.d. process and that a process composed primarily of low-frequency components will have more runs above or below the mean than an i.i.d. process.

To further formalize these ideas, consider the first-order autoregressive (AR(1)) process given by

    y_t = \phi\, y_{t-1} + \epsilon_t

where ε_t is a zero-mean, covariance-stationary process with finite variance σ², and |φ| < 1 so that y_t is covariance stationary. The first-order autocorrelation of this AR(1) process is φ.

Below are plots of simulated data when φ is set to 0, −0.8, and 0.8. When φ = 0, the data are i.i.d. When φ = −0.8, the value today is strongly negatively correlated with the value yesterday, so this case should be a prototypical high-frequency noise example. When φ = 0.8, the value today is strongly positively correlated with the value yesterday, so this case should be a prototypical low-frequency shock example.

[Figure omitted: three panels of simulated series y against Time, for φ = 0.8, φ = −0.8, and φ = 0.]

The plots above confirm our conjectures. The plot when φ = −0.8 contains fewer runs above or below the mean, and it is more jagged than the i.i.d. plot. The plot when φ = 0.8 contains more runs above or below the mean, and it is smoother than the i.i.d. plot.

Below we plot the spectral densities for the AR(1) model with φ = 0, φ = −0.8, and φ = 0.8.
[Figure omitted: spectral densities (0 to 25) against Frequency (0 to 3) for φ = 0, φ = 0.8, and φ = −0.8.]

The high-frequency components are much more important to the AR(1) process with φ = −0.8 than to the i.i.d. process. The low-frequency components are much more important to the AR(1) process with φ = 0.8 than to the i.i.d. process.

Technical note

Autoregressive moving-average (ARMA) models parameterize the autocorrelation in a time series by allowing today's value to be a weighted average of past values and a weighted average of past i.i.d. shocks; see Hamilton (1994), Wei (2006), and [TS] arima for introductions and a Stata implementation. The intuitive ARMA parameterization has many nice features, including that one can easily rewrite the ARMA model as a weighted average of past i.i.d. shocks to trace how a shock feeds through the system.

Although it is easy to obtain the spectral density of an ARMA process, the parameters themselves provide limited information about the underlying spectral density. In contrast, the parameters of the stochastic-cycle parameterization of autocorrelation in a time series directly provide information about the underlying spectral density. The parameter ω0 is the central frequency at which the random components are clustered. If ω0 is small, then the model is centered on low-frequency components. If ω0 is close to π, then the model is centered on high-frequency components. The parameter ρ is the damping factor that indicates how tightly clustered the random components are at the central frequency ω0. If ρ is close to 0, there is no clustering of the random components. If ρ is close to 1, the random components are tightly distributed at the central frequency ω0.

In the graph below, we draw the spectral densities implied by stochastic-cycle models with four sets of parameters: ω0 = π/4, ρ = 0.8; ω0 = π/4, ρ = 0.9; ω0 = 4π/5, ρ = 0.8; and ω0 = 4π/5, ρ = 0.9. The graph below illustrates that ω0 is the central frequency at which the other important random components are distributed. It also illustrates that the damping parameter ρ controls the dispersion of the important components at the central frequency.

[Figure omitted: four spectral densities (0 to 50) over frequencies π/4 to π, for ω0 = π/4 with ρ = 0.8 and ρ = 0.9 and for ω0 = 4π/5 with ρ = 0.8 and ρ = 0.9.]

Another random-walk model example

Example 3

Now let's reconsider example 2. Although we might be happy with how our model has identified a stationary cyclical component that we could interpret in business-cycle terms, we suspect that there should also be a high-frequency cyclical component. It is difficult to estimate the parameters of a UCM with two or more stochastic-cycle models. Providing starting values for the central frequencies can be a crucial help to the optimization procedure.

Below we estimate a UCM with two cyclical components. We use the frequency() suboption to provide starting values for the central frequencies; we specified the values below because we suspect one model will pick up the low-frequency components and the other will pick up the high-frequency components. We specified the low-frequency model to be order 2 to make it less peaked for any given damping factor. (Trimbur [2006] provides a nice introduction and some formal results for higher-order stochastic-cycle models.)

. ucm unrate, cycle(1, frequency(2.9)) cycle(2, frequency(.09))
searching for initial values ....................
(setting technique to bhhh) Iteration 0: log likelihood = 115.98563 Iteration 1: log likelihood = 125.04043 Iteration 2: log likelihood = 127.69387 Iteration 3: log likelihood = 134.50864 Iteration 4: log likelihood = 136.91353 (switching technique to nr) Iteration 5: log likelihood = 138.5091 Iteration 6: log likelihood = 146.09273 Iteration 7: log likelihood = 146.28132 Iteration 8: log likelihood = 146.28326 Iteration 9: log likelihood = 146.28326 Refining estimates: Iteration 0: log likelihood = 146.28326 Iteration 1: log likelihood = 146.28326 ucm — Unobserved-components model Unobserved-components model Components: random walk, 2 cycles of order 1 2 Sample: 1948m1 - 2011m1 Log likelihood = 146.28326 OIM Std. Err. unrate Coef. cycle1 frequency damping 2.882382 .7004295 .0668017 .1251571 cycle2 frequency damping .0667929 .9074708 .0207704 .0027886 .002714 var(level) var(cycle1) var(cycle2) z Number of obs Wald chi2(4) Prob > chi2 = = = 707 757 7681.33 0.0000 P>|z| [95% Conf. Interval] 43.15 5.60 0.000 0.000 2.751453 .4551261 3.013311 .9457329 .0206849 .0142273 3.23 63.78 0.001 0.000 .0262513 .8795858 .1073346 .9353559 .0039669 .0014363 .0010281 5.24 1.94 2.64 0.000 0.026 0.004 .0129953 0 .0006991 .0285454 .0056037 .004729 Note: Model is not stationary. Note: Tests of variances against zero are one sided, and the two-sided confidence intervals are truncated at zero. The output provides some support for the existence of a second, high-frequency cycle. The highfrequency components are centered at 2.88, whereas the low-frequency components are centered at 0.067. That the estimated damping factor is 0.70 for the high-frequency cycle whereas the estimated damping factor for the low-frequency cycle is 0.91 indicates that the high-frequency components are more diffusely distributed at 2.88 than the low-frequency components are at 0.067. We obtain and plot the estimated spectral densities to get another look at these results. . psdensity sdensity2a omega2a . psdensity sdensity2b omega2b, cycle(2) 0 1 2 3 4 . line sdensity2a sdensity2b omega2a, legend(col(1)) 0 1 2 3 Frequency UCM cycle 1 spectral density UCM cycle 2 spectral density The estimated spectral densities indicate that we have found two distinct cyclical components. 708 ucm — Unobserved-components model It does not matter whether we specify omega2a or omega2b to be the x-axis variable, because they are equal to each other. Technical note That the estimated spectral densities in the previous example do not overlap is important for parameter identification. Although the parameters are identified in large-sample theory, we have found it difficult to estimate the parameters of two cyclical components when the spectral densities overlap. When the spectral densities of two cyclical components overlap, the parameters may not be well identified and the optimization procedure may not converge. Comparing UCM and ARIMA Example 4 This example provides some insight for readers familiar with autoregressive integrated movingaverage (ARIMA) models but not with UCMs. If you are not familiar with ARIMA models, you may wish to skip this example. See [TS] arima for an introduction to ARIMA models in Stata. UCMs provide an alternative to ARIMA models implemented in [TS] arima. Neither set of models is nested within the other, but there are some cases in which instructive comparisons can be made. The random-walk model corresponds to an ARIMA model that is first-order integrated and has an i.i.d. error term. 
In other words, the random-walk UCM and the ARIMA(0,1,0) are asymptotically equivalent. Thus ucm unrate and arima unrate, arima(0,1,0) noconstant produce asymptotically equivalent results. The stochastic-cycle model for the stationary cyclical component is an alternative functional form for stationary processes to stationary autoregressive moving-average (ARMA) models. Which model is preferred depends on the application and which parameters a researchers wants to interpret. Both the functional forms and the parameter interpretations differ between the stochastic-cycle model and the ARMA model. See Trimbur (2006, eq. 25) for some formal comparisons of the two models. That both models can be used to estimate the stationary cyclical components for the random-walk model implies that we can compare the results in this case by comparing their estimated spectral densities. Below we estimate the parameters of an ARIMA(2,1,1) model and plot the estimated spectral density of the stationary component. ucm — Unobserved-components model . arima unrate, noconstant arima(2,1,1) (setting optimization to BHHH) Iteration 0: log likelihood = 129.8801 Iteration 1: log likelihood = 134.61953 Iteration 2: log likelihood = 137.04909 Iteration 3: log likelihood = 137.71386 Iteration 4: log likelihood = 138.25255 (switching optimization to BFGS) Iteration 5: log likelihood = 138.51924 Iteration 6: log likelihood = 138.81638 Iteration 7: log likelihood = 138.83615 Iteration 8: log likelihood = 138.8364 Iteration 9: log likelihood = 138.83642 Iteration 10: log likelihood = 138.83642 ARIMA regression Sample: 1948m2 - 2011m1 Log likelihood = 138.8364 D.unrate Coef. Number of obs Wald chi2(3) Prob > chi2 = = = 709 756 683.34 0.0000 OPG Std. Err. z P>|z| [95% Conf. Interval] ARMA ar L1. L2. .5398016 .2468148 .0586304 .0359396 9.21 6.87 0.000 0.000 .4248882 .1763744 .6547151 .3172551 ma L1. -.5146506 .0632838 -8.13 0.000 -.6386845 -.3906167 /sigma .2013332 .0032644 61.68 0.000 .1949351 .2077313 0 .2 ARMA spectral density .4 .6 .8 Note: The test of the variance against zero is one sided, and the two-sided confidence interval is truncated at zero. . psdensity sdensity_arma omega_arma . line sdensity_arma omega_arma 0 1 2 3 Frequency The estimated spectral density from the ARIMA(2,1,1) has a similar shape to the plot obtained by combining the two spectral densities estimated from the stochastic-cycle model in example 3. For this particular application, the estimated central frequencies of the two cyclical components from the 710 ucm — Unobserved-components model stochastic-cycle model provide information about the business-cycle component and the high-frequency component that is not easily obtained from the ARIMA(2,1,1) model. On the other hand, it is easier to work out the impulse–response function for the ARMA model than for the stochastic-cycle model, implying that the ARMA model is easier to use when tracing the effect of a shock feeding through the system. A local-level model example We now consider the weekly series of initial claims for unemployment insurance in the United States, which is plotted below. Example 5 200 Change in initial claims 300 400 500 600 700 . use http://www.stata-press.com/data/r14/icsa1, clear . tsline icsa 01jan1970 01jan1980 01jan1990 Date 01jan2000 01jan2010 This series looks like it was generated by a random walk with extra noise, so we want to use a random-walk model that includes an additional random term. 
This structure causes the model to be occasionally known as the random-walk-plus-noise model, but it is more commonly known as the local-level model in the UCM literature. The local-level model models the trend as a random walk and models the idiosyncratic components as independent and identically distributed components. Formally, the local-level model specifies the observed time series y_t, for t = 1, ..., T, as

    y_t = \mu_t + \epsilon_t
    \mu_t = \mu_{t-1} + \eta_t

where \epsilon_t \sim \text{i.i.d. } N(0, \sigma_\epsilon^2) and \eta_t \sim \text{i.i.d. } N(0, \sigma_\eta^2) and are mutually independent.

We fit the local-level model in the output below:

. ucm icsa, model(llevel)
searching for initial values ..........
(setting technique to bhhh)
Iteration 0:  log likelihood = -9954.8223
Iteration 1:  log likelihood = -9917.406
Iteration 2:  log likelihood = -9905.6679
Iteration 3:  log likelihood = -9897.7588
Iteration 4:  log likelihood = -9894.2015
(switching technique to nr)
Iteration 5:  log likelihood = -9893.4337
Iteration 6:  log likelihood = -9893.2469
Iteration 7:  log likelihood = -9893.2469
Refining estimates:
Iteration 0:  log likelihood = -9893.2469
Iteration 1:  log likelihood = -9893.2469

Unobserved-components model
Components: local level
Sample: 07jan1967 - 19feb2011                Number of obs = 2,303
Log likelihood = -9893.2469

                             OIM
  icsa           Coef.    Std. Err.      z     P>|z|    [95% Conf. Interval]
  var(level)    116.558    8.806587    13.24    0.000     99.29745    133.8186
  var(icsa)    124.2715    7.615506    16.32    0.000     109.3454    139.1976

Note: Model is not stationary.
Note: Tests of variances against zero are one sided, and the two-sided confidence intervals are truncated at zero.
Note: Time units are in 7 days.

The output indicates that both components are statistically significant.

Technical note

The estimation procedure will not always converge when estimating the parameters of the local-level model. If the series does not vary enough in the random level, modeled by the random walk, and in the stationary shocks around the random level, the estimation procedure will not converge because it will be unable to set the variance of one of the two components to 0.

Take another look at the graphs of unrate and icsa. The extra noise around the random level that can be seen in the graph of icsa allows us to estimate both variances.

A closely related point is that it is difficult to estimate the parameters of a local-level model with a stochastic-cycle component because the series must have enough variation to identify the variance of the random-walk component, the variance of the idiosyncratic term, and the parameters of the stochastic-cycle component. In some cases, series that look like candidates for the local-level model are best modeled as random-walk models with stochastic-cycle components.

In fact, convergence can be a problem for most of the models in ucm. Convergence problems occur most often when there is insufficient variation to estimate the variances of the components in the model. When there is insufficient variation to estimate the variances of the components in the model, the optimization routine will fail to converge as it attempts to set the variance equal to 0. This usually shows up in the iteration log when the log likelihood gets stuck at a particular value and the message (not concave) or (backed up) is displayed repeatedly. When this happens, use the
(This technique is a method to obtain convergence to interpretable estimates, not a model-selection method.) Example 6 We might suspect that there is some serial correlation in the idiosyncratic shock. Alternatively, we could include a cyclical component to model the stationary time-dependence in the series. In the example below, we add a stochastic-cycle model for the stationary cyclical process, but we drop the idiosyncratic term and use a random-walk model instead of the local-level model. We change the model because it is difficult to estimate the variance of the idiosyncratic term along with the parameters of a stationary cyclical component. . ucm icsa, model(rwalk) cycle(1) searching for initial values .................... (setting technique to bhhh) Iteration 0: log likelihood = -10055.453 Iteration 1: log likelihood = -10047.163 Iteration 2: log likelihood = -10047.146 (backed up) Iteration 3: log likelihood = -10047.146 (backed up) Iteration 4: log likelihood = -10047.145 (backed up) (switching technique to nr) Iteration 5: log likelihood = -10047.142 (not concave) Iteration 6: log likelihood = -9889.8038 Iteration 7: log likelihood = -9883.967 Iteration 8: log likelihood = -9883.3818 (not concave) Iteration 9: log likelihood = -9883.3817 (not concave) Iteration 10: log likelihood = -9883.3815 (not concave) Iteration 11: log likelihood = -9883.3789 (not concave) Iteration 12: log likelihood = -9883.376 (not concave) Iteration 13: log likelihood = -9883.3684 (not concave) Iteration 14: log likelihood = -9882.0687 (not concave) Iteration 15: log likelihood = -9881.6615 Iteration 16: log likelihood = -9881.4451 Iteration 17: log likelihood = -9881.4441 Iteration 18: log likelihood = -9881.4441 Refining estimates: Iteration 0: log likelihood = -9881.4441 Iteration 1: log likelihood = -9881.4441 Unobserved-components model Components: random walk, order 1 cycle Sample: 07jan1967 - 19feb2011 Number of obs Wald chi2(2) Log likelihood = -9881.4441 Prob > chi2 = = = 2,303 23.04 0.0000 OIM Std. Err. z P>|z| [95% Conf. Interval] 1.469633 .1644576 .3855657 .0349537 3.81 4.71 0.000 0.000 .7139385 .0959495 2.225328 .2329656 97.90982 149.7323 8.320047 9.980798 11.77 15.00 0.000 0.000 81.60282 130.1703 114.2168 169.2943 icsa Coef. frequency damping var(level) var(cycle1) Note: Model is not stationary. Note: Tests of variances against zero are one sided, and the two-sided confidence intervals are truncated at zero. Note: Time units are in 7 days. ucm — Unobserved-components model 713 Although the output indicates that the model fits well, the small estimate of the damping parameter indicates that the random components will be widely distributed at the central frequency. To get a better idea of the dispersion of the components, we look at the estimated spectral density of the stationary cyclical component. . psdensity sdensity3 omega3 .145 UCM cycle 1 spectral density .15 .155 .16 .165 .17 . line sdensity3 omega3 0 1 2 3 Frequency The graph shows that the random components that make up the cyclical component are diffusely distributed at a central frequency. Comparing UCM and ARIMA, revisited Example 7 Including lags of the dependent variable is an alternative method for modeling serially correlated errors. The estimated coefficients on the lags of the dependent variable estimate the coefficients in an autoregressive model for the stationary cyclical component; see Harvey (1989, 47–48) for a discussion. 
Including lags of the dependent variable should be viewed as an alternative to the stochastic-cycle model for the stationary cyclical component. In this example, we use the large-sample equivalence of the random-walk model with pth order autoregressive errors and an ARIMA(p, 1, 0) to illustrate this point. 714 ucm — Unobserved-components model In the output below, we include 2 lags of the dependent variable in the random-walk UCM. . ucm icsa L(1/2).icsa, model(rwalk) searching for initial values .......... (setting technique to bhhh) Iteration 0: log likelihood = -10026.649 Iteration 1: log likelihood = -9947.9671 Iteration 2: log likelihood = -9896.4778 Iteration 3: log likelihood = -9890.8199 Iteration 4: log likelihood = -9890.3202 (switching technique to nr) Iteration 5: log likelihood = -9890.1546 Iteration 6: log likelihood = -9889.561 Iteration 7: log likelihood = -9889.5608 Refining estimates: Iteration 0: log likelihood = -9889.5608 Iteration 1: log likelihood = -9889.5608 Unobserved-components model Components: random walk Sample: 21jan1967 - 19feb2011 Number of obs Wald chi2(2) Prob > chi2 Log likelihood = -9889.5608 OIM Std. Err. = = = 2,301 271.88 0.0000 icsa Coef. icsa L1. L2. -.3250633 -.1794686 .0205148 .0205246 -15.85 -8.74 0.000 0.000 -.3652715 -.2196961 -.2848551 -.1392411 317.6474 9.36691 33.91 0.000 299.2886 336.0062 var(level) z P>|z| [95% Conf. Interval] Note: Model is not stationary. Note: Tests of variances against zero are one sided, and the two-sided confidence intervals are truncated at zero. Note: Time units are in 7 days. Now we use arima to estimate the parameters of an asymptotically equivalent ARIMA(2,1,0) model. (We specify the technique(nr) option so that arima will compute the observed information matrix standard errors that ucm computes.) We use nlcom to compute a point estimate and a standard error for the variance, which is directly comparable to the one produced by ucm. ucm — Unobserved-components model 715 . arima icsa, noconstant arima(2,1,0) technique(nr) Iteration 0: Iteration 1: log likelihood = -9896.4584 log likelihood = -9896.458 ARIMA regression Sample: 14jan1967 - 19feb2011 Number of obs Wald chi2(2) Prob > chi2 Log likelihood = -9896.458 OIM Std. Err. z P>|z| = = = 2302 271.95 0.0000 D.icsa Coef. [95% Conf. Interval] ar L1. L2. -.3249383 -.1793353 .0205036 .0205088 -15.85 -8.74 0.000 0.000 -.3651246 -.2195317 -.284752 -.1391388 /sigma 17.81606 .2625695 67.85 0.000 17.30143 18.33068 ARMA Note: The test of the variance against zero is one sided, and the two-sided confidence interval is truncated at zero. . nlcom _b[sigma:_cons]^2 _nl_1: _b[sigma:_cons]^2 D.icsa Coef. _nl_1 317.4119 Std. Err. 9.355904 z 33.93 P>|z| [95% Conf. Interval] 0.000 299.0746 335.7491 It is no accident that the parameter estimates and the standard errors from the two estimators are so close. As the sample size grows the differences in the parameter estimates and the estimated standard errors will go to 0, because the two estimators are equivalent in large samples. Models for the trend and idiosyncratic components A general model that allows for fixed or stochastic trends in τt is given by τt = τt−1 + βt−1 + ηt βt = βt−1 + ξt (2) (3) Following Harvey (1989), we define 11 flexible models for yt that specify both τt and t in (1). These models place restrictions on the general model specified in (2) and (3) and on t in (1). In other words, these models jointly specify τt and t . 
To any of these models, a cyclical component, a seasonal component, or exogenous variables may be added. 716 ucm — Unobserved-components model Table 1. Models for the trend and idiosyncratic components Model name Syntax option No trend or idiosyncratic component model(none) Model No trend model(ntrend) yt =t Deterministic constant model(dconstant) yt =µ + t µ=µ Local level model(llevel) yt =µt + t µt =µt−1 + ηt Random walk model(rwalk) yt =µt µt =µt−1 + ηt Deterministic trend model(dtrend) yt =µt + t µt =µt−1 + β β=β Local level with deterministic trend model(lldtrend) yt =µt + t µt =µt−1 + β + ηt β=β Random walk with drift model(rwdrift) yt =µt µt =µt−1 + β + ηt β=β Local linear trend model(lltrend) yt =µt + t µt =µt−1 + βt−1 + ηt βt =βt−1 + ξt Smooth trend model(strend) yt =µt + t µt =µt−1 + βt−1 βt =βt−1 + ξt Random trend model(rtrend) yt =µt µt =µt−1 + βt−1 βt =βt−1 + ξt The majority of the models available in ucm are designed for nonstationary time series. The deterministic-trend model incorporates a first-order deterministic time-trend in the model. The locallevel, random-walk, local-level-with-deterministic-trend, and random-walk-with-drift models are for modeling series with first-order stochastic trends. A series with a dth-order stochastic trend must be differenced d times to be stationary. The local-linear-trend, smooth-trend, and random-trend models are for modeling series with second-order stochastic trends. The no-trend-or-idiosyncratic-component model is useful for using ucm to model stationary series with cyclical components or seasonal components and perhaps exogenous variables. The no-trend and the deterministic-constant models are useful for using ucm to model stationary series with seasonal components or exogenous variables. ucm — Unobserved-components model 717 Seasonal component A seasonal component models cyclical behavior in a time series that occurs at known seasonal periodicities. A seasonal component is modeled in the time domain; the period of the cycle is specified as the number of time periods required for the cycle to complete. Example 8 Let’s begin by considering a series that displays a seasonal effect. Below we plot a monthly series containing the number of new cases of mumps in New York City between January 1928 and December 1972. (See Hipel and McLeod [1994] for the source and further discussion of this dataset.) 0 number of mumps cases reported in NYC 500 1000 1500 2000 . use http://www.stata-press.com/data/r14/mumps, clear . tsline mumps 1930m1 1940m1 1950m1 Month 1960m1 1970m1 The graph reveals recurring spikes at regular intervals, which we suspect to be seasonal effects. The series may or may not be stationary; the graph evidence is not definitive. Deterministic seasonal effects are a standard method of incorporating seasonality into a model. In a model with a constant term, the s deterministic seasonal effects are modeled as s parameters subject to the constraint that they sum to zero; formally, γt + γt−1 + · · · + γt−(s−1) = 0. A stochastic-seasonal model is a more flexible alternative that allows the seasonal effects at time t to sum to ζt , a zero-mean, finite-variance, i.i.d. random variable; formally, γt + γt−1 + · · · + γt−(s−1) = ζt . In the output below, we model the seasonal effects by a stochastic-seasonal model, we allow for the series to follow a random walk, and we include a stationary cyclical component. 718 ucm — Unobserved-components model . ucm mumps, seasonal(12) cycle(1) searching for initial values ................... 
(setting technique to bhhh) Iteration 0: log likelihood = -3270.1579 Iteration 1: log likelihood = -3257.7346 Iteration 2: log likelihood = -3257.1819 Iteration 3: log likelihood = -3249.857 Iteration 4: log likelihood = -3249.5035 (switching technique to nr) Iteration 5: log likelihood = -3248.9152 Iteration 6: log likelihood = -3248.724 Iteration 7: log likelihood = -3248.7138 Iteration 8: log likelihood = -3248.7138 Refining estimates: Iteration 0: log likelihood = -3248.7138 Iteration 1: log likelihood = -3248.7138 Unobserved-components model Components: random walk, seasonal(12), order 1 cycle Sample: 1928m1 - 1972m6 Number of obs Wald chi2(2) Log likelihood = -3248.7138 Prob > chi2 OIM Std. Err. mumps Coef. frequency damping .3863607 .8405622 .0282037 .0197933 221.2131 4.151639 12228.17 140.5179 4.383442 813.8394 var(level) var(seasonal) var(cycle1) z = = = 534 2141.69 0.0000 P>|z| [95% Conf. Interval] 13.70 42.47 0.000 0.000 .3310824 .8017681 .4416389 .8793563 1.57 0.95 15.03 0.058 0.172 0.000 0 0 10633.08 496.6231 12.74303 13823.27 Note: Model is not stationary. Note: Tests of variances against zero are one sided, and the two-sided confidence intervals are truncated at zero. The output indicates that the trend and seasonal variances may not be necessary. When the variance of the seasonal component is zero, the seasonal component becomes deterministic. Below we estimate the parameters of a model that includes deterministic seasonal effects and a stationary cyclical component. . ucm mumps ibn.month, model(none) cycle(1) searching for initial values ....... (setting technique to bhhh) Iteration 0: log likelihood = -3934.0178 Iteration 1: log likelihood = -3615.0098 Iteration 2: log likelihood = -3502.5223 Iteration 3: log likelihood = -3407.9644 Iteration 4: log likelihood = -3368.2264 (switching technique to nr) Iteration 5: log likelihood = -3352.1077 Iteration 6: log likelihood = -3284.5218 Iteration 7: log likelihood = -3283.0588 Iteration 8: log likelihood = -3283.0284 Iteration 9: log likelihood = -3283.0284 Refining estimates: Iteration 0: log likelihood = -3283.0284 Iteration 1: log likelihood = -3283.0284 ucm — Unobserved-components model Unobserved-components model Components: order 1 cycle Sample: 1928m1 - 1972m6 Number of obs Wald chi2(14) Prob > chi2 Log likelihood = -3283.0284 OIM Std. Err. mumps Coef. cycle1 frequency damping .3272753 .844874 .0262922 .0184994 480.5095 561.9174 832.8666 894.0747 869.6568 770.1562 433.839 218.2394 140.686 148.5876 215.0958 330.2232 13031.53 z = = = 719 534 3404.29 0.0000 P>|z| [95% Conf. Interval] 12.45 45.67 0.000 0.000 .2757436 .8086157 .3788071 .8811322 32.67128 32.66999 32.67696 32.64568 32.56282 32.48587 32.50165 32.56712 32.64138 32.69067 32.70311 32.68906 14.71 17.20 25.49 27.39 26.71 23.71 13.35 6.70 4.31 4.55 6.58 10.10 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 416.475 497.8854 768.8209 830.0904 805.8348 706.4851 370.1369 154.409 76.7101 84.51508 150.9989 266.1538 544.544 625.9494 896.9122 958.0591 933.4787 833.8274 497.541 282.0698 204.662 212.6602 279.1927 394.2926 798.2719 16.32 0.000 11466.95 14596.11 mumps month 1 2 3 4 5 6 7 8 9 10 11 12 var(cycle1) Note: Tests of variances against zero are one sided, and the two-sided confidence intervals are truncated at zero. The output indicates that each of these components is statistically significant. Technical note In a stochastic model for the seasonal component, the seasonal effects sum to the random variable ζt ∼ i.i.d. 
Technical note

In a stochastic model for the seasonal component, the seasonal effects sum to the random variable $\zeta_t \sim \text{i.i.d. } N(0, \sigma_\zeta^2)$:

$$\gamma_t = -\sum_{j=1}^{s-1}\gamma_{t-j} + \zeta_t$$

Stored results

Because ucm is estimated using sspace, most of the sspace stored results appear after ucm. Not all of these results are relevant for ucm; programmers wishing to treat ucm results as sspace results should see Stored results of [TS] sspace. See Methods and formulas for the state-space representation of UCMs, and see [TS] sspace for more documentation that relates to all the stored results.

ucm stores the following in e():

Scalars
  e(N)                number of observations
  e(k)                number of parameters
  e(k_aux)            number of auxiliary parameters
  e(k_eq)             number of equations in e(b)
  e(k_dv)             number of dependent variables
  e(k_cycles)         number of stochastic cycles
  e(df_m)             model degrees of freedom
  e(ll)               log likelihood
  e(chi2)             χ²
  e(p)                significance
  e(tmin)             minimum time in sample
  e(tmax)             maximum time in sample
  e(stationary)       1 if the estimated parameters indicate a stationary model, 0 otherwise
  e(rank)             rank of VCE
  e(ic)               number of iterations
  e(rc)               return code
  e(converged)        1 if converged, 0 otherwise

Macros
  e(cmd)              ucm
  e(cmdline)          command as typed
  e(depvar)           unoperated names of dependent variables in observation equations
  e(covariates)       list of covariates
  e(indeps)           independent variables
  e(tvar)             variable denoting time within groups
  e(eqnames)          names of equations
  e(model)            type of model
  e(title)            title in estimation output
  e(tmins)            formatted minimum time
  e(tmaxs)            formatted maximum time
  e(chi2type)         Wald; type of model χ² test
  e(vce)              vcetype specified in vce()
  e(vcetype)          title used to label Std. Err.
  e(opt)              type of optimization
  e(initial_values)   type of initial values
  e(technique)        maximization technique
  e(tech_steps)       iterations taken in maximization technique
  e(properties)       b V
  e(estat_cmd)        program used to implement estat
  e(predict)          program used to implement predict
  e(marginsok)        predictions allowed by margins
  e(marginsnotok)     predictions disallowed by margins
  e(asbalanced)       factor variables fvset as asbalanced
  e(asobserved)       factor variables fvset as asobserved

Matrices
  e(b)                parameter vector
  e(Cns)              constraints matrix
  e(ilog)             iteration log (up to 20 iterations)
  e(gradient)         gradient vector
  e(V)                variance–covariance matrix of the estimators
  e(V_modelbased)     model-based variance

Functions
  e(sample)           marks estimation sample
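After estimation, the stored results listed above can be inspected directly; a minimal sketch (any ucm fit will do):

. ucm mumps, seasonal(12) cycle(1)
. display e(ll)            // log likelihood
. display e(stationary)    // 1 if the estimated model is stationary
. matrix list e(b)         // parameter vector
. matrix V = e(V)          // copy the VCE for further use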
Methods and formulas

Methods and formulas are presented under the following headings:
    Introduction
    State-space formulation
    Cyclical component extensions

Introduction

The general form of UCMs can be expressed as

$$y_t = \tau_t + \gamma_t + \psi_t + \mathbf{x}_t\boldsymbol\beta + \epsilon_t$$

where $\tau_t$ is the trend, $\gamma_t$ is the seasonal component, $\psi_t$ is the cycle, $\boldsymbol\beta$ is the vector of regression coefficients for the regressors $\mathbf{x}_t$, and $\epsilon_t$ is the idiosyncratic error with variance $\sigma_\epsilon^2$. We can decompose the trend as

$$\tau_t = \mu_t$$
$$\mu_t = \mu_{t-1} + \alpha_{t-1} + \eta_t$$
$$\alpha_t = \alpha_{t-1} + \xi_t$$

where $\mu_t$ is the local level, $\alpha_t$ is the local slope, and $\eta_t$ and $\xi_t$ are i.i.d. normal errors with mean 0 and variances $\sigma_\eta^2$ and $\sigma_\xi^2$, respectively.

Next consider the seasonal component, $\gamma_t$, with a period of s time units. Ignoring a seasonal disturbance term, the seasonal effects will sum to zero, $\sum_{j=0}^{s-1}\gamma_{t-j} = 0$. Adding a normal error term, $\omega_t$, with mean 0 and variance $\sigma_\omega^2$, we express the seasonal component as

$$\gamma_t = -\sum_{j=1}^{s-1}\gamma_{t-j} + \omega_t$$

Finally, the cyclical component, $\psi_t$, is a function of the frequency $\lambda$, in radians, and a unitless scaling variable $\rho$, termed the damping effect, $0 < \rho < 1$. We require two equations to express the cycle:

$$\psi_t = \psi_{t-1}\rho\cos\lambda + \widetilde\psi_{t-1}\rho\sin\lambda + \kappa_t$$
$$\widetilde\psi_t = -\psi_{t-1}\rho\sin\lambda + \widetilde\psi_{t-1}\rho\cos\lambda + \widetilde\kappa_t$$

where the $\kappa_t$ and $\widetilde\kappa_t$ disturbances are normally distributed with mean 0 and variance $\sigma_\kappa^2$.

The disturbance terms $\epsilon_t$, $\eta_t$, $\xi_t$, $\omega_t$, $\kappa_t$, and $\widetilde\kappa_t$ are independent.

State-space formulation

ucm is an easy-to-use implementation of the state-space command sspace, with special modifications, where the local linear trend components, seasonal components, and cyclical components are states of the state-space model. The state-space model can be expressed in matrix form as

$$y_t = \mathbf{D}\mathbf{z}_t + \mathbf{F}\mathbf{x}_t + \epsilon_t$$
$$\mathbf{z}_t = \mathbf{A}\mathbf{z}_{t-1} + \mathbf{C}\boldsymbol\zeta_t$$

where $y_t$, $t = 1, \ldots, T$, are the observations and $\mathbf{z}_t$ are the unobserved states. The number of states, m, depends on the model specified. The $k \times 1$ vector $\mathbf{x}_t$ contains the exogenous variables specified as indepvars, and the $1 \times k$ vector $\mathbf{F}$ contains the regression coefficients to be estimated. $\epsilon_t$ is the observation-equation disturbance, and the $m_0 \times 1$ vector $\boldsymbol\zeta_t$ contains the state-equation disturbances, where $m_0 \le m$. Finally, $\mathbf{C}$ is an $m \times m_0$ matrix of zeros and ones. These recursive equations are evaluated using the diffuse Kalman filter of De Jong (1991).

Below we give the state-space matrix structures for a local linear trend with a stochastic seasonal component, with a period of 4 time units, and an order-2 cycle. The state vector, $\mathbf{z}_t$, and its transition matrix, $\mathbf{A}$, have the structure

$$
\mathbf{A} =
\begin{pmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & -1 & -1 & -1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \rho\cos\lambda & \rho\sin\lambda & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & -\rho\sin\lambda & \rho\cos\lambda & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \rho\cos\lambda & \rho\sin\lambda \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -\rho\sin\lambda & \rho\cos\lambda
\end{pmatrix}
\qquad
\mathbf{z}_t =
\begin{pmatrix}
\mu_t \\ \alpha_t \\ \gamma_t \\ \gamma_{t-1} \\ \gamma_{t-2} \\
\psi_{t,1} \\ \widetilde\psi_{t,1} \\ \psi_{t,2} \\ \widetilde\psi_{t,2}
\end{pmatrix}
$$

$$
\mathbf{C} =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\qquad
\boldsymbol\zeta_t =
\begin{pmatrix}
\eta_t \\ \xi_t \\ \omega_t \\ \kappa_t \\ \widetilde\kappa_t
\end{pmatrix}
\qquad
\mathbf{D} = (1 \;\; 0 \;\; 1 \;\; 0 \;\; 0 \;\; 1 \;\; 0 \;\; 0 \;\; 0)
$$

Cyclical component extensions

Recall that the stochastic cyclical model is given by

$$\psi_t = \rho(\psi_{t-1}\cos\lambda_c + \psi^*_{t-1}\sin\lambda_c) + \kappa_{t,1}$$
$$\psi^*_t = \rho(-\psi_{t-1}\sin\lambda_c + \psi^*_{t-1}\cos\lambda_c) + \kappa_{t,2}$$

where $\kappa_{t,j} \sim \text{i.i.d. } N(0, \sigma_\kappa^2)$ and $0 < \rho < 1$ is a damping effect. The cycle is variance-stationary when $\rho < 1$ because $\text{Var}(\psi_t) = \sigma_\kappa^2/(1 - \rho^2)$. We will express a UCM with a cyclical component added to a trend as

$$y_t = \mu_t + \psi_t + \epsilon_t$$

where $\mu_t$ can be any of the trend parameterizations discussed earlier.

Higher-order cycles, k = 2 or k = 3, are defined as

$$\psi_{t,j} = \rho(\psi_{t-1,j}\cos\lambda_c + \psi^*_{t-1,j}\sin\lambda_c) + \psi_{t-1,j+1}$$
$$\psi^*_{t,j} = \rho(-\psi_{t-1,j}\sin\lambda_c + \psi^*_{t-1,j}\cos\lambda_c) + \psi^*_{t-1,j+1}$$

for $j < k$, and

$$\psi_{t,k} = \rho(\psi_{t-1,k}\cos\lambda_c + \psi^*_{t-1,k}\sin\lambda_c) + \kappa_{t,1}$$
$$\psi^*_{t,k} = \rho(-\psi_{t-1,k}\sin\lambda_c + \psi^*_{t-1,k}\cos\lambda_c) + \kappa_{t,2}$$

Harvey and Trimbur (2003) discuss the properties of this model and its state-space formulation.
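Higher-order cycles are requested through the cycle() option; a minimal sketch (the variable y is hypothetical), with estat period converting the estimated central frequency into a period in time units:

. ucm y, model(llevel) cycle(2)
. estat period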
Andrew Charles Harvey (1947– ) is a British econometrician. After receiving degrees in economics and statistics from the University of York and the London School of Economics and working for a period in Kenya, he has worked as a teacher and researcher at the University of Kent, the London School of Economics, and now the University of Cambridge. Harvey's interests are centered on time series, especially state-space models, signal extraction, volatility, and changes in quantiles.

References

Burns, A. F., and W. C. Mitchell. 1946. Measuring Business Cycles. New York: National Bureau of Economic Research.
De Jong, P. 1991. The diffuse Kalman filter. Annals of Statistics 19: 1073–1083.
Durbin, J., and S. J. Koopman. 2012. Time Series Analysis by State Space Methods. 2nd ed. Oxford: Oxford University Press.
Fuller, W. A. 1996. Introduction to Statistical Time Series. 2nd ed. New York: Wiley.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Harvey, A. C. 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge University Press.
Harvey, A. C. 1993. Time Series Models. 2nd ed. Cambridge, MA: MIT Press.
Harvey, A. C., and T. M. Trimbur. 2003. General model-based filters for extracting cycles and trends in economic time series. The Review of Economics and Statistics 85: 244–255.
Hipel, K. W., and A. I. McLeod. 1994. Time Series Modelling of Water Resources and Environmental Systems. Amsterdam: Elsevier.
Priestley, M. B. 1981. Spectral Analysis and Time Series. London: Academic Press.
Trimbur, T. M. 2006. Properties of higher order stochastic cycles. Journal of Time Series Analysis 27: 1–17.
Wei, W. W. S. 2006. Time Series Analysis: Univariate and Multivariate Methods. 2nd ed. Boston: Pearson.

Also see

[TS] ucm postestimation — Postestimation tools for ucm
[TS] arima — ARIMA, ARMAX, and other dynamic regression models
[TS] sspace — State-space models
[TS] tsfilter — Filter a time-series, keeping only selected periodicities
[TS] tsset — Declare data to be time-series data
[TS] tssmooth — Smooth and forecast univariate time-series data
[TS] var — Vector autoregressive models
[U] 20 Estimation and postestimation commands

Title

ucm postestimation — Postestimation tools for ucm

Postestimation commands    predict    estat    Remarks and examples
Methods and formulas    Also see

Postestimation commands

The following postestimation commands are of special interest after ucm:

Command         Description
estat period    display cycle periods in time units
psdensity       estimate the spectral density

The following standard postestimation commands are also available:

Command           Description
estat ic          Akaike's and Schwarz's Bayesian information criteria (AIC and BIC)
estat summarize   summary statistics for the estimation sample
estat vce         variance–covariance matrix of the estimators (VCE)
estimates         cataloging estimation results
forecast          dynamic forecasts and simulations
lincom            point estimates, standard errors, testing, and inference for linear
                  combinations of coefficients
lrtest            likelihood-ratio test
nlcom             point estimates, standard errors, testing, and inference for nonlinear
                  combinations of coefficients
predict           predictions, residuals, influence statistics, and other diagnostic measures
predictnl         point estimates, standard errors, testing, and inference for generalized
                  predictions
test              Wald tests of simple and composite linear hypotheses
testnl            Wald tests of nonlinear hypotheses
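The two special-interest commands are typically used together: psdensity recovers the model-implied spectral density of a stationary component, and estat period reports the implied cycle length. A minimal sketch (the variable unemp and the new variable names sdens and omega are hypothetical):

. ucm unemp, cycle(1)
. estat period
. psdensity sdens omega
. line sdens omega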
predict

Description for predict

predict creates a new variable containing predictions such as linear predictions, trend components, seasonal components, cyclical components, and standardized and unstandardized residuals. The root mean squared error is available for all predictions. All predictions are also available as static one-step-ahead predictions or as dynamic multistep predictions, and you can control when dynamic predictions begin.

Menu for predict

Statistics > Postestimation

Syntax for predict

    predict [type] {stub*|newvarlist} [if] [in] [, statistic options]

statistic     Description
Main
  xb          linear prediction using exogenous variables
  trend       trend component
  seasonal    seasonal component
  cycle       cyclical component
  residuals   residuals
  rstandard   standardized residuals

These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for the estimation sample.

options                     Description
Options
  rmse(stub*|newvarlist)    put estimated root mean squared errors of predicted statistics
                            in the new variables
  dynamic(time_constant)    begin dynamic forecast at specified time
Advanced
  smethod(method)           method for predicting unobserved components

method     Description
  onestep  predict using past information
  smooth   predict using all sample information
  filter   predict using past and contemporaneous information

Options for predict

Main

xb, trend, seasonal, cycle, residuals, and rstandard specify the statistic to be predicted.

  xb, the default, calculates the linear predictions using the exogenous variables. xb may not be used with the smethod(filter) option.

  trend estimates the unobserved trend component.

  seasonal estimates the unobserved seasonal component.

  cycle estimates the unobserved cyclical component.

  residuals calculates the residuals in the equation for the dependent variable. residuals may not be specified with dynamic().

  rstandard calculates the standardized residuals, which are the residuals normalized to have unit variances. rstandard may not be specified with the smethod(filter), smethod(smooth), or dynamic() option.

Options

rmse(stub*|newvarlist) puts the root mean squared errors of the predicted statistic into the specified new variables. Multiple variables are required only for predicting cycles of a model that has more than one cycle. The root mean squared errors measure the variances due to the disturbances but do not account for estimation error. The stub* syntax is for models with multiple cycles, where you provide the prefix and predict will add a numeric suffix for each predicted cycle.

dynamic(time_constant) specifies when predict should start producing dynamic forecasts. The specified time_constant must be in the scale of the time variable specified in tsset, and the time constant must be inside a sample for which observations on the dependent variable are available. For example, dynamic(tq(2008q4)) causes dynamic predictions to begin in the fourth quarter of 2008, assuming that your time variable is quarterly; see [D] datetime. If the model contains exogenous variables, they must be present for the whole predicted sample. dynamic() may not be specified with the rstandard, residuals, or smethod(smooth) option.

Advanced

smethod(method) specifies the method for predicting the unobserved components. smethod() causes different amounts of information on the dependent variable to be used in predicting the components at each time period.

  smethod(onestep), the default, causes predict to estimate the components at each time period using previous information on the dependent variable. The Kalman filter is performed on previous periods, but only the one-step predictions are made for the current period.

  smethod(smooth) causes predict to estimate the components at each time period using all the sample data by the Kalman smoother. smethod(smooth) may not be specified with the rstandard option.

  smethod(filter) causes predict to estimate the components at each time period using previous and contemporaneous data by the Kalman filter. The Kalman filter is performed on previous periods and the current period. smethod(filter) may not be specified with the xb option.
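The smethod() choices can produce noticeably different component estimates, particularly early in the sample where the one-step filter has little information. A minimal sketch comparing one-step and smoothed cycle estimates (the variable y is hypothetical):

. ucm y, seasonal(12) cycle(1)
. predict cyc_onestep, cycle smethod(onestep)
. predict cyc_smooth, cycle smethod(smooth)
. tsline cyc_onestep cyc_smooth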
estat

Description for estat

estat period transforms an estimated central frequency to an estimated period after ucm.

Menu for estat

Statistics > Postestimation

Syntax for estat

    estat period [, options]

options           Description
Main
  level(#)        set confidence level; default is level(95)
  cformat(%fmt)   numeric format

Options for estat period

level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals.

cformat(%fmt) sets the display format for the table numeric values. The default is cformat(%9.0g).
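For a cycle with estimated central frequency λ in radians, the period reported by estat period is the number of time units per cycle, 2π/λ. A minimal sketch (the variable y is hypothetical):

. ucm y, cycle(1)
. estat period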
Remarks and examples

We assume that you have already read [TS] ucm. In this entry, we illustrate some features of predict after using ucm to estimate the parameters of an unobserved-components model.

All predictions after ucm depend on the unobserved components, which are estimated recursively using a Kalman filter. Changing the sample can alter the state estimates, which can change all other predictions.

Example 1

We begin by modeling monthly data on the median duration of employment spells in the United States. We include a stochastic-seasonal component because the data have not been seasonally adjusted.

. use http://www.stata-press.com/data/r14/uduration2
(BLS data, not seasonally adjusted)
. ucm duration, seasonal(12) cycle(1) difficult
searching for initial values ....................
(setting technique to bhhh)
Iteration 0:   log likelihood = -409.79557
Iteration 1:   log likelihood = -403.3831
Iteration 2:   log likelihood = -403.37373  (backed up)
Iteration 3:   log likelihood = -403.36901  (backed up)
Iteration 4:   log likelihood = -403.36663  (backed up)
(switching technique to nr)
Iteration 5:   log likelihood = -403.36656  (backed up)
Iteration 6:   log likelihood = -397.87756  (not concave)
Iteration 7:   log likelihood = -396.47272  (not concave)
Iteration 8:   log likelihood = -394.50895
Iteration 9:   log likelihood = -392.9516   (not concave)
Iteration 10:  log likelihood = -389.38063  (not concave)
Iteration 11:  log likelihood = -388.59491
Iteration 12:  log likelihood = -388.30078
Iteration 13:  log likelihood = -388.25766
Iteration 14:  log likelihood = -388.25675
Iteration 15:  log likelihood = -388.25675
Refining estimates:
Iteration 0:   log likelihood = -388.25675
Iteration 1:   log likelihood = -388.25675

Unobserved-components model
Components: random walk, seasonal(12), order 1 cycle
Sample: 1967m7 - 2008m12                    Number of obs   =        498
                                            Wald chi2(2)    =       7.17
Log likelihood = -388.25675                 Prob > chi2     =     0.0277

                            OIM
     duration      Coef.  Std. Err.      z    P>|z|    [95% Conf. Interval]

    frequency   1.641531   .7250323    2.26   0.024     .2204938    3.062568
      damping   .2671232   .1050168    2.54   0.011     .0612939    .4729524

   var(level)   .1262922   .0221428    5.70   0.000     .0828932    .1696912
var(seasonal)   .0017289   .0009647    1.79   0.037            0    .0036196
  var(cycle1)   .0641496   .0211839    3.03   0.001     .0226299    .1056693

Note: Model is not stationary.
Note: Tests of variances against zero are one sided, and the two-sided
      confidence intervals are truncated at zero.

Below we predict the trend and the seasonal components to get a look at the model fit.

. predict strend, trend
. predict season, seasonal
. tsline duration strend, name(trend) nodraw legend(rows(1))
. tsline season, name(season) yline(0,lwidth(vthin)) nodraw
. graph combine trend season, rows(2)

(graph omitted: top panel plots the median duration of unemployment against the one-step trend estimate; bottom panel plots the one-step seasonal estimate around zero, 1970m1 through 2010m1)

The trend tracks the data well. That the seasonal component appears to change over time indicates that the stochastic-seasonal component might fit better than a deterministic-seasonal component.

Example 2

In this example, we use the model to forecast the median unemployment duration. We use the root mean squared error of the prediction to compute a confidence interval of our dynamic predictions. Recall that the root mean squared error accounts for variances due to the disturbances but not due to the estimation error.

. tsappend, add(12)
. predict duration_f, dynamic(tm(2009m1)) rmse(rmse)
. scalar z = invnormal(0.95)
. generate lbound = duration_f - z*rmse if tm>=tm(2008m12)
(497 missing values generated)
. generate ubound = duration_f + z*rmse if tm>=tm(2008m12)
(497 missing values generated)
. label variable lbound "90% forecast interval"
. twoway (tsline duration duration_f if tm>=tm(2006m1))
>        (tsrline lbound ubound if tm>=tm(2008m12)),
>        ysize(2) xtitle("") legend(cols(1))

(graph omitted: the observed median duration of unemployment and the dynamic xb prediction from dynamic(tm(2009m1)), with the 90% forecast interval, 2006m1 through 2010m1)

The model forecasts a large temporary increase in the median duration of unemployment.
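The same dynamic() and rmse() machinery applies to component predictions, so the forecast of, say, the trend can be examined separately. A minimal sketch continuing this example (the new variable names are hypothetical):

. predict trend_f, trend dynamic(tm(2009m1)) rmse(trmse)
. tsline trend_f if tm>=tm(2006m1)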
Methods and formulas

For details on the ucm postestimation methods, see [TS] sspace postestimation.

See [TS] psdensity for the methods used to estimate the spectral density.

Also see

[TS] ucm — Unobserved-components model
[TS] psdensity — Parametric spectral density estimation after arima, arfima, and ucm
[TS] sspace postestimation — Postestimation tools for sspace
[U] 20 Estimation and postestimation commands

Title

var intro — Introduction to vector autoregressive models

Description    Remarks and examples    References    Also see

Description

Stata has a suite of commands for fitting, forecasting, interpreting, and performing inference on vector autoregressive (VAR) models and structural vector autoregressive (SVAR) models. The suite includes several commands for estimating and interpreting impulse–response functions (IRFs), dynamic-multiplier functions, and forecast-error variance decompositions (FEVDs). The table below describes the available commands.

Fitting a VAR or SVAR
  var            [TS] var             Fit vector autoregressive models
  svar           [TS] var svar        Fit structural vector autoregressive models
  varbasic       [TS] varbasic        Fit a simple VAR and graph IRFs or FEVDs

Model diagnostics and inference
  varstable      [TS] varstable       Check the stability condition of VAR or SVAR estimates
  varsoc         [TS] varsoc          Obtain lag-order selection statistics for VARs and VECMs
  varwle         [TS] varwle          Obtain Wald lag-exclusion statistics after var or svar
  vargranger     [TS] vargranger      Perform pairwise Granger causality tests after var or svar
  varlmar        [TS] varlmar         Perform LM test for residual autocorrelation after var or svar
  varnorm        [TS] varnorm         Test for normally distributed disturbances after var or svar

Forecasting after fitting a VAR or SVAR
  fcast compute  [TS] fcast compute   Compute dynamic forecasts after var, svar, or vec
  fcast graph    [TS] fcast graph     Graph forecasts after fcast compute

Working with IRFs, dynamic-multiplier functions, and FEVDs
  irf            [TS] irf             Create and analyze IRFs, dynamic-multiplier functions, and FEVDs

This entry provides an overview of vector autoregressions and structural vector autoregressions. More rigorous treatments can be found in Hamilton (1994), Lütkepohl (2005), and Amisano and Giannini (1997). Stock and Watson (2001) provide an excellent nonmathematical treatment of vector autoregressions and their role in macroeconomics. Becketti (2013) provides an excellent introduction to VAR analysis with an emphasis on how it is done in practice.

Remarks and examples

Remarks are presented under the following headings:
    Introduction to VARs
    Introduction to SVARs
    Short-run SVAR models
    Long-run restrictions
    IRFs and FEVDs

Introduction to VARs

A VAR is a model in which K variables are specified as linear functions of p of their own lags, p lags of the other K − 1 variables, and possibly additional exogenous variables. Algebraically, a p-order VAR model, written VAR(p), with exogenous variables $\mathbf{x}_t$ is given by

$$\mathbf{y}_t = \mathbf{v} + \mathbf{A}_1\mathbf{y}_{t-1} + \cdots + \mathbf{A}_p\mathbf{y}_{t-p} + \mathbf{B}_0\mathbf{x}_t + \mathbf{B}_1\mathbf{x}_{t-1} + \cdots + \mathbf{B}_s\mathbf{x}_{t-s} + \mathbf{u}_t
\qquad t \in \{-\infty, \infty\} \tag{1}$$

where $\mathbf{y}_t = (y_{1t}, \ldots, y_{Kt})'$ is a $K \times 1$ random vector, $\mathbf{A}_1$ through $\mathbf{A}_p$ are $K \times K$ matrices of parameters, $\mathbf{x}_t$ is an $M \times 1$ vector of exogenous variables, $\mathbf{B}_0$ through $\mathbf{B}_s$ are $K \times M$ matrices of coefficients, $\mathbf{v}$ is a $K \times 1$ vector of parameters, and $\mathbf{u}_t$ is assumed to be white noise; that is,

$$E(\mathbf{u}_t) = \mathbf{0}, \qquad E(\mathbf{u}_t\mathbf{u}_t') = \boldsymbol\Sigma, \qquad E(\mathbf{u}_t\mathbf{u}_s') = \mathbf{0} \;\text{ for }\; t \ne s$$

There are $K^2 p + K\{M(s+1)+1\}$ parameters in the equation for $\mathbf{y}_t$, and there are $K(K+1)/2$ parameters in the covariance matrix $\boldsymbol\Sigma$. One way to reduce the number of parameters is to specify an incomplete VAR, in which some of the A or B matrices are set to zero. Another way is to specify linear constraints on some of the coefficients in the VAR.
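For a sense of scale, consider a three-variable VAR(2) with no exogenous variables (K = 3, p = 2, M = 0); the counts below follow directly from the expressions above and correspond to seven parameters per equation, six lag coefficients plus a constant.

$$K^2 p + K\{M(s+1)+1\} = 9 \cdot 2 + 3 = 21 \;\text{ coefficients}, \qquad K(K+1)/2 = 6 \;\text{ covariance parameters}$$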
A VAR can be viewed as the reduced form of a system of dynamic simultaneous equations. Consider the system

$$\mathbf{W}_0\mathbf{y}_t = \mathbf{a} + \mathbf{W}_1\mathbf{y}_{t-1} + \cdots + \mathbf{W}_p\mathbf{y}_{t-p} + \widetilde{\mathbf{W}}_0\mathbf{x}_t + \widetilde{\mathbf{W}}_1\mathbf{x}_{t-1} + \cdots + \widetilde{\mathbf{W}}_s\mathbf{x}_{t-s} + \mathbf{e}_t \tag{2}$$

where $\mathbf{a}$ is a $K \times 1$ vector of parameters, each $\mathbf{W}_i$, $i = 0, \ldots, p$, is a $K \times K$ matrix of parameters, and $\mathbf{e}_t$ is a $K \times 1$ disturbance vector. In the traditional dynamic simultaneous equations approach, sufficient restrictions are placed on the $\mathbf{W}_i$ to obtain identification. Assuming that $\mathbf{W}_0$ is nonsingular, (2) can be rewritten as

$$\mathbf{y}_t = \mathbf{W}_0^{-1}\mathbf{a} + \mathbf{W}_0^{-1}\mathbf{W}_1\mathbf{y}_{t-1} + \cdots + \mathbf{W}_0^{-1}\mathbf{W}_p\mathbf{y}_{t-p}
+ \mathbf{W}_0^{-1}\widetilde{\mathbf{W}}_0\mathbf{x}_t + \cdots + \mathbf{W}_0^{-1}\widetilde{\mathbf{W}}_s\mathbf{x}_{t-s} + \mathbf{W}_0^{-1}\mathbf{e}_t$$

which is a VAR with

$$\mathbf{v} = \mathbf{W}_0^{-1}\mathbf{a}, \qquad \mathbf{A}_i = \mathbf{W}_0^{-1}\mathbf{W}_i, \qquad \mathbf{B}_i = \mathbf{W}_0^{-1}\widetilde{\mathbf{W}}_i, \qquad \mathbf{u}_t = \mathbf{W}_0^{-1}\mathbf{e}_t \tag{3}$$

The cross-equation error variance–covariance matrix Σ contains all the information about contemporaneous correlations in a VAR and may be the VAR's greatest strength and its greatest weakness. Because no questionable a priori assumptions are imposed, fitting a VAR allows the dataset to speak for itself. However, without imposing some restrictions on the structure of Σ, we cannot make a causal interpretation of the results.

If we make additional technical assumptions, we can derive another representation of the VAR in (1). If the VAR is stable (see [TS] varstable), we can rewrite $\mathbf{y}_t$ as

$$\mathbf{y}_t = \boldsymbol\mu + \sum_{i=0}^{\infty}\mathbf{D}_i\mathbf{x}_{t-i} + \sum_{i=0}^{\infty}\boldsymbol\Phi_i\mathbf{u}_{t-i} \tag{4}$$

where $\boldsymbol\mu$ is the $K \times 1$ time-invariant mean of the process and $\mathbf{D}_i$ and $\boldsymbol\Phi_i$ are $K \times M$ and $K \times K$ matrices of parameters, respectively. Equation (4) states that the process by which the variables in $\mathbf{y}_t$ fluctuate about their time-invariant means, $\boldsymbol\mu$, is completely determined by the parameters in $\mathbf{D}_i$ and $\boldsymbol\Phi_i$ and the (infinite) past history of the exogenous variables $\mathbf{x}_t$ and the independent and identically distributed (i.i.d.) shocks or innovations, $\mathbf{u}_{t-1}, \mathbf{u}_{t-2}, \ldots$. Equation (4) is known as the vector moving-average representation of the VAR. The $\mathbf{D}_i$ are the dynamic-multiplier functions, or transfer functions. The moving-average coefficients $\boldsymbol\Phi_i$ are also known as the simple IRFs at horizon i. The precise relationships between the VAR parameters and the $\mathbf{D}_i$ and $\boldsymbol\Phi_i$ are derived in Methods and formulas of [TS] irf create.

The joint distribution of $\mathbf{y}_t$ is determined by the distributions of $\mathbf{x}_t$ and $\mathbf{u}_t$ and the parameters $\mathbf{v}$, $\mathbf{B}_i$, and $\mathbf{A}_i$. Estimating the parameters in a VAR requires that the variables in $\mathbf{y}_t$ and $\mathbf{x}_t$ be covariance stationary, meaning that their first two moments exist and are time invariant. If the $\mathbf{y}_t$ are not covariance stationary, but their first differences are, a vector error-correction model (VECM) can be used. See [TS] vec intro and [TS] vec for more information about those models.

If the $\mathbf{u}_t$ form a zero mean, i.i.d. vector process, and $\mathbf{y}_t$ and $\mathbf{x}_t$ are covariance stationary and are not correlated with the $\mathbf{u}_t$, consistent and efficient estimates of the $\mathbf{B}_i$, the $\mathbf{A}_i$, and $\mathbf{v}$ are obtained via seemingly unrelated regression, yielding estimators that are asymptotically normally distributed. When the equations for the variables $\mathbf{y}_t$ have the same set of regressors, equation-by-equation OLS estimates are the conditional maximum likelihood estimates.
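The last point can be checked directly in Stata: when no constraints are imposed, each equation of the VAR can be reproduced by OLS on the same set of lags. A minimal sketch (the variables y1, y2, and y3 are hypothetical):

. var y1 y2 y3, lags(1/2)
. regress y1 L(1/2).(y1 y2 y3)   // should reproduce the coefficients of the first VAR equation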
Much of the interest in VAR models is focused on the forecasts, IRFs, dynamic-multiplier functions, and the FEVDs, all of which are functions of the estimated parameters. Estimating these functions is straightforward, but their asymptotic standard errors are usually obtained by assuming that $\mathbf{u}_t$ forms a zero mean, i.i.d. Gaussian (normal) vector process. Also, some of the specification tests for VARs have been derived using the likelihood-ratio principle and the stronger Gaussian assumption.

In the absence of contemporaneous exogenous variables, the disturbance variance–covariance matrix contains all the information about contemporaneous correlations among the variables. VARs are sometimes classified into three types by how they account for this contemporaneous correlation. (See Stock and Watson [2001] for one derivation of this taxonomy.) A reduced-form VAR, aside from estimating the variance–covariance matrix of the disturbance, does not try to account for contemporaneous correlations. In a recursive VAR, the K variables are assumed to form a recursive dynamic structural equation model in which the first variable is a function of lagged variables, the second is a function of contemporaneous values of the first variable and lagged values, and so on. In a structural VAR, the theory you are working with places restrictions on the contemporaneous correlations that are not necessarily recursive.

Stata has two commands for fitting reduced-form VARs: var and varbasic. var allows for constraints to be imposed on the coefficients. varbasic allows you to fit a simple VAR quickly without constraints and graph the IRFs.

Because fitting a VAR of the correct order can be important, varsoc offers several methods for choosing the lag order p of the VAR to fit. After fitting a VAR, and before proceeding with inference, interpretation, or forecasting, checking that the VAR fits the data is important. varlmar can be used to check for autocorrelation in the disturbances. varwle performs Wald tests to determine whether certain lags can be excluded. varnorm tests the null hypothesis that the disturbances are normally distributed. varstable checks the eigenvalue condition for stability, which is needed to interpret the IRFs and FEVDs.
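Put together, a typical specification-and-diagnostics pass looks like the sketch below (the variables y1, y2, and y3 are hypothetical):

. varsoc y1 y2 y3, maxlag(4)    // choose the lag order p
. var y1 y2 y3, lags(1/2)       // fit the chosen VAR
. varlmar                       // residual autocorrelation
. varnorm                       // normality of the disturbances
. varstable                     // stability condition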
Introduction to SVARs

As discussed in [TS] irf create, a problem with VAR analysis is that, because Σ is not restricted to be a diagonal matrix, an increase in an innovation to one variable provides information about the innovations to other variables. This implies that no causal interpretation of the simple IRFs is possible: there is no way to determine whether the shock to the first variable caused the shock in the second variable or vice versa.

However, suppose that we had a matrix $\mathbf{P}$ such that $\boldsymbol\Sigma = \mathbf{P}\mathbf{P}'$. We can then show that the variables in $\mathbf{P}^{-1}\mathbf{u}_t$ have zero mean and that $E\{\mathbf{P}^{-1}\mathbf{u}_t(\mathbf{P}^{-1}\mathbf{u}_t)'\} = \mathbf{I}_K$. We could rewrite (4) as

$$\mathbf{y}_t = \boldsymbol\mu + \sum_{s=0}^{\infty}\boldsymbol\Phi_s\mathbf{P}\mathbf{P}^{-1}\mathbf{u}_{t-s}
= \boldsymbol\mu + \sum_{s=0}^{\infty}\boldsymbol\Theta_s\mathbf{P}^{-1}\mathbf{u}_{t-s}
= \boldsymbol\mu + \sum_{s=0}^{\infty}\boldsymbol\Theta_s\mathbf{w}_{t-s} \tag{5}$$

where $\boldsymbol\Theta_s = \boldsymbol\Phi_s\mathbf{P}$ and $\mathbf{w}_t = \mathbf{P}^{-1}\mathbf{u}_t$. If we had such a $\mathbf{P}$, the $\mathbf{w}_t$ would be mutually orthogonal, and the $\boldsymbol\Theta_s$ would allow the causal interpretation that we seek.

SVAR models provide a framework for estimation of and inference about a broad class of $\mathbf{P}$ matrices. As described in [TS] irf create, the estimated $\mathbf{P}$ matrices can then be used to estimate structural IRFs and structural FEVDs. There are two types of SVAR models. Short-run SVAR models identify a $\mathbf{P}$ matrix by placing restrictions on the contemporaneous correlations between the variables. Long-run SVAR models, on the other hand, do so by placing restrictions on the long-term accumulated effects of the innovations.

Short-run SVAR models

A short-run SVAR model without exogenous variables can be written as

$$\mathbf{A}(\mathbf{I}_K - \mathbf{A}_1 L - \mathbf{A}_2 L^2 - \cdots - \mathbf{A}_p L^p)\mathbf{y}_t = \mathbf{A}\boldsymbol\epsilon_t = \mathbf{B}\mathbf{e}_t \tag{6}$$

where $L$ is the lag operator; $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{A}_1, \ldots, \mathbf{A}_p$ are $K \times K$ matrices of parameters; $\boldsymbol\epsilon_t$ is a $K \times 1$ vector of innovations with $\boldsymbol\epsilon_t \sim N(\mathbf{0}, \boldsymbol\Sigma)$ and $E[\boldsymbol\epsilon_t\boldsymbol\epsilon_s'] = \mathbf{0}_K$ for all $s \ne t$; and $\mathbf{e}_t$ is a $K \times 1$ vector of orthogonalized disturbances; that is, $\mathbf{e}_t \sim N(\mathbf{0}, \mathbf{I}_K)$ and $E[\mathbf{e}_t\mathbf{e}_s'] = \mathbf{0}_K$ for all $s \ne t$. These transformations of the innovations allow us to analyze the dynamics of the system in terms of a change to an element of $\mathbf{e}_t$. In a short-run SVAR model, we obtain identification by placing restrictions on A and B, which are assumed to be nonsingular.

Equation (6) implies that $\mathbf{P}_{sr} = \mathbf{A}^{-1}\mathbf{B}$, where $\mathbf{P}_{sr}$ is the $\mathbf{P}$ matrix identified by a particular short-run SVAR model. The latter equality in (6) implies that

$$\mathbf{A}\boldsymbol\epsilon_t\boldsymbol\epsilon_t'\mathbf{A}' = \mathbf{B}\mathbf{e}_t\mathbf{e}_t'\mathbf{B}'$$

Taking the expectation of both sides yields

$$\boldsymbol\Sigma = \mathbf{P}_{sr}\mathbf{P}_{sr}'$$

Assuming that the underlying VAR is stable (see [TS] varstable for a discussion of stability), we can invert the autoregressive representation of the model in (6) to an infinite-order, moving-average representation of the form

$$\mathbf{y}_t = \boldsymbol\mu + \sum_{s=0}^{\infty}\boldsymbol\Theta_s^{sr}\mathbf{e}_{t-s} \tag{7}$$

whereby $\mathbf{y}_t$ is expressed in terms of the mutually orthogonal, unit-variance structural innovations $\mathbf{e}_t$. The $\boldsymbol\Theta_s^{sr}$ contain the structural IRFs at horizon s.

In a short-run SVAR model, the A and B matrices model all the information about contemporaneous correlations. The B matrix also scales the innovations $\mathbf{u}_t$ to have unit variance. This allows the structural IRFs constructed from (7) to be interpreted as the effect on variable i of a one-time unit increase in the structural innovation to variable j after s periods.

$\mathbf{P}_{sr}$ identifies the structural IRFs by defining a transformation of Σ, and $\mathbf{P}_{sr}$ is identified by the restrictions placed on the parameters in A and B. Because there are only $K(K+1)/2$ free parameters in Σ, only $K(K+1)/2$ parameters may be estimated in an identified $\mathbf{P}_{sr}$. Because there are $2K^2$ total parameters in A and B, the order condition for identification requires that at least $2K^2 - K(K+1)/2$ restrictions be placed on those parameters. Just as in the simultaneous-equations framework, this order condition is necessary but not sufficient. Amisano and Giannini (1997) derive a method to check that an SVAR model is locally identified near some specified values for A and B.

Before moving on to models with long-run constraints, consider these limitations. We cannot place constraints on the elements of A in terms of the elements of B, or vice versa. This limitation is imposed by the form of the check for identification derived by Amisano and Giannini (1997). As noted in Methods and formulas of [TS] var svar, this test requires separate constraint matrices for the parameters in A and B. Also, we cannot mix short-run and long-run constraints.

Long-run restrictions

A general short-run SVAR has the form

$$\mathbf{A}(\mathbf{I}_K - \mathbf{A}_1 L - \mathbf{A}_2 L^2 - \cdots - \mathbf{A}_p L^p)\mathbf{y}_t = \mathbf{B}\mathbf{e}_t$$

To simplify the notation, let $\bar{\mathbf{A}} = (\mathbf{I}_K - \mathbf{A}_1 L - \mathbf{A}_2 L^2 - \cdots - \mathbf{A}_p L^p)$. The model is assumed to be stable (see [TS] varstable), so $\bar{\mathbf{A}}^{-1}$, the matrix of estimated long-run effects of the reduced-form VAR shocks, is well defined. Constraining A to be an identity matrix allows us to rewrite this equation as

$$\mathbf{y}_t = \bar{\mathbf{A}}^{-1}\mathbf{B}\mathbf{e}_t$$

which implies that $\boldsymbol\Sigma = \mathbf{B}\mathbf{B}'$. Thus $\mathbf{C} = \bar{\mathbf{A}}^{-1}\mathbf{B}$ is the matrix of long-run responses to the orthogonalized shocks, and

$$\mathbf{y}_t = \mathbf{C}\mathbf{e}_t$$

In long-run models, the constraints are placed on the elements of C, and the free parameters are estimated. These constraints are often exclusion restrictions. For instance, constraining C[1, 2] to be zero can be interpreted as setting the long-run response of variable 1 to the structural shocks driving variable 2 to be zero.

Stata's svar command estimates the parameters of structural VARs. See [TS] var svar for more information and examples.
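As an illustration of the short-run identification scheme, the sketch below sets up a just-identified two-variable SVAR in which A is lower triangular and B is diagonal, a Cholesky-type ordering; missing values (.) mark the free parameters to be estimated. The variables y1 and y2 are hypothetical. With K = 2, the order condition requires 2K² − K(K+1)/2 = 5 restrictions; the three fixed entries of A plus the two zero off-diagonal entries of B supply exactly that many.

. matrix A = (1, 0 \ ., 1)
. matrix B = (., 0 \ 0, .)
. svar y1 y2, aeq(A) beq(B)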
IRFs and FEVDs

IRFs describe how the K endogenous variables react over time to a one-time shock to one of the K disturbances. Because the disturbances may be contemporaneously correlated, these functions do not explain how variable i reacts to a one-time increase in the innovation to variable j after s periods, holding everything else constant. To explain this, we must start with orthogonalized innovations so that the assumption to hold everything else constant is reasonable. Recursive VARs use a Cholesky decomposition to orthogonalize the disturbances and thereby obtain structurally interpretable IRFs. Structural VARs use theory to impose sufficient restrictions, which need not be recursive, to decompose the contemporaneous correlations into orthogonal components.

FEVDs are another tool for interpreting how the orthogonalized innovations affect the K variables over time. The FEVD from j to i gives the fraction of the s-step forecast-error variance of variable i that can be attributed to the jth orthogonalized innovation.

Dynamic-multiplier functions describe how the endogenous variables react over time to a unit change in an exogenous variable. This is a different experiment from that in IRFs and FEVDs because dynamic-multiplier functions consider a change in an exogenous variable instead of a shock to an endogenous variable.

irf create estimates IRFs, Cholesky orthogonalized IRFs, dynamic-multiplier functions, and structural IRFs and their standard errors. It also estimates Cholesky and structural FEVDs. The irf graph, irf cgraph, irf ograph, irf table, and irf ctable commands graph and tabulate these estimates. Stata also has several other commands to manage IRF and FEVD results. See [TS] irf for a description of these commands.

fcast compute computes dynamic forecasts and their standard errors from VARs. fcast graph graphs the forecasts that are generated using fcast compute.

VARs allow researchers to investigate whether one variable is useful in predicting another variable. A variable x is said to Granger-cause a variable y if, given the past values of y, past values of x are useful for predicting y. The Stata command vargranger performs Wald tests to investigate Granger causality between the variables in a VAR.
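A minimal sketch of the IRF workflow described above (the variables y1 and y2 and the file name myirfs are hypothetical):

. var y1 y2, lags(1/2)
. irf create order12, set(myirfs) step(8)
. irf graph oirf, impulse(y1) response(y2)
. irf table fevd
. vargranger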
References

Amisano, G., and C. Giannini. 1997. Topics in Structural VAR Econometrics. 2nd ed. Heidelberg: Springer.
Becketti, S. 2013. Introduction to Time Series Using Stata. College Station, TX: Stata Press.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.
Stock, J. H., and M. W. Watson. 2001. Vector autoregressions. Journal of Economic Perspectives 15: 101–115.
Watson, M. W. 1994. Vector autoregressions and cointegration. In Vol. 4 of Handbook of Econometrics, ed. R. F. Engle and D. L. McFadden. Amsterdam: Elsevier.

Also see

[TS] var — Vector autoregressive models
[TS] var svar — Structural vector autoregressive models
[TS] vec intro — Introduction to vector error-correction models
[TS] vec — Vector error-correction models
[TS] irf — Create and analyze IRFs, dynamic-multiplier functions, and FEVDs

Title

var — Vector autoregressive models

Description    Quick start    Menu    Syntax    Options
Remarks and examples    Stored results    Methods and formulas
Acknowledgment    References    Also see

Description

var fits a multivariate time-series regression of each dependent variable on lags of itself and on lags of all the other dependent variables. var also fits a variant of vector autoregressive (VAR) models known as the VARX model, which also includes exogenous variables. See [TS] var intro for a list of commands that are used in conjunction with var.

Quick start

Vector autoregressive model for dependent variables y1, y2, and y3 and their first and second lags using tsset data
    var y1 y2 y3

As above, but include second and third lags instead of first and second
    var y1 y2 y3, lags(2 3)

Add exogenous variables x1 and x2
    var y1 y2 y3, lags(2 3) exog(x1 x2)

As above, but make a small-sample degrees-of-freedom adjustment
    var y1 y2 y3, lags(2 3) exog(x1 x2) dfk

Menu

Statistics > Multivariate time series > Vector autoregression (VAR)

Syntax

    var depvarlist [if] [in] [, options]

options                  Description
Model
  noconstant             suppress constant term
  lags(numlist)          use lags numlist in the VAR
  exog(varlist)          use exogenous variables varlist
Model 2
  constraints(numlist)   apply specified linear constraints
  nolog                  suppress SURE iteration log
  iterate(#)             set maximum number of iterations for SURE; default is iterate(1600)
  tolerance(#)           set convergence tolerance of SURE
  noisure                use one-step SURE
  dfk                    make small-sample degrees-of-freedom adjustment
  small                  report small-sample t and F statistics
  nobigf                 do not compute parameter vector for coefficients implicitly set to zero
Reporting
  level(#)               set confidence level; default is level(95)
  lutstats               report Lütkepohl lag-order selection statistics
  nocnsreport            do not display constraints
  display_options        control columns and column formats, row spacing, and line width
  coeflegend             display legend instead of statistics

You must tsset your data before using var; see [TS] tsset.
depvarlist and varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, fp, rolling, statsby, and xi are allowed; see [U] 11.1.10 Prefix commands.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.

Options

Model

noconstant; see [R] estimation options.

lags(numlist) specifies the lags to be included in the model. The default is lags(1 2). This option takes a numlist and not simply an integer for the maximum lag. For example, lags(2) would include only the second lag in the model, whereas lags(1/2) would include both the first and second lags in the model. See [U] 11.1.8 numlist and [U] 11.4.4 Time-series varlists for more discussion of numlists and lags.

exog(varlist) specifies a list of exogenous variables to be included in the VAR.
Model 2

constraints(numlist); see [R] estimation options.

nolog suppresses the log from the iterated seemingly unrelated regression algorithm. By default, the iteration log is displayed when the coefficients are estimated through iterated seemingly unrelated regression. When the constraints() option is not specified, the estimates are obtained via OLS, and nolog has no effect. For this reason, nolog can be specified only when constraints() is specified. Similarly, nolog cannot be combined with noisure.

iterate(#) specifies an integer that sets the maximum number of iterations when the estimates are obtained through iterated seemingly unrelated regression. By default, the limit is 1,600. When constraints() is not specified, the estimates are obtained using OLS, and iterate() has no effect. For this reason, iterate() can be specified only when constraints() is specified. Similarly, iterate() cannot be combined with noisure.

tolerance(#) specifies a number greater than zero and less than 1 for the convergence tolerance of the iterated seemingly unrelated regression algorithm. By default, the tolerance is 1e-6. When the constraints() option is not specified, the estimates are obtained using OLS, and tolerance() has no effect. For this reason, tolerance() can be specified only when constraints() is specified. Similarly, tolerance() cannot be combined with noisure.

noisure specifies that the estimates in the presence of constraints be obtained through one-step seemingly unrelated regression. By default, var obtains estimates in the presence of constraints through iterated seemingly unrelated regression. When constraints() is not specified, the estimates are obtained using OLS, and noisure has no effect. For this reason, noisure can be specified only when constraints() is specified.

dfk specifies that a small-sample degrees-of-freedom adjustment be used when estimating Σ, the error variance–covariance matrix. Specifically, 1/(T − m) is used instead of the large-sample divisor 1/T, where m is the average number of parameters in the functional form for $\mathbf{y}_t$ over the K equations.

small causes var to report small-sample t and F statistics instead of the large-sample normal and chi-squared statistics.

nobigf requests that var not save the estimated parameter vector that incorporates coefficients that have been implicitly constrained to be zero, such as when some lags have been omitted from a model. e(bf) is used for computing asymptotic standard errors in the postestimation commands irf create and fcast compute; see [TS] irf create and [TS] fcast compute. Therefore, specifying nobigf implies that the asymptotic standard errors will not be available from irf create and fcast compute. See Fitting models with some lags excluded.

Reporting

level(#); see [R] estimation options.

lutstats specifies that the Lütkepohl (2005) versions of the lag-order selection statistics be reported. See Methods and formulas in [TS] varsoc for a discussion of these statistics.

nocnsreport; see [R] estimation options.

display_options: noci, nopvalues, vsquish, cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.

The following option is available with var but is not shown in the dialog box:

coeflegend; see [R] estimation options.
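The Model 2 options come into play only for constrained estimation; a minimal sketch combining them (the variables y1 and y2 are hypothetical):

. constraint 1 [y1]L2.y2 = 0
. var y1 y2, lags(1/2) constraints(1) iterate(500) tolerance(1e-8)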
Remarks and examples

Remarks are presented under the following headings:
    Introduction
    Fitting models with some lags excluded
    Fitting models with exogenous variables
    Fitting models with constraints on the coefficients

Introduction

A VAR is a model in which K variables are specified as linear functions of p of their own lags, p lags of the other K − 1 variables, and possibly exogenous variables. A VAR with p lags is usually denoted a VAR(p). For more information, see [TS] var intro.

Example 1: VAR model

To illustrate the basic usage of var, we replicate the example in Lütkepohl (2005, 77–78). The data consist of three variables: the first difference of the natural log of investment, dln_inv; the first difference of the natural log of income, dln_inc; and the first difference of the natural log of consumption, dln_consump. The dataset contains data through the fourth quarter of 1982, though Lütkepohl uses only the observations through the fourth quarter of 1978.

. use http://www.stata-press.com/data/r14/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. tsset
        time variable:  qtr, 1960q1 to 1982q4
                delta:  1 quarter
. var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lutstats dfk
Vector autoregression
Sample: 1960q4 - 1978q4                     Number of obs   =         73
Log likelihood =    606.307   (lutstats)    AIC             =  -24.63163
FPE            =   2.18e-11                 HQIC            =  -24.40656
Det(Sigma_ml)  =   1.23e-11                 SBIC            =  -24.06686

Equation        Parms     RMSE     R-sq      chi2     P>chi2
dln_inv           7      .046148   0.1286   9.736909   0.1362
dln_inc           7      .011719   0.1142   8.508289   0.2032
dln_consump       7      .009445   0.2513   22.15096   0.0011

                   Coef.  Std. Err.      z    P>|z|    [95% Conf. Interval]

dln_inv
     dln_inv
         L1.   -.3196318   .1254564   -2.55   0.011    -.5655218   -.0737419
         L2.   -.1605508   .1249066   -1.29   0.199    -.4053633    .0842616
     dln_inc
         L1.    .1459851   .5456664    0.27   0.789    -.9235013    1.215472
         L2.    .1146009   .5345709    0.21   0.830    -.9331388    1.162341
 dln_consump
         L1.    .9612288   .6643086    1.45   0.148    -.3407922     2.26325
         L2.    .9344001   .6650949    1.40   0.160     -.369162    2.237962
       _cons   -.0167221   .0172264   -0.97   0.332    -.0504852    .0170409

dln_inc
     dln_inv
         L1.    .0439309   .0318592    1.38   0.168     -.018512    .1063739
         L2.    .0500302   .0317196    1.58   0.115    -.0121391    .1121995
     dln_inc
         L1.   -.1527311   .1385702   -1.10   0.270    -.4243237    .1188615
         L2.    .0191634   .1357525    0.14   0.888    -.2469067    .2852334
 dln_consump
         L1.    .2884992    .168699    1.71   0.087    -.0421448    .6191431
         L2.      -.0102   .1688987   -0.06   0.952    -.3412354    .3208353
       _cons    .0157672   .0043746    3.60   0.000     .0071932    .0243412

dln_consump
     dln_inv
         L1.    -.002423   .0256763   -0.09   0.925    -.0527476    .0479016
         L2.    .0338806   .0255638    1.33   0.185    -.0162235    .0839847
     dln_inc
         L1.    .2248134   .1116778    2.01   0.044      .005929    .4436978
         L2.    .3549135   .1094069    3.24   0.001     .1404798    .5693471
 dln_consump
         L1.   -.2639695   .1359595   -1.94   0.052    -.5304451    .0025062
         L2.   -.0222264   .1361204   -0.16   0.870    -.2890175    .2445646
       _cons    .0129258   .0035256    3.67   0.000     .0060157    .0198358

The output has two parts: a header and the standard Stata output table for the coefficients, standard errors, and confidence intervals. The header contains summary statistics for each equation in the VAR and statistics used in selecting the lag order of the VAR. Although there are standard formulas for all the lag-order statistics, Lütkepohl (2005) gives different versions of the three information criteria that drop the constant term from the likelihood. To obtain the Lütkepohl (2005) versions, we specified the lutstats option. The formulas for the standard and Lütkepohl versions of these statistics are given in Methods and formulas of [TS] varsoc.

The dfk option specifies that the small-sample divisor 1/(T − m) be used in estimating Σ instead of the maximum likelihood (ML) divisor 1/T, where m is the average number of parameters included in each of the K equations. All the lag-order statistics are computed using the ML estimator of Σ. Thus, specifying dfk will not change the computed lag-order statistics, but it will change the estimated variance–covariance matrix. Also, when dfk is specified, a dfk-adjusted log likelihood is computed and stored in e(ll_dfk).

The lag() option takes a numlist of lags. To specify a model that includes the first and second lags, type

. var y1 y2 y3, lags(1/2)

not

. var y1 y2 y3, lags(2)

because the latter specification would fit a model that included only the second lag.
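Before settling on the two default lags, one might let varsoc suggest the lag order; a minimal sketch continuing the example (varsoc and its lutstats option are documented in [TS] varsoc):

. varsoc dln_inv dln_inc dln_consump if qtr<=tq(1978q4), maxlag(4) lutstats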
Fitting models with some lags excluded

To fit a model that has only a fourth lag, that is,

$$\mathbf{y}_t = \mathbf{v} + \mathbf{A}_4\mathbf{y}_{t-4} + \mathbf{u}_t$$

you would specify the lags(4) option. Doing so is equivalent to fitting the more general model

$$\mathbf{y}_t = \mathbf{v} + \mathbf{A}_1\mathbf{y}_{t-1} + \mathbf{A}_2\mathbf{y}_{t-2} + \mathbf{A}_3\mathbf{y}_{t-3} + \mathbf{A}_4\mathbf{y}_{t-4} + \mathbf{u}_t$$

with $\mathbf{A}_1$, $\mathbf{A}_2$, and $\mathbf{A}_3$ constrained to be 0.

When you fit a model with some lags excluded, var estimates the coefficients included in the specification ($\mathbf{A}_4$ here) and stores these estimates in e(b). To obtain the asymptotic standard errors for impulse–response functions and other postestimation statistics, Stata needs the complete set of parameter estimates, including those that are constrained to be zero; var stores them in e(bf). Because you can specify models for which the full set of parameter estimates exceeds Stata's limit on the size of matrices, the nobigf option specifies that var not compute and store e(bf). This means that the asymptotic standard errors of the postestimation functions cannot be obtained, although bootstrap standard errors are still available. Building e(bf) can be time consuming, so if you do not need this full matrix, and speed is an issue, use nobigf.

Fitting models with exogenous variables

Example 2: VAR model with exogenous variables

We use the exog() option to include exogenous variables in a VAR.

. var dln_inc dln_consump if qtr<=tq(1978q4), dfk exog(dln_inv)
Vector autoregression
Sample: 1960q4 - 1978q4                     Number of obs   =         73
Log likelihood =   478.5663                 AIC             =  -12.78264
FPE            =   9.64e-09                 HQIC            =  -12.63259
Det(Sigma_ml)  =   6.93e-09                 SBIC            =  -12.40612

Equation        Parms     RMSE     R-sq      chi2     P>chi2
dln_inc           6      .011917   0.0702   5.059587   0.4087
dln_consump       6      .009197   0.2794   25.97262   0.0001

                   Coef.  Std. Err.      z    P>|z|    [95% Conf. Interval]

dln_inc
     dln_inc
         L1.   -.1343345   .1391074   -0.97   0.334    -.4069801    .1383111
         L2.    .0120331   .1380346    0.09   0.931    -.2585097    .2825759
 dln_consump
         L1.    .3235342   .1652769    1.96   0.050    -.0004027     .647471
         L2.    .0754177   .1648624    0.46   0.647    -.2477066     .398542
     dln_inv    .0151546   .0302319    0.50   0.616    -.0440987     .074408
       _cons    .0145136   .0043815    3.31   0.001     .0059259    .0231012

dln_consump
     dln_inc
         L1.    .2425719   .1073561    2.26   0.024     .0321578     .452986
         L2.    .3487949   .1065281    3.27   0.001     .1400036    .5575862
 dln_consump
         L1.   -.3119629   .1275524   -2.45   0.014    -.5619611   -.0619648
         L2.   -.0128502   .1272325   -0.10   0.920    -.2622213    .2365209
     dln_inv    .0503616   .0233314    2.16   0.031     .0046329    .0960904
       _cons    .0131013   .0033814    3.87   0.000     .0064738    .0197288

All the postestimation commands for analyzing VARs work when exogenous variables are included in a model, but the asymptotic standard errors for the h-step-ahead forecasts are not available.
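Even with exogenous variables in the model, the standard one-step predictions remain available; a minimal sketch continuing the example (the new variable name is hypothetical):

. predict inc_hat, equation(dln_inc)
. tsline dln_inc inc_hat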
Fitting models with constraints on the coefficients

var permits model specifications that include constraints on the coefficients, though var does not allow for constraints on Σ. See [TS] var intro and [TS] var svar for ways to constrain Σ.

Example 3: VAR model with constraints

In the first example, we fit a full VAR(2) to a three-equation model. The coefficients in the equation for dln_inv were jointly insignificant, as were the coefficients in the equation for dln_inc; and many individual coefficients were not significantly different from zero. In this example, we constrain the coefficient on L2.dln_inc in the equation for dln_inv and the coefficient on L2.dln_consump in the equation for dln_inc to be zero.

. constraint 1 [dln_inv]L2.dln_inc = 0
. constraint 2 [dln_inc]L2.dln_consump = 0
. var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lutstats dfk
>     constraints(1 2)
Estimating VAR coefficients
Iteration 1:   tolerance = .00737681
Iteration 2:   tolerance = 3.998e-06
Iteration 3:   tolerance = 2.730e-09
Vector autoregression
Sample: 1960q4 - 1978q4                     Number of obs   =         73
Log likelihood =   606.2804   (lutstats)    AIC             =  -31.69254
FPE            =   1.77e-14                 HQIC            =  -31.46747
Det(Sigma_ml)  =   1.05e-14                 SBIC            =  -31.12777

Equation        Parms     RMSE     R-sq      chi2     P>chi2
dln_inv           6      .043895   0.1280   9.842338   0.0798
dln_inc           6      .011143   0.1141   8.584446   0.1268
dln_consump       7      .008981   0.2512   22.86958   0.0008

( 1)  [dln_inv]L2.dln_inc = 0
( 2)  [dln_inc]L2.dln_consump = 0

                   Coef.  Std. Err.      z    P>|z|    [95% Conf. Interval]

dln_inv
     dln_inv
         L1.    -.320713   .1247512   -2.57   0.010    -.5652208   -.0762051
         L2.   -.1607084    .124261   -1.29   0.196    -.4042555    .0828386
     dln_inc
         L1.    .1195448   .5295669    0.23   0.821    -.9183873    1.157477
         L2.    5.66e-19   9.33e-18    0.06   0.952    -1.77e-17    1.89e-17
 dln_consump
         L1.    1.009281    .623501    1.62   0.106    -.2127586    2.231321
         L2.    1.008079   .5713486    1.76   0.078    -.1117438    2.127902
       _cons   -.0162102    .016893   -0.96   0.337    -.0493199    .0168995

dln_inc
     dln_inv
         L1.    .0435712   .0309078    1.41   0.159     -.017007    .1041495
         L2.    .0496788   .0306455    1.62   0.105    -.0103852    .1097428
     dln_inc
         L1.   -.1555119   .1315854   -1.18   0.237    -.4134146    .1023908
         L2.    .0122353   .1165811    0.10   0.916    -.2162595    .2407301
 dln_consump
         L1.      .29286   .1568345    1.87   0.062      -.01453    .6002501
         L2.   -1.53e-18   1.89e-17   -0.08   0.935    -3.85e-17    3.55e-17
       _cons     .015689    .003819    4.11   0.000     .0082039    .0231741

dln_consump
     dln_inv
         L1.   -.0026229   .0253538   -0.10   0.918    -.0523154    .0470696
         L2.    .0337245   .0252113    1.34   0.181    -.0156888    .0831378
     dln_inc
         L1.    .2224798   .1094349    2.03   0.042     .0079912    .4369683
         L2.    .3469758   .1006026    3.45   0.001     .1497984    .5441532
 dln_consump
         L1.   -.2600227   .1321622   -1.97   0.049     -.519056   -.0009895
         L2.   -.0146825   .1117618   -0.13   0.895    -.2337315    .2043666
       _cons    .0129149    .003376    3.83   0.000     .0062981    .0195317

None of the free parameter estimates changed by much. Whereas the coefficients in the equation for dln_inv are now significant at the 10% level, the coefficients in the equation for dln_inc remain jointly insignificant.
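The joint-significance statements above can be checked with varwle, which reports Wald lag-exclusion statistics for each equation after var; a minimal sketch continuing the example:

. varwle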
Stored results

var stores the following in e():

Scalars
  e(N)              number of observations
  e(N_gaps)         number of gaps in sample
  e(k)              number of parameters
  e(k_eq)           number of equations in e(b)
  e(k_dv)           number of dependent variables
  e(df_eq)          average number of parameters in an equation
  e(df_m)           model degrees of freedom
  e(df_r)           residual degrees of freedom (small only)
  e(ll)             log likelihood
  e(ll_dfk)         dfk adjusted log likelihood (dfk only)
  e(obs_#)          number of observations on equation #
  e(k_#)            number of parameters in equation #
  e(df_m#)          model degrees of freedom for equation #
  e(df_r#)          residual degrees of freedom for equation # (small only)
  e(r2_#)           R-squared for equation #
  e(ll_#)           log likelihood for equation #
  e(chi2_#)         χ² for equation #
  e(F_#)            F statistic for equation # (small only)
  e(rmse_#)         root mean squared error for equation #
  e(aic)            Akaike information criterion
  e(hqic)           Hannan–Quinn information criterion
  e(sbic)           Schwarz–Bayesian information criterion
  e(fpe)            final prediction error
  e(mlag)           highest lag in VAR
  e(tmin)           first time period in sample
  e(tmax)           maximum time
  e(detsig)         determinant of e(Sigma)
  e(detsig_ml)      determinant of the ML estimate of Σ
  e(rank)           rank of e(V)

Macros
  e(cmd)            var
  e(cmdline)        command as typed
  e(depvar)         names of dependent variables
  e(endog)          names of endogenous variables, if specified
  e(exog)           names of exogenous variables, and their lags, if specified
  e(exogvars)       names of exogenous variables, if specified
  e(eqnames)        names of equations
  e(lags)           lags in model
  e(exlags)         lags of exogenous variables in model, if specified
  e(title)          title in estimation output
  e(nocons)         nocons, if noconstant is specified
  e(constraints)    constraints, if specified
  e(cnslist_var)    list of specified constraints
  e(small)          small, if specified
  e(lutstats)       lutstats, if specified
  e(timevar)        time variable specified in tsset
  e(tsfmt)          format for the current time variable
  e(dfk)            dfk, if specified
  e(properties)     b V
  e(predict)        program used to implement predict
  e(marginsok)      predictions allowed by margins
  e(marginsnotok)   predictions disallowed by margins
  e(marginsdefault) default predict() specification for margins

Matrices
  e(b)              coefficient vector
  e(Cns)            constraints matrix
  e(Sigma)          estimated Σ matrix
  e(V)              variance–covariance matrix of the estimators
  e(bf)             constrained coefficient vector
  e(exlagsm)        matrix mapping lags to exogenous variables
  e(G)              Gamma matrix; see Methods and formulas

Functions
  e(sample)         marks estimation sample

Methods and formulas

When there are no constraints placed on the coefficients, the VAR(p) is a seemingly unrelated regression model with the same explanatory variables in each equation. As discussed in Lütkepohl (2005) and Greene (2008, 696), performing linear regression on each equation produces the maximum likelihood estimates of the coefficients. The estimated coefficients can then be used to calculate the residuals, which in turn are used to estimate the cross-equation error variance–covariance matrix Σ.

Per Lütkepohl (2005), we write the VAR(p) with exogenous variables as

$$\mathbf{y}_t = \mathbf{A}\mathbf{Y}_{t-1} + \mathbf{B}_0\mathbf{x}_t + \mathbf{u}_t \tag{5}$$

where $\mathbf{y}_t$ is the $K \times 1$ vector of endogenous variables, $\mathbf{A}$ is a $K \times Kp$ matrix of coefficients, $\mathbf{B}_0$ is a $K \times M$ matrix of coefficients, $\mathbf{x}_t$ is the $M \times 1$ vector of exogenous variables, $\mathbf{u}_t$ is the $K \times 1$ vector of white noise innovations, and $\mathbf{Y}_t$ is the $Kp \times 1$ matrix given by

$$\mathbf{Y}_t = \begin{pmatrix} \mathbf{y}_t \\ \vdots \\ \mathbf{y}_{t-p+1} \end{pmatrix}$$

Although (5) is easier to read, the formulas are much easier to manipulate if it is instead written as
$$\mathbf{Y} = \mathbf{B}\mathbf{Z} + \mathbf{U}$$

where

$$\mathbf{Y} = (\mathbf{y}_1, \ldots, \mathbf{y}_T) \qquad \mathbf{Y} \text{ is } K \times T$$
$$\mathbf{B} = (\mathbf{A}, \mathbf{B}_0) \qquad \mathbf{B} \text{ is } K \times (Kp + M)$$
$$\mathbf{Z} = \begin{pmatrix} \mathbf{Y}_0, & \ldots, & \mathbf{Y}_{T-1} \\ \mathbf{x}_1, & \ldots, & \mathbf{x}_T \end{pmatrix} \qquad \mathbf{Z} \text{ is } (Kp + M) \times T$$
$$\mathbf{U} = (\mathbf{u}_1, \ldots, \mathbf{u}_T) \qquad \mathbf{U} \text{ is } K \times T$$

Intercept terms in the model are included in $\mathbf{x}_t$. If there are no exogenous variables and no intercept terms in the model, $\mathbf{x}_t$ is empty.

The coefficients are estimated by iterated seemingly unrelated regression. Because the estimation is actually performed by reg3, the methods are documented in [R] reg3. See [P] makecns for more on estimation with constraints.

Let $\widehat{\mathbf{U}}$ be the matrix of residuals that are obtained via $\mathbf{Y} - \widehat{\mathbf{B}}\mathbf{Z}$, where $\widehat{\mathbf{B}}$ is the matrix of estimated coefficients. Then the estimator of Σ is

$$\widehat{\boldsymbol\Sigma} = \frac{1}{\widetilde{T}}\,\widehat{\mathbf{U}}\widehat{\mathbf{U}}'$$

By default, the maximum likelihood divisor of $\widetilde{T} = T$ is used. When dfk is specified, a small-sample degrees-of-freedom adjustment is used; then, $\widetilde{T} = T - m$, where m is the average number of parameters per equation in the functional form for $\mathbf{y}_t$ over the K equations.

small specifies that Wald tests after var be assumed to have F or t distributions instead of chi-squared or standard normal distributions. The standard errors from each equation are computed using the degrees of freedom for the equation.

The "gamma" matrix stored in e(G) referred to in Stored results is the $(Kp + 1) \times (Kp + 1)$ matrix given by

$$\frac{1}{T}\sum_{t=1}^{T}(1, \mathbf{Y}_t')'(1, \mathbf{Y}_t')$$

The formulas for the lag-order selection criteria and the log likelihood are discussed in [TS] varsoc.

Acknowledgment

We thank Christopher F. Baum of the Department of Economics at Boston College and author of the Stata Press books An Introduction to Modern Econometrics Using Stata and An Introduction to Stata Programming for his helpful comments.

References

Box-Steffensmeier, J. M., J. R. Freeman, M. P. Hitt, and J. C. W. Pevehouse. 2014. Time Series Analysis for the Social Sciences. New York: Cambridge University Press.
Greene, W. H. 2008. Econometric Analysis. 6th ed. Upper Saddle River, NJ: Prentice Hall.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Lütkepohl, H. 1993. Introduction to Multiple Time Series Analysis. 2nd ed. New York: Springer.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.
Stock, J. H., and M. W. Watson. 2001. Vector autoregressions. Journal of Economic Perspectives 15: 101–115.
Watson, M. W. 1994. Vector autoregressions and cointegration. In Vol. 4 of Handbook of Econometrics, ed. R. F. Engle and D. L. McFadden. Amsterdam: Elsevier.
Also see
[TS] var postestimation — Postestimation tools for var
[TS] tsset — Declare data to be time-series data
[TS] dfactor — Dynamic-factor models
[TS] forecast — Econometric model forecasting
[TS] mgarch — Multivariate GARCH models
[TS] sspace — State-space models
[TS] var intro — Introduction to vector autoregressive models
[TS] var svar — Structural vector autoregressive models
[TS] varbasic — Fit a simple VAR and graph IRFs or FEVDs
[TS] vec — Vector error-correction models
[U] 20 Estimation and postestimation commands

Title
var postestimation — Postestimation tools for var

Postestimation commands    predict    margins    Remarks and examples    Methods and formulas    Also see

Postestimation commands

The following postestimation commands are of special interest after var:

Command          Description
fcast compute    obtain dynamic forecasts
fcast graph      graph dynamic forecasts obtained from fcast compute
irf              create and analyze IRFs and FEVDs
vargranger       Granger causality tests
varlmar          LM test for autocorrelation in residuals
varnorm          test for normally distributed residuals
varsoc           lag-order selection criteria
varstable        check stability condition of estimates
varwle           Wald lag-exclusion statistics

The following standard postestimation commands are also available:

Command           Description
estat ic          Akaike's and Schwarz's Bayesian information criteria (AIC and BIC)
estat summarize   summary statistics for the estimation sample
estat vce         variance–covariance matrix of the estimators (VCE)
estimates         cataloging estimation results
forecast          dynamic forecasts and simulations
lincom            point estimates, standard errors, testing, and inference for linear combinations of coefficients
lrtest            likelihood-ratio test
margins           marginal means, predictive margins, marginal effects, and average marginal effects
marginsplot       graph the results from margins (profile plots, interaction plots, etc.)
nlcom             point estimates, standard errors, testing, and inference for nonlinear combinations of coefficients
predict           predictions, residuals, influence statistics, and other diagnostic measures
predictnl         point estimates, standard errors, testing, and inference for generalized predictions
test              Wald tests of simple and composite linear hypotheses
testnl            Wald tests of nonlinear hypotheses

predict

Description for predict

predict creates a new variable containing predictions such as linear predictions and residuals.

Menu for predict

Statistics > Postestimation

Syntax for predict

    predict [type] newvar [if] [in] [, statistic equation(eqno | eqname)]

statistic     Description
Main
  xb            linear prediction; the default
  stdp          standard error of the linear prediction
  residuals     residuals

These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for the estimation sample.

Options for predict

Main

xb, the default, calculates the linear prediction for the specified equation.

stdp calculates the standard error of the linear prediction for the specified equation.

residuals calculates the residuals.

equation(eqno | eqname) specifies the equation to which you are referring. equation() is filled in with one eqno or eqname for options xb, stdp, and residuals. For example, equation(#1) would mean that the calculation is to be made for the first equation, equation(#2) would mean the second, and so on. You could also refer to the equation by its name; thus, equation(income) would refer to the equation named income and equation(hours), to the equation named hours.

If you do not specify equation(), the results are the same as if you specified equation(#1).

For more information on using predict after multiple-equation estimation commands, see [R] predict.

margins

Description for margins

margins estimates margins of response for linear predictions.

Menu for margins

Statistics > Postestimation

Syntax for margins

    margins [marginlist] [, options]
    margins [marginlist], predict(statistic ...) [predict(statistic ...) ...] [options]

statistic     Description
  default       linear predictions for each equation
  xb            linear prediction for a specified equation
  stdp          not allowed with margins
  residuals     not allowed with margins

xb defaults to the first equation.

Statistics not allowed with margins are functions of stochastic quantities other than e(b).

For the full syntax, see [R] margins.

Remarks and examples

Remarks are presented under the following headings:
    Model selection and inference
    Forecasting

Model selection and inference

See the following sections for information on model selection and inference after var.
[TS] irf — Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
[TS] vargranger — Perform pairwise Granger causality tests after var or svar
[TS] varlmar — Perform LM test for residual autocorrelation after var or svar
[TS] varnorm — Test for normally distributed disturbances after var or svar
[TS] varsoc — Obtain lag-order selection statistics for VARs and VECMs
[TS] varstable — Check the stability condition of VAR or SVAR estimates
[TS] varwle — Obtain Wald lag-exclusion statistics after var or svar

Forecasting

Two types of forecasts are available after you fit a VAR(p): a one-step-ahead forecast and a dynamic h-step-ahead forecast.

The one-step-ahead forecast produces a prediction of the value of an endogenous variable in the current period by using the estimated coefficients, the past values of the endogenous variables, and any exogenous variables. If you include contemporaneous values of exogenous variables in your model, you must have observations on the exogenous variables that are contemporaneous with the period in which the prediction is being made to compute the prediction. In Stata terms, these one-step-ahead predictions are just the standard linear predictions available after any estimation command. Thus predict, xb eq(eqno | eqname) produces one-step-ahead forecasts for the specified equation. predict, stdp eq(eqno | eqname) produces the standard error of the linear prediction for the specified equation. The standard error of the forecast includes an estimate of the variability due to innovations, whereas the standard error of the linear prediction does not.

The dynamic h-step-ahead forecast begins by using the estimated coefficients, the lagged values of the endogenous variables, and any exogenous variables to predict one step ahead for each endogenous variable. Those one-step-ahead forecasts are then used to produce two-step-ahead forecasts for each endogenous variable, and the process continues for h periods. Because each step uses the predictions of the previous steps, these forecasts are known as dynamic forecasts.

See the following sections for information on obtaining forecasts after var:
[TS] fcast compute — Compute dynamic forecasts after var, svar, or vec
[TS] fcast graph — Graph forecasts after fcast compute

Methods and formulas

Formulas for predict

predict with the xb option provides the one-step-ahead forecast, as the short sketch below illustrates.
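The following sketch is ours rather than the original entry's; it assumes the lutkepohl2 dataset from [TS] var and uses arbitrary new variable names (xb1, se1, res1). It verifies that, within the estimation sample, the linear prediction and the residual for an equation sum to the observed value.

. use http://www.stata-press.com/data/r14/lutkepohl2
. quietly var dln_inv dln_inc dln_consump if qtr<=tq(1978q4)
. predict xb1 if e(sample), xb equation(dln_inv)
. predict se1 if e(sample), stdp equation(dln_inv)
. predict res1 if e(sample), residuals equation(dln_inv)
. generate double check = dln_inv - (xb1 + res1)
. summarize check se1

The mean and extremes of check should be zero up to machine precision.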
If exogenous variables are specified, the forecast is conditional on the exogenous xt variables. Specifying the residuals option causes predict to calculate the errors of the one-step-ahead forecasts. Specifying the stdp option causes predict to calculate the standard errors of the one-step-ahead forecasts. Also see [TS] var — Vector autoregressive models [U] 20 Estimation and postestimation commands Title var svar — Structural vector autoregressive models Description Options Acknowledgment Quick start Remarks and examples References Menu Stored results Also see Syntax Methods and formulas Description svar fits a vector autoregressive model subject to short- or long-run constraints you place on the resulting impulse–response functions (IRFs). Economic theory typically motivates the constraints, allowing a causal interpretation of the IRFs to be made. See [TS] var intro for a list of commands that are used in conjunction with svar. Quick start Structural VAR for y1, y2, and y3 using tsset data with short-run constraints on impulse responses given by predefined matrices A and B svar y1 y2 y3, aeq(A) beq(B) Structural VAR for y1, y2, and y3 with long-run constraint on impulse responses given by the predefined matrix C svar y1 y2 y3, lreq(C) Add exogenous variables x1 and x2 svar y1 y2 y3, lreq(C) exog(x1 x2) As above, but include third and fourth lags of the dependent variables instead of first and second svar y1 y2 y3, lreq(C) exog(x1 x2) lags(3 4) Menu Statistics > Multivariate time series > Structural vector autoregression (SVAR) Syntax Short-run constraints      svar depvarlist if in , aconstraints(constraintsa ) aeq(matrixaeq ) acns(matrixacns ) bconstraints(constraintsb ) beq(matrixbeq ) bcns(matrixbcns )   short run options Long-run constraints      svar depvarlist if in , lrconstraints(constraintslr ) lreq(matrixlreq )   lrcns(matrixlrcns ) long run options 756 var svar — Structural vector autoregressive models short run options 757 Description Model noconstant aconstraints(constraintsa ) ∗ aeq(matrixaeq ) ∗ acns(matrixacns ) ∗ ∗ bconstraints(constraintsb ) beq(matrixbeq ) ∗ bcns(matrixbcns ) lags(numlist) ∗ suppress constant term apply previously defined constraintsa to A define and apply to A equality constraint matrix matrixaeq define and apply to A cross-parameter constraint matrix matrixacns apply previously defined constraintsb to B define and apply to B equality constraint matrix matrixbeq define and apply to B cross-parameter constraint matrixbcns use lags numlist in the underlying VAR Model 2 exog(varlistexog ) varconstraints(constraintsv ) noislog isiterate(#) istolerance(#) noisure dfk small noidencheck nobigf use exogenous variables varlist apply constraintsv to underlying VAR suppress SURE iteration log set maximum number of iterations for SURE; default is isiterate(1600) set convergence tolerance of SURE use one-step SURE make small-sample degrees-of-freedom adjustment report small-sample t and F statistics do not check for local identification do not compute parameter vector for coefficients implicitly set to zero Reporting level(#) full var lutstats nocnsreport display options set confidence level; default is level(95) show constrained parameters in table display underlying var output report Lütkepohl lag-order selection statistics do not display constraints control columns and column formats Maximization maximize options control the maximization process; seldom used coeflegend display legend instead of statistics ∗ aconstraints(constraintsa ), aeq(matrixaeq ), 
acns(matrixacns ), bconstraints(constraintsb ), beq(matrixbeq ), bcns(matrixbcns ): at least one of these options must be specified. coeflegend does not appear in the dialog box. 758 var svar — Structural vector autoregressive models long run options Description Model noconstant lrconstraints(constraintslr ) ∗ lreq(matrixlreq ) ∗ lrcns(matrixlrcns ) ∗ lags(numlist) suppress constant term apply previously defined constraintslr to C define and apply to C equality constraint matrix matrixlreq define and apply to C cross-parameter constraint matrix matrixlrcns use lags numlist in the underlying VAR Model 2 exog(varlistexog ) varconstraints(constraintsv ) noislog isiterate(#) istolerance(#) noisure dfk small noidencheck nobigf use exogenous variables varlist apply constraintsv to underlying VAR suppress SURE iteration log set maximum number of iterations for SURE; default is isiterate(1600) set convergence tolerance of SURE use one-step SURE make small-sample degrees-of-freedom adjustment report small-sample t and F statistics do not check for local identification do not compute parameter vector for coefficients implicitly set to zero Reporting level(#) full var lutstats nocnsreport display options set confidence level; default is level(95) show constrained parameters in table display underlying var output report Lütkepohl lag-order selection statistics do not display constraints control columns and column formats Maximization maximize options control the maximization process; seldom used coeflegend display legend instead of statistics ∗ lrconstraints(constraintslr ), lreq(matrixlreq ), lrcns(matrixlrcns ): at least one of these options must be specified. coeflegend does not appear in the dialog box. You must tsset your data before using svar; see [TS] tsset. depvarlist and varlistexog may contain time-series operators; see [U] 11.4.4 Time-series varlists. by, fp, rolling, statsby, and xi are allowed; see [U] 11.1.10 Prefix commands. See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands. var svar — Structural vector autoregressive models 759 Options   Model noconstant; see [R] estimation options. aconstraints(constraintsa ), aeq(matrixaeq ), acns(matrixacns ) bconstraints(constraintsb ), beq(matrixbeq ), bcns(matrixbcns ) These options specify the short-run constraints in an SVAR. To specify a short-run SVAR model, you must specify at least one of these options. The first list of options specifies constraints on the parameters of the A matrix; the second list specifies constraints on the parameters of the B matrix (see Short-run SVAR models). If at least one option is selected from the first list and none are selected from the second list, svar sets B to the identity matrix. Similarly, if at least one option is selected from the second list and none are selected from the first list, svar sets A to the identity matrix. None of these options may be specified with any of the options that define long-run constraints. aconstraints(constraintsa ) specifies a numlist of previously defined Stata constraints to be applied to A during estimation. aeq(matrixaeq ) specifies a matrix that defines a set of equality constraints. This matrix must be square with dimension equal to the number of equations in the underlying VAR. The elements of this matrix must be missing or real numbers. A missing value in the (i, j ) element of this matrix specifies that the (i, j ) element of A is a free parameter. 
A real number in the (i, j) element of this matrix constrains the (i, j) element of A to this real number. For example,

    A = \begin{pmatrix} 1 & 0 \\ . & 1.5 \end{pmatrix}

specifies that A[1,1] = 1, A[1,2] = 0, A[2,2] = 1.5, and A[2,1] is a free parameter.

acns(matrixacns) specifies a matrix that defines a set of exclusion or cross-parameter equality constraints on A. This matrix must be square with dimension equal to the number of equations in the underlying VAR. Each element of this matrix must be missing, 0, or a positive integer. A missing value in the (i, j) element of this matrix specifies that no constraint be placed on this element of A. A zero in the (i, j) element of this matrix constrains the (i, j) element of A to be zero. Any strictly positive integers must be in two or more elements of this matrix. A strictly positive integer in the (i, j) element of this matrix constrains the (i, j) element of A to be equal to all the other elements of A that correspond to elements in this matrix that contain the same integer. For example, consider the matrix

    A = \begin{pmatrix} . & 1 \\ 1 & 0 \end{pmatrix}

Specifying acns(A) in a two-equation SVAR constrains A[2,1] = A[1,2] and A[2,2] = 0 while leaving A[1,1] free.

bconstraints(constraintsb) specifies a numlist of previously defined Stata constraints to be applied to B during estimation.

beq(matrixbeq) specifies a matrix that defines a set of equality constraints. This matrix must be square with dimension equal to the number of equations in the underlying VAR. The elements of this matrix must be either missing or real numbers. The syntax of implied constraints is analogous to the one described in aeq(), except that it applies to B rather than to A.

bcns(matrixbcns) specifies a matrix that defines a set of exclusion or cross-parameter equality constraints on B. This matrix must be square with dimension equal to the number of equations in the underlying VAR. Each element of this matrix must be missing, 0, or a positive integer. The format of the implied constraints is the same as the one described in the acns() option above.

lrconstraints(constraintslr), lreq(matrixlreq), lrcns(matrixlrcns)

These options specify the long-run constraints in an SVAR. To specify a long-run SVAR model, you must specify at least one of these options. The list of options specifies constraints on the parameters of the long-run C matrix (see Long-run SVAR models for the definition of C). None of these options may be specified with any of the options that define short-run constraints.

lrconstraints(constraintslr) specifies a numlist of previously defined Stata constraints to be applied to C during estimation.

lreq(matrixlreq) specifies a matrix that defines a set of equality constraints on the elements of C. This matrix must be square with dimension equal to the number of equations in the underlying VAR. The elements of this matrix must be either missing or real numbers. The syntax of implied constraints is analogous to the one described in option aeq(), except that it applies to C.

lrcns(matrixlrcns) specifies a matrix that defines a set of exclusion or cross-parameter equality constraints on C. This matrix must be square with dimension equal to the number of equations in the underlying VAR. Each element of this matrix must be missing, 0, or a positive integer. The syntax of the implied constraints is the same as the one described for the acns() option above.

lags(numlist) specifies the lags to be included in the underlying VAR model. The default is lags(1 2).
This option takes a numlist and not simply an integer for the maximum lag. For instance, lags(2) would include only the second lag in the model, whereas lags(1/2) would include both the first and second lags in the model. See [U] 11.1.8 numlist and [U] 11.4.4 Time-series varlists for further discussion of numlists and lags.   Model 2 exog(varlistexog ) specifies a list of exogenous variables to be included in the underlying VAR. varconstraints(constraintsv ) specifies a list of constraints to be applied to the coefficients in the underlying VAR. Because svar estimates multiple equations, the constraints must specify the equation name for all but the first equation. noislog prevents svar from displaying the iteration log from the iterated seemingly unrelated regression algorithm. When the varconstraints() option is not specified, the VAR coefficients are estimated via OLS, a noniterative procedure. As a result, noislog may be specified only with varconstraints(). Similarly, noislog may not be combined with noisure. isiterate(#) sets the maximum number of iterations for the iterated seemingly unrelated regression algorithm. The default limit is 1,600. When the varconstraints() option is not specified, the VAR coefficients are estimated via OLS, a noniterative procedure. As a result, isiterate() may be specified only with varconstraints(). Similarly, isiterate() may not be combined with noisure. istolerance(#) specifies the convergence tolerance of the iterated seemingly unrelated regression algorithm. The default tolerance is 1e-6. When the varconstraints() option is not specified, the VAR coefficients are estimated via OLS, a noniterative procedure. As a result, istolerance() may be specified only with varconstraints(). Similarly, istolerance() may not be combined with noisure. var svar — Structural vector autoregressive models 761 noisure specifies that the VAR coefficients be estimated via one-step seemingly unrelated regression when varconstraints() is specified. By default, svar estimates the coefficients in the VAR via iterated seemingly unrelated regression when varconstraints() is specified. When the varconstraints() option is not specified, the VAR coefficient estimates are obtained via OLS, a noniterative procedure. As a result, noisure may be specified only with varconstraints(). dfk specifies that a small-sample degrees-of-freedom adjustment be used when estimating Σ, the covariance matrix of the VAR disturbances. Specifically, 1/(T − m) is used instead of the largesample divisor 1/T , where m is the average number of parameters in the functional form for yt over the K equations. small causes svar to calculate and report small-sample t and F statistics instead of the large-sample normal and chi-squared statistics. noidencheck requests that the Amisano and Giannini (1997) check for local identification not be performed. This check is local to the starting values used. Because of this dependence on the starting values, you may wish to suppress this check by specifying the noidencheck option. However, be careful in specifying this option. Models that are not structurally identified can still converge, thereby producing meaningless results that only appear to have meaning. nobigf requests that svar not compute the estimated parameter vector that incorporates coefficients that have been implicitly constrained to be zero, such as when some lags have been omitted from a model. 
e(bf) is used for computing asymptotic standard errors in the postestimation commands irf create and fcast compute. Therefore, specifying nobigf implies that the asymptotic standard errors will not be available from irf create and fcast compute. See Fitting models with some lags excluded in [TS] var.

Reporting

level(#); see [R] estimation options.

full shows constrained parameters in table.

var specifies that the output from var also be displayed. By default, the underlying VAR is fit quietly.

lutstats specifies that the Lütkepohl versions of the lag-order selection statistics be computed. See Methods and formulas in [TS] varsoc for a discussion of these statistics.

nocnsreport; see [R] estimation options.

display options: noci, nopvalues, cformat(%fmt), pformat(%fmt), and sformat(%fmt); see [R] estimation options.

Maximization

maximize options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace, gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#), nrtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize. These options are seldom used.

The following option is available with svar but is not shown in the dialog box:

coeflegend; see [R] estimation options.

Remarks and examples

Remarks are presented under the following headings:
    Introduction
    Short-run SVAR models
    Long-run SVAR models

Introduction

This entry assumes that you have already read [TS] var intro and [TS] var; if not, please do. Here we illustrate how to fit SVARs in Stata subject to short-run and long-run restrictions. For more detailed information on SVARs, see Amisano and Giannini (1997) and Hamilton (1994). For good introductions to VARs, see Lütkepohl (2005), Hamilton (1994), Stock and Watson (2001), and Becketti (2013).

Short-run SVAR models

A short-run SVAR model without exogenous variables can be written as

    A(I_K - A_1 L - A_2 L^2 - \cdots - A_p L^p) y_t = A \epsilon_t = B e_t

where L is the lag operator; A, B, and A_1, ..., A_p are K × K matrices of parameters; \epsilon_t is a K × 1 vector of innovations with \epsilon_t \sim N(0, \Sigma) and E[\epsilon_t \epsilon_s'] = 0_K for all s ≠ t; and e_t is a K × 1 vector of orthogonalized disturbances; that is, e_t \sim N(0, I_K) and E[e_t e_s'] = 0_K for all s ≠ t. These transformations of the innovations allow us to analyze the dynamics of the system in terms of a change to an element of e_t. In a short-run SVAR model, we obtain identification by placing restrictions on A and B, which are assumed to be nonsingular.

Example 1: Short-run just-identified SVAR model

Following Sims (1980), the Cholesky decomposition is one method of identifying the impulse–response functions in a VAR; thus, this method corresponds to an SVAR. There are several sets of constraints on A and B that are easily manipulated back to the Cholesky decomposition, and the following example illustrates this point.

One way to impose the Cholesky restrictions is to assume an SVAR model of the form

    \widetilde{A}(I_K - A_1 L - A_2 L^2 - \cdots - A_p L^p) y_t = \widetilde{B} e_t

where \widetilde{A} is a lower triangular matrix with ones on the diagonal and \widetilde{B} is a diagonal matrix. Because the P matrix for this model is P_{sr} = \widetilde{A}^{-1}\widetilde{B}, its estimate, \hat{P}_{sr}, obtained by plugging in estimates of \widetilde{A} and \widetilde{B}, should equal the Cholesky decomposition of \hat{\Sigma}.

To illustrate, we use the German macroeconomic data discussed in Lütkepohl (2005) and used in [TS] var.
In this example, y_t = (dln_inv, dln_inc, dln_consump), where dln_inv is the first difference of the log of investment, dln_inc is the first difference of the log of income, and dln_consump is the first difference of the log of consumption. Because the first difference of the natural log of a variable can be treated as an approximation of the percentage change in that variable, we will refer to these variables as percentage changes in inv, inc, and consump, respectively.

We will impose the Cholesky restrictions on this system by applying equality constraints with the constraint matrices

    A = \begin{pmatrix} 1 & 0 & 0 \\ . & 1 & 0 \\ . & . & 1 \end{pmatrix}  and  B = \begin{pmatrix} . & 0 & 0 \\ 0 & . & 0 \\ 0 & 0 & . \end{pmatrix}

With these structural restrictions, we assume that the percentage change in inv is not contemporaneously affected by the percentage changes in either inc or consump. We also assume that the percentage change of inc is affected by contemporaneous changes in inv but not consump. Finally, we assume that percentage changes in consump are affected by contemporaneous changes in both inv and inc.

The following commands fit an SVAR model with these constraints.

. use http://www.stata-press.com/data/r14/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. matrix A = (1,0,0\.,1,0\.,.,1)
. matrix B = (.,0,0\0,.,0\0,0,.)
. svar dln_inv dln_inc dln_consump if qtr<=tq(1978q4), aeq(A) beq(B)
Estimating short-run parameters
(output omitted)
Structural vector autoregression
 ( 1)  [a_1_1]_cons = 1
 ( 2)  [a_1_2]_cons = 0
 ( 3)  [a_1_3]_cons = 0
 ( 4)  [a_2_2]_cons = 1
 ( 5)  [a_2_3]_cons = 0
 ( 6)  [a_3_3]_cons = 1
 ( 7)  [b_1_2]_cons = 0
 ( 8)  [b_1_3]_cons = 0
 ( 9)  [b_2_1]_cons = 0
 (10)  [b_2_3]_cons = 0
 (11)  [b_3_1]_cons = 0
 (12)  [b_3_2]_cons = 0
Sample: 1960q4 - 1978q4                        Number of obs    =          73
Exactly identified model                       Log likelihood   =     606.307

             |     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      /a_1_1 |         1  (constrained)
      /a_2_1 | -.0336288   .0294605    -1.14   0.254    -.0913702    .0241126
      /a_3_1 | -.0435846   .0194408    -2.24   0.025    -.0816879   -.0054812
      /a_1_2 |         0  (constrained)
      /a_2_2 |         1  (constrained)
      /a_3_2 |  -.424774   .0765548    -5.55   0.000    -.5748187   -.2747293
      /a_1_3 |         0  (constrained)
      /a_2_3 |         0  (constrained)
      /a_3_3 |         1  (constrained)
      /b_1_1 |  .0438796   .0036315    12.08   0.000      .036762    .0509972
      /b_2_1 |         0  (constrained)
      /b_3_1 |         0  (constrained)
      /b_1_2 |         0  (constrained)
      /b_2_2 |  .0110449   .0009141    12.08   0.000     .0092534    .0128365
      /b_3_2 |         0  (constrained)
      /b_1_3 |         0  (constrained)
      /b_2_3 |         0  (constrained)
      /b_3_3 |  .0072243   .0005979    12.08   0.000     .0060525    .0083962

The SVAR output has four parts: an iteration log, a display of the constraints imposed, a header with sample and SVAR log-likelihood information, and a table displaying the estimates of the parameters from the A and B matrices. From the output above, we can see that the equality constraint matrices supplied to svar imposed the intended constraints and that the SVAR header informs us that the model we fit is just identified. The estimates of a_2_1, a_3_1, and a_3_2 are all negative. Because the off-diagonal elements of the A matrix contain the negative of the actual contemporaneous effects, the estimated effects are positive, as expected.

The estimates Â and B̂ are stored in e(A) and e(B), respectively, allowing us to compute the estimated Cholesky decomposition.

. matrix Aest = e(A)
. matrix Best = e(B)
. matrix chol_est = inv(Aest)*Best
. matrix list chol_est

chol_est[3,3]
                   dln_inv      dln_inc  dln_consump
    dln_inv      .04387957            0            0
    dln_inc      .00147562    .01104494            0
dln_consump      .00253928     .0046916    .00722432

svar stores the estimated Σ from the underlying var in e(Sigma). The output below illustrates the computation of the Cholesky decomposition of e(Sigma). It is the same as the output computed from the SVAR estimates.

. matrix sig_var = e(Sigma)
. matrix chol_var = cholesky(sig_var)
. matrix list chol_var

chol_var[3,3]
                   dln_inv      dln_inc  dln_consump
    dln_inv      .04387957            0            0
    dln_inc      .00147562    .01104494            0
dln_consump      .00253928     .0046916    .00722432

We might now wonder why we bother obtaining parameter estimates via nonlinear estimation if we can obtain them simply by a transform of the estimates produced by var. When the model is just identified, as in the previous example, the SVAR parameter estimates can be computed via a transform of the VAR estimates. However, when the model is overidentified, such is not the case.

Example 2: Short-run overidentified SVAR model

The Cholesky decomposition example above fit a just-identified model. This example considers an overidentified model. In example 1, the a_2_1 parameter was not significant, which is consistent with a theory in which changes in our measure of investment affect only changes in income with a lag. We can impose the restriction that a_2_1 is zero and then test this overidentifying restriction. Our A and B matrices are now

    A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ . & . & 1 \end{pmatrix}  and  B = \begin{pmatrix} . & 0 & 0 \\ 0 & . & 0 \\ 0 & 0 & . \end{pmatrix}

The output below contains the commands and results we obtained by fitting this model on the Lütkepohl data.

. matrix B = (.,0,0\0,.,0\0,0,.)
. matrix A = (1,0,0\0,1,0\.,.,1)
766 var svar — Structural vector autoregressive models . var dln_inv dln_inc dln_consump if qtr<=tq(1978q4) Vector autoregression Sample: 1960q4 - 1978q4 Number of obs Log likelihood = 606.307 AIC FPE = 2.18e-11 HQIC Det(Sigma_ml) = 1.23e-11 SBIC Equation Parms RMSE R-sq chi2 P>chi2 dln_inv dln_inc dln_consump 7 7 7 Coef. .046148 .011719 .009445 Std. Err. 0.1286 0.1142 0.2513 z 10.76961 9.410683 24.50031 P>|z| = = = = 73 -16.03581 -15.77323 -15.37691 0.0958 0.1518 0.0004 [95% Conf. Interval] dln_inv dln_inv L1. L2. -.3196318 -.1605508 .1192898 .118767 -2.68 -1.35 0.007 0.176 -.5534355 -.39333 -.0858282 .0722283 dln_inc L1. L2. .1459851 .1146009 .5188451 .508295 0.28 0.23 0.778 0.822 -.8709326 -.881639 1.162903 1.110841 dln_consump L1. L2. .9612288 .9344001 .6316557 .6324034 1.52 1.48 0.128 0.140 -.2767936 -.3050877 2.199251 2.173888 _cons -.0167221 .0163796 -1.02 0.307 -.0488257 .0153814 dln_inc dln_inv L1. L2. .0439309 .0500302 .0302933 .0301605 1.45 1.66 0.147 0.097 -.0154427 -.0090833 .1033046 .1091437 dln_inc L1. L2. -.1527311 .0191634 .131759 .1290799 -1.16 0.15 0.246 0.882 -.4109741 -.2338285 .1055118 .2721552 dln_consump L1. L2. .2884992 -.0102 .1604069 .1605968 1.80 -0.06 0.072 0.949 -.0258926 -.3249639 .6028909 .3045639 _cons .0157672 .0041596 3.79 0.000 .0076146 .0239198 dln_consump dln_inv L1. L2. -.002423 .0338806 .0244142 .0243072 -0.10 1.39 0.921 0.163 -.050274 -.0137607 .045428 .0815219 dln_inc L1. L2. .2248134 .3549135 .1061884 .1040292 2.12 3.41 0.034 0.001 .0166879 .1510199 .4329389 .558807 dln_consump L1. L2. -.2639695 -.0222264 .1292766 .1294296 -2.04 -0.17 0.041 0.864 -.517347 -.2759039 -.010592 .231451 _cons .0129258 .0033523 3.86 0.000 .0063554 .0194962 var svar — Structural vector autoregressive models 767 The equation-level model tests reported in the header indicate that we cannot reject the null hypotheses that all the coefficients in the first equation are zero, nor can we reject the null that all the coefficients in the second equation are zero at the 5% significance level. We use a combination of theory and the p-values from the output above to place some exclusion restrictions on the underlying VAR(2). Specifically, in the equation for the percentage change of inv, we constrain the coefficients on L2.dln inv, L.dln inc, L2.dln inc, and L2.dln consump to be zero. In the equation for dln inc, we constrain the coefficients on L2.dln inv, L2.dln inc, and L2.dln consump to be zero. Finally, in the equation for dln consump, we constrain L.dln inv and L2.dln consump to be zero. We then refit the SVAR from the previous example. . . . . constraint constraint constraint constraint 1 2 3 4 [dln_inv]L2.dln_inv = 0 [dln_inv ]L.dln_inc = 0 [dln_inv]L2.dln_inc = 0 [dln_inv]L2.dln_consump = 0 . constraint 5 [dln_inc]L2.dln_inv = 0 . constraint 6 [dln_inc]L2.dln_inc = 0 . constraint 7 [dln_inc]L2.dln_consump = 0 . constraint 8 [dln_consump]L.dln_inv = 0 . constraint 9 [dln_consump]L2.dln_consump = 0 . svar dln_inv dln_inc dln_consump if qtr<=tq(1978q4), aeq(A) beq(B) > varconst(1/9) noislog Estimating short-run parameters (output omitted ) Structural vector autoregression ( 1) ( 2) ( 3) ( 4) ( 5) ( 6) ( 7) ( 8) ( 9) (10) (11) (12) (13) [a_1_1]_cons [a_1_2]_cons [a_1_3]_cons [a_2_1]_cons [a_2_2]_cons [a_2_3]_cons [a_3_3]_cons [b_1_2]_cons [b_1_3]_cons [b_2_1]_cons [b_2_3]_cons [b_3_1]_cons [b_3_2]_cons = = = = = = = = = = = = = 1 0 0 0 1 0 1 0 0 0 0 0 0 768 var svar — Structural vector autoregressive models Sample: 1960q4 - 1978q4 Overidentified model Coef. 
Number of obs Log likelihood Std. Err. z P>|z| /a_1_1 /a_2_1 /a_3_1 /a_1_2 /a_2_2 /a_3_2 /a_1_3 /a_2_3 /a_3_3 1 0 -.0418708 0 1 -.4255808 0 0 1 (constrained) (constrained) .0187579 -2.23 (constrained) (constrained) .0745298 -5.71 (constrained) (constrained) (constrained) /b_1_1 /b_2_1 /b_3_1 /b_1_2 /b_2_2 /b_3_2 /b_1_3 /b_2_3 /b_3_3 .0451851 0 0 0 .0113723 0 0 0 .0072417 .0037395 12.08 (constrained) (constrained) (constrained) .0009412 12.08 (constrained) (constrained) (constrained) .0005993 12.08 LR test of identifying restrictions: chi2( 1)= = = 73 601.8591 [95% Conf. Interval] 0.026 -.0786356 -.0051061 0.000 -.5716565 -.2795051 0.000 .0378557 .0525145 0.000 .0095276 .013217 0.000 .006067 .0084164 .8448 Prob > chi2 = 0.358 If we displayed the underlying VAR(2) results by using the var option, we would see that most of the unconstrained coefficients are now significant at the 10% level and that none of the equation-level model statistics fail to reject the null hypothesis at the 10% level. The svar output reveals that the p-value of the overidentification test rose and that the coefficient on a 3 1 is still insignificant at the 1% level but not at the 5% level. Before moving on to models with long-run constraints, consider these limitations. We cannot place constraints on the elements of A in terms of the elements of B, or vice versa. This limitation is imposed by the form of the check for identification derived by Amisano and Giannini (1997). As noted in Methods and formulas, this test requires separate constraint matrices for the parameters in A and B. Another limitation is that we cannot mix short-run and long-run constraints. Long-run SVAR models As discussed in [TS] var intro, a long-run SVAR has the form yt = Cet In long-run models, the constraints are placed on the elements of C, and the free parameters are estimated. These constraints are often exclusion restrictions. For instance, constraining C[1, 2] to be zero can be interpreted as setting the long-run response of variable 1 to the structural shocks driving variable 2 to be zero. Similar to the short-run model, the Plr matrix such that Plr P0lr = Σ identifies the structural impulse–response functions. Plr = C is identified by the restrictions placed on the parameters in C. There are K 2 parameters in C, and the order condition for identification requires that there be at least K 2 − K(K + 1)/2 restrictions placed on those parameters. As in the short-run model, this order condition is necessary but not sufficient, so the Amisano and Giannini (1997) check for local identification is performed by default. var svar — Structural vector autoregressive models 769 Example 4: Long-run SVAR model Suppose that we have a theory in which unexpected changes to the money supply have no long-run effects on changes in output and, similarly, that unexpected changes in output have no long-run effects on changes in the money supply. The C matrix implied by this theory is  . 0 C= 0 .  . use http://www.stata-press.com/data/r14/m1gdp . matrix lr = (.,0\0,.) . svar d.ln_m1 d.ln_gdp, lreq(lr) Estimating long-run parameters (output omitted ) Structural vector autoregression ( 1) [c_1_2]_cons = 0 ( 2) [c_2_1]_cons = 0 Sample: 1959q4 - 2002q2 Number of obs Overidentified model Log likelihood Coef. /c_1_1 /c_2_1 /c_1_2 /c_2_2 .0301007 0 0 .0129691 Std. Err. z .0016277 18.49 (constrained) (constrained) .0007013 18.49 LR test of identifying restrictions: chi2( 1)= = = 171 1151.614 P>|z| [95% Conf. 
Interval] 0.000 .0269106 .0332909 0.000 .0115946 .0143436 .1368 Prob > chi2 = 0.712

We have assumed that the underlying VAR has 2 lags; four of the five selection-order criteria computed by varsoc (see [TS] varsoc) recommended this choice. The test of the overidentifying restrictions provides no indication that it is not valid.

Stored results

svar stores the following in e():

Scalars
  e(N)                number of observations
  e(N_cns)            number of constraints
  e(k_eq)             number of equations in e(b)
  e(k_dv)             number of dependent variables
  e(k_aux)            number of auxiliary parameters
  e(ll)               log likelihood from svar
  e(ll_#)             log likelihood for equation #
  e(N_gaps_var)       number of gaps in the sample
  e(k_var)            number of coefficients in VAR
  e(k_eq_var)         number of equations in underlying VAR
  e(k_dv_var)         number of dependent variables in underlying VAR
  e(df_eq_var)        average number of parameters in an equation
  e(df_m_var)         model degrees of freedom
  e(df_r_var)         if small, residual degrees of freedom
  e(obs_#_var)        number of observations on equation #
  e(k_#_var)          number of coefficients in equation #
  e(df_m#_var)        model degrees of freedom for equation #
  e(df_r#_var)        residual degrees of freedom for equation # (small only)
  e(r2_#_var)         R-squared for equation #
  e(ll_#_var)         log likelihood for equation # VAR
  e(chi2_#_var)       χ² statistic for equation #
  e(F_#_var)          F statistic for equation # (small only)
  e(rmse_#_var)       root mean squared error for equation #
  e(mlag_var)         highest lag in VAR
  e(tparms_var)       number of parameters in all equations
  e(aic_var)          Akaike information criterion
  e(hqic_var)         Hannan–Quinn information criterion
  e(sbic_var)         Schwarz–Bayesian information criterion
  e(fpe_var)          final prediction error
  e(ll_var)           log likelihood from var
  e(detsig_var)       determinant of e(Sigma)
  e(detsig_ml_var)    determinant of Σ̂_ml
  e(tmin)             first time period in the sample
  e(tmax)             maximum time
  e(chi2_oid)         overidentification test
  e(oid_df)           number of overidentifying restrictions
  e(rank)             rank of e(V)
  e(ic_ml)            number of iterations
  e(rc_ml)            return code from ml

Macros
  e(cmd)              svar
  e(cmdline)          command as typed
  e(lrmodel)          long-run model, if specified
  e(lags_var)         lags in model
  e(depvar_var)       names of dependent variables
  e(endog_var)        names of endogenous variables
  e(exog_var)         names of exogenous variables, if specified
  e(nocons_var)       noconstant, if noconstant specified
  e(cns_lr)           long-run constraints
  e(cns_a)            cross-parameter equality constraints on A
  e(cns_b)            cross-parameter equality constraints on B
  e(dfk_var)          alternate divisor (dfk), if specified
  e(eqnames_var)      names of equations
  e(lutstats_var)     lutstats, if specified
  e(constraints_var)  constraints_var, if there are constraints on VAR
  e(small)            small, if specified
  e(tsfmt)            format of timevar
  e(timevar)          name of timevar
  e(title)            title in estimation output
  e(properties)       b V
  e(predict)          program used to implement predict

Matrices
  e(b)                coefficient vector
  e(Cns)              constraints matrix
  e(Sigma)            Σ̂ matrix
  e(V)                variance–covariance matrix of the estimators
  e(b_var)            coefficient vector of underlying VAR model
  e(V_var)            VCE of underlying VAR model
  e(bf_var)           full coefficient vector with zeros in dropped lags
  e(G_var)            G matrix stored by var; see [TS] var Methods and formulas
  e(aeq)              aeq(matrix), if specified
  e(acns)             acns(matrix), if specified
  e(beq)              beq(matrix), if specified
  e(bcns)             bcns(matrix), if specified
  e(lreq)             lreq(matrix), if specified
  e(lrcns)            lrcns(matrix), if specified
  e(Cns_var)          constraint matrix from var, if varconstraints() is specified
  e(A)                estimated A matrix, if a short-run model
  e(B)                estimated B matrix
  e(C)                estimated C matrix, if a long-run model
  e(A1)               estimated Ā matrix, if a long-run model

Functions
  e(sample)           marks estimation sample

Methods and formulas

The log-likelihood function for models with short-run constraints is

    L(A,B) = -\frac{NK}{2}\ln(2\pi) + \frac{N}{2}\ln(|W|^2) - \frac{N}{2}\,\mathrm{tr}(W'W\hat{\Sigma})

where W = B^{-1}A.

When there are long-run constraints, because C = Ā^{-1}B and A = I_K, W = B^{-1} = C^{-1}Ā^{-1} = (ĀC)^{-1}. Substituting the last term for W in the short-run log likelihood produces the long-run log likelihood

    L(C) = -\frac{NK}{2}\ln(2\pi) + \frac{N}{2}\ln(|\widetilde{W}|^2) - \frac{N}{2}\,\mathrm{tr}(\widetilde{W}'\widetilde{W}\hat{\Sigma})

where \widetilde{W} = (ĀC)^{-1}.

For both the short-run and the long-run models, the maximization is performed by the scoring method. See Harvey (1990) for a discussion of this method.

Based on results from Amisano and Giannini (1997), the score vector for the short-run model is

    \frac{\partial L(A,B)}{\partial[\mathrm{vec}(A),\mathrm{vec}(B)]} = N\left[\{\mathrm{vec}(W'^{-1})\}' - \{\mathrm{vec}(W)\}'(\hat{\Sigma}\otimes I_K)\right] \times \left[(I_K\otimes B^{-1}),\ -(A'B'^{-1}\otimes B^{-1})\right]

and the expected information matrix is

    I[\mathrm{vec}(A),\mathrm{vec}(B)] = N\begin{pmatrix} W^{-1}\otimes B'^{-1} \\ -(I_K\otimes B'^{-1}) \end{pmatrix}(I_{K^2}+\oplus)\left[(W'^{-1}\otimes B^{-1}),\ -(I_K\otimes B^{-1})\right]

where ⊕ is the commutation matrix defined in Magnus and Neudecker (1999, 46–48).

Using results from Amisano and Giannini (1997), we can derive the score vector and the expected information matrix for the case with long-run restrictions. The score vector is

    \frac{\partial L(C)}{\partial\,\mathrm{vec}(C)} = N\left[\{\mathrm{vec}(W'^{-1})\}' - \{\mathrm{vec}(W)\}'(\hat{\Sigma}\otimes I_K)\right]\left[-(\bar{A}'^{-1}C'^{-1}\otimes C^{-1})\right]

and the expected information matrix is

    I[\mathrm{vec}(C)] = N(I_K\otimes C'^{-1})(I_{K^2}+\oplus)(I_K\otimes C^{-1})

Checking for identification

This section describes the methods used to check for identification of models with short-run or long-run constraints. Both methods depend on the starting values. By default, svar uses starting values constructed by taking a vector of appropriate dimension and applying the constraints. If there are m parameters in the model, the jth element of the 1 × m vector is 1 + m/100. svar also allows the user to provide starting values.

For the short-run case, the model is identified if the matrix

    V^{*}_{sr} = \begin{pmatrix} N_K(W'\otimes B) & N_K(I_K\otimes B) \\ R_a & 0_{K^2} \\ 0_{K^2} & R_b \end{pmatrix}

has full column rank of 2K², where N_K = (1/2)(I_{K^2}+\oplus), R_a is the constraint matrix for the parameters in A (that is, R_a vec(A) = r_a), and R_b is the constraint matrix for the parameters in B (that is, R_b vec(B) = r_b).

For the long-run case, based on results from the C model in Amisano and Giannini (1997), the model is identified if the matrix

    V^{*}_{lr} = \begin{pmatrix} (I\otimes C'^{-1})(2N_K)(I\otimes C^{-1}) \\ R_c \end{pmatrix}

has full column rank of K², where R_c is the constraint matrix for the parameters in C; that is, R_c vec(C) = r_c.

The test of the overidentifying restrictions is computed as

    LR = 2(LL_{var} - LL_{svar})

where LR is the value of the test statistic against the null hypothesis that the overidentifying restrictions are valid, LL_var is the log likelihood from the underlying VAR(p) model, and LL_svar is the log likelihood from the SVAR model. The test statistic is asymptotically distributed as χ²(q), where q is the number of overidentifying restrictions. Amisano and Giannini (1997, 38–39) emphasize that, because this test of the validity of the overidentifying restrictions is an omnibus test, it can be interpreted as a test of the null hypothesis that all the restrictions are valid.
Because constraints might not be independent either by construction or because of the data, the number of restrictions is not necessarily equal to the number of constraints. The rank of e(V) gives the number of parameters that were independently estimated after applying the constraints. The maximum number of parameters that can be estimated in an identified short-run or long-run SVAR is K(K + 1)/2. This implies that the number of overidentifying restrictions, q, is equal to K(K + 1)/2 minus the rank of e(V).

The number of overidentifying restrictions is also linked to the order condition for each model. In a short-run SVAR model, there are 2K² parameters. Because no more than K(K + 1)/2 parameters may be estimated, the order condition for a short-run SVAR model is that at least 2K² − K(K + 1)/2 restrictions be placed on the model. Similarly, there are K² parameters in a long-run SVAR model. Because no more than K(K + 1)/2 parameters may be estimated, the order condition for a long-run SVAR model is that at least K² − K(K + 1)/2 restrictions be placed on the model.

Acknowledgment

We thank Gianni Amisano of the Dipartimento di Scienze Economiche at the Università degli Studi di Brescia for his helpful comments.

References

Amisano, G., and C. Giannini. 1997. Topics in Structural VAR Econometrics. 2nd ed. Heidelberg: Springer.
Becketti, S. 2013. Introduction to Time Series Using Stata. College Station, TX: Stata Press.
Christiano, L. J., M. Eichenbaum, and C. L. Evans. 1999. Monetary policy shocks: What have we learned and to what end? In Handbook of Macroeconomics: Volume 1A, ed. J. B. Taylor and M. Woodford. New York: Elsevier.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Harvey, A. C. 1990. The Econometric Analysis of Time Series. 2nd ed. Cambridge, MA: MIT Press.
Lütkepohl, H. 1993. Introduction to Multiple Time Series Analysis. 2nd ed. New York: Springer.
———. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.
Magnus, J. R., and H. Neudecker. 1999. Matrix Differential Calculus with Applications in Statistics and Econometrics. Rev. ed. New York: Wiley.
Rothenberg, T. J. 1971. Identification in parametric models. Econometrica 39: 577–591.
Sims, C. A. 1980. Macroeconomics and reality. Econometrica 48: 1–48.
Stock, J. H., and M. W. Watson. 2001. Vector autoregressions. Journal of Economic Perspectives 15: 101–115.
Watson, M. W. 1994. Vector autoregressions and cointegration. In Vol. 4 of Handbook of Econometrics, ed. R. F. Engle and D. L. McFadden. Amsterdam: Elsevier.
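As an illustration of the test statistic LR = 2(LL_var − LL_svar) defined in Methods and formulas, the following sketch (our addition, not part of the original entry) recomputes example 2's overidentification test by hand from the stored results e(ll_var), e(ll), and e(oid_df):

. use http://www.stata-press.com/data/r14/lutkepohl2
. matrix A = (1,0,0\0,1,0\.,.,1)
. matrix B = (.,0,0\0,.,0\0,0,.)
. quietly svar dln_inv dln_inc dln_consump if qtr<=tq(1978q4), aeq(A) beq(B)
. scalar LR = 2*(e(ll_var) - e(ll))
. display "LR = " LR ", df = " e(oid_df) ", p = " chi2tail(e(oid_df), LR)

The displayed values should match the footer of example 2: chi2(1) = 1.292 with Prob > chi2 = 0.256.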
Also see
[TS] var svar postestimation — Postestimation tools for svar
[TS] tsset — Declare data to be time-series data
[TS] var — Vector autoregressive models
[TS] var intro — Introduction to vector autoregressive models
[TS] varbasic — Fit a simple VAR and graph IRFs or FEVDs
[TS] vec — Vector error-correction models
[U] 20 Estimation and postestimation commands

Title
var svar postestimation — Postestimation tools for svar

Postestimation commands    predict    Remarks and examples    Also see

Postestimation commands

The following postestimation commands are of special interest after svar:

Command          Description
fcast compute    obtain dynamic forecasts
fcast graph      graph dynamic forecasts obtained from fcast compute
irf              create and analyze IRFs and FEVDs
vargranger       Granger causality tests
varlmar          LM test for autocorrelation in residuals
varnorm          test for normally distributed residuals
varsoc           lag-order selection criteria
varstable        check stability condition of estimates
varwle           Wald lag-exclusion statistics

The following standard postestimation commands are also available:

Command           Description
estat ic          Akaike's and Schwarz's Bayesian information criteria (AIC and BIC)
estat summarize   summary statistics for the estimation sample
estat vce         variance–covariance matrix of the estimators (VCE)
estimates         cataloging estimation results
forecast          dynamic forecasts and simulations
lincom            point estimates, standard errors, testing, and inference for linear combinations of coefficients
lrtest            likelihood-ratio test
nlcom             point estimates, standard errors, testing, and inference for nonlinear combinations of coefficients
predict           predictions, residuals, influence statistics, and other diagnostic measures
predictnl         point estimates, standard errors, testing, and inference for generalized predictions
test              Wald tests of simple and composite linear hypotheses
testnl            Wald tests of nonlinear hypotheses

predict

Description for predict

predict creates a new variable containing predictions such as linear predictions and residuals.

Menu for predict

Statistics > Postestimation

Syntax for predict

    predict [type] newvar [if] [in] [, statistic equation(eqno | eqname)]

statistic     Description
Main
  xb            linear prediction; the default
  stdp          standard error of the linear prediction
  residuals     residuals

These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for the estimation sample.

Options for predict

Main

xb, the default, calculates the linear prediction for the specified equation.

stdp calculates the standard error of the linear prediction for the specified equation.

residuals calculates the residuals.

equation(eqno | eqname) specifies the equation to which you are referring. equation() is filled in with one eqno or eqname for options xb, stdp, and residuals. For example, equation(#1) would mean that the calculation is to be made for the first equation, equation(#2) would mean the second, and so on. You could also refer to the equation by its name; thus, equation(income) would refer to the equation named income and equation(hours), to the equation named hours.

If you do not specify equation(), the results are the same as if you specified equation(#1).

For more information on using predict after multiple-equation commands, see [R] predict.
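For example, the residuals of a given equation of the underlying VAR can be retrieved and plotted after svar. This sketch is our addition rather than part of the original entry; it reuses the short-run model from example 1 of [TS] var svar, and rinc is an arbitrary variable name.

. use http://www.stata-press.com/data/r14/lutkepohl2
. matrix A = (1,0,0\.,1,0\.,.,1)
. matrix B = (.,0,0\0,.,0\0,0,.)
. quietly svar dln_inv dln_inc dln_consump if qtr<=tq(1978q4), aeq(A) beq(B)
. predict rinc if e(sample), residuals equation(dln_inc)
. tsline rinc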
Remarks and examples

Remarks are presented under the following headings:
    Model selection and inference
    Forecasting

Model selection and inference

See the following sections for information on model selection and inference after svar.
[TS] irf — Create and analyze IRFs, dynamic-multiplier functions, and FEVDs
[TS] vargranger — Perform pairwise Granger causality tests after var or svar
[TS] varlmar — Perform LM test for residual autocorrelation after var or svar
[TS] varnorm — Test for normally distributed disturbances after var or svar
[TS] varsoc — Obtain lag-order selection statistics for VARs and VECMs
[TS] varstable — Check the stability condition of VAR or SVAR estimates
[TS] varwle — Obtain Wald lag-exclusion statistics after var or svar

Forecasting

See the following sections for information on obtaining forecasts after svar:
[TS] fcast compute — Compute dynamic forecasts after var, svar, or vec
[TS] fcast graph — Graph forecasts after fcast compute

Also see

[TS] var svar — Structural vector autoregressive models
[U] 20 Estimation and postestimation commands

Title
varbasic — Fit a simple VAR and graph IRFs or FEVDs

Description    Quick start    Menu    Syntax    Options    Remarks and examples    Stored results    Methods and formulas    References    Also see

Description

varbasic fits a basic vector autoregressive (VAR) model and graphs the impulse–response functions (IRFs), the orthogonalized impulse–response functions (OIRFs), or the forecast-error variance decompositions (FEVDs).

Quick start

Fit vector autoregressive model for dependent variables y1, y2, and y3 and their first and second lags and graph orthogonalized impulse responses using tsset data
    varbasic y1 y2 y3

As above, but include second and third lags instead of first and second
    varbasic y1 y2 y3, lags(2 3)

As above, but produce forecast-error variance decompositions instead of impulse responses
    varbasic y1 y2 y3, lags(2 3) fevd

As above, but set the forecast horizon for the forecast-error variance decompositions to be 12 periods
    varbasic y1 y2 y3, lags(2 3) fevd step(12)

Menu

Statistics > Multivariate time series > Basic VAR

Syntax

    varbasic depvarlist [if] [in] [, options]

options          Description
Main
  lags(numlist)    use lags numlist in the model; default is lags(1 2)
  irf              produce matrix graph of IRFs
  fevd             produce matrix graph of FEVDs
  nograph          do not produce a graph
  step(#)          set forecast horizon # for estimating the OIRFs, IRFs, and FEVDs; default is step(8)

You must tsset your data before using varbasic; see [TS] tsset.
depvarlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.
rolling, statsby, and xi are allowed; see [U] 11.1.10 Prefix commands.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.

Options
Remarks and examples

varbasic simplifies fitting simple VARs and graphing the IRFs, the OIRFs, or the FEVDs. See [TS] var and [TS] var svar for fitting more advanced VAR models and structural vector autoregressive (SVAR) models. All the postestimation commands discussed in [TS] var postestimation work after varbasic.

This entry does not discuss the methods for fitting a VAR or the methods surrounding the IRFs, OIRFs, and FEVDs. See [TS] var and [TS] irf create for more on these methods. This entry illustrates how to use varbasic to easily obtain results. It also illustrates how varbasic serves as an entry point to further analysis.

Example 1

We fit a three-variable VAR with two lags to the German macro data used by Lütkepohl (2005). The three variables are the first difference of the natural log of investment, dln_inv; the first difference of the natural log of income, dln_inc; and the first difference of the natural log of consumption, dln_consump. In addition to fitting the VAR, we want to see the OIRFs. Below we use varbasic to fit a VAR(2) model on the data from the second quarter of 1961 through the fourth quarter of 1978. By default, varbasic produces graphs of the OIRFs.

. use http://www.stata-press.com/data/r14/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. varbasic dln_inv dln_inc dln_consump if qtr<=tq(1978q4)

Vector autoregression

Sample: 1960q4 - 1978q4                         Number of obs     =        73
Log likelihood =   606.307                      AIC               = -16.03581
FPE            =  2.18e-11                      HQIC              = -15.77323
Det(Sigma_ml)  =  1.23e-11                      SBIC              = -15.37691

Equation        Parms      RMSE     R-sq       chi2     P>chi2
--------------------------------------------------------------
dln_inv           7       .046148   0.1286   10.76961   0.0958
dln_inc           7       .011719   0.1142   9.410683   0.1518
dln_consump       7       .009445   0.2513   24.50031   0.0004
--------------------------------------------------------------

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
dln_inv      |
     dln_inv |
         L1. |  -.3196318   .1192898    -2.68   0.007    -.5534355   -.0858282
         L2. |  -.1605508    .118767    -1.35   0.176      -.39333    .0722283
     dln_inc |
         L1. |   .1459851   .5188451     0.28   0.778    -.8709326    1.162903
         L2. |   .1146009    .508295     0.23   0.822     -.881639    1.110841
 dln_consump |
         L1. |   .9612288   .6316557     1.52   0.128    -.2767936    2.199251
         L2. |   .9344001   .6324034     1.48   0.140    -.3050877    2.173888
       _cons |  -.0167221   .0163796    -1.02   0.307    -.0488257    .0153814
-------------+----------------------------------------------------------------
dln_inc      |
     dln_inv |
         L1. |   .0439309   .0302933     1.45   0.147    -.0154427    .1033046
         L2. |   .0500302   .0301605     1.66   0.097    -.0090833    .1091437
     dln_inc |
         L1. |  -.1527311    .131759    -1.16   0.246    -.4109741    .1055118
         L2. |   .0191634   .1290799     0.15   0.882    -.2338285    .2721552
 dln_consump |
         L1. |   .2884992   .1604069     1.80   0.072    -.0258926    .6028909
         L2. |     -.0102   .1605968    -0.06   0.949    -.3249639    .3045639
       _cons |   .0157672   .0041596     3.79   0.000     .0076146    .0239198
-------------+----------------------------------------------------------------
dln_consump  |
     dln_inv |
         L1. |   -.002423   .0244142    -0.10   0.921     -.050274     .045428
         L2. |   .0338806   .0243072     1.39   0.163    -.0137607    .0815219
     dln_inc |
         L1. |   .2248134   .1061884     2.12   0.034     .0166879    .4329389
         L2. |   .3549135   .1040292     3.41   0.001     .1510199     .558807
 dln_consump |
         L1. |  -.2639695   .1292766    -2.04   0.041     -.517347    -.010592
         L2. |  -.0222264   .1294296    -0.17   0.864    -.2759039     .231451
       _cons |   .0129258   .0033523     3.86   0.000     .0063554    .0194962
------------------------------------------------------------------------------

(figure omitted: matrix graph of the orthogonalized IRFs, one panel per impulse–response pair labeled "varbasic, impulse, response"; x axis: step, 0–8; legend: 95% CI, orthogonalized irf; graphs by irfname, impulse variable, and response variable)

Because we are also interested in looking at the FEVDs, we can use irf graph to obtain the graphs. Although the details are available in [TS] irf and [TS] irf graph, the command below produces what we want after the call to varbasic.

. irf graph fevd, lstep(1)

(figure omitted: matrix graph of the FEVDs, one panel per impulse–response pair; x axis: step, 0–8; legend: 95% CI, fraction of mse due to impulse; graphs by irfname, impulse variable, and response variable)

Technical note

Stata stores the estimated IRFs, OIRFs, and FEVDs in an IRF file called varbasic.irf in the current working directory. varbasic replaces any varbasic.irf that already exists. Finally, varbasic makes varbasic.irf the active IRF file. This means that the graph and table commands irf graph, irf cgraph, irf ograph, irf table, and irf ctable will all display results that correspond to the VAR fit by varbasic.

Stored results

See Stored results in [TS] var.

Methods and formulas

varbasic uses var and irf graph to obtain its results. See [TS] var and [TS] irf graph for a discussion of how those commands obtain their results.

References

Lütkepohl, H. 1993. Introduction to Multiple Time Series Analysis. 2nd ed. New York: Springer.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.
Also see

[TS] varbasic postestimation — Postestimation tools for varbasic
[TS] tsset — Declare data to be time-series data
[TS] var — Vector autoregressive models
[TS] var svar — Structural vector autoregressive models
[TS] var intro — Introduction to vector autoregressive models
[U] 20 Estimation and postestimation commands

Title

varbasic postestimation — Postestimation tools for varbasic

    Postestimation commands    predict    margins    Remarks and examples    Also see

Postestimation commands

The following postestimation commands are of special interest after varbasic:

    Command          Description
    -------------------------------------------------------------------
    fcast compute    obtain dynamic forecasts
    fcast graph      graph dynamic forecasts obtained from fcast compute
    irf              create and analyze IRFs and FEVDs
    vargranger       Granger causality tests
    varlmar          LM test for autocorrelation in residuals
    varnorm          test for normally distributed residuals
    varsoc           lag-order selection criteria
    varstable        check stability condition of estimates
    varwle           Wald lag-exclusion statistics
    -------------------------------------------------------------------

The following standard postestimation commands are also available:

    Command          Description
    -------------------------------------------------------------------
    estat ic         Akaike's and Schwarz's Bayesian information criteria (AIC and BIC)
    estat summarize  summary statistics for the estimation sample
    estat vce        variance–covariance matrix of the estimators (VCE)
    estimates        cataloging estimation results
    forecast         dynamic forecasts and simulations
    lincom           point estimates, standard errors, testing, and inference for linear combinations of coefficients
    lrtest           likelihood-ratio test
    margins          marginal means, predictive margins, marginal effects, and average marginal effects
    marginsplot      graph the results from margins (profile plots, interaction plots, etc.)
    nlcom            point estimates, standard errors, testing, and inference for nonlinear combinations of coefficients
    predict          predictions, residuals, influence statistics, and other diagnostic measures
    predictnl        point estimates, standard errors, testing, and inference for generalized predictions
    test             Wald tests of simple and composite linear hypotheses
    testnl           Wald tests of nonlinear hypotheses
    -------------------------------------------------------------------

predict

Description for predict

predict creates a new variable containing predictions such as linear predictions and residuals.

Menu for predict

Statistics > Postestimation

Syntax for predict

    predict [type] newvar [if] [in] [, statistic equation(eqno | eqname)]

    statistic     Description
    ---------------------------------------------------
    Main
      xb          linear prediction; the default
      stdp        standard error of the linear prediction
      residuals   residuals
    ---------------------------------------------------

These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for the estimation sample.

Options for predict

Main

xb, the default, calculates the linear prediction for the specified equation.

stdp calculates the standard error of the linear prediction for the specified equation.

residuals calculates the residuals.

equation(eqno | eqname) specifies the equation to which you are referring. equation() is filled in with one eqno or eqname for the xb, stdp, and residuals options. For example, equation(#1) would mean that the calculation is to be made for the first equation, equation(#2) would mean the second, and so on. You could also refer to the equation by its name; thus, equation(income) would refer to the equation named income and equation(hours), to the equation named hours. If you do not specify equation(), the results are the same as if you specified equation(#1).

For more information on using predict after multiple-equation estimation commands, see [R] predict.
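For instance, continuing example 1 of [TS] varbasic, one might obtain the residuals from the second equation as follows (a minimal sketch; the variable name resid2 is an arbitrary choice):

    . varbasic dln_inv dln_inc dln_consump if qtr<=tq(1978q4), nograph
    . predict resid2 if e(sample), residuals equation(#2)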
margins

Description for margins

margins estimates margins of response for linear predictions.

Menu for margins

Statistics > Postestimation

Syntax for margins

    margins [marginlist] [, options]
    margins [marginlist], predict(statistic ...) [options]

    statistic     Description
    ---------------------------------------------------
    default       linear predictions for each equation
    xb            linear prediction for a specified equation
    stdp          not allowed with margins
    residuals     not allowed with margins
    ---------------------------------------------------

xb defaults to the first equation.

Statistics not allowed with margins are functions of stochastic quantities other than e(b). For the full syntax, see [R] margins.

Remarks and examples

Example 1

All the postestimation commands discussed in [TS] var postestimation work after varbasic. Suppose that we are interested in testing the hypothesis that there is no autocorrelation in the VAR disturbances. Continuing example 1 from [TS] varbasic, we now use varlmar to test this hypothesis.

. use http://www.stata-press.com/data/r14/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. varbasic dln_inv dln_inc dln_consump if qtr<=tq(1978q4)
(output omitted)
. varlmar

Lagrange-multiplier test

    lag      chi2     df   Prob > chi2
    -----------------------------------
     1      5.5871     9     0.78043
     2      6.3189     9     0.70763
    -----------------------------------
    H0: no autocorrelation at lag order

Because we cannot reject the null hypothesis of no autocorrelation in the residuals, this test does not indicate any model misspecification.

Also see

[TS] varbasic — Fit a simple VAR and graph IRFs or FEVDs
[U] 20 Estimation and postestimation commands

Title

vargranger — Perform pairwise Granger causality tests after var or svar

    Description    Quick start    Menu    Syntax    Options
    Remarks and examples    Stored results    Methods and formulas
    References    Also see

Description

vargranger performs a set of Granger causality tests for each equation in a VAR, providing a convenient alternative to test; see [R] test.

Quick start

Perform a Granger causality test after var or svar
    vargranger

Perform a Granger causality test on vector autoregression estimation results stored as myest
    vargranger, estimates(myest)

Menu

Statistics > Multivariate time series > VAR diagnostics and tests > Granger causality tests

Syntax

    vargranger [, estimates(estname) separator(#)]

vargranger can be used only after var or svar; see [TS] var and [TS] var svar.

Options

estimates(estname) requests that vargranger use the previously obtained set of var or svar estimates stored as estname. By default, vargranger uses the active results. See [R] estimates for information on manipulating estimation results.

separator(#) specifies how often separator lines should be drawn between rows. By default, separator lines appear every K lines, where K is the number of equations in the VAR under analysis. For example, separator(1) would draw a line between each row, separator(2) between every other row, and so on. separator(0) specifies that no lines appear in the table.

Remarks and examples

After fitting a VAR, we may want to know whether one variable "Granger-causes" another (Granger 1969). A variable x is said to Granger-cause a variable y if, given the past values of y, past values of x are useful for predicting y. A common method for testing Granger causality is to regress y on its own lagged values and on lagged values of x and then test the null hypothesis that the estimated coefficients on the lagged values of x are jointly zero. Failure to reject the null hypothesis is equivalent to failing to reject the hypothesis that x does not Granger-cause y. A minimal sketch of this regression form of the test is given below.
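For a single pair of tsset variables, the regression form of the test just described can be carried out directly with regress and test (a sketch outside the VAR framework; x and y are hypothetical variables, and two lags are an arbitrary choice):

    . regress y L(1/2).y L(1/2).x
    . test L.x L2.x                 // H0: x does not Granger-cause y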
For each equation and each endogenous variable that is not the dependent variable in that equation, vargranger computes and reports Wald tests that the coefficients on all the lags of that endogenous variable are jointly zero. Thus, for each equation in a VAR, vargranger tests the hypotheses that each of the other endogenous variables does not Granger-cause the dependent variable in that equation.

Because it may be interesting to investigate these types of hypotheses by using the VAR that underlies an SVAR, vargranger can also produce these tests by using the e() results from an svar. When vargranger uses svar e() results, the hypotheses concern the underlying var estimates.

See [TS] var and [TS] var svar for information about fitting VARs and SVARs in Stata. See Lütkepohl (2005), Hamilton (1994), and Amisano and Giannini (1997) for information about Granger causality and about VARs and SVARs in general.

Example 1: After var

Here we refit the model with German data described in [TS] var and then perform Granger causality tests with vargranger.

. use http://www.stata-press.com/data/r14/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), dfk small
(output omitted)
. vargranger

Granger causality Wald tests

    Equation       Excluded            F     df   df_r   Prob > F
    --------------------------------------------------------------
    dln_inv        dln_inc        .04847      2     66    0.9527
    dln_inv        dln_consump    1.5004      2     66    0.2306
    dln_inv        ALL            1.5917      4     66    0.1869
    --------------------------------------------------------------
    dln_inc        dln_inv        1.7683      2     66    0.1786
    dln_inc        dln_consump    1.7184      2     66    0.1873
    dln_inc        ALL            1.9466      4     66    0.1130
    --------------------------------------------------------------
    dln_consump    dln_inv        .97147      2     66    0.3839
    dln_consump    dln_inc        6.1465      2     66    0.0036
    dln_consump    ALL            3.7746      4     66    0.0080
    --------------------------------------------------------------

Because the estimates() option was not specified, vargranger used the active e() results. Consider the results of the three tests for the first equation. The first is a Wald test that the coefficients on the two lags of dln_inc that appear in the equation for dln_inv are jointly zero. The null hypothesis that dln_inc does not Granger-cause dln_inv cannot be rejected. Similarly, we cannot reject the null hypothesis that the coefficients on the two lags of dln_consump in the equation for dln_inv are jointly zero, so we cannot reject the hypothesis that dln_consump does not Granger-cause dln_inv. The third test is with respect to the null hypothesis that the coefficients on the two lags of all the other endogenous variables are jointly zero. Because this hypothesis cannot be rejected, we cannot reject the null hypothesis that dln_inc and dln_consump, jointly, do not Granger-cause dln_inv.

Because we failed to reject most of these null hypotheses, we might be interested in imposing some constraints on the coefficients. See [TS] var for more on fitting VAR models with constraints on the coefficients.

Example 2: Using test instead of vargranger

We could have used test to compute these Wald tests, but vargranger saves a great deal of typing. Still, seeing how to use test to obtain the results reported by vargranger is useful.
. test [dln_inv]L.dln_inc [dln_inv]L2.dln_inc
 ( 1)  [dln_inv]L.dln_inc = 0
 ( 2)  [dln_inv]L2.dln_inc = 0
       F(  2,    66) =    0.05
            Prob > F =    0.9527

. test [dln_inv]L.dln_consump [dln_inv]L2.dln_consump, accumulate
 ( 1)  [dln_inv]L.dln_inc = 0
 ( 2)  [dln_inv]L2.dln_inc = 0
 ( 3)  [dln_inv]L.dln_consump = 0
 ( 4)  [dln_inv]L2.dln_consump = 0
       F(  4,    66) =    1.59
            Prob > F =    0.1869

. test [dln_inv]L.dln_inv [dln_inv]L2.dln_inv, accumulate
 ( 1)  [dln_inv]L.dln_inc = 0
 ( 2)  [dln_inv]L2.dln_inc = 0
 ( 3)  [dln_inv]L.dln_consump = 0
 ( 4)  [dln_inv]L2.dln_consump = 0
 ( 5)  [dln_inv]L.dln_inv = 0
 ( 6)  [dln_inv]L2.dln_inv = 0
       F(  6,    66) =    1.62
            Prob > F =    0.1547

The first two calls to test show how vargranger obtains its results. The first test reproduces the first test reported for the dln_inv equation. The second test reproduces the ALL entry for the first equation. The third test reproduces the standard F statistic for the dln_inv equation, reported in the header of the var output in the previous example. The standard F statistic also includes the lags of the dependent variable, as well as any exogenous variables in the equation. This illustrates that the test performed by vargranger, that is, the ALL test of the null hypothesis that the coefficients on all the lags of all the other endogenous variables are jointly zero for a particular equation, is not the same as the standard F statistic for that equation.

Example 3: After svar

When vargranger is run on svar estimates, the null hypotheses are with respect to the underlying var estimates. We run vargranger after using svar to fit an SVAR that has the same underlying VAR as our model in example 1.

. matrix A = (., 0, 0 \ ., ., 0 \ ., ., .)
. matrix B = I(3)
. svar dln_inv dln_inc dln_consump if qtr<=tq(1978q4), dfk small aeq(A) beq(B)
(output omitted)
. vargranger

Granger causality Wald tests

    Equation       Excluded            F     df   df_r   Prob > F
    --------------------------------------------------------------
    dln_inv        dln_inc        .04847      2     66    0.9527
    dln_inv        dln_consump    1.5004      2     66    0.2306
    dln_inv        ALL            1.5917      4     66    0.1869
    --------------------------------------------------------------
    dln_inc        dln_inv        1.7683      2     66    0.1786
    dln_inc        dln_consump    1.7184      2     66    0.1873
    dln_inc        ALL            1.9466      4     66    0.1130
    --------------------------------------------------------------
    dln_consump    dln_inv        .97147      2     66    0.3839
    dln_consump    dln_inc        6.1465      2     66    0.0036
    dln_consump    ALL            3.7746      4     66    0.0080
    --------------------------------------------------------------

As we expected, the vargranger results are identical to those in the first example.

Stored results

vargranger stores the following in r():

Matrices
    r(gstats)    chi2, df, and p-values (if e(small)=="")
    r(gstats)    F, df, df_r, and p-values (if e(small)!="")

Methods and formulas

vargranger uses test to obtain Wald statistics of the hypothesis that all coefficients on the lags of variable x are jointly zero in the equation for variable y. vargranger uses the e() results stored by var or svar to determine whether to calculate and report small-sample F statistics or large-sample chi-squared statistics.
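The statistics displayed in the output table are also collected in the returned matrix r(gstats) documented under Stored results above, so they are easy to process further (a sketch):

    . vargranger
    . matrix list r(gstats)      // one row per Equation/Excluded test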
Clive William John Granger (1934–2009) was born in Swansea, Wales, and earned degrees at the University of Nottingham in mathematics and statistics. Joining the staff there, he also worked at Princeton on the spectral analysis of economic time series, before moving in 1973 to the University of California, San Diego. He was awarded the 2003 Nobel Prize in Economics for methods of analyzing economic time series with common trends (cointegration). He was knighted in 2005, thus becoming Sir Clive Granger.

References

Amisano, G., and C. Giannini. 1997. Topics in Structural VAR Econometrics. 2nd ed. Heidelberg: Springer.
Granger, C. W. J. 1969. Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37: 424–438.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Lütkepohl, H. 1993. Introduction to Multiple Time Series Analysis. 2nd ed. New York: Springer.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.
Phillips, P. C. B. 1997. The ET Interview: Professor Clive Granger. Econometric Theory 13: 253–303.

Also see

[TS] var — Vector autoregressive models
[TS] var svar — Structural vector autoregressive models
[TS] varbasic — Fit a simple VAR and graph IRFs or FEVDs
[TS] var intro — Introduction to vector autoregressive models

Title

varlmar — Perform LM test for residual autocorrelation after var or svar

    Description    Quick start    Menu    Syntax    Options
    Remarks and examples    Stored results    Methods and formulas
    References    Also see

Description

varlmar implements a Lagrange multiplier (LM) test for autocorrelation in the residuals of VAR models, which was presented in Johansen (1995).

Quick start

Test the null hypothesis of no autocorrelation for the first two lags of the residuals after var or svar
    varlmar

As above, but test the first 5 lags
    varlmar, mlag(5)

Perform the test for the first two lags of the residuals of a vector autoregression using estimation results stored in myest
    varlmar, estimates(myest)

Menu

Statistics > Multivariate time series > VAR diagnostics and tests > LM test for residual autocorrelation

Syntax

    varlmar [, options]

    options               Description
    -----------------------------------------------------------------------------
    mlag(#)               use # for the maximum order of autocorrelation; default
                          is mlag(2)
    estimates(estname)    use previously stored results estname; default is to
                          use active results
    separator(#)          draw separator line after every # rows
    -----------------------------------------------------------------------------

varlmar can be used only after var or svar; see [TS] var and [TS] var svar.
You must tsset your data before using varlmar; see [TS] tsset.

Options

mlag(#) specifies the maximum order of autocorrelation to be tested. The integer specified in mlag() must be greater than 0; the default is 2.

estimates(estname) requests that varlmar use the previously obtained set of var or svar estimates stored as estname. By default, varlmar uses the active results. See [R] estimates for information on manipulating estimation results.

separator(#) specifies how often separator lines should be drawn between rows. By default, separator lines do not appear. For example, separator(1) would draw a line between each row, separator(2) between every other row, and so on.

Remarks and examples

Most postestimation analyses of VAR models and SVAR models assume that the disturbances are not autocorrelated. varlmar implements the LM test for autocorrelation in the residuals of a VAR model discussed in Johansen (1995, 21–22). The test is performed at lags j = 1, ..., mlag(). For each j, the null hypothesis of the test is that there is no autocorrelation at lag j.

varlmar uses the estimation results stored by var or svar. By default, varlmar uses the active estimation results. However, varlmar can use any previously stored var or svar estimation results specified in the estimates() option.
Example 1: After var

Here we refit the model with German data described in [TS] var and then call varlmar.

. use http://www.stata-press.com/data/r14/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), dfk
(output omitted)
. varlmar, mlag(5)

Lagrange-multiplier test

    lag      chi2     df   Prob > chi2
    -----------------------------------
     1      5.5871     9     0.78043
     2      6.3189     9     0.70763
     3      8.4022     9     0.49418
     4     11.8742     9     0.22049
     5      5.2914     9     0.80821
    -----------------------------------
    H0: no autocorrelation at lag order

Because we cannot reject the null hypothesis that there is no autocorrelation in the residuals for any of the five orders tested, this test gives no hint of model misspecification. Although we fit the VAR with the dfk option to be consistent with the example in [TS] var, varlmar always uses the ML estimator of Σ. The results obtained from varlmar are the same whether or not dfk is specified.

Example 2: After svar

When varlmar is applied to estimation results produced by svar, the sequence of LM tests is applied to the underlying VAR. See [TS] var svar for a description of how an SVAR model builds on a VAR. In this example, we fit an SVAR that has an underlying VAR with two lags that is identical to the one fit in the previous example.

. matrix A = (.,.,0 \ 0,.,0 \ .,.,.)
. matrix B = I(3)
. svar dln_inv dln_inc dln_consump if qtr<=tq(1978q4), dfk aeq(A) beq(B)
(output omitted)
. varlmar, mlag(5)

Lagrange-multiplier test

    lag      chi2     df   Prob > chi2
    -----------------------------------
     1      5.5871     9     0.78043
     2      6.3189     9     0.70763
     3      8.4022     9     0.49418
     4     11.8742     9     0.22049
     5      5.2914     9     0.80821
    -----------------------------------
    H0: no autocorrelation at lag order

Because the underlying VAR(2) is the same as in the previous example (we assure you that this is true), the output from varlmar is also the same.

Stored results

varlmar stores the following in r():

Matrices
    r(lm)    chi2, df, and p-values

Methods and formulas

The formula for the LM test statistic at lag s is

    LM_s = (T - d - 0.5) \ln\!\left( \frac{|\widehat{\Sigma}|}{|\widetilde{\Sigma}_s|} \right)

where T is the number of observations in the VAR; d is explained below; \widehat{\Sigma} is the maximum likelihood estimate of Σ, the variance–covariance matrix of the disturbances from the VAR; and \widetilde{\Sigma}_s is the maximum likelihood estimate of Σ from the following augmented VAR.

If there are K equations in the VAR, we can define e_t to be a K × 1 vector of residuals. After we create the K new variables e1, e2, ..., eK containing the residuals from the K equations, we can augment the original VAR with lags of these K new variables. For each lag s, we form an augmented regression in which the new residual variables are lagged s times. Per the method of Davidson and MacKinnon (1993, 358), the missing values from these s lags are replaced with zeros. \widetilde{\Sigma}_s is the maximum likelihood estimate of Σ from this augmented VAR, and d is the number of coefficients estimated in the augmented VAR. See [TS] var for a discussion of the maximum likelihood estimate of Σ in a VAR.

The asymptotic distribution of LM_s is χ² with K² degrees of freedom.
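The augmented-VAR construction just described can be sketched by hand. The following computes an LM-type statistic at lag 1 for a two-variable VAR; the residual variable names are our choices, and the reading of d as the total number of coefficients in the augmented VAR is an assumption, so the result may differ from varlmar's (compare with varlmar, mlag(1)). This is an illustration of the formula, not the internal varlmar code.

    . quietly var dln_inc dln_consump
    . scalar lndetS = ln(det(e(Sigma)))             // ln|Sigma-hat| from the VAR
    . predict e1, residuals equation(#1)
    . predict e2, residuals equation(#2)
    . generate Le1 = cond(missing(L.e1), 0, L.e1)   // lag-1 residuals, zeros for
    . generate Le2 = cond(missing(L.e2), 0, L.e2)   // the missing initial values
    . quietly var dln_inc dln_consump, exog(Le1 Le2)
    . scalar lndetS1 = ln(det(e(Sigma)))            // ln|Sigma-tilde_1|
    . scalar d = colsof(e(b))                       // assumed meaning of d (see text)
    . display (e(N) - d - 0.5)*(lndetS - lndetS1)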
References

Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University Press.
Johansen, S. 1995. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University Press.

Also see

[TS] var — Vector autoregressive models
[TS] var svar — Structural vector autoregressive models
[TS] varbasic — Fit a simple VAR and graph IRFs or FEVDs
[TS] var intro — Introduction to vector autoregressive models

Title

varnorm — Test for normally distributed disturbances after var or svar

    Description    Quick start    Menu    Syntax    Options
    Remarks and examples    Stored results    Methods and formulas
    References    Also see

Description

varnorm computes and reports a series of statistics against the null hypothesis that the disturbances in a VAR are normally distributed. For each equation, and for all equations jointly, up to three statistics may be computed: a skewness statistic, a kurtosis statistic, and the Jarque–Bera statistic. By default, all three statistics are reported.

Quick start

Compute Jarque–Bera, skewness, and kurtosis statistics after var or svar to test the null hypothesis that the residuals are normally distributed
    varnorm

As above, but only report the Jarque–Bera statistic
    varnorm, jbera

After svar, use the Cholesky decomposition of the estimated variance–covariance matrix to compute the tests
    varnorm, cholesky

Menu

Statistics > Multivariate time series > VAR diagnostics and tests > Test for normally distributed disturbances

Syntax

    varnorm [, options]

    options               Description
    -----------------------------------------------------------------------------
    jbera                 report Jarque–Bera statistic; default is to report all
                          three statistics
    skewness              report skewness statistic; default is to report all
                          three statistics
    kurtosis              report kurtosis statistic; default is to report all
                          three statistics
    estimates(estname)    use previously stored results estname; default is to
                          use active results
    cholesky              use Cholesky decomposition
    separator(#)          draw separator line after every # rows
    -----------------------------------------------------------------------------

varnorm can be used only after var or svar; see [TS] var and [TS] var svar.
You must tsset your data before using varnorm; see [TS] tsset.

Options

jbera requests that the Jarque–Bera statistic and any other explicitly requested statistic be reported. By default, the Jarque–Bera, skewness, and kurtosis statistics are reported.

skewness requests that the skewness statistic and any other explicitly requested statistic be reported. By default, the Jarque–Bera, skewness, and kurtosis statistics are reported.

kurtosis requests that the kurtosis statistic and any other explicitly requested statistic be reported. By default, the Jarque–Bera, skewness, and kurtosis statistics are reported.

estimates(estname) specifies that varnorm use the previously obtained set of var or svar estimates stored as estname. By default, varnorm uses the active results. See [R] estimates for information on manipulating estimation results.

cholesky specifies that varnorm use the Cholesky decomposition of the estimated variance–covariance matrix of the disturbances, \widehat{\Sigma}, to orthogonalize the residuals when varnorm is applied to svar results. By default, when varnorm is applied to svar results, it uses the estimated structural decomposition \widehat{A}^{-1}\widehat{B} or \widehat{C} to orthogonalize the residuals. When applied to var e() results, varnorm always uses the Cholesky decomposition of \widehat{\Sigma}. For this reason, the cholesky option may not be specified when using var results.

separator(#) specifies how often separator lines should be drawn between rows. By default, separator lines do not appear. For example, separator(1) would draw a line between each row, separator(2) between every other row, and so on.
Remarks and examples

Some of the postestimation statistics for VAR and SVAR assume that the K disturbances have a K-dimensional multivariate normal distribution. varnorm uses the estimation results produced by var or svar to produce a series of statistics against the null hypothesis that the K disturbances in the VAR are normally distributed.

Per the notation in Lütkepohl (2005), call the skewness statistic \hat{\lambda}_1, the kurtosis statistic \hat{\lambda}_2, and the Jarque–Bera statistic \hat{\lambda}_3. The Jarque–Bera statistic is a combination of the other two statistics. The single-equation results are from tests against the null hypothesis that the disturbance for that particular equation is normally distributed. The results for all the equations are from tests against the null hypothesis that the K disturbances follow a K-dimensional multivariate normal distribution. Failure to reject the null hypothesis indicates a lack of model misspecification.

Example 1: After var

We refit the model with German data described in [TS] var and then call varnorm.

. use http://www.stata-press.com/data/r14/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), dfk
(output omitted)
. varnorm

Jarque-Bera test

    Equation        chi2    df   Prob > chi2
    -----------------------------------------
    dln_inv        2.821     2     0.24397
    dln_inc        3.450     2     0.17817
    dln_consump    1.566     2     0.45702
    ALL            7.838     6     0.25025
    -----------------------------------------

Skewness test

    Equation       Skewness    chi2    df   Prob > chi2
    ----------------------------------------------------
    dln_inv          .11935   0.173     1     0.67718
    dln_inc         -.38316   1.786     1     0.18139
    dln_consump     -.31275   1.190     1     0.27532
    ALL                       3.150     3     0.36913
    ----------------------------------------------------

Kurtosis test

    Equation       Kurtosis    chi2    df   Prob > chi2
    ----------------------------------------------------
    dln_inv          3.9331   2.648     1     0.10367
    dln_inc          3.7396   1.664     1     0.19710
    dln_consump      2.6484   0.376     1     0.53973
    ALL                       4.688     3     0.19613
    ----------------------------------------------------

    dfk estimator used in computations

In this example, neither the single-equation Jarque–Bera statistics nor the joint Jarque–Bera statistic comes close to rejecting the null hypothesis.

The skewness and kurtosis results have similar structures. The Jarque–Bera results use the sum of the skewness and kurtosis statistics. The skewness and kurtosis results are based on the skewness and kurtosis coefficients, respectively. See Methods and formulas.
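Because the Jarque–Bera statistic is the sum of the skewness and kurtosis statistics (here, for dln_inv, 0.173 + 2.648 = 2.821), the relationship can be checked from the stored results (a sketch using the r() matrices documented under Stored results below):

    . varnorm
    . matrix list r(skewness)
    . matrix list r(kurtosis)
    . matrix list r(jb)        // each test equals skewness chi2 + kurtosis chi2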
Example 2: After svar

The test statistics are computed on the orthogonalized VAR residuals; see Methods and formulas. When varnorm is applied to var results, varnorm uses a Cholesky decomposition of the estimated variance–covariance matrix of the disturbances, \widehat{\Sigma}, to orthogonalize the residuals.

By default, when varnorm is applied to svar estimation results, it uses the estimated structural decomposition \widehat{A}^{-1}\widehat{B} or \widehat{C} to orthogonalize the residuals of the underlying VAR. Alternatively, when varnorm is applied to svar results and the cholesky option is specified, varnorm uses the Cholesky decomposition of \widehat{\Sigma} to orthogonalize the residuals of the underlying VAR.

We fit an SVAR that is based on an underlying VAR with two lags that is the same as the one fit in the previous example. We impose a structural decomposition that is the same as the Cholesky decomposition, as illustrated in [TS] var svar.

. matrix a = (.,0,0 \ .,.,0 \ .,.,.)
. matrix b = I(3)
. svar dln_inv dln_inc dln_consump if qtr<=tq(1978q4), dfk aeq(a) beq(b)
(output omitted)
. varnorm

Jarque-Bera test

    Equation        chi2    df   Prob > chi2
    -----------------------------------------
    dln_inv        2.821     2     0.24397
    dln_inc        3.450     2     0.17817
    dln_consump    1.566     2     0.45702
    ALL            7.838     6     0.25025
    -----------------------------------------

Skewness test

    Equation       Skewness    chi2    df   Prob > chi2
    ----------------------------------------------------
    dln_inv          .11935   0.173     1     0.67718
    dln_inc         -.38316   1.786     1     0.18139
    dln_consump     -.31275   1.190     1     0.27532
    ALL                       3.150     3     0.36913
    ----------------------------------------------------

Kurtosis test

    Equation       Kurtosis    chi2    df   Prob > chi2
    ----------------------------------------------------
    dln_inv          3.9331   2.648     1     0.10367
    dln_inc          3.7396   1.664     1     0.19710
    dln_consump      2.6484   0.376     1     0.53973
    ALL                       4.688     3     0.19613
    ----------------------------------------------------

    dfk estimator used in computations

Because the estimated structural decomposition is the same as the Cholesky decomposition, the varnorm results are the same as those from the previous example.

Technical note

The statistics computed by varnorm depend on \widehat{\Sigma}, the estimated variance–covariance matrix of the disturbances. var uses the maximum likelihood estimator of this matrix by default, but the dfk option produces an estimator that uses a small-sample correction. Thus specifying dfk in the call to var or svar will affect the test results produced by varnorm.

Stored results

varnorm stores the following in r():

Macros
    r(dfk)         dfk, if specified
Matrices
    r(kurtosis)    kurtosis test, df, and p-values
    r(skewness)    skewness test, df, and p-values
    r(jb)          Jarque–Bera test, df, and p-values

Methods and formulas

varnorm is based on the derivations found in Lütkepohl (2005, 174–181). Let \hat{u}_t be the K × 1 vector of residuals from the K equations in a previously fitted VAR or the residuals from the K equations of the VAR underlying a previously fitted SVAR. Similarly, let \widehat{\Sigma} be the estimated covariance matrix of the disturbances. (Note that \widehat{\Sigma} depends on whether the dfk option was specified.) The skewness, kurtosis, and Jarque–Bera statistics must be computed using the orthogonalized residuals.

Because

    \widehat{\Sigma} = \widehat{P}\,\widehat{P}'

implies that

    \widehat{P}^{-1}\,\widehat{\Sigma}\,\widehat{P}^{-1\prime} = I_K

premultiplying \hat{u}_t by \widehat{P}^{-1} is one way of performing the orthogonalization. When varnorm is applied to var results, \widehat{P} is defined to be the Cholesky decomposition of \widehat{\Sigma}. When varnorm is applied to svar results, \widehat{P} is set, by default, to the estimated structural decomposition; that is, \widehat{P} = \widehat{A}^{-1}\widehat{B}, where \widehat{A} and \widehat{B} are the svar estimates of the A and B matrices, or \widehat{C}, where \widehat{C} is the long-run SVAR estimate of C. (See [TS] var svar for more on the origin and estimation of the A and B matrices.) When varnorm is applied to svar results and the cholesky option is specified, \widehat{P} is set to the Cholesky decomposition of \widehat{\Sigma}.

Define \hat{w}_t to be the orthogonalized VAR residuals given by

    \hat{w}_t = (\hat{w}_{1t}, \ldots, \hat{w}_{Kt})' = \widehat{P}^{-1}\hat{u}_t

The K × 1 vectors of skewness and kurtosis coefficients are then computed using the orthogonalized residuals by

    \hat{b}_1 = (\hat{b}_{11}, \ldots, \hat{b}_{K1})';\qquad \hat{b}_{k1} = \frac{1}{T}\sum_{t=1}^{T} \hat{w}_{kt}^3

    \hat{b}_2 = (\hat{b}_{12}, \ldots, \hat{b}_{K2})';\qquad \hat{b}_{k2} = \frac{1}{T}\sum_{t=1}^{T} \hat{w}_{kt}^4

Under the null hypothesis of multivariate Gaussian disturbances,

    \hat{\lambda}_1 = \frac{T\,\hat{b}_1'\hat{b}_1}{6} \xrightarrow{d} \chi^2(K)

    \hat{\lambda}_2 = \frac{T\,(\hat{b}_2 - 3)'(\hat{b}_2 - 3)}{24} \xrightarrow{d} \chi^2(K)

and

    \hat{\lambda}_3 = \hat{\lambda}_1 + \hat{\lambda}_2 \xrightarrow{d} \chi^2(2K)

\hat{\lambda}_1 is the skewness statistic, \hat{\lambda}_2 is the kurtosis statistic, and \hat{\lambda}_3 is the Jarque–Bera statistic. \hat{\lambda}_1, \hat{\lambda}_2, and \hat{\lambda}_3 are for tests of the null hypothesis that the K × 1 vector of disturbances follows a multivariate normal distribution.
The corresponding statistics against the null hypothesis that the disturbances from the kth equation come from a univariate normal distribution are

    \hat{\lambda}_{1k} = \frac{T\,\hat{b}_{k1}^2}{6} \xrightarrow{d} \chi^2(1)

    \hat{\lambda}_{2k} = \frac{T\,(\hat{b}_{k2} - 3)^2}{24} \xrightarrow{d} \chi^2(1)

and

    \hat{\lambda}_{3k} = \hat{\lambda}_{1k} + \hat{\lambda}_{2k} \xrightarrow{d} \chi^2(2)

References

Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Jarque, C. M., and A. K. Bera. 1987. A test for normality of observations and regression residuals. International Statistical Review 55: 163–172.
Lütkepohl, H. 1993. Introduction to Multiple Time Series Analysis. 2nd ed. New York: Springer.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.

Also see

[TS] var — Vector autoregressive models
[TS] var svar — Structural vector autoregressive models
[TS] varbasic — Fit a simple VAR and graph IRFs or FEVDs
[TS] var intro — Introduction to vector autoregressive models

Title

varsoc — Obtain lag-order selection statistics for VARs and VECMs

    Description    Quick start    Menu    Syntax    Preestimation options
    Postestimation option    Remarks and examples    Stored results
    Methods and formulas    References    Also see

Description

varsoc reports the final prediction error (FPE), Akaike's information criterion (AIC), Schwarz's Bayesian information criterion (SBIC), and the Hannan and Quinn information criterion (HQIC) lag-order selection statistics for a series of vector autoregressions of order 1 through a requested maximum lag. A sequence of likelihood-ratio test statistics for all the full VARs of order less than or equal to the highest lag order is also reported. varsoc can be used as a preestimation or a postestimation command. The preestimation version can be used to select the lag order for a VAR or vector error-correction model (VECM). The postestimation version obtains the information needed to compute the statistics from the previous model or specified stored estimates.

Quick start

Compute AIC, SBIC, HQIC, and the final prediction error to aid in lag-order selection before VAR or VECM estimation of y1 and y2 using tsset data
    varsoc y1 y2

As above, but set the maximum lag order to be tested to 7
    varsoc y1 y2, maxlag(7)

As above, but use Lütkepohl's version of the information criteria
    varsoc y1 y2, maxlag(7) lutstats

Menu

Preestimation for VARs:   Statistics > Multivariate time series > VAR diagnostics and tests > Lag-order selection statistics (preestimation)
Postestimation for VARs:  Statistics > Multivariate time series > VAR diagnostics and tests > Lag-order selection statistics (postestimation)
Preestimation for VECMs:  Statistics > Multivariate time series > VEC diagnostics and tests > Lag-order selection statistics (preestimation)
Postestimation for VECMs: Statistics > Multivariate time series > VEC diagnostics and tests > Lag-order selection statistics (postestimation)

Syntax

Preestimation syntax
    varsoc depvarlist [if] [in] [, preestimation_options]

Postestimation syntax
    varsoc [, estimates(estname)]

    preestimation_options        Description
    -----------------------------------------------------------------------------
    Main
      maxlag(#)                  set maximum lag order to #; default is maxlag(4)
      exog(varlist)              use varlist as exogenous variables
      constraints(constraints)   apply constraints to exogenous variables
      noconstant                 suppress constant term
      lutstats                   use Lütkepohl's version of information criteria
      level(#)                   set confidence level; default is level(95)
      separator(#)               draw separator line after every # rows
    -----------------------------------------------------------------------------

You must tsset your data before using varsoc; see [TS] tsset.
by is allowed with the preestimation version of varsoc; see [U] 11.1.10 Prefix commands.

Preestimation options

Main

maxlag(#) specifies the maximum lag order for which the statistics are to be obtained.

exog(varlist) specifies exogenous variables to include in the VARs fit by varsoc.

constraints(constraints) specifies a list of constraints on the exogenous variables to be applied. Do not specify constraints on the lags of the endogenous variables because specifying one would mean that at least one of the VAR models considered by varsoc will not contain the lag specified in the constraint. Use var directly to obtain selection-order criteria with constraints on lags of the endogenous variables.

noconstant suppresses the constant terms from the model. By default, constant terms are included.

lutstats specifies that the Lütkepohl (2005) versions of the information criteria be reported. See Methods and formulas for a discussion of these statistics.

level(#) specifies the confidence level, as a percentage, that is used to identify the first likelihood-ratio test that rejects the null hypothesis that the additional parameters from adding a lag are jointly zero. The default is level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals.

separator(#) specifies how often separator lines should be drawn between rows. By default, separator lines do not appear. For example, separator(1) would draw a line between each row, separator(2) between every other row, and so on.

Postestimation option

estimates(estname) specifies the name of a previously stored set of var or svar estimates. When no depvarlist is specified, varsoc uses the postestimation syntax and uses the currently active estimation results or the results specified in estimates(estname). See [R] estimates for information on manipulating estimation results.

Remarks and examples

Many selection-order statistics have been developed to assist researchers in fitting a VAR of the correct order. Several of these selection-order statistics appear in the [TS] var output. The varsoc command computes these statistics over a range of lags p while maintaining a common sample and option specification.

varsoc can be used as a preestimation or a postestimation command. When it is used as a preestimation command, a depvarlist is required, and the default maximum lag is 4. When it is used as a postestimation command, varsoc uses the model specification stored in estname or the previously fitted model.

varsoc computes four information criteria as well as a sequence of likelihood-ratio (LR) tests. The information criteria include the FPE, AIC, HQIC, and SBIC.

For a given lag p, the LR test compares a VAR with p lags with one with p − 1 lags. The null hypothesis is that all the coefficients on the pth lags of the endogenous variables are zero. To use this sequence of LR tests to select a lag order, we start by looking at the results of the test for the model with the most lags, which is at the bottom of the table. Proceeding up the table, the first test that rejects the null hypothesis is the lag order selected by this process. See Lütkepohl (2005, 143–144) for more information on this procedure. An '*' appears next to the LR statistic indicating the optimal lag.

For the remaining statistics, the lag with the smallest value is the order selected by that criterion. An '*' indicates the optimal lag.
Strictly speaking, the FPE is not an information criterion, though we include it in this discussion because, as with an information criterion, we select the lag length corresponding to the lowest value; and, naturally, we want to minimize the prediction error. The AIC measures the discrepancy between the given model and the true model, which, of course, we want to minimize. Amemiya (1985) provides an intuitive discussion of the arguments in Akaike (1973). The SBIC and the HQIC can be interpreted similarly to the AIC, though the SBIC and the HQIC have a theoretical advantage over the AIC and the FPE. As Lütkepohl (2005, 148–152) demonstrates, choosing p to minimize the SBIC or the HQIC provides consistent estimates of the true lag order, p. In contrast, minimizing the AIC or the FPE will overestimate the true lag order with positive probability, even with an infinite sample size.

Although VAR models assume that the modulus is strictly less than 1 (see [TS] varstable), VECMs do not need to satisfy this condition, and they work even if all the variables included in the model are integrated of order 1, I(1). Regardless of these differences, varsoc works for both estimation commands. As shown by Nielsen (2001), the lag-order selection statistics discussed above can be used in the presence of I(1) variables.

Example 1: Preestimation

Here we use varsoc as a preestimation command.

. use http://www.stata-press.com/data/r14/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. varsoc dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lutstats

Selection-order criteria (lutstats)
Sample: 1961q2 - 1978q4                               Number of obs = 71

  lag     LL        LR      df    p      FPE        AIC        HQIC       SBIC
  -----------------------------------------------------------------------------
   0   564.784                          2.7e-11   -24.423    -24.423*   -24.423*
   1   576.409   23.249     9   0.006   2.5e-11   -24.497    -24.3829   -24.2102
   2   588.859   24.901*    9   0.003   2.3e-11*  -24.5942*  -24.3661   -24.0205
   3   591.237   4.7566     9   0.855   2.7e-11   -24.4076   -24.0655   -23.5472
   4   598.457   14.438     9   0.108   2.9e-11   -24.3575   -23.9012   -23.2102
  -----------------------------------------------------------------------------
  Endogenous: dln_inv dln_inc dln_consump
   Exogenous: _cons

The sample used begins in 1961q2 because all the VARs are fit to the sample defined by any if or in conditions and the available data for the maximum lag specified. The default maximum number of lags is four. Because we specified the lutstats option, the table contains the Lütkepohl (2005) versions of the information criteria, which differ from the standard definitions in that they drop the constant term from the log likelihood. In this example, the likelihood-ratio tests selected a model with two lags. AIC and FPE have also both chosen a model with two lags, whereas SBIC and HQIC have both selected a model with zero lags.

Example 2: Postestimation

varsoc works as a postestimation command when no dependent variables are specified.

. var dln_inc dln_consump if qtr<=tq(1978q4), lutstats exog(l.dln_inv)
(output omitted)
. varsoc

Selection-order criteria (lutstats)
Sample: 1960q4 - 1978q4                               Number of obs = 73

  lag     LL        LR      df    p      FPE        AIC        HQIC       SBIC
  -----------------------------------------------------------------------------
   0   460.646                          1.3e-08   -18.2962   -18.2962   -18.2962*
   1   467.606   13.919     4   0.008   1.2e-08   -18.3773   -18.3273   -18.2518
   2   477.087   18.962*    4   0.001   1.0e-08*  -18.5275*  -18.4274*  -18.2764
  -----------------------------------------------------------------------------
  Endogenous: dln_inc dln_consump
   Exogenous: L.dln_inv _cons

Because we included one lag of dln_inv in our original model, varsoc did likewise with each model it fit.
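Having selected two lags by the LR, FPE, and AIC criteria in example 1, one might continue by fitting that specification (a sketch; whether to follow the AIC or the SBIC when the criteria disagree is a modeling judgment, not a rule):

    . var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lags(1/2)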
Based on the work of Tsay (1984), Paulsen (1984), and Nielsen (2001), these lag-order selection criteria can be used to determine the lag length of the VAR underlying a VECM. See [TS] vec intro for an example in which we use varsoc to choose the lag order for a VECM.

Stored results

varsoc stores the following in r():

Scalars
    r(N)            number of observations
    r(tmax)         last time period in sample
    r(tmin)         first time period in sample
    r(mlag)         maximum lag order
    r(N_gaps)       the number of gaps in the sample
Macros
    r(endog)        names of endogenous variables
    r(exog)         names of exogenous variables
    r(lutstats)     lutstats, if specified
    r(rmlutstats)   rmlutstats, if specified
    r(cns#)         the #th constraint
Matrices
    r(stats)        LL, LR, FPE, AIC, HQIC, SBIC, and p-values

Methods and formulas

As shown by Hamilton (1994, 295–296), the log likelihood for a VAR(p) is

    LL = \frac{T}{2}\left\{ \ln|\widehat{\Sigma}^{-1}| - K\ln(2\pi) - K \right\}

where T is the number of observations, K is the number of equations, and \widehat{\Sigma} is the maximum likelihood estimate of E[u_t u_t'], where u_t is the K × 1 vector of disturbances. Because

    \ln|\widehat{\Sigma}^{-1}| = -\ln|\widehat{\Sigma}|

the log likelihood can be rewritten as

    LL = -\frac{T}{2}\left\{ \ln|\widehat{\Sigma}| + K\ln(2\pi) + K \right\}

Letting LL(j) be the value of the log likelihood with j lags yields the LR statistic for lag order j as

    LR(j) = 2\{ LL(j) - LL(j-1) \}

Model-order statistics

The formula for the FPE given in Lütkepohl (2005, 147) is

    FPE = |\Sigma_u|\left( \frac{T + Kp + 1}{T - Kp - 1} \right)^K

This formula, however, assumes that there is a constant in the model and that none of the variables are dropped because of collinearity. To deal with these problems, the FPE is implemented as

    FPE = |\Sigma_u|\left( \frac{T + \bar{m}}{T - \bar{m}} \right)^K

where \bar{m} is the average number of parameters over the K equations. This implementation accounts for variables dropped because of collinearity.

By default, the AIC, SBIC, and HQIC are computed according to their standard definitions, which include the constant term from the log likelihood. That is,

    AIC  = -2\left(\frac{LL}{T}\right) + \frac{2 t_p}{T}

    SBIC = -2\left(\frac{LL}{T}\right) + \frac{\ln(T)}{T}\, t_p

    HQIC = -2\left(\frac{LL}{T}\right) + \frac{2\ln\{\ln(T)\}}{T}\, t_p

where t_p is the total number of parameters in the model and LL is the log likelihood.

Lutstats

Lütkepohl (2005) advocates dropping the constant term from the log likelihood because it does not affect inference. The Lütkepohl versions of the information criteria are

    AIC  = \ln|\Sigma_u| + \frac{2 p K^2}{T}

    SBIC = \ln|\Sigma_u| + \frac{\ln(T)}{T}\, p K^2

    HQIC = \ln|\Sigma_u| + \frac{2\ln\{\ln(T)\}}{T}\, p K^2
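As a check on the Lütkepohl definitions, the lutstats AIC can be sketched by hand after var, using the ML estimate of \Sigma_u stored in e(Sigma). Here p = 2 and K = 3, matching example 1; because varsoc holds the estimation sample fixed across lag orders and the var sample below is not restricted the same way, the value will be close to, but not exactly, the table entry above.

    . quietly var dln_inv dln_inc dln_consump if qtr<=tq(1978q4)
    . display ln(det(e(Sigma))) + 2*2*3^2/e(N)   // ln|Sigma_u| + 2pK^2/T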
References

Akaike, H. 1973. Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory, ed. B. N. Petrov and F. Csaki, 267–281. Budapest: Akailseoniai–Kiudo.
Amemiya, T. 1985. Advanced Econometrics. Cambridge, MA: Harvard University Press.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Lütkepohl, H. 1993. Introduction to Multiple Time Series Analysis. 2nd ed. New York: Springer.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.
Nielsen, B. 2001. Order determination in general vector autoregressions. Working paper, Department of Economics, University of Oxford and Nuffield College. http://ideas.repec.org/p/nuf/econwp/0110.html.
Paulsen, J. 1984. Order determination of multivariate autoregressive time series with unit roots. Journal of Time Series Analysis 5: 115–127.
Tsay, R. S. 1984. Order selection in nonstationary autoregressive models. Annals of Statistics 12: 1425–1433.

Also see

[TS] var — Vector autoregressive models
[TS] var svar — Structural vector autoregressive models
[TS] varbasic — Fit a simple VAR and graph IRFs or FEVDs
[TS] vec — Vector error-correction models
[TS] var intro — Introduction to vector autoregressive models
[TS] vec intro — Introduction to vector error-correction models

Title

varstable — Check the stability condition of VAR or SVAR estimates

    Description    Quick start    Menu    Syntax    Options
    Remarks and examples    Stored results    Methods and formulas
    References    Also see

Description

varstable checks the eigenvalue stability condition after estimating the parameters of a vector autoregression using var or svar.

Quick start

Check eigenvalue stability condition after var or svar
    varstable

As above, and graph the eigenvalues of the companion matrix
    varstable, graph

As above, and label each eigenvalue with its distance from the unit circle
    varstable, graph dlabel

As above, but label the eigenvalues with their moduli
    varstable, graph modlabel

Menu

Statistics > Multivariate time series > VAR diagnostics and tests > Check stability condition of VAR estimates

Syntax

    varstable [, options]

    options                   Description
    -----------------------------------------------------------------------------
    Main
      estimates(estname)      use previously stored results estname; default is
                              to use active results
      amat(matrix_name)       save the companion matrix as matrix_name
      graph                   graph eigenvalues of the companion matrix
      dlabel                  label eigenvalues with the distance from the unit
                              circle
      modlabel                label eigenvalues with the modulus
      marker_options          change look of markers (color, size, etc.)
      rlopts(cline_options)   affect rendition of reference unit circle
      nogrid                  suppress polar grid circles
      pgrid(...)              specify radii and appearance of polar grid circles;
                              see Options for details
    Add plots
      addplot(plot)           add other plots to the generated graph
    Y axis, X axis, Titles, Legend, Overall
      twoway_options          any options other than by() documented in
                              [G-3] twoway_options
    -----------------------------------------------------------------------------

varstable can be used only after var or svar; see [TS] var and [TS] var svar.

Options

Main

estimates(estname) requests that varstable use the previously obtained set of var estimates stored as estname. By default, varstable uses the active estimation results. See [R] estimates for information on manipulating estimation results.

amat(matrix_name) specifies a valid Stata matrix name by which the companion matrix A can be saved (see Methods and formulas for the definition of the matrix A). The default is not to save the A matrix.

graph causes varstable to draw a graph of the eigenvalues of the companion matrix.

dlabel labels each eigenvalue with its distance from the unit circle. dlabel cannot be specified with modlabel.

modlabel labels the eigenvalues with their moduli. modlabel cannot be specified with dlabel.

marker_options specify the look of markers. This look includes the marker symbol, the marker size, and its color and outline; see [G-3] marker_options.

rlopts(cline_options) affect the rendition of the reference unit circle; see [G-3] cline_options.

nogrid suppresses the polar grid circles.

pgrid([numlist] [, line_options]) determines the radii and appearance of the polar grid circles. By default, the graph includes nine polar grid circles with radii 0.1, 0.2, ..., 0.9 that have the grid line style. The numlist specifies the radii for the polar grid circles. The line_options determine the appearance of the polar grid circles; see [G-3] line_options.
Because the pgrid() option can be repeated, circles with different radii can have distinct appearances.

Add plots

addplot(plot) adds specified plots to the generated graph. See [G-3] addplot_option.

Y axis, X axis, Titles, Legend, Overall

twoway_options are any of the options documented in [G-3] twoway_options, except by(). These include options for titling the graph (see [G-3] title_options) and for saving the graph to disk (see [G-3] saving_option).

Remarks and examples

Inference after var and svar requires that variables be covariance stationary. The variables in y_t are covariance stationary if their first two moments exist and are independent of time. More explicitly, a variable y_t is covariance stationary if

1. E[y_t] is finite and independent of t;
2. Var[y_t] is finite and independent of t; and
3. Cov[y_t, y_s] is a finite function of |t − s| but not of t or s alone.

Interpretation of VAR models, however, requires that an even stricter stability condition be met. If a VAR is stable, it is invertible and has an infinite-order vector moving-average representation. If the VAR is stable, impulse–response functions and forecast-error variance decompositions have known interpretations. Lütkepohl (2005) and Hamilton (1994) both show that if the modulus of each eigenvalue of the matrix A is strictly less than one, the estimated VAR is stable (see Methods and formulas for the definition of the matrix A).

Example 1

After fitting a VAR with var, we can use varstable to check the stability condition. Using the same VAR model that was used in [TS] var, we demonstrate the use of varstable.

. use http://www.stata-press.com/data/r14/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. var dln_inv dln_inc dln_consump if qtr>=tq(1961q2) & qtr<=tq(1978q4)
(output omitted)
. varstable, graph

Eigenvalue stability condition

    Eigenvalue                   Modulus
    -------------------------------------
     .5456253                    .545625
    -.3785754 + .3853982i        .540232
    -.3785754 - .3853982i        .540232
    -.0643276 + .4595944i        .464074
    -.0643276 - .4595944i        .464074
    -.3698058                    .369806
    -------------------------------------
    All the eigenvalues lie inside the unit circle.
    VAR satisfies stability condition.

Because the modulus of each eigenvalue is strictly less than 1, the estimates satisfy the eigenvalue stability condition.

Specifying the graph option produced a graph of the eigenvalues with the real components on the x axis and the complex components on the y axis. The graph below indicates visually that these eigenvalues are well inside the unit circle.

(figure omitted: "Roots of the companion matrix"; eigenvalues plotted in the complex plane; x axis: Real, −1 to 1; y axis: Imaginary, −1 to 1; all points lie well inside the reference unit circle)
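The moduli reported in the table can also be recovered from the saved companion matrix (a sketch using the amat() option together with matrix eigenvalues; the matrix names A, re, and im are our choices):

    . varstable, amat(A)
    . matrix eigenvalues re im = A        // real and imaginary parts
    . display sqrt(re[1,1]^2 + im[1,1]^2) // modulus of the first eigenvalue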
Example 2

This example illustrates two other features of the varstable command. First, varstable can check the stability of the estimates of the VAR underlying an SVAR fit by var svar. Second, varstable can check the stability of any previously stored var or var svar estimates.

We begin by refitting the previous VAR and storing the results as var1. Because this is the same VAR that was fit in the previous example, the stability results should be identical.

. var dln_inv dln_inc dln_consump if qtr>=tq(1961q2) & qtr<=tq(1978q4)
(output omitted)
. estimates store var1

Now we use svar to fit an SVAR with a different underlying VAR and check the estimates of that underlying VAR for stability.

. matrix A = (.,0 \ .,.)
. matrix B = I(2)
. svar d.ln_inc d.ln_consump, aeq(A) beq(B)
(output omitted)
. varstable

Eigenvalue stability condition

    Eigenvalue                   Modulus
    -------------------------------------
     .548711                     .548711
    -.2979493 + .4328013i        .525443
    -.2979493 - .4328013i        .525443
    -.3570825                    .357082
    -------------------------------------
    All the eigenvalues lie inside the unit circle.
    VAR satisfies stability condition.

The estimates() option allows us to check the stability of the var results stored as var1.

. varstable, est(var1)

Eigenvalue stability condition

    Eigenvalue                   Modulus
    -------------------------------------
     .5456253                    .545625
    -.3785754 + .3853982i        .540232
    -.3785754 - .3853982i        .540232
    -.0643276 + .4595944i        .464074
    -.0643276 - .4595944i        .464074
    -.3698058                    .369806
    -------------------------------------
    All the eigenvalues lie inside the unit circle.
    VAR satisfies stability condition.

The results are identical to those obtained in the previous example, confirming that we were checking the results in var1.

Stored results

varstable stores the following in r():

Matrices
    r(Re)        real part of the eigenvalues of A
    r(Im)        imaginary part of the eigenvalues of A
    r(Modulus)   modulus of the eigenvalues of A

Methods and formulas

varstable forms the companion matrix

    A = \begin{pmatrix}
          A_1    & A_2    & \cdots & A_{p-1} & A_p    \\
          I      & 0      & \cdots & 0       & 0      \\
          0      & I      & \cdots & 0       & 0      \\
          \vdots & \vdots & \ddots & \vdots  & \vdots \\
          0      & 0      & \cdots & I       & 0
        \end{pmatrix}

and obtains its eigenvalues by using matrix eigenvalues. The modulus of the complex eigenvalue r + ci is \sqrt{r^2 + c^2}. As shown by Lütkepohl (2005) and Hamilton (1994), the VAR is stable if the modulus of each eigenvalue of A is strictly less than 1.

References

Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Lütkepohl, H. 1993. Introduction to Multiple Time Series Analysis. 2nd ed. New York: Springer.
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.

Also see

[TS] var — Vector autoregressive models
[TS] var svar — Structural vector autoregressive models
[TS] varbasic — Fit a simple VAR and graph IRFs or FEVDs
[TS] var intro — Introduction to vector autoregressive models

Title

varwle — Obtain Wald lag-exclusion statistics after var or svar

    Description    Quick start    Menu    Syntax    Options
    Remarks and examples    Stored results    Methods and formulas
    References    Also see

Description

varwle reports Wald tests of the hypothesis that the endogenous variables at a given lag are jointly zero for each equation and for all equations jointly.

Quick start

Wald lag-exclusion statistics after var or svar
    varwle

As above, but use VAR or SVAR estimation results stored in myest
    varwle, estimates(myest)

Menu

Statistics > Multivariate time series > VAR diagnostics and tests > Wald lag-exclusion statistics

Syntax

    varwle [, estimates(estname) separator(#)]

varwle can be used only after var or svar; see [TS] var and [TS] var svar.

Options

estimates(estname) requests that varwle use the previously obtained set of var or svar estimates stored as estname. By default, varwle uses the active estimation results. See [R] estimates for information on manipulating estimation results.

separator(#) specifies how often separator lines should be drawn between rows. By default, separator lines do not appear. For example, separator(1) would draw a line between each row, separator(2) between every other row, and so on.

Remarks and examples

After fitting a VAR, one hypothesis of interest is that all the endogenous variables at a given lag are jointly zero. varwle reports Wald tests of this hypothesis for each equation and for all equations jointly.
varwle uses the estimation results from a previously fitted var or svar. By default, varwle uses the active estimation results, but you may also use a stored set of estimates by specifying the estimates() option.

If the VAR was fit with the small option, varwle also presents small-sample F statistics; otherwise, varwle presents large-sample chi-squared statistics.

Example 1: After var

We analyze the model with the German data described in [TS] var using varwle.

. use http://www.stata-press.com/data/r14/lutkepohl2
(Quarterly SA West German macro data, Bil DM, from Lutkepohl 1993 Table E.1)
. var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), dfk small
(output omitted)
. varwle

Equation: dln_inv

    lag       F         df    df_r    Prob > F
     1     2.64902       3      66     0.0560
     2     1.25799       3      66     0.2960

Equation: dln_inc

    lag       F         df    df_r    Prob > F
     1     2.19276       3      66     0.0971
     2     .907499       3      66     0.4423

Equation: dln_consump

    lag       F         df    df_r    Prob > F
     1     1.80804       3      66     0.1543
     2     5.57645       3      66     0.0018

Equation: All

    lag       F         df    df_r    Prob > F
     1     3.78884       9      66     0.0007
     2     2.96811       9      66     0.0050

Because the VAR was fit with the dfk and small options, varwle used the small-sample estimator of Σ in constructing the VCE, producing an F statistic. The first two equations appear to have a different lag structure from that of the third. In the first two equations, we cannot reject the null hypothesis that all three endogenous variables have zero coefficients at the second lag. The hypothesis that all three endogenous variables have zero coefficients at the first lag can be rejected at the 10% level for both of the first two equations. In contrast, in the third equation, the coefficients on the second lag of the endogenous variables are jointly significant, but not those on the first lag. However, we strongly reject the hypothesis that the coefficients on the first lag of the endogenous variables are zero in all three equations jointly. Similarly, we can also strongly reject the hypothesis that the coefficients on the second lag of the endogenous variables are zero in all three equations jointly.

If we believe these results strongly enough, we might want to refit the original VAR, placing some constraints on the coefficients. See [TS] var for details on how to fit VAR models with constraints.

Example 2: After svar

Here we fit a simple SVAR and then run varwle:

. matrix a = (.,0\.,.)
. matrix b = I(2)
. svar dln_inc dln_consump, aeq(a) beq(b)
Estimating short-run parameters
Iteration 0:   log likelihood = -159.21683
Iteration 1:   log likelihood =  490.92264
Iteration 2:   log likelihood =  528.66126
Iteration 3:   log likelihood =  573.96363
Iteration 4:   log likelihood =  578.05136
Iteration 5:   log likelihood =  578.27633
Iteration 6:   log likelihood =  578.27699
Iteration 7:   log likelihood =  578.27699
Structural vector autoregression
 ( 1)  [a_1_2]_cons = 0
 ( 2)  [b_1_1]_cons = 1
 ( 3)  [b_1_2]_cons = 0
 ( 4)  [b_2_1]_cons = 0
 ( 5)  [b_2_2]_cons = 1
Sample: 1960q4 - 1982q4                        Number of obs  =        89
Exactly identified model                       Log likelihood =   578.277

             |    Coef.    Std. Err.      z    P>|z|     [95% Conf. Interval]
      /a_1_1 |  89.72411   6.725107    13.34   0.000      76.54315   102.9051
      /a_2_1 | -64.73622   10.67698    -6.06   0.000     -85.66271  -43.80973
      /a_1_2 |         0  (constrained)
      /a_2_2 |  126.2964   9.466318    13.34   0.000      107.7428   144.8501
      /b_1_1 |         1  (constrained)
      /b_2_1 |         0  (constrained)
      /b_1_2 |         0  (constrained)
      /b_2_2 |         1  (constrained)
The output table from var svar gives information about the estimates of the parameters in the A and B matrices in the structural VAR. But, as discussed in [TS] var svar, an SVAR model builds on an underlying VAR. When varwle uses the estimation results produced by svar, it performs Wald lag-exclusion tests on the underlying VAR model. Next we run varwle on these svar results.

. varwle

Equation: dln_inc

    lag      chi2        df    Prob > chi2
     1     6.88775        2       0.032
     2     1.873546       2       0.392

Equation: dln_consump

    lag      chi2        df    Prob > chi2
     1     9.938547       2       0.007
     2     13.89996       2       0.001

Equation: All

    lag      chi2        df    Prob > chi2
     1     34.54276       4       0.000
     2     19.44093       4       0.001

Now we fit the underlying VAR with two lags and apply varwle to these results.

. var dln_inc dln_consump
(output omitted)
. varwle

Equation: dln_inc

    lag      chi2        df    Prob > chi2
     1     6.88775        2       0.032
     2     1.873546       2       0.392

Equation: dln_consump

    lag      chi2        df    Prob > chi2
     1     9.938547       2       0.007
     2     13.89996       2       0.001

Equation: All

    lag      chi2        df    Prob > chi2
     1     34.54276       4       0.000
     2     19.44093       4       0.001

Because varwle produces the same results in these two cases, we can conclude that when varwle is applied to svar results, it performs Wald lag-exclusion tests on the underlying VAR.

Stored results

varwle stores the following in r():

Matrices, if e(small)==""
    r(chi2)     χ² test statistics
    r(df)       degrees of freedom
    r(p)        p-values

Matrices, if e(small)!=""
    r(F)        F test statistics
    r(df)       numerator degrees of freedom
    r(df_r)     denominator degrees of freedom
    r(p)        p-values

Methods and formulas

varwle uses test to obtain Wald statistics of the hypotheses that all the endogenous variables at a given lag are jointly zero for each equation and for all equations jointly. Like the test command, varwle uses estimation results stored by var or var svar to determine whether to calculate and report small-sample F statistics or large-sample chi-squared statistics.

Abraham Wald (1902–1950) was born in Cluj, in what is now Romania. He studied mathematics at the University of Vienna, publishing at first on geometry, but then became interested in economics and econometrics. He moved to the United States in 1938 and later joined the faculty at Columbia. His major contributions to statistics include work in decision theory, optimal sequential sampling, large-sample distributions of likelihood-ratio tests, and nonparametric inference. Wald died in a plane crash in India.

References

Amisano, G., and C. Giannini. 1997. Topics in Structural VAR Econometrics. 2nd ed. Heidelberg: Springer.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Lütkepohl, H. 1993. Introduction to Multiple Time Series Analysis. 2nd ed. New York: Springer.
Mangel, M., and F. J. Samaniego. 1984. Abraham Wald's work on aircraft survivability. Journal of the American Statistical Association 79: 259–267.
Wolfowitz, J. 1952. Abraham Wald, 1902–1950. Annals of Mathematical Statistics 23: 1–13 (and other reports in same issue).
Also see

[TS] var — Vector autoregressive models
[TS] var svar — Structural vector autoregressive models
[TS] varbasic — Fit a simple VAR and graph IRFs or FEVDs
[TS] var intro — Introduction to vector autoregressive models

Title

vec intro — Introduction to vector error-correction models

Description    Remarks and examples    References    Also see

Description

Stata has a suite of commands for fitting, forecasting, interpreting, and performing inference on vector error-correction models (VECMs) with cointegrating variables. After fitting a VECM, the irf commands can be used to obtain impulse–response functions (IRFs) and forecast-error variance decompositions (FEVDs). The table below describes the available commands.

Fitting a VECM
    vec              [TS] vec              Fit vector error-correction models

Model diagnostics and inference
    vecrank          [TS] vecrank          Estimate the cointegrating rank of a VECM
    veclmar          [TS] veclmar          Perform LM test for residual autocorrelation after vec
    vecnorm          [TS] vecnorm          Test for normally distributed disturbances after vec
    vecstable        [TS] vecstable        Check the stability condition of VECM estimates
    varsoc           [TS] varsoc           Obtain lag-order selection statistics for VARs and VECMs

Forecasting from a VECM
    fcast compute    [TS] fcast compute    Compute dynamic forecasts after var, svar, or vec
    fcast graph      [TS] fcast graph      Graph forecasts after fcast compute

Working with IRFs and FEVDs
    irf              [TS] irf              Create and analyze IRFs and FEVDs

This manual entry provides an overview of the commands for VECMs; provides an introduction to integration, cointegration, estimation, inference, and interpretation of VECMs; and gives an example of how to use Stata's vec commands.

Remarks and examples

vec estimates the parameters of cointegrating VECMs. You may specify any of the five trend specifications in Johansen (1995, sec. 5.7). By default, identification is obtained via the Johansen normalization, but vec allows you to obtain identification by placing your own constraints on the parameters of the cointegrating vectors. You may also place additional restrictions on the adjustment coefficients.

vecrank is the command for determining the number of cointegrating equations. vecrank implements Johansen's multiple trace test procedure, the maximum-eigenvalue test, and a method based on minimizing either of two different information criteria.

Because Nielsen (2001) has shown that the methods implemented in varsoc can be used to choose the order of the autoregressive process, no separate vec lag-order selection command is needed; you can simply use varsoc. veclmar tests that the residuals have no serial correlation, and vecnorm tests that they are normally distributed.

All the irf routines described in [TS] irf are available for estimating, interpreting, and managing estimated IRFs and FEVDs for VECMs.
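Putting these pieces together, a typical VECM analysis might follow the sequence sketched below. This is our summary of the workflow rather than an excerpt from the examples; y1 and y2 are hypothetical I(1) series in a tsset dataset, and the lag and rank choices are placeholders.

. varsoc y1 y2                   // 1. select the lag order of the underlying VAR
. vecrank y1 y2, lags(2)         // 2. estimate the cointegrating rank
. vec y1 y2, lags(2) rank(1)     // 3. fit the VECM
. veclmar                        // 4. test for residual autocorrelation
. vecnorm                        //    test for normality of the disturbances
. vecstable                      //    check the VECM stability condition
. irf create vecm1, set(vecirf)  // 5. compute and save IRFs and FEVDs
. irf graph oirf                 //    graph the orthogonalized IRFs

Each step is described in detail in the entries listed in the table above.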
Remarks are presented under the following headings:

    Introduction to cointegrating VECMs
        What is cointegration?
        The multivariate VECM specification
        Trends in the Johansen VECM framework
    VECM estimation in Stata
        Selecting the number of lags
        Testing for cointegration
        Fitting a VECM
        Fitting VECMs with Johansen's normalization
        Postestimation specification testing
        Impulse–response functions for VECMs
        Forecasting with VECMs

Introduction to cointegrating VECMs

This section provides a brief introduction to integration, cointegration, and cointegrated vector error-correction models. For more details about these topics, see Hamilton (1994), Johansen (1995), Lütkepohl (2005), Watson (1994), and Becketti (2013).

What is cointegration?

Standard regression techniques, such as ordinary least squares (OLS), require that the variables be covariance stationary. A variable is covariance stationary if its mean and all its autocovariances are finite and do not change over time. Cointegration analysis provides a framework for estimation, inference, and interpretation when the variables are not covariance stationary.

Instead of being covariance stationary, many economic time series appear to be "first-difference stationary". This means that the level of a time series is not stationary but its first difference is. First-difference stationary processes are also known as integrated processes of order 1, or I(1) processes. Covariance-stationary processes are I(0). In general, a process whose dth difference is stationary is an integrated process of order d, or I(d).

The canonical example of a first-difference stationary process is the random walk. This is a variable x_t that can be written as

\[
x_t = x_{t-1} + \epsilon_t \tag{1}
\]

where the ε_t are independent and identically distributed with mean zero and a finite variance σ². Although E[x_t] = 0 for all t, Var[x_t] = tσ² is not time invariant, so x_t is not covariance stationary. Because Δx_t = x_t − x_{t−1} = ε_t and ε_t is covariance stationary, x_t is first-difference stationary.

These concepts are important because, although conventional estimators are well behaved when applied to covariance-stationary data, they have nonstandard asymptotic distributions and different rates of convergence when applied to I(1) processes. To illustrate, consider several variants of the model

\[
y_t = a + b x_t + e_t \tag{2}
\]

Throughout the discussion, we maintain the assumption that E[e_t] = 0.

If both y_t and x_t are covariance-stationary processes, e_t must also be covariance stationary. As long as E[x_t e_t] = 0, we can consistently estimate the parameters a and b by using OLS. Furthermore, the distribution of the OLS estimator converges to a normal distribution centered at the true value as the sample size grows.

If y_t and x_t are independent random walks and b = 0, there is no relationship between y_t and x_t, and (2) is called a spurious regression. Granger and Newbold (1974) performed Monte Carlo experiments and showed that the usual t statistics from OLS regression provide spurious results: given a large enough dataset, we can almost always reject the null hypothesis of the test that b = 0 even though b is in fact zero. Here the OLS estimator does not converge to any well-defined population parameter.

Phillips (1986) later provided the asymptotic theory that explained the Granger and Newbold (1974) results. He showed that the random walks y_t and x_t are first-difference stationary processes and that the OLS estimator does not have its usual asymptotic properties when the variables are first-difference stationary.

Because Δy_t and Δx_t are covariance stationary, a simple regression of Δy_t on Δx_t appears to be a viable alternative. However, if y_t and x_t cointegrate, as defined below, the simple regression of Δy_t on Δx_t is misspecified.

If y_t and x_t are I(1) and b ≠ 0, e_t could be either I(0) or I(1). Phillips and Durlauf (1986) have derived the asymptotic theory for the OLS estimator when e_t is I(1), though it has not been widely used in applied work. More interesting is the case in which e_t = y_t − a − bx_t is I(0). y_t and x_t are then said to be cointegrated.
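Before turning to a formal definition, the spurious-regression phenomenon described above is easy to reproduce by simulation. In the sketch below (an illustration of ours, with an arbitrary seed), y and x are independent random walks, yet regress will typically report a "significant" slope:

. clear
. set seed 12345
. set obs 1000
. generate t = _n
. tsset t
. generate x = sum(rnormal())    // random walk: running sum of i.i.d. shocks
. generate y = sum(rnormal())    // an independent random walk
. regress y x                    // the t statistic on x is unreliable here

Regressing the differences, D.y on D.x, instead yields a well-behaved and typically insignificant slope, in line with the discussion above.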
Two variables are cointegrated if each is an I(1) process but a linear combination of them is an I(0) process.

It is not possible for y_t to be a random walk and x_t and e_t to be covariance stationary. As Granger (1981) pointed out, because a random walk cannot be equal to a covariance-stationary process, the equation does not "balance". An equation balances when the processes on each side of the equal sign are of the same order of integration. Before attacking any applied problem with integrated variables, make sure that the equation balances.

An example from Engle and Granger (1987) provides more intuition. Redefine y_t and x_t to be

\[
y_t + \beta x_t = \epsilon_t, \qquad \epsilon_t = \epsilon_{t-1} + \xi_t \tag{3}
\]
\[
y_t + \alpha x_t = \nu_t, \qquad \nu_t = \rho \nu_{t-1} + \zeta_t, \quad |\rho| < 1 \tag{4}
\]

where ξ_t and ζ_t are i.i.d. disturbances over time that are correlated with each other. Because ε_t is I(1), (3) and (4) imply that both x_t and y_t are I(1). The condition that |ρ| < 1 implies that ν_t and y_t + αx_t are I(0). Thus y_t and x_t cointegrate, and (1, α) is the cointegrating vector.

Using a bit of algebra, we can rewrite (3) and (4) as

\[
\Delta y_t = \beta\delta z_{t-1} + \eta_{1t} \tag{5}
\]
\[
\Delta x_t = -\delta z_{t-1} + \eta_{2t} \tag{6}
\]

where δ = (1 − ρ)/(α − β), z_t = y_t + αx_t, and η_{1t} and η_{2t} are distinct, stationary, linear combinations of ξ_t and ζ_t. This representation is known as the vector error-correction model (VECM). One can think of z_t = 0 as being the point at which y_t and x_t are in equilibrium. The coefficients on z_{t−1} describe how y_t and x_t adjust to z_{t−1} being nonzero, or out of equilibrium. z_t is the "error" in the system, and (5) and (6) describe how the system adjusts or corrects back to the equilibrium. As ρ → 1, the system degenerates into a pair of correlated random walks. The VECM parameterization highlights this point, because δ → 0 as ρ → 1.

If we knew α, we would know z_t, and we could work with the stationary system of (5) and (6). Although assuming that we know α may seem far-fetched, we can conduct much of the analysis as if we knew α, because there is an estimator for the cointegrating parameter α that converges to its true value at a faster rate than the estimators for the adjustment parameters β and δ.

The definition of a bivariate cointegrating relation requires simply that there exist a linear combination of the I(1) variables that is I(0). If y_t and x_t are I(1) and there are two finite real numbers a ≠ 0 and b ≠ 0 such that ay_t + bx_t is I(0), then y_t and x_t are cointegrated. Although there are two parameters, a and b, only one will be identifiable, because if ay_t + bx_t is I(0), so is cay_t + cbx_t for any finite, nonzero, real number c. Obtaining identification in the bivariate case is relatively simple. The coefficient on y_t in (4) is unity. This natural construction of the model placed the necessary identification restriction on the cointegrating vector. As we discuss below, identification in the multivariate case is more involved.

If y_t is a K × 1 vector of I(1) variables and there exists a vector β such that β′y_t is a vector of I(0) variables, then y_t is said to be cointegrating of order (1, 0) with cointegrating vector β. We say that the parameters in β are the parameters in the cointegrating equation. For a vector of length K, there may be at most K − 1 distinct cointegrating vectors. Engle and Granger (1987) provide a more general definition of cointegration, but this one is sufficient for our purposes.
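The system (3)–(6) can also be simulated to build intuition. The sketch below uses illustrative parameter values of our choosing (α = 1, β = −1, ρ = 0.5), solves (3) and (4) for x_t and y_t, and checks the orders of integration with dfuller:

. clear
. set seed 2014
. set obs 500
. generate t = _n
. tsset t
. generate eps = sum(rnormal())                      // (3): eps_t is a random walk
. generate nu = rnormal()
. replace nu = 0.5*nu[_n-1] + rnormal() if _n > 1    // (4): nu_t is AR(1) with rho = .5
. generate x = (nu - eps)/2                          // solves (3)-(4) with alpha=1, beta=-1
. generate y = (nu + eps)/2
. dfuller y                 // typically fails to reject a unit root: y is I(1)
. dfuller x                 // typically fails to reject a unit root: x is I(1)
. generate z = y + x        // z_t = y_t + alpha*x_t = nu_t
. dfuller z                 // typically rejects a unit root: z is I(0)

With α = 1 and β = −1, y − x = ε_t is I(1) while z = y + x = ν_t is stationary, so y and x are cointegrated with cointegrating vector (1, 1).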
The multivariate VECM specification

In practice, most empirical applications analyze multivariate systems, so the rest of our discussion focuses on that case. Consider a VAR with p lags

\[
y_t = v + A_1 y_{t-1} + A_2 y_{t-2} + \cdots + A_p y_{t-p} + \epsilon_t \tag{7}
\]

where y_t is a K × 1 vector of variables, v is a K × 1 vector of parameters, A_1–A_p are K × K matrices of parameters, and ε_t is a K × 1 vector of disturbances. ε_t has mean 0, has covariance matrix Σ, and is i.i.d. normal over time. Any VAR(p) can be rewritten as a VECM. Using some algebra, we can rewrite (7) in VECM form as

\[
\Delta y_t = v + \Pi y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + \epsilon_t \tag{8}
\]

where \(\Pi = \sum_{j=1}^{p} A_j - I_K\) and \(\Gamma_i = -\sum_{j=i+1}^{p} A_j\). The v and ε_t in (7) and (8) are identical.

Engle and Granger (1987) show that if the variables y_t are I(1), the matrix Π in (8) has rank 0 ≤ r < K, where r is the number of linearly independent cointegrating vectors. If the variables cointegrate, 0 < r < K and (8) shows that a VAR in first differences is misspecified because it omits the lagged level term Πy_{t−1}.

Assume that Π has reduced rank 0 < r < K so that it can be expressed as Π = αβ′, where α and β are both K × r matrices of rank r. Without further restrictions, the cointegrating vectors are not identified: the parameters (α, β) are indistinguishable from the parameters (αQ, β(Q⁻¹)′) for any r × r nonsingular matrix Q. Because only the rank of Π is identified, the VECM is said to identify the rank of the cointegrating space, or equivalently, the number of cointegrating vectors. In practice, the estimation of the parameters of a VECM requires at least r² identification restrictions. Stata's vec command can apply the conventional Johansen restrictions discussed below or use constraints that the user supplies.

The VECM in (8) also nests two important special cases. If the variables in y_t are I(1) but not cointegrated, Π is a matrix of zeros and thus has rank 0. If all the variables are I(0), Π has full rank K.

There are several different frameworks for estimation and inference in cointegrating systems. Although the methods in Stata are based on the maximum likelihood (ML) methods developed by Johansen (1988, 1991, 1995), other useful frameworks have been developed by Park and Phillips (1988, 1989); Sims, Stock, and Watson (1990); Stock (1987); and Stock and Watson (1988); among others. The ML framework developed by Johansen was independently developed by Ahn and Reinsel (1990). Maddala and Kim (1998) and Watson (1994) survey all of these methods. The cointegration methods in Stata are based on Johansen's maximum likelihood framework because it has been found to be particularly useful in several comparative studies, including Gonzalo (1994) and Hubrich, Lütkepohl, and Saikkonen (2001).
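Before turning to deterministic trends, it may help to see the rewriting in (8) in a concrete case. For p = 2 (a worked example we add for illustration), subtracting y_{t−1} from both sides of (7) and adding and subtracting A_2 y_{t−1} gives

\[
\Delta y_t = v + (A_1 + A_2 - I_K)\, y_{t-1} - A_2 \Delta y_{t-1} + \epsilon_t
\]

so that Π = A_1 + A_2 − I_K and Γ_1 = −A_2, exactly as the general formulas below (8) require.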
Often we may want to include a constant or a linear time trend for the differences without allowing for the higher-order trend that is implied for the levels of the data. VECMs exploit the properties of the matrix α to achieve this flexibility. Because α is a K × r rank matrix, we can rewrite the deterministic components in (9) as v = αµ + γ δt = αρt + τt (10a) (10b) where µ and ρ are r × 1 vectors of parameters and γ and τ are K × 1 vectors of parameters. γ is orthogonal to αµ, and τ is orthogonal to αρ; that is, γ0 αµ = 0 and τ0 αρ = 0, allowing us to rewrite (9) as p−1 X ∆yt = α(β0 yt−1 + µ + ρt) + Γi ∆yt−i + γ + τ t + t (11) i=1 Placing restrictions on the trend terms in (11) yields five cases. CASE 1: Unrestricted trend If no restrictions are placed on the trend parameters, (11) implies that there are quadratic trends in the levels of the variables and that the cointegrating equations are stationary around time trends (trend stationary). CASE 2: Restricted trend, τ =0 By setting τ = 0, we assume that the trends in the levels of the data are linear but not quadratic. This specification allows the cointegrating equations to be trend stationary. CASE 3: Unrestricted constant, τ = 0 and ρ = 0 By setting τ = 0 and ρ = 0, we exclude the possibility that the levels of the data have quadratic trends, and we restrict the cointegrating equations to be stationary around constant means. Because γ is not restricted to zero, this specification still puts a linear time trend in the levels of the data. 826 vec intro — Introduction to vector error-correction models CASE 4: Restricted constant, τ = 0, ρ = 0, and γ = 0 By adding the restriction that γ = 0, we assume there are no linear time trends in the levels of the data. This specification allows the cointegrating equations to be stationary around a constant mean, but it allows no other trends or constant terms. CASE 5: No trend, τ = 0, ρ = 0, γ = 0, and µ = 0 This specification assumes that there are no nonzero means or trends. It also assumes that the cointegrating equations are stationary with means of zero and that the differences and the levels of the data have means of zero. This flexibility does come at a price. Below we discuss testing procedures for determining the number of cointegrating equations. The asymptotic distribution of the LR for hypotheses about r changes with the trend specification, so we must first specify a trend specification. A combination of theory and graphical analysis will aid in specifying the trend before proceeding with the analysis. VECM estimation in Stata 11.2 11.4 11.6 11.8 12 12.2 We provide an overview of the vec commands in Stata through an extended example. We have monthly data on the average selling prices of houses in four cities in Texas: Austin, Dallas, Houston, and San Antonio. In the dataset, these average housing prices are contained in the variables austin, dallas, houston, and sa. The series begin in January of 1990 and go through December 2003, for a total of 168 observations. The following graph depicts our data. 1990m1 1995m1 2000m1 2005m1 t ln of house prices in austin ln of house prices in houston ln of house prices in dallas ln of house prices in san antonio The plots on the graph indicate that all the series are trending and potential I(1) processes. In a competitive market, the current and past prices contain all the information available, so tomorrow’s price will be a random walk from today’s price. 
Some researchers may opt to use [TS] dfgls to investigate the presence of a unit root in each series, but the test for cointegration we use includes the case in which all the variables are stationary, so we defer formal testing until we test for cointegration. The time trends in the data appear to be approximately linear, so we will specify trend(constant) when modeling these series, which is the default with vec.

The next graph shows just Dallas's and Houston's data, so we can more carefully examine their relationship.

(figure: time-series plot, 1990m1–2004m1, of the logs of average house prices in Dallas and Houston)

Except for the crash at the end of 1991, housing prices in Dallas and Houston appear closely related. Although average prices in the two cities will differ because of resource variations and other factors, if the housing markets become too dissimilar, people and businesses will migrate, bringing the average housing prices back toward each other. We therefore expect the series of average housing prices in Houston to be cointegrated with the series of average housing prices in Dallas.

Selecting the number of lags

To test for cointegration or fit cointegrating VECMs, we must specify how many lags to include. Building on the work of Tsay (1984) and Paulsen (1984), Nielsen (2001) has shown that the methods implemented in varsoc can be used to determine the lag order for a VAR model with I(1) variables. As can be seen from (9), the order of the corresponding VECM is always one less than that of the VAR. vec makes this adjustment automatically, so we will always refer to the order of the underlying VAR.

The output below uses varsoc to determine the lag order of the VAR of the average housing prices in Dallas and Houston.

. use http://www.stata-press.com/data/r14/txhprice
. varsoc dallas houston

Selection-order criteria
Sample: 1990m5 - 2003m12                           Number of obs = 164

  lag      LL        LR       df     p      FPE        AIC        HQIC       SBIC
   0    299.525                            .000091   -3.62835   -3.61301   -3.59055
   1    577.483   555.92       4   0.000   3.2e-06   -6.9693    -6.92326   -6.85589
   2    590.978   26.991*      4   0.000   2.9e-06*  -7.0851*   -7.00837*  -6.89608*
   3    593.437    4.918       4   0.296   2.9e-06   -7.06631   -6.95888   -6.80168
   4    596.364   5.8532       4   0.210   3.0e-06   -7.05322   -6.9151    -6.71299

Endogenous: dallas houston
 Exogenous: _cons

We will use two lags for this bivariate model because the Hannan–Quinn information criterion (HQIC) method, Schwarz Bayesian information criterion (SBIC) method, and sequential likelihood-ratio (LR) test all chose two lags, as indicated by the "*" in the output. The reader can verify that when all four cities' data are used, the LR test selects three lags, the HQIC method selects two lags, and the SBIC method selects one lag. We will use three lags in our four-variable model.
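To check these four-variable claims yourself, run varsoc on all four series (output omitted here; the "*" markers should appear in the rows just described):

. varsoc austin dallas houston sa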
Testing for cointegration

The tests for cointegration implemented in vecrank are based on Johansen's method. If the log likelihood of the unconstrained model that includes the cointegrating equations is significantly different from the log likelihood of the constrained model that does not include the cointegrating equations, we reject the null hypothesis of no cointegration. Here we use vecrank to determine the number of cointegrating equations:

. vecrank dallas houston

Johansen tests for cointegration
Trend: constant                                    Number of obs = 166
Sample: 1990m3 - 2003m12                           Lags          =   2

  maximum                                          trace     5% critical
    rank     parms        LL         eigenvalue  statistic      value
      0        6       576.26444        .          46.8252      15.41
      1        9       599.58781      0.24498       0.1785*      3.76
      2       10       599.67706      0.00107

Besides presenting information about the sample size and time span, the header indicates that test statistics are based on a model with two lags and a constant trend. The body of the table presents test statistics and their critical values for the null hypotheses of no cointegration (line 1) and one or fewer cointegrating equations (line 2). The eigenvalue shown on the last line is used to compute the trace statistic in the line above it. Johansen's testing procedure starts with the test for zero cointegrating equations (a maximum rank of zero) and then accepts the first null hypothesis that is not rejected.

In the output above, we strongly reject the null hypothesis of no cointegration and fail to reject the null hypothesis of at most one cointegrating equation. Thus we accept the null hypothesis that there is one cointegrating equation in the bivariate model.

Using all four series and a model with three lags, we find that there are two cointegrating relationships.

. vecrank austin dallas houston sa, lag(3)

Johansen tests for cointegration
Trend: constant                                    Number of obs = 165
Sample: 1990m4 - 2003m12                           Lags          =   3

  maximum                                          trace     5% critical
    rank     parms        LL         eigenvalue  statistic      value
      0       36       1107.7833       .         101.6070       47.21
      1       43       1137.7484      0.30456     41.6768       29.68
      2       48       1153.6435      0.17524      9.8865*      15.41
      3       51       1158.4191      0.05624      0.3354        3.76
      4       52       1158.5868      0.00203

Fitting a VECM

vec estimates the parameters of cointegrating VECMs. There are four types of parameters of interest:

1. The parameters in the cointegrating equations β
2. The adjustment coefficients α
3. The short-run coefficients
4. Some standard functions of β and α that have useful interpretations

Although all four types are discussed in [TS] vec, here we discuss only types 1–3 and how they appear in the output of vec.

Having determined that there is a cointegrating equation between the Dallas and Houston series, we now want to estimate the parameters of a bivariate cointegrating VECM for these two series by using vec.

. vec dallas houston

Vector error-correction model
Sample: 1990m3 - 2003m12                           Number of obs = 166
                                                   AIC           = -7.115516
Log likelihood = 599.5878                          HQIC          = -7.04703
Det(Sigma_ml)  = 2.50e-06                          SBIC          = -6.946794

Equation       Parms     RMSE      R-sq     chi2       P>chi2
D_dallas         4      .038546    0.1692   32.98959   0.0000
D_houston        4      .045348    0.3737   96.66399   0.0000

             |    Coef.    Std. Err.      z    P>|z|     [95% Conf. Interval]
D_dallas     |
        _ce1 |
         L1. | -.3038799   .0908504    -3.34   0.001     -.4819434  -.1258165
      dallas |
         LD. | -.1647304   .0879356    -1.87   0.061     -.337081    .0076202
     houston |
         LD. | -.0998368   .0650838    -1.53   0.125     -.2273988   .0277251
       _cons |  .0056128   .0030341     1.85   0.064     -.0003339   .0115595
D_houston    |
        _ce1 |
         L1. |  .5027143   .1068838     4.70   0.000      .2932258   .7122028
      dallas |
         LD. | -.0619653   .1034547    -0.60   0.549     -.2647327   .1408022
     houston |
         LD. | -.3328437   .07657      -4.35   0.000     -.4829181  -.1827693
       _cons |  .0033928   .0035695     0.95   0.342     -.0036034   .010389

Cointegrating equations

Equation       Parms     chi2       P>chi2
_ce1             1       1640.088   0.0000

Identification: beta is exactly identified
                Johansen normalization restriction imposed

        beta |    Coef.    Std. Err.      z    P>|z|     [95% Conf. Interval]
_ce1         |
      dallas |         1          .        .       .             .          .
     houston | -.8675936   .0214231   -40.50   0.000     -.9095821   -.825605
       _cons | -1.688897          .        .       .             .          .
The header contains information about the sample, the fit of each equation, and overall model fit statistics. The first estimation table contains the estimates of the short-run parameters, along with their standard errors, z statistics, and confidence intervals. The two coefficients on L._ce1 are the parameters in the adjustment matrix α for this model. The second estimation table contains the estimated parameters of the cointegrating vector for this model, along with their standard errors, z statistics, and confidence intervals.

Using our previous notation, we have estimated

\[
\widehat{\alpha} = (-0.304, \ 0.503) \qquad
\widehat{\beta} = (1, \ -0.868) \qquad
\widehat{\Gamma} = \begin{pmatrix} -0.165 & -0.0998 \\ -0.062 & -0.333 \end{pmatrix} \qquad
\widehat{v} = (0.0056, \ 0.0034)
\]

Overall, the output indicates that the model fits well. The coefficient on houston in the cointegrating equation is statistically significant, as are the adjustment parameters. The adjustment parameters in this bivariate example are easy to interpret, and we can see that the estimates have the correct signs and imply rapid adjustment toward equilibrium.

When the predictions from the cointegrating equation are positive, dallas is above its equilibrium value because the coefficient on dallas in the cointegrating equation is positive. The estimate of the coefficient [D_dallas]L._ce1 is −0.3. Thus when the average housing price in Dallas is too high, it quickly falls back toward the Houston level. The estimated coefficient [D_houston]L._ce1 of 0.5 implies that when the average housing price in Dallas is too high, the average price in Houston quickly adjusts toward the Dallas level at the same time that the Dallas prices are adjusting.

Fitting VECMs with Johansen's normalization

As discussed by Johansen (1995), if there are r cointegrating equations, then at least r² restrictions are required to identify the free parameters in β. Johansen proposed a default identification scheme that has become the conventional method of identifying models in the absence of theoretically justified restrictions. Johansen's identification scheme is

\[
\beta' = (I_r, \ \widetilde{\beta}')
\]

where I_r is the r × r identity matrix and β̃ is a (K − r) × r matrix of identified parameters. vec applies Johansen's normalization by default.

To illustrate, we fit a VECM with two cointegrating equations and three lags on all four series. We are interested only in the estimates of the parameters in the cointegrating equations, so we can specify the noetable option to suppress the estimation table for the adjustment and short-run parameters.

. vec austin dallas houston sa, lags(3) rank(2) noetable

Vector error-correction model
Sample: 1990m4 - 2003m12                           Number of obs = 165
                                                   AIC           = -13.40174
Log likelihood = 1153.644                          HQIC          = -13.03496
Det(Sigma_ml)  = 9.93e-12                          SBIC          = -12.49819

Cointegrating equations

Equation       Parms     chi2       P>chi2
_ce1             2       586.3044   0.0000
_ce2             2       2169.826   0.0000

Identification: beta is exactly identified
                Johansen normalization restrictions imposed

        beta |    Coef.
_ce1         |
      austin |         1
      dallas |         0
     houston | -.2623782
          sa | -1.241805
       _cons |  5.577099