Ox Metrics Intro
7 June 2010
J. James Reade
Outline
Introduction to OxMetrics.
Pre-modelling: Data management.
Modelling: Using the packages.
Post-modelling: Misspecification and further analyses.
Purpose of slides: Cannot cover all material in slides! Slides provide information to start using OxMetrics and PcGive.
j.j.reade@bham.ac.uk
OxMetrics: Context
Contrary to belief, OxMetrics is used widely and not just in Oxford. More than other packages, OxMetrics reflects an econometric methodology. The Hendry, or LSE, approach: General-to-specific. Advantage: Thorough and rigorous econometric methodology. Disadvantage: Not everyone's cup of tea. Practically: Some tests unavailable (KPSS and other unit root/cointegration tests). Automated General-to-specific algorithm built in: Autometrics (previous incarnation PcGets). Little else different.
Getting Started
Open by locating OxMetrics on your system. Disambiguating: OxMetrics (in Economics Software?) is what you want, not OxEdit.[2] OxMetrics is the software package, or front-end. PcGive, G@RCH etc. are modules within. Getting OxMetrics on your computer: Licence on the way: All staff can have OxMetrics, as can students.
[2] OxEdit is a text editor developed specifically for Ox, but many people write and compile LaTeX and other languages using it.
.gwg)
3. Right click on the white area, select Draw and Draw a Freehand Line.
4. Sign your name.
5. Double click on plot area, select Regression/Scale. In the Regression section, set the number of lines to 1.
You have successfully regressed your signature.
I think!
Admittedly this is more useful when Ox programming than using OxMetrics more generally. But manuals are online. For non-Mac users, that is. . .
Loading Data
First task: Opening data. OxMetrics has its own data file format: .in7 and .bn7. But can load .xls and .csv files directly.[6] Can also copy and paste data. But: Must be careful about format of csv/xls file: Need date in first column (with no column title) in form 1957-1 or 1957-12-24. Keep names simple, ensure no holes in data.[7] Important for time series: PcGive treats sample as up to first .NaN entry.
[6] Can load most types of data file in OxMetrics, including Stata .dta files.
[7] I.e. no variables without names. OxMetrics will generally change any blank data cells to .NaN but better to be on safe side and do this yourself (or enter N/A in blank cells).
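A minimal sketch, in Python rather than OxMetrics, of checking a csv file against the layout described on this slide before loading it. The file contents, the function names and the choice to treat blank, N/A or .NaN cells as missing are assumptions made for illustration; `leading_sample` only mimics the rule that the usable sample runs up to the first missing entry.

```python
import csv
import io
import math
import re

# Hypothetical csv content: dates in the first column, which has no title.
RAW = """,US,China
2002-01-01,1.75,2.40
2002-01-02,1.76,
2002-01-03,1.74,2.41
"""

# Accepts dates such as 1957-1 or 1957-12-24.
DATE_PATTERN = re.compile(r"^\d{4}-\d{1,2}(-\d{1,2})?$")


def to_float(cell):
    """Treat blank, N/A or .NaN cells as missing (NaN)."""
    cell = cell.strip()
    if cell in ("", "N/A", ".NaN"):
        return math.nan
    return float(cell)


def read_columns(text):
    """Return (dates, {name: values}) from csv text laid out as described."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    if header[0] != "":
        raise ValueError("first column (dates) should have no title")
    dates = [row[0] for row in body]
    if not all(DATE_PATTERN.match(d) for d in dates):
        raise ValueError("dates should look like 1957-1 or 1957-12-24")
    columns = {name: [to_float(row[i]) for row in body]
               for i, name in enumerate(header) if i > 0}
    return dates, columns


def leading_sample(values):
    """Number of observations before the first missing value."""
    for i, v in enumerate(values):
        if math.isnan(v):
            return i
    return len(values)


dates, columns = read_columns(RAW)
print({name: leading_sample(vals) for name, vals in columns.items()})
```

Here the blank China cell on the second row cuts that variable's leading sample to one observation, which is why filling holes before loading matters for time series.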
Data Management
Assuming you have loaded your data file: What is there and what does it look like? OxMetrics has plenty of tools for analysing data pre-estimation. Summary Statistics: Right click and Data Description or View and Summary Statistics.
Database: US_China_1m_IB.csv
Sample: 2002-01-01 Tue - 2010-01-06 Wed (2092 observations)
Variables: 5
Variable            leading sample   #obs  #miss     minimum        mean
Svar1                    1 to 2092   2092      0  2.4523e+06  2.4537e+06
China                    1 to 2079   2078     14           0      2.6621
US                       1 to 2079   2078     14      0.2325      2.6609
ExchangeRate             1 to 2092   2092      0      6.8109      7.8442
1MFWDExchangeRate       29 to 2092   2063     29      6.7834      7.7727
Pre-Estimation: Hocus-pocus?
May come across those who claim looking at data pre-estimation is wrong: Contaminates later findings. OxMetrics pre-estimation tools are an artefact of econometric methodology: You need to know what data looks like before you model it. Helps determine correct model for data: Stationarity, structural breaks, lag length, data transformations (e.g. logs)...
Plotting data
OxMetrics has great flexibility for plotting data. Huge range of possible types of plot. Easy to access. Multiple series on a set of axes, multiple sets of axes. Manipulation of axes and series much more intuitive than Excel. Copy and paste works in wonderful ways...
Can save in eps format, ideal for including in LaTeX documents.[9]
[9] PDF files are easy to create from eps: Type eps2pdf into Google and download the relevant program for your operating system.
[Figure: time-series plot, horizontal axis 2002 to 2010.]
Plotting Data
Double clicking on a plotted graph, selecting the Graph layout tab, and changing the Aspect ratio (Y scale) to Half height 50% will provide a graph half as high as it is wide, very useful for including in papers without taking up too much space.
Data transformations: Lags, logs, differences, percentage changes,... Creating variables: Dummies, quadratic trends, interaction terms,... Other...: Extensive list of data transformations.
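To make the transformations listed on this slide concrete, here is a small Python sketch of what each one computes. It is not OxMetrics algebra code; the function names and the example series are hypothetical.

```python
import math


def lag(x, k=1):
    """Lag a series by k periods; the first k entries are missing (NaN)."""
    return [math.nan] * k + x[:len(x) - k]


def diff(x):
    """First difference: x_t - x_{t-1}."""
    return [a - b for a, b in zip(x, lag(x))]


def pct_change(x):
    """Percentage change: 100 * (x_t - x_{t-1}) / x_{t-1}."""
    return [100.0 * (a - b) / b for a, b in zip(x, lag(x))]


def log_series(x):
    """Natural log of each (strictly positive) observation."""
    return [math.log(v) for v in x]


def quadratic_trend(n):
    """Trend squared: 1, 4, 9, ..., n^2."""
    return [t * t for t in range(1, n + 1)]


def interaction(x, z):
    """Element-by-element product of two series."""
    return [a * b for a, b in zip(x, z)]


# Hypothetical exchange-rate observations.
rate = [8.0, 8.2, 8.2, 6.8]
print(lag(rate))
print(diff(rate))
print(pct_change(rate))
print(log_series(rate))
print(quadratic_trend(len(rate)))
```

Each transformation that looks back one period loses the first observation, which shows up as a missing value at the start of the new series.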
Task: Create your own algebra file. Go to File and New... or hit Ctrl+N.
Estimation carried out using Module: Select module via PcGive is the Module we will use, but PcGive has many possibilities. Select Models for time-series data and Single-equation Dynamic Modelling using PcGive.
Autometrics?
Clicking OK leads to the next menu: Autometrics options. Autometrics is PcGive's automatic model selection algorithm. Based on Hendry (1995) Ch. 9, the General-to-specific modelling methodology. Start with most general model possible: All variables that might be relevant. Omit variables if t- and F-tests permit, and also if misspecification tests allow. Find most parsimonious/simple model possible satisfying misspecification tests. Takes specified model as the general unrestricted model (GUM). Massively useful modelling tool. But leave for now.
As will highlighting them and hitting the double arrow buttons. Double clicking in the Selection window de-selects a variable.
Bottom right: Can select which dataset to choose variables from. Handy when more than one dataset open. But cannot select variables from different datasets for a model. Bottom left: Recall a previous model: Lists all previous models estimated using PcGive. Can reselect previous model if need to.
Our Model
Interested in Covered Interest Parity between US and China:
$i_{US,t} = (f_t - s_t) + i_{Ch,t}$. (1)
Hence dependent variable (Y) is US interest rate (recall log approximation). Explanatory variables: Log forward, spot exchange rates, Chinese interest rate. Estimation methodology: General-to-specific. So estimate first unrestricted version and then test model restrictions.
$i_{US,t} = \beta_0 + \beta_1 f_t + \beta_2 s_t + \beta_3 i_{Ch,t} + \varepsilon_t, \quad \varepsilon_t \sim \mathsf{iid}N(0, \sigma^2)$. (2)
Absence of Normality (autocorrelation, heteroskedasticity) implies important explanatory power left out of model.
RSS 5519.1801
F(3,2046) = 90.3 [0.000]**
log-likelihood -3923.97
no. of parameters 4
se(US) 1.74649
Important but not yet appropriate to scrutinise in detail: Not checked if residuals are Normally distributed: Is model well specified?
[Figure: model graphics, 2002 to 2010. Panels include scaled residuals, r:US (scaled).]
To carry out post-estimation testing hit Test button: Resulting menu gives post-estimation options we will explore. For now, select Test Summary:
Normality test:   Chi^2(2)   =  637.54 [0.0000]**
Hetero test:      F(6,2043)  =  384.85 [0.0000]**
Hetero-X test:    F(9,2040)  =  394.87 [0.0000]**
RESET23 test:     F(2,2044)  =  104.14 [0.0000]**
Output: Type of test, test statistic distribution, test statistic, p-value, significance. Standard: * is rejection at 5% level, ** rejection at 1% level.
Heteroskedasticity: White test: Regress squared residuals on regressors and squared regressors. X test also includes cross-products. Significance of any regressors or combinations implies heteroskedasticity. RESET test: Test of functional form: Include squares and cubes of fitted values in original regression model. Significance of squares and cubes implies wrong functional form assumed.
However, more likely omitted variables or structural form problems cause heteroskedasticity rather than anything in model.
Actual and fitted values: Plots both: how well does model fit data? Cross plot of actual and fitted: Scatter plot: high correlation = good model.[23] Residuals (scaled): Plot of all residuals, scaled by standard deviation.[24] Residual density and histogram: Distribution of residuals: Is it Normal/symmetric?
[23] Since fitted similar to actual.
[24] Hence if Normally distributed and model has constant, scaled versions are standard Normally distributed.
[Figure: model graphics. Panels include actual (US) and fitted values, and residual density of r:US against N(0,1).]
Post-Estimation: Predictions
Two reasons for modelling: 1. To understand an economic phenomenon better. How have China and the US interacted financially? 2. To predict something. How will they interact financially? PcGive allows forecasting or prediction. Forecasting a time-series concept, prediction more cross-section. But in C-S, need observations on explanatory variables to predict. We'll return to Prediction in its more natural context: Time series modelling.
RSS 5519.1801
F(3,2046) = 90.3 [0.000]**
log-likelihood -3923.97
no. of parameters 4
se(US) 1.74649
Test for linear restrictions (Rb=r):
R matrix
  Constant     China  LExchangeRate  L1MFWDExchangeRate
    1.0000    0.0000         0.0000              0.0000
    0.0000    1.0000         0.0000              0.0000
    0.0000    0.0000         1.0000              0.0000
    0.0000    0.0000         0.0000              1.0000
r vector
    0.0000    1.0000        -1.0000              1.0000
LinRes F(4,2046) = 32.670 [0.0000]**
General Restrictions
Linear restrictions require you to recall econometrics. General restrictions require you to write some code: Each variable denoted by ampersand (&) and number: Key beneath. RHS of code line must be zero. Each restriction is a line of code; must be ended with semi-colon (;). Code gives flexibility: Could write &2+&3=0.
Test for general restrictions:
&0=0;
&1-1=0;
&2+1=0;
&3-1=0;
GenRes Chi^2(4) = 130.68 [0.0000]**
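The two ways of writing the same four restrictions can be compared with a short Python sketch. The R matrix, r vector and reported test statistics are copied from the slides; the coefficient vector `b` is hypothetical, chosen only to show how each restriction is evaluated, and the link between the two statistics is an observation about the reported numbers rather than a statement of how PcGive computes them.

```python
# Restrictions from the slides, in the order Constant, China,
# LExchangeRate, L1MFWDExchangeRate: R is the identity, r = (0, 1, -1, 1).
R = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
r = [0.0, 1.0, -1.0, 1.0]

# Hypothetical coefficient estimates, NOT taken from the slides.
b = [0.5, 0.9, -1.2, 1.1]


def restriction_gap(R, b, r):
    """Return Rb - r; an entry is zero when that restriction holds exactly."""
    return [sum(R_ij * b_j for R_ij, b_j in zip(row, b)) - r_i
            for row, r_i in zip(R, r)]


# The same gaps written the way the general-restrictions code does:
# &0=0; &1-1=0; &2+1=0; &3-1=0;
general = [b[0], b[1] - 1, b[2] + 1, b[3] - 1]

gaps = restriction_gap(R, b, r)
print(gaps)
print(general)

# The two reported statistics appear to be linked by Chi^2(q) = q * F(q, T-k).
q = 4
f_stat = 32.670
print(round(q * f_stat, 2))  # 130.68, the reported GenRes value
```

Both forms test the Covered Interest Parity values (constant 0, Chinese rate 1, spot rate -1, forward rate 1); the code form is simply more flexible when a restriction is not a fixed number per coefficient.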
Batch File
As before, want to document what you do: Especially useful if you take a break! Estimate model then hit batch button. Produces new window with Batch code in. Better to hit Save, Save As... than work in window. Tasks: 1. Estimate your model and create Batch file. 2. Change sample size and re-estimate.
Ox Batch Code
New to recent versions of OxMetrics: Can generate Ox code. Ox is the programming language OxMetrics is written in. Model and Ox Batch Code..., or Alt+O opens Ox file. File contains Ox code used by OxMetrics to generate output you found. Taking a look, and amending code highly recommended: Programming languages are the future.
Very useful feature: Can append dataset with residuals $\hat\varepsilon_t$ and fitted values $\hat r_{US,t}$. Can manually run tests on residuals/fitted values and create various plots.
Test for all variables is given in model output. You can check this...
In multi-variate modelling.
Modelling: Models for time-series data, Single-equation Dynamic Modelling using PcGive.
[Figure: time-series plots of $r_{Ch,t}$, $r_{US,t}$, $s_t$ and $f_t$, 2002 to 2010.]
Cursory glance at data series above tells us they display persistence. We model such time persistence using lagged dependent variables: Recall AR(K) model for home interest rate:
$r_{US,t} = \beta_0 + \beta_1 r_{US,t-1} + \dots + \beta_K r_{US,t-K} + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma^2)$. (3)
$\beta_1$ will likely be highly significant for all series in our model.
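A minimal Python sketch of the simplest case, K = 1: simulate a persistent series and recover the coefficient on the lagged dependent variable by OLS. The parameter values, seed and function names are assumptions for illustration, and the coefficient names beta0 and beta1 are labels chosen here.

```python
import random


def ols_simple(y, x):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate_ar1(beta0, beta1, sigma, n, seed=0):
    """Simulate x_t = beta0 + beta1 * x_{t-1} + e_t with Normal errors."""
    rng = random.Random(seed)
    x = [beta0 / (1 - beta1)]  # start at the unconditional mean
    for _ in range(n - 1):
        x.append(beta0 + beta1 * x[-1] + rng.gauss(0.0, sigma))
    return x


def fit_ar1(x):
    """Regress x_t on a constant and x_{t-1}."""
    return ols_simple(x[1:], x[:-1])


# Hypothetical parameters: a highly persistent series.
series = simulate_ar1(beta0=0.1, beta1=0.9, sigma=0.05, n=2000)
print(fit_ar1(series))
```

The estimated slope should sit close to the persistence parameter used in the simulation; with real interest-rate data the same regression is what adding a lagged dependent variable amounts to.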
Time Dependence
It is standard to report the time-series properties of data being modelled. Unit root tests and ideally plots of data series also.[28] Also important: Affects distributions of test statistics. Tomorrow: Can investigate effects using PcNaive in OxMetrics. PcGive allows unit-root testing. Generic time series $x_t$:
$x_t = \beta_0 + \beta_1 x_{t-1} + \dots + \beta_K x_{t-K} + e_t, \quad e_t \sim N(0, \sigma_r^2)$. (4)
Unit root test is hypothesis that $\beta_1 = 1$. Unit root test output tells us what $\beta_1$ coefficient is, if not unity. But test dependent on correct lag specification: Too short: Omitted variable bias. Too long: less important.
[28] Unit-root tests are usually criticised for a lack of power and hence other information is vital for characterising data series.
Unit-Root Testing
In resulting menu, choose variables of interest (can test many variables), click OK. In next menu, check the Unit-root tests box at the top. Then access the drop down menu for Unit-root test settings if desired.[29] Rearrange AR(K) model to:
$\Delta x_t = \beta_0 + \gamma x_{t-1} + \sum_{k=1}^{K-1} \delta_k \Delta x_{t-k} + e_t$. (5)
Either: Test $\gamma = 0$ using standard t-test (but not t-distribution). Test $\gamma = 0$ and $\beta_0 = 0$ using standard F-test (but not F-distribution). Strategy: Start general and reduce if trend/constant insignificant.
[29] Have a play around with the different options; most important is inclusion of constant and/or trend.
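As a worked case of the rearrangement, take K = 2 in the autoregression (4). The symbols are labels assumed here: $\beta$ for the autoregressive coefficients, $\gamma$ for the coefficient on the lagged level and $\delta_1$ for the coefficient on the lagged difference. Subtracting $x_{t-1}$ from both sides, then adding and subtracting $\beta_2 x_{t-1}$, gives:

```latex
\begin{align*}
x_t &= \beta_0 + \beta_1 x_{t-1} + \beta_2 x_{t-2} + e_t \\
x_t - x_{t-1} &= \beta_0 + (\beta_1 - 1) x_{t-1} + \beta_2 x_{t-2} + e_t \\
\Delta x_t &= \beta_0 + (\beta_1 + \beta_2 - 1) x_{t-1} - \beta_2 (x_{t-1} - x_{t-2}) + e_t \\
\Delta x_t &= \beta_0 + \gamma x_{t-1} + \delta_1 \Delta x_{t-1} + e_t,
\qquad \gamma = \beta_1 + \beta_2 - 1, \quad \delta_1 = -\beta_2 .
\end{align*}
```

With K = 1 this reduces to $\gamma = \beta_1 - 1$, so testing $\gamma = 0$ is the same as testing $\beta_1 = 1$.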
Table for each variable: Can get more lags, and more detail (non-summary table).
Unit-root tests
The dataset is: /Users/jamesreade/Documents/Data/Mon Ind/US_China_1m_IB_jr.csv
The sample is: 2002-02-13 - 2009-12-17

China: ADF tests (T=2047, Constant; 5%=-2.86 1%=-3.44)
D-lag     t-adf   beta Y_1     sigma   t-DY_lag   t-prob
  2    -7.591**    0.92512    0.4052     -2.613   0.0090
  1    -8.167**    0.92058    0.4058     -10.35   0.0000
  0    -10.50**    0.89793    0.4162

US: ADF tests (T=2047, Constant; 5%=-2.86 1%=-3.44)
D-lag     t-adf   beta Y_1     sigma   t-DY_lag   t-prob
  2    -0.4669     0.99982   0.03064      5.537   0.0000
  1    -0.2998     0.99988   0.03086      27.16   0.0000
  0     0.4596      1.0002   0.03599
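Reading the ADF output comes down to comparing each t-adf value with the critical values printed in the table header. A small Python sketch of that comparison, using the numbers from the output (the function name and the wording of the decisions are chosen here for illustration):

```python
# t-adf values and critical values copied from the ADF output
# (constant included, T = 2047).
CRITICAL = {"5%": -2.86, "1%": -3.44}
T_ADF = {
    "China": {2: -7.591, 1: -8.167, 0: -10.50},
    "US": {2: -0.4669, 1: -0.2998, 0: 0.4596},
}


def adf_decision(t_adf, critical=CRITICAL):
    """Reject the unit-root null when t-adf is more negative than the critical value."""
    if t_adf < critical["1%"]:
        return "reject at 1%"
    if t_adf < critical["5%"]:
        return "reject at 5%"
    return "do not reject"


for name, by_lag in T_ADF.items():
    for d_lag, stat in by_lag.items():
        print(name, d_lag, adf_decision(stat))
```

On these figures the unit-root null is rejected for China at every lag length and not rejected for the US, in line with the beta Y_1 column being close to one for the US series.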
RSS 5514.85552
F(3,2044) = 90.64 [0.000]**
log-likelihood -3920.34
no. of parameters 4
se(US) 1.74715
Model terrible. Should add lagged dependent variable at minimum. Spurious significance possible: Already established unit-root behaviour. Autoregressive Distributed Lag (ADL) model: Distributed lag of explanatory variables. Effect of variable spread over number of time periods.
Remember to check recursive estimation and save some observations for forecasting.
EQ(28) Modelling US by OLS
The dataset is: /Users/jamesreade/Documents/Data/Mon Ind/US_China_1m_IB_jr.csv
The estimation sample is: 2002-02-12 - 2009-12-03

                        Coefficient   Std.Error   t-value   t-prob   Part.R^2
US_1                        1.00048   0.0004840     2067.   0.0000     0.9995
Constant                  -0.117155     0.02279     -5.14   0.0000     0.0128
China                   0.000177891    0.001901    0.0936   0.9254     0.0000
China_1                 -0.00203798    0.001899     -1.07   0.2834     0.0006
LExchangeRate              0.815535      0.5628      1.45   0.1475     0.0010
LExchangeRate_1           -0.882411      0.5607     -1.57   0.1157     0.0012
L1MFWDExchangeRate          2.88309      0.7046      4.09   0.0000     0.0082
L1MFWDExchangeRate_1       -2.75721      0.7064     -3.90   0.0001     0.0074

sigma                  0.035644   RSS                2.57911023
R^2                    0.999583   F(7,2030) = 6.956e+05 [0.000]**
Adj.R^2                0.999582   log-likelihood        3907.26
no. of observations        2038   no. of parameters           8
mean(US)                2.68521   se(US)                1.74303

1-step (ex post) forecast analysis 2009-12-04 - 2009-12-17
Parameter constancy forecast tests:
Forecast Chi^2(10) = 0.34950 [1.0000]
Chow F(10,2030) = 0.033968 [1.0000]
CUSUM t(9) = 0.5808 [0.5756] (zero forecast innovation mean)
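Several entries in the EQ(28) output can be checked against each other by hand. The Python sketch below recomputes sigma, the adjusted R^2 and a few t-values from figures copied from the output, using the usual OLS definitions; it is a consistency check on the printed numbers, not a description of PcGive's internals.

```python
import math

# Figures copied from the EQ(28) output.
rss = 2.57911023
n_obs = 2038
n_params = 8
r2 = 0.999583

# First three rows of the coefficient table: (name, coefficient, std. error).
rows = [
    ("US_1", 1.00048, 0.0004840),
    ("Constant", -0.117155, 0.02279),
    ("China", 0.000177891, 0.001901),
]

dof = n_obs - n_params                      # matches the 2030 in F(7,2030)
sigma = math.sqrt(rss / dof)                # residual standard error
adj_r2 = 1 - (1 - r2) * (n_obs - 1) / dof   # degrees-of-freedom adjustment
t_values = {name: coef / se for name, coef, se in rows}

print(dof)
print(round(sigma, 6))
print(round(adj_r2, 6))
for name, t in t_values.items():
    print(name, round(t, 2))
```

The recomputed values agree with the reported sigma, Adj.R^2 and t-values up to rounding, which is a quick way to confirm that a flattened output table has been read correctly.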
Better but still not good. Tasks: Use Test... in Test menu to investigate test failures. Use Post-estimation graphics to investigate: Autocorrelation. Normality. Heteroskedasticity. How might you reformulate model?
ECM = US - 251.079 - 4.08618*China - 145.122*LExchangeRate + 271.774*L1MFWDExchangeRate;
WALD test: Chi^2(3) = 1.01629 [0.7973]
See PcGive Vol. 1, Sec. 18.3, especially for graphing lag weights.
Cointegration Testing
Cointegration testing makes use of other useful features of OxMetrics. Cointegration theory: $x_t \sim I(1)$, $y_t \sim I(1)$ but linear combination $y_t - \beta x_t \sim I(0)$. Hence regress $y_t$ on $x_t$ and save residuals, since $\hat\varepsilon_t = y_t - \hat\beta x_t$ in regression. Carry out unit root testing on residuals $\hat\varepsilon_t$. If residuals stationary, implies $x_t$ and $y_t$ cointegrated.[35] Tasks: 1. Regress $i_{US,t}$ on $i_{Ch,t}$, $f_t$ and $s_t$. 2. Save the residuals from the regression. 3. Carry out unit root testing on the residuals.
[35] Subject to the caveat that Dickey-Fuller unit root tests are known to have low power hence conclude in favour of null (unit root) too often. Most cointegration is now done using VAR models: Come back tomorrow for that.
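The three tasks can be sketched in Python on simulated data, where the answer is known by construction. Everything below is an assumption for illustration: the data are simulated (a random walk x and a y tied to it), there is a single regressor, and the unit-root regression on the residuals has no constant and no lagged differences. Critical values for residual-based tests differ from the standard Dickey-Fuller ones, so the statistic is printed rather than compared with a table.

```python
import math
import random


def ols_simple(y, x):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def df_t_stat(e):
    """t-statistic on gamma in  delta e_t = gamma * e_{t-1} + u_t  (no constant)."""
    lagged = e[:-1]
    delta = [b - a for a, b in zip(e[:-1], e[1:])]
    s_ll = sum(v * v for v in lagged)
    gamma = sum(l * d for l, d in zip(lagged, delta)) / s_ll
    rss = sum((d - gamma * l) ** 2 for l, d in zip(lagged, delta))
    s2 = rss / (len(delta) - 1)
    return gamma / math.sqrt(s2 / s_ll)


rng = random.Random(1)
n = 500
# x_t is a random walk, so I(1); y_t = 1 + 2 x_t + stationary noise.
x = [0.0]
for _ in range(n - 1):
    x.append(x[-1] + rng.gauss(0.0, 1.0))
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

# Step 1: static regression of y on x. Step 2: save the residuals.
intercept, slope = ols_simple(y, x)
residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

# Step 3: Dickey-Fuller style regression on the residuals.
print(slope, df_t_stat(residuals))
```

Because y was built as a linear function of x plus stationary noise, the residuals should look stationary and the statistic should be strongly negative; with unrelated random walks it would typically stay much closer to zero.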
Roots of other variables for model reduction purposes: ADL for $x_t$, $y_t$: $(1 - \alpha_1 L) y_t = \beta_0 (1 + (\beta_1/\beta_0) L) x_t + \varepsilon_t$. If $\alpha_1 = -\beta_1/\beta_0$, divide through by $(1 - \alpha_1 L)$ for: $y_t = \alpha + \beta_0 x_t + u_t$, where $u_t = \varepsilon_t/(1 - \alpha_1 L)$, so $u_t = \alpha_1 u_{t-1} + \varepsilon_t$, hence autoregressive errors.
Post-Estimation: Graphics
As in Cross-Section, Graphics may help determine problems. Many additions in time series: Residual density options, ACFs and PACFs: May want to trick PcGive into thinking your cross section is time series. Graphic possibilities: Actual/fitted values: Does model do well? Residuals: Scaled, unscaled, pickled, roasted,... Residual density/actual plots: Distribution looks normal/iid? Skewed? Heteroskedastic? Use these graphics to shape model re-specifications.
Post-Estimation: Graphics
[Figure: post-estimation graphics, 2002 to 2010. Panels: actual (US) and fitted values; scaled residuals, r:US (scaled); residual density of r:US against N(0,1); ACF and PACF of r:US.]
Post-Estimation: Forecasts
To forecast, must re-estimate over a reduced sample. At sample size selection (Estimate) menu, change Less forecasts to 10. Shortens estimation sample by 10 most recent observations. Can then forecast over these 10 observations. Evaluating model via forecast performance very common. But may not be indicative of model quality, esp. if data non-stationary. Hendry (1995): Forecast performance of naive simple devices hard to beat. Post-estimation, go to the Test menu and select Forecast...
Forecasting: Tasks
Select a different length of Forecasts to hold back. Evaluate the forecast performance of the model by both types of forecast. What is the forecast performance as h is increased? Compare the forecast performance of your model against a random walk model. Recall random walk model is $x_t = x_{t-1} + \varepsilon_t$, or $\Delta x_t = \varepsilon_t$. Remember you can copy and paste particular sets of plots in OxMetrics.
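For the random walk comparison, the 1-step forecast of each held-back observation is simply the previous actual value. A minimal Python sketch of that benchmark and of a root mean squared error to compare against your model's forecasts; the series and the number of held-back observations are hypothetical.

```python
import math


def random_walk_forecasts(x, n_hold):
    """1-step forecasts for the last n_hold observations: the previous actual value."""
    start = len(x) - n_hold
    return [x[t - 1] for t in range(start, len(x))]


def rmse(actual, forecast):
    """Root mean squared forecast error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


# Hypothetical series; the last 3 observations are held back for forecasting.
x = [2.0, 2.1, 2.3, 2.2, 2.5, 2.4]
held_back = x[-3:]
naive = random_walk_forecasts(x, 3)
print(naive, rmse(held_back, naive))
```

A model whose forecast RMSE over the same held-back observations is not below this benchmark has not beaten the naive device.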
Recursive Estimation
Recursive estimation is a method of model evaluation: Are coefficient estimates stable over the entire sample? Or are they averages over structural breaks? Generally expected in any regression analysis submitted to a journal. Estimate over observations $1, 2, \dots, T_I$, where $T_I < T$ is the Initialisation. Evaluate all model parameters: $\hat\beta$, $\hat\sigma$, etc. Estimate over observations $1, 2, \dots, T_I, T_I + 1$ and evaluate parameters. Keep going until reach full sample. Analysis: Plot parameters for different sample lengths. Calculate test statistics to detect structural change.
Need to also select Initialization: What is $T_I$? Smaller $T_I$, analysis covers more of sample. Smaller $T_I$, earlier estimates less stable as sample size tiny. Then estimate as usual: Will take fractionally longer to produce results.
Can always copy and paste some to a new graphics file though (Using Ctrl+N).
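The expanding-sample idea can be sketched in a few lines of Python for a regression with one explanatory variable. The data are hypothetical and built with a deliberate break so that the recursive slope visibly drifts; the function names and the choice of initialisation are assumptions for illustration.

```python
def ols_simple(y, x):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def recursive_slopes(y, x, t_init):
    """Slope estimated on observations 1..t for t = t_init, ..., T."""
    return [ols_simple(y[:t], x[:t])[1] for t in range(t_init, len(y) + 1)]


# Hypothetical data with a break: slope 1 for the first 6 points, 3 afterwards.
x = [float(t) for t in range(1, 13)]
y = [xi if xi <= 6 else 6.0 + 3.0 * (xi - 6.0) for xi in x]

slopes = recursive_slopes(y, x, t_init=4)
print([round(s, 3) for s in slopes])
```

The estimates are constant while the sample ends before the break and start moving once post-break observations enter, which is the pattern to look for in the recursive graphics. A smaller initialisation covers more of the sample but makes the earliest estimates rest on very few observations.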
[Figure: recursive coefficient estimates, 2002 to 2010. Panels include US lag, China, China lag and 1Month Forward Exchange Rate.]
Need to change axes to make plots useful: Much structural instability.[39] Be careful in changing axes: Comparability between plots impaired. t-stat significant if both green lines do not include zero.
[39] Make use of Apply button when changing the Y axis values here. To aid visibility, delete Z label also, and relabel.
RSS: Residual sum of squares for each sample end point $\tau = T_I, T_I + 1, \dots, T$. If model stable expect this to rise steadily: By $\sigma^2$ each period. Jumps indicative of something changing: Residual very large that period.
1-step Residuals +/- 2 SE: Final observation residuals: $y_\tau - x_\tau'\hat\beta_\tau$, $\tau = T_I, T_I + 1, \dots, T$. (7) Points outside standard error bounds ($\pm 2\hat\sigma_\tau$) associated with structural change.
Standardized innovations: Residuals calculated using last period estimates: $y_\tau - x_\tau'\hat\beta_{\tau-1}$, $\tau = T_I, T_I + 1, \dots, T$. (8) Standardised by their estimated standard error. Large observations suggest structural instability.
[Figure: recursive graphics panels RSS, Res1Step and Innovs.]
Forecast Chow test: Estimate $1, \dots, M-1$, forecast $\{M\}, \{M, M+1\}, \dots, \{M, M+1, \dots, T\}$. Test statistic:
$F_{ForcChow} = \frac{(RSS_m - RSS_{M-1})(M - k - 1)}{RSS_{M-1}(m - M + 1)} \overset{D}{\to} \chi^2_{m-M+1}$. (9)
Tests all very likely to fail if structural change hence useful.
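A minimal Python sketch of the forecast Chow statistic in its usual form, where the model is estimated on observations 1 to M-1 with k regressors and then on 1 to m. The function name, argument names and all numbers are hypothetical; the degrees of freedom follow the standard Chow forecast test (m-M+1 forecasts, M-k-1 residual degrees of freedom), which is an assumption about what the slide intends.

```python
def forecast_chow(rss_m, rss_base, M, k, m):
    """Forecast Chow statistic for forecasts M..m given estimation on 1..M-1.

    rss_base is the RSS from the estimation sample 1..M-1,
    rss_m is the RSS from the longer sample 1..m.
    """
    n_forecasts = m - M + 1
    dof = M - k - 1
    return ((rss_m - rss_base) / n_forecasts) / (rss_base / dof)


# Hypothetical numbers, not taken from the slides.
print(forecast_chow(rss_m=12.0, rss_base=10.0, M=101, k=4, m=110))
```

The statistic is large when adding the forecast-period observations raises the residual sum of squares by much more than the in-sample error variance would suggest.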
[Figure: recursive Chow tests with 5% critical values, 2002 to 2010. Panels include Ndn CHOWs and Nup CHOWs.]
Deterministic Terms
Already covered creating them in Calculator. What about using them? Chow tests suggest something happened in 2003 and 2007. Can work out date by hovering cursor over graph. Can also write results instead of graphing in Recursives. Can also print out largest residuals: See Further output... Tasks: Determine the dates of structural changes and create dummy variables. Re-run your model including these structural break terms. Do they have any effect? If not how might you alter them/the model?
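For the dummy-variable task, the two usual choices are an impulse dummy (one period only) and a step dummy (from the break onwards). A small Python sketch of both; the dates below are placeholders, not break dates found in the slides, and the function names are chosen here for illustration.

```python
def impulse_dummy(dates, when):
    """1 in the single period `when`, 0 elsewhere."""
    return [1 if d == when else 0 for d in dates]


def step_dummy(dates, start):
    """0 before `start`, 1 from `start` onwards (ISO dates compare correctly as strings)."""
    return [1 if d >= start else 0 for d in dates]


# Hypothetical dates; the break dates are placeholders, not results from the slides.
dates = ["2003-06-30", "2003-07-01", "2003-07-02", "2003-07-03"]
print(impulse_dummy(dates, "2003-07-01"))  # [0, 1, 0, 0]
print(step_dummy(dates, "2003-07-02"))     # [0, 0, 1, 1]
```

An impulse dummy removes the influence of a single large residual, whereas a step dummy allows the intercept to shift permanently; which one is appropriate depends on what the recursive graphics suggest happened at the break.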