Econometrics (Bigb3n)
Econometrics deals with the measurement of economic relationships. The term 'econometrics' is
formed from two words of Greek origin, oikonomia (economy), and metron (measure).
Econometrics is a combination of economic theory, mathematical economics and statistics, but it
is completely distinct from each one of these three branches of science.
It is the unification of all three that is powerful, and it is this unification that constitutes econometrics.
Thus ECONOMETRICS may be considered as the integration of economics, mathematics and statistics for the purpose of providing numerical values for the parameters of economic relationships (for example, elasticities, marginal propensities) and verifying economic theories.
Starting from the relationships of economic theory, we express them in mathematical terms (i.e. we build a model) so that they can be measured. We then use specific methods, called econometric methods, in order to obtain numerical estimates of the coefficients of the economic relationships. Econometric methods are statistical methods specifically adapted to the peculiarities of economic phenomena. The most important characteristic of economic relationships is that they contain a random element (the stochastic error term or disturbance term) which, however, is ignored by economic theory and mathematical economics, both of which postulate exact (deterministic) relationships between the various economic variables.
An example will make the above clear. Economic theory postulates that the demand for a
commodity depends on its price, on the prices of other commodities, on consumers' income and
on tastes. This is an exact relationship, because it implies that demand is completely
determined by the above four factors. No other factor, except those explicitly mentioned,
influences the demand. In mathematical economics we express the above abstract economic relationship of demand in mathematical form. Thus we may write the demand equation as
Q = b0 + b1P + b2Po + b3Y + b4T
where Q is the quantity demanded, P is the commodity's own price, Po is the prices of other commodities, Y is consumers' income and T is tastes.
Yet it is common knowledge that in economic life many more factors may affect
demand. The invention of a new product, a war, professional changes, institutional changes,
changes in law, changes in income distribution, massive population movements (migration),
etc., are examples of such factors. Furthermore, human behavior is inherently erratic. We are influenced by rumors, dreams, prejudices, traditions and other psychological and sociological factors, which make us behave differently even though the conditions in the market (prices) and our incomes remain the same. In econometrics, the influence of these 'other' factors is taken into account by the introduction into the economic relationships of a random variable, u, known as the stochastic (disturbance) term.
Hence, the initial exact model now becomes an econometric model:
Q = b0 + b1P + b2Po + b3Y + b4T + u
where the random variable u covers all the other unaccounted-for factors that affect demand (Q).
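As a rough illustration, the sketch below (Python, with made-up coefficients and simulated data; none of the numbers come from the notes) generates observations from this demand model and shows that the gap between observed demand and its deterministic part is exactly the disturbance u:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Hypothetical exogenous data: own price, other prices, income, a tastes index
P = rng.uniform(1, 10, n)
Po = rng.uniform(1, 10, n)
Y = rng.uniform(20, 100, n)
T = rng.uniform(0, 1, n)

# Made-up structural coefficients, purely for illustration
b0, b1, b2, b3, b4 = 50.0, -2.0, 0.8, 0.3, 5.0

u = rng.normal(0, 3, n)                      # stochastic disturbance term
Q_exact = b0 + b1*P + b2*Po + b3*Y + b4*T    # deterministic part (mathematical economics)
Q = Q_exact + u                              # econometric model: exact part + random element

print(Q[:5] - Q_exact[:5])                   # the gap is exactly the unobserved u
```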
GOALS OF ECONOMETRICS
1. Analysis, i.e. testing economic theory.
2. Policy-making, i.e. supplying numerical estimates of the coefficients of economic relationships, which may then be used for decision-making.
3. Forecasting, i.e. using the numerical estimates of the coefficients in order to forecast the future values of the economic magnitudes.
TYPES OF DATA
1. Time Series Data: A time series is a set of observations on the values that a variable takes at different times. Such data may be collected at regular time intervals: daily (e.g. stock prices, weather reports), weekly (e.g. money supply figures), monthly [e.g. the unemployment rate, the Consumer Price Index (CPI)], quarterly (e.g. GDP), annually (e.g. government budgets), quinquennially, that is, every 5 years (e.g. the census of manufactures), or decennially (e.g. the census of population). For example, a country's GDP figures for 2000-2024 form a time series. Time series observations are denoted by the subscript t.
2. Cross-Section Data: Cross-section data are data on one or more variables collected at the same point in time, for example, data on inflation for five countries taken for just one year. Cross-section observations are denoted by the subscript i.
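The sketch below (Python with pandas; hypothetical numbers) puts the two data types side by side, with t indexing time and i indexing units:

```python
import pandas as pd

# Time series: one variable (GDP) for one country, indexed by time t
gdp_t = pd.Series([1.2, 1.4, 1.5, 1.7],
                  index=pd.period_range("2021", periods=4, freq="Y"),
                  name="GDP")

# Cross-section: inflation for five countries in a single year, indexed by unit i
inflation_i = pd.Series([3.1, 7.4, 2.2, 5.0, 9.8],
                        index=["Country A", "Country B", "Country C",
                               "Country D", "Country E"],
                        name="Inflation_2024")

print(gdp_t, inflation_i, sep="\n\n")
```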
METHODOLOGY OF ECONOMETRIC RESEARCH
STAGE A: SPECIFICATION OF THE MODEL
The first step in any econometric research is the specification of the model with which one will attempt the measurement of the phenomenon being analyzed. This stage is also known as the formulation of the maintained hypothesis. In this stage, the VARIABLES OF THE MODEL and the MATHEMATICAL FORM OF THE MODEL are determined.
STAGE B: ESTIMATION OF THE MODEL
After the formulation of the model, one should obtain estimates of its parameters. The second stage includes the estimation of the model by means of the appropriate econometric method. This stage is known as the testing of the maintained hypothesis. Here, the data for the estimation of the model are gathered, the identification condition of the function is examined, the degree of correlation between the explanatory variables is examined, and the appropriate econometric technique for estimation is chosen.
STAGE C: EVALUATION OF THE ESTIMATES
Once the model has been estimated, one should proceed with the evaluation of the estimates, that is to say, decide on the basis of certain criteria whether the estimates are satisfactory and reliable. Three major criteria must be satisfied to underpin the validity, accuracy and reliability of the estimates:
First, the economic a priori criteria, which are determined by economic theory. Second, the statistical criteria, determined by statistical theory. Third, the econometric criteria, determined by econometric theory.
STAGE D: EVALUATION OF THE FORECASTING POWER OF THE MODEL
The final stage of any econometric research is concerned with the evaluation of the forecasting validity of the model. Estimates are useful because they help in decision-making. A model, after the estimation of its parameters, can be used in forecasting the values of economic variables. The econometrician must ascertain how good the forecasts are expected to be; in other words, he must test the forecasting power of the model.
IMPORTANT NOTE: Stages A and C are the most important for any econometric research. They require the skills of an economist with experience of the functioning of the economic system. Stages B and D, on the other hand, are technical and require knowledge of theoretical econometrics.
REGRESSION ANALYSIS
Regression analysis is concerned with describing and evaluating the relationship between a given variable, usually called the dependent variable, and one or more other variables, usually known as the independent variable(s).
Some alternative names for the dependent variable (the Y variable) are: explained variable, predictand, regressand, response, endogenous, outcome, controlled variable, effect variable.
Other names for the independent variables (the X variables) are: explanatory variable, predictor, regressor, stimulus, exogenous, covariate, control variable, cause variable.
REGRESSION VS CAUSATION
Although regression analysis deals with the dependence of one variable on other variables, it
does not necessarily imply causation. For example, in a crop yield-rainfall scenario, crop yield (the dependent variable) depends on rainfall (the independent variable), but rainfall does not depend on crop yield; i.e. X causes Y, but Y does not cause X. This means that there exists only one-way causation, not two-way causation, here.
REGRESSION VS CORRELATION
The primary objective of correlation analysis is to measure the strength or degree of linear
association between two variables. For example, we may be interested in finding the correlation
(coefficient) between smoking and lung cancer, between scores on statistics and mathematics
examinations, between high school grades and college grades, and so on. But in regression
analysis, we try to estimate or predict the average value of one variable on the basis of the fixed
values of other variables. Thus, we may want to know whether we can predict the average
score on a statistics examination by knowing a student’s score on a mathematics examination.
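The contrast can be seen in a few lines of Python (hypothetical exam scores): correlation gives a single symmetric measure of association, while regression predicts the average statistics score for a fixed mathematics score:

```python
import numpy as np

# Hypothetical scores for 8 students: mathematics (X) and statistics (Y)
math_score = np.array([45, 52, 60, 63, 70, 74, 80, 88])
stat_score = np.array([40, 50, 55, 65, 68, 75, 78, 90])

# Correlation analysis: a single symmetric measure of linear association
r = np.corrcoef(math_score, stat_score)[0, 1]

# Regression analysis: predict the average statistics score for a fixed maths score
b, a = np.polyfit(math_score, stat_score, 1)   # slope b, intercept a
predicted = a + b * 75                         # expected statistics score when maths = 75

print(f"r = {r:.3f}; predicted statistics score at maths = 75: {predicted:.1f}")
```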
If we are studying the dependence of a variable on only a single explanatory variable, such as
that of consumption expenditure on real income, such a study is known as simple, or
two-variable (bivariate) regression analysis. However, if we are studying the dependence of one
variable on more than one explanatory variable, as in the effect of rainfall, temperature,
sunshine, and fertilizer on crop yield, it is known as multiple (multivariate) regression analysis. In
other words, in simple regression there is only one explanatory variable, whereas in multiple
regression there is more than one explanatory variable.
ORDINARY LEAST SQUARES (OLS)
This is the commonest method used to estimate the parameters in a linear regression model. It minimizes the sum of the squared differences between the observed values of the dependent variable (Y) and the values of Y predicted by the model. It is used to find the line of best fit.
For a simple regression, the general equation is
Y = a + bX.
However, this equation is completely deterministic. Is this realistic? No. So what we do is add a random disturbance term, u, into the equation.
Now we have the new statistical/econometric equation:
Y = a + bX + u
PRF VS SRF
The population regression function (PRF) is a description of the model that is thought to be generating the actual data, i.e. the true relationship between the variables (the true values of a and b).
The sample regression function (SRF), on the other hand, is the relationship estimated from sample data, used to infer the likely values of the PRF parameters (i.e. hat(a) and hat(b)).
Since the PRF parameters are usually unknown, we superimpose the SRF on the PRF to obtain:
Y = hat(a) + hat(b)X + hat(u)
which is the same as Y = hat(Y) + hat(u),
interpreted as Actual = Estimated + Error.
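A minimal sketch (Python; data simulated under assumed true values a = 2 and b = 0.5) of estimating the SRF by OLS and checking the decomposition Actual = Estimated + Error:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.uniform(0, 10, n)
u = rng.normal(0, 1, n)
Y = 2.0 + 0.5 * X + u          # PRF with (normally unknown) true values a = 2, b = 0.5

# OLS estimators:
#   hat(b) = sum((X - Xbar)(Y - Ybar)) / sum((X - Xbar)^2),  hat(a) = Ybar - hat(b)*Xbar
b_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a_hat = Y.mean() - b_hat * X.mean()

Y_hat = a_hat + b_hat * X      # SRF: fitted (estimated) values
u_hat = Y - Y_hat              # residuals

print(a_hat, b_hat)                    # should be close to 2 and 0.5
print(np.allclose(Y, Y_hat + u_hat))   # Actual = Estimated + Error -> True
```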
LINEARITY
In order to use OLS, we need a model which is linear in the parameters (a and b). It does not necessarily have to be linear in the variables (Y and X).
Linearity in parameters means the degree (power) of each parameter of the model is 1.
Linearity in variables means the degree (power) of each variable of the model is 1.
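For instance (illustrative equations, not from the notes), the following contrasts the two kinds of linearity:

```latex
\begin{align*}
Y &= a + bX       && \text{linear in both parameters and variables}\\
Y &= a + b\ln X   && \text{linear in parameters, nonlinear in the variable (OLS still applies)}\\
Y &= a + b^{2}X   && \text{nonlinear in the parameter } b \text{ (OLS does not apply directly)}
\end{align*}
```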
PROPERTIES OF OLS
Under the classical assumptions, the OLS estimators are BLUE (Best Linear Unbiased Estimators): they are linear functions of the observations, they are unbiased, and they have the minimum variance among all linear unbiased estimators (the Gauss-Markov theorem).
TIME SERIES ANALYSIS
Time series analysis is a specific way of analyzing a sequence of data points collected over an
interval of time. In time series analysis, analysts record data points at consistent intervals over a
set period of time rather than just recording the data points intermittently or randomly
Time series analysis is used for non-stationary data, that is, things that are constantly fluctuating over time or are affected by time. Industries like finance, retail, and economics frequently use time series analysis because currency values and sales are always changing. Stock market analysis is an excellent example of time series analysis in action, especially with automated trading algorithms. Likewise, time series analysis is ideal for forecasting weather changes, helping meteorologists predict everything from tomorrow's weather report to future years of climate change.
Examples of time series analysis in action include: Weather data, Rainfall measurements,
Temperature readings, Heart rate monitoring (EKG), Brain monitoring (EEG), Quarterly sales,
Stock prices, Automated stock trading, Industry forecasts, Interest rates
USES OF TIME SERIES ANALYSIS
1. For forecasting
2. For prediction
3. To understand the underlying cause or systematic pattern over time
4. To show the likely changes in data behavior
5. To determine the likelihood of future events
DATA CLASSIFICATION
Stock time series data means measuring attributes at a certain point in time, like a static
snapshot of the information as it was.
Flow time series data means measuring the activity of the attributes over a certain period,
which is generally part of the total whole and makes up a portion of the results.
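A small sketch (Python with pandas; made-up daily deposits) of the distinction: the monthly flow is the activity summed over the month, while the stock is the balance observed at month end, a snapshot:

```python
import numpy as np
import pandas as pd

# Made-up daily deposits into an account over one quarter
days = pd.date_range("2024-01-01", "2024-03-31", freq="D")
deposits = pd.Series(np.random.default_rng(2).integers(0, 100, len(days)),
                     index=days, name="daily_deposit")

# Flow: total activity over each month; Stock: balance snapshot at month end
flow = deposits.resample("ME").sum()              # "ME" = month end ("M" on older pandas)
stock = deposits.cumsum().resample("ME").last()

print(pd.DataFrame({"flow": flow, "stock": stock}))
```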
DATA VARIATIONS
Functional analysis can pick out the patterns and relationships within the data to identify
notable events.
Trend analysis means determining consistent movement in a certain direction. There are two
types of trends: deterministic, where we can find the underlying cause, and stochastic, which is
random and unexplainable.
Seasonal variation describes events that occur at specific and regular intervals during the
course of a year. Serial dependence occurs when data points close together in time tend to be
related.
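Trend and seasonal variation can be separated with a classical decomposition; the sketch below (Python, simulated monthly data with an assumed linear trend and 12-month cycle) uses statsmodels' seasonal_decompose:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Simulated monthly series: linear trend + 12-month seasonal cycle + noise
idx = pd.date_range("2015-01-01", periods=96, freq="MS")
t = np.arange(96)
y = pd.Series(0.5 * t + 10 * np.sin(2 * np.pi * t / 12)
              + np.random.default_rng(3).normal(0, 2, 96), index=idx)

result = seasonal_decompose(y, model="additive", period=12)
print(result.trend.dropna().head())   # estimated trend component
print(result.seasonal.head(12))       # repeating seasonal pattern
```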
A unit root is a feature of some stochastic processes that indicates nonstationarity: a unit root process is nonstationary, but its first difference is stationary (the process is said to be integrated of order one).
A unit root test tests whether a time series variable is non-stationary and possesses a unit root. The null hypothesis is generally defined as the presence of a unit root, and the alternative hypothesis is either stationarity, trend stationarity or an explosive root, depending on the test used, i.e.
H0: There is a unit root
H1: There is no unit root
In general, the approach to unit root testing implicitly assumes that the time series to be tested
can be written as,
Yt = Dt + Zt + Et
Dt is the deterministic component (trend, seasonal component, etc.)
Zt is the stochastic component.
Et is the stationary error process.
The task of the test is to determine whether the stochastic component contains a unit root or is
stationary.
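As a concrete example, the augmented Dickey-Fuller (ADF) test in statsmodels applies exactly this H0/H1 pair; the sketch below (Python, simulated series) runs it on a random walk (which has a unit root) and on white noise (which does not):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(4)
random_walk = np.cumsum(rng.normal(size=500))   # has a unit root (nonstationary)
white_noise = rng.normal(size=500)              # stationary

for name, series in [("random walk", random_walk), ("white noise", white_noise)]:
    stat, pvalue, *_ = adfuller(series)
    # H0: unit root present; a small p-value rejects H0, i.e. the series is stationary
    print(f"{name}: ADF statistic = {stat:.2f}, p-value = {pvalue:.3f}")
```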
FACTORS AFFECTING STATIONARITY TEST RESULTS
1. Impact of Time Series Length on Test Results: Sensitivity to series length varies across tests. The length of the time series must not influence a test's ability to provide unbiased results: for any given length, a test should ideally yield an unbiased outcome, so it is necessary to study the impact of length on the test results. This provides substantial information about a test's efficacy when dealing with time series of different lengths. The effect of time series length on test results can be noted by comparing the test results and critical values for various lengths, which helps to assess how reliably a test declares a time series stationary or nonstationary.
2. Impact of Time Series Clustering on Test Results: Some stationarity tests function by dividing the time series into various fragments. All these fragments are compared and analyzed, and test results are obtained. Among the considered tests, Levene's, KW, and two-way KS tests examine the time series by dividing it into groups or clusters. These fragments, groups, or clusters are created by taking parts of the time series without any intermixing of data. The size of the group can cause notable variations in test results, so it is crucial to know the appropriate group size for each test in order to obtain accurate and unbiased results. Considering a very small or very large group size may lead to significant discrepancies in test results.
3. Impact of Time Series Facets on Test Results: Trend and volatility effects in a seasonal time series account for its nonstationarity and might bias the test results [32]. It is also possible that the tests may overlook these effects and fail to yield unbiased results. It is vital to notice the changes in test results and critical values due to the trend effect, since this facet causes nonstationarity in a time series. Such an analysis can help in understanding how impactful a trend effect can be in making a test biased.
PHILLIPS-PERRON TEST
The Phillips–Perron test (named after Peter C. B. Phillips and Pierre Perron) is a unit root test.
That is, it is used in time series analysis to test the null hypothesis that a time series is
integrated of order 1.
Like the augmented Dickey–Fuller test, the Phillips–Perron test addresses the issue that the
process generating data might have a higher order of autocorrelation than is admitted in the test
equation.
The test is robust with respect to unspecified autocorrelation and heteroscedasticity in the
disturbance process of the test equation.
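One readily available implementation is in the third-party arch package (the sketch below, in Python, assumes arch is installed; the class and attribute names are taken from that library):

```python
import numpy as np
from arch.unitroot import PhillipsPerron

rng = np.random.default_rng(5)
y = np.cumsum(rng.normal(size=500))   # random walk: integrated of order 1

pp = PhillipsPerron(y)                # H0: the series has a unit root
print(pp.stat, pp.pvalue)             # test statistic and p-value
print(pp.summary())                   # full results table
```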
COINTEGRATION
A cointegration test is used to establish if there is a correlation between several time series in
the long term. The concept was first introduced by Nobel laureates Robert Engle and Clive
Granger in 1987 after British economist Paul Newbold and Granger published the spurious
regression concept.
Cointegration tests identify scenarios where two or more non-stationary time series are
integrated together in a way that they cannot deviate from equilibrium in the long term. The tests
are used to identify the degree of sensitivity of two variables to the same average price over a
specified period of time.
1. Engle-Granger Test: This is a two-step, residual-based test. One non-stationary series is first regressed on the other(s) using OLS, and the residuals of that regression are then tested for stationarity (e.g. with an augmented Dickey-Fuller test); stationary residuals indicate cointegration. The method can detect at most one cointegrating relationship, and errors from the first step are carried forward into the second.
2. Johansen Test: The Johansen test is used to test cointegrating relationships between several non-stationary time series. Compared to the Engle-Granger test, the Johansen test allows for more than one cointegrating relationship. However, it relies on asymptotic properties (a large sample size), since a small sample would produce unreliable results. Using this test to find cointegration of several time series avoids the issues created when errors are carried forward to the next step. Johansen's test comes in two main forms: the trace test and the maximum eigenvalue test.
Trace Tests
Trace tests evaluate the number of cointegrating relationships (independent linear combinations) in the time series data, K, against a hypothesized value K0:
H0: K = K0
H1: K > K0
When using the trace test to test for cointegration in a sample, we set K0 to zero and test whether the null hypothesis is rejected. If it is rejected, we can deduce that a cointegration relationship exists in the sample. The null hypothesis must therefore be rejected to confirm the existence of a cointegration relationship in the sample.
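A minimal sketch (Python; two simulated I(1) series sharing a common stochastic trend) of the Johansen trace test using statsmodels' coint_johansen:

```python
import numpy as np
from statsmodels.tsa.vector_ar.vecm import coint_johansen

rng = np.random.default_rng(6)
n = 500
common = np.cumsum(rng.normal(size=n))     # shared stochastic trend
x = common + rng.normal(size=n)            # both series are I(1) ...
y = 0.5 * common + rng.normal(size=n)      # ... and share the trend, so cointegrated

result = coint_johansen(np.column_stack([x, y]), det_order=0, k_ar_diff=1)

# result.lr1: trace statistics for H0: K = K0 (K0 = 0, 1, ...)
# result.cvt: corresponding 90%/95%/99% critical values
print(result.lr1)
print(result.cvt)   # reject H0: K = 0 if result.lr1[0] > result.cvt[0, 1] (95% level)
```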
SIMULTANEOUS EQUATION MODELS
When explanatory variables are not purely exogenous, in the sense that their values are determined within the system, certain consequences for econometric analysis arise:
- You cannot specify a single equation as a complete model of the relationship between the variables.
- You cannot use OLS to obtain estimates that have BLUE properties.
- Asymptotic unbiasedness cannot be achieved, in the sense that even an increase in sample size will not eliminate the specification bias, and the OLS estimates are not consistent. This kind of bias is also called simultaneous equation bias and can only be cured by appropriately specifying a simultaneous equation model.
A SIMULTANEOUS EQUATION MODEL is a model made up of more than one equation having variables that are jointly dependent; that is, certain explanatory variables are also endogenous in the system.
For example, if market price of a stock (M) is a function of financial leverage (L), Investment (I),
Investor psychology (P); and in turn, investor psychology is a function of current and one period
lag of market price. It would be inappropriate to use a single equation model as a specification of the
relationship. The correct specification is:
M𝑡 = f(L𝑡, I𝑡, P𝑡)
P𝑡 = f(M𝑡, M𝑡−1)
Which can be expressed econometrically as:
M𝑡 = a0 + a1L𝑡 + a2I𝑡 + a3P𝑡 + μ𝑡
P𝑡 = b0 + b1M𝑡 + b2M𝑡−1 + e𝑡
1. The dependent variables M and P are endogenous (determined within the system)
2. P𝑡, though an explanatory variable, is not purely independent.
3. L𝑡, I𝑡, and M𝑡−1 are predetermined variables. They are also explanatory variables; they are exogenous, determined outside the system, and taken as given.
4. μ𝑡 and e𝑡 are error/stochastic variables
5. The simultaneous equation system is a structural model in the sense that you have endogenous variables that are functions of other endogenous variables and predetermined variables. In a complete structural model, you have n endogenous variables and n equations. The model above is a complete structural model.
6. a0, a1, a2, a3, b0, b1, and b2 are structural parameters and can be used to express the direct effects of the explanatory variables on the dependent variables.
7. Note, however, that the indirect effects of explanatory variables on the dependent variables cannot be expressed by the structural parameters but only through a solution of the simultaneous equations. For instance, the effect of M𝑡−1 on M𝑡 can be determined only through its effect on P𝑡.
REDUCED FORM: A structural model can be estimated by expressing each of its endogenous variables as a function of the predetermined variables only. For example, substituting the P𝑡 equation into the M𝑡 equation and solving for M𝑡 gives:
M𝑡 = π0 + π1L𝑡 + π2I𝑡 + π3M𝑡−1 + v𝑡
where the π coefficients are combinations of the structural parameters (e.g. π1 = a1/(1 − a3b1)) and v𝑡 is a composite error term.
The reduced form coefficients measure the total effects (direct and indirect) of the predetermined variables on the respective dependent variables, while the structural parameters measure only the direct effects.
OLS can be used to estimate the reduced form parameters (total effects), but:
- observe that this may not yield the direct and indirect effects separately;
- also, the effects of explanatory variables which are themselves endogenous may not be known directly.
RECURSIVE FORM: Also known as a triangular system, this is a structural model in which the first equation has a dependent variable (y1) as a function of predetermined variables (x𝑖) only; the second equation has another dependent variable (y2) as a function of the first dependent variable (y1) and other predetermined variables only; and so on.
You can safely use OLS to estimate the structural parameters here, because the μ𝑖 are independent of the explanatory variables in each particular equation.
ILLUSTRATIONS
A Researcher is considering a topic that has to do with the effect of dividend policy decisions on
market value (P) of firms in Nigeria. A scan of literature shows that five key factors are important
in explaining P, namely Dividend Payout rate (d), Earnings Rate (e), Investment Opportunity (I),
Management Efficiency (m) and economic growth rate (g). While other factors are as given, d is
said to depend on e, I, and P, just as I depends on e and g. Required: specify the structural model of the relationship.
Solution
Structural model:
P = f(d,e,I,m,g)
d = f(e, I, P)
I = f(e, g)
ESTIMATION METHODS
● Ordinary Least Squares Method: OLS can only be used to estimate simultaneous
equation coefficients when transformed to reduced or recursive form such that all
explanatory variables are truly exogenous. Otherwise the estimates will be biased and
inconsistent.
● Indirect Least Squares: ILS is a single equation method and can be used to estimate the structural parameters of a simultaneous equation model where the equations are exactly identified (a worked sketch follows the steps below). The steps are as follows:
1. Derive reduced form of the model
2. Obtain estimates of reduced form coefficients Π𝑖 of each equation using OLS method
3. Form a system of coefficient relationships by making Π𝑖 functions of the structural
parameters.
4. Solve for structural parameters accordingly
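A minimal sketch of these four steps (Python, on a hypothetical exactly identified demand/supply model; the model and all numbers are assumptions for illustration, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000

# Hypothetical structural model (the supply equation is exactly identified,
# because income Yinc appears in demand but is excluded from supply):
#   Demand: Q = a0 + a1*P + a2*Yinc + u
#   Supply: Q = b0 + b1*P + v
a0, a1, a2 = 100.0, -2.0, 0.5
b0, b1 = 10.0, 1.5

Yinc = rng.uniform(50, 150, n)
u, v = rng.normal(0, 2, n), rng.normal(0, 2, n)

# Equilibrium values of the endogenous variables P and Q
P = (a0 - b0 + a2 * Yinc + u - v) / (b1 - a1)
Q = b0 + b1 * P + v

def ols(x, y):
    """Slope and intercept from a simple OLS regression of y on x."""
    slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return slope, y.mean() - slope * x.mean()

# Steps 1-2: estimate the reduced form (P and Q on the predetermined Yinc) by OLS
pi1, pi0 = ols(Yinc, P)   # P = pi0 + pi1*Yinc + w1
pi3, pi2 = ols(Yinc, Q)   # Q = pi2 + pi3*Yinc + w2

# Steps 3-4: relate reduced-form to structural coefficients and solve
b1_hat = pi3 / pi1            # since pi3 = b1*pi1
b0_hat = pi2 - b1_hat * pi0   # since pi2 = b0 + b1*pi0
print(b1_hat, b0_hat)         # should be close to 1.5 and 10.0
```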
IDENTIFICATION PROBLEM
• UNDER-IDENTIFIED MODEL: The statistical form of at least one of the functions of the model is not unique.
• IDENTIFIED MODELS: Here, all functions of the model are unique, and they may come in two forms:
- EXACTLY IDENTIFIED
- OVER-IDENTIFIED
• ORDER CONDITION
An equation in which the number of excluded variables is greater than or equal to the number of endogenous variables less one is said to satisfy the order condition for identification (a small worked check follows below):
(TV − EV) ≥ (END − 1)
Where:
TV = total number of variables in the system
EV = number of variables in the equation
END = number of endogenous variables
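A tiny helper (Python; hypothetical and for illustration only, since the order condition is a necessary rather than sufficient condition) applying the rule to the market-price model above, which has TV = 5 variables {M, P, L, I, M(t−1)} and END = 2 endogenous variables {M, P}:

```python
def order_condition(total_vars: int, eq_vars: int, n_endog: int) -> str:
    """Apply the order condition (TV - EV) >= (END - 1) to one equation."""
    excluded, needed = total_vars - eq_vars, n_endog - 1
    if excluded < needed:
        return "under-identified"
    return "exactly identified" if excluded == needed else "over-identified"

# Market-price model: TV = 5 {M, P, L, I, M(t-1)}, END = 2 {M, P}
print(order_condition(5, 4, 2))  # M-equation uses {M, L, I, P}   -> exactly identified
print(order_condition(5, 3, 2))  # P-equation uses {P, M, M(t-1)} -> over-identified
```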