Econometrics - Chapter 1 - Introduction To Econometrics - Shalabh, IIT Kanpur
Introduction to Econometrics
Econometric methods can also be used in other areas such as engineering sciences, biological sciences,
medical sciences, geosciences and agricultural sciences. In simple words, whenever there is a need to find
a stochastic relationship in mathematical form, econometric methods and tools help. They are helpful in
explaining the relationships among variables.
Econometric Models:
A model is a simplified representation of a real-world process. It should be representative in the sense that it
should contain the salient features of the phenomena under study. In general, one of the objectives in
modeling is to have a simple model to explain a complex phenomenon. Such an objective may sometimes
lead to an oversimplified model, and sometimes the assumptions made are unrealistic. In practice, generally,
all the variables which the experimenter thinks are relevant to explain the phenomenon are included in the
model. The rest of the variables are dumped in a basket called “disturbances”, where the disturbances are
random variables. This is the main difference between economic modeling and econometric modeling. This is also
the main difference between mathematical modeling and statistical modeling. The mathematical modeling is
exact in nature, whereas the statistical modeling contains a stochastic term also.
An economic model is a set of assumptions that describes the behaviour of an economy, or more generally, a
phenomenon.
Aims of econometrics:
The three main aims of econometrics are as follows:
3. Use of models:
The obtained models are used for forecasting and policy formulation, which is an essential part in any policy
decision. Such forecasts help the policymakers to judge the goodness of the fitted model and take necessary
measures in order to re-adjust the relevant economic variables.
Econometrics uses statistical methods after adapting them to the problems of economic life. These adapted
statistical methods are usually termed econometric methods. Such methods are adjusted so that they
become appropriate for the measurement of stochastic relationships. These adjustments basically attempt to
specify the stochastic element which operates in real-world data and enters into the determination
of observed data. This enables the data to be called a random sample, which is needed for the application of
statistical tools.
The theoretical econometrics includes the development of appropriate methods for the measurement of
economic relationships which are not meant for controlled experiments conducted inside the laboratories.
The econometric methods are generally developed for the analysis of non-experimental data.
The applied econometrics includes the application of econometric methods to specific branches of
economic theory and problems like demand, supply, production, investment, consumption etc. The applied
econometrics involves the application of the tools of econometric theory for the analysis of economic
phenomena and for forecasting economic behaviour.
Types of data
Various types of data are used in the estimation of the model.
1. Time series data
Time series data give information about the numerical values of variables from period to period and are
collected over time. For example, the data during the years 1990-2010 for monthly income constitutes a time
series of data.
2. Cross-section data
The cross-section data give information on the variables concerning individual agents (e.g., consumers or
producers) at a given point of time. For example, a cross-section of a sample of consumers is a sample of
family budgets showing expenditures on various commodities by each family, as well as information on
family income, family composition and other demographic, social or financial characteristics.
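The difference between the two data types can be sketched as two data layouts. The numbers, keys and variable names below are hypothetical and chosen only for illustration: a time series is indexed by period, so the ordering matters, whereas a cross-section is indexed by agent at one point of time.

```python
# Hypothetical illustrative numbers, not real data.

# Time series: one variable observed over successive periods.
monthly_income = {
    "1990-01": 1200.0,
    "1990-02": 1215.0,
    "1990-03": 1190.0,
}

# Cross-section: many agents (families) observed at a single point of time.
family_budgets = [
    {"family": "A", "income": 3000.0, "food": 900.0, "members": 4},
    {"family": "B", "income": 4500.0, "food": 1100.0, "members": 3},
    {"family": "C", "income": 2500.0, "food": 800.0, "members": 5},
]

# Period-to-period changes only make sense for ordered (time series) data.
periods = sorted(monthly_income)
changes = [monthly_income[b] - monthly_income[a]
           for a, b in zip(periods, periods[1:])]

# Comparisons across agents are the natural operation on a cross-section.
food_shares = {row["family"]: row["food"] / row["income"]
               for row in family_budgets}

print(changes)
print(food_shares)
```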
Aggregation problem:
The aggregation problems arise when aggregative variables are used in functions. Such aggregative variables
may involve:
1. Aggregation over individuals:
For example, the total income may comprise the sum of individual incomes.
4. Spatial aggregation:
Sometimes the aggregation is related to spatial issues. For example, the population of towns, countries, or the
production in a city or region etc.
Such sources of aggregation introduce “aggregation bias” in the estimates of the coefficients. It is important
to examine the possibility of such errors before estimating the model.
where $f$ is some well-defined function and $\beta_1, \beta_2, \ldots, \beta_k$ are the parameters which characterize the role and
contribution of $X_1, X_2, \ldots, X_k$, respectively. The term $\varepsilon$ reflects the stochastic nature of the relationship
between $y$ and $X_1, X_2, \ldots, X_k$ and indicates that such a relationship is not exact in nature. When $\varepsilon = 0$, then
the relationship is called the mathematical model otherwise the statistical model. The term “model” is
broadly used to represent any phenomenon in a mathematical framework.
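The distinction between the mathematical and the statistical model can be illustrated with a small sketch. It assumes, purely for illustration, a linear form for the function and a normally distributed stochastic term; the function names `f` and `observe`, the parameter values and the seed are hypothetical.

```python
import random

def f(x1, x2, beta1, beta2):
    """Deterministic part of the relationship (hypothetical linear form)."""
    return beta1 * x1 + beta2 * x2

def observe(x1, x2, beta1, beta2, sigma, rng):
    """Return f(...) + eps with eps ~ N(0, sigma^2); sigma = 0 means eps = 0."""
    eps = rng.gauss(0.0, sigma) if sigma > 0 else 0.0
    return f(x1, x2, beta1, beta2) + eps

rng = random.Random(42)  # fixed seed so the run is reproducible

exact = f(2.0, 3.0, 1.5, 0.5)

# Mathematical model: the stochastic term is zero, so y is reproduced exactly.
y_mathematical = observe(2.0, 3.0, 1.5, 0.5, 0.0, rng)

# Statistical model: the same inputs give a different y on each draw.
y_statistical = [observe(2.0, 3.0, 1.5, 0.5, 1.0, rng) for _ in range(5)]

print(exact, y_mathematical)
print(y_statistical)
```

With the stochastic term switched off, repeated observations at the same inputs coincide; with it switched on, they scatter around the deterministic value.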
A model or relationship is termed as linear if it is linear in parameters and non-linear, if it is not linear in
parameters. In other words, if all the partial derivatives of $y$ with respect to each of the parameters
$\beta_1, \beta_2, \ldots, \beta_k$ are independent of the parameters, then the model is called a linear model. If any of the
partial derivatives of $y$ with respect to any of the $\beta_1, \beta_2, \ldots, \beta_k$ is not independent of the parameters, the
model is called non-linear. Note that the linearity or non-linearity of the model is not described by the
linearity or non-linearity of explanatory variables in the model.
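For example,
$$y = \beta_1 X_1^2 + \beta_2 X_2 + \beta_3 \log X_3 + \varepsilon$$
is a linear model because $\partial y / \partial \beta_i$, $(i = 1, 2, 3)$ are independent of the parameters $\beta_i$, $(i = 1, 2, 3)$. On the other
hand,
$$y = \beta_1^2 X_1 + \beta_2 X_2 + \beta_3 \log X + \varepsilon$$
is a non-linear model because $\partial y / \partial \beta_1 = 2 \beta_1 X_1$ depends on $\beta_1$, although $\partial y / \partial \beta_2$ and $\partial y / \partial \beta_3$ are independent
of any of the $\beta_1$, $\beta_2$ or $\beta_3$.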
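The partial-derivative criterion can be checked numerically. The sketch below is only a heuristic under stated assumptions: it approximates each partial derivative by a central finite difference at two arbitrary parameter points and compares them, it works with the deterministic part of each model only, and it treats the logarithmic regressor of the second model as a third variable. All function names are hypothetical.

```python
import math

def model_a(beta, x):
    """beta1*X1^2 + beta2*X2 + beta3*log(X3): linear in the parameters."""
    b1, b2, b3 = beta
    x1, x2, x3 = x
    return b1 * x1 ** 2 + b2 * x2 + b3 * math.log(x3)

def model_b(beta, x):
    """beta1^2*X1 + beta2*X2 + beta3*log(X3): not linear in beta1."""
    b1, b2, b3 = beta
    x1, x2, x3 = x
    return b1 ** 2 * x1 + b2 * x2 + b3 * math.log(x3)

def partial(model, beta, x, i, h=1e-5):
    """Central finite-difference approximation of d(model)/d(beta_i)."""
    up = list(beta)
    down = list(beta)
    up[i] += h
    down[i] -= h
    return (model(up, x) - model(down, x)) / (2 * h)

def looks_linear_in_parameters(model, x, n_params, tol=1e-6):
    """True if every partial derivative agrees at two parameter points.

    Agreement at two points is a necessary sign of linearity, not a proof.
    """
    point_a = [1.0] * n_params
    point_b = [2.5] * n_params
    return all(
        abs(partial(model, point_a, x, i) - partial(model, point_b, x, i)) < tol
        for i in range(n_params)
    )

x = (2.0, 3.0, 4.0)  # arbitrary values of the explanatory variables
print(looks_linear_in_parameters(model_a, x, 3))
print(looks_linear_in_parameters(model_b, x, 3))
```

The first model passes the check even though it is non-linear in the explanatory variables, which is the point made in the text: only the parameters matter for linearity.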
When the function $f$ is linear in parameters, then $y = f(X_1, X_2, \ldots, X_k, \beta_1, \beta_2, \ldots, \beta_k) + \varepsilon$ is called a linear
model and when the function $f$ is non-linear in parameters, then it is called a non-linear model. In general,
the function $f$ is chosen as
$$f(X_1, X_2, \ldots, X_k, \beta_1, \beta_2, \ldots, \beta_k) = \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_k X_k$$
to describe a linear model. Since $X_1, X_2, \ldots, X_k$ are pre-determined variables and $y$ is the outcome, so both
are known. Thus the knowledge of the model depends on the knowledge of the parameters $\beta_1, \beta_2, \ldots, \beta_k$.
The statistical linear modeling essentially consists of developing approaches and tools to determine
$\beta_1, \beta_2, \ldots, \beta_k$ in the linear model
$$y = \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_k X_k + \varepsilon.$$
Different statistical estimation procedures, e.g., method of maximum likelihood, the principle of least
squares, method of moments etc. can be employed to estimate the parameters of the model. The method of
maximum likelihood needs further knowledge of the distribution of $y$ whereas the method of moments and
the principle of least squares do not need any knowledge about the distribution of $y$.
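As a sketch of how the principle of least squares works for the linear model, assume $n$ observations are available and write $x_{ij}$ for the $i$th observed value of $X_j$ and $y_i$ for the $i$th observed value of $y$ (notation introduced here for illustration). The estimates are the values of the parameters that minimise the sum of squared deviations, and setting the partial derivatives to zero gives $k$ equations in the $k$ unknown parameters:

```latex
S(\beta_1, \ldots, \beta_k)
  = \sum_{i=1}^{n} \Bigl( y_i - \sum_{j=1}^{k} \beta_j x_{ij} \Bigr)^2 ,
\qquad
\frac{\partial S}{\partial \beta_j}
  = -2 \sum_{i=1}^{n} x_{ij} \Bigl( y_i - \sum_{l=1}^{k} \beta_l x_{il} \Bigr) = 0 ,
\quad j = 1, \ldots, k .
```

No distributional assumption on $y$ enters this criterion, which is consistent with the remark above.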
The regression analysis is a tool to determine the values of the parameters given the data on $y$ and
$X_1, X_2, \ldots, X_k$. The literal meaning of regression is “to move in the backward direction”. Before discussing
and understanding the meaning of “backward direction”, let us find which of the following statements is
correct:
S1: model generates data or
S2: data generates the model.
Obviously, S1 is correct. It can be broadly thought that the model exists in nature but is unknown to the
experimenter. When some values to the explanatory variables are provided, then the values for the output or
study variable are generated accordingly, depending on the form of the function f and the nature of the
phenomenon. So ideally, the pre-existing model gives rise to the data. Our objective is to determine the
Consider a simple example to understand the meaning of “regression”. Suppose the yield of the crop ($y$)
depends linearly on two explanatory variables, viz., the quantity of fertilizer ($X_1$) and level of irrigation
($X_2$) as
$$y = \beta_1 X_1 + \beta_2 X_2 + \varepsilon.$$
The true values of $\beta_1$ and $\beta_2$ exist in nature but are unknown to the experimenter. Some values of $y$
are recorded by providing different values to $X_1$ and $X_2$. There exists some relationship between $y$ and
$X_1, X_2$ which gives rise to systematically behaved data on $y$, $X_1$ and $X_2$. Such a relationship is unknown
to the experimenter. To determine the model, we move in the backward direction in the sense that the
collected data is used to determine the unknown parameters $\beta_1$ and $\beta_2$ of the model. In this sense, such an
The theory and fundamentals of linear models lay the foundation for developing the tools for regression
analysis that are based on valid statistical theory and concepts.
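The "backward direction" in the crop-yield example can be sketched in a small simulation. Everything numerical here is an assumption made for illustration: the "true" parameter values, the ranges of fertilizer and irrigation, the normal disturbance and the seed. The forward step lets the model generate data; the backward step uses least squares on those data to recover the parameters. The helper `fit_two_regressors` is hypothetical and solves the two normal equations of the no-intercept model directly.

```python
import random

def fit_two_regressors(x1, x2, y):
    """Least squares estimates for y = b1*x1 + b2*x2 + error (no intercept)."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("x1 and x2 are collinear; parameters not identified")
    # Solve the 2x2 system of normal equations by Cramer's rule.
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

rng = random.Random(0)          # fixed seed for reproducibility
true_b1, true_b2 = 2.0, 0.5     # "exist in nature", unknown to the experimenter
n = 200

# Forward direction: the model generates the data.
fertilizer = [rng.uniform(0.0, 10.0) for _ in range(n)]
irrigation = [rng.uniform(0.0, 5.0) for _ in range(n)]
crop_yield = [true_b1 * f + true_b2 * w + rng.gauss(0.0, 1.0)
              for f, w in zip(fertilizer, irrigation)]

# Backward direction: the data are used to determine the parameters.
b1_hat, b2_hat = fit_two_regressors(fertilizer, irrigation, crop_yield)
print(b1_hat, b2_hat)
```

The estimates are close to, but not exactly equal to, the values used to generate the data, because of the stochastic term.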
Generally, the data is collected on $n$ subjects; $y$ denotes the response or study variable
and $y_1, y_2, \ldots, y_n$ are the $n$ values. If there are $k$ explanatory variables $X_1, X_2, \ldots, X_k$, then $x_{ij}$ denotes the $i$th
value of the $j$th variable, $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, k$. The observations can be presented in the following table:
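Notation for the data used in regression analysis

Observation number    Response    Explanatory variables
                      $y$         $X_1$       $X_2$       ...    $X_k$
1                     $y_1$       $x_{11}$    $x_{12}$    ...    $x_{1k}$
2                     $y_2$       $x_{21}$    $x_{22}$    ...    $x_{2k}$
...                   ...         ...         ...         ...    ...
$n$                   $y_n$       $x_{n1}$    $x_{n2}$    ...    $x_{nk}$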
4. Specification of model:
The experimenter or the person working in the subject usually helps in determining the form of the model.
Only the form of the tentative model can be ascertained, and it will depend on some unknown parameters.
For example, a general form will be like
$$y = f(X_1, X_2, \ldots, X_k; \beta_1, \beta_2, \ldots, \beta_k) + \varepsilon$$
where $\varepsilon$ is the random error reflecting mainly the difference in the observed value of $y$ and the value of $y$
obtained through the model. The form of $f(X_1, X_2, \ldots, X_k; \beta_1, \beta_2, \ldots, \beta_k)$ can be linear as well as non-linear
depending on the form of parameters $\beta_1, \beta_2, \ldots, \beta_k$. A model is said to be linear if it is linear in parameters.
For example,
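$$y = \beta_1 X_1 + \beta_2 X_1^2 + \beta_3 X_2 + \varepsilon$$
$$y = \beta_1 + \beta_2 \ln X_2 + \varepsilon$$
are linear models whereas
$$y = \beta_1 X_1 + \beta_2^2 X_2 + \beta_3 X_2 + \varepsilon$$
$$y = (\ln \beta_1) X_1 + \beta_2 X_2 + \varepsilon$$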
are non-linear models. Many times, the non-linear models can be converted into linear models through some
transformations. So the class of linear models is wider than what it appears initially.
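One common transformation of this kind is taking logarithms. As a hedged illustration, assume a power-law relationship $y = \beta_1 X^{\beta_2}$, which is not one of the examples above and is chosen here only because it linearizes cleanly: taking logs gives $\ln y = \ln \beta_1 + \beta_2 \ln X$, which is linear in the parameters $\ln \beta_1$ and $\beta_2$. The sketch below uses noise-free, hypothetical data so that the recovery is exact up to rounding; `fit_line` is a hypothetical helper implementing simple least squares with an intercept.

```python
import math

def fit_line(u, v):
    """Least squares intercept and slope for v = intercept + slope * u."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    sxx = sum((a - u_bar) ** 2 for a in u)
    sxy = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    slope = sxy / sxx
    intercept = v_bar - slope * u_bar
    return intercept, slope

# Hypothetical parameter values and data generated without a stochastic term.
beta1, beta2 = 3.0, 0.7
X = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [beta1 * x ** beta2 for x in X]

# Transform: ln y = ln(beta1) + beta2 * ln X is linear in (ln beta1, beta2).
log_X = [math.log(x) for x in X]
log_y = [math.log(v) for v in y]

alpha_hat, beta2_hat = fit_line(log_X, log_y)
beta1_hat = math.exp(alpha_hat)  # transform the intercept back to beta1
print(beta1_hat, beta2_hat)
```

Note that the transformation also changes how the stochastic term enters the model, so with real data the transformed and the original models are not equivalent in every respect.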
6. Fitting of model:
The estimation of unknown parameters using an appropriate method provides the values of the parameters.
Substituting these values in the equation gives us a usable model. This is termed as model fitting. The
estimates of parameters $\beta_1, \beta_2, \ldots, \beta_k$ in the model
$$y = f(X_1, X_2, \ldots, X_k, \beta_1, \beta_2, \ldots, \beta_k) + \varepsilon$$
are denoted by $\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k$ which gives the fitted model as
$$y = f(X_1, X_2, \ldots, X_k, \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k).$$
When the value of $y$ is obtained for the given values of $X_1, X_2, \ldots, X_k$, it is denoted as $\hat{y}$ and called the fitted
value.
The fitted equation is used for prediction. In this case, $\hat{y}$ is termed as the predicted value. Note that the
fitted value is the one where the values used for explanatory variables correspond to one of the $n$ observations in the
data, whereas the predicted value is the one obtained for any set of values of explanatory variables. It is not
generally recommended to predict the $y$-values for a set of values of explanatory variables which lie
outside the range of data. When the values of explanatory variables are the future values of explanatory
variables, the predicted values are called forecasted values.
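The distinction between fitted and predicted values, and the caution about going outside the range of the data, can be sketched as follows. The coefficients and observations are hypothetical, and the range check is a simplification: it tests each explanatory variable separately against its observed minimum and maximum, which is weaker than checking the joint region covered by the data.

```python
# Hypothetical estimates and observed explanatory values, for illustration only.
beta_hat = (2.0, 0.5)
data_x = [(1.0, 2.0), (3.0, 1.0), (5.0, 4.0)]  # rows are (X1, X2) observations

def y_hat(x):
    """Value of the fitted linear equation at the point x = (X1, X2)."""
    return sum(b * v for b, v in zip(beta_hat, x))

def within_data_range(x):
    """True if every coordinate of x lies within that variable's observed range."""
    for j, value in enumerate(x):
        column = [row[j] for row in data_x]
        if not (min(column) <= value <= max(column)):
            return False
    return True

# Fitted values: the equation evaluated at the n observed points.
fitted_values = [y_hat(x) for x in data_x]

# Predicted values: the equation evaluated at any other point.
new_x = (4.0, 3.0)   # inside the observed ranges
far_x = (9.0, 3.0)   # X1 lies outside the observed range

print(fitted_values)
print(y_hat(new_x), within_data_range(new_x))
print(y_hat(far_x), within_data_range(far_x))
```

The equation returns a number for `far_x` as readily as for `new_x`; nothing in the arithmetic warns about extrapolation, which is why an explicit range check of some kind is useful.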
The validation of the assumptions must be made before drawing any statistical conclusion. Any departure
from the validity of assumptions will be reflected in the statistical inferences. In fact, the regression analysis
is an iterative process where the outputs are used to diagnose, validate, criticize and modify the inputs. The
iterative process is illustrated in the following figure.
[Figure: iterative process linking the inputs and the outputs of the regression analysis]