Concepts Linear Vs Non-Linear Regression
Nonlinear relationships are more plausible than linear ones for many economic
processes. In this chapter we will first define what is meant by linear regression
analysis and then demonstrate some common methods for extending its use to
fit nonlinear relationships. The chapter concludes with a brief outline of the
kind of technique used to fit models that cannot be recast in linear form.
A model such as

$Y = \beta_1 + \beta_2 \frac{1}{X} + u$    (4.1)

is nonlinear in X but linear in parameters. If we define a new variable

$Z = \frac{1}{X},$    (4.2)

the model may be rewritten

$Y = \beta_1 + \beta_2 Z + u$    (4.3)
and it is now linear in variables as well as in parameters. This type of transformation is only cosmetic, and you will usually see the regression equation presented
with the variables written in their original nonlinear form. This avoids the need
for explanation and extra notation.
On the other hand, an equation such as

$Y = \beta_1 X^{\beta_2}$    (4.4)

is nonlinear in parameters as well as in variables.
Table 4.1 Average annual percentage rates of growth of employment, e, and real GDP, g, 1988–97

                    e     g    z = 1/g                        e     g    z = 1/g
Australia        1.68  3.04    0.3289   Korea              2.57  7.73    0.1294
Austria          0.65  2.55    0.3922   Luxembourg         3.02  5.64    0.1773
Belgium          0.34  2.16    0.4630   Netherlands        1.88  2.86    0.3497
Canada           1.17  2.03    0.4926   New Zealand        0.91  2.01    0.4975
Denmark          0.02  2.02    0.4950   Norway             0.36  2.98    0.3356
Finland         -1.06  1.78    0.5618   Portugal           0.33  2.79    0.3584
France           0.28  2.08    0.4808   Spain              0.89  2.60    0.3846
Germany          0.08  2.71    0.3690   Sweden            -0.94  1.17    0.8547
Greece           0.87  2.08    0.4808   Switzerland        0.79  1.15    0.8696
Iceland         -0.13  1.54    0.6494   Turkey             2.02  4.18    0.2392
Ireland          2.16  6.40    0.1563   United Kingdom     0.66  1.97    0.5076
Italy           -0.30  1.68    0.5952   United States      1.53  2.46    0.4065
Japan            1.06  2.81    0.3559
[Figure 4.1: employment growth rate plotted against GDP growth rate]
Suppose that the relationship between e and g is of the form

$e = \beta_1 + \frac{\beta_2}{g} + u.$    (4.5)
This is nonlinear in g, but if we define z = 1/g, we can rewrite the model so that
it is linear in variables as well as parameters:
$e = \beta_1 + \beta_2 z + u.$    (4.6)
The data for z are given in Table 4.1. In any serious regression application, one
would construct z directly from g. The output for a regression of e on z is shown
in Table 4.2 and the regression is plotted in Figure 4.2. The regression is shown
in equation form as (4.7). The constant term in the regression is an estimate of
$\beta_1$ and the coefficient of z is an estimate of $\beta_2$.
$\hat{e} = 2.60 - 4.05z.$    (4.7)
Table 4.2
. gen z = 1/g
. reg e z

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  1,    23) =   26.06
       Model |  13.1203665     1  13.1203665           Prob > F      =  0.0000
    Residual |  11.5816089    23  .503548214           R-squared     =  0.5311
-------------+------------------------------           Adj R-squared =  0.5108
       Total |  24.7019754    24  1.02924898           Root MSE      =  .70961

           e |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           z |  -4.050817    .793579    -5.10   0.000    -5.692460   -2.409174
       _cons |   2.604753   .3748822     6.95   0.000     1.829250    3.380256
Figure 4.2 Employment growth rate regressed on the reciprocal of GDP growth rate
Figure 4.3 Nonlinear and linear regressions of employment growth rate on GDP growth
rate
Substituting z = 1/g into (4.7), the fitted relationship between e and g is

$\hat{e} = 2.60 - \frac{4.05}{g}.$    (4.8)
Figure 4.3 shows the nonlinear relationship (4.8) plotted in the original
diagram. The linear regression reported in Exercise 1.4 is also shown, for
comparison.
In this case, it was easy to see that the relationship between e and g was nonlinear. In the case of multiple regression analysis, nonlinearity might be detected
using the graphical technique described in Section 3.2.
EXERCISE
4.1
. gen Z = 1/SIBLINGS
(11 missing values generated)

. reg S Z

      Source |       SS       df       MS              Number of obs =     529
-------------+------------------------------           F(  1,   527) =   30.21
       Model |  169.838682     1  169.838682           Prob > F      =  0.0000
    Residual |   2962.9288   527  5.62225579           R-squared     =  0.0542
-------------+------------------------------           Adj R-squared =  0.0524
       Total |  3132.76749   528  5.93327175           Root MSE      =  2.3711

           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           Z |   2.071194   .3768407     5.50   0.000     1.330900    2.811488
       _cons |    12.7753   .1928491    66.25   0.000     12.39645    13.15415

The output shows the result of regressing S, years of schooling, on Z, the reciprocal of SIBLINGS, the number of brothers and sisters of the respondent, using EAEF Data Set 21. Provide an interpretation of the regression results.
Logarithmic transformations

Suppose that the relationship between Y and X takes the form

$Y = \beta_1 X^{\beta_2}.$    (4.9)
When you see such a function, you can immediately say that the elasticity of Y with
respect to X is constant and equal to $\beta_2$. This is easily demonstrated. Regardless of
the mathematical relationship connecting Y and X, or the definitions of Y and X,
the elasticity of Y with respect to X is defined to be the proportional change in Y
for a given proportional change in X:
$\text{elasticity} = \frac{dY/Y}{dX/X}.$    (4.10)
Thus, for example, if Y is demand and X is income, the expression defines the
income elasticity of demand for the commodity in question.
The expression may be rewritten
$\text{elasticity} = \frac{dY/dX}{Y/X}.$    (4.11)
In the case of the demand example, this may be interpreted as the marginal
propensity to consume the commodity divided by the average propensity to
consume it.
If the relationship between Y and X takes the form (4.9),
$\frac{dY}{dX} = \beta_1\beta_2 X^{\beta_2 - 1} = \beta_2\frac{Y}{X}.$    (4.12)
Hence,
$\text{elasticity} = \frac{dY/dX}{Y/X} = \frac{\beta_2 Y/X}{Y/X} = \beta_2.$    (4.13)
Thus, for example, if the demand for a commodity is related to income by

$Y = \beta_1 X^{0.3},$    (4.14)
this means that the income elasticity of demand is equal to 0.3. If you are trying
to explain this to someone who is not familiar with economic jargon, the easiest
way to explain it is to say that a 1 percent change in X (income) will cause a 0.3
percent change in Y (demand).
A function of this type can be converted into a linear equation by means of
a logarithmic transformation. You will certainly have encountered logarithms
in a basic mathematics course. In econometric work they are indispensable. If
you are unsure about their use, you should review your notes from that basic
math course. The main properties of logarithms are given in Box 4.1.
In the box it is shown that (4.9) may be linearized as
$\log Y = \log\beta_1 + \beta_2\log X.$    (4.15)
This is known as a logarithmic model or, alternatively, a loglinear model, referring to the fact that it is linear in logarithms. If we write $Y' = \log Y$, $Z = \log X$, and $\beta_1' = \log\beta_1$, the equation may be rewritten

$Y' = \beta_1' + \beta_2 Z.$    (4.16)
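As a quick illustration of the mechanics (a sketch with made-up parameter values, not data from the text), one can simulate a loglinear relationship in Stata and verify that regressing log Y on log X recovers the elasticity:

* simulate Y = 0.5 * X^0.3 * e^u and fit the loglinear model
clear
set obs 200
set seed 1
generate X = 10 + 90*runiform()
generate Y = 0.5*X^0.3*exp(rnormal(0, 0.2))
generate LGY = ln(Y)
generate LGX = ln(X)
regress LGY LGX
* the coefficient of LGX should be close to the true elasticity, 0.3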
Figure 4.4 Regression of expenditure on food eaten at home on total household expenditure
BOX 4.1 The main properties of logarithms

1. If $Y = XZ$, $\log Y = \log X + \log Z$.
2. If $Y = X/Z$, $\log Y = \log X - \log Z$.
3. If $Y = X^n$, $\log Y = n\log X$.

Applying these rules to (4.9),

$\log Y = \log\beta_1 + \log X^{\beta_2}$    using rule 1
$\;\;\;\;\;\;\; = \log\beta_1 + \beta_2\log X$    using rule 3.
Thus far, we have not specified whether we are taking logarithms to base e or to base
10. Throughout this text we shall be using e as the base, and so we shall be using what
are known as natural logarithms. This is standard in econometrics. Purists sometimes
write ln instead of log to emphasize that they are working with natural logarithms,
but this is now unnecessary. Nobody uses logarithms to base 10 any more. They were
tabulated in the dreaded log tables that were universally employed for multiplying
or dividing large numbers until the early 1970s. When the pocket calculator was invented, they became redundant. They are not missed.
With e as base, we can state another rule:
4. If $Y = e^X$, $\log Y = X$.

$e^X$, also sometimes written exp(X), is familiarly known as the antilog of X. One can say that $\log e^X$ is the log of the antilog of X, and since log and antilog cancel out, it is not surprising that $\log e^X$ turns out to be just X. Using rule 3 above, $\log e^X = X\log e = X$, since $\log e$ to base e is 1.
Figure 4.4 plots expenditure on food eaten at home, FDHO, against total household expenditure, EXP, for 869 representative households in the United States in 1995, the data being taken from the Consumer Expenditure Survey.
When analyzing household expenditure data, it is usual to relate types of
expenditure to total household expenditure rather than income, the reason
being that the relationship with expenditure tends to be more stable than that
with income. The outputs from linear and logarithmic regressions are shown in
Tables 4.3 and 4.4.
The linear regression indicates that 5.3 cents out of the marginal dollar are
spent on food eaten at home. Interpretation of the intercept is problematic
because literally it implies that $1,916 would be spent on food eaten at home
even if total expenditure were zero.
The logarithmic regression, shown in Figure 4.5, indicates that the elasticity of
expenditure on food eaten at home with respect to total household expenditure
is 0.48. Is this figure plausible? Yes, because food eaten at home is a necessity
Table 4.3
. reg FDHO EXP

      Source |       SS       df       MS              Number of obs =     869
-------------+------------------------------           F(  1,   867) =  381.47
       Model |   915843574     1   915843574           Prob > F      =  0.0000
    Residual |  2.0815e+09   867  2400831.16           R-squared     =  0.3055
-------------+------------------------------           Adj R-squared =  0.3047
       Total |  2.9974e+09   868  3453184.55           Root MSE      =  1549.5

        FDHO |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         EXP |   .0528427   .0027055   19.531   0.000     .0475325    .0581529
       _cons |   1916.143   96.54591   19.847   0.000     1726.652    2105.634
Table 4.4
. gen LGFDHO = ln(FDHO)
. gen LGEXP = ln(EXP)
. reg LGFDHO LGEXP

      Source |       SS       df       MS              Number of obs =     868
-------------+------------------------------           F(  1,   866) =  396.06
       Model |  84.4161692     1  84.4161692           Prob > F      =  0.0000
    Residual |  184.579612   866  .213140429           R-squared     =  0.3138
-------------+------------------------------           Adj R-squared =  0.3130
       Total |  268.995781   867  .310260416           Root MSE      =  .46167

      LGFDHO |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       LGEXP |   .4800417   .0241212   19.901   0.000     .4326988    .5273846
       _cons |   3.166271    .244297   12.961   0.000     2.686788    3.645754
Figure 4.5 Logarithmic regression of expenditure on food eaten at home on total household expenditure
Figure 4.6 Linear and logarithmic regressions of expenditure on food eaten at home on
total household expenditure
rather than a luxury, so one would expect the elasticity to be less than 1. The
intercept has no economic meaning. Figure 4.6 plots the logarithmic regression
line in the original diagram. While there is not much difference between the
regression lines over the middle part of the range of observations, it is clear that
the logarithmic regression gives a better fit for very low and very high levels of
household expenditure.
Semilogarithmic models
Another common functional form is given by equation (4.17):
$Y = \beta_1 e^{\beta_2 X}.$    (4.17)
Differentiating (4.17) with respect to X gives

$\frac{dY}{dX} = \beta_1\beta_2 e^{\beta_2 X} = \beta_2 Y,$    (4.18)

so

$\frac{dY/dX}{Y} = \beta_2:$    (4.19)

$\beta_2$ measures the proportional change in Y per unit change in X. Taking logarithms of (4.17), the model may be fitted in the linearized form

$\log Y = \log\beta_1 + \beta_2 X.$    (4.20)
Note that only the left side is logarithmic in variables, and for this reason (4.20)
is described as a semilogarithmic model.
The interpretation of $\beta_2$ as the proportional change in Y per unit change in X is valid only when $\beta_2$ is small. When $\beta_2$ is large, the interpretation may be a little more complex. Suppose that Y is related to X by (4.17) and that X increases by one unit to $X' = X + 1$. Then $Y'$, the new value of Y, is given by

$Y' = \beta_1 e^{\beta_2 X'} = \beta_1 e^{\beta_2 (X+1)} = \beta_1 e^{\beta_2 X} e^{\beta_2} = Y e^{\beta_2} = Y\left(1 + \beta_2 + \frac{\beta_2^2}{2!} + \dots\right).$    (4.21)

Thus, the proportional change per unit change in X is actually greater than $\beta_2$. However, if $\beta_2$ is small (say, less than 0.1), $\beta_2^2$ and further terms will be very small and can be neglected. In that case, the right side of the equation simplifies to $Y(1 + \beta_2)$ and the original marginal interpretation of $\beta_2$ still applies.
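For instance, the exact proportional effect can be computed directly (a small numerical check in Stata, using 0.11, roughly the schooling coefficient estimated in the example that follows):

* approximate versus exact percentage effect for b2 = 0.11
scalar b2 = 0.11
display "approximate: " 100*b2 " percent"
display "exact:       " 100*(exp(b2) - 1) " percent"
* for b2 = 0.11 the exact effect is about 11.6 percent, not 11 percent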
Example: semilogarithmic earnings function
For fitting earnings functions, the semilogarithmic model is generally considered
to be superior to the linear model. We will start with the simplest possible
version:
$EARNINGS = \beta_1 e^{\beta_2 S},$    (4.22)

where EARNINGS is hourly earnings and S is years of schooling. Taking logarithms, the regression specification is

$LGEARN = \log EARNINGS = \log\beta_1 + \beta_2 S + u.$    (4.23)
Table 4.5

. reg LGEARN S

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  1,   538) =  140.05
       Model |  38.5643833     1  38.5643833           Prob > F      =  0.0000
    Residual |   148.14326   538  .275359219           R-squared     =  0.2065
-------------+------------------------------           Adj R-squared =  0.2051
       Total |  186.707643   539   .34639637           Root MSE      =  .52475

      LGEARN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   .1096934   .0092691    11.83   0.000     .0914854    .1279014
       _cons |   1.292241   .1287252    10.04   0.000     1.039375    1.545107
[Figure 4.7: semilogarithmic regression of LGEARN on years of schooling]
The scatter diagram for the semilogarithmic regression is shown in Figure 4.7.
For the purpose of comparison, it is plotted together with the linear regression
in a plot with the untransformed variables in Figure 4.8. The two regression
lines do not differ greatly in their overall fit, but the semilogarithmic specification has the advantages of not predicting negative earnings for individuals with
low levels of schooling and of allowing the increase in earnings per year of
schooling to increase with schooling.
[Figure 4.8: linear and semilogarithmic regressions of hourly earnings on years of schooling (highest grade completed)]
The disturbance term

Nothing has been said so far about the disturbance term in these transformed models. The transformed version of the hyperbolic model, (4.6), becomes

$e = \beta_1 + \beta_2 z + u$    (4.24)

when we take the random effect into account. Working backwards, this implies that the original (untransformed) equation should be of the form

$e = \beta_1 + \frac{\beta_2}{g} + u.$    (4.25)
In this particular case, if it is true that in the original equation the disturbance
term is additive and satisfies the regression model conditions, it will also be true
in the transformed equation. No problem here.
What happens when we start off with a model such as

$Y = \beta_1 X^{\beta_2}?$    (4.26)
The loglinear regression specification is

$\log Y = \log\beta_1 + \beta_2\log X + u$    (4.27)

when the disturbance term is included. Working back to the original equation, this implies that (4.26) should be rewritten

$Y = \beta_1 X^{\beta_2} v,$    (4.28)
where $v = e^u$, so that the disturbance term enters the original model multiplicatively. But what if the disturbance term in the original model is instead additive:

$Y = \beta_1 X^{\beta_2} + u?$    (4.29)

The answer is that when you take logarithms, there is no mathematical way of simplifying $\log\left(\beta_1 X^{\beta_2} + u\right)$. The transformation does not lead to a linearization. You would have to use a nonlinear regression technique, for example, of the type discussed in the next section.
Example
The central limit theorem suggests that the disturbance term should have a normal distribution. It can be demonstrated that if the disturbance term has a normal distribution, so also will the residuals, provided that the regression equation
is correctly specified. An examination of the distribution of the residuals thus
provides indirect evidence of the adequacy of the specification of a regression
model. Figure 4.9 shows the residuals from linear and semilogarithmic regressions of EARNINGS on S using EAEF Data Set 21, standardized so that they have standard deviation equal to 1, for comparison.

[Figure 4.9: histograms of the standardized residuals from the linear and semilogarithmic regressions]

The distribution of the residuals from the linear specification is right skewed, while that for the residuals from
the semilogarithmic specification is much closer to a normal distribution. This
suggests that the semilogarithmic specification is preferable.
The $R^2$ statistics for regressions with different dependent variables, such as EARNINGS and LGEARN, cannot be compared directly. One procedure is to scale the observations on Y so that they have geometric mean equal to 1. The exponential of the mean of log Y is the geometric mean of Y:

$e^{\frac{1}{n}\sum \log Y_i} = e^{\frac{1}{n}\log (Y_1 \dots Y_n)} = \left(Y_1 \dots Y_n\right)^{1/n}.$    (4.30)

Accordingly, each observation is scaled by the geometric mean,

$Y_i^* = Y_i\,/\,\text{geometric mean of } Y,$    (4.31)

and the linear and logarithmic specifications are refitted with the scaled variable.
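The scaling can be carried out with a few commands (a sketch; it assumes EARNINGS is in memory and reproduces the steps behind Table 4.6, where the geometric mean was 16.3135):

* divide EARNINGS by its geometric mean, exp(mean of log EARNINGS)
* (LGEARN as defined earlier; re-create it with ln(EARNINGS) if necessary)
quietly summarize LGEARN
generate EARNSTAR = EARNINGS/exp(r(mean))
generate LGEARNST = ln(EARNSTAR)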
Table 4.6
. gen EARNSTAR = EARNINGS/16.3135
. gen LGEARNST = ln(EARNSTAR)
. reg EARNSTAR S EXP

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  2,   537) =   67.54
       Model |  84.5963381     2   42.298169           Prob > F      =  0.0000
    Residual |  336.288615   537  .626235783           R-squared     =  0.2010
-------------+------------------------------           Adj R-squared =  0.1980
       Total |  420.884953   539  .780862622           Root MSE      =  .79135

    EARNSTAR |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   .1641662   .0143225    11.46   0.000     .1360312    .1923011
         EXP |   .0344765   .0078777     4.38   0.000     .0190015    .0499515
       _cons |  -1.623503   .2619003    -6.20   0.000    -2.137977   -1.109028
Table 4.7
. reg LGEARNST S EXP

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  2,   537) =  100.86
       Model |  50.9842589     2  25.4921295           Prob > F      =  0.0000
    Residual |   135.72339   537  .252743742           R-squared     =  0.2731
-------------+------------------------------           Adj R-squared =  0.2704
       Total |  186.707649   539  .346396379           Root MSE      =  .50274

    LGEARNST |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   .1235911   .0090989    13.58   0.000     .1057173     .141465
         EXP |   .0350826   .0050046     7.01   0.000     .0252515    .0449137
       _cons |  -2.282673   .1663823   -13.72   0.000    -2.609513   -1.955833
EXERCISES
4.2
. reg LGS LGSM

      Source |       SS       df       MS              Number of obs =     536
-------------+------------------------------           F(  1,   534) =   56.99
       Model |  1.62650898     1  1.62650898           Prob > F      =  0.0000
    Residual |  15.2402109   534  .028539721           R-squared     =  0.0964
-------------+------------------------------           Adj R-squared =  0.0947
       Total |  16.8667198   535  .031526579           Root MSE      =  .16894

         LGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        LGSM |    .200682   .0265831     7.55   0.000     .1484618    .2529022
       _cons |    2.11373   .0648636    32.59   0.000     1.986311    2.241149
The output shows the result of regressing LGS, the logarithm of years of schooling, on LGSM, the logarithm of the mother's years of schooling, using EAEF Data
Set 21. Provide an interpretation of the coefficients and evaluate the regression
results.
4.3

. reg LGS SM

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  1,   538) =   78.13
       Model |  2.14395934     1  2.14395934           Prob > F      =  0.0000
    Residual |  14.7640299   538  .027442435           R-squared     =  0.1268
-------------+------------------------------           Adj R-squared =  0.1252
       Total |  16.9079893   539  .031369182           Root MSE      =  .16566

         LGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          SM |   .0223929   .0025335     8.84   0.000     .0174162    .0273696
       _cons |   2.340421   .0301902    77.52   0.000     2.281116    2.399727
The output shows the result of regressing LGS, the logarithm of years of schooling, on SM, the mother's years of schooling, using EAEF Data Set 21. Provide an
interpretation of the coefficients and evaluate the regression results.
4.4
Download the CES data set from the website (see Appendix B) and fit linear
and logarithmic regressions for your commodity on EXP, total household
expenditure, excluding observations with zero expenditure on your commodity.
Interpret the regression results and perform appropriate tests.
4.5
4.6 What is the relationship between weight and height? Using your EAEF data set,
regress the logarithm of WEIGHT85 on the logarithm of HEIGHT. Interpret
the regression results and perform appropriate tests.
4.7
4.8* Suppose that the logarithm of Y is regressed on the logarithm of X, the fitted regression being

$\widehat{\log Y} = b_1 + b_2 \log X.$

Suppose $X^* = \lambda X$, where $\lambda$ is a constant, and suppose that log Y is regressed on log $X^*$. Determine how the regression coefficients are related to those of the original regression. Determine also how the t statistic for $b_2$ and $R^2$ for the equation are related to those in the original regression.
4.9 Using your EAEF data set, regress the logarithm of earnings on S and EXP.
Interpret the regression results and perform appropriate tests.
4.10 Using your EAEF data set, evaluate whether the dependent variable of an earnings function should be EARNINGS or LGEARN.

4.11 Evaluate likewise whether a linear or a logarithmic specification of the dependent variable is preferable for the expenditure function for your commodity in the CES data set. Note: Drop households reporting no expenditure on your commodity.
Quadratic and interactive models are given by equations (4.32) and (4.33):

$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_2^2 + u$    (4.32)

$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_2 X_3 + u.$    (4.33)
Of course, the quadratic model may be viewed as a special case of the interactive model with X3 = X2, but it is convenient to treat it separately. These models
can be fitted using OLS with no modification. However, the interpretation of
their coefficients has to be approached with care. The usual interpretation of
a parameter, that it represents the effect of a unit change in its associated variable, holding all other variables constant, cannot be applied. In the case of the
quadratic model, it is not possible for $X_2$ to change without $X_2^2$ also changing. In the case of the interactive model, it is not possible for $X_2$ to change without $X_2 X_3$ also changing, if $X_3$ is kept constant.
Quadratic variables
By differentiating (4.32), one obtains the change in Y per unit change in X2:
$\frac{dY}{dX_2} = \beta_2 + 2\beta_3 X_2.$    (4.34)
Viewed this way, it can be seen that the impact of a unit change in $X_2$ on Y, $(\beta_2 + 2\beta_3 X_2)$, changes with $X_2$. This means that $\beta_2$ has an interpretation that is different from that in the ordinary linear model

$Y = \beta_1 + \beta_2 X_2 + u,$    (4.35)
where $\beta_2$ gives the effect of a unit change in $X_2$ irrespective of its level. In the quadratic model, $\beta_2$ gives the effect of a unit change in $X_2$ only at the point $X_2 = 0$. Table 4.8 shows the output for a quadratic regression of hourly earnings on years of schooling using EAEF Data Set 21, with fitted equation

$\widehat{EARNINGS} = 22.25 - 2.77\,S + 0.18\,S^2.$    (4.36)
[Figure 4.10: quadratic regression of hourly earnings on years of schooling (highest grade completed)]
Table 4.8

. gen SSQ = S*S
. reg EARNINGS S SSQ

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  2,   537) =   59.69
       Model |  20372.4953     2  10186.2477           Prob > F      =  0.0000
    Residual |  91637.7357   537  170.647553           R-squared     =  0.1819
-------------+------------------------------           Adj R-squared =  0.1788
       Total |  112010.231   539  207.811189           Root MSE      =  13.063

    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |  -2.772317   2.119128    -1.31   0.191    -6.935114    1.390481
         SSQ |   .1829731   .0737308     2.48   0.013     .0381369    .3278092
       _cons |   22.25089   14.92883     1.49   0.137    -7.075176    51.57695
The coefficient of S implies that, for an individual with no schooling, the impact of a year of schooling is to decrease hourly earnings by $2.77.
The intercept also has no sensible interpretation. Literally, it implies that an
individual with no schooling would have hourly earnings of $22.25, which is
implausibly high.
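The implied marginal effect at points within the data range is easily computed (a sketch, to be run after the regression in Table 4.8):

* marginal effect of schooling, b2 + 2*b3*S, evaluated at S = 12
display _b[S] + 2*_b[SSQ]*12
* about 1.62 dollars per additional year of schooling at S = 12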
The quadratic relationship is illustrated in Figure 4.10. Over the range of
the actual data, it fits the observations tolerably well. The fit is not dramatically
different from those of the linear and semilogarithmic specifications. However,
when one extrapolates beyond the data range, the quadratic function increases
as schooling decreases, giving rise to implausible estimates of both $\beta_1$ and $\beta_2$ for
S = 0. In this example, we would prefer the semilogarithmic specification, as do
all wage-equation studies. The slope coefficient of the semilogarithmic specification has a simple interpretation and the specification does not give rise to
predictions outside the data range that are obviously nonsensical.
The data on employment growth rate, e, and GDP growth rate, g, for 25
OECD countries in Exercise 1.4 provide a less problematic example of the use
of a quadratic function. gsq has been defined as the square of g. Table 4.9 shows
the output from the quadratic regression. In Figure 4.11, the quadratic regression is compared with that obtained in Section 4.1. The quadratic specification
appears to be an improvement on the hyperbolic function fitted in Section 4.1.
It is more satisfactory than the latter for low values of g, in that it does not yield
implausibly large negative predicted values of e. The only defect is that it predicts that the fitted value of e starts to fall when g exceeds 7.
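The turning point can be computed from the Table 4.9 estimates (a one-line check, to be run after the quadratic regression):

* fitted e reaches its maximum where de/dg = b2 + 2*b3*g = 0
display -_b[g]/(2*_b[gsq])
* approximately 7.2, consistent with the fitted value of e falling beyond g = 7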
Higher-order polynomials
Why stop at a quadratic? Why not consider a cubic, or quartic, or a polynomial
of even higher order? There are usually several good reasons for not doing
so. Diminishing marginal effects are standard in economic theory, justifying
Table 4.9

. gen gsq = g*g
. reg e g gsq

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  2,    22) =   20.15
       Model |  15.9784642     2  7.98923212           Prob > F      =  0.0000
    Residual |   8.7235112    22  .396523236           R-squared     =  0.6468
-------------+------------------------------           Adj R-squared =  0.6147
       Total |  24.7019754    24  1.02924898           Root MSE      =   .6297

           e |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           g |   1.200205   .3862226     3.11   0.005     .3992287    2.001182
         gsq |  -.0838408   .0445693    -1.88   0.073    -.1762719    .0085903
       _cons |  -1.678113   .6556641    -2.56   0.018    -3.037877   -.3183494
Figure 4.11 Hyperbolic and quadratic regressions of employment growth rate on GDP
growth rate
quadratic specifications, at least as an approximation, but economic theory seldom suggests that a relationship might sensibly be represented by a cubic or
higher-order polynomial. The second reason follows from the first. There will
be an improvement in fit as higher-order terms are added, but because these
terms are not theoretically justified, the improvement will be sample-specific.
Third, unless the sample is very small, the fits of higher-order polynomials are
unlikely to be very different from those of a quadratic over the main part of the
data range.
These points are illustrated by Figure 4.12, which shows cubic and quartic
regressions with the original linear and quadratic regressions. Over the main
quartic
cubic
1
quartic
0
0
1
cubic
2
GDP growth rate
Figure 4.12 Cubic and quartic regressions of employment growth rate on GDP growth
rate
06_Dough_Ch04.indd 212
1/25/2011 6:29:14 PM
Introduction to Econometrics
213
data range, from g = 1.5 to g = 4, the fits of the cubic and quartic are very
similar to that of the quadratic. $R^2$ for the linear specification is 0.590. For the
quadratic it improves to 0.647. For the cubic and quartic it is 0.651 and 0.658,
relatively small further improvements. Further, the cubic and quartic curves
both exhibit implausible characteristics. The cubic declines even more rapidly
than the quadratic for high values of g, and the quartic has strange twists at its
extremities.
Interactive explanatory variables

Suppose that Y is hypothesized to depend on $X_2$ and $X_3$ with an interactive term:

$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_2 X_3 + u.$    (4.37)

The model may be rewritten

$Y = \beta_1 + (\beta_2 + \beta_4 X_3)X_2 + \beta_3 X_3 + u.$    (4.38)
This representation makes explicit the fact that $(\beta_2 + \beta_4 X_3)$, the marginal effect of $X_2$ on Y, depends on the value of $X_3$. From this it can be seen that $\beta_2$ has a special interpretation: it gives the marginal effect of $X_2$ on Y when $X_3 = 0$.
One may alternatively rewrite the model as
$Y = \beta_1 + \beta_2 X_2 + (\beta_3 + \beta_4 X_2)X_3 + u.$    (4.39)
From this it may be seen that the marginal effect of $X_3$ on Y, holding $X_2$ constant, is $(\beta_3 + \beta_4 X_2)$ and that $\beta_3$ may be interpreted as the marginal effect of $X_3$ on Y when $X_2 = 0$.
If $X_3 = 0$ is a long way outside the range of $X_3$ in the sample, the interpretation of the estimate of $\beta_2$ as an estimate of the marginal effect of $X_2$ when $X_3 = 0$
should be treated with caution. Sometimes the estimate will be completely
implausible, in the same way as the estimate of the intercept in a regression
is often implausible if given a literal interpretation. We have just encountered
a similar problem with the interpretation of $\beta_2$ in the quadratic specification.
Often it is of interest to compare the estimates of the effects of X2 and X3 on Y
in models excluding and including the interactive term, and the changes in the
meanings of $\beta_2$ and $\beta_3$ caused by the inclusion of the interactive term can make
such comparisons difficult.
The problem may be mitigated by measuring $X_2$ and $X_3$ from their sample means. Define

$X_2^* = X_2 - \bar{X}_2$    (4.40)

and

$X_3^* = X_3 - \bar{X}_3.$    (4.41)

Substituting into (4.37), the model becomes

$Y = \beta_1 + \beta_2\left(X_2^* + \bar{X}_2\right) + \beta_3\left(X_3^* + \bar{X}_3\right) + \beta_4\left(X_2^* + \bar{X}_2\right)\left(X_3^* + \bar{X}_3\right) + u$
$\;\;\; = \beta_1^* + \beta_2^* X_2^* + \beta_3^* X_3^* + \beta_4 X_2^* X_3^* + u,$    (4.42)

where $\beta_1^* = \beta_1 + \beta_2\bar{X}_2 + \beta_3\bar{X}_3 + \beta_4\bar{X}_2\bar{X}_3$, $\beta_2^* = \beta_2 + \beta_4\bar{X}_3$, and $\beta_3^* = \beta_3 + \beta_4\bar{X}_2$. The point of doing this is that the coefficients of $X_2^*$ and $X_3^*$ now give the marginal effects of the variables when the other variable is held at its sample mean, which is to some extent a representative value. For example, rewriting the new equation as

$Y = \beta_1^* + \left(\beta_2^* + \beta_4 X_3^*\right)X_2^* + \beta_3^* X_3^* + u,$    (4.43)

it can be seen that $\beta_2^*$ gives the marginal effect of $X_2^*$, and hence $X_2$, when $X_3^* = 0$, that is, when $X_3$ is at its sample mean. $\beta_3^*$ has a similar interpretation.
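In Stata, the centering used in the example below can be done with summarize and generate (a sketch; S, EXP, and LGEARN as in the earnings example):

* define schooling and experience with sample means subtracted
quietly summarize S
generate S1 = S - r(mean)
quietly summarize EXP
generate EXP1 = EXP - r(mean)
generate SEXP1 = S1*EXP1
regress LGEARN S1 EXP1 SEXP1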
Example
Table 4.10 shows the results of regressing the logarithm of hourly earnings
on years of schooling and years of work experience for males using EAEF
Data Set 21. It implies that an extra year of schooling increases earnings by
13.0 percent and that an extra year of work experience increases them by 3.2
percent. In Table 4.11, the interactive variable SEXP is defined as the product
of S and EXP and added to the specification. The schooling coefficient now
jumps to 23.7 percent, an extraordinarily high figure. But of course it has now
changed its meaning. It now estimates the impact of an extra year of schooling
Table 4.10

. reg LGEARN S EXP

      Source |       SS       df       MS              Number of obs =     270
-------------+------------------------------           F(  2,   267) =   50.41
       Model |  25.4256872     2  12.7128436           Prob > F      =  0.0000
    Residual |  67.3402828   267  .252210797           R-squared     =  0.2741
-------------+------------------------------           Adj R-squared =  0.2686
       Total |    92.76597   269  .344854907           Root MSE      =  .50221

      LGEARN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   .1303979   .0129905    10.04   0.000     .1048211    .1559747
         EXP |   .0321614   .0083783     3.84   0.000     .0156655    .0486573
       _cons |   .5969745   .2768371     2.16   0.032     .0519132    1.142036
Table 4.11

. gen SEXP = S*EXP
. reg LGEARN S EXP SEXP

      Source |       SS       df       MS              Number of obs =     270
-------------+------------------------------           F(  3,   266) =   35.58
       Model |  26.5654376     3  8.85514586           Prob > F      =  0.0000
    Residual |  66.2005325   266  .248874182           R-squared     =  0.2864
-------------+------------------------------           Adj R-squared =  0.2783
       Total |    92.76597   269  .344854907           Root MSE      =  .49887

      LGEARN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   .2371066   .0515064     4.60   0.000     .1356944    .3385187
         EXP |   .1226418   .0430918     2.85   0.005     .0377974    .2074863
        SEXP |  -.0065695   .0030699    -2.14   0.033    -.0126139   -.0005252
       _cons |  -.9003565   .7517877    -1.20   0.232    -2.380568     .579855
Table 4.12

. * S1 and EXP1: S and EXP with sample means subtracted (commands omitted)
. reg LGEARN S1 EXP1

      Source |       SS       df       MS              Number of obs =     270
-------------+------------------------------           F(  2,   267) =   50.41
       Model |  25.4256872     2  12.7128436           Prob > F      =  0.0000
    Residual |  67.3402828   267  .252210797           R-squared     =  0.2741
-------------+------------------------------           Adj R-squared =  0.2686
       Total |    92.76597   269  .344854907           Root MSE      =  .50221

      LGEARN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          S1 |   .1303979   .0129905    10.04   0.000     .1048211    .1559747
        EXP1 |   .0321614   .0083783     3.84   0.000     .0156655    .0486573
       _cons |   2.961112   .0305633    96.88   0.000     2.900936    3.021288
for those individuals who have no work experience. The experience coefficient
has also risen sharply. Now it indicates that an extra year increases earnings
by a wholly implausible 12.3 percent. But this figure refers to individuals with
no schooling, and every individual in the sample had at least 7 years.
To deal with these problems, we define S1, EXP1, and SEXP1 as the corresponding schooling, experience, and interactive variables with the means
subtracted, and repeat the regressions. We will refer to the original regressions
excluding and including the interactive term, with the output shown in Tables
4.10 and 4.11, as Regressions (1) and (2), and the new ones, with the output
shown in Tables 4.12 and 4.13, as Regressions (3) and (4).
Table 4.13

. reg LGEARN S1 EXP1 SEXP1

      Source |       SS       df       MS              Number of obs =     270
-------------+------------------------------           F(  3,   266) =   35.58
       Model |  26.5654377     3  8.85514589           Prob > F      =  0.0000
    Residual |  66.2005324   266  .248874182           R-squared     =  0.2864
-------------+------------------------------           Adj R-squared =  0.2783
       Total |    92.76597   269  .344854907           Root MSE      =  .49887

      LGEARN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          S1 |   .1196959   .0138394     8.65   0.000     .0924473    .1469446
        EXP1 |   .0324933   .0083241     3.90   0.000     .0161038    .0488829
       SEXP1 |  -.0065695   .0030699    -2.14   0.033    -.0126139   -.0005252
       _cons |   2.933994   .0328989    89.18   0.000     2.869218    2.998769
Regression (3) is virtually identical to Regression (1). In particular, comparing Tables 4.10 and 4.12, the slope coefficients are the same, as are the standard
errors, t statistics, and $R^2$. The only difference is in the intercept. This now refers to the logarithm of hourly earnings of an individual with mean schooling and mean experience. It implies an hourly rate of $e^{2.96} = 19.32$ dollars, and since it is in the middle of the data range it is perhaps more informative than the intercept in Regression (1), which suggested that the hourly earnings of an individual with no schooling and no work experience would be $e^{0.60} = 1.82$ dollars.
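The implied rates can be recovered from the fitted intercepts (a small check using the estimates in Tables 4.12 and 4.10):

* hourly rates implied by the intercepts of Regressions (3) and (1)
display exp(2.961112)
display exp(.5969745)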
Regressions (2) and (4) also have much in common. The analysis of variance
and goodness of fit statistics are the same, and the results relating to the interactive effect are the same. The only differences are in the output relating to the
schooling and work experience slope coefficients, which in Regression (4) now
relate to an individual with mean schooling and experience. A comparison of
Regressions (3) and (4) allows a more meaningful evaluation of the impact of
including the interactive term. We see that, for an individual at the mean, it has
little effect on the value of work experience, but it suggests that the value of a
year of schooling was overestimated by a small amount in Regression (3). The
interactive effect itself suggests that the value of education diminishes for all
individuals with increasing experience.
Ramsey's RESET test

A simple test of functional misspecification is Ramsey's RESET test. Suppose that you have fitted the multiple regression

$\hat{Y} = b_1 + \sum_{j=2}^{k} b_j X_j.$    (4.44)

$\hat{Y}^2$ is then a linear combination of the squares of the X variables and their interactions. If $\hat{Y}^2$ is added to the regression specification, it should pick up quadratic and interactive nonlinearity, if present, without necessarily being highly correlated with any of the X variables and consuming only one degree of freedom. If the t statistic for the coefficient of $\hat{Y}^2$ is significant, this indicates that some kind of nonlinearity may be present.
Of course the test does not indicate the actual form of the nonlinearity and it
may fail to detect other types of nonlinearity. However, it does have the virtue
of being very easy to implement.
In principle, one could also include higher powers of $\hat{Y}$. However, the consensus appears to be that this is not usually worthwhile.
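The test takes only a few commands (a sketch with hypothetical variable names Y, X2, and X3; Stata's estat ovtest command automates a closely related version after regress):

* RESET-style check: add the square of the fitted values to the specification
regress Y X2 X3
predict YHAT
generate YHATSQ = YHAT^2
regress Y X2 X3 YHATSQ
* a significant t statistic on YHATSQ suggests neglected nonlinearity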
EXERCISES
4.12
. gen SMSQ = SM*SM
. reg S SM SMSQ

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  2,   537) =   51.90
       Model |  519.131914     2  259.565957           Prob > F      =  0.0000
    Residual |  2685.85142   537  5.00158551           R-squared     =  0.1620
-------------+------------------------------           Adj R-squared =  0.1589
       Total |  3204.98333   539  5.94616574           Root MSE      =  2.2364

           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          SM |  -.2564658   .1318583    -1.95   0.052    -.5154872    .0025556
        SMSQ |   .0271172   .0060632     4.47   0.000     .0152067    .0390277
       _cons |   12.79121   .7366358    17.36   0.000     11.34417    14.23825
The output shows the result of regression of S on SM and its square, SMSQ.
Evaluate the regression results. In particular, explain why the coefficient of SM
is negative.
4.13 Using your EAEF data set, perform a regression parallel to that in Exercise 4.12
and evaluate the results. Define a new variable SM12 as SM − 12. SM12 may be
interpreted as the number of years of schooling of the mother after completing
high school, if positive, and the number of years of schooling lost before completing high school, if negative. Regress S on SM12 and its square, and compare
the results with those in your original regression.
4.14

. reg LGS LGSM LGSMSQ
note: LGSM omitted because of collinearity

      Source |       SS       df       MS              Number of obs =     536
-------------+------------------------------           F(  1,   534) =   56.99
       Model |  1.62650898     1  1.62650898           Prob > F      =  0.0000
    Residual |  15.2402109   534  .028539721           R-squared     =  0.0964
-------------+------------------------------           Adj R-squared =  0.0947
       Total |  16.8667198   535  .031526579           Root MSE      =  .16894

         LGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        LGSM |  (omitted)
      LGSMSQ |    .100341   .0132915     7.55   0.000     .0742309    .1264511
       _cons |    2.11373   .0648636    32.59   0.000     1.986311    2.241149
The output shows the results of regressing LGS, the logarithm of S, on LGSM,
the logarithm of SM, and LGSMSQ, the logarithm of SMSQ. Explain the
regression results.
4.15 Perform a RESET test of functional misspecification. Using your EAEF data set,
regress WEIGHT02 on HEIGHT. Save the fitted values as YHAT and define
YHATSQ as its square. Add YHATSQ to the regression specification and test its
coefficient.
Nonlinear regression

Suppose that you believe that a variable Y is related to a variable X by the model

$Y = \beta_1 + \beta_2 X^{\beta_3} + u$    (4.45)

and you wish to obtain estimates of $\beta_1$, $\beta_2$, and $\beta_3$ given data on Y and X. There is no way of transforming (4.45) to obtain a linear relationship, and so it is not possible to apply the usual regression procedure.
Nevertheless, one can still use the principle of minimizing the sum of the
squares of the residuals to obtain estimates of the parameters. We will describe
a simple nonlinear regression algorithm that uses the principle. It consists of a
series of repeated steps:
1. You start by guessing plausible values for the parameters.
2. You calculate the predicted values of Y from the data on X, using these values of the parameters.
3. You calculate the residual for each observation in the sample, and hence RSS, the sum of the squares of the residuals.
4. You then make small changes in one or more of your estimates of the parameters.
5. You calculate the new predicted values of Y, residuals, and RSS.
6. If RSS is smaller than before, your new estimates of the parameters are better than the old ones and you take them as your new starting point.
7. You repeat steps 4, 5, and 6 again and again until you are unable to make any further reduction in RSS. The final values of the parameters are then your estimates.
We will illustrate the procedure with the hyperbolic model of Section 4.1,

$e = \beta_1 + \frac{\beta_2}{g} + u.$    (4.46)
[Figure: RSS as a function of the trial value of $b_2$]

[Figure: RSS as a function of the trial value of $b_1$]
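The trial-and-error logic of steps 1–7 can be mimicked with a crude grid search (a sketch, assuming e and g are in memory; b1 is held at 2.6, close to its fitted value, while b2 is varied):

* search over candidate values of b2, keeping the value that minimizes RSS
local best_rss = .
forvalues i = 0/80 {
    local b2 = -8 + 0.1*`i'
    tempvar sq
    quietly generate `sq' = (e - 2.6 - `b2'/g)^2
    quietly summarize `sq'
    if r(sum) < `best_rss' {
        local best_rss = r(sum)
        local best_b2 = `b2'
    }
    drop `sq'
}
display "best b2 = `best_b2'   RSS = `best_rss'"
* actual nl routines adjust all parameters at once and use smarter search rules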
Table 4.14

. nl (e = {beta1} + {beta2}/g)
(obs = 25)

Iteration 0:  residual SS = 11.58161
Iteration 1:  residual SS = 11.58161

      Source |       SS       df       MS             Number of obs =        25
-------------+------------------------------          R-squared     =    0.5311
       Model |  13.1203672     1  13.1203672          Adj R-squared =    0.5108
    Residual |  11.5816083    23  .503548186          Root MSE      =  .7096113
-------------+------------------------------          Res. dev.     =  51.71049
       Total |  24.7019754    24  1.02924898

           e |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      /beta1 |   2.604753   .3748821     6.95   0.000     1.829250    3.380256
      /beta2 |  -4.050817    .793579    -5.10   0.000    -5.692460   -2.409174
In practice, the algorithms used for minimizing the residual sum of squares
in a nonlinear model are mathematically far more sophisticated than the simple
trial-and-error method described above. Nevertheless, until fairly recently a
major problem with the fitting of nonlinear regressions was that it was very
slow compared with linear regression, especially when there were several
parameters to be estimated, and the high computing cost discouraged the use
of nonlinear regression. This has changed as the speed and power of computers
have increased. As a consequence, more interest is being taken in the technique
and regression applications often incorporate user-friendly nonlinear regression
features.
[Figure: the hyperbolic regressions (4.46) and (4.47) of employment growth rate on GDP growth rate]
Table 4.15

. nl (e = {beta1} + {beta2}/({beta3} + g))
(obs = 25)

Iteration 0:  residual SS = 11.58161
Iteration 1:  residual SS = 11.19238
Iteration 15: residual SS = 9.01051

      Source |       SS       df       MS             Number of obs =        25
-------------+------------------------------          R-squared     =    0.6352
       Model |  15.6914659     2  7.84573293          Adj R-squared =    0.6021
    Residual |  9.01050957    22  .409568617          Root MSE      =  .6399755
-------------+------------------------------          Res. dev.     =  45.43482
       Total |  24.7019754    24  1.02924898

           e |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      /beta1 |   5.467548   2.826401     1.93   0.066    -.3940491    11.32914
      /beta2 |   -31.0764   41.78914    -0.74   0.465   -117.7418    55.58897
      /beta3 |   4.148589   4.870437     0.85   0.404     -5.95208    14.24926
Table 4.14 shows such output for the present hyperbolic regression of e on g.
It is, as usual, Stata output, but output from other regression applications will
look similar. The Stata command for a nonlinear regression is nl. This is followed by the hypothesized mathematical relationship within parentheses. The
parameters must be given names placed within braces. Here $\beta_1$ is {beta1} and $\beta_2$ is {beta2}. The output is effectively the same as the linear regression output in
Table 4.2.
The output in Table 4.15 is for the more general model

$e = \beta_1 + \frac{\beta_2}{\beta_3 + g} + u,$    (4.47)

in which a third parameter, $\beta_3$, has been added.
Key terms
elasticity
linear in parameters
linear in variables
logarithmic model
logarithmic transformation
loglinear model
nonlinear regression algorithm
semilogarithmic model
EXERCISE
4.16*
. nl (S = {beta1} + {beta2}/({beta3} + SIBLINGS)) if SIBLINGS>0
(obs = 529)

Iteration 0:  residual SS = 2962.929
Iteration 1:  residual SS = 2951.616
Iteration 13: residual SS = 2926.201

      Source |       SS       df       MS             Number of obs =       529
-------------+------------------------------          R-squared     =    0.0659
       Model |  206.566702     2  103.283351          Adj R-squared =    0.0624
    Residual |  2926.20078   526  5.56311936          Root MSE      =  2.358627
-------------+------------------------------          Res. dev.     =  2406.077
       Total |  3132.76749   528  5.93327175

           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      /beta1 |   11.09973   1.363292     8.14   0.000     8.421565     13.7779
      /beta2 |   17.09479   18.78227     0.91   0.363   -19.80268    53.99227
      /beta3 |   3.794949    3.66492     1.04   0.301   -3.404729    10.99463
The output above uses EAEF Data Set 21 to fit the nonlinear model
$S = \beta_1 + \frac{\beta_2}{\beta_3 + SIBLINGS} + u,$
where S is the years of schooling of the respondent and SIBLINGS is the number
of brothers and sisters. The specification is an extension of that for Exercise 4.1,
with the addition of the parameter $\beta_3$. Provide an interpretation of the regression
results and compare it with that for Exercise 4.1.