Specification Error OR Misspecification in Statistical Models

The document discusses specification error in statistical models, which occurs when the model is not an accurate representation of the relationships in the data. Some common causes of specification error include specifying a linear relationship when the true relationship is nonlinear, omitting important predictors or covariates, and including unimportant predictors in the model. Specification error can cause other model assumptions like normality to fail and results in biased coefficient estimates and invalid statistical tests.

Uploaded by

Bangura Abdulai F

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

18 views

Specification Error OR Misspecification in Statistical Models

Uploaded by

Bangura Abdulai F

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 4

What Is Specification Error in Statistical

Models?
When we think about model assumptions, we tend to focus on assumptions
like independence, normality, and constant variance. The other big
assumption, which is harder to see or test, is that there is no specification
error. The assumption of linearity is part of this, but it’s actually a bigger
assumption.

What is this assumption of no specification error?

The basic idea is that when you choose a final model, you want to choose
There are a few common ways of specifying a linear model inaccurately.

Specifying a linear relationship between X & Y when

the relationship isn’t linear
It’s often the case that the relationship between a predictor X and Y isn’t a
straight line. Let’s use a common one as an example: a curvilinear
relationship.

Specifying a line when the relationship is really a curve.

One way to check for a curvilinear relationship is with bivariate
graphing before you get started modeling. Many times (though not always)
the fix is simple: a log transformation of X or an addition of a quadratic (X
squared) term.

Other ways to find it include residual graphs and, if they make theoretical
sense, adding transformations of X to the model and assessing model fit.

Another example is an interaction term.

If the effect of a variable X is moderated by another predictor, it means X

doesn’t have a simple linear relationship with Y. X’s relationship with Y
depends on the value of a third variable–the moderator.

Including that interaction in the model will accurately represent the real
relationship between X and Y. Failing to include it means mis-specification
of X’s real effect.

Leaving out important predictors

The basic idea here is that if you’ve left out some important predictor or
covariate, your model isn’t an accurate representation.
On the other hand, it’s impossible to realistically include every predictor that
predicts or explains the outcome, as much as you may want to.

(And there are certainly models whose job is not to represent all predictors
of an outcome. Rather it’s to test the relationship with specific predictors).

So you have to be comfortable with some level of specification error here

and focus on minimizing it.

One of the most problematic mistakes here is to leave out an

important confounding variable. Of course, you’re limited to the variables in
your data set. So this is something to think about long before you’ve
collected data.

Including unimportant predictors in the model

Just to make sure this doesn’t get too easy, another cause of model mis-
specification is including predictors that are unrelated to the outcome
variable.

So we can’t avoid missing an important predictor by throwing every

possible predictor we have into the model.

There are many ways to build a model. The goal of all of them is to find the
best model. The best model is one that includes all the important predictors
in the right form, but not any unimportant ones.

Consequences of specification error

Specification error often, but not always, causes other assumptions to fail.

For example, sometimes you can solve non-normality of the residuals by

adding a missed covariate or interaction term.

So the first step in solving problems with other assumptions is

usually not to jump to transformations or some other complicated
modelling, but to reassess the predictors you’ve put into the model.
A comparison of exclusion and inclusion of variables is as follows: