Model Validation-Tutorial
Bootstrap method
Predictive performance
Use bootstrap and other methods for model
validation
Demonstrate association: evaluate the
relationship between an outcome and
covariates
e.g., Association Between Helicopter vs
Ground Emergency Medical Services and
Survival for Adults With Major Trauma
JAMA. 2012;307(15):1602-1610.
We are interested in the beta coefficient of the
regression model, e.g.,
In the multivariable regression model, for
patients transported to level I trauma
centers, helicopter transport was
associated with an improved odds of
survival compared with ground transport
(odds ratio [OR], 1.16; 95% CI, 1.14-1.17;
P < .001).
Prediction and forecasting:
e.g., Risk Stratification for In-Hospital
Mortality in Acutely Decompensated Heart
Failure: Classification and Regression Tree
Analysis.
JAMA. 2005;293(5):572-580
Predictive score construction:
e.g., the score (H) is generally based on the results
of a regression model: H = (β1 × covariate A) +
(β2 × covariate B) + (β3 × covariate C), and so on,
where β1, β2, and β3 denote the estimated
beta coefficients for covariates A, B, and C,
obtained by fitting the regression model
for the outcome of interest.
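
The score formula above can be sketched in a few lines of Python. The coefficient values and the patient record are hypothetical, chosen only for illustration; in practice the betas would come from the fitted regression model.

```python
# Minimal sketch of H = (b1 x A) + (b2 x B) + (b3 x C).
# All numbers below are made up for illustration, not taken from any study.

def predictive_score(betas, covariates):
    """Sum of beta * covariate value over the covariates in the model."""
    return sum(betas[name] * covariates[name] for name in betas)

# Hypothetical estimated beta coefficients for covariates A, B, and C.
betas = {"A": 0.8, "B": -0.5, "C": 1.2}

# Hypothetical covariate values for one patient.
patient = {"A": 1.0, "B": 2.0, "C": 0.5}

H = predictive_score(betas, patient)
print(f"score H = {H:.2f}")
```

A higher or lower H is then mapped to a risk group or a predicted outcome, depending on the regression model that produced the coefficients.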
Model validation is applied to regression
models used for prediction purposes.
MODEL VALIDATION in general has at least
two parts:
1. Model selection:
to choose the best model based on model
performance.
2. Model assessment:
to estimate performance for a final
chosen model.
Here we study various methods for model
assessment (how well does the model predict a future
outcome?)
E. W. Steyerberg et al.
Journal of Clinical Epidemiology 54 (2001) 774–781
The bootstrap: random sampling, with replacement,
from an original dataset in order to obtain
statistical inference.
Bootstrap theory says that the distance between
the population mean and sample mean is
similar to the distance between sample mean
and bootstrap ‘subsample’ mean.
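
A minimal sketch of this idea, assuming a simulated sample (the normal distribution, the sample size of 50, and the 2000 resamples are illustrative choices, not from the slides): the spread of the bootstrap means around the sample mean is compared with the usual formula-based standard error of the mean.

```python
import random
import statistics

# Simulated data for illustration only (assumed: normal, mean 100, SD 15).
rng = random.Random(42)
sample = [rng.gauss(100.0, 15.0) for _ in range(50)]
n = len(sample)
sample_mean = statistics.mean(sample)

# Draw bootstrap resamples: same size as the sample, with replacement.
boot_means = []
for _ in range(2000):
    resample = rng.choices(sample, k=n)
    boot_means.append(statistics.mean(resample))

# Spread of bootstrap means around the sample mean ...
bootstrap_se = statistics.stdev(boot_means)
# ... versus the textbook standard error SD / sqrt(n).
formula_se = statistics.stdev(sample) / n ** 0.5

print(f"sample mean  = {sample_mean:.2f}")
print(f"bootstrap SE = {bootstrap_se:.2f}")
print(f"formula SE   = {formula_se:.2f}")
```

The two standard errors should be close, which is what lets the bootstrap stand in for the unknown sampling distribution.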
Bootstrap 95% CI for:
Correlation coefficient
CV = SD/mean
AUC of ROC
Median
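
For statistics like these, one common approach is the percentile bootstrap interval. The sketch below is an assumption about how such a CI might be computed, not necessarily the method used in the tutorial; the helper names `percentile_ci` and `coefficient_of_variation` and the simulated data are hypothetical.

```python
import random
import statistics

def percentile_ci(data, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample with replacement, recompute the
    statistic each time, and take the alpha/2 and 1 - alpha/2 quantiles."""
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(statistic(rng.choices(data, k=n)) for _ in range(n_boot))
    lower = estimates[round(alpha / 2 * n_boot)]
    upper = estimates[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

def coefficient_of_variation(values):
    """CV = SD / mean."""
    return statistics.stdev(values) / statistics.mean(values)

# Simulated measurements for illustration only.
data_rng = random.Random(7)
values = [data_rng.gauss(50.0, 10.0) for _ in range(40)]

median_ci = percentile_ci(values, statistics.median)
cv_ci = percentile_ci(values, coefficient_of_variation)

print("median:", statistics.median(values), "95% CI:", median_ci)
print("CV:", coefficient_of_variation(values), "95% CI:", cv_ci)
```

The same function works for a correlation coefficient or an AUC if whole observations (pairs or rows) are resampled and the statistic is recomputed on each resample.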
External validation: use a training (derivation)
dataset to build the model and a test (validation)
dataset to validate the model.
Examples: old vs. new patients, one dataset vs.
another.
Internal validation: use the same dataset for
model building and validation.
1. We use regression analysis to construct the
predictive model to provide an estimate of patient
outcome.
2. The apparent performance of the model on this
training set will be better than its performance in
another data set, even if the latter test set consists of
patients from the same population. (This is called
optimism.)
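
Optimism can be illustrated with a small simulation. This is a sketch under stated assumptions: a simple linear regression with one covariate, mean squared error as the performance measure (lower is better), and simulated training and test sets drawn from the same population. None of these specifics come from the slides.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def mse(model, xs, ys):
    """Mean squared prediction error of the fitted line on (xs, ys)."""
    intercept, slope = model
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def draw_sample(rng, n):
    """Simulated 'patients' from one population: y = 1 + 0.5 x + noise."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1.0 + 0.5 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys

rng = random.Random(1)
n_reps, n = 500, 20
apparent_total = 0.0
test_total = 0.0
for _ in range(n_reps):
    train_x, train_y = draw_sample(rng, n)
    test_x, test_y = draw_sample(rng, n)    # same population, new patients
    model = fit_line(train_x, train_y)
    apparent_total += mse(model, train_x, train_y)  # error on training set
    test_total += mse(model, test_x, test_y)        # error on test set

mean_apparent = apparent_total / n_reps
mean_test = test_total / n_reps
optimism = mean_test - mean_apparent

print(f"mean apparent error = {mean_apparent:.3f}")
print(f"mean test error     = {mean_test:.3f}")
print(f"optimism            = {optimism:.3f}")
```

Averaged over many repetitions, the apparent error is smaller than the test error even though both sets come from the same population; the gap is the optimism.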
                       Population        Sample        Inference
Regular statistical    Population mean   Sample mean   Standard error
procedure