Multiple Imputation: Julia Kozlitina Steve Robertson April 26, 2006
Julia Kozlitina
Steve Robertson
Applications
Software
Multiple Imputation
Idea: replace each missing item with 2 or more
acceptable values, representing a distribution of
possibilities (Rubin, 1987).
This results in m complete datasets (each one is
analyzed using standard methods, and estimated
parameters are averaged).
Multiple imputations can often be generated by simple
modifications of existing single-imputation methods such as
hot-deck or regression imputation.
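As a rough illustration of the idea, the sketch below turns a single-imputation random hot deck into a multiple-imputation procedure by repeating it m times, analyzing each completed dataset, and averaging the estimates. The function names, the toy data, and the seed are hypothetical and do not come from the slides. A plain hot deck like this does not reflect uncertainty about the donor pool, so treat it as a sketch of the mechanics only.

```python
import random
from statistics import mean

def hot_deck_impute(values, rng):
    """Fill each missing item (None) with a value drawn at random from the observed items."""
    donors = [v for v in values if v is not None]
    return [v if v is not None else rng.choice(donors) for v in values]

def multiply_impute(values, m, seed=0):
    """Repeat the single-imputation step m times to obtain m completed datasets."""
    rng = random.Random(seed)
    return [hot_deck_impute(values, rng) for _ in range(m)]

# Hypothetical toy variable with two missing items.
y = [4.1, None, 5.3, 6.0, None, 4.8]
completed = multiply_impute(y, m=3)

# Analyze each completed dataset with a standard method (here, the mean),
# then average the m estimates.
estimates = [mean(d) for d in completed]
combined_estimate = mean(estimates)
```

Observed values are left untouched in every completed dataset; only the missing items vary from one imputation to the next.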
[Figure: dataset with m imputations (N units in the survey, k variables, m imputations for each missing item)]
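Total variance of the combined estimate:
$T = W + (1 + m^{-1}) B$
where $W$ is the within-imputation variance and $B$ is the between-imputation variance.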
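The combining step can be sketched in a few lines, assuming the standard rules of Rubin (1987): the combined estimate is the average of the m estimates, W is the average of the m within-imputation variances, B is the sample variance of the m estimates (divisor m - 1), and T = W + (1 + 1/m) B. The function name and the input numbers are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from statistics import mean, variance

def combine_rubin(estimates, variances):
    """Combine m point estimates and their variances into one estimate and total variance."""
    m = len(estimates)
    q_bar = mean(estimates)      # combined point estimate
    w = mean(variances)          # within-imputation variance W
    b = variance(estimates)      # between-imputation variance B (divisor m - 1)
    t = w + (1 + 1 / m) * b      # total variance T = W + (1 + 1/m) B
    return q_bar, w, b, t

# Hypothetical results from m = 3 completed-data analyses.
q_bar, w, b, t = combine_rubin([10.0, 12.0, 11.0], [4.0, 5.0, 3.0])
```

With these inputs the combined estimate is 11, W = 4, B = 1, and T = 4 + (4/3)(1), which is about 5.33.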
Inference on combined estimates
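Confidence intervals and significance tests can be computed
using a t reference distribution with degrees of freedom
$\nu = (m - 1)(1 + r_m^{-1})^2$
where $r_m = (1 + m^{-1}) B / W$ is the relative increase in variance due to nonresponse.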
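A small worked example may help here. The sketch below computes r_m = (1 + 1/m) B / W and the degrees of freedom in the standard form from Rubin (1987), nu = (m - 1)(1 + 1/r_m)^2. The function name and the input values are hypothetical, and the t quantile itself is left to a table or a statistics package.

```python
def mi_degrees_of_freedom(m, w, b):
    """Return (r_m, nu) for m imputations, within variance w and between variance b.

    Assumes b > 0; with b == 0 there is no between-imputation variation
    and r_m would be zero, so nu is undefined here.
    """
    r_m = (1 + 1 / m) * b / w            # relative increase in variance
    nu = (m - 1) * (1 + 1 / r_m) ** 2    # degrees of freedom of the t reference distribution
    return r_m, nu

# Hypothetical values: m = 3, W = 4, B = 1.
r_m, nu = mi_degrees_of_freedom(m=3, w=4.0, b=1.0)
```

Here r_m is 1/3 and nu is 2 x (1 + 3)^2 = 32, so the interval would use a t distribution with about 32 degrees of freedom.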
How many imputations needed?
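Rubin (1987, p. 114) shows that the relative efficiency of a
finite-m estimator, measured in units of standard deviations
(the standard error with infinitely many imputations relative to
that with m imputations), is
$RE = (1 + \gamma / m)^{-1/2}$
where $\gamma$ is the rate of missing information for the quantity
being estimated.
Values (in %) are shown below. For small $\gamma$, m = 2 or 3 is nearly
fully efficient.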
Relative efficiency (%) by number of imputations m and rate of missing information γ
        γ
  m     0.1    0.3    0.5    0.7    0.9
  1      95     88     82     77     73
  2      98     93     89     86     83
  3      98     95     93     90     88
  5      99     97     95     94     92
  ∞     100    100    100    100    100
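The tabulated values are consistent with relative efficiency on the standard-deviation scale, (1 + γ/m)^(-1/2), expressed as a rounded percentage. The sketch below recomputes the finite-m rows under that assumption; the function and variable names are hypothetical.

```python
def relative_efficiency(gamma, m):
    """Relative efficiency of m imputations versus infinitely many, on the standard-deviation scale."""
    return (1 + gamma / m) ** -0.5

gammas = [0.1, 0.3, 0.5, 0.7, 0.9]

# One row per number of imputations, as rounded percentages.
table = {
    m: [round(100 * relative_efficiency(g, m)) for g in gammas]
    for m in (1, 2, 3, 5)
}

for m, row in table.items():
    print(m, row)
```

As m grows, every entry approaches 100, which is the last row of the table.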
Problems
Difficulties with MI variance estimator discussed
by Binder & Sun (1996), Fay (1996), and others
- Gives inconsistent variance estimates under some
simple conditions (improper imputation)
Free:
- MIX - Software for multiple imputation
http://www.stat.psu.edu/~jls/misoftwa.html
References
Binder, D.A., and Sun, W. (1996). Frequency valid multiple imputation for surveys
with a complex design. Proceedings of the Section on Survey Research Methods,
ASA, 281-286.
Fay, R.E. (1996). Alternative paradigms for the analysis of imputed survey data.
JASA, 91, 490-498.
Kalton, G., and Kish, L. (1984). Some efficient random imputation methods.
Communications in Statistics, A13, 1919-1939.
Kott, P.S. (1995). A paradox of multiple imputation. Proceedings, 384-389.
Kim, J., and Fuller, W.A. (2004). Fractional hot deck imputation. Biometrika, 91,
559-578.
Lavori, P.W., Dawson, R., and Shera, D. (1995). A multiple imputation strategy for
clinical trials with truncation of patient data. Statistics in Medicine, 14, 1913-1925.
Rubin, D.B. (1987). Multiple imputation for nonresponse in surveys. New York: John
Wiley & Sons, Inc.
SAS Manual Version 8.1, Chapter 11
Shao, J. (2002). Replication methods for variance estimation in complex surveys
with imputed data. In Survey Nonresponse. Edited by Groves, R.M., et al. New
York: John Wiley & Sons, Inc., 303-314.