[R] Multiple imputation using mice with "mean"
e.rapsomaniki at mail.cryst.bbk.ac.uk
Mon Sep 25 15:13:07 CEST 2006
I am trying to impute missing values for my data.frame. As I intend to use the
completed data for prediction, I am currently measuring the success of an
imputation method by the classification error it produces on my training data.
I have tried several approaches to replace missing values:
- mean/median substitution
- substitution by a value selected from the observed values of a variable
- MLE in the mix package
- all available methods for numerical data in the MICE package (i.e. pmm, sample,
mean and norm)
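
To make the first two approaches concrete, here is a minimal base R sketch of
what I mean by them. The data frame and the helper names (fill_with,
fill_by_sampling) are made up for illustration and are not from any package;
my real data are larger.

```r
# Hypothetical toy data; the column names are illustrative only.
df <- data.frame(x1 = c(1, NA, 3, 5), x2 = c(NA, 2, 4, 6))

# Replace each NA in a numeric vector with a summary (mean or median)
# of the observed values of that same vector.
fill_with <- function(x, stat = mean) {
  x[is.na(x)] <- stat(x, na.rm = TRUE)
  x
}

df_mean   <- as.data.frame(lapply(df, fill_with, stat = mean))
df_median <- as.data.frame(lapply(df, fill_with, stat = median))

# Replace each NA with a value drawn at random (with replacement)
# from the observed values of the same variable.
fill_by_sampling <- function(x) {
  obs  <- x[!is.na(x)]
  miss <- is.na(x)
  x[miss] <- obs[sample.int(length(obs), sum(miss), replace = TRUE)]
  x
}

df_sample <- as.data.frame(lapply(df, fill_by_sampling))
```

Each variable is filled in separately here, so none of these three variants
uses information from the other columns.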
I found that mice with the "mean" option for numerical data gives the lowest
classification error. However, I am not sure how "mean" multiple imputation
differs from simple mean substitution. I tried to read some of the
documentation supporting the R package, but couldn't find much theory about
the "mean" imputation method.
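
One way the two could be compared empirically is sketched below. This is only
a rough check under my own assumptions: mean_fill and compare_mean_methods are
hypothetical helper names, the nhanes example data shipped with mice stand in
for my own data, and I am assuming that m = 1 and maxit = 1 are enough for a
like-for-like comparison.

```r
# Plain column-mean substitution, used as the reference.
mean_fill <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Impute a numeric data frame with mice's "mean" method and compare the
# completed data with plain column-mean substitution.
# Assumption: a single imputation (m = 1) and one iteration (maxit = 1).
compare_mean_methods <- function(dat) {
  imp <- mice::mice(dat, method = "mean", m = 1, maxit = 1, printFlag = FALSE)
  filled_mice   <- mice::complete(imp, 1)
  filled_simple <- as.data.frame(lapply(dat, mean_fill))
  # Returns TRUE, or a description of the differences found.
  all.equal(filled_mice, filled_simple, check.attributes = FALSE)
}

# Only run the comparison if the mice package is installed.
if (requireNamespace("mice", quietly = TRUE)) {
  print(compare_mean_methods(mice::nhanes))
}
```

If the two completed data sets agree, any difference in my classification
error would have to come from somewhere else in my workflow.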
Are there any good papers that explain the background behind each imputation
option in MICE?
I would really appreciate any comments on the above, as my understanding of
statistics is very limited.
Birkbeck College, UK