[R] Multiple imputation using mice with "mean"

Eleni Rapsomaniki e.rapsomaniki at mail.cryst.bbk.ac.uk
Mon Sep 25 15:13:07 CEST 2006


I am trying to impute missing values in my data.frame. As I intend to use the
completed data for prediction, I am currently measuring the success of an
imputation method by its resulting classification error on my training data.

I have tried several approaches to replace missing values:
- mean/median substitution
- substitution by a value selected from the observed values of a variable
- MLE in the mix package
- all available methods for numerical data in the MICE package (i.e. pmm, sample,
mean and norm)
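
To make the first two approaches concrete, here is a minimal sketch of what I
mean by them. It is written in plain Python rather than R, purely as an
illustration and not the code I actually ran; the helper names impute_central
and impute_sample are hypothetical and are not part of mice or mix, and missing
values are assumed to be coded as None.

```python
import random
import statistics


def impute_central(values, how="mean"):
    """Replace each missing entry (None) with the mean or median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed) if how == "mean" else statistics.median(observed)
    return [fill if v is None else v for v in values]


def impute_sample(values, rng):
    """Replace each missing entry (None) with a value drawn at random from the observed ones."""
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]


# Hypothetical toy column with two missing entries.
column = [1.0, None, 3.0, 8.0, None]

print(impute_central(column, "mean"))
print(impute_central(column, "median"))
print(impute_sample(column, random.Random(0)))
```

The first helper always fills in the same value for a given column, whereas the
second depends on the random draw, so repeated runs can give different
completed columns.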

I found that the lowest classification error is obtained using mice with the "mean"
option for numerical data. However, I am not sure how the "mean" multiple
imputation differs from simple mean substitution. I tried to read some of
the documentation supporting the R package, but couldn't find much theory about
the "mean" imputation method.

Are there any good papers that explain the background behind each imputation
option in MICE?

I would really appreciate any comments on the above, as my understanding of
statistics is very limited.

Many thanks
Eleni Rapsomaniki
Birkbeck College, UK
