[R-sig-ME] Best way to handle missing data?

Fri Feb 27 03:30:41 CET 2015

Dear list,

I am using nlme to fit a repeated-measures (i.e., two-level) model.
Several of the predictor variables have missing data.  What is the best
way to handle this situation?  The variable with (by far) the most missing
data is the best predictor in the model, so I would not want to remove it.
I am also trying to avoid omitting the observations with missing data,
because that would require omitting almost 40% of the observations and
would result in a substantial loss of power.

A member of my dissertation committee, who uses SAS, recommended that I use
full information maximum likelihood estimation (FIML) (described here:
http://www.statisticalhorizons.com/wp-content/uploads/MissingDataByML.pdf),
which is the easiest way to handle missing data in SAS.  Is there an
equivalent procedure in R?

Alternatively, I have tried several approaches to multiple imputation.  For
example, I used the Amelia package, which appears to handle the clustered
structure of the data appropriately, to generate five imputed versions of
the data set, and then used lapply to run my model on each.  But I am not
sure how to combine the resulting five models into one final result.  I
will need a final result that enables me to report not just the fixed
effects of the model, but also the random effects variance components and,
ideally, the distributions across the population of the random intercept
and slopes, and correlations between them.
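
A minimal sketch of the workflow described above, run on simulated data,
might look like the following.  The column names (id, time, y, x1, x2),
the model formula, and the simulated values are illustrative assumptions,
not the actual data or model; only the overall pattern (Amelia with m = 5,
then lapply over the imputed data sets) comes from the description.

```r
library(Amelia)  # multiple imputation
library(nlme)    # lme() for the two-level model

# Simulated stand-in for the real data (all names are hypothetical):
# one row per subject per occasion, random intercept and slope per subject.
set.seed(1)
n_id <- 60
n_t  <- 5
dat <- data.frame(id   = rep(seq_len(n_id), each = n_t),
                  time = rep(0:(n_t - 1), times = n_id))
dat$x1 <- rnorm(nrow(dat))
dat$x2 <- rnorm(nrow(dat))
b0 <- rnorm(n_id, sd = 1.0)   # subject-specific intercept deviations
b1 <- rnorm(n_id, sd = 0.5)   # subject-specific slope deviations
dat$y <- 1 + b0[dat$id] + (0.5 + b1[dat$id]) * dat$time +
  0.8 * dat$x1 - 0.3 * dat$x2 + rnorm(nrow(dat))

# Introduce missing values in the predictors only
dat$x1[sample(nrow(dat), 100)] <- NA
dat$x2[sample(nrow(dat), 30)]  <- NA

# Five imputed data sets; cs and ts identify the subject and time columns
imp <- amelia(dat, m = 5, cs = "id", ts = "time", p2s = 0)

# Fit the same mixed model to each imputed data set
fits <- lapply(imp$imputations, function(d) {
  lme(y ~ time + x1 + x2, random = ~ time | id, data = d)
})

# Per-imputation results that still have to be combined somehow
fixed_by_imp <- lapply(fits, fixef)    # fixed-effect estimates
vc_by_imp    <- lapply(fits, VarCorr)  # variance components, correlations
```

This leaves five sets of fixed effects and five sets of variance
components; the open question is how to pool them into one result.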

Many thanks for any suggestions on how to proceed.

Bonnie
