[R-sig-ME] Best way to handle missing data?

Bonnie Dixon bmdixon at ucdavis.edu
Fri Feb 27 06:20:33 CET 2015

I actually did try mice also (method "2l.norm"), but Amelia seemed
preferable for imputation.  mice seems to be able to impute only one
variable, whereas Amelia can impute every variable that has missing data,
producing fully complete data sets as output.

However, most of the missing data in the data set I am working with is in
just one variable, so I could consider using mice, and just imputing the
variable that has the most missing data, while omitting observations that
have missing data in any of the other variables.  But the pooled results
from mice only seem to include the fixed effects of the model, so this
still leaves me wondering how to report the random effects, which are very
important to my research question.

When using Amelia to impute, the packages Zelig and ZeligMultilevel can be
used to combine the results from each of the models.  But again, only the
fixed effects seem to be included in the output, so I am not sure how to
report on the random effects.
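
One possible way to report the random effects after multiple imputation is
to pool them by hand with Rubin's rules, the same rules that are applied to
the fixed effects.  The sketch below is a minimal illustration in base R,
not output from mice, Amelia, or Zelig: pool_rubin() is a hypothetical
helper, and all the numbers are made up.  It assumes that each of the m
fitted models supplies an estimate and a standard error for the quantity of
interest, and that the quantity is roughly normal on the scale being pooled
(for a random-effect standard deviation, that usually means the log scale).

```r
# Hypothetical helper: pool one scalar quantity across m imputed-data fits
# using Rubin's rules.
#   est: vector of m point estimates (one per imputed data set)
#   se:  vector of m standard errors for those estimates
pool_rubin <- function(est, se) {
  stopifnot(length(est) == length(se), length(est) > 1)
  m     <- length(est)
  qbar  <- mean(est)                  # pooled point estimate
  ubar  <- mean(se^2)                 # average within-imputation variance
  b     <- var(est)                   # between-imputation variance
  total <- ubar + (1 + 1 / m) * b     # total variance
  # Rubin's degrees of freedom for the t reference distribution
  # (infinite when the estimates do not vary across imputations)
  df    <- (m - 1) * (1 + ubar / ((1 + 1 / m) * b))^2
  half  <- qt(0.975, df) * sqrt(total)
  c(estimate = qbar, se = sqrt(total), df = df,
    lower = qbar - half, upper = qbar + half)
}

# Made-up values: log of the random-intercept SD from five imputed-data
# fits, with made-up standard errors on the same log scale.
log_sd    <- log(c(1.90, 2.05, 1.98, 2.10, 1.95))
log_sd_se <- c(0.11, 0.12, 0.11, 0.12, 0.11)

pooled <- pool_rubin(log_sd, log_sd_se)

# Back-transform the estimate and interval to the SD scale for reporting.
pooled_sd <- exp(pooled[c("estimate", "lower", "upper")])
print(round(pooled_sd, 3))
```

With nlme, the per-fit inputs would have to be extracted from each lme
object: VarCorr() returns the estimated standard deviations, and intervals()
reports approximate confidence limits from which a log-scale standard error
could be backed out.  How best to do that extraction is left open here.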


On Thu, Feb 26, 2015 at 8:33 PM, Mitchell Maltenfort <mmalten at gmail.com> wrote:

> Mice might be the package you need
> On Thursday, February 26, 2015, Bonnie Dixon <bmdixon at ucdavis.edu> wrote:
>> Dear list;
>> I am using nlme to create a repeated measures (i.e. 2 level) model.  There
>> is missing data in several of the predictor variables.  What is the best
>> way to handle this situation?  The variable with (by far) the most missing
>> data is the best predictor in the model, so I would not want to remove it.
>> I am also trying to avoid omitting the observations with missing data,
>> because that would require omitting almost 40% of the observations and
>> would result in a substantial loss of power.
>> A member of my dissertation committee who uses SAS recommended that I use
>> full information maximum likelihood estimation (FIML), which is the easiest
>> way to handle missing data in SAS (described here:
>> http://www.statisticalhorizons.com/wp-content/uploads/MissingDataByML.pdf ).
>> Is there an equivalent procedure in R?
>> Alternatively, I have tried several approaches to multiple imputation.  For
>> example, I used the package Amelia, which appears to handle the clustered
>> structure of the data appropriately, to generate five imputed versions of
>> the data set, and then used lapply to run my model on each.  But I am not
>> sure how to combine the resulting five models into one final result.  I
>> will need a final result that enables me to report, not just the fixed
>> effects of the model, but also the random effects variance components and,
>> ideally, the distributions across the population of the random intercept
>> and slopes, and correlations between them.
>> Many thanks for any suggestions on how to proceed.
>> Bonnie
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> --
> ____________________________
> Ersatzistician and Chutzpahthologist
> I can answer any question.  "I don't know" is an answer. "I don't know
> yet" is a better answer.
> "I can write better than anybody who can write faster, and I can write
> faster than anybody who can write better" AJ Liebling
