[R-sig-ME] Best way to handle missing data?

Tue Mar 3 11:16:54 CET 2015

With MI, you do indeed average parameter estimates across the imputed datasets. And the way the SE for such an average is computed takes into consideration not only the variance of the estimate conditional on a particular dataset but also the variability across datasets. That's in fact the entire point of doing the imputation multiple times.
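
In symbols, the usual pooling rules (often called Rubin's rules) can be sketched as below. The notation is an assumption for illustration and is not taken from the linked FAQ: m is the number of imputed datasets, \hat{Q}_i is the estimate from dataset i, and U_i is its squared standard error.

```latex
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m} \hat{Q}_i, \qquad
\bar{U} = \frac{1}{m}\sum_{i=1}^{m} U_i, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m} \bigl(\hat{Q}_i - \bar{Q}\bigr)^2, \qquad
T = \bar{U} + \Bigl(1 + \frac{1}{m}\Bigr) B, \qquad
\mathrm{SE}(\bar{Q}) = \sqrt{T}
```

Here \bar{U} is the within-imputation variance and B is the between-imputation variance, so the pooled SE grows when the imputed datasets disagree with each other.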

See, for example: http://sites.stat.psu.edu/~jls/mifaq.html#howto

One can apply that principle to any parameter estimate, even if this computation is not automated for particular models via a package.
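
As a rough illustration of doing this by hand, here is a minimal Python sketch of the pooling step for a single scalar parameter. The function name `pool_rubin` and the numbers in the usage example are made up for illustration; the per-dataset estimates and standard errors would come from whatever model you fitted to each imputed dataset.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one scalar parameter across m imputed datasets (Rubin's rules).

    estimates  -- the parameter estimate from each imputed dataset
    std_errors -- the matching standard error from each imputed dataset
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two estimates and one SE per estimate")

    q_bar = mean(estimates)                     # pooled point estimate
    u_bar = mean([se ** 2 for se in std_errors])  # within-imputation variance
    b = variance(estimates)                     # between-imputation variance (m - 1 denominator)
    total_var = u_bar + (1 + 1 / m) * b         # total variance of the pooled estimate
    return q_bar, math.sqrt(total_var)


if __name__ == "__main__":
    # Hypothetical coefficients and SEs from five imputed datasets.
    est = [0.52, 0.47, 0.55, 0.50, 0.49]
    se = [0.10, 0.11, 0.10, 0.12, 0.10]
    pooled_est, pooled_se = pool_rubin(est, se)
    print(pooled_est, pooled_se)
```

The sketch only covers the point estimate and its SE; the degrees-of-freedom adjustment used for confidence intervals and tests is left out.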

Best,
Wolfgang

> -----Original Message-----
> From: R-sig-mixed-models [mailto:r-sig-mixed-models-bounces at r-
> project.org] On Behalf Of Joseph Bulbulia
> Sent: Monday, March 02, 2015 13:04
> To: David Duffy
> Cc: r-sig-mixed-models at r-project.org
> Subject: Re: [R-sig-ME] Best way to handle missing data?
> 
> RELATED QUESTION
> I have a related and probably naive question, but raising it might be
> helpful to Bonnie and others (myself included) who are struggling with
> multiple imputation in a mixed-effects modeling setting.
> 
> FIRST, MY DISCOMFORT
> The question arises from (1) my discomfort with averaging across multiply
> imputed datasets, which seems to lose the uncertainty from the
> data-generating imputation process, and (2) my need to use a wider class
> of models than is made available by Zelig, such as MCMCglmm.
> 
> NOTE
> I realise that MCMCglmm can handle missing values (MAR) in outcome
> variables, but where many columns have missing values, the resulting
> multivariate outcome model will often become overly complex.
> 
> THE QUESTION
> To avoid averaging, if multiple data sets were generated (assume
> sensibly) through a multiple imputation algorithm (say using the Amelia
> package), would it make any sense to combine the datasets (e.g. using
> rbind) with an indicator for each of the imputed datasets, and then to
> model each specific imputed dataset as a random effect in, say,
> MCMCglmm?
> 
> REASONING
> If the observations from the datasets were conceived as measurements on
> individuals (with individual also included as a random effect), then
> conceptually it seems you would be adjusting your expectation for the
> variation of multiple observations within individuals from the multiply
> imputed datasets. Where there is no imputation, the observed values
> remain constant, and part of me thinks this constancy of observations
> within individuals shouldn’t affect the estimates... I think?
> 
> SNAG
> On the other hand, just combining datasets with an indicator for each
> dataset would artificially (and often dramatically) increase the number
> of observations, which might not be handled adequately by the G/R
> structures.
> 
> 
> APOLOGY
> I apologise if this question makes little sense, or if the answer is just
> plain obvious.  I’d intended to ask a statistician at work, and to
> simulate some data with him,  but the topic came up here, and I figured
> others might benefit, in case others had the same (potentially naive)
> thought, and the experts have a quick answer, even if the answer is “you
> are muddled.”
> 
> Cheers,
> 
> Joseph