[R-sig-ME] Best way to handle missing data?

Malcolm Fairbrother M.Fairbrother at bristol.ac.uk
Tue Mar 3 10:53:00 CET 2015


Hi Bonnie,

I was getting an error with that code, and finding Zelig cumbersome. So how
about just:

library(lme4)
mods <- lapply(africa.am$imputations, function(x)
  lmer(gdp_pc ~ infl + (infl | country), data = x))
# fixed effects
rowMeans(sapply(mods, fixef))
# variance components (variances, plus the covariance and residual rows)
rowMeans(sapply(mods, function(x) as.data.frame(VarCorr(x))$vcov))
# random effects (intercepts and slopes)
Reduce("+", sapply(mods, ranef)) / length(mods)
# SEs of the fixed effects
sqrt(rowMeans(sapply(mods, function(x) diag(vcov(x)))) +
     diag(var(t(sapply(mods, fixef)))) * (1 + 1/length(mods)))

I think that gets you everything you want?

The last line of code is my interpretation of: "The variance of the point
estimate is the average of the estimated variances from within each
completed data set, plus the sample variance in the point estimates across
the data sets (multiplied by a factor that corrects for the bias because m
< ∞)." (from http://r.iq.harvard.edu/docs/amelia/amelia.pdf)
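
Spelled out as a small helper, that combining rule might look like the
following. This is only a sketch in base R: the function name pool_rubin and
its argument names are made up for illustration and are not part of Amelia,
Zelig or lme4, and the toy numbers are invented.

```r
# Minimal sketch of the combining rule quoted above.
# est:        matrix of point estimates, one row per parameter,
#             one column per imputed data set
# within_var: matrix of the matching squared standard errors, same shape
# (pool_rubin is a hypothetical helper name, not a package function.)
pool_rubin <- function(est, within_var) {
  est <- as.matrix(est)
  within_var <- as.matrix(within_var)
  stopifnot(identical(dim(est), dim(within_var)), ncol(est) >= 2)
  m <- ncol(est)                        # number of imputed data sets
  q_bar <- rowMeans(est)                # pooled point estimates
  w_bar <- rowMeans(within_var)         # average within-imputation variance
  b <- apply(est, 1, var)               # between-imputation sample variance
  total <- w_bar + (1 + 1 / m) * b      # total variance with finite-m factor
  data.frame(estimate = q_bar, se = sqrt(total), row.names = rownames(est))
}

# Toy input (invented numbers): two parameters, three imputations
est <- rbind(a = c(1, 2, 3), b = c(2, 2, 2))
wv  <- rbind(a = c(0.5, 0.5, 0.5), b = c(1, 1, 1))
pool_rubin(est, wv)

# With the list of fitted models from above (requires lme4 and `mods`):
# pool_rubin(sapply(mods, fixef),
#            sapply(mods, function(x) diag(vcov(x))))
```

Parameter "b" does not vary across the toy imputations, so its pooled SE is
just the square root of the average within-imputation variance; parameter "a"
picks up the extra between-imputation term.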

Does that correspond to what you were getting via Zelig? I'd be interested
to know whether this worked, actually.

Cheers,
Malcolm




On 2 March 2015 at 20:37, Bonnie Dixon <bmdixon at ucdavis.edu> wrote:

> Thanks for this suggestion, Malcolm.  Here is an example in which I use
> Amelia/Zelig with the "africa" data set that is available in Amelia.
> I extracted the average standard deviation of the random effects from the
> result produced by Zelig.  (In this example, I am using the version of the
> summary.MI function found here:
> http://stackoverflow.com/questions/16571580/multi-level-regression-model-on-multiply-imputed-data-set-in-r-amelia-zelig-l)
>  Perhaps this approach will work for my purposes.
>
> # Get packages
> require(Amelia)
> require(Zelig)
> require(ZeligMultilevel)
>
> # Look at the data
> data(africa)
> head(africa)
> summary(africa)
> help(africa)
>
> # Impute the missing data
> africa.am <-
>   amelia(x = africa,
>          m = 30,
>          cs = "country",
>          ts = "year",
>          logs = "gdp_pc")
> summary(africa.am)
> plot(africa.am)
> missmap(africa.am)
> names(africa.am)
>
> # Create a model:
> africa.z <-
>   zelig(formula = gdp_pc ~ infl + tag(infl | country),
>         data = africa.am$imputations,
>         model = "ls.mixed")
>
> # The combined fixed effects:
> summary(africa.z)
>
> # The average standard deviation of the random intercepts and slopes:
> ran.ints <-
>   sapply(africa.z,
>          function(x)
>            attributes(VarCorr(x$result)$country)$stddev["(Intercept)"])
> mean(ran.ints)
>
> ran.slopes <-
>   sapply(africa.z,
>          function(x)
>            attributes(VarCorr(x$result)$country)$stddev["infl"])
> mean(ran.slopes)
>
>
>
> On Fri, Feb 27, 2015 at 4:47 AM, Malcolm Fairbrother <
> M.Fairbrother at bristol.ac.uk> wrote:
>
>> Hi Bonnie,
>>
>> I have not seen a formal treatment of this issue, but from the Amelia
>> documentation, my understanding is that if you want an estimate of the
>> random effects variance, you can just take the average of the estimates
>> from the model fitted to each imputed dataset. This is true for any
>> parameter, from the sounds of what Honaker, King, and Blackwell have
>> written.
>>
>>  "you can combine directly and use as the multiple imputation estimate of
>> this parameter, q̄, the average of the m separate estimates"
>>
>> Even if Zelig doesn't report the RE variance estimates automatically,
>> they must be "in there" somewhere... I'm sure you can extract them. Or
>> maybe skip Zelig, and just use Amelia, and extract the estimated RE
>> variances from each fitted model (presumably using lme4)?
>>
>> Cheers,
>> Malcolm
>>
>>
>> Date: Thu, 26 Feb 2015 21:20:33 -0800
>>> From: Bonnie Dixon <bmdixon at ucdavis.edu>
>>> To: Mitchell Maltenfort <mmalten at gmail.com>
>>> Cc: "r-sig-mixed-models at r-project.org"
>>>         <r-sig-mixed-models at r-project.org>
>>> Subject: Re: [R-sig-ME] Best way to handle missing data?
>>>
>>>
>>> I actually did try mice also (method "2l.norm"), but it seemed that
>>> Amelia was preferable for imputation.  Mice seems to only be able to
>>> impute one variable, whereas Amelia can impute as many variables as
>>> have missing data producing 100% complete data sets as output.
>>>
>>> However, most of the missing data in the data set I am working with is
>>> in just one variable, so I could consider using mice, and just imputing
>>> the variable that has the most missing data, while omitting observations
>>> that have missing data in any of the other variables.  But the pooled
>>> results from mice only seem to include the fixed effects of the model,
>>> so this still leaves me wondering how to report the random effects,
>>> which are very important to my research question.
>>>
>>> When using Amelia to impute, the packages Zelig and ZeligMultilevel can
>>> be used to combine the results from each of the models.  But again, only
>>> the fixed effects seem to be included in the output, so I am not sure
>>> how to report on the random effects.
>>>
>>> Bonnie
>>>
>>> On Thu, Feb 26, 2015 at 8:33 PM, Mitchell Maltenfort <mmalten at gmail.com>
>>> wrote:
>>>
>>> > Mice might be the package you need
>>> >
>>> >
>>> > On Thursday, February 26, 2015, Bonnie Dixon <bmdixon at ucdavis.edu>
>>> > wrote:
>>> >
>>> >> Dear list;
>>> >>
>>> >> I am using nlme to create a repeated measures (i.e. 2 level) model.
>>> >> There is missing data in several of the predictor variables.  What is
>>> >> the best way to handle this situation?  The variable with (by far) the
>>> >> most missing data is the best predictor in the model, so I would not
>>> >> want to remove it.  I am also trying to avoid omitting the observations
>>> >> with missing data, because that would require omitting almost 40% of
>>> >> the observations and would result in a substantial loss of power.
>>> >>
>>> >> A member of my dissertation committee who uses SAS recommended that I
>>> >> use full information maximum likelihood estimation (FIML) (described
>>> >> here:
>>> >> http://www.statisticalhorizons.com/wp-content/uploads/MissingDataByML.pdf
>>> >> ), which is the easiest way to handle missing data in SAS.  Is there an
>>> >> equivalent procedure in R?
>>> >>
>>> >> Alternatively, I have tried several approaches to multiple imputation.
>>> >> For example, I used the package, Amelia, which appears to handle the
>>> >> clustered structure of the data appropriately, to generate five imputed
>>> >> versions of the data set, and then used lapply to run my model on each.
>>> >> But I am not sure how to combine the resulting five models into one
>>> >> final result.  I will need a final result that enables me to report,
>>> >> not just the fixed effects of the model, but also the random effects
>>> >> variance components and, ideally, the distributions across the
>>> >> population of the random intercept and slopes, and correlations between
>>> >> them.
>>> >>
>>> >> Many thanks for any suggestions on how to proceed.
>>> >>
>>> >> Bonnie
>>>
>>
>
