[R-sig-ME] Best way to handle missing data?
Bonnie Dixon
bmdixon at ucdavis.edu
Mon Mar 2 20:37:49 CET 2015
Thanks for this suggestion, Malcolm. Here is an example in which I use
Amelia/Zelig with the "africa" data set that is available in Amelia.
I extracted the average standard deviation of the random effects from the
result produced by Zelig. (In this example, I am using the version of the
summary.MI function found here:
http://stackoverflow.com/questions/16571580/multi-level-regression-model-on-multiply-imputed-data-set-in-r-amelia-zelig-l)
Perhaps this approach will work for my purposes.
# Get packages
require(Amelia)
require(Zelig)
require(ZeligMultilevel)
# Look at the data
data(africa)
head(africa)
summary(africa)
help(africa)
# Impute the missing data
africa.am <-
  amelia(x = africa,
         m = 30,
         cs = "country",
         ts = "year",
         logs = "gdp_pc")
summary(africa.am)
plot(africa.am)
missmap(africa.am)
names(africa.am)
# Create a model:
africa.z <-
  zelig(formula = gdp_pc ~ infl + tag(infl | country),
        data = africa.am$imputations,
        model = "ls.mixed")
# The combined fixed effects:
summary(africa.z)
# The average standard deviation of the random intercepts and slopes:
ran.ints <-
  sapply(africa.z,
         function(x)
           attributes(VarCorr(x$result)$country)$stddev["(Intercept)"])
mean(ran.ints)
ran.slopes <-
  sapply(africa.z,
         function(x)
           attributes(VarCorr(x$result)$country)$stddev["infl"])
mean(ran.slopes)
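
The averaging step above does not depend on Zelig. Here is a minimal base-R sketch of it, assuming only a numeric vector with one estimate per imputed data set. The helper name pool_mi and the example numbers are made up for illustration; they are not part of Amelia or Zelig. If standard errors are also supplied, the sketch combines them with the usual within-plus-between variance rule for multiple imputation.

```r
# Combine m per-imputation estimates of one scalar parameter.
# q:  numeric vector of estimates, one per imputed data set
# se: optional numeric vector of the matching standard errors
# (pool_mi is a hypothetical helper, not a function from any package)
pool_mi <- function(q, se = NULL) {
  m <- length(q)
  stopifnot(m >= 2)
  q_bar <- mean(q)   # multiple-imputation point estimate: the plain average
  b <- var(q)        # between-imputation variance of the estimates
  out <- list(estimate = q_bar, between_var = b)
  if (!is.null(se)) {
    stopifnot(length(se) == m)
    w <- mean(se^2)  # average within-imputation variance
    out$total_se <- sqrt(w + (1 + 1 / m) * b)
  }
  out
}

# Made-up numbers standing in for five per-imputation random-intercept SDs
example_sds <- c(0.90, 1.10, 1.00, 0.95, 1.05)
pooled <- pool_mi(example_sds)
pooled$estimate
pooled$between_var
```

With the objects above, pool_mi(ran.ints) and pool_mi(ran.slopes) should give the same means as mean(), plus a measure of how much the estimates vary across imputations. Note that averaging standard deviations is not the same as averaging variances and then taking the square root, so it is worth stating which of the two is being reported.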
On Fri, Feb 27, 2015 at 4:47 AM, Malcolm Fairbrother <
M.Fairbrother at bristol.ac.uk> wrote:
> Hi Bonnie,
>
> I have not seen a formal treatment of this issue, but from the Amelia
> documentation, my understanding is that if you want an estimate of the
> random effects variance, you can just take the average of the estimates
> from the model fitted to each imputed dataset. This is true for any
> parameter, from the sounds of what Honaker, King, and Blackwell have
> written.
>
> "you can combine directly and use as the multiple imputation estimate of
> this parameter, q̄, the average of the m separate estimates"
>
> Even if Zelig doesn't report the RE variance estimates automatically, they
> must be "in there" somewhere... I'm sure you can extract them. Or maybe
> skip Zelig, and just use Amelia, and extract the estimated RE variances
> from each fitted model (presumably using lme4)?
>
> Cheers,
> Malcolm
>
>
> Date: Thu, 26 Feb 2015 21:20:33 -0800
>> From: Bonnie Dixon <bmdixon at ucdavis.edu>
>> To: Mitchell Maltenfort <mmalten at gmail.com>
>> Cc: "r-sig-mixed-models at r-project.org"
>> <r-sig-mixed-models at r-project.org>
>> Subject: Re: [R-sig-ME] Best way to handle missing data?
>>
>>
>> I actually did try mice also (method "2l.norm"), but it seemed that Amelia
>> was preferable for imputation. Mice seems to only be able to impute one
>> variable, whereas Amelia can impute as many variables as have missing data
>> producing 100% complete data sets as output.
>>
>> However, most of the missing data in the data set I am working with is in
>> just one variable, so I could consider using mice, and just imputing the
>> variable that has the most missing data, while omitting observations that
>> have missing data in any of the other variables. But the pooled results
>> from mice only seem to include the fixed effects of the model, so this
>> still leaves me wondering how to report the random effects, which are very
>> important to my research question.
>>
>> When using Amelia to impute, the packages Zelig and ZeligMultilevel can be
>> used to combine the results from each of the models. But again, only the
>> fixed effects seem to be included in the output, so I am not sure how to
>> report on the random effects.
>>
>> Bonnie
>>
>> On Thu, Feb 26, 2015 at 8:33 PM, Mitchell Maltenfort <mmalten at gmail.com>
>> wrote:
>>
>> > Mice might be the package you need
>> >
>> >
>> > On Thursday, February 26, 2015, Bonnie Dixon <bmdixon at ucdavis.edu> wrote:
>> >
>> >> Dear list;
>> >>
>> >> I am using nlme to create a repeated measures (i.e. 2 level) model. There
>> >> is missing data in several of the predictor variables. What is the best
>> >> way to handle this situation? The variable with (by far) the most missing
>> >> data is the best predictor in the model, so I would not want to remove it.
>> >> I am also trying to avoid omitting the observations with missing data,
>> >> because that would require omitting almost 40% of the observations and
>> >> would result in a substantial loss of power.
>> >>
>> >> A member of my dissertation committee who uses SAS, recommended that I use
>> >> full information maximum likelihood estimation (FIML) (described here:
>> >> http://www.statisticalhorizons.com/wp-content/uploads/MissingDataByML.pdf),
>> >> which is the easiest way to handle missing data in SAS. Is there an
>> >> equivalent procedure in R?
>> >>
>> >> Alternatively, I have tried several approaches to multiple imputation. For
>> >> example, I used the package, Amelia, which appears to handle the clustered
>> >> structure of the data appropriately, to generate five imputed versions of
>> >> the data set, and then used lapply to run my model on each. But I am not
>> >> sure how to combine the resulting five models into one final result. I
>> >> will need a final result that enables me to report, not just the fixed
>> >> effects of the model, but also the random effects variance components and,
>> >> ideally, the distributions across the population of the random intercept
>> >> and slopes, and correlations between them.
>> >>
>> >> Many thanks for any suggestions on how to proceed.
>> >>
>> >> Bonnie
>>
>