[R-sig-ME] Best way to handle missing data?

landon hurley ljrhurley at gmail.com
Fri Feb 27 07:27:12 CET 2015

On 02/27/2015 01:02 AM, Ken Beath wrote:
> mice will impute the complete dataset; it just needs to have an imputation
> method set up for each variable. See the example given in the help for
> mice.impute.2lonly.norm.
> Full information maximum likelihood estimation (FIML) (Note for Landon,
> this is ML taking into account the missing data) is only feasible if you
> can reformulate everything as a structural equation model and use software
> that can cope with this. Otherwise working with the integrals is pretty
> much impossible. If there is something in the model that is nonlinear it
> probably isn't an option at all. One of the great things about multiple
> imputation is that you get it running with say 20 imputations and then run
> it overnight with 200 or more and it probably won't change but you will
> know that you have enough imputations. So FIML doesn't have an advantage in
> that respect.

I'm not sure that's needed as a distinction. This quote from the r-help
mailing list [0] addresses it:

> I'm not sure you are correct on this. Other texts on multilevel models
> (e.g., Raudenbush and Bryk, Kreft and de Leeuw, and Singer & Willett) all
> use FiML as a synonym for ML. In fact, Kreft and de Leeuw go so far as to
> state they are the same thing (see page 131).
> When you run a model in HLM selecting "Full Maximum Likelihood" and
> method="ML" in lme, the results, including all fixed effects, variance
> components, empirical bayes residuals, degrees of freedom are exactly
> the same.
> So, I think Doug [Bates] is correct in that ML == FiML. 
> Harold

So this may just be a difference in terminology. As for the handling of
the integral: if it is problematic, that should show up in the
diagnostics, either as a convergence failure or as different results
when the model is rerun.


> On 27 February 2015 at 16:20, Bonnie Dixon <bmdixon at ucdavis.edu> wrote:
>> I actually did try mice also (method "2l.norm"), but it seemed that Amelia
>> was preferable for imputation.  Mice seems to only be able to impute one
>> variable, whereas Amelia can impute as many variables as have missing data
>> producing 100% complete data sets as output.
>> However, most of the missing data in the data set I am working with is in
>> just one variable, so I could consider using mice, and just imputing the
>> variable that has the most missing data, while omitting observations that
>> have missing data in any of the other variables.  But the pooled results
>> from mice only seem to include the fixed effects of the model, so this
>> still leaves me wondering how to report the random effects, which are very
>> important to my research question.
>> When using Amelia to impute, the packages Zelig and ZeligMultilevel can be
>> used to combine the results from each of the models.  But again, only the
>> fixed effects seem to be included in the output, so I am not sure how to
>> report on the random effects.
>> Bonnie
>> On Thu, Feb 26, 2015 at 8:33 PM, Mitchell Maltenfort <mmalten at gmail.com> wrote:
>>> Mice might be the package you need
>>> On Thursday, February 26, 2015, Bonnie Dixon <bmdixon at ucdavis.edu> wrote:
>>>> Dear list;
>>>> I am using nlme to create a repeated measures (i.e. 2 level) model.  There
>>>> is missing data in several of the predictor variables.  What is the best
>>>> way to handle this situation?  The variable with (by far) the most missing
>>>> data is the best predictor in the model, so I would not want to remove it.
>>>> I am also trying to avoid omitting the observations with missing data,
>>>> because that would require omitting almost 40% of the observations and
>>>> would result in a substantial loss of power.
>>>> A member of my dissertation committee who uses SAS, recommended that I use
>>>> full information maximum likelihood estimation (FIML) (described here:
>>>> http://www.statisticalhorizons.com/wp-content/uploads/MissingDataByML.pdf
>>>> ),
>>>> which is the easiest way to handle missing data in SAS.  Is there an
>>>> equivalent procedure in R?
>>>> Alternatively, I have tried several approaches to multiple imputation.  For
>>>> example, I used the package, Amelia, which appears to handle the clustered
>>>> structure of the data appropriately, to generate five imputed versions of
>>>> the data set, and then used lapply to run my model on each.  But I am not
>>>> sure how to combine the resulting five models into one final result.  I
>>>> will need a final result that enables me to report, not just the fixed
>>>> effects of the model, but also the random effects variance components and,
>>>> ideally, the distributions across the population of the random intercept
>>>> and slopes, and correlations between them.
>>>> Many thanks for any suggestions on how to proceed.
>>>> Bonnie
>>>>         [[alternative HTML version deleted]]
>>>> _______________________________________________
>>>> R-sig-mixed-models at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>> --
>>> ____________________________
>>> Ersatzistician and Chutzpahthologist
>>> I can answer any question.  "I don't know" is an answer. "I don't know
>>> yet" is a better answer.
>>> "I can write better than anybody who can write faster, and I can write
>>> faster than anybody who can write better" AJ Liebling
>>         [[alternative HTML version deleted]]
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
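
On Bonnie's question above about combining the five fitted models into
one result: the usual pooling step is Rubin's rules, which is plain
arithmetic once each imputed-data fit has produced an estimate and a
standard error for a parameter. A minimal sketch follows, written in
Python only so that it is self-contained. The function name pool_rubin
and the numbers in the usage line are made up for illustration and are
not output from any model in this thread. Whether variance components
should be pooled on the raw scale or a transformed one (e.g. log) is a
judgment call this sketch does not settle.

```python
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one scalar parameter across m imputed-data fits (Rubin's rules).

    estimates  -- the parameter estimate from each fit
    std_errors -- the matching standard error from each fit
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching estimates and SEs from >= 2 fits")

    q_bar = mean(estimates)                       # pooled point estimate
    within = mean([se ** 2 for se in std_errors])  # average within-fit variance
    between = variance(estimates)                 # between-fit variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between        # total variance of q_bar

    return q_bar, total ** 0.5


# Hypothetical values for one coefficient from five imputed data sets.
pooled_est, pooled_se = pool_rubin(
    [0.42, 0.45, 0.39, 0.44, 0.41],
    [0.10, 0.11, 0.10, 0.12, 0.10],
)
print(pooled_est, pooled_se)
```

The same function could be applied parameter by parameter, taking the
estimate and standard error from every imputed-data fit; the between-fit
term is what makes the pooled standard error larger than a simple
average of the five.
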

-- 
Violence is the last refuge of the incompetent.