[R-sig-ME] Best way to handle missing data?
ljrhurley at gmail.com
Fri Feb 27 07:27:12 CET 2015
On 02/27/2015 01:02 AM, Ken Beath wrote:
> mice will impute the complete dataset; it just needs an imputation
> method set up for each variable. See the example given in the help for
> Full information maximum likelihood estimation (FIML) (Note for Landon,
> this is ML taking into account the missing data) is only feasible if you
> can reformulate everything as a structural equation model and use software
> that can cope with this. Otherwise working with the integrals is pretty
> much impossible. If there is something in the model that is nonlinear it
> probably isn't an option at all. One of the great things about multiple
> imputation is that you get it running with say 20 imputations and then run
> it overnight with 200 or more and it probably won't change but you will
> know that you have enough imputations. So FIML doesn't have an advantage in
> that respect.
I'm not sure that distinction is needed. This quote from the r-help
mailing list addresses it:
> I'm not sure you are correct on this. Other texts on multilevel models
> (e.g., Raudenbush and Bryk, Kreft and de Leeuw, and Singer & Willett) all
> use FiML as a synonym for ML. In fact, Kreft and de Leeuw go so far as to
> state they are the same thing (see page 131).
> When you run a model in HLM selecting "Full Maximum Likelihood" and
> method="ML" in lme, the results, including all fixed effects, variance
> components, empirical bayes residuals, degrees of freedom are exactly
> the same.
> So, I think Doug [Bates] is correct in that ML == FiML.
So maybe it's a semantic difference. However, with respect to the handling
of the integral: if it's problematic, that should show up in the
diagnostics, either as a non-convergence problem or as different results
reported when the model is rerun.
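
As a rough sketch of the lme side of that comparison (using the Orthodont
data that ships with nlme as a stand-in, not Bonnie's data, and with a few
predictor values blanked out artificially): method = "ML" requests the full
maximum likelihood fit, since lme defaults to REML. One caveat worth checking
on your own data: lme's default na.action is na.fail, and with na.omit any
row with a missing predictor is dropped before the model is fitted.

```r
library(nlme)  # recommended package, assumed to be installed with R

dat <- as.data.frame(Orthodont)      # example data shipped with nlme
dat$age[c(3, 10, 25)] <- NA          # artificial missing predictor values

# Full ML fit; rows with a missing predictor are dropped by na.omit
fit_ml <- lme(distance ~ age, random = ~ 1 | Subject,
              data = dat, method = "ML", na.action = na.omit)

fit_ml$dims$N     # number of rows actually used in the fit
fixef(fit_ml)     # fixed effects
VarCorr(fit_ml)   # random-effect and residual variance components
```

By default lme stops with an error if the optimizer fails to converge, so a
fit that comes back at all has at least passed that check.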
> On 27 February 2015 at 16:20, Bonnie Dixon <bmdixon at ucdavis.edu> wrote:
>> I actually did try mice also (method "2l.norm"), but it seemed that Amelia
>> was preferable for imputation. Mice seems to only be able to impute one
>> variable, whereas Amelia can impute as many variables as have missing data
>> producing 100% complete data sets as output.
>> However, most of the missing data in the data set I am working with is in
>> just one variable, so I could consider using mice, and just imputing the
>> variable that has the most missing data, while omitting observations that
>> have missing data in any of the other variables. But the pooled results
>> from mice only seem to include the fixed effects of the model, so this
>> still leaves me wondering how to report the random effects, which are very
>> important to my research question.
>> When using Amelia to impute, the packages Zelig and ZeligMultilevel can be
>> used to combine the results from each of the models. But again, only the
>> fixed effects seem to be included in the output, so I am not sure how to
>> report on the random effects.
>> On Thu, Feb 26, 2015 at 8:33 PM, Mitchell Maltenfort <mmalten at gmail.com> wrote:
>>> Mice might be the package you need
>>> On Thursday, February 26, 2015, Bonnie Dixon <bmdixon at ucdavis.edu> wrote:
>>>> Dear list;
>>>> I am using nlme to create a repeated measures (i.e. 2 level) model. There
>>>> is missing data in several of the predictor variables. What is the best
>>>> way to handle this situation? The variable with (by far) the most missing
>>>> data is the best predictor in the model, so I would not want to remove
>>>> it. I am also trying to avoid omitting the observations with missing data,
>>>> because that would require omitting almost 40% of the observations and
>>>> would result in a substantial loss of power.
>>>> A member of my dissertation committee who uses SAS recommended that I use
>>>> full information maximum likelihood estimation (FIML) (described here:
>>>> which is the easiest way to handle missing data in SAS. Is there an
>>>> equivalent procedure in R?
>>>> Alternatively, I have tried several approaches to multiple imputation. For
>>>> example, I used the package, Amelia, which appears to handle the
>>>> structure of the data appropriately, to generate five imputed versions of
>>>> the data set, and then used lapply to run my model on each. But I am not
>>>> sure how to combine the resulting five models into one final result. I
>>>> will need a final result that enables me to report, not just the fixed
>>>> effects of the model, but also the random effects variance components and,
>>>> ideally, the distributions across the population of the random intercept
>>>> and slopes, and correlations between them.
>>>> Many thanks for any suggestions on how to proceed.
>>>> [[alternative HTML version deleted]]
>>>> R-sig-mixed-models at r-project.org mailing list
>>> Ersatzistician and Chutzpahthologist
>>> I can answer any question. "I don't know" is an answer. "I don't know
>>> yet" is a better answer.
>>> "I can write better than anybody who can write faster, and I can write
>>> faster than anybody who can write better" AJ Liebling
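
On the question of combining the per-imputation fits, here is a rough sketch
of Rubin's rules in base R plus nlme. Everything in it is illustrative:
pool_rubin is a hypothetical helper name, the "imputations" are random
fill-ins on the Orthodont data rather than real Amelia or mice output, and
averaging the random-effect SDs on the log scale is only one approximate
convention, not a settled method.

```r
library(nlme)  # recommended package, assumed to be installed with R

# Hypothetical helper: pool one scalar estimate across m fits (Rubin's rules).
# est: vector of m estimates; se: vector of their standard errors.
pool_rubin <- function(est, se) {
  m    <- length(est)
  qbar <- mean(est)                   # pooled point estimate
  ubar <- mean(se^2)                  # within-imputation variance
  b    <- var(est)                    # between-imputation variance
  tot  <- ubar + (1 + 1 / m) * b      # total variance
  df   <- (m - 1) * (1 + ubar / ((1 + 1 / m) * b))^2  # Rubin's df
  c(estimate = qbar, se = sqrt(tot), df = df)
}

# Stand-in for imputed data sets: NOT a proper imputation model
set.seed(1)
base_dat <- as.data.frame(Orthodont)
miss     <- c(3, 10, 25, 40, 77)      # rows treated as having missing age
imps <- lapply(1:5, function(i) {
  d <- base_dat
  d$age[miss] <- sample(c(8, 10, 12, 14), length(miss), replace = TRUE)
  d
})

# Same model on each completed data set
fits <- lapply(imps, function(d)
  lme(distance ~ age, random = ~ 1 | Subject, data = d, method = "ML"))

# Fixed effect of age: pool estimates and standard errors
est <- sapply(fits, function(f) fixef(f)["age"])
se  <- sapply(fits, function(f) sqrt(diag(vcov(f)))["age"])
pooled_age <- pool_rubin(est, se)

# Random-effect and residual SDs: average on the log scale (approximation)
log_sd    <- sapply(fits, function(f) log(as.numeric(VarCorr(f)[, "StdDev"])))
pooled_sd <- exp(rowMeans(log_sd))
names(pooled_sd) <- rownames(VarCorr(fits[[1]]))

pooled_age
pooled_sd
```

With a random slope in the model, VarCorr also returns a correlation column,
which would need to be extracted and pooled separately.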
Violence is the last refuge of the incompetent.