[R-sig-ME] Best way to handle missing data?
landon hurley
ljrhurley at gmail.com
Fri Feb 27 07:27:12 CET 2015
On 02/27/2015 01:02 AM, Ken Beath wrote:
> mice will impute the complete dataset, it just needs to have an imputation
> method setup for each variable. See the example given in the help for
> mice.impute.2lonly.norm
>
> Full information maximum likelihood estimation (FIML) (Note for Landon,
> this is ML taking into account the missing data) is only feasible if you
> can reformulate everything as a structural equation model and use software
> that can cope with this. Otherwise working with the integrals is pretty
> much impossible. If there is something in the model that is nonlinear it
> probably isn't an option at all. One of the great things about multiple
> imputation is that you get it running with say 20 imputations and then run
> it overnight with 200 or more and it probably won't change but you will
> know that you have enough imputations. So FIML doesn't have an advantage in
> that respect.
>
I'm not sure that distinction is needed. This quote from the r-help
mailing list [0] addresses it:
> I'm not sure you are correct on this. Other texts on multilevel models
> (e.g., Raudenbush and Bryk, Kreft and de Leeuw, and Singer & Willett) all
> use FiML as a synonym for ML. In fact, Kreft and de Leeuw go so far as to
> state that they are the same thing (see page 131).
>
> When you run a model in HLM selecting "Full Maximum Likelihood" and
> method="ML" in lme, the results, including all fixed effects, variance
> components, empirical Bayes residuals, and degrees of freedom, are exactly
> the same.
>
> So, I think Doug [Bates] is correct in that ML == FiML.
>
> Harold
So this may just be a difference in terminology. As for the handling
of the integral: if it is problematic, that should show up as a
convergence failure, or as estimates and diagnostics that change when
the model is rerun.
[0] https://stat.ethz.ch/pipermail/r-help/2004-August/056723.html
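
For reference, here is a minimal sketch of the lme call the quote refers
to. It assumes the Orthodont data that ships with nlme, with 20 ages
blanked out purely for illustration (the variable names and the missingness
are mine, not Bonnie's). As far as I know lme's default na.action is
na.fail, so rows with a missing predictor have to be dropped or imputed
before the ML fit can use them:

```r
library(nlme)  # recommended package, ships with standard R installations

# Illustration only: blank out 20 ages in the Orthodont data from nlme.
d <- as.data.frame(Orthodont)
set.seed(1)
d$age[sample(nrow(d), 20)] <- NA

# method = "ML" requests full (not restricted) maximum likelihood.
# na.omit drops every row with a missing value in a model variable.
fit_ml <- lme(distance ~ age, random = ~ 1 | Subject, data = d,
              method = "ML", na.action = na.omit)

n_used <- fit_ml$dims$N   # number of rows that entered the likelihood
n_used
VarCorr(fit_ml)           # random-intercept and residual variance estimates
```

Comparing n_used with nrow(d) shows how many observations the fit gave up.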
>
>
> On 27 February 2015 at 16:20, Bonnie Dixon <bmdixon at ucdavis.edu> wrote:
>
>> I actually did try mice also (method "2l.norm"), but it seemed that Amelia
>> was preferable for imputation. Mice seems to only be able to impute one
>> variable, whereas Amelia can impute as many variables as have missing data
>> producing 100% complete data sets as output.
>>
>> However, most of the missing data in the data set I am working with is in
>> just one variable, so I could consider using mice, and just imputing the
>> variable that has the most missing data, while omitting observations that
>> have missing data in any of the other variables. But the pooled results
>> from mice only seem to include the fixed effects of the model, so this
>> still leaves me wondering how to report the random effects, which are very
>> important to my research question.
>>
>> When using Amelia to impute, the packages Zelig and ZeligMultilevel can be
>> used to combine the results from each of the models. But again, only the
>> fixed effects seem to be included in the output, so I am not sure how to
>> report on the random effects.
>>
>> Bonnie
>>
>> On Thu, Feb 26, 2015 at 8:33 PM, Mitchell Maltenfort <mmalten at gmail.com>
>> wrote:
>>
>>> Mice might be the package you need
>>>
>>>
>>> On Thursday, February 26, 2015, Bonnie Dixon <bmdixon at ucdavis.edu> wrote:
>>>
>>>> Dear list;
>>>>
>>>> I am using nlme to create a repeated measures (i.e. 2 level) model. There
>>>> is missing data in several of the predictor variables. What is the best
>>>> way to handle this situation? The variable with (by far) the most missing
>>>> data is the best predictor in the model, so I would not want to remove it.
>>>> I am also trying to avoid omitting the observations with missing data,
>>>> because that would require omitting almost 40% of the observations and
>>>> would result in a substantial loss of power.
>>>>
>>>> A member of my dissertation committee who uses SAS, recommended that I use
>>>> full information maximum likelihood estimation (FIML) (described here:
>>>> http://www.statisticalhorizons.com/wp-content/uploads/MissingDataByML.pdf),
>>>> which is the easiest way to handle missing data in SAS. Is there an
>>>> equivalent procedure in R?
>>>>
>>>> Alternatively, I have tried several approaches to multiple imputation. For
>>>> example, I used the package, Amelia, which appears to handle the clustered
>>>> structure of the data appropriately, to generate five imputed versions of
>>>> the data set, and then used lapply to run my model on each. But I am not
>>>> sure how to combine the resulting five models into one final result. I
>>>> will need a final result that enables me to report, not just the fixed
>>>> effects of the model, but also the random effects variance components and,
>>>> ideally, the distributions across the population of the random intercept
>>>> and slopes, and correlations between them.
>>>>
>>>> Many thanks for any suggestions on how to proceed.
>>>>
>>>> Bonnie
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> R-sig-mixed-models at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>
>>>
>>>
>>> --
>>> ____________________________
>>> Ersatzistician and Chutzpahthologist
>>>
>>> I can answer any question. "I don't know" is an answer. "I don't know
>>> yet" is a better answer.
>>>
>>> "I can write better than anybody who can write faster, and I can write
>>> faster than anybody who can write better" AJ Liebling
>>>
>>>
>>
>
>
>
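
On the question quoted above of how to combine the five fitted models:
the usual recipe for a single scalar quantity is Rubin's rules, which
need only the point estimate and its standard error from each fit. The
sketch below uses base R only. The function name pool_rubin and the
numbers are made up for illustration; they are not output from any real
model.

```r
# Rubin's rules for one scalar quantity estimated on m imputed data sets.
# est: point estimates, one per imputed fit; se: their standard errors.
pool_rubin <- function(est, se) {
  m <- length(est)
  qbar <- mean(est)              # pooled point estimate
  w <- mean(se^2)                # within-imputation variance
  b <- var(est)                  # between-imputation variance
  total <- w + (1 + 1 / m) * b   # total variance
  c(estimate = qbar, se = sqrt(total))
}

# Hypothetical values standing in for one fixed-effect slope from five fits.
est <- c(0.52, 0.48, 0.55, 0.50, 0.45)
se  <- c(0.10, 0.11, 0.10, 0.12, 0.11)
pooled <- pool_rubin(est, se)
pooled
```

With a list of lme fits from lapply, est and se for a fixed effect could
be taken from fixef(f) and sqrt(diag(vcov(f))) for each fit f. The same
averaging gives a pooled point estimate for a variance component, but its
uncertainty needs a standard error on a suitable scale, which this sketch
does not attempt.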
--
Violence is the last refuge of the incompetent.