[R-sig-ME] Best way to handle missing data?
Ken Beath
ken.beath at mq.edu.au
Fri Feb 27 08:26:48 CET 2015
From the same posting:
> From: Chris Lawrence <chris at lordsutch.com>
<snip>
> I have seen FIML used to refer to a type of ML estimation where a
> missing data treatment is included in the estimation procedure
> (parameter estimates are derived from incomplete cases for only the
> variables present in the case, rather than simply discarding the
> cases), at least in the latent-variable SEM context, specifically in
> AMOS. This may be what Francisco is getting at.
>
> To my knowledge, no R packages implement this sort of "FIML", for any
> class of models, although there are other available missing data
> treatments (EM, MCMC estimation).
This is what is correctly referred to as FIML. Your original post claimed
that FIML was available through the ML option, which is incorrect: that
option will not handle missing values except in the dependent variable. The
fact that some software may claim to do something it doesn't will not change
this. What could be said is that FIML is simply ML done correctly, in that
it builds the proper model for the data rather than ignoring the
observations with missing data, so both are maximum likelihood.
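
To make the distinction concrete, here is a sketch of the likelihood that FIML maximises. The notation (y_i^obs, y_i^mis, theta, mu_i, Sigma_i, p_i) is introduced only for illustration and does not come from the posts quoted here. The second expression assumes multivariate normal data with values missing at random, which is the usual SEM setting rather than anything specific to this thread.

```latex
% Observed-data likelihood: each case contributes the density of the
% variables actually observed, with its missing values integrated out.
L(\theta) = \prod_{i=1}^{n} f\left(y_i^{\mathrm{obs}} \mid \theta\right)
          = \prod_{i=1}^{n} \int f\left(y_i^{\mathrm{obs}}, y_i^{\mathrm{mis}} \mid \theta\right) \, dy_i^{\mathrm{mis}}

% Special case (assumption: multivariate normal, mean \mu, covariance \Sigma).
% \mu_i and \Sigma_i keep only the entries for the p_i variables observed in case i.
\ell(\mu, \Sigma) = \sum_{i=1}^{n} \left[ -\frac{p_i}{2} \log(2\pi)
    - \frac{1}{2} \log\left|\Sigma_i\right|
    - \frac{1}{2} \left(y_i^{\mathrm{obs}} - \mu_i\right)^{\top} \Sigma_i^{-1} \left(y_i^{\mathrm{obs}} - \mu_i\right) \right]
```

With no missing values every Sigma_i equals Sigma and the second expression reduces to the ordinary ML log-likelihood. In the normal case the integral has this closed form; for most other models it does not.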
On 27 February 2015 at 17:27, landon hurley <ljrhurley at gmail.com> wrote:
>
> On 02/27/2015 01:02 AM, Ken Beath wrote:
> > mice will impute the complete dataset, it just needs to have an imputation
> > method setup for each variable. See the example given in the help for
> > mice.impute.2lonly.norm
> >
> > Full information maximum likelihood estimation (FIML) (Note for Landon,
> > this is ML taking into account the missing data) is only feasible if you
> > can reformulate everything as a structural equation model and use software
> > that can cope with this. Otherwise working with the integrals is pretty
> > much impossible. If there is something in the model that is nonlinear it
> > probably isn't an option at all. One of the great things about multiple
> > imputation is that you get it running with say 20 imputations and then run
> > it overnight with 200 or more and it probably won't change but you will
> > know that you have enough imputations. So FIML doesn't have an advantage in
> > that respect.
> >
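
For reference, the usual way to combine estimates across m imputed data sets is Rubin's rules, sketched here for a single scalar parameter. The symbols are introduced for illustration only. The rules assume an approximately normal sampling distribution, so whether they suit variance components directly is a separate question that this sketch does not settle.

```latex
% \hat{Q}_j: estimate from imputed data set j;  U_j: its squared standard error
\bar{Q} = \frac{1}{m} \sum_{j=1}^{m} \hat{Q}_j                        % pooled estimate
\bar{U} = \frac{1}{m} \sum_{j=1}^{m} U_j                              % within-imputation variance
B = \frac{1}{m-1} \sum_{j=1}^{m} \left(\hat{Q}_j - \bar{Q}\right)^2   % between-imputation variance
T = \bar{U} + \left(1 + \frac{1}{m}\right) B                          % total variance of \bar{Q}
```

The B/m part of T is the extra variance from using a finite number of imputations. It shrinks as m grows from 20 to 200, which is why the pooled results should stop changing much once m is large enough.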
>
> I'm not sure that's needed as a distinction. This quote from the r-help
> mailing list [0] addresses it:
>
> > I'm not sure you are correct on this. Other texts on multilevel models
> > (e.g., Raudenbush and Bryk, Kreft and de Leeuw, and Singer & Willett) all
> > use FiML as a synonym for ML. In fact, Kreft and de Leeuw go so far as to
> > state they are the same thing (see page 131).
> >
> > When you run a model in HLM selecting "Full Maximum Likelihood" and
> > method="ML" in lme, the results, including all fixed effects, variance
> > components, empirical bayes residuals, degrees of freedom are exactly
> > the same.
> >
> > So, I think Doug [Bates] is correct in that ML == FiML.
> >
> > Harold
>
> So maybe a semantics difference. However, with respect to the handling
> of the integral: if it's problematic, that should result in a
> non-convergence problem, or different results reported when he reruns
> the model, in terms of diagnostics.
>
> [0]https://stat.ethz.ch/pipermail/r-help/2004-August/056723.html
>
> >
> >
> > On 27 February 2015 at 16:20, Bonnie Dixon <bmdixon at ucdavis.edu> wrote:
> >
> >> I actually did try mice also (method "2l.norm"), but it seemed that Amelia
> >> was preferable for imputation. Mice seems to only be able to impute one
> >> variable, whereas Amelia can impute as many variables as have missing data
> >> producing 100% complete data sets as output.
> >>
> >> However, most of the missing data in the data set I am working with is in
> >> just one variable, so I could consider using mice, and just imputing the
> >> variable that has the most missing data, while omitting observations that
> >> have missing data in any of the other variables. But the pooled results
> >> from mice only seem to include the fixed effects of the model, so this
> >> still leaves me wondering how to report the random effects, which are very
> >> important to my research question.
> >>
> >> When using Amelia to impute, the packages Zelig and ZeligMultilevel can be
> >> used to combine the results from each of the models. But again, only the
> >> fixed effects seem to be included in the output, so I am not sure how to
> >> report on the random effects.
> >>
> >> Bonnie
> >>
> >> On Thu, Feb 26, 2015 at 8:33 PM, Mitchell Maltenfort <mmalten at gmail.com> wrote:
> >>
> >>> Mice might be the package you need
> >>>
> >>>
> >>> On Thursday, February 26, 2015, Bonnie Dixon <bmdixon at ucdavis.edu> wrote:
> >>>
> >>>> Dear list;
> >>>>
> >>>> I am using nlme to create a repeated measures (i.e. 2 level) model. There
> >>>> is missing data in several of the predictor variables. What is the best
> >>>> way to handle this situation? The variable with (by far) the most missing
> >>>> data is the best predictor in the model, so I would not want to remove it.
> >>>> I am also trying to avoid omitting the observations with missing data,
> >>>> because that would require omitting almost 40% of the observations and
> >>>> would result in a substantial loss of power.
> >>>>
> >>>> A member of my dissertation committee who uses SAS, recommended that I use
> >>>> full information maximum likelihood estimation (FIML) (described here:
> >>>> http://www.statisticalhorizons.com/wp-content/uploads/MissingDataByML.pdf),
> >>>> which is the easiest way to handle missing data in SAS. Is there an
> >>>> equivalent procedure in R?
> >>>>
> >>>> Alternatively, I have tried several approaches to multiple imputation. For
> >>>> example, I used the package, Amelia, which appears to handle the clustered
> >>>> structure of the data appropriately, to generate five imputed versions of
> >>>> the data set, and then used lapply to run my model on each. But I am not
> >>>> sure how to combine the resulting five models into one final result. I
> >>>> will need a final result that enables me to report, not just the fixed
> >>>> effects of the model, but also the random effects variance components and,
> >>>> ideally, the distributions across the population of the random intercept
> >>>> and slopes, and correlations between them.
> >>>>
> >>>> Many thanks for any suggestions on how to proceed.
> >>>>
> >>>> Bonnie
> >>>>
> >>>> _______________________________________________
> >>>> R-sig-mixed-models at r-project.org mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>>>
> >>>
> >>>
> >>> --
> >>> ____________________________
> >>> Ersatzistician and Chutzpahthologist
> >>>
> >>> I can answer any question. "I don't know" is an answer. "I don't know
> >>> yet" is a better answer.
> >>>
> >>> "I can write better than anybody who can write faster, and I can write
> >>> faster than anybody who can write better" AJ Liebling
> >>>
> >>>
> >>
> >
> >
> >
>
>
> - --
> Violence is the last refuge of the incompetent.
>
--
*Ken Beath*
Lecturer
Statistics Department
MACQUARIE UNIVERSITY NSW 2109, Australia
Phone: +61 (0)2 9850 8516
Building E4A, room 526
http://stat.mq.edu.au/our_staff/staff_-_alphabetical/staff/beath,_ken/
CRICOS Provider No 00002J