[R-sig-ME] Best way to handle missing data?

Bonnie Dixon bmdixon at ucdavis.edu
Fri Feb 27 21:00:55 CET 2015


Thank you very much to everyone who replied with helpful
suggestions.

To clarify what FIML means (and in support of what Ken explained): my
professor, who does multilevel modeling in SAS, tells me that in SAS "FIML"
refers to a form of maximum likelihood estimation that accepts an
incomplete data set and does not omit observations with missing data,
as both "ML" and "REML" in nlme must.  For an observation with missing
values in some variables, FIML in SAS uses the variables for which data
are available and integrates over the missing values.  I am told this is
the default method in SAS PROC MIXED for all mixed effects models (not just
for structural equation modeling).  But this functionality does not appear
to be available in R except for structural equation modeling (i.e., the
lavaan package).

Given that, I am now working on a multiple imputation solution for my
problem, using either mice or Amelia, and will post again to the list once
I have a working example.  (Apparently, I was wrong about mice only being
able to impute one variable.)  How many imputations are needed?  Many
sources online indicate that 3-10 is usually enough, and the default in
both mice and Amelia is 5.
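
On the question of how many imputations are needed, a minimal sketch of the
arithmetic may help.  It assumes Rubin's rules for combining a single scalar
estimate (for example, one fixed-effect coefficient) across m imputed-data
fits, together with Rubin's relative-efficiency approximation
1 / (1 + gamma / m), where gamma is the fraction of missing information.  It
is written in plain Python only to show the formulas; it does not call mice
or Amelia, the function names are made up for illustration, and all numbers
are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one scalar parameter across m imputed-data fits (Rubin's rules).

    estimates: point estimates of the same parameter, one per imputed data set
    variances: squared standard errors of those estimates
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = mean(estimates)       # pooled point estimate
    within = mean(variances)      # average within-imputation variance
    between = variance(estimates) # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


def relative_efficiency(gamma, m):
    """Efficiency of m imputations relative to infinitely many.

    gamma is the fraction of missing information (an assumed input here).
    """
    return 1 / (1 + gamma / m)


# Hypothetical coefficient and squared standard error from five imputed fits.
est = [1.0, 1.2, 0.8, 1.1, 0.9]
se2 = [0.04, 0.05, 0.04, 0.05, 0.04]
pooled, total_var = pool_rubin(est, se2)
print("pooled estimate:", round(pooled, 3), "pooled SE:", round(total_var ** 0.5, 3))

# Efficiency for an assumed 40% missing information at several values of m.
for m in (5, 20, 200):
    print(m, "imputations:", round(relative_efficiency(0.4, m), 3))
```

Under this approximation, with gamma around 0.4, five imputations already
give roughly 93% efficiency, which is consistent with the 3-10 advice; more
imputations mainly stabilise the standard errors.  Pooling random-effects
variance components this way would additionally need a sampling variance for
each component, which this sketch does not address.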

Bonnie

On Thu, Feb 26, 2015 at 11:26 PM, Ken Beath <ken.beath at mq.edu.au> wrote:

> From the same posting
>
> > From: Chris Lawrence <chris at lordsutch.com>
>
> <snip>
>
> > I have seen FIML used to refer to a type of ML estimation where a
> > missing data treatment is included in the estimation procedure
> > (parameter estimates are derived from incomplete cases for only the
> > variables present in the case, rather than simply discarding the
> > cases), at least in the latent-variable SEM context, specifically in
> > AMOS.  This may be what Francisco is getting at.
> >
> > To my knowledge, no R packages implement this sort of "FIML", for any
> > class of models, although there are other available missing data
> > treatments (EM, MCMC estimation).
>
> This is what is correctly referred to as FIML. Your original post claimed
> that FIML was available through the ML option, which is incorrect, and will
> not fix missing values except in the dependent variable. The fact that some
> software may claim that it does something that it doesn't will not change
> this. What could be said is that FIML is simply ML done correctly in that
> it builds the proper model for the data, rather than ignoring the
> observations with missing data, so both are maximum likelihood.
>
> On 27 February 2015 at 17:27, landon hurley <ljrhurley at gmail.com> wrote:
>
> > On 02/27/2015 01:02 AM, Ken Beath wrote:
> > > mice will impute the complete dataset, it just needs to have an
> > > imputation method setup for each variable. See the example given in
> > > the help for mice.impute.2lonly.norm
> > >
> > > Full information maximum likelihood estimation (FIML) (Note for Landon,
> > > this is ML taking into account the missing data) is only feasible if
> > > you can reformulate everything as a structural equation model and use
> > > software that can cope with this. Otherwise working with the integrals
> > > is pretty much impossible. If there is something in the model that is
> > > nonlinear it probably isn't an option at all. One of the great things
> > > about multiple imputation is that you get it running with say 20
> > > imputations and then run it overnight with 200 or more and it probably
> > > won't change but you will know that you have enough imputations. So
> > > FIML doesn't have an advantage in that respect.
> > >
> >
> > I'm not sure that's needed as a distinction. This quote from the r-help
> > mailing list [0] addresses it:
> >
> > > I'm not sure you are correct on this. Other texts on multilevel models
> > > (e.g., Raudenbush and Bryk, Kreft and de Leeuw, and Singer & Willett) all
> > > use FiML as a synonym for ML. In fact, Kreft and de Leeuw go so far as to
> > > state they are the same thing (see page 131).
> > >
> > > When you run a model in HLM selecting "Full Maximum Likelihood" and
> > > method="ML" in lme, the results, including all fixed effects, variance
> > > components, empirical bayes residuals, degrees of freedom are exactly
> > > the same.
> > >
> > > So, I think Doug [Bates] is correct in that ML == FiML.
> > >
> > > Harold
> >
> > So maybe a semantics difference. However, with respect to the handling
> > of the integral: if it's problematic, that should result in a
> > non-convergence problem, or different results reported when he reruns
> > the model, in terms of diagnostics.
> >
> > [0]https://stat.ethz.ch/pipermail/r-help/2004-August/056723.html
> >
> > >
> > >
> > > On 27 February 2015 at 16:20, Bonnie Dixon <bmdixon at ucdavis.edu> wrote:
> > >
> > >> I actually did try mice also (method "2l.norm"), but it seemed that
> > >> Amelia was preferable for imputation.  Mice seems to only be able to
> > >> impute one variable, whereas Amelia can impute as many variables as
> > >> have missing data, producing 100% complete data sets as output.
> > >>
> > >> However, most of the missing data in the data set I am working with is
> > >> in just one variable, so I could consider using mice, and just imputing
> > >> the variable that has the most missing data, while omitting
> > >> observations that have missing data in any of the other variables.  But
> > >> the pooled results from mice only seem to include the fixed effects of
> > >> the model, so this still leaves me wondering how to report the random
> > >> effects, which are very important to my research question.
> > >>
> > >> When using Amelia to impute, the packages Zelig and ZeligMultilevel can
> > >> be used to combine the results from each of the models.  But again,
> > >> only the fixed effects seem to be included in the output, so I am not
> > >> sure how to report on the random effects.
> > >>
> > >> Bonnie
> > >>
> > >> On Thu, Feb 26, 2015 at 8:33 PM, Mitchell Maltenfort <mmalten at gmail.com> wrote:
> > >>
> > >>> Mice might be the package you need
> > >>>
> > >>>
> > >>> On Thursday, February 26, 2015, Bonnie Dixon <bmdixon at ucdavis.edu> wrote:
> > >>>
> > >>>> Dear list;
> > >>>>
> > >>>> I am using nlme to create a repeated measures (i.e. 2 level) model.
> > >>>> There is missing data in several of the predictor variables.  What is
> > >>>> the best way to handle this situation?  The variable with (by far) the
> > >>>> most missing data is the best predictor in the model, so I would not
> > >>>> want to remove it.  I am also trying to avoid omitting the observations
> > >>>> with missing data, because that would require omitting almost 40% of
> > >>>> the observations and would result in a substantial loss of power.
> > >>>>
> > >>>> A member of my dissertation committee who uses SAS recommended that I
> > >>>> use full information maximum likelihood estimation (FIML) (described
> > >>>> here:
> > >>>> http://www.statisticalhorizons.com/wp-content/uploads/MissingDataByML.pdf
> > >>>> ), which is the easiest way to handle missing data in SAS.  Is there an
> > >>>> equivalent procedure in R?
> > >>>>
> > >>>> Alternatively, I have tried several approaches to multiple imputation.
> > >>>> For example, I used the package Amelia, which appears to handle the
> > >>>> clustered structure of the data appropriately, to generate five imputed
> > >>>> versions of the data set, and then used lapply to run my model on each.
> > >>>> But I am not sure how to combine the resulting five models into one
> > >>>> final result.  I will need a final result that enables me to report
> > >>>> not just the fixed effects of the model, but also the random effects
> > >>>> variance components and, ideally, the distributions across the
> > >>>> population of the random intercept and slopes, and correlations between
> > >>>> them.
> > >>>>
> > >>>> Many thanks for any suggestions on how to proceed.
> > >>>>
> > >>>> Bonnie
> > >>>>
> > >>>>         [[alternative HTML version deleted]]
> > >>>>
> > >>>> _______________________________________________
> > >>>> R-sig-mixed-models at r-project.org mailing list
> > >>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> > >>>>
> > >>>
> > >>>
> > >>> --
> > >>> ____________________________
> > >>> Ersatzistician and Chutzpahthologist
> > >>>
> > >>> I can answer any question.  "I don't know" is an answer. "I don't know
> > >>> yet" is a better answer.
> > >>>
> > >>> "I can write better than anybody who can write faster, and I can write
> > >>> faster than anybody who can write better" AJ Liebling
> > >>>
> > >>>
> > >>
> > >>
> > >
> > >
> > >
> >
> >
> > - --
> > Violence is the last refuge of the incompetent.
> >
> >
>
>
>
> --
>
> *Ken Beath*
> Lecturer
> Statistics Department
> MACQUARIE UNIVERSITY NSW 2109, Australia
>
> Phone: +61 (0)2 9850 8516
>
> Building E4A, room 526
> http://stat.mq.edu.au/our_staff/staff_-_alphabetical/staff/beath,_ken/
>
> CRICOS Provider No 00002J


