[R-sig-ME] Best way to handle missing data?

Mon Mar 2 01:00:40 CET 2015

Thank you for this clarification.  Having studied the article linked below
more closely, I can see that it confirms what you said.
http://www.statisticalhorizons.com/wp-content/uploads/MissingDataByML.pdf

The distinction seems to be between missing data in the dependent variable
(which SAS PROC MIXED handles automatically) and missing data in a
predictor variable (which would require switching to a structural equation
modeling program, such as SAS PROC CALIS, to handle automatically using
FIML).  Here is a quote from the conclusion of the article that explains
this:

"When estimating mixed models for repeated measurements, PROC MIXED and
PROC GLIMMIX automatically handle missing data by maximum likelihood, as
long as there are no missing data on predictor variables. When data are
missing on both predictor and dependent variables, PROC CALIS can do
maximum likelihood for a large class of linear models..."

This sounds roughly equivalent to what is available in R: lme() and lmer()
for missing outcome values, and lavaan for FIML with missing predictors.
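
For anyone following along, here is a minimal Python sketch of what
"integrating over the missing values" means under FIML. It assumes a
bivariate normal model for one predictor x and one outcome y, with None
marking a missing value. The function names, data, and parameter values
are hypothetical, not taken from SAS, lavaan, or the article. It only
evaluates the observed-data log-likelihood at fixed parameter values; an
FIML program would additionally maximize it over the means, variances,
and covariance.

```python
import math


def normal_logpdf_1d(v, mean, var):
    """Log-density of a univariate normal N(mean, var) at v."""
    return -0.5 * (math.log(2 * math.pi * var) + (v - mean) ** 2 / var)


def normal_logpdf_2d(x, y, mx, my, vx, vy, cxy):
    """Log-density of a bivariate normal with means (mx, my) and
    covariance matrix [[vx, cxy], [cxy, vy]] at the point (x, y)."""
    det = vx * vy - cxy ** 2
    dx, dy = x - mx, y - my
    # Quadratic form d' Sigma^{-1} d, using the closed-form 2x2 inverse.
    quad = (vy * dx ** 2 - 2 * cxy * dx * dy + vx * dy ** 2) / det
    return -0.5 * (2 * math.log(2 * math.pi) + math.log(det) + quad)


def fiml_loglik(rows, mx, my, vx, vy, cxy):
    """Observed-data (FIML) log-likelihood for (x, y) pairs.

    Each row contributes the density of whatever it has observed.
    Integrating a multivariate normal over its missing components
    leaves the marginal normal of the observed ones, so a row with
    only x observed contributes N(mx, vx), and so on.
    """
    total = 0.0
    for x, y in rows:
        if x is not None and y is not None:
            total += normal_logpdf_2d(x, y, mx, my, vx, vy, cxy)
        elif x is not None:
            total += normal_logpdf_1d(x, mx, vx)
        elif y is not None:
            total += normal_logpdf_1d(y, my, vy)
        # A row with nothing observed contributes log(1) = 0.
    return total


def listwise_loglik(rows, mx, my, vx, vy, cxy):
    """Complete-case log-likelihood: any row with a missing value is dropped."""
    complete = [(x, y) for x, y in rows if x is not None and y is not None]
    return fiml_loglik(complete, mx, my, vx, vy, cxy)


# Hypothetical data: None marks a missing value.
data = [(1.0, 2.0), (0.5, None), (None, 1.5), (2.0, 3.1)]
params = dict(mx=1.0, my=2.0, vx=1.0, vy=1.5, cxy=0.6)
print("FIML:    ", fiml_loglik(data, **params))
print("listwise:", listwise_loglik(data, **params))
```

Listwise deletion throws away the incomplete rows entirely, whereas FIML
keeps their observed part. The trick only works because x has a
distribution of its own here; a regression or mixed model that conditions
on x has nothing to integrate over when x is missing, which is why the
article points to PROC CALIS for that case.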

I don't think the model I am working on is a good candidate for structural
equation modeling because the data set is very unbalanced (i.e., there are
very different numbers of observations for different people, taken at
different times), the main relationship of interest involves a time-varying
predictor, and one of the variables with missing data is not continuous (it
is a binary categorical variable).  So, I will stick with multiple
imputation for handling the missing data.

Bonnie

On Fri, Feb 27, 2015 at 4:22 PM, Viechtbauer Wolfgang (STAT) <
wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:

> > For clarification about FIML (and in support of what Ken explained), my
> > professor who does multilevel modeling in SAS tells me that in SAS, "FIML"
> > refers to a form of maximum likelihood estimation that can accept an
> > incomplete data set, and does not omit the observations with missing data
> > as must be done in both "ML" and "REML" in nlme.  FIML in SAS handles
> > observations in which the data is missing for some variables by just using
> > those variables for which data is available and integrating over the
> > missing values.  This is the default method in SAS PROC MIXED for all mixed
> > effects models (not just for structural equation modeling).
>
> I hate to be so blunt here, but this is just flat out wrong. proc mixed is
> great and all, but it doesn't do such a thing. Just like lmer() and lme()
> (with na.action=na.omit), proc mixed will just delete rows with missing
> data and then use ML or REML estimation on what's left (which is perfectly
> fine under certain missing data mechanisms). Consequently, fitting the same
> model with proc mixed and lmer() or lme() to the same data with missing
> data yields essentially identical results. One can easily confirm this with
> a few examples.
>
> > But this functionality does not appear to be available in R except for
> > structural equation modeling (i.e., the lavaan package).
>
> Indeed, one has to switch to some form of a latent variable model if one
> wants to use FIML. In R, one should look into 'lavaan' or 'sem' (or
> 'OpenMx' for the more adventurous). In SAS, one would need to use something
> like proc calis:
>
> http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/statug_calis_sect103.htm
>
> Again, proc mixed does not use FIML. I am really just repeating what Ken
> has already stated. Also relevant:
>
> http://stats.stackexchange.com/questions/51006/full-information-maximum-likelihood-for-missing-data-in-r
>
> Best,
> Wolfgang
>
