[R-sig-ME] missing data in lme, lmer, PROC MIXED

Sun Jul 27 04:39:26 CEST 2008

On 26/07/2008, at 7:28 AM, M Henry H Stevens wrote:

> Hi folks,
> I have colleagues who comfortably state that "missing data" are ok in
> "mixed models" - because "the program (PROC MIXED) handles missing
> data" -- I have a hard time imagining what it does.
>
> To those of you who use both R and SAS, I was wondering if you might
> share insight into what these do.
>
> As far as I know, for lme:
> 'na.action="na.omit"' or na.exclude removes the rows with any missing
> data.
>

This depends. If the missing data are in the dependent variable and
they are missing at random, then because mixed models are fitted using
maximum likelihood the estimates remain valid and make full use of the
outcomes that were observed. Roughly (there are some really technical
definitions for missing data and I haven't checked them), if we don't
know the outcome and the reason it is missing isn't due to its own
unobserved value (it may depend on data we did observe), then we can
simply leave it out of the likelihood, as it carries no useful
information. The problem arises when the fact that a value is missing
does tell us something about that value; this is very difficult to
model. An example is if observations above a certain value are more
likely to be missing.
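
To make the distinction concrete, here is a small simulation sketch in
plain Python. It is not what lme or PROC MIXED do internally; it only
compares the mean of the recorded outcomes under two made-up missingness
rules. The sample size, the 30%/80%/10% loss rates and the cutoff of 1.0
are all assumptions chosen for illustration.

```python
import random
import statistics


def observed_mean(values, is_missing):
    """Mean of the values that were actually recorded."""
    kept = [v for v, m in zip(values, is_missing) if not m]
    return statistics.fmean(kept)


rng = random.Random(2008)  # fixed seed so the run is repeatable
n = 20000
y = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true outcomes, mean 0

# Rule A (assumed): every outcome has the same 30% chance of being lost,
# whatever its value.
unrelated_flags = [rng.random() < 0.3 for _ in y]

# Rule B (assumed): outcomes above 1.0 are lost 80% of the time,
# all others 10% of the time, so missingness depends on the value.
value_driven_flags = [rng.random() < (0.8 if v > 1.0 else 0.1) for v in y]

full_mean = statistics.fmean(y)
unrelated_mean = observed_mean(y, unrelated_flags)
value_driven_mean = observed_mean(y, value_driven_flags)

print(f"all outcomes:            {full_mean:.3f}")
print(f"lost regardless of value: {unrelated_mean:.3f}")
print(f"high values mostly lost:  {value_driven_mean:.3f}")
```

Under rule A the mean of what is left stays close to the full-data mean;
under rule B it is pulled clearly downwards, which is the situation that
simply leaving the missing values out cannot repair.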

An alternative method of dealing with repeated measures data is to
produce a summary for each subject or cluster, for example by averaging
the last three visits. This doesn't handle missing data correctly,
although the loss in efficiency is usually small and it can work well
provided only a small proportion is missing.
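
A minimal sketch of such a summary measure, in plain Python. The data
layout (one list of visit outcomes per subject, with None for a missed
visit), the subject labels and the helper name last_visits_summary are
all hypothetical. Averaging whatever was recorded among the last three
scheduled visits is one possible convention, not the only one.

```python
from statistics import fmean


def last_visits_summary(visits, k=3):
    """Average the recorded values among the last k scheduled visits.

    Returns None when none of those visits was recorded.
    """
    recorded = [v for v in visits[-k:] if v is not None]
    return fmean(recorded) if recorded else None


# Hypothetical outcomes per subject; None marks a missed visit.
subjects = {
    "s1": [5.0, 6.0, 7.0, 8.0, 9.0],
    "s2": [4.0, 5.0, None, 7.0, None],
    "s3": [3.0, None, None, None, None],
}

summaries = {sid: last_visits_summary(v) for sid, v in subjects.items()}
for sid, value in summaries.items():
    print(sid, value)
```

Here s2's summary rests on a single visit and s3 has no summary at all,
which shows why this approach is only comfortable when little is
missing.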

What R and SAS don't deal with directly is missing data in the
covariates. This takes a bit more work, for example using multiple
imputation. Here the complete case method, where an observation with
any missing data is removed, will result in a loss of efficiency
compared to what can be achieved.
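
To show what the complete case method throws away, here is a short
plain-Python sketch. The rows, the variable names (y, age, dose) and the
helper complete_cases are invented for illustration; it mimics the idea
of removing any row with a missing value and is not the R or SAS
implementation.

```python
def complete_cases(rows):
    """Keep only the rows in which no variable is missing (None)."""
    return [row for row in rows if all(v is not None for v in row.values())]


# Hypothetical data: y is the outcome, age and dose are covariates.
rows = [
    {"id": 1, "y": 2.1, "age": 34, "dose": 10},
    {"id": 2, "y": 3.4, "age": None, "dose": 20},
    {"id": 3, "y": None, "age": 51, "dose": 10},
    {"id": 4, "y": 2.9, "age": 47, "dose": None},
    {"id": 5, "y": 3.8, "age": 29, "dose": 20},
]

kept = complete_cases(rows)
kept_ids = [row["id"] for row in kept]

# Rows whose outcome was observed but which are dropped anyway
# because a covariate is missing.
dropped_with_outcome = [
    row["id"] for row in rows
    if row["y"] is not None and row["id"] not in kept_ids
]

print("kept:", kept_ids)
print("dropped despite observed outcome:", dropped_with_outcome)
```

Rows 2 and 4 have an observed outcome yet are discarded; that discarded
information is what an approach such as multiple imputation tries to
recover.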

Ken