[R-sig-ME] missing data in lme, lmer, PROC MIXED
Ken Beath
kjbeath at kagi.com
Mon Jul 28 14:21:55 CEST 2008
On 28/07/2008, at 9:04 PM, M Henry H Stevens wrote:
> Thanks Ken. I have been assuming that they meant missing covariates (a
> subject provided most of the predictors, but not all). So I take it that
> SAS does no imputation on its own, and that the user would need to do
> that (if they wanted)? lme does not do anything like that.
>
Yes, neither SAS nor R, nor most other programs, handles missing covariates
automatically. The only program I know of that does is Mplus, which is a
general latent variable modelling program. I turned off its missing data
handling because for one model it resulted in an 11-dimensional integration.
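
To make the lme side concrete, here is a minimal sketch of what happens
when a covariate has missing values. It assumes the Orthodont data set
that ships with nlme, and the three blanked-out ages are purely
hypothetical, chosen for illustration. It only shows that rows are
dropped; nothing is imputed.

```r
# Sketch: how lme() treats a missing covariate (complete-case deletion only).
library(nlme)  # recommended package, installed with R

# Orthodont is a built-in nlme data set: 27 subjects x 4 ages = 108 rows.
dat <- as.data.frame(Orthodont)

# Hypothetical missingness: blank out the covariate 'age' in three rows.
dat$age[c(1, 6, 11)] <- NA

# The default na.action for lme() is na.fail, so this fit stops with an error.
default_fit <- try(
  lme(distance ~ age, random = ~ 1 | Subject, data = dat),
  silent = TRUE
)

# na.omit drops every row containing a missing value before fitting.
cc_fit <- lme(distance ~ age, random = ~ 1 | Subject, data = dat,
              na.action = na.omit)

n_used    <- length(fitted(cc_fit))  # rows that entered the likelihood
n_dropped <- nrow(dat) - n_used      # rows lost to the missing covariate
```

Anything beyond this, such as multiple imputation of age, has to be done
by the user before calling lme.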
Ken
> Hank
>
> On Sat, 2008-07-26 at 22:39 -0400, Ken Beath wrote:
>> On 26/07/2008, at 7:28 AM, M Henry H Stevens wrote:
>>
>>> Hi folks,
>>> I have colleagues who comfortably state that "missing data" are ok in
>>> "mixed models" because "the program (PROC MIXED) handles missing
>>> data". I have a hard time imagining what it does.
>>>
>>> To those of you who use both R and SAS, I was wondering if you might
>>> share insight into what these do.
>>>
>>> As far as I know, for lme:
>>> na.action = na.omit (or na.exclude) removes the rows with any missing
>>> data.
>>>
>>
>> This depends. If the missing data are in the dependent variable and
>> are missing at random then, as mixed models are fitted using maximum
>> likelihood, the results will be optimal. Roughly (there are some really
>> technical definitions for missing data and I haven't checked them) if
>> we don't know the outcome, and the reason it is missing doesn't depend
>> on its own unobserved value once the observed data are taken into
>> account, then we can simply leave it out of the likelihood as it
>> carries no useful information. The problem arises when the fact that
>> data are missing does carry information about their values, which is
>> very difficult to model. An example is if observations above a certain
>> value are more likely to be missing.
>>
>> An alternative method of dealing with repeated data is to produce a
>> summary for each subject or cluster, for example by averaging the last
>> three visits. This doesn't correctly handle missing data, although the
>> loss in efficiency is usually small and it can work well provided
>> only a small proportion is missing.
>>
>> What R and SAS don't deal with directly is missing data in the
>> covariates. This takes a bit more work, for example using multiple
>> imputation. Here the complete case method, where an observation with
>> any missing data is removed, will result in a loss of efficiency
>> compared to what can be achieved.
>>
>> Ken
> --
>
> Dr. Hank Stevens, Associate Professor
> 338 Pearson Hall
> Botany Department
> Miami University
> Oxford, OH 45056
>
> Office: (513) 529-4206
> Lab: (513) 529-4262
> FAX: (513) 529-4243
> http://www.cas.muohio.edu/~stevenmh/
> http://www.cas.muohio.edu/ecology
> http://www.muohio.edu/botany/
>
> "If the stars should appear one night in a thousand years, how would
> men
> believe and adore." -Ralph Waldo Emerson, writer and philosopher
> (1803-1882)