[R-sig-ME] missing data in lme, lmer, PROC MIXED

Tue Jul 29 13:40:58 CEST 2008

On 28/07/2008, at 11:05 PM, Doran, Harold wrote:

> Ken,
>
> Does Mplus actually impute values for the missing cells in the model
> matrix for the fixed effects? Is this a default behavior of Mplus, or
> does one need to be cognizant of this and implement a particular
> imputation strategy?
>

Mplus has rather poor documentation in this area. Rather than imputing, I
think it assumes multivariate normality of the covariates or, for
categorical variables, an underlying latent variable. So there is an
assumed model for the covariates, which is unavoidable.

It is switched on automatically in Mplus. I've tried it with some
simulated data, and it does do something and seems to work properly.
With a linear regression on 2 covariates, I set half of one covariate
to missing. With the missing data option, the standard errors are
reduced by about 20% compared to the complete-case analysis, which could
be quite useful.
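
A rough sketch of the kind of simulation described here, written in
Python (standard library only) purely for illustration. It covers only
the complete-case side of the comparison: a linear regression on 2
covariates, with half of one covariate set to missing completely at
random, fitted once on the full data and once on the complete cases.
It does not reproduce the Mplus missing data option. The function
names (solve_inverse, ols, simulate), the coefficients, the sample
size and the correlation between the covariates are all assumptions
made for the example, not details of the original experiment.

```python
import math
import random


def solve_inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry into place.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(rows):
    """Least-squares fit of y on (1, x1, x2); rows are (y, x1, x2) tuples.

    Returns (coefficients, standard errors).
    """
    X = [(1.0, x1, x2) for _, x1, x2 in rows]
    y = [r[0] for r in rows]
    p = 3
    xtx = [[sum(xi[a] * xi[b] for xi in X) for b in range(p)] for a in range(p)]
    xty = [sum(xi[a] * yi for xi, yi in zip(X, y)) for a in range(p)]
    inv = solve_inverse(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    rss = sum((yi - sum(b * v for b, v in zip(beta, xi))) ** 2
              for xi, yi in zip(X, y))
    sigma2 = rss / (len(rows) - p)
    se = [math.sqrt(sigma2 * inv[a][a]) for a in range(p)]
    return beta, se


def simulate(n=2000, seed=1):
    """Fit the full data and the complete cases after deleting half of x2."""
    rng = random.Random(seed)
    full = []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        x2 = 0.5 * x1 + rng.gauss(0, 1)  # assumed: correlated covariates
        y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.gauss(0, 1)  # assumed true model
        full.append((y, x1, x2))
    # Set x2 to missing (None) for a randomly chosen half of the rows.
    missing = set(rng.sample(range(n), n // 2))
    observed = [(y, x1, None if i in missing else x2)
                for i, (y, x1, x2) in enumerate(full)]
    # Complete-case analysis: drop every row with a missing covariate.
    complete_cases = [r for r in observed if r[2] is not None]
    return ols(full), ols(complete_cases), len(complete_cases)


if __name__ == "__main__":
    (full_beta, full_se), (cc_beta, cc_se), n_cc = simulate()
    print("complete cases used:", n_cc)
    print("full-data SEs:     ", [round(s, 4) for s in full_se])
    print("complete-case SEs: ", [round(s, 4) for s in cc_se])
```

The gap between the two sets of standard errors is the room a method
that models the covariates has to work with; how much of it can be
recovered depends on the assumed covariate model and is not shown by
this sketch.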

> In general, this kind of question comes up all the time on the
> multilevel listserv. There are constant suggestions that many of the
> multilevel software packages automagically "handle" missing data
> because they use "maximum likelihood".
>

That is a simplification of what actually happens.

A useful introductory paper on missing data is
http://maven.smith.edu/~nhorton/muchado.pdf
and the accompanying talk is
http://maven.smith.edu/~nhorton/muchado-notes.pdf

Ken

>> -----Original Message-----
>> From: r-sig-mixed-models-bounces at r-project.org
>> [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf
>> Of Ken Beath
>> Sent: Monday, July 28, 2008 8:22 AM
>> To: MHH Stevens
>> Cc: R Mixed Models; Stevens,Martin Henry H. Dr.
>> Subject: Re: [R-sig-ME] missing data in lme, lmer, PROC MIXED
>>
>> On 28/07/2008, at 9:04 PM, M Henry H Stevens wrote:
>>
>>> Thanks Ken. I have been assuming that they meant missing covariates (a
>>> subject provided most of the predictors, but not all). So I take it
>>> that SAS does no imputation on its own -- that the user would need to
>>> do that (if they wanted?). lme does not do anything like that.
>>>
>>
>> Yes, neither SAS nor R nor most other programs handle missing covariates
>> automatically. The only program I know of that does is Mplus, which is a
>> general latent variable modelling program. I turned off the missing data
>> handling, as for one model it resulted in an 11-dimensional
>> integration.
>>
>> Ken
>>
>>> Hank
>>>
>>> On Sat, 2008-07-26 at 22:39 -0400, Ken Beath wrote:
>>>> On 26/07/2008, at 7:28 AM, M Henry H Stevens wrote:
>>>>
>>>>> Hi folks,
>>>>> I have colleagues who comfortably state that "missing data" are ok
>>>>> in "mixed models" - because "the program (PROC MIXED) handles missing
>>>>> data" -- I have a hard time imagining what it does.
>>>>>
>>>>> To those of you who use both R and SAS, I was wondering if you might
>>>>> share insight into what these do.
>>>>>
>>>>> As far as I know, for lme, na.action = na.omit (or na.exclude)
>>>>> removes the rows with any missing data.
>>>>>
>>>>
>>>> This depends. If the missing data is the dependent variable and it is
>>>> missing at random then, as mixed models are fitted using maximum
>>>> likelihood, the fit will produce results that are optimal. Roughly
>>>> (there are some really technical definitions for missing data and I
>>>> haven't checked them), if we don't know the outcome and the reason it
>>>> is missing isn't due to its value or the other data, then we can
>>>> simply leave it out of the likelihood equation, as it has no useful
>>>> information. A problem arises when the fact that data are missing
>>>> itself provides this sort of information, which is very difficult to
>>>> model. An example is if observations above a certain value are more
>>>> likely to be missing.
>>>>
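A small illustration of that last example, sketched in Python
(standard library only). It uses a plain sample mean rather than a
mixed model, so it only shows the direction of the problem: when
values above a cutoff are more likely to be missing, the observed
data no longer represent the full data. The function names, the
cutoff and the drop probabilities are assumptions chosen for the
example.

```python
import random
import statistics


def observed_mean(values, drop_probability, rng):
    """Mean of the values that remain observed.

    drop_probability(v) gives the chance that value v goes missing.
    """
    kept = [v for v in values if rng.random() >= drop_probability(v)]
    return statistics.fmean(kept), len(kept)


def compare(n=20000, seed=7, cutoff=1.0):
    """Return (true mean, mean under unrelated missingness,
    mean under value-dependent missingness)."""
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Missingness unrelated to the value: every observation has the
    # same chance of being dropped.
    mean_unrelated, _ = observed_mean(y, lambda v: 0.3, rng)
    # Missingness depends on the value itself: observations above the
    # cutoff are much more likely to be missing (assumed probabilities).
    mean_dependent, _ = observed_mean(
        y, lambda v: 0.8 if v > cutoff else 0.1, rng)
    return statistics.fmean(y), mean_unrelated, mean_dependent


if __name__ == "__main__":
    true_mean, unrelated, dependent = compare()
    print("mean of all values:                 ", round(true_mean, 3))
    print("observed mean, unrelated missing:   ", round(unrelated, 3))
    print("observed mean, value-dependent miss:", round(dependent, 3))
```

In the first case the observed mean stays close to the mean of all
values; in the second it is pulled downward, and simply leaving the
missing observations out no longer gives a sensible answer.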
>>>> An alternative method of dealing with repeated data is to produce a
>>>> summary for each subject or cluster, for example by averaging the
>>>> last three visits. This doesn't correctly handle missing data,
>>>> although the loss in efficiency is usually small and it can work well,
>>>> provided only a small proportion is missing.
>>>>
>>>> What R and SAS don't deal with directly is missing data in the
>>>> covariates. This takes a bit more work, for example using multiple
>>>> imputation. Here the complete case method, where an observation with
>>>> any missing data is removed, will result in a loss of efficiency
>>>> compared to what can be achieved.
>>>>
>>>> Ken
>>> -- 
>>>
>>> Dr. Hank Stevens, Associate Professor
>>> 338 Pearson Hall
>>> Botany Department
>>> Miami University
>>> Oxford, OH 45056
>>>
>>> Office: (513) 529-4206
>>> Lab: (513) 529-4262
>>> FAX: (513) 529-4243
>>> http://www.cas.muohio.edu/~stevenmh/
>>> http://www.cas.muohio.edu/ecology
>>> http://www.muohio.edu/botany/
>>>
>>> "If the stars should appear one night in a thousand years, how would
>>> men believe and adore." -Ralph Waldo Emerson, writer and philosopher
>>> (1803-1882)