[R-sig-ME] Dealing with NAs in LMER with longitudinal data (Re Crime and Education data)

D. Rizopoulos
Tue Sep 17 09:26:52 CEST 2019

We should distinguish between missing data in the outcome and missing data in the covariates.

For missing data in the outcome, mixed effects models provide unbiased estimates and valid inferences under the missing completely at random (MCAR) and missing at random (MAR) missing data mechanisms. No (multiple) imputation of the outcome is required in this case. The only requirement is that the model is adequately and flexibly specified with regard to both the fixed- and random-effects structures. For the fixed-effects part in particular, you need to include any covariates that potentially relate to the reasons why you have missing data. Finally, if the missing data mechanism is missing not at random (MNAR), then the mixed model alone is not enough and you will need to jointly model the outcome and the dropout process.
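To make the MAR point concrete, here is a minimal simulation sketch. Everything in it is assumed for illustration: the sample size, the true parameter values, and the dropout rule, under which a subject's chance of dropping out before a visit depends only on their last observed outcome. It compares the mixed-model mean trajectory with the naive per-visit means of the observed data.

```r
library(lme4)

set.seed(2019)
n_subj <- 500
times  <- 0:4

# Complete data: random intercepts and slopes, true mean trajectory 10 + 1 * time.
# expand.grid varies 'time' fastest, so rows are ordered by id, then time.
dat <- expand.grid(time = times, id = seq_len(n_subj))
b0  <- rnorm(n_subj, sd = 2)      # subject-specific intercept deviations
b1  <- rnorm(n_subj, sd = 0.5)    # subject-specific slope deviations
dat$y <- 10 + b0[dat$id] + (1 + b1[dat$id]) * dat$time +
  rnorm(nrow(dat), sd = 1)

# MAR dropout (illustrative rule): the probability of leaving the study
# before a visit increases with the outcome observed at the previous visit.
dat$observed <- TRUE
for (i in seq_len(n_subj)) {
  rows <- which(dat$id == i)
  for (k in 2:length(rows)) {
    prev   <- rows[k - 1]
    p_drop <- plogis(-2 + 0.3 * (dat$y[prev] - 10))
    # Once a subject has dropped out, they stay out (monotone dropout)
    if (!dat$observed[prev] || runif(1) < p_drop) {
      dat$observed[rows[k]] <- FALSE
    }
  }
}
obs <- dat[dat$observed, ]

# Mixed model fitted to the observed rows only, with no imputation of y
fit <- lmer(y ~ time + (time | id), data = obs, REML = FALSE)

true_means  <- 10 + 1 * times
model_means <- fixef(fit)[["(Intercept)"]] + fixef(fit)[["time"]] * times
naive_means <- tapply(obs$y, obs$time, mean)

round(rbind(true = true_means, mixed_model = model_means, naive = naive_means), 2)
```

Because subjects with high values are more likely to leave, the naive means at later visits describe a selected subgroup, whereas the likelihood-based fit uses each subject's earlier observations to account for the dropout. This only holds when the mean and random-effects structures are specified adequately, as noted above.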

For missing data in the covariates, you will need to use multiple imputation. It is important that the whole outcome is included in the imputation step. This is more challenging, for example, for longitudinal outcomes that are not measured at the same time points for all subjects. There are approaches and R packages to handle these situations.


From: David Duffy <David.Duffy using qimrberghofer.edu.au>
Date: Tuesday, 17 Sep 2019, 06:03
To: Ades, James <jades using ucsd.edu>, r-sig-mixed-models using r-project.org
Subject: Re: [R-sig-ME] Dealing with NAs in LMER with longitudinal data (Re Crime and Education data)

> I've often heard that mixed-effect lmer/glmer models "handle" or "deal" with NA values well,
> and I've become more curious about what this actually means, if it is, indeed, true. What
> I've observed working with mixed-effect models is that na.omit will delete the entire
> row of observations, and depending on the number of NAs, the AIC might
> deceptively, dramatically decrease, given that the sample is smaller.

> I know that one can also use "na.pass"

> I'd assume that imputation is better practice for handling NAs.

> To summarize:
> 1. If lmer does handle NAs well, how exactly is it doing that? If "na.pass" fails,
> then is it handling NAs as any other program?

My limited understanding is that na.pass usually affects just the copy of the data in the returned object. It won't get around the fact that, if you are conditioning on fixed effects, only complete observations can be used. So if you want your AICs to be comparable, you need a single dataset that is complete for all the variables you are interested in.
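A minimal sketch of that last point, using the sleepstudy data shipped with lme4. The covariate `caffeine` and its missing values are made up purely for illustration; they are not part of the real dataset.

```r
library(lme4)

data(sleepstudy, package = "lme4")
dat <- sleepstudy

# Hypothetical covariate with 30 missing values (illustration only)
set.seed(1)
dat$caffeine <- rnorm(nrow(dat))
dat$caffeine[sample(nrow(dat), 30)] <- NA

# Fitted on the raw data, each model drops its own incomplete rows
# (the default na.action is na.omit), so the two fits use different samples.
m0_all <- lmer(Reaction ~ Days + (Days | Subject), data = dat, REML = FALSE)
m1_all <- lmer(Reaction ~ Days + caffeine + (Days | Subject),
               data = dat, REML = FALSE)
c(m0 = nobs(m0_all), m1 = nobs(m1_all))   # row counts differ

# Restrict to rows that are complete on every variable used by any candidate
# model, then refit all models on that single dataset.
vars <- c("Reaction", "Days", "Subject", "caffeine")
cc   <- dat[complete.cases(dat[, vars]), ]

m0 <- lmer(Reaction ~ Days + (Days | Subject), data = cc, REML = FALSE)
m1 <- lmer(Reaction ~ Days + caffeine + (Days | Subject),
           data = cc, REML = FALSE)

stopifnot(nobs(m0) == nobs(m1))   # same rows, so the AICs can be compared
AIC(m0, m1)
```

The models are fitted with `REML = FALSE` because they differ in their fixed effects. Note that this makes the AICs comparable but does nothing about any bias from discarding the incomplete rows.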

> 2. Is imputation (done correctly) better than allowing mixed-effect functions to handle NAs?

If you have non-ignorable missing data, then the variables with missing values must be included as response variables, so that the mixed model can combine the correct likelihoods for each pattern of missingness. I have more experience with a straightforward multivariate formulation for this, so I don't know how, or whether, you can mimic it in the lmer framework. That is quite aside from whether you want to specify directional paths between such variables. Imputation is the cheap and cheerful answer.

> 5. Should I be using long format here for variables like race (black, white, asian, latino)
> and education attainment (some high school, hs diploma, some college, bachelors,
> MA/grad school)

I'd have thought so, unless you already have a handle on the causes of any autocorrelation.

Hopefully someone more in your area will respond, but in animal breeding genetics, there are mixed models of similar huge longitudinal datasets (people I know in human genetics were great fans of the Journal of Dairy Science ;), and of ASReml).

Cheers, David Duffy.
R-sig-mixed-models using r-project.org mailing list
