[R-sig-ME] Dealing with NAs in LMER with longitudinal data (Re Crime and Education data)

Tue Sep 17 06:02:41 CEST 2019

> I’ve often heard that mixed-effect lmer/glmer models “handle” or “deal” with NA values well, 
> and I’ve become more curious about what this actually means, if it is, indeed, true. What 
> I’ve observed working with mixed-effect models is that na.omit will delete the entire 
> row of observations, and depending on the number of NAs, the AIC might 
> deceptively, dramatically decrease, given that the sample is smaller.

> I know that one can also use “na.pass”

> I’d assume that imputation is better practice for handling NAs. 

> To summarize:
> 1. If lmer does handle NAs well, how exactly is it doing that? If “na.pass” fails, 
> then is it handling NAs as any other program?

My limited understanding is that na.pass usually affects just the copy of the data in the returned object. It won't get around the fact that, if you are conditioning on fixed effects, only complete observations can be used. So if you want your AICs to be comparable, you need a single dataset that is complete for all the variables you are interested in, and every candidate model has to be fitted to that same set of rows.
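A minimal sketch of that point, in case it helps. It assumes lme4 is installed and uses its built-in sleepstudy data; the covariate x and its missing values are made up purely for illustration, and the variable names are mine, not from your data.

```r
library(lme4)

# Illustrative data: lme4's sleepstudy plus a hypothetical covariate with NAs
set.seed(1)
d <- sleepstudy
d$x <- rnorm(nrow(d))            # made-up covariate, not part of sleepstudy
d$x[sample(nrow(d), 30)] <- NA   # set 30 of its values to missing

# With the default na.action (na.omit), rows with an NA in any model variable
# are dropped, so these two fits are based on different numbers of rows and
# their AICs are not comparable.
m0 <- lmer(Reaction ~ Days + (1 | Subject), data = d, REML = FALSE)
m1 <- lmer(Reaction ~ Days + x + (1 | Subject), data = d, REML = FALSE)
c(nobs(m0), nobs(m1))

# Restrict to rows that are complete for every variable in any candidate model,
# then refit all models on that one dataset before comparing.
vars <- c("Reaction", "Days", "x", "Subject")
d_cc <- d[complete.cases(d[, vars]), ]
m0_cc <- lmer(Reaction ~ Days + (1 | Subject), data = d_cc, REML = FALSE)
m1_cc <- lmer(Reaction ~ Days + x + (1 | Subject), data = d_cc, REML = FALSE)
AIC(m0_cc, m1_cc)
```

Fitting with REML = FALSE here is my choice, since the two models differ in their fixed effects; the complete-case subset is only appropriate to the extent that dropping those rows is defensible for your data.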

> 2. Is imputation (done correctly) better than allowing mixed-effect functions to handle NAs?

If you have non-ignorable missing data, then the variables with missing values must be included as response variables, so that the mixed model can combine the correct likelihoods for each pattern of missingness. I have more experience with a straightforward multivariate formulation for this, so I don't know how, or whether, you can mimic it in the lmer framework. Quite apart from whether you want to specify directional paths between such variables, imputation is the cheap and cheerful answer.

> 5. Should I be using long format here for variables like race (black, white, asian, latino) 
> and education attainment (some high school, hs diploma, some college, bachelors, 
> MA/grad school)

I'd have thought so, unless you already have a handle on the causes of any autocorrelation.

Hopefully someone more in your area will respond, but in animal breeding genetics there are mixed models of similarly huge longitudinal datasets (people I know in human genetics were great fans of the Journal of Dairy Science ;), and of ASReml).

Cheers, David Duffy.