[R-sig-ME] Residuals look "mirrored" when using lmer with imputed data

Sun Aug 6 15:00:47 CEST 2017

Uh ... part of my problem is the term "AUC". Without some very explicit
note to the contrary, AUC on a stats forum often refers to a cumulative
probability or the Receiver Operating Characteristic (ROC AUC).
Remember, we're doing this in our free time, so I am often quick and
don't read every line of code ....

You're effectively using a rectangular approximation for your integral
and that's something else to be careful with in terms of approximation.
But why impute separately here? You might be able to use a suitable
integral approximation to skip imputation:

>  dfAUC <- subset(df,!is.na(value)) %>% arrange(sampleNum) %>%
group_by(ID, treatment) %>%  mutate(AUC =
sum(diff(sampleNum)*rollmean(value,2)))
> fit2 <- lmer(AUC ~ sex * treatment + (1|ID), data = dfAUC)
> qqnorm(resid(fit2))

yields similarly flat tails on the qqplot. (Okay, maybe don't use the
rectangular approximation without proper imputation; this didn't work
as well as I'd hoped.)

In both cases, the residual plot looks not horrible to me, but the
qqnorm plot suggests some issues with bounding -- those flat tails look
you're running into some minimal/maximal value. At the lower end, the
obvious answer is "zero"; at the upper end, maybe you're hitting the
limits of your measurement device? We see which values actually occur,
both with the imputed data and the implicit-imputation-via-integral-
approximation data:

> xtabs(~AUC,dfImpAUC)
> xtabs(~AUC,dfAUC)

In both cases, we see that the data only take on a relatively small
number of values and indeed there are lots of repetitions at the
bounds.

Phillip

> On Sat, 2017-08-05 at 17:59 +0200, João C P Santiago wrote:
> > 
> > I'm sorry I may be misunderstanding, but why do you say my data
> > is  
> > bounded between 0 and 1? The data I shared is from multiple  
> > measurements of a blood hormone, which can be 0, but is certainly
> > not  
> > bounded at 1. Next I calculated the AUC i.e. the "total exposure"
> > of  
> > that hormone. X is time and y is the value of the hormone at each
> > time  
> > point. The AUC is also not bounded at 0 and 1.
> > 
> > Am I missing something?
> > 
> > Thanks for your reply!
> > Santiago
> > 
> > 
> > Quoting "Alday, Phillip" <Phillip.Alday at mpi.nl>:
> > 
> > > 
> > > 
> > > You're fitting a normal/Gaussian LMM to data bounded on [0, 1].
> > > The
> > > model assumptions about the residuals simply won't hold for
> > > bounded,
> > > binomial-like values. 
> > > 
> > > Why not fit a binomial model to the data and then use
> > > fitted(model)
> > > to
> > > compute AUC of the entire model? 
> > > 
> > > Phillip  
> > > 
> > > On Fri, 2017-08-04 at 08:52 +0200, João C P Santiago wrote:
> > > > 
> > > > 
> > > > I'm trying to assess if a treatment had any effect on the
> > > > levels
> > > > of
> > > > a  
> > > > hormone. To do this I need to calculate the area under the
> > > > curve
> > > > and  
> > > > then adjust it for sex (a known confounder) and smoking status
> > > > (not  
> > > > included in the demo data below to keep things simpler).
> > > > 
> > > > Here's a dput of the data: https://pastebin.com/VYcQGkwb
> > > > 
> > > > There's some missing values, so first step is to impute them
> > > > using
> > > > the  
> > > > mice package, then calculate AUC and finally fit the model:
> > > > 
> > > > library(dplyr)
> > > > library(lme4)
> > > > library(mice)
> > > > library(zoo)
> > > > 
> > > > ## Impute missing values
> > > > dfMids <- mice(df, m = 10, maxit = 15, seed = 2535)
> > > > dfImp  <- complete(dfMids)
> > > > 
> > > > ## Calculate AUC
> > > > dfImpAUC <- dfImp %>%
> > > >    arrange(sampleNum) %>%
> > > >    group_by(ID, treatment) %>%
> > > >    mutate(AUC = sum(diff(sampleNum)*rollmean(value,2)))
> > > > 
> > > > ## Fit model
> > > > fit <- lmer(AUC ~ sex * treatment + (1|ID), data = dfImpAUC)
> > > > 
> > > > ## Plot residuals
> > > > plot(fit)  # output: https://imgur.com/a/vfL1R
> > > > qqnorm(resid(fit))
> > > > 
> > > > 
> > > > 
> > > > I know it's possible to fit a model to each iteration of mids
> > > > model,  
> > > > but then I can't calculate the AUC, which is what I actually
> > > > need.
> > > > Any  
> > > > ideas why the residuals look like that?
> > > > 
> > > > Best
> > > > Santiago
> > > > 
> > > > 
> > > > 
> > > >