[R-sig-ME] Residuals look "mirrored" when using lmer with imputed data
Alday, Phillip
Phillip.Alday at mpi.nl
Sun Aug 6 15:00:47 CEST 2017
Uh ... part of my problem is the term "AUC". Without some very explicit
note to the contrary, AUC on a stats forum often refers to a cumulative
probability or the Receiver Operating Characteristic (ROC AUC).
Remember, we're doing this in our free time, so I am often quick and
don't read every line of code ....
You're effectively using a rectangular approximation for your integral
and that's something else to be careful with in terms of approximation.
But why impute separately here? You might be able to use a suitable
integral approximation to skip imputation:
> dfAUC <- subset(df,!is.na(value)) %>% arrange(sampleNum) %>%
group_by(ID, treatment) %>% mutate(AUC =
sum(diff(sampleNum)*rollmean(value,2)))
> fit2 <- lmer(AUC ~ sex * treatment + (1|ID), data = dfAUC)
> qqnorm(resid(fit2))
yields similarly flat tails on the qqplot. (Okay, maybe don't use the
rectangular approximation without proper imputation; this didn't work
as well as I'd hoped.)
In both cases, the residual plot looks not horrible to me, but the
qqnorm plot suggests some issues with bounding -- those flat tails look
you're running into some minimal/maximal value. At the lower end, the
obvious answer is "zero"; at the upper end, maybe you're hitting the
limits of your measurement device? We see which values actually occur,
both with the imputed data and the implicit-imputation-via-integral-
approximation data:
> xtabs(~AUC,dfImpAUC)
> xtabs(~AUC,dfAUC)
In both cases, we see that the data only take on a relatively small
number of values and indeed there are lots of repetitions at the
bounds.
Phillip
> On Sat, 2017-08-05 at 17:59 +0200, João C P Santiago wrote:
> >
> > I'm sorry I may be misunderstanding, but why do you say my data
> > is
> > bounded between 0 and 1? The data I shared is from multiple
> > measurements of a blood hormone, which can be 0, but is certainly
> > not
> > bounded at 1. Next I calculated the AUC i.e. the "total exposure"
> > of
> > that hormone. X is time and y is the value of the hormone at each
> > time
> > point. The AUC is also not bounded at 0 and 1.
> >
> > Am I missing something?
> >
> > Thanks for your reply!
> > Santiago
> >
> >
> > Quoting "Alday, Phillip" <Phillip.Alday at mpi.nl>:
> >
> > >
> > >
> > > You're fitting a normal/Gaussian LMM to data bounded on [0, 1].
> > > The
> > > model assumptions about the residuals simply won't hold for
> > > bounded,
> > > binomial-like values.
> > >
> > > Why not fit a binomial model to the data and then use
> > > fitted(model)
> > > to
> > > compute AUC of the entire model?
> > >
> > > Phillip
> > >
> > > On Fri, 2017-08-04 at 08:52 +0200, João C P Santiago wrote:
> > > >
> > > >
> > > > I'm trying to assess if a treatment had any effect on the
> > > > levels
> > > > of
> > > > a
> > > > hormone. To do this I need to calculate the area under the
> > > > curve
> > > > and
> > > > then adjust it for sex (a known confounder) and smoking status
> > > > (not
> > > > included in the demo data below to keep things simpler).
> > > >
> > > > Here's a dput of the data: https://pastebin.com/VYcQGkwb
> > > >
> > > > There's some missing values, so first step is to impute them
> > > > using
> > > > the
> > > > mice package, then calculate AUC and finally fit the model:
> > > >
> > > > library(dplyr)
> > > > library(lme4)
> > > > library(mice)
> > > > library(zoo)
> > > >
> > > > ## Impute missing values
> > > > dfMids <- mice(df, m = 10, maxit = 15, seed = 2535)
> > > > dfImp <- complete(dfMids)
> > > >
> > > > ## Calculate AUC
> > > > dfImpAUC <- dfImp %>%
> > > > arrange(sampleNum) %>%
> > > > group_by(ID, treatment) %>%
> > > > mutate(AUC = sum(diff(sampleNum)*rollmean(value,2)))
> > > >
> > > > ## Fit model
> > > > fit <- lmer(AUC ~ sex * treatment + (1|ID), data = dfImpAUC)
> > > >
> > > > ## Plot residuals
> > > > plot(fit) # output: https://imgur.com/a/vfL1R
> > > > qqnorm(resid(fit))
> > > >
> > > >
> > > >
> > > > I know it's possible to fit a model to each iteration of mids
> > > > model,
> > > > but then I can't calculate the AUC, which is what I actually
> > > > need.
> > > > Any
> > > > ideas why the residuals look like that?
> > > >
> > > > Best
> > > > Santiago
> > > >
> > > >
> > > >
> > > >
More information about the R-sig-mixed-models
mailing list