[R-sig-ME] LMM diagnostics: conditional residuals correlated highly with fitted values
Thierry Onkelinx
thierry.onkelinx at inbo.be
Wed Oct 7 17:15:21 CEST 2015
Can you elaborate on what Y is? Does it have a lower boundary? And if so, do
you have observations near that boundary? E.g. Y must be non-negative and
the dataset contains observations close to 0. A density plot would be useful.
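
A minimal sketch of such a density plot in base R. The actual data are not
shown in this thread, so the simulated Y below is only a hypothetical
stand-in (a non-negative variable with mass near 0); with the real data one
would use data$Y instead. The 0.1 cut-off is likewise an arbitrary
assumption for illustration.

```r
# Hypothetical stand-in for the response; replace with data$Y for the real data.
set.seed(1)
Y <- rexp(379, rate = 1)   # 379 non-negative values, many close to 0

# Kernel density estimate of Y, with the individual observations as a rug
d <- density(Y)
plot(d, main = "Density of Y")
rug(Y)
abline(v = 0, lty = 2)     # candidate lower boundary at 0

# Proportion of observations within an (arbitrary) distance of the boundary
near_boundary <- mean(Y < 0.1)
near_boundary
```

If the density piles up against the boundary, that would point towards a
bounded or skewed response rather than a problem with the random effects.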
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
2015-10-07 17:09 GMT+02:00 Yizhou Ma <maxxx848 at umn.edu>:
> Hi Thierry,
>
> Thank you for your reply and sorry for the HTML thing. Below is my
> summary(model) output.
>
> Y, Drink, and Age are continuous variables
> Gender is F & M.
> Family_ID is a factor.
>
> Linear mixed model fit by maximum likelihood ['lmerMod']
> Formula: Y ~ Drink * Gender + Age + (1 | Family_ID)
> Data: data
>
>      AIC      BIC   logLik deviance df.resid
>   1046.4   1074.0   -516.2   1032.4      372
>
> Scaled residuals:
>      Min       1Q   Median       3Q      Max
> -2.67228 -0.56085 -0.02968  0.66166  2.91452
>
> Random effects:
>  Groups    Name        Variance Std.Dev.
>  Family_ID (Intercept) 0.3550   0.5958
>  Residual              0.6162   0.7850
> Number of obs: 379, groups: Family_ID, 189
>
> Fixed effects:
>                Estimate Std. Error t value
> (Intercept)     1.10309    0.43921   2.511
> Drink           0.16425    0.08031   2.045
> Gender.M       -0.19364    0.10874  -1.781
> Age            -0.03377    0.01489  -2.268
> Drink:Gender.M -0.13647    0.10681  -1.278
>
> Correlation of Fixed Effects:
>          (Intr) Drnk   Gndr.M Age
> Drink    -0.098
> Gender.M -0.040 -0.249
> Age      -0.985  0.158 -0.054
> Drnk:G.M  0.042 -0.737 -0.021 -0.085
>
> Thank you very much,
> Cherry
>
> On Wed, Oct 7, 2015 at 5:14 AM, Thierry Onkelinx
> <thierry.onkelinx at inbo.be> wrote:
> > Dear Cherry,
> >
> > Please don't post in HTML. Have a look at the posting guide.
> >
> > You'll need to provide more information. What is the class of each variable
> > (continuous, count, presence/absence, factor, ...)? What is the output of
> > summary(model)?
> >
> > Best regards,
> >
> > 2015-10-06 17:15 GMT+02:00 Yizhou Ma <maxxx848 at umn.edu>:
> >>
> >> Dear LMM experts:
> >>
> >> I am pretty new to using LMM and I have found the following situation
> >> bewildering as I was trying to do diagnostics with my fitted model: my
> >> conditional residuals correlated highly with the fitted values.
> >>
> >> I have a dataset with multiple families, each with 1-4 siblings. I am
> >> trying to regress Y onto EVs including Drink, Gender, & Age, while using
> >> a random intercept for family. This is the model I used:
> >> model <- lmer(Y ~ Drink * Gender + Age + (1 | Family_ID), data, REML = FALSE)
> >>
> >> After fitting the model, I used
> >> plot(model)
> >> to see the relationship between conditional residuals and fitted values.
> >> I expect them to be uncorrelated and I expect to see homoscedasticity.
> >>
> >> Yet to my surprise there is a high correlation (~0.5) between the
> >> residuals and the fitted values (see here <http://imgur.com/pPsG4aR>).
> >> I know from GLM that this usually suggests nonlinear relationships
> >> between the EVs and the DV.
> >>
> >> I read some online posts (post1
> >> <http://stats.stackexchange.com/questions/43566/strange-pattern-in-residual-plot-from-mixed-effect-model>,
> >> post2
> >> <http://stats.stackexchange.com/questions/168179/correlation-between-standardized-residuals-and-fitted-values-in-a-linear-mixed-e/168210#168210>)
> >> that suggest this can result from a poor model fit. So I tried a few
> >> different models, including: 1) log-transforming Drink, which is
> >> originally positively skewed; 2) adding random slopes for Drink, Age,
> >> etc. None of these changes has led to a substantial difference in the
> >> correlation between residuals and fitted values.
> >>
> >> Some other info:
> >> 1) my overall model fit is not poor, as indicated by the correlation
> >> between fitted values & Y, which is around 0.8;
> >> 2) most variables in my model have a normal, or at least symmetrical,
> >> distribution;
> >> 3) conditional residuals are normally distributed, as shown in qqplots;
> >> 4) conditional residuals are not correlated with any fixed effects, such
> >> as Drink or Age.
> >>
> >> I have two guesses as to what is going on:
> >> 1) maybe the fact that each family is a different size actually violates
> >> assumptions of the model?
> >> 2) or maybe there is something wrong with estimation of the random effect
> >> (family intercept)?
> >>
> >> I'd really appreciate your insights as to what is going on here and
> >> whether there are any problems with my model.
> >>
> >> Thank you very much,
> >> Cherry