[R-sig-ME] LMM diagnostics: conditional residuals correlated highly with fitted values

Wed Oct 7 12:14:31 CEST 2015

Dear Cherry,

Please don't post in HTML. Have a look at the posting guide.

You'll need to provide more information. What is the class of each variable
(continuous, count, presence/absence, factor, ...)? What is the output of
summary(model)?
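
A minimal sketch of how that information could be collected. Since your data were not posted, it uses `sleepstudy` (a dataset that ships with lme4) and the name `fit` purely as stand-ins; substitute your own data frame and model.

```r
library(lme4)

# Stand-in data and model: `sleepstudy` ships with lme4 and is NOT the
# poster's data; `fit` is a hypothetical name used only for illustration.
fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy, REML = FALSE)

# Class of each variable (numeric, integer, factor, ...)
var_classes <- sapply(sleepstudy, class)
print(var_classes)

# Compact overview of the data frame: dimensions, types, first values
str(sleepstudy)

# Model summary: fixed effects, variance components, number of groups
print(summary(fit))
```

Pasting the printed output as plain text into the mail is usually enough for the list to work with.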

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2015-10-06 17:15 GMT+02:00 Yizhou Ma <maxxx848 at umn.edu>:

> Dear LMM experts:
>
> I am pretty new to using LMM and I have found the following situation
> bewildering as I was trying to do diagnostics with my fitted model: my
> conditional residuals correlated highly with the fitted values.
>
> I have a dataset with multiple families, each with 1-4 siblings. I am trying
> to regress Y on EVs including Drink, Gender, and Age, with a random
> intercept for family. This is the model I used:
> library(lme4)
> model <- lmer(Y ~ Drink * Gender + Age + (1 | Family_ID),
>               data = data, REML = FALSE)
>
> After fitting the model, I used
> plot(model)
> to see the relationship between conditional residuals and fitted values. I
> expect them to be uncorrelated and I expect to see homoscedasticity.
>
> Yet to my surprise there is a high correlation (~0.5) between the residuals
> and the fitted values (see here <http://imgur.com/pPsG4aR>). I know from
> GLM that this usually suggests a nonlinear relationship between the EVs and
> the DV.
>
> I read some online posts (post1
> <http://stats.stackexchange.com/questions/43566/strange-pattern-in-residual-plot-from-mixed-effect-model>,
> post2
> <http://stats.stackexchange.com/questions/168179/correlation-between-standardized-residuals-and-fitted-values-in-a-linear-mixed-e/168210#168210>)
> that suggest this can result from a poor model fit. So I tried a few
> different models, including: 1) log transform Drink, which is originally
> positively skewed; 2) add random slopes for Drink, Age, etc. None of these
> changes substantially altered the correlation between the residuals and the
> fitted values.
>
> Some other info:
> 1) my overall model fit is not poor: the correlation between fitted values
> and Y is around 0.8;
> 2) most variables in my model have a normal, or at least symmetrical,
> distribution.
> 3) conditional residuals are normally distributed as shown in qqplots.
> 4) conditional residuals are not correlated with any fixed effects, such as
> Drink or Age.
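
For reference, a minimal sketch of how the checks listed above could be computed so the numbers can be reported alongside the model summary. It uses lme4's built-in `sleepstudy` data as a stand-in because the original data were not posted; the object names are illustrative only, and the values it produces say nothing about your model.

```r
library(lme4)

# Stand-in model on lme4's built-in data (NOT the poster's data)
fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy, REML = FALSE)

res <- resid(fit)   # conditional residuals: observed minus (fixed + random) fit
fv  <- fitted(fit)  # conditional fitted values, including the random intercepts

r_res_fit  <- cor(res, fv)                   # residuals vs fitted values
r_fit_obs  <- cor(fv, sleepstudy$Reaction)   # fitted values vs response
r_res_days <- cor(res, sleepstudy$Days)      # residuals vs a fixed-effect covariate

print(c(res_vs_fitted = r_res_fit,
        fitted_vs_y   = r_fit_obs,
        res_vs_days   = r_res_days))
```

With your own model, replace `fit` by `model` and the `sleepstudy` columns by Y, Drink, Age and so on.
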
>
> I have two guesses as to what is going on:
> 1) maybe the fact that each family is a different size actually violates
> assumptions of the model?
> 2) or maybe there is something wrong with estimation of the random effect
> (family intercept)?
>
> I'd really appreciate your insights as to what is going on here and whether
> there are any problems with my model.
>
> Thank you very much,
> Cherry
>
