[R-sig-ME] LMM diagnostics: conditional residuals correlated highly with fitted values

Wed Oct 7 17:09:07 CEST 2015

Hi Thierry,

Thank you for your reply, and sorry about the HTML formatting. Below is my
summary(model) output.

Y, Drink, and Age are continuous variables.
Gender is a factor with two levels (F and M).
Family_ID is a factor.

Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: Y ~ Drink * Gender + Age + (1 | Family_ID)
   Data: data

     AIC      BIC   logLik deviance df.resid
  1046.4   1074.0   -516.2   1032.4      372

Scaled residuals:
     Min       1Q   Median       3Q      Max
-2.67228 -0.56085 -0.02968  0.66166  2.91452

Random effects:
 Groups    Name        Variance Std.Dev.
 Family_ID (Intercept) 0.3550   0.5958
 Residual              0.6162   0.7850
Number of obs: 379, groups:  Family_ID, 189

Fixed effects:
               Estimate Std. Error t value
(Intercept)     1.10309    0.43921   2.511
Drink           0.16425    0.08031   2.045
Gender.M       -0.19364    0.10874  -1.781
Age            -0.03377    0.01489  -2.268
Drink:Gender.M -0.13647    0.10681  -1.278

Correlation of Fixed Effects:
         (Intr) Drnk   Gndr.M Age
Drink    -0.098
Gender.M -0.040 -0.249
Age      -0.985  0.158 -0.054
Drnk:G.M  0.042 -0.737 -0.021 -0.085
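
As a rough illustration of what the variance components above imply (a
sketch under stated assumptions, not part of the lmer output): the block
below computes the intraclass correlation and the shrinkage factor applied
to a family's intercept for family sizes 1 to 4, then simulates hypothetical
data to check the sign of the correlation between conditional residuals and
fitted values. The simulation is an assumption-laden simplification: an
intercept-only model with a true intercept of zero, exactly two siblings per
family, and variance components treated as known rather than estimated. All
object names are made up for the example.

```r
# Variance components copied from the summary(model) output above.
var_family <- 0.3550   # Family_ID (Intercept) variance
var_resid  <- 0.6162   # Residual variance

# Intraclass correlation: share of total variance attributable to family.
icc <- var_family / (var_family + var_resid)

# Shrinkage factor applied to a family's mean deviation, for a family of size n.
shrinkage <- function(n) var_family / (var_family + var_resid / n)
shrink_by_size <- sapply(1:4, shrinkage)

# Simulation on hypothetical data (NOT the data from this thread):
# intercept-only model, true intercept 0, two siblings per family,
# variance components treated as known.
set.seed(1)
n_fam  <- 189
family <- rep(seq_len(n_fam), each = 2)
u      <- rnorm(n_fam, sd = sqrt(var_family))
y      <- u[family] + rnorm(length(family), sd = sqrt(var_resid))

fam_mean   <- as.vector(tapply(y, family, mean))
u_hat      <- shrinkage(2) * fam_mean   # shrunken family intercepts
fit_vals   <- u_hat[family]             # conditional fitted values
cond_resid <- y - fit_vals              # conditional residuals

print(round(icc, 3))
print(round(shrink_by_size, 3))
print(cor(fit_vals, cond_resid))
```

With these variances the intraclass correlation is roughly 0.37, and the
shrinkage factor grows with family size. Because the predicted family
intercepts are shrunk toward zero, part of each family's deviation stays in
the conditional residuals, so in this simplified setting some positive
correlation with the fitted values is expected even when the model is
correctly specified. Whether that accounts for the ~0.5 seen in the real fit
is not established by this sketch.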

Thank you very much,
Cherry

On Wed, Oct 7, 2015 at 5:14 AM, Thierry Onkelinx
<thierry.onkelinx at inbo.be> wrote:
> Dear Cherry,
>
> Please don't post in HTML. Have a look at the posting guide.
>
> You'll need to provide more information. What is the class of each variable
> (continuous, count, presence/absence, factor, ...)? What is the output of
> summary(model)?
>
> Best regards,
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
> Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
>
> To call in the statistician after the experiment is done may be no more than
> asking him to perform a post-mortem examination: he may be able to say what
> the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
>
> 2015-10-06 17:15 GMT+02:00 Yizhou Ma <maxxx848 at umn.edu>:
>>
>> Dear LMM experts:
>>
>> I am pretty new to using LMM and I have found the following situation
>> bewildering as I was trying to do diagnostics with my fitted model: my
>> conditional residuals correlated highly with the fitted values.
>>
>> I have a dataset with multiple families, each with 1-4 siblings. I am
>> trying to regress Y onto explanatory variables (EVs) including Drink,
>> Gender, and Age, with a random intercept for family. This is the model
>> I used:
>> library(lme4)
>> model <- lmer(Y ~ Drink * Gender + Age + (1 | Family_ID),
>>               data = data, REML = FALSE)
>>
>> After fitting the model, I used
>> plot(model)
>> to see the relationship between conditional residuals and fitted values. I
>> expected them to be uncorrelated and the residuals to be homoscedastic.
>>
>> Yet to my surprise, there is a high correlation (~0.5) between the
>> residuals and the fitted values (see here <http://imgur.com/pPsG4aR>). I
>> know from GLM that this usually suggests nonlinear relationships between
>> the EVs and the DV.
>>
>> I read some online posts (post1
>>
>> <http://stats.stackexchange.com/questions/43566/strange-pattern-in-residual-plot-from-mixed-effect-model>
>> post2
>>
>> <http://stats.stackexchange.com/questions/168179/correlation-between-standardized-residuals-and-fitted-values-in-a-linear-mixed-e/168210#168210>)
>> that suggest this can result from a poor model fit. So I tried a few
>> different models, including: 1) log-transforming Drink, which is originally
>> positively skewed; 2) adding random slopes for Drink, Age, etc. None of
>> these changes led to a substantial difference in the correlation between
>> residuals and fitted values.
>>
>> Some other info:
>> 1) my overall model fit is not poor, as indicated by the correlation
>> between fitted values and Y, which is around 0.8;
>> 2) most variables in my model have a normal, or at least symmetrical,
>> distribution;
>> 3) conditional residuals are normally distributed, as shown in Q-Q plots;
>> 4) conditional residuals are not correlated with any of the fixed-effect
>> predictors, such as Drink or Age.
>>
>> I have two guesses as to what is going on:
>> 1) maybe the fact that families differ in size actually violates
>> assumptions of the model?
>> 2) or maybe there is something wrong with estimation of the random effect
>> (family intercept)?
>>
>> I'd really appreciate your insights as to what is going on here and
>> whether there are any problems with my model.
>>
>> Thank you very much,
>> Cherry
>>
>>         [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models