[R-sig-ME] LMM diagnostics: conditional residuals correlated highly with fitted values

Wed Oct 7 17:29:00 CEST 2015

Y is a brain measure that has been standardized. A histogram of Y is here:
http://imgur.com/Um8yyuu

I am confused about the "Y must be non-negative and the dataset
contains observations close to 0" part. Is that a requirement for
Y? If so, then my model could be wrong.
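Standardizing a variable does not remove a lower boundary. It only moves it. If the raw brain measure cannot be negative, its z-scores cannot fall below -mean/sd of the raw values. The following is a minimal R sketch of that point. It uses a hypothetical gamma-distributed measure as a stand-in for the real data, which are not available here. The names `raw`, `z` and `lower_bound` are illustrative only.

```r
# Hypothetical raw measure that cannot be negative (stand-in for the real data)
set.seed(42)
raw <- rgamma(379, shape = 2, rate = 1)

# Standardize to mean 0 and standard deviation 1
z <- as.numeric(scale(raw))

# A raw value of 0 maps to this z-score, so it becomes the new lower boundary
lower_bound <- -mean(raw) / sd(raw)
print(c(min_z = min(z), lower_bound = lower_bound))

# Density plot of the standardized values with the boundary marked
if (interactive()) {
  plot(density(z), main = "Standardized measure")
  abline(v = lower_bound, lty = 2)
}
```

A useful check on the real data is therefore how close min(Y) lies to the standardized position of the raw measure's natural boundary, if it has one.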

On Wed, Oct 7, 2015 at 10:15 AM, Thierry Onkelinx
<thierry.onkelinx at inbo.be> wrote:
> Can you elaborate on what Y is? Does it have a lower boundary? And if so, do
> you have observations near that boundary? E.g. Y must be non-negative and
> the dataset contains observations close to 0. A density plot would be useful.
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
> Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
>
> To call in the statistician after the experiment is done may be no more than
> asking him to perform a post-mortem examination: he may be able to say what
> the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
>
> 2015-10-07 17:09 GMT+02:00 Yizhou Ma <maxxx848 at umn.edu>:
>>
>> Hi Thierry,
>>
>> Thank you for your reply and sorry for the HTML thing. Below is my
>> summary(model) output.
>>
>> Y, Drink, and Age are continuous variables.
>> Gender is a factor with levels F and M.
>> Family_ID is a factor.
>>
>> Linear mixed model fit by maximum likelihood  ['lmerMod']
>> Formula: Y ~ Drink * Gender + Age + (1 | Family_ID)
>>    Data: data
>>
>>      AIC      BIC   logLik deviance df.resid
>>   1046.4   1074.0   -516.2   1032.4      372
>>
>> Scaled residuals:
>>      Min       1Q   Median       3Q      Max
>> -2.67228 -0.56085 -0.02968  0.66166  2.91452
>>
>> Random effects:
>>  Groups    Name        Variance Std.Dev.
>>  Family_ID (Intercept) 0.3550   0.5958
>>  Residual              0.6162   0.7850
>> Number of obs: 379, groups:  Family_ID, 189
>>
>> Fixed effects:
>>                Estimate Std. Error t value
>> (Intercept)     1.10309    0.43921   2.511
>> Drink           0.16425    0.08031   2.045
>> Gender.M       -0.19364    0.10874  -1.781
>> Age            -0.03377    0.01489  -2.268
>> Drink:Gender.M -0.13647    0.10681  -1.278
>>
>> Correlation of Fixed Effects:
>>          (Intr) Drnk   Gndr.M Age
>> Drink    -0.098
>> Gender.M -0.040 -0.249
>> Age      -0.985  0.158 -0.054
>> Drnk:G.M  0.042 -0.737 -0.021 -0.085
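The two variance components in this output can be summarized as an intraclass correlation, which is the share of the total variance that lies between families. The following is a short R calculation from the printed values. The variable names are illustrative.

```r
# Variance components copied from the summary(model) output above
var_family <- 0.3550   # Family_ID (Intercept)
var_resid  <- 0.6162   # Residual

# Intraclass correlation: between-family share of the total variance
icc <- var_family / (var_family + var_resid)

# Average number of observations per family (379 obs, 189 families)
obs_per_family <- 379 / 189

print(round(c(icc = icc, obs_per_family = obs_per_family), 3))
```

About 37% of the variance lies between families. However, each family contributes only about two observations on average, so each family intercept is predicted from very little data.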
>>
>> Thank you very much,
>> Cherry
>>
>> On Wed, Oct 7, 2015 at 5:14 AM, Thierry Onkelinx
>> <thierry.onkelinx at inbo.be> wrote:
>> > Dear Cherry,
>> >
>> > Please don't post in HTML. Have a look at the posting guide.
>> >
>> > You'll need to provide more information. What is the class of each
>> > variable (continuous, count, presence/absence, factor, ...)? What is
>> > the output of summary(model)?
>> >
>> > Best regards,
>> >
>> > 2015-10-06 17:15 GMT+02:00 Yizhou Ma <maxxx848 at umn.edu>:
>> >>
>> >> Dear LMM experts:
>> >>
>> >> I am new to LMMs, and I have found the following situation
>> >> bewildering while running diagnostics on my fitted model: my
>> >> conditional residuals are highly correlated with the fitted values.
>> >>
>> >> I have a dataset with multiple families, each with 1-4 siblings. I am
>> >> trying to regress Y on the explanatory variables (EVs) Drink, Gender,
>> >> and Age, with a random intercept for family. This is the model I used:
>> >> library(lme4)
>> >> model <- lmer(Y ~ Drink * Gender + Age + (1 | Family_ID),
>> >>               data = data, REML = FALSE)
>> >>
>> >> After fitting the model, I used
>> >> plot(model)
>> >> to look at the relationship between the conditional residuals and the
>> >> fitted values. I expected them to be uncorrelated and homoscedastic.
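For comparison, a plot like this can show a positive correlation between residuals and fitted values even when the model is correctly specified. The family intercepts are shrunk toward zero, and the part of each family effect that they do not capture stays in the residual.

The sketch below, in base R, simulates data from a random-intercept model. It then computes the fixed effects and conditional modes by solving Henderson's mixed model equations with the true variance components. This is a simplification: lmer estimates those components rather than knowing them. Several elements are hypothetical:

- the family sizes
- the covariate `x`
- the fixed-effect coefficients

The two variance components are taken from the summary output in this thread.

```r
set.seed(1)
n_fam <- 189
sizes <- sample(1:4, n_fam, replace = TRUE)     # hypothetical family sizes
fam <- factor(rep(seq_len(n_fam), times = sizes))
n <- length(fam)

sigma2_b <- 0.355                               # between-family variance
sigma2_e <- 0.616                               # residual variance
x <- rnorm(n)                                   # hypothetical covariate
b <- rnorm(n_fam, sd = sqrt(sigma2_b))          # true family intercepts
y <- 1 + 0.2 * x + b[fam] + rnorm(n, sd = sqrt(sigma2_e))

# Henderson's mixed model equations with known variance ratio k
X <- cbind(1, x)
Z <- model.matrix(~ fam - 1)
k <- sigma2_e / sigma2_b
C <- rbind(cbind(crossprod(X), crossprod(X, Z)),
           cbind(crossprod(Z, X), crossprod(Z) + k * diag(n_fam)))
rhs <- c(crossprod(X, y), crossprod(Z, y))
sol <- solve(C, rhs)
beta_hat <- sol[1:2]
b_hat <- sol[-(1:2)]                            # shrunken family intercepts

# Conditional fitted values and residuals
fitted_cond <- drop(X %*% beta_hat + Z %*% b_hat)
resid_cond <- y - fitted_cond
r_mixed <- cor(fitted_cond, resid_cond)

# Ordinary least squares for comparison: residuals are orthogonal to fits
ols <- lm(y ~ x)
r_ols <- cor(fitted(ols), resid(ols))

print(c(mixed = r_mixed, ols = r_ols))
```

In this simulation the mixed-model correlation comes out positive, even though the data were generated from the fitted model itself. The least-squares correlation is zero up to rounding error.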
>> >>
>> >> To my surprise, there is a strong correlation (~0.5) between the
>> >> residuals and the fitted values (see <http://imgur.com/pPsG4aR>). From
>> >> the GLM, I know that this usually suggests a nonlinear relationship
>> >> between the EVs and the DV.
>> >>
>> >> I read some online posts (post1
>> >> <http://stats.stackexchange.com/questions/43566/strange-pattern-in-residual-plot-from-mixed-effect-model>,
>> >> post2
>> >> <http://stats.stackexchange.com/questions/168179/correlation-between-standardized-residuals-and-fitted-values-in-a-linear-mixed-e/168210#168210>)
>> >> suggesting that this can result from poor model fit. So I tried a few
>> >> different models, including 1) log-transforming Drink, which is
>> >> positively skewed on its original scale, and 2) adding random slopes for
>> >> Drink, Age, etc. None of these changes substantially reduced the
>> >> correlation between the residuals and the fitted values.
>> >>
>> >> Some other info:
>> >> 1) my overall model fit is not poor, as indicated by the correlation
>> >> between the fitted values and Y, which is around 0.8;
>> >> 2) most variables in my model have a normal, or at least symmetric,
>> >> distribution;
>> >> 3) the conditional residuals are normally distributed, as shown in
>> >> Q-Q plots;
>> >> 4) the conditional residuals are not correlated with any of the fixed
>> >> effects, such as Drink or Age.
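Points 4) and the residual plot can both be read off a short derivation. Write the model as y = Xβ + Zb + ε, with b ~ N(0, σ_b² I) and ε ~ N(0, σ_e² I). The conditional residuals are e = y - Xβ̂ - Zb̂ and the fitted values are ŷ = Xβ̂ + Zb̂.

The derivation assumes that the variance components are held at their estimates, which is the situation in which the fixed effects and the conditional modes are computed. In that case they satisfy Henderson's mixed model equations:

```latex
\begin{aligned}
X^\top X\,\hat\beta + X^\top Z\,\hat b &= X^\top y \\
Z^\top X\,\hat\beta + \Bigl(Z^\top Z + \tfrac{\sigma_e^2}{\sigma_b^2} I\Bigr)\hat b &= Z^\top y \\[4pt]
\Longrightarrow\quad X^\top e = 0, \qquad Z^\top e &= \tfrac{\sigma_e^2}{\sigma_b^2}\,\hat b \\[4pt]
\hat y^\top e = \hat\beta^\top X^\top e + \hat b^\top Z^\top e &= \tfrac{\sigma_e^2}{\sigma_b^2}\,\lVert \hat b \rVert^2 \;\ge\; 0
\end{aligned}
```

Because X contains an intercept column, the residuals sum to zero. The sample covariance of ŷ and e is therefore proportional to ŷᵀe, and it is positive whenever any predicted family intercept is non-zero.

The identity Xᵀe = 0 also explains why the residuals are uncorrelated with Drink and Age, which are columns of X. In ordinary least squares there is no Zb̂ term, so ŷᵀe = 0.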
>> >>
>> >> I have two guesses as to what is going on:
>> >> 1) maybe the fact that families differ in size violates the
>> >> assumptions of the model?
>> >> 2) or maybe there is something wrong with the estimation of the random
>> >> effect (the family intercept)?
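Family size enters a random-intercept model directly, assuming the variance components are fixed at their estimates. The predicted intercept for family i equals that family's mean deviation from the fixed-effects prediction, multiplied by a shrinkage factor:

λ_i = n_i σ_b² / (n_i σ_b² + σ_e²)

The fraction 1 - λ_i of the deviation stays in the conditional residuals. The following is a small R calculation using the variance components from the summary output, for families of 1 to 4 siblings. The variable names are illustrative.

```r
# Variance components from the summary(model) output
sigma2_b <- 0.3550   # between-family (Family_ID intercept) variance
sigma2_e <- 0.6162   # residual variance

# Shrinkage factor for a family with n_i members:
# predicted intercept = lambda_i * (family's mean deviation from X %*% beta)
n_i <- 1:4
lambda <- n_i * sigma2_b / (n_i * sigma2_b + sigma2_e)

print(round(setNames(lambda, paste0("n=", n_i)), 3))

# Share of each family's deviation that remains in the conditional residuals
print(round(setNames(1 - lambda, paste0("n=", n_i)), 3))
```

For a family with a single member, roughly 63% of its deviation remains in the residuals. For a family of four siblings, about 30% remains. Small families therefore leave more of their family effect in the residuals than large ones.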
>> >>
>> >> I'd really appreciate your insights into what is going on here and
>> >> whether there are any problems with my model.
>> >>
>> >> Thank you very much,
>> >> Cherry
>> >>
>> >> _______________________________________________
>> >> R-sig-mixed-models at r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models