[R-sig-ME] LMM diagnostics: conditional residuals correlated highly with fitted values

Ulf Köther ukoether at uke.de
Wed Oct 7 18:38:01 CEST 2015


Dear Cherry,

maybe the correlation between your fitted values and the residuals comes
from something like a non-linear effect of Age or Drink on Y? (By the
way, judging from the first plot you posted, the correlation did not
seem that excessive to me, the r = 0.5 value notwithstanding, though I
might be totally wrong about that!) To test this in a half-formal way,
try this:

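library(mgcv)
# Scaled conditional residuals from the fitted lmer model
Res1 <- resid(model, scaled = TRUE)
# Smooth the residuals over Age (variable names as in your summary output);
# assumes no rows of `data` were dropped because of missing values
L1 <- gam(Res1 ~ s(Age), data = data)
plot(L1, xlab = "Age")
points(x = data$Age, y = Res1)
abline(h = 0)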

...and then the same for Drink. If there is no remaining non-linear Age
effect in the residuals, the smoother should stay close to the
horizontal line at 0 across all Age values, and the p-value of the
smoother (see summary(L1)) should indicate a non-significant Age effect.
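
As a rough illustration of the same check, here is a minimal sketch on
simulated data (not your data). The data frame sim, the family effects
fam_eff, the result vector smooth_p and the quadratic Age effect are all
made up for the example, and lme4 and mgcv are assumed to be installed.
It fits a model with only a linear Age term, smooths the scaled
residuals against Age and Drink in turn, and pulls the approximate
p-value of each smoother out of summary().

```r
library(lme4)
library(mgcv)

set.seed(1)
# Hypothetical simulated data: 100 families with 3 siblings each
n_fam <- 100
sim <- data.frame(
  Family_ID = factor(rep(seq_len(n_fam), each = 3)),
  Age = runif(3 * n_fam, 20, 40),
  Drink = rnorm(3 * n_fam)
)
fam_eff <- rnorm(n_fam, sd = 0.6)
# Y depends on Age quadratically, so a linear Age term is misspecified
sim$Y <- 0.2 * sim$Drink + 0.02 * (sim$Age - 30)^2 +
  fam_eff[as.integer(sim$Family_ID)] + rnorm(nrow(sim), sd = 0.8)

# Random-intercept model with (wrongly) linear Age
fit <- lmer(Y ~ Drink + Age + (1 | Family_ID), data = sim, REML = FALSE)
sim$Res1 <- resid(fit, scaled = TRUE)

# Smooth the residuals against each covariate in turn and collect the
# approximate p-value of the smooth term
smooth_p <- sapply(c("Age", "Drink"), function(v) {
  g <- gam(reformulate(sprintf("s(%s)", v), response = "Res1"), data = sim)
  summary(g)$s.table[1, "p-value"]
})
print(smooth_p)
```

With a curved Age effect built into the simulation, the Age smoother
should bend clearly away from 0. Keep in mind that these p-values are
approximate and that the gam() fit ignores the family grouping, so I
would treat them as a diagnostic hint rather than a formal test.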

Good luck,

Ulf




On 07.10.2015 at 18:05, Yizhou Ma wrote:
> Hi Thierry,
> 
> Thank you for clarifying. I agree that high skewness can lead to a
> nonlinear relationship that cannot be properly modeled in linear
> models.
> 
> I have plotted the residuals against all my fixed factors and I cannot
> find any nonlinear relationship. It is possible that I am missing an
> important covariate though.
> 
> Thanks a lot,
> Cherry
> 
> 
> On Wed, Oct 7, 2015 at 10:54 AM, Thierry Onkelinx
> <thierry.onkelinx at inbo.be> wrote:
>> My example is not a requirement of a LMM but rather an example of a
>> distribution of a variable which can cause trouble with a LMM. Think of an
>> area. An area cannot be negative. This can cause artefacts in the
>> residuals when you have lots of values near zero. Have a look at this
>> example.
>>
>> n <- 200
>> dataset <- data.frame(
>>   X = runif(n)
>> )
>> dataset$eta <- -.1 + 3 * dataset$X
>> dataset$Y <- rpois(n, lambda = exp(dataset$eta))
>> # wrong analysis for this kind of data, here just an illustration of the problem
>> model <- lm(Y ~ X, data = dataset)
>> plot(fitted(model), resid(model))
>>
>> But this doesn't seem to be the problem in your case.
>>
>> I would recommend that you see if there are patterns in the residuals when
>> you plot them against the covariates. Maybe you are missing an interaction
>> or even an important covariate.
>>
>> Best regards,
>>
>>
>> ir. Thierry Onkelinx
>> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
>> Forest
>> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
>> Kliniekstraat 25
>> 1070 Anderlecht
>> Belgium
>>
>> To call in the statistician after the experiment is done may be no more than
>> asking him to perform a post-mortem examination: he may be able to say what
>> the experiment died of. ~ Sir Ronald Aylmer Fisher
>> The plural of anecdote is not data. ~ Roger Brinner
>> The combination of some data and an aching desire for an answer does not
>> ensure that a reasonable answer can be extracted from a given body of data.
>> ~ John Tukey
>>
>> 2015-10-07 17:29 GMT+02:00 Yizhou Ma <maxxx848 at umn.edu>:
>>>
>>> Y is a brain measure that has been standardized. A histogram of Y is here:
>>> http://imgur.com/Um8yyuu
>>>
>>> I am confused about the "Y must be non-negative and the dataset
>>> contains observations close to 0" part. Is that a requirement for
>>> Y? If so, then my model could be wrong.
>>>
>>> On Wed, Oct 7, 2015 at 10:15 AM, Thierry Onkelinx
>>> <thierry.onkelinx at inbo.be> wrote:
>>>> Can you elaborate on what Y is? Does it have a lower boundary? And if so,
>>>> do you have observations near that boundary? E.g. Y must be non-negative
>>>> and the dataset contains observations close to 0. A density plot would be
>>>> useful.
>>>>
>>>> 2015-10-07 17:09 GMT+02:00 Yizhou Ma <maxxx848 at umn.edu>:
>>>>>
>>>>> Hi Thierry,
>>>>>
>>>>> Thank you for your reply and sorry for the HTML thing. Below is my
>>>>> summary(model) output.
>>>>>
>>>>> Y, Drink, and Age are continuous variables
>>>>> Gender is F & M.
>>>>> Family_ID is a factor.
>>>>>
>>>>> Linear mixed model fit by maximum likelihood  ['lmerMod']
>>>>> Formula: Y ~ Drink * Gender + Age + (1 | Family_ID)
>>>>>    Data: data
>>>>>
>>>>>      AIC      BIC   logLik deviance df.resid
>>>>>   1046.4   1074.0   -516.2   1032.4      372
>>>>>
>>>>> Scaled residuals:
>>>>>      Min       1Q   Median       3Q      Max
>>>>> -2.67228 -0.56085 -0.02968  0.66166  2.91452
>>>>>
>>>>> Random effects:
>>>>>  Groups    Name        Variance Std.Dev.
>>>>>  Family_ID (Intercept) 0.3550   0.5958
>>>>>  Residual              0.6162   0.7850
>>>>> Number of obs: 379, groups:  Family_ID, 189
>>>>>
>>>>> Fixed effects:
>>>>>                Estimate Std. Error t value
>>>>> (Intercept)     1.10309    0.43921   2.511
>>>>> Drink           0.16425    0.08031   2.045
>>>>> Gender.M       -0.19364    0.10874  -1.781
>>>>> Age            -0.03377    0.01489  -2.268
>>>>> Drink:Gender.M -0.13647    0.10681  -1.278
>>>>>
>>>>> Correlation of Fixed Effects:
>>>>>          (Intr) Drnk   Gndr.M Age
>>>>> Drink    -0.098
>>>>> Gender.M -0.040 -0.249
>>>>> Age      -0.985  0.158 -0.054
>>>>> Drnk:G.M  0.042 -0.737 -0.021 -0.085
>>>>>
>>>>> Thank you very much,
>>>>> Cherry
>>>>>
>>>>> On Wed, Oct 7, 2015 at 5:14 AM, Thierry Onkelinx
>>>>> <thierry.onkelinx at inbo.be> wrote:
>>>>>> Dear Cherry,
>>>>>>
>>>>>> Please don't post in HTML. Have a look at the posting guide.
>>>>>>
>>>>>> You'll need to provide more information. What is the class of each
>>>>>> variable (continuous, count, presence/absence, factor, ...)? What is the
>>>>>> output of summary(model)?
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> 2015-10-06 17:15 GMT+02:00 Yizhou Ma <maxxx848 at umn.edu>:
>>>>>>>
>>>>>>> Dear LMM experts:
>>>>>>>
>>>>>>> I am pretty new to using LMM and I have found the following situation
>>>>>>> bewildering as I was trying to do diagnostics with my fitted model: my
>>>>>>> conditional residuals correlated highly with the fitted values.
>>>>>>>
>>>>>>> I have a dataset with multiple families, each with 1-4 siblings. I am
>>>>>>> trying to regress Y onto EVs including Drink, Gender, & Age, while using
>>>>>>> a random intercept for family. This is the model I used:
>>>>>>> model <- lmer(Y ~ Drink * Gender + Age + (1 | Family_ID), data, REML = FALSE)
>>>>>>>
>>>>>>> After fitting the model, I used
>>>>>>> plot(model)
>>>>>>> to see the relationship between conditional residuals and fitted values.
>>>>>>> I expected them to be uncorrelated and I expected to see homoscedasticity.
>>>>>>>
>>>>>>> Yet to my surprise there is a high correlation (~0.5) between the
>>>>>>> residuals and the fitted values (see here <http://imgur.com/pPsG4aR>). I
>>>>>>> know from GLM that this usually suggests nonlinear relationships between
>>>>>>> the EVs and the DV.
>>>>>>>
>>>>>>> I read some online posts (post1
>>>>>>> <http://stats.stackexchange.com/questions/43566/strange-pattern-in-residual-plot-from-mixed-effect-model>
>>>>>>> post2
>>>>>>> <http://stats.stackexchange.com/questions/168179/correlation-between-standardized-residuals-and-fitted-values-in-a-linear-mixed-e/168210#168210>)
>>>>>>> that suggest this can result from a poor model fit. So I tried a few
>>>>>>> different models, including: 1) log transforming Drink, which is
>>>>>>> originally positively skewed; 2) adding random slopes for Drink, Age,
>>>>>>> etc. None of these changes has led to a substantial difference in the
>>>>>>> correlation between residuals and fitted values.
>>>>>>>
>>>>>>> Some other info:
>>>>>>> 1) my overall model fit is not poor, as indicated by the correlation
>>>>>>> between fitted values & Y, which is around 0.8;
>>>>>>> 2) most variables in my model have a normal, or at least symmetrical,
>>>>>>> distribution;
>>>>>>> 3) conditional residuals are normally distributed as shown in qqplots;
>>>>>>> 4) conditional residuals are not correlated with any fixed effects, such
>>>>>>> as Drink or Age.
>>>>>>>
>>>>>>> I have two guesses as to what is going on:
>>>>>>> 1) maybe the fact that each family is a different size actually violates
>>>>>>> assumptions of the model?
>>>>>>> 2) or maybe there is something wrong with estimation of the random effect
>>>>>>> (family intercept)?
>>>>>>>
>>>>>>> I'd really appreciate your insights as to what is going on here and
>>>>>>> whether there are any problems with my model.
>>>>>>>
>>>>>>> Thank you very much,
>>>>>>> Cherry
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> R-sig-mixed-models at r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

