[R-sig-ME] Testing for non-linearity and heterogeneity

Mon Jan 19 19:55:43 CET 2009

It appears that you want to model nonlinearity with a polynomial
function, that is test whether  the regression of Q on M is better
described by a quadratic than a linear relation. In addition, you test
whether there are reliable differences between units in the linear and
possibly also the quadratic trends.

Such an analysis can be nicely illustrated with the  "sleepstudy" data.

> fm1 <- lmer(Reaction ~ poly(Days,2) + (1 | Subject), sleepstudy)
> fm2 <- lmer(Reaction ~ poly(Days,2) + (poly(Days,1) | Subject), sleepstudy)
> fm3 <- lmer(Reaction ~ poly(Days,2) + (poly(Days,2) | Subject), sleepstudy)

Use poly() to specify the degree of the polynomial. fm1 allows only
varying intercepts, fm2 allows varying intercepts and varying linear
slopes, and fm3 allows varying intercepts, and varying linear and
quadratic trends.

You can use anova() to check whether addition of varying linear trends
and varying quadratic trends leads to significant improvement in
goodness of fit:
> anova(fm1,fm2,fm3)
Data: sleepstudy
Models:
fm1: Reaction ~ poly(Days, 2) + (1 | Subject)
fm2: Reaction ~ poly(Days, 2) + (poly(Days, 1) | Subject)
fm3: Reaction ~ poly(Days, 2) + (poly(Days, 2) | Subject)
    Df     AIC     BIC  logLik  Chisq Chi Df Pr(>Chisq)
fm1  5 1802.96 1818.92 -896.48
fm2  7 1764.32 1786.67 -875.16 42.642       2    5.5e-10 ***
fm3 10 1757.68 1789.61 -868.84 12.639      3   0.005487 **

Obviously, it does: AIC and BIC decline, logLik grows significantly.
Then, you inspect the CMs (conditional means for LMM; conditional
modes for GLMM and NLMM) with

> dotplot(ranef(fm3, postVar=TRUE)

This will show you that the subject 332 is a bit of an outlier for the
quadratic CMs. Once you remove this subject, there is no significant
reliable between-subject variance for the quadratic trend.

> fm1.2 <- lmer(Reaction ~ poly(Days,2) + (1 | Subject), sleepstudy, subset=Subject != 332)
> fm2.2 <- lmer(Reaction ~ poly(Days,2) + (poly(Days,1) | Subject), sleepstudy, subset=Subject != 332)
> fm3.2 <- lmer(Reaction ~ poly(Days,2) + (poly(Days,2) | Subject), sleepstudy, subset=Subject != 332)

> anova(fm1.2, fm2.2, fm3.2)
      Df    AIC    BIC logLik   Chisq Chi Df Pr(>Chisq)
fm1.2  5 1676.6 1692.3 -833.3
fm2.2  7 1618.4 1640.3 -802.2 62.2002      2  3.115e-14 ***
fm3.2 10 1622.2 1653.6 -801.1  2.1971      3     0.5325

Interestingly, the fixed-effect quadratic trend is not significant
for the original data set, but after removal of Subject 332, it is
consistently so. Thus, if there were an independent reason that
justifies exclusion of Subject 332, Reaction appears to follow a
quadratic trend over days but only the mean (intercept) and the linear
trend across days varies reliably between subjects. (This analysis is
only meant as an illustration. I did not check carefully whether this
conclusion holds up under close scrutiny.)

Please note that with the above model specification the number of
random effects grows very quickly. You should not hold lmer
responsible if it fails to converge for your data. Douglas Bates
provided some alternative specifications in an earlier post to keep
the number of random effects limited. Generally, I think that any
question that appears to require a follow-up analysis of CMs can be
rephrased in such a way that it is appropriately represented in the
model from the outset.

Reinhold Kliegl

On Mon, Jan 19, 2009 at 1:36 PM, Nick Isaac <njbisaac at googlemail.com> wrote:
> Dear list members,
>
> I would appreciate some advice on how to model nonlinearity in my dataset.
>
> The dataset contains >1000 observations on variables Q and M. The
> observations (actually species) are partitioned into around 50 groups. Each
> group spans only a small part of the total range of M. The principle
> research focus is on the relationship between Q and M, and whether this
> relationship displays heterogenity among groups. This is relatively easy to
> explore using:
>
> m1 <- lmer( Q ~ M +  (1|g) )
> m2 <- lmer( Q ~ M +  (M|g) )
>
> So far so good. However, I also want to test some recent theories that Q~M
> should vary with M, particularly that the relationship between slope and M
> might be negative curvilinear.
>
> Until recently, I thought that it would be appropriate to extract the
> conditional modes from the model and seek correlations separately. However,
> recent posts by Reinhold Kliegl on this list have made it clear that this
> would not be appropriate.
>
> So I've been wondered how to model this in lmer. I was thinking of fitting
> 2nd order polynomials to model the curvilinearity:
>
> m3 <- lmer( Q ~ M + I((M+20)^2) +  (M|g) ) #M is centered on it's mean, so I
> need to add 20 to make all values positive
> m4 <- lmer( Q ~ M + I((M+20)^2) +  (M + I((M+20)^2)|g) ) )
> m5 <- lmer( Q ~ M + I((M+20)^2) + (M|g) + (I((M+20)^2)|g) ) )
>
> I'm not sure how I would interpret differences in the fit of these models,
> and I'm sure that other (probably better) solutions exist.
>
> It has also been proposed that variance in the Q~M relationship should
> decline with increasing M. Is this a question about heteroscedasticity that
> I need to model explicitly (the residuals of m2 are quite well-behaved)?
>
> I would be extremely grateful for any advice on this topic.
>
> Best wishes, Nick
>
>        [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>