[R] mgcv: inclusion of random intercept in model - based on p-value of smooth or anova?

Wed May 23 10:59:04 CEST 2012

Martijn,

I don't think there is one right answer to this. If you look at things 
in the way that one would usually view a smooth model then m2 is both 
simpler (lower EDF) and fits better, so is simply a better model (if the 
simpler model fits better then why would you not use it?).
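
For anyone following along without the original data, the EDF and fit comparison can be sketched as below. This is only an illustration: `dat` is simulated here (the group count, effect size and noise level are my assumptions, not Martijn's data), and residual deviance is used as one convenient measure of fit.

```r
library(mgcv)

set.seed(1)
# Hypothetical stand-in for 'dat': 30 groups with 10 observations each
n_grp <- 30
dat <- data.frame(X = factor(rep(seq_len(n_grp), each = 10)))
dat$A <- rnorm(nrow(dat))
b <- rnorm(n_grp)  # simulated random intercepts, one per level of X
dat$Y <- 0.1 * dat$A + b[as.integer(dat$X)] + rnorm(nrow(dat))

m1 <- gam(Y ~ s(X, bs = "re"), data = dat)
m2 <- gam(Y ~ A + s(X, bs = "re"), data = dat)

# Total effective degrees of freedom and residual deviance per model
cmp <- data.frame(
  model    = c("m1", "m2"),
  edf      = c(sum(m1$edf), sum(m2$edf)),
  deviance = c(deviance(m1), deviance(m2))
)
print(cmp)
```

With real data, the row with the lower edf and the lower deviance is the model that is "simpler and fits better" in the sense described above.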

But of course `simpler' depends on whether you view the random effect as 
counting for one parameter, or for its `effective degrees of freedom'. 
If it's the former then you should probably fit models using 
method="ML" and compare via a generalized likelihood ratio test (GLRT) 
using the ML score, or simply drop the fixed effect if its p-value 
according to anova(m2) is too high.
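
A minimal sketch of that ML-based comparison, under stated assumptions: `dat` is simulated, the ML score is read from the `gcv.ubre` component of the fitted object (where mgcv stores the minimised negative log marginal likelihood when method="ML"), and the test uses 1 degree of freedom because the models differ only by the single coefficient for A. The names `glrt_stat` and `p_value` are just illustrative.

```r
library(mgcv)

set.seed(1)
# Hypothetical stand-in for 'dat': 30 groups with 10 observations each
n_grp <- 30
dat <- data.frame(X = factor(rep(seq_len(n_grp), each = 10)))
dat$A <- rnorm(nrow(dat))
b <- rnorm(n_grp)  # simulated random intercepts, one per level of X
dat$Y <- 0.1 * dat$A + b[as.integer(dat$X)] + rnorm(nrow(dat))

# ML (not REML), because the two models differ in their fixed effects
m1 <- gam(Y ~ s(X, bs = "re"), data = dat, method = "ML")
m2 <- gam(Y ~ A + s(X, bs = "re"), data = dat, method = "ML")

# gcv.ubre holds the negative log marginal likelihood for method = "ML",
# so twice the difference is the likelihood ratio statistic.
# max(0, .) guards against a tiny negative value from optimiser tolerance.
glrt_stat <- max(0, 2 * (as.numeric(m1$gcv.ubre) - as.numeric(m2$gcv.ubre)))
p_value <- pchisq(glrt_stat, df = 1, lower.tail = FALSE)
cat("GLRT statistic:", glrt_stat, " p-value:", p_value, "\n")

# Alternative route: the p-value for A from the single-model anova
print(anova(m2))
```

Both routes treat the random effect as present in each model and ask only whether A is worth keeping.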

I would not use anova(m1,m2) in this case, because of the difficulty in 
interpreting the random effects as being equivalent to un-penalized 
effects with rank equal to the random effect edfs.

best,
Simon

On 11/05/12 17:50, Martijn Wieling wrote:
> Dear Simon,
>
> Thanks for your concise reply, this is very helpful.
>
> With respect to my second question, however, I was not entirely clear
> - or perhaps I'm misunderstanding your answer. What I meant is:
> suppose I have a model with a random effect s(X, bs="re"). Now I want
> to test if a certain (fixed-effect) predictor A improves the model.
>
> I therefore compare:
> m1 = gam(Y ~ s(X,bs="re"), data=dat)
> m2 = gam(Y ~ A + s(X,bs="re"), data=dat)
>
> What I didn't make explicit before is that A in the model summary of
> m2 does not reach significance (e.g., p = 0.2). Comparing the models
> m1 and m2 shows that m1 is the more complex model (as adding A
> decreases the edf invested in the ranef spline by more than 1),
> and m1 is not significantly better than m2. Now my question is, should
> I keep m2, even though A is not significant itself? Or should I ignore
> the result of anova(m1,m2) anyway, given that this comparison is not
> suitable when comparing models including random effects (as you argue
> regarding my first question)?
>
> If that is the case and the anova is not usable to compare m1 and m2
> due to the random effect parameter, note that the same can occur
> without random effects but when a non-linearity is included such as
> s(Longitude,Latitude). What then is appropriate: keep m1 (which is
> more complex), or use m2 (which has a less complex non-linearity, but
> includes an additional non-significant fixed-effect factor)?
>
> With kind regards,
> Martijn

-- 
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603               http://people.bath.ac.uk/sw283