[R] mgcv: inclusion of random intercept in model - based on p-value of smooth or anova?
Simon Wood
s.wood at bath.ac.uk
Wed May 23 11:28:20 CEST 2012
Having looked at this further, I've made some changes in mgcv_1.7-17 to
the p-value computations for terms that can be penalized to zero during
fitting (e.g. s(x,bs="re"), s(x,m=1) etc).
The Wald statistic based p-values from summary.gam and anova.gam (i.e.
what you get from e.g. anova(a) where a is a fitted gam object) are
quite well founded for smooth terms that are non-zero under full
penalization (e.g. a cubic spline is a straight line under full
penalization). For such smooths, an extension of Nychka's (1988) result
on CI's for splines gives a well founded distributional result on which
to base a Wald statistic. However, the Nychka result requires the
smoothing bias to be substantially less than the smoothing estimator
variance, and this will often not be the case if smoothing can actually
penalize a term to zero (to understand why, see argument in appendix of
Marra & Wood, 2012, Scandinavian Journal of Statistics, 39,53-74).
Simulation testing shows that this theoretical concern has serious
practical consequences. So for terms that can be penalized to zero,
alternative approximations have to be used, and these are now
implemented in mgcv_1.7-17 (see ?summary.gam).
The approximate test performed by anova(a,b) (a and b are fitted "gam"
objects) is less well founded. It is a reasonable approximation when
each smooth term in the models could in principle be well approximated
by an unpenalized term of rank approximately equal to the edf of the
smooth term, but otherwise the p-values produced are likely to be much
too small. In particular simulation testing suggests that the test is
not to be trusted with s(...,bs="re") terms, and can be poor if the
models being compared involve any terms that can be penalized to zero
during fitting. (Although the mechanisms are a little different, this is
similar to the problem we would have if the models were viewed as
regular mixed models and we tried to use a GLRT to test variance
components for equality to zero).
These issues are now documented in ?anova.gam and ?summary.gam...
Simon
On 08/05/12 15:01, Martijn Wieling wrote:
> Dear useRs,
>
> I am using mgcv version 1.7-16. When I create a model with a few
> non-linear terms and a random intercept for (in my case) country using
> s(Country,bs="re"), the representative line in my model (i.e.
> approximate significance of smooth terms) for the random intercept
> reads:
> edf Ref.df F p-value
> s(Country) 36.127 58.551 0.644 0.982
>
> Can I interpret this as there being no support for a random intercept
> for country? However, when I compare the simpler model to the model
> including the random intercept, the latter appears to be a significant
> improvement.
>
>> anova(gam1,gam2,test="F")
> Model 1: ....
> Model 2: .... + s(BirthNation, bs="re")
> Resid. Df Resid. Dev Df Deviance F Pr(>F)
> 1 789.44 416.54
> 2 753.15 373.54 36.292 43.003 2.3891 1.225e-05 ***
>
> I hope somebody could help me in how I should proceed in these
> situations. Do I include the random intercept or not?
>
> I also have a related question. When I used to create a mixed-effects
> regression model using lmer and included e.g., an interaction in the
> fixed-effects structure, I would test if the inclusion of this
> interaction was warranted using anova(lmer1,lmer2). It then would show
> me that I invested 1 additional df and the resulting (possibly
> significant) improvement in fit of my model.
>
> This approach does not seem to work when using gam. In this case an
> apparent investment of 1 degree of freedom for the interaction, might
> result in an actual decrease of the degrees of freedom invested by the
> total model (caused by a decrease of the edf's of splines in the model
> with the interaction). In this case, how would I proceed in
> determining if the model including the interaction term is better?
>
> With kind regards,
> Martijn Wieling
>
> --
> *******************************************
> Martijn Wieling
> http://www.martijnwieling.nl
> wieling at gmail.com
> +31(0)614108622
> *******************************************
> University of Groningen
> http://www.rug.nl/staff/m.b.wieling
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603 http://people.bath.ac.uk/sw283
More information about the R-help
mailing list