[R] mgcv: inclusion of random intercept in model - based on p-value of smooth or anova?

Mon Jun 11 15:56:32 CEST 2012

Dear Simon,

I ran an additional analysis using bam (mgcv 1.7-17) with three random
intercepts and no non-linearities, and compared these to the results
of lmer (lme4). Using bam results in a significant random intercept
(even though it has a very low edf-value), while the lmer results show
no variance associated to the random intercept of Placename. Should I
drop the random intercept of Placename and if so, how is this apparent
from the results of bam?

Summaries of both models are shown below.

With kind regards,
Martijn

#### l1 = lmer(RefPMIdistMeanLog.c ~ geogamfit + (1|Word) + (1|Key) +
(1|Placename), data=wrddst); print(l1,cor=F)

Linear mixed model fit by REML
Formula: RefPMIdistMeanLog.c ~ geogamfit + (1 | Word) + (1 | Key) + (1
|      Placename)
   Data: wrddst
    AIC    BIC logLik deviance REMLdev
 -44985 -44927  22498   -45009  -44997
Random effects:
 Groups    Name        Variance  Std.Dev.
 Word      (Intercept) 0.0800944 0.283009
 Key       (Intercept) 0.0013641 0.036933
 Placename (Intercept) 0.0000000 0.000000
 Residual              0.0381774 0.195390
Number of obs: 112608, groups: Word, 357; Key, 320; Placename, 40

Fixed effects:
            Estimate Std. Error t value
(Intercept) -0.00342    0.01513   -0.23
geogamfit    0.99249    0.02612   37.99

#### m1 = bam(RefPMIdistMeanLog.c ~ geogamfit + s(Word,bs="re") +
s(Key,bs="re") + s(Placename,bs="re"), data=wrddst,method="REML");
summary(m1,freq=F)

Family: gaussian
Link function: identity

Formula:
RefPMIdistMeanLog.c ~ geogamfit + s(Word, bs = "re") + s(Key,
    bs = "re") + s(Placename, bs = "re")

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.00342    0.01513  -0.226    0.821
geogamfit    0.99249    0.02612  37.991   <2e-16 ***
---
Signif. codes:  0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1

Approximate significance of smooth terms:
                   edf Ref.df       F p-value
s(Word)      3.554e+02    347 634.716  <2e-16 ***
s(Key)       2.946e+02    316  23.054  <2e-16 ***
s(Placename) 1.489e-04     38   7.282  <2e-16 ***
---
Signif. codes:  0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1

R-sq.(adj) =  0.693   Deviance explained = 69.4%
REML score = -22498  Scale est. = 0.038177  n = 112608

On Wed, May 23, 2012 at 11:30 AM, Simon Wood-4 [via R]
<ml-node+s789695n4631060h52 at n4.nabble.com> wrote:
> Having looked at this further, I've made some changes in mgcv_1.7-17 to
> the p-value computations for terms that can be penalized to zero during
> fitting (e.g. s(x,bs="re"), s(x,m=1) etc).
>
> The Wald statistic based p-values from summary.gam and anova.gam (i.e.
> what you get from e.g. anova(a) where a is a fitted gam object) are
> quite well founded for smooth terms that are non-zero under full
> penalization (e.g. a cubic spline is a straight line under full
> penalization). For such smooths, an extension of Nychka's (1988) result
> on CI's for splines gives a well founded distributional result on which
> to base a Wald statistic. However, the Nychka result requires the
> smoothing bias to be substantially less than the smoothing estimator
> variance, and this will often not be the case if smoothing can actually
> penalize a term to zero (to understand why, see argument in appendix of
> Marra & Wood, 2012, Scandinavian Journal of Statistics, 39,53-74).
>
> Simulation testing shows that this theoretical concern has serious
> practical consequences. So for terms that can be penalized to zero,
> alternative approximations have to be used, and these are now
> implemented in mgcv_1.7-17 (see ?summary.gam).
>
> The approximate test performed by anova(a,b) (a and b are fitted "gam"
> objects) is less well founded. It is a reasonable approximation when
> each smooth term in the models could in principle be well approximated
> by an unpenalized term of rank approximately equal to the edf of the
> smooth term, but otherwise the p-values produced are likely to be much
> too small. In particular simulation testing suggests that the test is
> not to be trusted with s(...,bs="re") terms, and can be poor if the
> models being compared involve any terms that can be penalized to zero
> during fitting. (Although the mechanisms are a little different, this is
> similar to the problem we would have if the models were viewed as
> regular mixed models and we tried to use a GLRT to test variance
> components for equality to zero).
>
> These issues are now documented in ?anova.gam and ?summary.gam...
>
> Simon
>
> On 08/05/12 15:01, Martijn Wieling wrote:
>
>> Dear useRs,
>>
>> I am using mgcv version 1.7-16. When I create a model with a few
>> non-linear terms and a random intercept for (in my case) country using
>> s(Country,bs="re"), the representative line in my model (i.e.
>> approximate significance of smooth terms) for the random intercept
>> reads:
>>                          edf       Ref.df     F          p-value
>> s(Country)       36.127 58.551   0.644    0.982
>>
>> Can I interpret this as there being no support for a random intercept
>> for country? However, when I compare the simpler model to the model
>> including the random intercept, the latter appears to be a significant
>> improvement.
>>
>>> anova(gam1,gam2,test="F")
>> Model 1: ....
>> Model 2: .... + s(BirthNation, bs="re")
>>    Resid. Df Resid. Dev     Df Deviance      F    Pr(>F)
>> 1    789.44     416.54
>> 2    753.15     373.54 36.292   43.003 2.3891 1.225e-05 ***
>>
>> I hope somebody could help me in how I should proceed in these
>> situations. Do I include the random intercept or not?
>>
>> I also have a related question. When I used to create a mixed-effects
>> regression model using lmer and included e.g., an interaction in the
>> fixed-effects structure, I would test if the inclusion of this
>> interaction was warranted using anova(lmer1,lmer2). It then would show
>> me that I invested 1 additional df and the resulting (possibly
>> significant) improvement in fit of my model.
>>
>> This approach does not seem to work when using gam. In this case an
>> apparent investment of 1 degree of freedom for the interaction, might
>> result in an actual decrease of the degrees of freedom invested by the
>> total model (caused by a decrease of the edf's of splines in the model
>> with the interaction). In this case, how would I proceed in
>> determining if the model including the interaction term is better?
>>
>> With kind regards,
>> Martijn Wieling
>>
>> --
>> *******************************************
>> Martijn Wieling
>> http://www.martijnwieling.nl
>> [hidden email]
>> +31(0)614108622
>> *******************************************
>> University of Groningen
>> http://www.rug.nl/staff/m.b.wieling
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
> --
> Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
> +44 (0)1225 386603               http://people.bath.ac.uk/sw283
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
> ________________________________
> If you reply to this email, your message will be added to the discussion
> below:
> http://r.789695.n4.nabble.com/mgcv-inclusion-of-random-intercept-in-model-based-on-p-value-of-smooth-or-anova-tp4617585p4631060.html
> To unsubscribe from mgcv: inclusion of random intercept in model - based on
> p-value of smooth or anova?, click here.
> NAML