[R-sig-ME] compare two GAMM4 models using AICs

Gavin Simpson ucfagls at gmail.com
Thu Nov 2 01:56:34 CET 2017


That isn't the recommended test for a parametric term vesus and smooth
alternative. A few weeks ago I emailed the authors, of the script you
link to, to point out the issue and provided a correction.

The alternative model should be

y ~ x + s(x, bs = 'tp', m = c(2,0))

As Alain mentioned, s(x) contains a basis function the corresponds to
a linear effect of x. What the version I show above does is as for a
second-order thin plate spline (so far so default) but without a null
space (the 0 in `m`). The null space of the spline is the part of the
basis that includes all the perfectly smooth basis functions; the
constant basis function and the linear basis function -- IIRC the
constant term will have been removed by the identifiability
constraints for the model intercept.

The above was cribbed from section 6.12.3 of the Second Edition of
Simon Wood's GAM book. Hopefully the tutorial script you linked to
will be updated soon.

If you want to double check some of the GAMM fits, you can fit your
model as a simple GAM with "random effect" splines:

gam(y ~ s(age) + s(g, bs='re') + s(s, bs = 're') + offset(expy), family=poisson)

which fits random intercepts in levels of g and levels of s. (I can't
recall what that lme4 notation actually produces in terms of random
effects - the gam() version I show will be uncorrelated.) You also
have a few more choices for conditional distribution of the response
via family = nb for Neg Bin, plus there are some zero-inflated Poisson
options if needed.

HTH

Gavin

On 24 October 2017 at 15:54, dani <orchidn at live.com> wrote:
> Hello everyone,
>
>
> Dr Zuur, thank you so much for your answer.
>
> First, I made a mistake and I forgot to specify that it was a Poisson model with an offset term.
>
>
> I asked my question because I saw this: http://qcbs.ca/wiki/r_workshop8:
>
>
> "How do we test whether the non-linear model offers a significant improvement over the linear model? We can use the gam() and anova() commands to formally test whether an assumption of linearity is justified. To do, we must simply set our smoothed model so that it is nested in our linear model; that is, we create model object that includes both x (linear) and s(x) (non-linear) and we ask whether adding s(x) to the model with only x as a covariate is supported by the data."
>
> |" Test for linearity<http://qcbs.ca/wiki/_export/code/r_workshop8?codeblock=4>
>
> linear_model = gam(y_obs~x)
> nested_gam_model = gam(y_obs~s(x)+x)
> print<http://stat.ethz.ch/R-manual/R-devel/library/base/html/print.html>(anova<http://stat.ethz.ch/R-manual/R-devel/library/stats/html/anova.html>(linear_model, nested_gam_model, test="Chisq"))"
>
> Given that in their case it was a GAM model, I was not sure how to do something similar in a gamm4 context. I searched online but I could not find an answer.
>
>
> My model looks like this:
>
> mg = gamm4(y ~ s(age)+offset(expy), random=~(1|g)+(1|s), data=s25h, family= poisson)
>
>
> summary(mg$gam)
>
> Family: poisson
> Link function: log
>
> Formula:
> y ~ s(age) + offset(expy)
>
> Parametric coefficients:
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept)  -7.3922     0.1065  -69.41   <2e-16 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Approximate significance of smooth terms:
>             edf Ref.df    Chi.sq p-value
> s(age)   1      1         0.13   0.719
>
> R-sq.(adj) =  -8.44e-05
> glmer.ML = 2075.4  Scale est. = 1         n = 10523
>
> summary(mg$mer)
>
> Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
>  Family: poisson  ( log )
>
>      AIC      BIC   logLik deviance df.resid
>   5294.9   5331.2  -2642.4   5284.9    10518
>
> Scaled residuals:
>     Min      1Q  Median      3Q     Max
> -1.8449 -0.0806 -0.0786 -0.0775  4.9384
>
> Random effects:
>  Groups  Name        Variance Std.Dev.
>  g  (Intercept) 2.868    1.694
>  s (Intercept) 3.820    1.954
>  Xr      s(age) 0.000    0.000
> Number of obs: 10523, groups:  g, 1785; s, 1768; Xr, 8
>
> Fixed effects:
>                 Estimate Std. Error z value Pr(>|z|)
> X(Intercept)    -7.39218    0.19834  -37.27   <2e-16 ***
> Xs(age)Fx1  0.03669    0.08418    0.44    0.663
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Correlation of Fixed Effects:
>             X(Int)
> Xs(age)F1 0.024
>
>
> Is this the right way to interpret it:
>
> 1) from the gam part I would conclude that the spline for the variable age has an estimated degree of freedom of 1, indicating that I might not need a spline. However, I understood that one can have a edf of 1 and there might still be a need for including a spline in the model (this part is very ambiguous for me, as I would think this alone should be indicative of a linear association). Either way, the spline term is not significantly associated with my outcome.
>
> 2) from the mer part I would conclude that the linear part of the variable age is not associated with my outcome, either.
>
> Based on this model, I can safely assume that age can be included in my regression model as a linear term, given there is nothing to suggest the association with my outcome is non-parametrical.
>
> I would then check the residuals, as you suggested, and if there is not any pattern, I should be ok without including a spline then.
>
> Does this sound like a reasonable explanation for not including a spline term in my regression model?
>
> Thank you so much. I apologize for these silly questions. I am just trying to assimilate way too much at once and I second guess everything I do since these are all difficult concepts to grasp for a beginner.
>
> Best,
> DaniNM
>
>
>
> ________________________________
> From: R-sig-mixed-models <r-sig-mixed-models-bounces at r-project.org> on behalf of Highland Statistics Ltd <highstat at highstat.com>
> Sent: Tuesday, October 24, 2017 2:00 PM
> To: r-sig-mixed-models at r-project.org
> Subject: Re: [R-sig-ME] compare two GAMM4 models using AICs
>
>
>
>>> ----------------------------------------------------------------------
>>>
>>> Message: 1
>>> Date: Tue, 24 Oct 2017 19:49:50 +0000
>>> From: dani <orchidn at live.com>
>>> To: "r-sig-mixed-models at r-project.org"
>>>     <r-sig-mixed-models at r-project.org>
>>> Subject: [R-sig-ME] compare two GAMM4 models using AICs
>>> Message-ID:
>>>     <MWHPR1201MB0029CE3D9C5EC1956640DB70D6470 at MWHPR1201MB0029.namprd12.prod.outlook.com>
>>>
>>>
>>> Content-Type: text/plain; charset="UTF-8"
>>>
>>> Hello everyone,
>>>
>>>
>>> I am fitting two gamm4 models because I would like to see whether
>>> there is justification for including a spline term for x1. Can this
>>> be done by comparing the AICs for the underlying mixed models (i.e.,
>>> the "mer" part) of the two models?
>>
>>
>
> Technically it won't crash..."so it can be done"..but I am not sure
> whether you want to do this. Internally, the smoother is written as a
> mixed model (X * b + Z * u)....and those random effects (which is part
> of the smoother) don't count towards the number of parameters.
>
>>> b1 <- gamm4(y~x1+offset(e),data=dat,random=~(1|fac))
>>
>>> b2 <- gamm4(y~x1+s(x1)+offset(e),data=dat,random=~(1|fac))
>>
>>
>>
>
>
> I am confused about your use of an offset in a Gaussian model, and I am
> confused why you would use x1 and s(x1) in the same model. The s(x1)
> already contains the linear part of the smoother.
>
> Why not fit the first model and inspect residuals for any patterns? If
> there are, then using a smoother is an option.
>
> Kind regards,
>
> Alain Zuur
>
>
>>
>>>
>>> summary(b1$gam)
>>>
>>> summary(b1$mer)
>>>
>>>
>>> summary(b2$gam)
>>>
>>> summary(b2$mer)
>>>
>>>
>>> AIC(b1$mer)
>>>
>>> AIC(b2$mer)
>>>
>>>
>>> Thank you very much!
>>>
>>> Best,
>>>
>>> DaniNM
>>>
>>> <http://aka.ms/weboutlook>
>>>
>>>     [[alternative HTML version deleted]]
>>>
>>>
>>
>
> --
>
> Dr. Alain F. Zuur
> Highland Statistics Ltd.
> 9 St Clair Wynd
> AB41 6DZ Newburgh, UK
> Email: highstat at highstat.com
> URL:   www.highstat.com<http://www.highstat.com>
> Highland Statistics Ltd.<http://www.highstat.com/>
> www.highstat.com
> Statistical consultancy, data analysis and software development. Specialized in time series analysis. Located in Scotland.
>
>
>
>
> And:
> NIOZ Royal Netherlands Institute for Sea Research,
> Department of Coastal Systems, and Utrecht University,
> P.O. Box 59, 1790 AB Den Burg,
> Texel, The Netherlands
>
>
>
> Author of:
> 1. Beginner's Guide to Spatial, Temporal and Spatial-Temporal Ecological Data Analysis with R-INLA. (2017).
> 2. Beginner's Guide to Zero-Inflated Models with R (2016).
> 3. Beginner's Guide to Data Exploration and Visualisation with R (2015).
> 4. Beginner's Guide to GAMM with R (2014).
> 5. Beginner's Guide to GLM and GLMM with R (2013).
> 6. Beginner's Guide to GAM with R (2012).
> 7. Zero Inflated Models and GLMM with R (2012).
> 8. A Beginner's Guide to R (2009).
> 9. Mixed effects models and extensions in ecology with R (2009).
> 10. Analysing Ecological Data (2007).
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> R-sig-mixed-models Info Page - stat.ethz.ch<https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
> stat.ethz.ch
> Your email address: Your name (optional): You may enter a privacy password below. This provides only mild security, but should prevent others from messing ...
>
>
>
>         [[alternative HTML version deleted]]
>
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models



-- 
Gavin Simpson, PhD



More information about the R-sig-mixed-models mailing list