[R-sig-ME] compare two GAMM4 models using AICs

Tue Oct 24 23:54:28 CEST 2017

Hello everyone,

Dr Zuur, thank you so much for your answer.

First, I made a mistake and I forgot to specify that it was a Poisson model with an offset term.

I asked my question because I saw this: http://qcbs.ca/wiki/r_workshop8:

"How do we test whether the non-linear model offers a significant improvement over the linear model? We can use the gam() and anova() commands to formally test whether an assumption of linearity is justified. To do, we must simply set our smoothed model so that it is nested in our linear model; that is, we create model object that includes both x (linear) and s(x) (non-linear) and we ask whether adding s(x) to the model with only x as a covariate is supported by the data."

|" Test for linearity<http://qcbs.ca/wiki/_export/code/r_workshop8?codeblock=4>

linear_model = gam(y_obs~x)
nested_gam_model = gam(y_obs~s(x)+x)
print<http://stat.ethz.ch/R-manual/R-devel/library/base/html/print.html>(anova<http://stat.ethz.ch/R-manual/R-devel/library/stats/html/anova.html>(linear_model, nested_gam_model, test="Chisq"))"

Given that in their case it was a GAM model, I was not sure how to do something similar in a gamm4 context. I searched online but I could not find an answer.

My model looks like this:

mg = gamm4(y ~ s(age)+offset(expy), random=~(1|g)+(1|s), data=s25h, family= poisson)

summary(mg$gam)

Family: poisson
Link function: log

Formula:
y ~ s(age) + offset(expy)

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -7.3922     0.1065  -69.41   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
            edf Ref.df    Chi.sq p-value
s(age)   1      1         0.13   0.719

R-sq.(adj) =  -8.44e-05
glmer.ML = 2075.4  Scale est. = 1         n = 10523

summary(mg$mer)

Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: poisson  ( log )

     AIC      BIC   logLik deviance df.resid
  5294.9   5331.2  -2642.4   5284.9    10518

Scaled residuals:
    Min      1Q  Median      3Q     Max
-1.8449 -0.0806 -0.0786 -0.0775  4.9384

Random effects:
 Groups  Name        Variance Std.Dev.
 g  (Intercept) 2.868    1.694
 s (Intercept) 3.820    1.954
 Xr      s(age) 0.000    0.000
Number of obs: 10523, groups:  g, 1785; s, 1768; Xr, 8

Fixed effects:
                Estimate Std. Error z value Pr(>|z|)
X(Intercept)    -7.39218    0.19834  -37.27   <2e-16 ***
Xs(age)Fx1  0.03669    0.08418    0.44    0.663
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            X(Int)
Xs(age)F1 0.024

Is this the right way to interpret it:

1) from the gam part I would conclude that the spline for the variable age has an estimated degree of freedom of 1, indicating that I might not need a spline. However, I understood that one can have a edf of 1 and there might still be a need for including a spline in the model (this part is very ambiguous for me, as I would think this alone should be indicative of a linear association). Either way, the spline term is not significantly associated with my outcome.

2) from the mer part I would conclude that the linear part of the variable age is not associated with my outcome, either.

Based on this model, I can safely assume that age can be included in my regression model as a linear term, given there is nothing to suggest the association with my outcome is non-parametrical.

I would then check the residuals, as you suggested, and if there is not any pattern, I should be ok without including a spline then.

Does this sound like a reasonable explanation for not including a spline term in my regression model?

Thank you so much. I apologize for these silly questions. I am just trying to assimilate way too much at once and I second guess everything I do since these are all difficult concepts to grasp for a beginner.

Best,
DaniNM

________________________________
From: R-sig-mixed-models <r-sig-mixed-models-bounces at r-project.org> on behalf of Highland Statistics Ltd <highstat at highstat.com>
Sent: Tuesday, October 24, 2017 2:00 PM
To: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] compare two GAMM4 models using AICs

>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Tue, 24 Oct 2017 19:49:50 +0000
>> From: dani <orchidn at live.com>
>> To: "r-sig-mixed-models at r-project.org"
>>     <r-sig-mixed-models at r-project.org>
>> Subject: [R-sig-ME] compare two GAMM4 models using AICs
>> Message-ID:
>>     <MWHPR1201MB0029CE3D9C5EC1956640DB70D6470 at MWHPR1201MB0029.namprd12.prod.outlook.com>
>>
>>
>> Content-Type: text/plain; charset="UTF-8"
>>
>> Hello everyone,
>>
>>
>> I am fitting two gamm4 models because I would like to see whether
>> there is justification for including a spline term for x1. Can this
>> be done by comparing the AICs for the underlying mixed models (i.e.,
>> the "mer" part) of the two models?
>
>

Technically it won't crash..."so it can be done"..but I am not sure
whether you want to do this. Internally, the smoother is written as a
mixed model (X * b + Z * u)....and those random effects (which is part
of the smoother) don't count towards the number of parameters.

>> b1 <- gamm4(y~x1+offset(e),data=dat,random=~(1|fac))
>
>> b2 <- gamm4(y~x1+s(x1)+offset(e),data=dat,random=~(1|fac))
>
>
>

I am confused about your use of an offset in a Gaussian model, and I am
confused why you would use x1 and s(x1) in the same model. The s(x1)
already contains the linear part of the smoother.

Why not fit the first model and inspect residuals for any patterns? If
there are, then using a smoother is an option.

Kind regards,

Alain Zuur

>
>>
>> summary(b1$gam)
>>
>> summary(b1$mer)
>>
>>
>> summary(b2$gam)
>>
>> summary(b2$mer)
>>
>>
>> AIC(b1$mer)
>>
>> AIC(b2$mer)
>>
>>
>> Thank you very much!
>>
>> Best,
>>
>> DaniNM
>>
>> <http://aka.ms/weboutlook>
>>
>>     [[alternative HTML version deleted]]
>>
>>
>

--

Dr. Alain F. Zuur
Highland Statistics Ltd.
9 St Clair Wynd
AB41 6DZ Newburgh, UK
Email: highstat at highstat.com
URL:   www.highstat.com<http://www.highstat.com>
Highland Statistics Ltd.<http://www.highstat.com/>
www.highstat.com
Statistical consultancy, data analysis and software development. Specialized in time series analysis. Located in Scotland.

And:
NIOZ Royal Netherlands Institute for Sea Research,
Department of Coastal Systems, and Utrecht University,
P.O. Box 59, 1790 AB Den Burg,
Texel, The Netherlands

Author of:
1. Beginner's Guide to Spatial, Temporal and Spatial-Temporal Ecological Data Analysis with R-INLA. (2017).
2. Beginner's Guide to Zero-Inflated Models with R (2016).
3. Beginner's Guide to Data Exploration and Visualisation with R (2015).
4. Beginner's Guide to GAMM with R (2014).
5. Beginner's Guide to GLM and GLMM with R (2013).
6. Beginner's Guide to GAM with R (2012).
7. Zero Inflated Models and GLMM with R (2012).
8. A Beginner's Guide to R (2009).
9. Mixed effects models and extensions in ecology with R (2009).
10. Analysing Ecological Data (2007).

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
R-sig-mixed-models Info Page - stat.ethz.ch<https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
stat.ethz.ch
Your email address: Your name (optional): You may enter a privacy password below. This provides only mild security, but should prevent others from messing ...

	[[alternative HTML version deleted]]