[R] conducting GAM-GEE within gamm4?

Gavin Simpson gavin.simpson at ucl.ac.uk
Tue May 8 11:34:18 CEST 2012


On Thu, 2012-05-03 at 08:32 -0500, Nathan Furey wrote:
> Dear R-help users,
> 
> I am trying to analyze some visual transect data of organisms to generate a
> habitat distribution model. Once organisms are sighted, they are followed
> as point data is collected at a given time interval.  Because of the
> autocorrelation among these "follows," I wish to utilize a GAM-GEE approach
> similar to that of Pirotta et al. 2011, using packages 'yags' and 'splines'
> (http://www.int-res.com/abstracts/meps/v436/p257-272/).  Their R scripts
> are shown here (
> http://www.int-res.com/articles/suppl/m436p257_supp/m436p257_supp1-code.r).
> I have used this code with limited success and multiple issues of models
> failing to converge.
> 
<snip />
> Following this example, the following is the code I used for my dataset
> 
> b <- gamm4(dolphin_presence~s(dist_slag)+s(Depth),random=(form=~1|block),
> family=binomial(),data=dat)

I'm not familiar with gamm4 but a quick glance at the manual and
examples suggests to me that you are expressing the random effects
argument `random` incorrectly, perhaps because you were reading an
example for the gamm() function in package mgcv?

`random` expects a formula so just pass it as `random = ~ (1|block)`.

The reason for `form = ~ 1 | block` in gamm() is that the corAR1()
function for AR(1) correlation structures has more arguments than just
the `formula` one and the `formula` argument is not the first argument
to that function. But this has nothing to do with the `random` argument
of the gamm4() function.

If you make the above change, gamm4() should take into account the
grouping structure of your data.

HTH

G

> However, by examining the output (summary(b$gam)) and specifically
> summary(b$mer)), I am either unsure of how to interpret the results,
> or do not believe that the autocorrelation within the group is being
> taken into consideration.
> 
> 
> > summary(b$gam)
> 
> Family: binomial
> Link function: logit
> 
> Formula:
> dolphin_presence ~ s(dist_slag) + s(Depth)
> 
> Parametric coefficients:
> 
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept)  -13.968      5.145  -2.715  0.00663 **
> ---
> Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1
> 
> Approximate significance of smooth terms:
> 
>                edf Ref.df Chi.sq  p-value
> s(dist_slag) 4.943  4.943  70.67 6.85e-14 ***
> s(Depth)     6.869  6.869 115.59  < 2e-16 ***
> ---
> Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1
> 
> R-sq.(adj) =  0.317glmer.ML score =  10504  Scale est. = 1         n = 10792
> >
> 
> > summary(b$mer)
> Generalized linear mixed model fit by the Laplace approximation
> 
>    AIC   BIC logLik deviance
>  10514 10551  -5252    10504
> Random effects:
>  Groups Name         Variance Std.Dev.
>  Xr     s(dist_slag) 1611344  1269.39
>  Xr.0   s(Depth)       98622   314.04
> Number of obs: 10792, groups: Xr, 8; Xr.0, 8
> 
> Fixed effects:
>                  Estimate Std. Error z value Pr(>|z|)
> X(Intercept)      -13.968      5.145  -2.715  0.00663 **
> Xs(dist_slag)Fx1  -35.871     33.944  -1.057  0.29063
> Xs(Depth)Fx1        3.971      3.740   1.062  0.28823
> 
> ---
> Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1
> 
> Correlation of Fixed Effects:
>             X(Int) X(_)F1
> Xs(dst_s)F1  0.654
> Xs(Dpth)Fx1 -0.030  0.000
> >
> 
> 
> How do I ensure that autocorrelation is indeed being accounted for
> within each unique value of the "block" variable? What is the simplest
> way to interpret the output for "summary(b$mer)"?
> 
> The results do differ from a normal gam (package mgcv) using the same
> variables and parameters without the "correlation=..." term,
> indicating that *something *different is occurring.
> 
> However, when I use a different variable for the correlation term
> (season), I get the SAME output:
> 
> > dat2 <- data.frame(dist_slag = dat$dist_slag, Depth = dat$Depth, dolphin_presence = dat$dolphin_presence,
> + block = dat$block, season=dat$season)
> > head(dat2)
>   dist_slag    Depth dolphin_presence block season
> 1  26475.47 -10.0934                0     1      F
> 2  26340.47 -10.4870                0     1      F
> 3  25886.33 -16.5752                0     1      F
> 4  25399.88 -22.2474                0     1      F
> 5  24934.29 -29.6797                0     1      F
> 6  24519.90 -26.2370                0     1      F
> 
> > summary(dat2$season)
>    F    S
> 3224 7568
> 
> 
> > b <- gamm4(dolphin_presence~s(dist_slag)+s(Depth),correlation=corAR1(1, form=~1 | season), family=binomial(),data=dat2)
> > summary(b$gam)
> 
> Family: binomial
> Link function: logit
> 
> Formula:
> dolphin_presence ~ s(dist_slag) + s(Depth)
> 
> Parametric coefficients:
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept)  -13.968      5.145  -2.715  0.00663 **
> ---
> Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1
> 
> Approximate significance of smooth terms:
>                edf Ref.df Chi.sq  p-value
> s(dist_slag) 4.943  4.943  70.67 6.85e-14 ***
> s(Depth)     6.869  6.869 115.59  < 2e-16 ***
> ---
> Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1
> 
> R-sq.(adj) =  0.317glmer.ML score =  10504  Scale est. = 1         n = 10792
> > summary(b$mer)
> Generalized linear mixed model fit by the Laplace approximation
>    AIC   BIC logLik deviance
>  10514 10551  -5252    10504
> Random effects:
>  Groups Name         Variance Std.Dev.
>  Xr     s(dist_slag) 1611344  1269.39
>  Xr.0   s(Depth)       98622   314.04
> Number of obs: 10792, groups: Xr, 8; Xr.0, 8
> 
> Fixed effects:
>                  Estimate Std. Error z value Pr(>|z|)
> X(Intercept)      -13.968      5.145  -2.715  0.00663 **
> Xs(dist_slag)Fx1  -35.871     33.944  -1.057  0.29063
> Xs(Depth)Fx1        3.971      3.740   1.062  0.28823
> ---
> Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1
> 
> Correlation of Fixed Effects:
>             X(Int) X(_)F1
> Xs(dst_s)F1  0.654
> Xs(Dpth)Fx1 -0.030  0.000
> >
> 
> 
> I just want to make sure it is correctly allowing for correlation
> within each value for the "block" variable. How do I formulate the
> model to say that autocorrelation can exist within each single value
> for block, but assume independence among blocks?
> 
> 
> On another note, I am also receiving the following warning message
> after model completion for larger models (with many more variables
> than 2):
> 
> Warning message:
> In mer_finalize(ans) : false convergence (8)
> 
> 
> 
> 
> Thank you for your time, effort, and patience dealing with a novice.
> 
> Sincerely,
> 
> Nathan
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



More information about the R-help mailing list