[R] GAMs and GAMMS with correlated acoustic data
Simon Wood
s.wood at bath.ac.uk
Mon Nov 17 10:46:10 CET 2008
David,
Are you using the normalized residuals from the $lme part of the gamm object
(i.e. something like residuals(foo$lme,type="normalized"))? Without
standardization the raw residuals will look pretty much as bad for the gamm
as they did for the gam (actually they might even look a little worse).
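
A minimal sketch of that check, on simulated stand-in data rather than the survey data. The data frame sim, the response y and the helper lag1() are hypothetical names used only for illustration; the correlation structure mirrors the corAR1(form = ~ interval | tranf) call in the quoted model.

```r
library(mgcv)  # gamm(); attaches nlme, which supplies corAR1()

set.seed(1)
# Simulated stand-in data (an assumption, not the survey data):
# 8 transects of 60 intervals, a smooth depth effect plus AR(1)
# errors within each transect.
n_tr <- 8
n_int <- 60
sim <- data.frame(
  tranf    = factor(rep(seq_len(n_tr), each = n_int)),
  interval = rep(seq_len(n_int), times = n_tr),
  depth    = runif(n_tr * n_int, 10, 100)
)
ar_err <- unlist(lapply(seq_len(n_tr), function(i)
  as.numeric(arima.sim(list(ar = 0.7), n = n_int, sd = 0.5))))
sim$y <- sin(sim$depth / 15) + ar_err

foo <- gamm(y ~ s(depth),
            correlation = corAR1(form = ~ interval | tranf),
            data = sim)

# Raw residuals still carry the serial correlation; normalized residuals
# are rescaled using the estimated correlation structure.
r_raw  <- residuals(foo$lme, type = "response")
r_norm <- residuals(foo$lme, type = "normalized")

# Lag-1 autocorrelation within each transect, averaged over transects
lag1 <- function(r) {
  mean(tapply(r, sim$tranf, function(z) acf(z, plot = FALSE)$acf[2]))
}
lag1_raw  <- lag1(r_raw)
lag1_norm <- lag1(r_norm)
print(c(raw = lag1_raw, normalized = lag1_norm))
```

If the correlation model is adequate, the normalized residuals should show much less autocorrelation than the raw ones; if they do not, the correlation structure is probably not capturing the dependence.
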
best,
simon
On Saturday 15 November 2008 17:19, David M Warner wrote:
> Greetings
> This is a long email.
>
> I'm struggling with a data set comprising 2,278 hydroacoustic estimates of
> fish biomass density made along line transects in two lakes (lakes
> Michigan and Huron, three years in each lake). The data represent
> lakewide surveys in each year and each data point represents the estimate
> for a horizontal interval 1 km in length.
>
> I'm interested in comparing biomass density and bathymetric distribution
> (bottom depth) in the two lakes and there is graphical evidence of a
> non-linear relationship between biomass density and bottom depth. Hence
> my interest in GAMs.
>
> Predictors of primary interest are lake (factor) and bottom depth
> (continuous).
>
> The fish data are autocorrelated at varying ranges, depending on species
> and year. I've tested this using correlog (package ncf).
>
> The bottom depth data are also highly autocorrelated.
>
> Because of the autocorrelations in data, autocorrelations in GAM residuals
> (up to 20 lags in some cases), patterns in residual plots from GAM models,
> and very narrow confidence intervals for the predictions, I feel that GAM
> results are biased and have attempted to use GAMM.
>
> Data and procedure examples:
> #> fish[1:10, ]
>    Transect yaoalebiom yaosmeltbiom yaobloaterbiom year     depth lake        x       y interval
>  1     nn_1  12.019655 34.910370110       2.647370 2005  97.07525    2 526601.8 4850206        1
>  2     nn_1  12.164686 35.331548810       3.982028 2005  98.37024    2 526742.2 4849339        2
>  3     nn_1  11.176009 32.460052230       1.646604 2005  99.98218    2 526886.9 4848348        3
>  4     nn_1   0.000000  0.036457091       5.306225 2005  81.44616    2 526993.4 4850849        4
>  5     nn_1  40.808118 10.988825410       3.222485 2005 101.45707    2 526997.5 4847359        5
>  6     nn_1   6.273421 18.176753520      18.832348 2005  98.69197    2 527084.1 4846366        6
>  7     nn_1   6.225799 16.050983390      66.941892 2005  94.14283    2 527214.7 4845372        7
>  8     nn_1   7.322910 19.001196850      47.273341 2005  91.21771    2 527331.6 4844636        8
>  9     nn_1   0.000000  0.067646462      20.912908 2005  87.76123    2 527495.9 4843390        9
> 10     nn_1   0.000000  0.006012106      26.611785 2005  87.59767    2 527606.6 4842426       10
>
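> library(mgcv)  # gam() and gamm(); gamm() takes its correlation structures from nlme
>
> # GAM example: a separate depth smooth for each lake
> bloat.gam8 <- gam(log10(yaobloaterbiom + 0.00325) ~ lakef + s(depth, by = lakef),
>                   data = fish3)
>
> # GAMM example: same mean model, AR(1) errors along each transect
> bloat.gamm1 <- gamm(log10(yaobloaterbiom + 0.00325) ~ lakef + s(depth, by = lakef),
>                     correlation = corAR1(form = ~ interval | tranf),
>                     data = fish3)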
>
> However, GAMM results from models including a wide variety of correlation
> structures (corExp, corSpher, corLin, corAR1, corARMA) produce autocorrelated
> residuals (similar lag range as GAM), patterns in residuals plots, and
> confidence intervals for predictions that are only slightly larger than for
> GAMs. This suggests to me that GAMM is not performing much better than
> GAM (or I've not specified models correctly).
>
> Is my assessment of the GAMM performance reasonable? None of the models
> (GAM or GAMM) explain much of the deviance (~20%).
>
> I'm interested in an information-theoretic approach to selecting the best
> model from a set of possible models (AICc, dAICc, AICc weights), but
> cannot run some of the GAM models with GAM because they lack a random
> term. I'm not sure how to use the GAMM output to compare the models I can
> run with this procedure.
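
One possible route to this comparison, offered as a sketch under stated assumptions rather than a recommendation: gamm() can be fitted with no correlation or random term at all, so models with and without a correlation structure can both be fitted by ML and compared with AIC() on their $lme components. This assumes a Gaussian response with identity link, as in the models above. The simulated data and the names sim, m_ind and m_ar1 are hypothetical, and the small-sample AICc correction is not shown.

```r
library(mgcv)  # gamm(); attaches nlme, which supplies corAR1()

set.seed(2)
# Simulated stand-in data (an assumption, not the survey data):
# AR(1) errors within each of 8 transects of 60 intervals.
n_tr <- 8
n_int <- 60
sim <- data.frame(
  tranf    = factor(rep(seq_len(n_tr), each = n_int)),
  interval = rep(seq_len(n_int), times = n_tr),
  depth    = runif(n_tr * n_int, 10, 100)
)
sim$y <- sin(sim$depth / 15) +
  unlist(lapply(seq_len(n_tr), function(i)
    as.numeric(arima.sim(list(ar = 0.7), n = n_int, sd = 0.5))))

# Same mean model, fitted by ML with and without AR(1) errors
m_ind <- gamm(y ~ s(depth), data = sim, method = "ML")
m_ar1 <- gamm(y ~ s(depth),
              correlation = corAR1(form = ~ interval | tranf),
              data = sim, method = "ML")

# AIC() on the lme components gives one row per model (df and AIC)
aic_tab <- AIC(m_ind$lme, m_ar1$lme)
print(aic_tab)
```

Because both fits come from gamm() and share the same response and likelihood, their AIC values are on a common scale; mixing AIC values from gam() and gamm() fits would not give that guarantee.
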
>
> Finally, as a last resort, I've subsampled the original data set so that I
> have 1 record per transect per lake per year for a total N=99.
>
> I get different "best models" from GAM (original data), GAMM (original data
> but including a correlation structure), and GAM (subsetted data). Selection
> of different models leads to fairly different conclusions about the
> similarities and differences between the lakes.
>
> I'm not sure where to go with this as a result.
>
> Any thoughts/comments would be appreciated.
> Dave
>
> David Warner
> Research Fishery Biologist
> USGS Great Lakes Science Center
> 1451 Green Road
> Ann Arbor MI 48105
> 734.214.9392
--
> Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
> +44 1225 386603 www.maths.bath.ac.uk/~sw283