[R] GAMs and GAMMS with correlated acoustic data

Simon Wood s.wood at bath.ac.uk
Mon Nov 17 10:46:10 CET 2008


David, 

Are you using the normalized residuals from the $lme part of the gam object 
(i.e. something like residuals(foo$lme,type="normalized"))? Without 
standardization the raw residuals will look pretty much as bad for the gamm 
as they did for the gam (actually they might even look a little worse).
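
A minimal sketch of the check described above, not a definitive recipe. It
uses simulated data rather than the fish data: the data frame `sim`, its
columns, the AR(1) coefficient of 0.7, and the helper `lag1()` are all
hypothetical names and values chosen for illustration. Only `gamm()`,
`corAR1()`, and `residuals(..., type = "normalized")` correspond to what is
discussed in this thread.

```r
library(mgcv)   # gamm(); attaches nlme, which supplies corAR1()

set.seed(1)

# Hypothetical data: 20 transects of 50 consecutive intervals each,
# rows ordered by transect and then by interval.
n_tran <- 20
n_int  <- 50
sim <- data.frame(
  tranf    = factor(rep(seq_len(n_tran), each = n_int)),
  interval = rep(seq_len(n_int), times = n_tran),
  depth    = runif(n_tran * n_int, 10, 120)
)

# AR(1) errors generated independently within each transect (assumed rho = 0.7)
ar_err <- unlist(lapply(seq_len(n_tran), function(i) {
  as.numeric(arima.sim(list(ar = 0.7), n = n_int))
}))
sim$y <- sin(sim$depth / 20) + ar_err

# GAMM with an AR(1) correlation structure within transect
foo <- gamm(y ~ s(depth),
            correlation = corAR1(form = ~ interval | tranf),
            data = sim)

# Raw residuals still carry the serial correlation; normalized residuals
# are standardized using the estimated correlation structure.
raw_res  <- residuals(foo$lme, type = "response")
norm_res <- residuals(foo$lme, type = "normalized")

# Lag-1 correlation using only pairs of neighbours within the same transect
# (relies on the row ordering assumed above).
lag1 <- function(r) {
  idx <- which(sim$interval > 1)
  cor(r[idx], r[idx - 1])
}

cat("lag-1 correlation, raw residuals:       ", lag1(raw_res),  "\n")
cat("lag-1 correlation, normalized residuals:", lag1(norm_res), "\n")
```

If the correlation model is adequate, the normalized residuals should show
much less serial correlation than the raw ones; acf(norm_res) gives a quick
visual version of the same comparison, although it also pairs residuals
across transect boundaries.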

best,
simon 

On Saturday 15 November 2008 17:19, David M Warner wrote:
> Greetings
> This is a long email.
>
> I'm struggling with a data set comprising 2,278 hydroacoustic estimates of
> fish biomass density made along line transects in two lakes (Lakes
> Michigan and Huron, three years in each lake).  The data represent
> lakewide surveys in each year, and each data point represents the estimate
> for a horizontal interval 1 km in length.
>
> I'm interested in comparing biomass density and bathymetric distribution
> (bottom depth) in the two lakes and there is graphical evidence of a
> non-linear relationship between biomass density and bottom depth.  Hence
> my interest in GAMs.
>
> Predictors of primary interest are lake (factor) and bottom depth
> (continuous).
>
> The fish data are autocorrelated at varying ranges, depending on species
> and year.  I've tested this using correlog (package ncf).
>
> The bottom depth data are also highly autocorrelated.
>
> Because of the autocorrelations in data, autocorrelations in GAM residuals
> (up to 20 lags in some cases), patterns in residual plots from GAM models,
> and very narrow confidence intervals for the predictions, I feel that GAM
> results are biased and have attempted to use GAMM.
>
> Data and procedure examples:
> #> fish[1:10, ]
>    Transect yaoalebiom yaosmeltbiom yaobloaterbiom year     depth lake        x       y interval
> 1      nn_1  12.019655 34.910370110       2.647370 2005  97.07525    2 526601.8 4850206        1
> 2      nn_1  12.164686 35.331548810       3.982028 2005  98.37024    2 526742.2 4849339        2
> 3      nn_1  11.176009 32.460052230       1.646604 2005  99.98218    2 526886.9 4848348        3
> 4      nn_1   0.000000  0.036457091       5.306225 2005  81.44616    2 526993.4 4850849        4
> 5      nn_1  40.808118 10.988825410       3.222485 2005 101.45707    2 526997.5 4847359        5
> 6      nn_1   6.273421 18.176753520      18.832348 2005  98.69197    2 527084.1 4846366        6
> 7      nn_1   6.225799 16.050983390      66.941892 2005  94.14283    2 527214.7 4845372        7
> 8      nn_1   7.322910 19.001196850      47.273341 2005  91.21771    2 527331.6 4844636        8
> 9      nn_1   0.000000  0.067646462      20.912908 2005  87.76123    2 527495.9 4843390        9
> 10     nn_1   0.000000  0.006012106      26.611785 2005  87.59767    2 527606.6 4842426       10
>
> #GAM example
> bloat.gam8 <- gam(log10(yaobloaterbiom+0.00325) ~ lakef +s(depth,
> by=lakef), data=fish3)
>
> #GAMM example:
> bloat.gamm1 <- gamm(log10(yaobloaterbiom+0.00325) ~ lakef +  s(depth,
> by=lakef), correlation=corAR1(form = ~ interval|tranf), data=fish3)
>
> However, GAMM results from models including a wide variety of correlation
> structures (corExp, corSpher, corLin, AR1, ARMA) produce autocorrelated
> residuals (similar lag range as GAM), patterns in residual plots, and
> confidence intervals for predictions that are only slightly larger than for
> GAMs.  This suggests to me that GAMM is not performing much better than
> GAM (or I've not specified models correctly).
>
> Is my assessment of the GAMM performance reasonable?  None of the models
> (GAM or GAMM) explain much of the deviance (~20%).
>
> I'm interested in an information-theoretic approach to selecting the best
> model from a set of possible models (AICc, dAICc, AICc weights), but
> cannot run some of the GAM models with GAM because they lack a random
> term.  I'm not sure how to use the GAMM output to compare the models I can
> run with this procedure.
>
> Finally, as a last resort, I've subsampled the original data set so that I
> have 1 record per transect per lake per year for a total N=99.
>
> I get different "best models" from GAM (original data), GAMM (original data
> but including correlation structure), and GAM (subsetted data).  Selection
> of different models leads to fairly different conclusions about the
> similarities and differences between the lakes.
>
> I'm not sure where to go with this as a result.
>
> Any thoughts/comments would be appreciated.
> Dave
>
> David Warner
> Research Fishery Biologist
> USGS Great Lakes Science Center
> 1451 Green Road
> Ann Arbor MI 48105
> 734.214.9392

-- 
> Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
> +44 1225 386603  www.maths.bath.ac.uk/~sw283