[R-sig-eco] Cosinor with data that trend over time

Gavin Simpson ucfagls at gmail.com
Wed Mar 26 05:06:09 CET 2014


I would probably attack this using a GAM modified to model the
residuals as a stochastic time series process.

For example

require("mgcv")
mod <- gamm(y ~ s(DoY, bs = "cc") + s(time), data = foo,
                         correlation = corCAR1(form = ~ time))

where `foo` is your data frame, `DoY` is a variable in the data frame
computed as `as.numeric(strftime(RDate, format = "%j"))` and `time` is
a variable for the passage of time - you could do `as.numeric(RDate)`
but the number of days is probably large as we might encounter more
problems fitting the model. Instead you might do `as.numeric(RDate) /
1000` say to produce values on a more manageable scale. The `bs =
"cc"` bit specifies a cyclic spline applicable to data measured
throughout a year. You may want to fix the start and end knots to be
days 1 and days 366 respectively, say via `knots = list(DoY =
c(0,366))` as an argument to `gam()` [I think I have this right,
specifying the boundary knots, but let me know if you get an error
about the number of knots]. The residuals are said to follow a
continuois time AR(1), the irregular-spaced counter part to the AR(1),
plus random noise.

There may be identifiability issues as the `s(time)` and `corCAR1()`
compete to explain the fine-scale variation. If you hit such a case,
you can make an educated guess as to the wiggliness (degrees of
freedom) for the smooth terms based on a plot of the data and fix the
splines at those values via argument `k = x` and `fx = TRUE`, where
`x` in `k = x` is some integer value. Both these go in as arguments to
the `s()` functions. If the trend is not very non-linear you can use a
low value 1-3 here for x and for the DoY term say 3-4 might be
applicable.

There are other ways to approach this problem of identifiability, but
that would require more time/space here, which I can go into via a
follow-up if needed.

You can interrogate the fitted splines to see when the peak value of
the `DoY` term is in the year.

You can also allow the seasonal signal to vary in time with the trend
by allowing the splines to "interact" in a 2d-tensor product spline.
Using `te(DoY, time, bs = c("cc","cr"))` instead of the two `s()`
terms (or using `ti()` terms for the two "marginal" splines and the
2-d spline). Again you can add in the `k` = c(x,y), fx = TRUE)` to the
`te()` term where `x` and `y` are the dfs for each dimension in the
`te()` term. It is a bit more complex to do this for `ti()` terms.

Part of the reason to prefer a spline for DoY for the seasonal term is
that one might not expect the seasonal cycle to be a symmetric cycle
as a cos/sin terms would imply.

A recent ecological paper describing a similar approach (though using
different package in R) is that of Claire Ferguson and colleagues in J
Applied Ecology (2008) http://doi.org/10.1111/j.1365-2664.2007.01428.x
(freely available).

HTH

G

On 25 March 2014 19:14, Jacob Cram <cramjaco at gmail.com> wrote:
> Hello all,
>      I am thinking about applying season::cosinor() analysis to some
> irregularely spaced time series data. The data are unevenly spaced, so
> usual time series methods, as well as the nscosinor() function are out. My
> data do however trend over time and I am wondering if I can feed date as a
> variable into my cosinor analyis.  In the example below, then I'd conclude
> then that the abundances are seasonal, with maximal abundance in mid June
> and furthermore, they are generally decreasing over time.
> Can I use both time variables together like this? If not, is there some
> better approach I should take?
> Thanks in advance,
> -Jacob
>
> For context lAbundance is logg-odds transformed abundance data of a
> microbial species in a given location over time.  RDate is the date the
> sample was collected in the r date format.
>
>
>> res <- cosinor(lAbundance ~ RDate, date = "RDate", data = lldata)
>
>>summary(res)
> Cosinor test
> Number of observations = 62
> Amplitude = 0.58
> Phase: Month = June , day = 14
> Low point: Month = December , day = 14
> Significant seasonality based on adjusted significance level of 0.025  =
> TRUE
>
>>summary(res$glm)
>
>
> Call:
> glm(formula = f, family = family, data = data, offset = offset)
>
> Deviance Residuals:
>     Min       1Q   Median       3Q      Max
> -2.3476  -0.6463   0.1519   0.6574   1.9618
>
> Coefficients:
>               Estimate Std. Error t value Pr(>|t|)
> (Intercept)  0.0115693  1.8963070   0.006  0.99515
> RDate       -0.0003203  0.0001393  -2.299  0.02514 *
> cosw        -0.5516458  0.1837344  -3.002  0.00395 **
> sinw         0.1762904  0.1700670   1.037  0.30423
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> (Dispersion parameter for gaussian family taken to be 0.9458339)
>
>     Null deviance: 70.759  on 61  degrees of freedom
> Residual deviance: 54.858  on 58  degrees of freedom
> AIC: 178.36
>
> Number of Fisher Scoring iterations: 2
>
>
> llldata as a csv below
> "RDate","lAbundance"
> 2003-03-12,-3.3330699059335
> 2003-05-21,-3.04104625745886
> 2003-06-17,-3.04734680029566
> 2003-07-02,-4.18791034708572
> 2003-09-18,-3.04419201802053
> 2003-10-22,-3.13805060873929
> 2004-02-19,-3.80688269144794
> 2004-03-17,-4.50755507726145
> 2004-04-22,-4.38846502542992
> 2004-05-19,-3.06618649442674
> 2004-06-17,-5.20518774876304
> 2004-07-14,-3.75041853151097
> 2004-08-25,-3.67882486716196
> 2004-09-22,-5.22205827512234
> 2004-10-14,-3.99297508670535
> 2004-11-17,-4.68793287601157
> 2004-12-15,-4.31712380781011
> 2005-02-16,-4.30893550479904
> 2005-03-16,-4.05781773988454
> 2005-05-11,-3.94746237402035
> 2005-07-19,-4.91195185391358
> 2005-08-17,-4.93590576323119
> 2005-09-15,-4.85820800095518
> 2005-10-20,-5.22956391101343
> 2005-12-13,-5.12244047315448
> 2006-01-18,-3.04854660925046
> 2006-02-22,-6.77145858348375
> 2006-03-29,-4.33151493849021
> 2006-04-19,-3.36152357710535
> 2006-06-20,-3.09071584142593
> 2006-07-25,-3.31430484483825
> 2006-08-24,-3.09974933041469
> 2006-09-13,-3.33288992218458
> 2007-12-17,-4.19942661980677
> 2008-03-19,-3.86146499633625
> 2008-04-22,-3.36161599919095
> 2008-05-14,-4.30878307213324
> 2008-06-18,-3.74372448768828
> 2008-07-09,-4.65951429661651
> 2008-08-20,-5.35984647704619
> 2008-09-22,-4.78481898261137
> 2008-10-20,-3.58588161980965
> 2008-11-20,-3.10625125552057
> 2009-02-18,-6.90675477864855
> 2009-03-11,-3.43446932013368
> 2009-04-23,-3.82688066341466
> 2009-05-13,-4.44885332005661
> 2009-06-18,-3.97671552612412
> 2009-07-09,-3.40185954003936
> 2009-08-19,-3.44958231694091
> 2009-09-24,-3.86508161094726
> 2010-01-28,-4.95281587967569
> 2010-02-11,-3.78064756876257
> 2010-03-24,-3.5823501176064
> 2010-04-27,-4.33635715879999
> 2010-05-17,-3.90545735473055
> 2010-07-21,-3.3147176517321
> 2010-08-11,-4.53218360860017
> 2010-10-21,-6.90675477864855
> 2010-11-23,-6.90675477864855
> 2010-12-16,-6.75158176094352
> 2011-01-11,-6.90675477864855
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology



-- 
Gavin Simpson, PhD



More information about the R-sig-ecology mailing list