[R-sig-eco] Cosinor with data that trend over time

Gavin Simpson ucfagls at gmail.com
Wed Mar 26 17:46:30 CET 2014


1) Visually - unless it actually matters exactly on which day in the
year the peak is observed? If visually is OK, just do `plot(mod, pages
= 1)` to see the fitted splines on a single page. See `?plot.gam` for
more details on the plot method.

2) You could generate some new data to predict upon as follows:

    newdat <- data.frame(DoY = seq_len(366), time = mean(foo$time))

Then predict for those new data but collect the individual
contributions of the spline terms to the predicted value rather than
just the final prediction

    pred <- predict(mod, newdata = newdat, type = "terms")

Then find the maximal value of the DoY contribution

    take <- which(pred$DoY == max(pred$DoY))
    newdat[take, , drop = FALSE]

You could use

    take <- which.max(pred$DoY)

instead of the `which()` I used, but only if there is a single maximal value.

This works because the spline terms in the additive model are just
that; additive. Hence because you haven't let the DoY and time splines
interact (in the simple model I mentioned, it is more complex if you
allow these to interact as you then need to predict DoY for each years
worth of time points), you can separate DoY from the other terms.

None of the above code has been tested, but was written off top of my
head, but should work or at least get you pretty close to something
that works.

HTH

G

On 26 March 2014 10:02, Jacob Cram <cramjaco at gmail.com> wrote:
> Thanks Gavin,
>      This seems like a promising approach and a first pass suggests it works
> with this data. I can't quite figure out how I would go about interrogating
> the fitted spline to deterine when the peak value happens with respect to
> DoY.  Any suggestions?
> -Jacob
>
>
> On Tue, Mar 25, 2014 at 9:06 PM, Gavin Simpson <ucfagls at gmail.com> wrote:
>>
>> I would probably attack this using a GAM modified to model the
>> residuals as a stochastic time series process.
>>
>> For example
>>
>> require("mgcv")
>> mod <- gamm(y ~ s(DoY, bs = "cc") + s(time), data = foo,
>>                          correlation = corCAR1(form = ~ time))
>>
>> where `foo` is your data frame, `DoY` is a variable in the data frame
>> computed as `as.numeric(strftime(RDate, format = "%j"))` and `time` is
>> a variable for the passage of time - you could do `as.numeric(RDate)`
>> but the number of days is probably large as we might encounter more
>> problems fitting the model. Instead you might do `as.numeric(RDate) /
>> 1000` say to produce values on a more manageable scale. The `bs =
>> "cc"` bit specifies a cyclic spline applicable to data measured
>> throughout a year. You may want to fix the start and end knots to be
>> days 1 and days 366 respectively, say via `knots = list(DoY =
>> c(0,366))` as an argument to `gam()` [I think I have this right,
>> specifying the boundary knots, but let me know if you get an error
>> about the number of knots]. The residuals are said to follow a
>> continuois time AR(1), the irregular-spaced counter part to the AR(1),
>> plus random noise.
>>
>> There may be identifiability issues as the `s(time)` and `corCAR1()`
>> compete to explain the fine-scale variation. If you hit such a case,
>> you can make an educated guess as to the wiggliness (degrees of
>> freedom) for the smooth terms based on a plot of the data and fix the
>> splines at those values via argument `k = x` and `fx = TRUE`, where
>> `x` in `k = x` is some integer value. Both these go in as arguments to
>> the `s()` functions. If the trend is not very non-linear you can use a
>> low value 1-3 here for x and for the DoY term say 3-4 might be
>> applicable.
>>
>> There are other ways to approach this problem of identifiability, but
>> that would require more time/space here, which I can go into via a
>> follow-up if needed.
>>
>> You can interrogate the fitted splines to see when the peak value of
>> the `DoY` term is in the year.
>>
>> You can also allow the seasonal signal to vary in time with the trend
>> by allowing the splines to "interact" in a 2d-tensor product spline.
>> Using `te(DoY, time, bs = c("cc","cr"))` instead of the two `s()`
>> terms (or using `ti()` terms for the two "marginal" splines and the
>> 2-d spline). Again you can add in the `k` = c(x,y), fx = TRUE)` to the
>> `te()` term where `x` and `y` are the dfs for each dimension in the
>> `te()` term. It is a bit more complex to do this for `ti()` terms.
>>
>> Part of the reason to prefer a spline for DoY for the seasonal term is
>> that one might not expect the seasonal cycle to be a symmetric cycle
>> as a cos/sin terms would imply.
>>
>> A recent ecological paper describing a similar approach (though using
>> different package in R) is that of Claire Ferguson and colleagues in J
>> Applied Ecology (2008) http://doi.org/10.1111/j.1365-2664.2007.01428.x
>> (freely available).
>>
>> HTH
>>
>> G
>>
>> On 25 March 2014 19:14, Jacob Cram <cramjaco at gmail.com> wrote:
>> > Hello all,
>> >      I am thinking about applying season::cosinor() analysis to some
>> > irregularely spaced time series data. The data are unevenly spaced, so
>> > usual time series methods, as well as the nscosinor() function are out.
>> > My
>> > data do however trend over time and I am wondering if I can feed date as
>> > a
>> > variable into my cosinor analyis.  In the example below, then I'd
>> > conclude
>> > then that the abundances are seasonal, with maximal abundance in mid
>> > June
>> > and furthermore, they are generally decreasing over time.
>> > Can I use both time variables together like this? If not, is there some
>> > better approach I should take?
>> > Thanks in advance,
>> > -Jacob
>> >
>> > For context lAbundance is logg-odds transformed abundance data of a
>> > microbial species in a given location over time.  RDate is the date the
>> > sample was collected in the r date format.
>> >
>> >
>> >> res <- cosinor(lAbundance ~ RDate, date = "RDate", data = lldata)
>> >
>> >>summary(res)
>> > Cosinor test
>> > Number of observations = 62
>> > Amplitude = 0.58
>> > Phase: Month = June , day = 14
>> > Low point: Month = December , day = 14
>> > Significant seasonality based on adjusted significance level of 0.025  =
>> > TRUE
>> >
>> >>summary(res$glm)
>> >
>> >
>> > Call:
>> > glm(formula = f, family = family, data = data, offset = offset)
>> >
>> > Deviance Residuals:
>> >     Min       1Q   Median       3Q      Max
>> > -2.3476  -0.6463   0.1519   0.6574   1.9618
>> >
>> > Coefficients:
>> >               Estimate Std. Error t value Pr(>|t|)
>> > (Intercept)  0.0115693  1.8963070   0.006  0.99515
>> > RDate       -0.0003203  0.0001393  -2.299  0.02514 *
>> > cosw        -0.5516458  0.1837344  -3.002  0.00395 **
>> > sinw         0.1762904  0.1700670   1.037  0.30423
>> > ---
>> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>> >
>> > (Dispersion parameter for gaussian family taken to be 0.9458339)
>> >
>> >     Null deviance: 70.759  on 61  degrees of freedom
>> > Residual deviance: 54.858  on 58  degrees of freedom
>> > AIC: 178.36
>> >
>> > Number of Fisher Scoring iterations: 2
>> >
>> >
>> > llldata as a csv below
>> > "RDate","lAbundance"
>> > 2003-03-12,-3.3330699059335
>> > 2003-05-21,-3.04104625745886
>> > 2003-06-17,-3.04734680029566
>> > 2003-07-02,-4.18791034708572
>> > 2003-09-18,-3.04419201802053
>> > 2003-10-22,-3.13805060873929
>> > 2004-02-19,-3.80688269144794
>> > 2004-03-17,-4.50755507726145
>> > 2004-04-22,-4.38846502542992
>> > 2004-05-19,-3.06618649442674
>> > 2004-06-17,-5.20518774876304
>> > 2004-07-14,-3.75041853151097
>> > 2004-08-25,-3.67882486716196
>> > 2004-09-22,-5.22205827512234
>> > 2004-10-14,-3.99297508670535
>> > 2004-11-17,-4.68793287601157
>> > 2004-12-15,-4.31712380781011
>> > 2005-02-16,-4.30893550479904
>> > 2005-03-16,-4.05781773988454
>> > 2005-05-11,-3.94746237402035
>> > 2005-07-19,-4.91195185391358
>> > 2005-08-17,-4.93590576323119
>> > 2005-09-15,-4.85820800095518
>> > 2005-10-20,-5.22956391101343
>> > 2005-12-13,-5.12244047315448
>> > 2006-01-18,-3.04854660925046
>> > 2006-02-22,-6.77145858348375
>> > 2006-03-29,-4.33151493849021
>> > 2006-04-19,-3.36152357710535
>> > 2006-06-20,-3.09071584142593
>> > 2006-07-25,-3.31430484483825
>> > 2006-08-24,-3.09974933041469
>> > 2006-09-13,-3.33288992218458
>> > 2007-12-17,-4.19942661980677
>> > 2008-03-19,-3.86146499633625
>> > 2008-04-22,-3.36161599919095
>> > 2008-05-14,-4.30878307213324
>> > 2008-06-18,-3.74372448768828
>> > 2008-07-09,-4.65951429661651
>> > 2008-08-20,-5.35984647704619
>> > 2008-09-22,-4.78481898261137
>> > 2008-10-20,-3.58588161980965
>> > 2008-11-20,-3.10625125552057
>> > 2009-02-18,-6.90675477864855
>> > 2009-03-11,-3.43446932013368
>> > 2009-04-23,-3.82688066341466
>> > 2009-05-13,-4.44885332005661
>> > 2009-06-18,-3.97671552612412
>> > 2009-07-09,-3.40185954003936
>> > 2009-08-19,-3.44958231694091
>> > 2009-09-24,-3.86508161094726
>> > 2010-01-28,-4.95281587967569
>> > 2010-02-11,-3.78064756876257
>> > 2010-03-24,-3.5823501176064
>> > 2010-04-27,-4.33635715879999
>> > 2010-05-17,-3.90545735473055
>> > 2010-07-21,-3.3147176517321
>> > 2010-08-11,-4.53218360860017
>> > 2010-10-21,-6.90675477864855
>> > 2010-11-23,-6.90675477864855
>> > 2010-12-16,-6.75158176094352
>> > 2011-01-11,-6.90675477864855
>> >
>> >         [[alternative HTML version deleted]]
>> >
>> > _______________________________________________
>> > R-sig-ecology mailing list
>> > R-sig-ecology at r-project.org
>> > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>>
>>
>>
>> --
>> Gavin Simpson, PhD
>
>



-- 
Gavin Simpson, PhD



More information about the R-sig-ecology mailing list