[R-sig-ME] How to include multiple temporal processes in one model?

Wed Jan 12 20:06:50 CET 2022

On Wed, 12 Jan 2022, Adriaan de Jong wrote:

> Hi,
>
> Present:
> a 25 year series of count data of individuals of one migratory bird
> species observed from my driver's seat (2815 counts from the same c. 20 km
> road transect). The dataset includes the variables: Year, Month, Day, Hour,
> Minute (5 min precision), (driving)Direction and Count(result) (sample
> below).
>
> Objectives:
> 1. Has there been a trend in the numbers over the years?
> 2. How do the numbers generally vary over the breeding season? (I live in
> northern Sweden and the breeding/observation season is April-August)
> I do not intend to make predictions for either future developments
> (temporal extrapolation) or other transects (spatial extrapolation).
>
> Problems/limitations:
> a. The sampling has been opportunistic (which was a main point because no
> extra effort was needed) and thus, unevenly spread over the hours of the
> day with more counts in the morning and late afternoon (most are from
> commuting to work).
> b. The distribution of the timing over the day has varied over the years.
> c. The dataset contains a significant proportion (43%) of zero counts,
> especially during the early and late parts of the breeding season.
> d. The number of transect counts has varied over the years (range 66-167,
> but no clear trend over the years)
> e. The direction of driving has an impact on what can be seen (non-flat
> landscape) and thus, needs to be included as a covariate (random effect?)
> (I can provide graphs of frequency distributions if needed)
>
> My question is:
> How should I include the three temporal factors (year, time of season and
> time of day) and driving direction in the logistic models for the two
> different objectives?
>
> Thanks in advance for your suggestions and comments.
> Cheers,
> Adjan
>
> Adriaan "Adjan" de Jong
> Associate professor
> Dept of Wildlife, Fish, and Environmental Studies
> Swedish University of Agricultural Sciences
>
> Data structure (fake numbers)
> Year  Month  Day  Hour  Minute  Direction  Count
> 1997      4   25     9      15          0      5
> 1997      5   14    15      35          1      0
> 1997      5   15     7      45          0     16
> .
> .
> 2021      8   28    10       0          0      0
>
> PS. I understand I have to combine the Month and Day, and the Hour and
> Minute variables into two new variables for Time of season and Time of day.
>

Hi Adjan,

At the moment, I can offer some suggestions on how to include the different
temporal trends (the first part of your question) for such a model via a
GAM. It could go something like the following:

library(tidyverse)
library(lubridate) # helps with time objects
library(hms) # for time of day objects
library(mgcv) # all things GAM

## fake data -- not entirely sure I parsed it correctly
fake_data <- tibble(
  date_time = ymd_hm(c(
    "1997-4-25 09:15", "1997-5-14 15:35",
    "1997-5-15 7:45", "2021-8-28 10:00"
  )),
  direction = c(0, 1, 1, 0),
  count = c(5, 0, 6, 0)
)

## create new time variables, in numeric form but still interpretable

# helper function to convert dates to decimal years
# (e.g. June 2021 ~= 2021.5); assumes lubridate is already loaded
dec_year_from_date <- function(date) {
  # day of year divided by the number of days in that year
  days_in_year <- ifelse(leap_year(date), 366, 365)
  year(date) + yday(date) / days_in_year
}

fake_data <- fake_data %>%
  mutate(
    DOY = yday(date_time), # day-of-year variable (1 to 366)
    time_of_day_s = as.numeric(as_hms(date_time)), # time-of-day variable
    time_of_day_h = time_of_day_s / (60 * 60), # in decimal h
    dec_year = dec_year_from_date(date_time)
  ) %>%
  select(date_time, dec_year, DOY, time_of_day_h, direction, count)

# structure now looks like:
head(fake_data)

## # A tibble: 4 x 6
##   date_time           dec_year   DOY time_of_day_h direction count
##   <dttm>                 <dbl> <dbl>         <dbl>     <dbl> <dbl>
## 1 1997-04-25 09:15:00 1997.315   115          9.25         0     5
## 2 1997-05-14 15:35:00 1997.367   134         15.6          1     0
## 3 1997-05-15 07:45:00 1997.370   135          7.75         1     6
## 4 2021-08-28 10:00:00 2021.658   240         10            0     0
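
As a cross-check on the helper, lubridate also provides decimal_date(),
which measures the elapsed fraction of the year rather than day-of-year
over days-in-year. The two should differ by less than one day (1/365 of a
year). A minimal sketch, using one of the fake timestamps:

```r
library(lubridate)

x <- ymd_hm("2021-08-28 10:00")

# hand-rolled version: whole day-of-year over days in that year
helper <- year(x) + yday(x) / ifelse(leap_year(x), 366, 365)

# built-in version: elapsed time since 1 January over the year's length
builtin <- decimal_date(x)

# the gap is the unelapsed part of the current day, as a fraction of a year
helper - builtin
```

Either version should work as the long-term trend covariate; the offset is
far smaller than the scale on which a 25-year trend is smoothed.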

Then you could use a different smooth term within a GAM for each of the
temporal trends: a default smooth term for the long-term trend and cyclic
cubic regression splines (bs = 'cc') for the cyclical terms (season and
time of day). The following could be something to get started with:

# # GAM formula for yearly trend
# gam(count ~ direction +
#       s(dec_year) + # long-term trend
#       s(DOY, bs = 'cc') + # seasonal trend
#       s(time_of_day_h, bs = 'cc'), # time of day trend
#     knots = list(DOY = c(1, 366), time_of_day_h = c(0, 24)),
#     family = ??, # left open, see below
#     data = fake_data
#     )

I've no expertise with count data (particularly with lots of zeros), so I
leave the family and link function up to you and the list.
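
Purely to show the call running end to end, here is a minimal sketch on
simulated data. The negative binomial family (nb()), the REML method and
every simulated value are assumptions made for illustration, not a
recommendation for the real counts:

```r
library(mgcv)

set.seed(1)
n <- 500

# hypothetical stand-in for the real data set
sim <- data.frame(
  dec_year      = runif(n, 1997, 2022),
  DOY           = sample(91:243, n, replace = TRUE), # roughly April to August
  time_of_day_h = runif(n, 5, 20),
  direction     = factor(sample(c(0, 1), n, replace = TRUE))
)

# made-up mean: slight long-term increase and a mid-season peak
mu <- exp(0.5 + 0.02 * (sim$dec_year - 2009) - ((sim$DOY - 165) / 40)^2)
sim$count <- rnbinom(n, mu = mu, size = 1)

fit <- gam(
  count ~ direction +
    s(dec_year) +                  # long-term trend
    s(DOY, bs = "cc") +            # seasonal trend
    s(time_of_day_h, bs = "cc"),   # time-of-day trend
  knots  = list(DOY = c(1, 366), time_of_day_h = c(0, 24)),
  family = nb(),                   # ASSUMPTION: placeholder family
  method = "REML",
  data   = sim
)

summary(fit)
```

plot(fit, pages = 1) then draws the three estimated smooths, which speak to
the two objectives (trend over years, variation over the season).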
Other terms (e.g., a random effect for transect? via
s(transect, bs = 're')) and interactions could be built in. Perhaps the
seasonal trend varies across years. Driving direction could also be
included as a 'by' variable in the smooth terms to create factor-smooth
interactions. E.g., with direction coded as a factor,
s(DOY, by = direction, bs = 'cc') would create a separate seasonal smooth
for each driving direction.
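
The two extensions mentioned above could be sketched as follows. The
simulated data, the poisson() family and the use of ti() for the
year-by-season interaction are all assumptions for illustration; only the
'by' construction comes from the text above:

```r
library(mgcv)

set.seed(2)
n <- 600

# hypothetical data; direction must be a factor for factor-smooth terms
sim <- data.frame(
  dec_year  = runif(n, 1997, 2022),
  DOY       = sample(91:243, n, replace = TRUE),
  direction = factor(sample(c(0, 1), n, replace = TRUE))
)
sim$count <- rpois(n, lambda = exp(1 - ((sim$DOY - 165) / 40)^2))

# (1) a separate seasonal smooth for each driving direction;
#     direction also enters as a main effect to shift the mean level
fit_by <- gam(
  count ~ direction + s(dec_year) + s(DOY, by = direction, bs = "cc"),
  knots  = list(DOY = c(1, 366)),
  family = poisson(),              # ASSUMPTION: placeholder family
  method = "REML",
  data   = sim
)

# (2) a seasonal pattern that is allowed to change across years,
#     via a tensor-product interaction on top of the two main smooths
fit_ti <- gam(
  count ~ direction + s(dec_year) + s(DOY, bs = "cc") +
    ti(dec_year, DOY, bs = c("tp", "cc")),
  knots  = list(DOY = c(1, 366)),
  family = poisson(),              # ASSUMPTION: placeholder family
  method = "REML",
  data   = sim
)

AIC(fit_by, fit_ti)
```

If direction were left as a numeric 0/1 column, the 'by' term would instead
scale a single smooth by that number, which is not the same model.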

Hope this helps,
Zach

-- 
Zach Simpson
Post-doc, Dept. Agronomy
Iowa State University
