[R-sig-ME] glmmPQL: Temporal correlation structure for grouped data with observation gaps

Wed Mar 10 05:58:04 CET 2021

Dear list members,

I have a question regarding the implementation of a temporal correlation
structure in a GLMM context in R. This is my first time posting here so I
hope the question is suitable for this forum.

I have a data set that comprises summer (June-September) paired daily air
and water temperatures measured at 83 monitoring sites over a period of 3
years. The purpose is to predict water temperatures based on air
temperatures and a suite of environmental variables related to the
characteristics of the rivers and contributing catchment (land use, slope,
elevation and so on). The number of observations among sites differ somehow
(the series at some sites started at slightly different dates and some
sites have missing values within the series).

My approach so far has been as follows:

1. Fit a GLMM on water temperatures (WT) with air temperature (AT) and the
different environmental variables as fixed factors and site as random
factor (random intercept).

model <- glmmPQL(WT ~ AT + H2OArea + avEleE + RipB60_fdec + RipB60_fev,
random = ~1|Site, data = t_d, family = Gamma(link = "log"))

The reason for using glmmPQL is because of the need to incorporate a
temporal correlation structure (see point 2)

2. Given the nature of the data, temporal correlation is to be expected. A
check of acf and pacf functions on the model residuals by site shows
consistently a gradually decaying acf, and a sharp drop of the pacf below
the significance threshold after lag 1 (for which the pacf equals approx..
0.8), suggesting an AR1 process is suitable. Considering the different
observation gaps in the temperature series by station and the seasonal gaps
between years (only summer data), I decided to try with a corCAR1 structure
of the form:

modelAR <- update(m5, correlation = corCAR1(0.8, form = ~days|Site))

Where days correspond to the day number since the start of the series
across all sites (the first day an observation was made across all series).
That is:

t_d$days <- difftime(t_d$date, min(t_d$date), units = "days")

3. However, the acf and pacf plots for the residuals in the updated model
look nearly identical from those of the original model, which makes me
think that I have not specified the correlation structure or the time
variable correctly or that I’m missing something else?

An extract of the data set for a few stations can be downloaded here:

https://drive.google.com/file/d/1iMaXSSYvCJ1Xo09ZoJjbLdzmc0gjfJKy/view?usp=sharing

Any help/suggestions will be greatly appreciated!

Many thanks,
Jorge

	[[alternative HTML version deleted]]