[R-sig-ME] Temporal correlation structure in mixed model
Ben Bolker
bbolker at gmail.com
Thu Mar 27 17:57:50 CET 2014
On 14-03-27 12:23 PM, Worthington, Thomas A wrote:
> Dear All
> I'm trying to run a mixed model on ecology data. The setup is as
follows, a group of fish are placed in a flume and attempt to pass a
barrier. The dependent variable is the number of approaches to pass
the barrier per minute of the trial, with each trial lasting 10
minutes. There are two configurations of the barrier. There are 20
trials in total, 10 for each setup. The aim of the study is to see
whether the number of approaches in minute 1, minute 2, minute 3
etc. is different between the two setups. Therefore my model is of
the form Approaches ~ Setup + Minute + Setup * Minute, where minute
(10 levels) and setup (2 levels) are both factors.
The dependent variable sounds more like a count than a continuous
variable, so you might consider a Poisson GLMM, _but_ it is harder
to incorporate temporal autocorrelation in that case, so if it seems
reasonable you might try to stick with the linear mixed model (i.e.
Gaussian/Normal responses). How big is your data set (is a 'trial'
equivalent to a single fish? 10 levels=10 minutes? Does that you
mean you have a total of 200 observations?) What is a typical
number/range of attempts per minute?
> I have three potential models and am slightly confused over the
model specification, I believe I need to include a correlation
structure to deal with temporal autocorrelation but am unclear over
whether a random effect is needed instead or both are required. I
have included a variance structure to deal with the issue of
homogeneity of variance in the data.
A Poisson GLMM might help here.
> The three models are as follows
> test1 <- gls(Approaches ~ Setup * Minute, data=data, correlation =
corAR1(form =~ Minute|Trial), weights = varIdent(form=~ 1| Minute *
Setup), control=list(opt="optim"))
This will estimate correlation, but will assume that the average
number of attempts is the same for every trial within a setup
category.
> test2 <- lme(Approaches ~ Setup * Minute, data=data, random = ~
Minute | Trial, weights = varIdent(form=~ 1| Minute * Setup),
control=list(opt="optim"))
This looks overspecified (does it really work??) -- it is allowing
the differences among every pair of minutes to be characterized
for every trial. Even ~1|Trial/Minute is overspecified, since
there is only one observation per minute per trial (thus the
among-minute-within-trial variance would be confounded with
the residual variance).
> test3 <- lme(Approaches ~ Setup * Minute, data=data, random = ~
Minute | Trial, weights = varIdent(form=~ 1| Minute * Setup),
correlation = corAR1(form =~ Minute|Trial),
control=list(opt="optim"))
This looks even more overspecified. I would suggest
lme(Approaches ~ Setup * Minute, data=data, random = ~ 1 | Trial,
weights = varIdent(form=~ 1| Minute * Setup),
correlation = corAR1(form =~ Minute|Trial),
control=list(opt="optim"))
which allows for variation among trials (1|Trial) and autocorrelation
within Trial -- the Minute:Trial interaction will show up in the
residual variance.
This still looks ambitious if you have a total of 200 data points,
as the varIdent() specification will estimate variances for all 20
of the minute*setup combinations.
* Would log- or square-root transforming your data take care of
the heteroscedasticity adequately?
* Are you sure you want to treat minute as categorical, estimating
a separate response for every minute? Or would a smooth trend
(linear, quadratic, or spline) describe the data adequately and
more parsimoniously? (This applies to both the mean and variance
description)
> I believe (maybe wrongly) that Test 1 and 2 would be broadly
similar, but if some could suggest the most sensible model I would
greatly appreciate it
> Best wishes
>
> Tom
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
More information about the R-sig-mixed-models
mailing list