[R-sig-ME] Temporal correlation structure in mixed model

Fri Mar 28 00:55:06 CET 2014

Ok so it appears I was specifying the factors incorrectly when I use

data$Setup<-factor(data$Setup)
data$Minute<-factor(data$Minute)
data$Trial<-factor(data$Trial)

and rerun the model I get the following error message 

Error in structure(res, levels = lv, names = nm, class = "factor") : 
  'names' attribute [180] must be the same length as the vector [0]

Which appears to be related to the correlation structure, as when I remove this the model runs. Although the model output called with 

summary(test4b)

now contains comparisons between Minute 1 and Minutes 2, 3, 4 etc rather than the overall statistics for Setup, Minute and the interaction 

Any suggestions of a way forward would be appreciated? Or do I need a model that contains a least one continuous variable?

Best wishes

Tom   

-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Worthington, Thomas A
Sent: Thursday, March 27, 2014 1:56 PM
To: Ben Bolker; r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Temporal correlation structure in mixed model

Dear Ben 

Thank you for your suggestions  in answer to your questions

A trial is multiple fish but it is impossible to tell individuals apart, therefore the dependent variable is for the group. 

Yes the ten levels relates to an overall trial length of 10 minutes and yes there are a total of 200 observations

The typical number of approaches varies from 0.2 - 0.5 for setup 0 and 0.2 - 1.7 for setup 1, this value is corrected for fish that past the barrier. >From examination of the data the number of approaches is only different between the two setups for minutes 1, 2 and possibly 3.  The rational for treating minute as a categorical variable  was determine if there was an attraction to the barrier due to setup in those first minutes, which I was going to test using planned contrast once the overall model had been specified. WE had toyed with a simpler approach of multiple t-tests the correcting for multiple comparisons

In terms of the model you suggested maybe it would be preferable to set the variance structure to weights = varIdent(form=~ 1| Minute) as the difference in variance is mostly between minutes? I avoided transforming the data as it was suggested that it might be preferable to model the variance structure in Zuur et al 2009 Mixed Effects Models and Extensions in Ecology with R. 

test4b<-lme(Approaches ~ Setup * Minute, data=data, random = ~ 1 | Trial, weights = varIdent(form=~ 1| Minute), correlation = corAR1(form =~ Minute|Trial), control=list(opt="optim"))

I have compared the 5 models with AIC and BIC and 4b seems to fit the best (Test 4a being the one you put forth as a suggestion)

       df       AIC
test1  25 -40.44546
test2  27 -32.25960
test3  28 -38.20182
test4a 26 -42.20184
test4b 16 -20.23438

and 

       df      BIC
test1  25 41.50741
test2  27 56.24950
test3  28 53.58539
test4a 26 43.02914
test4b 16 32.21545

Best wishes

Tom 

-----Original Message-----
From: Ben Bolker [mailto:bbolker at gmail.com]
Sent: Thursday, March 27, 2014 12:58 PM
To: Worthington, Thomas A; r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Temporal correlation structure in mixed model

On 14-03-27 12:23 PM, Worthington, Thomas A wrote:
> Dear All

> I'm trying to run a mixed model on ecology data. The setup is as
  follows, a group of fish are placed in a flume and attempt to pass a
  barrier. The dependent variable is the number of approaches to pass
  the barrier per minute of the trial, with each trial lasting 10
  minutes. There are two configurations of the barrier.  There are 20
  trials in total, 10 for each setup. The aim of the study is to see
  whether the number of approaches in minute 1, minute 2, minute 3
  etc.  is different between the two setups. Therefore my model is of
  the form Approaches ~ Setup + Minute + Setup * Minute, where minute
  (10 levels) and setup (2 levels) are both factors.

The dependent variable sounds more like a count than a continuous variable, so you might consider a Poisson GLMM, _but_ it is harder to incorporate temporal autocorrelation in that case, so if it seems reasonable you might try to stick with the linear mixed model (i.e.
Gaussian/Normal responses).  How big is your data set (is a 'trial'
equivalent to a single fish?  10 levels=10 minutes?  Does that you mean you have a total of 200 observations?)  What is a typical number/range of attempts per minute?

> I have three potential models and am slightly confused over the
  model specification, I believe I need to include a correlation
  structure to deal with temporal autocorrelation but am unclear over
  whether a random effect is needed instead or both are required. I
  have included a variance structure to deal with the issue of
  homogeneity of variance in the data.

A Poisson GLMM might help here.

> The three models are as follows

> test1 <- gls(Approaches ~ Setup * Minute, data=data, correlation =
  corAR1(form =~ Minute|Trial), weights = varIdent(form=~ 1| Minute *
  Setup), control=list(opt="optim"))

This will estimate correlation, but will assume that the average number of attempts is the same for every trial within a setup category.

> test2 <- lme(Approaches ~ Setup * Minute, data=data, random = ~
  Minute | Trial, weights = varIdent(form=~ 1| Minute * Setup),
  control=list(opt="optim"))

  This looks overspecified (does it really work??) -- it is allowing the differences among every pair of minutes to be characterized for every trial.  Even ~1|Trial/Minute is overspecified, since there is only one observation per minute per trial (thus the among-minute-within-trial variance would be confounded with the residual variance).

> test3 <- lme(Approaches ~ Setup * Minute, data=data, random = ~
  Minute | Trial, weights = varIdent(form=~ 1| Minute * Setup),
  correlation = corAR1(form =~ Minute|Trial),
  control=list(opt="optim"))

This looks even more overspecified.  I would suggest

lme(Approaches ~ Setup * Minute, data=data, random = ~ 1 | Trial,
   weights = varIdent(form=~ 1| Minute * Setup),
  correlation = corAR1(form =~ Minute|Trial),
  control=list(opt="optim"))

 which allows for variation among trials (1|Trial) and autocorrelation within Trial -- the Minute:Trial interaction will show up in the residual variance.

  This still looks ambitious if you have a total of 200 data points, as the varIdent() specification will estimate variances for all 20 of the minute*setup combinations.
  * Would log- or square-root transforming your data take care of the heteroscedasticity adequately?
  * Are you sure you want to treat minute as categorical, estimating a separate response for every minute?  Or would a smooth trend (linear, quadratic, or spline) describe the data adequately and more parsimoniously?  (This applies to both the mean and variance
description)

> I believe (maybe wrongly) that Test 1 and 2 would be broadly
  similar, but if some could suggest the most sensible model I would
  greatly appreciate it

> Best wishes
> 
> Tom
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list 
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 

_______________________________________________
R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models