[R-sig-ME] lmer blocking by subject?
bbolker at gmail.com
Mon Feb 6 04:50:52 CET 2012
Tiffanie Cross <tahtg6 at ...> writes:
> I am new to R and would like some guidance on how to block by subject.
> I collected presence/absence 24 hrs/day for 44 birds, denoted "bird", at 5
> stations, denoted "colony" for the duration of a breeding season. The
> breeding season was broken into biologically relevant time periods (3
> levels), denoted "period". The birds were captured on 2 different colonies,
> 22 on each colony, denoted "homecolony". I also have the variable, "sex". I
> have about 300,000 total observations for these 44 birds. This data is
> temporally auto-correlated because it is VHF radio-telemetry data recorded
> continually throughout the study.
What is the temporal resolution? Based on a guess at the length
of the breeding season (60 days), I'm guessing about every 10-12 minutes.
This isn't essential information but would help get a feel for the
Be aware that modeling temporal autocorrelation in a binary
variable may be a little bit tricky -- there are a variety of
approaches, but none are quite as easy as the way one builds
autocorrelation into a normal-response model, by making the residuals
within blocks multivariate normal with a specified autocorrelation. A
few possibilities that spring to mind are (1) because there is
presumably no error in the observations themselves, you could
condition on the previous observation (i.e. put it in as a predictor);
(2) aggregate the data to a coarser temporal scale and fit the data as
binomial (e.g. number of presence/absence values per 2-hour period,
or per day, or some other appropriate period that balances resolution
and lack of autocorrelation); (3) if there are long 'runs' of
presence/absences, aggregate the data down to times when birds entered
or left; (4) [fanciest, but not necessarily worth the trouble or most
appropriate] assume an underlying multivariate normal distribution
that *is* autocorrelated and controls probability of presence
(i.e. a hierarchical model with autocorrelation in the level
below the observation level).
> I have a binary response variable, "present", with independent variables
> "sex" at 2 levels, "homecolony" at 2 levels, "colony" at 5 levels, and
> "period" at 3 levels. I would like to incorporate "period", "bird" and
> "colony" as random effects. Fixed effects are "sex" and "homecolony". The
> "bird" variable is what I want the model blocked by.
Practically speaking, you can't treat period as a random effect --
not enough levels.
> I am trying to answer the following questions: "Are there differences in
> presence between sexes, homecolony, and period?" "Do birds from one
> homecolony differ in their use of other colonies (colony)?"
> I cannot figure out how to specify that I want the model to block by
> subject, i.e. "bird". I have tried incorporating "bird" as a random effect,
> but the degrees of freedom still come out to be close to 300,000 which is
> near the total number of observations. Can anyone show me syntax that will
> incorporate bird as a random effect that the model blocks by subject?
Where are you seeing the degrees of freedom?
> I tried following examples from Doug Bates' LME4 book, Chapter 4:
> Glmm_FD <- lmer(present ~ 1 + gender + period + homecolony + colony + (1 +
> period|bird), data = FD, family = binomial, REML = 0)
> I know that the (1+period|bird) term is probably incorrect.
REML=0 is meaningless in this case (lmer doesn't use REML
or an analogue of it when fitting a non-Gaussian model), but
otherwise your model specification seems to be on the right track.
Since I'm guessing the birds can only be detected at a single
station at a time, this is a bit more of a categorical response
(i.e., is bird X present at colony 1-5 or "none of the above"
at time T?) Have you thought about multistate mark-recapture
models ... ?
This seems like a fairly complex problem. How have others
in your field handled these kinds of data? With lots of data
on each bird, you may be able to simplify your life a bit by
analyzing each bird separately -- i.e. a two-stage model rather
than a mixed/multilevel model -- Murtaugh 2007 _Ecology_ recommends
this, although I don't always agree with him ...
I would recommend Zuur et al for messy/complex ecological
data, although I don't agree with all of that either ...
More information about the R-sig-mixed-models