[R-sig-ME] lmer blocking by subject?

Tue Feb 7 15:23:18 CET 2012

  [cc'ing back to r-sig-mixed-models]

On 12-02-05 11:31 PM, Tiffanie Cross wrote:
> 
> 
> On Sun, Feb 5, 2012 at 7:50 PM, Ben Bolker <bbolker at gmail.com
> <mailto:bbolker at gmail.com>> wrote:
> 
>     Tiffanie Cross <tahtg6 at ...> writes:
> 
>     > I am new to R and would like some guidance on how to block by subject.
>     >
>     > I collected presence/absence 24 hrs/day for 44 birds, denoted
>     "bird", at 5
>     > stations, denoted "colony" for the duration of a breeding season. The
>     > breeding season was broken into biologically relevant time periods (3
>     > levels), denoted "period". The birds were captured on 2 different
>     colonies,
>     > 22 on each colony, denoted "homecolony". I also have the variable,
>     "sex". I
>     > have about 300,000 total observations for these 44 birds. This data is
>     > temporally auto-correlated because it is VHF radio-telemetry data
>     recorded
>     > continually throughout the study.
> 
>      What is the temporal resolution?  Based on a guess at the length
>     of the breeding season (60 days), I'm guessing about every 10-12
>     minutes.
>     This isn't essential information but would help get a feel for the
>     data.
> 
> 
> The breeding season is from 22 May to 15 August. The maximum detection
> interval is 20 minutes in the case that all tagged birds are present,
> and the minimum is 3 minutes in the case that only one tagged bird is
> present.  

  So the samples are unevenly spaced too ... ?

> 
> 
>       Be aware that modeling temporal autocorrelation in a binary
>     variable may be a little bit tricky -- there are a variety of
>     approaches, but none are quite as easy as the way one builds
>     autocorrelation into a normal-response model, by making the residuals
>     within blocks multivariate normal with a specified autocorrelation.  A
>     few possibilities that spring to mind are (1) because there is
>     presumably no error in the observations themselves, you could
>     condition on the previous observation (i.e. put it in as a predictor);
>     (2) aggregate the data to a coarser temporal scale and fit the data as
>     binomial (e.g.  number of presence/absence values per 2-hour period,
>     or per day, or some other appropriate period that balances resolution
>     and lack of autocorrelation); (3) if there are long 'runs' of
>     presence/absences, aggregate the data down to times when birds entered
>     or left; (4) [fanciest, but not necessarily worth the trouble or most
>     appropriate] assume an underlying multivariate normal distribution
>     that *is* autocorrelated and controls probability of presence
>     (i.e. a hierarchical model with autocorrelation in the level
>     below the observation level).
> 
> 
> I had the model fitted as you describe in option 4 using PROC GLIMMIX in
> SAS. The next step in SAS was to look at residuals and I couldn't figure
> out how to do that. So, I consulted a colleague who turned me on to Zuur
> et al. which was helpful initially. And now I'm stuck in R, too. I'm
> supposed to defend soon and this is kind of ruining that, I fear. =( 

   The closest equivalent to PROC GLIMMIX in SAS is glmmPQL from the
MASS package, in R.  Most of what you can do in GLIMMIX you can also do
in glmmPQL.

  The model statement for blocking by bird and allowing for correlation
within birds across time would be something like

  glmmPQL(present ~  gender + period + homecolony + colony,
   random=~period|bird), correlation=corCAR1(form=~time|bird),
    data = FD, family = binomial)

>     > I know that the (1+period|bird) term is probably incorrect

  why do you think so?

  this allows for the effect of period to vary across birds.  I would
say you *might* want ~period+colony|bird , if your data can support it ...

   The two drawbacks with using glmmPQL or GLIMMIX are that (1) they
both use penalized quasi-likelihood, which are known to give biased
estimates of the variances for binary data; and (2) it's possible to fit
models that don't really make much sense in them.  However, both of
these are matters of taste (in my opinion) rather than absolute
show-stoppers.  I hate to say it, but if SAS was mostly working for you
and it was just a matter of getting residuals wouldn't that be an
easier problem to solve ... ?  (I

http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_glimmix_a0000001413.htm

  BTW GLIMMIX now does Laplace and quadrature methods too, but it does
not allow autocorrelation with these methods:

http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_glimmix_a0000001405.htm

 *  Because a valid conditional distribution is required, R-side random
 effects are not permitted for METHOD=LAPLACE in the GLIMMIX procedure.
In other words, the GLIMMIX procedure requires for METHOD=LAPLACE
conditional independence without R-side overdispersion or covariance
structure.

> 
> I did build a column into my data for "hour", so I guess I do have an
> "hour" variable that 'detections per hour' could be calculated for each
> individual. 
> 
> 
>     > I have a binary response variable, "present", with independent
>     variables
>     > "sex" at 2 levels, "homecolony" at 2 levels, "colony" at 5 levels, and
>     > "period" at 3 levels. I would like to incorporate "period", "bird" and
>     > "colony" as random effects. Fixed effects are "sex" and
>     "homecolony". The
>     > "bird" variable is what I want the model blocked by.
> 
>       Practically speaking, you can't treat period as a random effect --
>     not enough levels.
> 
> 
> Ok. I will specify it as fixed. :) Thanks! 
> 
> 
>     > I am trying to answer the following questions: "Are there
>     differences in
>     > presence between sexes, homecolony, and period?" "Do birds from one
>     > homecolony differ in their use of other colonies (colony)?"
>     >
>     > I cannot figure out how to specify that I want the model to block by
>     > subject, i.e. "bird". I have tried incorporating "bird" as a
>     random effect,
>     > but the degrees of freedom still come out to be close to 300,000
>     which is
>     > near the total number of observations. Can anyone show me syntax
>     that will
>     > incorporate bird as a random effect that the model blocks by subject?
> 
>       Where are you seeing the degrees of freedom?
> 
> 
> I don't remember and I didn't save the output.  
> 
>     >
>     > I tried following examples from Doug Bates' LME4 book, Chapter 4:
> 
>     > Glmm_FD <- lmer(present ~ 1 + gender + period + homecolony +
>     colony + (1 +
>     > period|bird), data = FD, family = binomial, REML = 0)
>     > I know that the (1+period|bird) term is probably incorrect.
> 
>      REML=0 is meaningless in this case (lmer doesn't use REML
>     or an analogue of it when fitting a non-Gaussian model), but
>     otherwise your model specification seems to be on the right track.
> 
>      Since I'm guessing the birds can only be detected at a single
>     station at a time, this is a bit more of a categorical response
>     (i.e., is bird X present at colony 1-5 or "none of the above"
>     at time T?)  Have you thought about multistate mark-recapture 
> 
>     models ... ?
> 
> 
> I'll take REML = 0 out of the model. The birds can only be detected at
> one station at time T. I don't know what multistate mark-recapture
> models are. I mentally steered clear of mark-recapture because they make
> me think of population estimates or homerange type analyses. I'm after
> mostly behavioral information here.

  Well, I think they could be useful, but no need to delve into
unnecessary complications ...
> 
> 
>       This seems like a fairly complex problem.  How have others
>     in your field handled these kinds of data?  With lots of data
>     on each bird, you may be able to simplify your life a bit by
>     analyzing each bird separately -- i.e. a two-stage model rather
>     than a mixed/multilevel model -- Murtaugh 2007 _Ecology_ recommends
>     this, although I don't always agree with him ...
> 
> 
> I haven't seen others in my field with this sort of data use GLM or GLMM
> at all. I actually disagreed with previous methods and if my memory
> serves I didn't see any statistical information provided. 
> 
> 
>      I would recommend Zuur et al for messy/complex ecological
>     data, although I don't agree with all of that either ...
> 
> 
> So is there no command like "Subject = bird" as there is in PROC GLIMMIX
> in SAS?

  Well, the point is that (1|bird) *is* more or less equivalent (as I
understand it) to "Subject=bird".

  I would definitely follow Murtaugh's advice here and use the
*simplest* method that you think will make sense.  Doing a two-stage
analysis might be just fine.

  Ben Bolker