[R-sig-ME] Multilevel binomial/survival model with repeated measures

Fri Oct 7 15:19:51 CEST 2011

Adam Smith <raptorbio at ...> writes:

> Fruits are counted over time (14 occasions; every 3-5 days) to
> evaluate consumption by migratory birds.  The fruits are clustered
> within one of two subplots (a treatment and a control) assigned
> randomly within a larger plot (12 plots in total).  Individual
> fruits were not marked, but rather the number of fruits consumed
> during a given interval was noted.  Thus, the number of fruits
> present at the start of a time interval varies among subplots and
> over time.  I'm interested in how consumption varies with treatment
> (at the subplot level), plot geography (e.g., north v south), and
> count interval on the hazard/likelihood of consumption.  Modeling
> time as a categorical variable makes more sense biologically (i.e.,
> modelling consumption separately for each count period), as
> consumption over time is not likely to change linearly or
> quadratically.

 (or in some not quite as simple but still smooth/deterministic
way that could be modeled by a spline curve ... ?)

> I can envision two possible specifications for the response
> variable: (1) model consumption using a time-to-event type binary
> response in which each individual fruit provides up to 14 rows of
> data, with consumption equaling 0 for each count period until
> consumption (1) occurs (fruits cannot contribute in count periods
> after they have been consumed); this equates to what Singer and
> Willett (Applied Longitudinal Data Analysis, 2003) call a
> discrete-time hazard model, if that's at all familiar; or (2) ignore
> individual consumption histories and aggregate fruits within
> subplots and model the proportion of fruits consumed in each count
> interval.
>
> I lack the expertise to decide which is most appropriate.  Moreover,
> I'm stumped as to how to specify the clustering and nesting in
> glmer, regardless of the approach taken.
> 
> For (1), the following comes to mind:
> 
> consumption.survival <- glmer(consumption ~ treatment + geography +
>     count + (1|plot) + (1|treatment:plot), family = binomial, ...)
> 
> For (2):
> 
> consumption.binomial <- glmer(cbind(consumed, total) ~ treatment +
>     geography + count + (1|plot) + (1|treatment:plot),
>     family = binomial, ...)
> 

  I would say that #2 is much more natural.

* glmer is expecting cbind(consumed,total-consumed) (i.e. successes,
failures) rather than cbind(consumed,total)

* I think you might additionally want to make 'count'
a random effect; make sure it is a factor (unless you are
really interested in the specific effects at particular
time periods and you don't mind using up all those degrees
of freedom estimating them ...)

* You don't seem to have accounted for the inter-sample
period (you could do a bit of exploratory data analysis,
or residual analysis, to see if the difference between
3 and 5 days seems to matter).  You might try family=binomial(link="cloglog")
with an offset equal to log(Dt), which will make the probability
of consumption within Dt days equal to 1-exp(-Dt*h), where
h is the per-day consumption hazard (equivalently 1-(1-p1)^Dt,
where p1 = 1-exp(-h) is the per-day consumption probability).

* You may want to add a random effect for individual observations (i.e.
mydata$obs <- seq(nrow(mydata))).  For #2 this may seem counterintuitive,
but it allows for overdispersion (which can be driven either by
non-independence within samples, or by heterogeneity within
samples).
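
As a quick sanity check on the cloglog/offset point above, here is a
small base-R sketch.  The hazard value h and the interval lengths Dt
are made-up numbers for illustration, not taken from the data.

```r
# Made-up values, for illustration only
h  <- 0.05          # assumed per-day hazard of consumption
Dt <- c(3, 4, 5)    # interval lengths in days

# Linear predictor on the cloglog scale: intercept log(h) plus offset log(Dt)
eta <- log(h) + log(Dt)

# Back-transform with the inverse cloglog link
p_cloglog <- binomial(link = "cloglog")$linkinv(eta)

# Direct calculation from the hazard, and from the per-day probability
p_direct <- 1 - exp(-Dt * h)
p1       <- 1 - exp(-h)
p_perday <- 1 - (1 - p1)^Dt

all.equal(p_cloglog, p_direct)
all.equal(p_cloglog, p_perday)
```

All three calculations should agree, which is why the offset takes care
of unequal intervals without adding any parameters.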

  I'm not sure either approach is perfect, but the binomial
approach seems better here.
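
Putting the suggestions above together, a rough sketch of the binomial
model might look like the following.  The data are simulated purely so
that the code runs; the column names Dt and obs, and all the numbers in
the simulation, are assumptions rather than anything from the real data
set.  Because the simulated data contain no real random-effect
variation, expect singular-fit or convergence warnings.

```r
library(lme4)

set.seed(1)
# Simulated stand-in: 12 plots x 2 subplots (treatments) x 14 counts
mydata <- expand.grid(count     = factor(1:14),
                      treatment = factor(c("control", "treated")),
                      plot      = factor(1:12))
# Geography varies at the plot level (first 6 plots north, rest south)
mydata$geography <- factor(ifelse(as.integer(mydata$plot) <= 6,
                                  "north", "south"))
# Interval length in days (3-5), one value per count occasion
mydata$Dt <- sample(3:5, 14, replace = TRUE)[as.integer(mydata$count)]
# Fruits present at the start of each interval (arbitrary numbers)
mydata$total <- sample(5:40, nrow(mydata), replace = TRUE)
# Assumed per-day hazard, higher in the treated subplots
h <- 0.04 * exp(0.5 * (mydata$treatment == "treated"))
mydata$consumed <- rbinom(nrow(mydata), mydata$total,
                          1 - exp(-mydata$Dt * h))
# Observation-level factor to allow for overdispersion
mydata$obs <- factor(seq(nrow(mydata)))

consumption.binomial <- glmer(
    cbind(consumed, total - consumed) ~ treatment + geography +
        offset(log(Dt)) +
        (1 | count) + (1 | plot) + (1 | plot:treatment) + (1 | obs),
    family = binomial(link = "cloglog"), data = mydata)
summary(consumption.binomial)
```

Here count is a random effect rather than a fixed factor, and
(1 | plot:treatment) plays the role of the subplot term.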