[R-sig-ME] Multilevel binomial/survival model with repeated measures
Ben Bolker
bbolker at gmail.com
Fri Oct 7 15:19:51 CEST 2011
Adam Smith <raptorbio at ...> writes:
> Fruits are counted over time (14 occasions; every 3-5 days) to
> evaluate consumption by migratory birds. The fruits are clustered
> within one of two subplots (a treatment and a control) assigned
> randomly within a larger plot (12 plots in total). Individual
> fruits were not marked, but rather the number of fruits consumed
> during a given interval was noted. Thus, the number of fruits
> present at the start of a time interval varies among subplots and
> over time. I'm interested in how consumption varies with treatment
> (at the subplot level), plot geography (e.g., north v south), and
> count interval on the hazard/likelihood of consumption. Modeling
> time as a categorical variable makes more sense biologically (i.e.,
> modelling consumption separately for each count period), as
> consumption over time is not likely to change linearly or
> quadratically.
(or in some not quite as simple but still smooth/deterministic
way that could be modeled by a spline curve ... ?)
> I can envision two possible specifications for the
> response variable: (1) model consumption using a time-to-event type
> binary response in which each individual fruit provides up to 14 rows of
> data, with consumption equaling 0 for each count period until
> consumption (1) occurs (fruits can not contribute in count periods after
> they have been consumed); this equates to what Singer and Willett
> (Applied Longitudinal Data Analysis, 2003) call a discrete-time hazard
> model, if that's at all familiar; or (2) ignore individual consumption
> histories and aggregate fruits within subplots and model the proportion
> of fruits consumed in each count interval.
>
> I lack the expertise
> to decide which is most appropriate. Moreover, I'm stumped as to how to
> specify the clustering and nesting in glmer, regardless of the approach
> taken.
>
> For (1), the following comes to mind:
>
> consumption.survival
> <- glmer (consumption ~ treatment + geography + count + (1|plot) +
> (1|treatment:plot), family = binomial, ...)
>
> For (2):
>
> consumption.binomial
> <- glmer (cbind(consumed, total) ~ treatment + geography + count +
> (1|plot) + (1|treatment:plot), family = binomial, ...)
>
I would say that #2 is much more natural.
* glmer is expecting cbind(consumed,total-consumed) rather than
cbind(consumed,total)
* I think you might additionally want to make 'count'
a random effect (unless you are really interested in the
specific effects at particular time periods and you don't
mind using up all those degrees of freedom estimating
them ...); either way, make sure it is a factor.
* You don't seem to have accounted for the inter-sample
period (you could do a bit of exploratory data analysis,
or residual analysis, to see if the difference between
3 and 5 days seems to matter). You might try family=binomial(link="cloglog")
with an offset equal to log(Dt), where Dt is the length of the
interval in days; this makes the probability of consumption
within Dt days equal to 1-(1-p1)^Dt, where
p1 is the per-day consumption probability.
* You may want to add a random effect for individual observations (i.e.
mydata$obs <- seq(nrow(mydata)), with (1|obs) in the formula).
This seems counterintuitive, but it allows for overdispersion
(which can be driven either by non-independence within samples,
or by heterogeneity within samples).
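
A quick numerical check of the cloglog-plus-offset idea, as a minimal sketch in base R. The values of p1 and Dt are made up for illustration, and inv_cloglog is a helper defined here, not a function from any package. It shows that adding log(Dt) to the per-day linear predictor gives the same probability as 1-(1-p1)^Dt.

```r
# Per-day consumption probability and interval lengths in days
# (illustrative values only, not taken from the data)
p1 <- 0.1
Dt <- c(3, 4, 5)

# Linear predictor on the cloglog scale for a one-day interval
eta <- log(-log(1 - p1))

# Inverse of the cloglog link: p = 1 - exp(-exp(eta))
inv_cloglog <- function(x) 1 - exp(-exp(x))

# Probability over Dt days when log(Dt) is added as an offset
p_offset <- inv_cloglog(eta + log(Dt))

# Direct calculation: eaten on at least one of Dt independent days
p_direct <- 1 - (1 - p1)^Dt

# The two calculations agree up to floating-point error
all.equal(p_offset, p_direct)
```

So the fixed effects are interpreted on the per-day scale, whatever the spacing between counts.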
I'm not sure either approach is perfect, but the binomial
approach seems better here.
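
Putting the suggestions above together, the model might look something like the sketch below. This is only an illustration under stated assumptions: the data are simulated, and the column names (consumed, total, Dt, obs), the starting count of 300 fruits per subplot, and all effect sizes are invented here, not taken from the original data set.

```r
library(lme4)

set.seed(1)

# Hypothetical layout: 12 plots x 2 subplots x 14 counts
# (expand.grid varies 'count' fastest, so each subplot's rows are consecutive)
dat <- expand.grid(count = 1:14,
                   treatment = c("control", "treatment"),
                   plot = 1:12)
dat$geography <- ifelse(dat$plot <= 6, "north", "south")

# Days since the previous count (3 to 5), the same for every subplot
dat$Dt <- sample(3:5, 14, replace = TRUE)[dat$count]

# Simulated per-day linear predictor on the cloglog scale (invented values)
plot_eff  <- rnorm(12, sd = 0.3)
count_eff <- rnorm(14, sd = 0.3)
obs_eff   <- rnorm(nrow(dat), sd = 0.3)   # observation-level overdispersion
eta <- -3.5 + 0.5 * (dat$treatment == "treatment") +
  plot_eff[dat$plot] + count_eff[dat$count] + obs_eff

# Probability of consumption over an interval of Dt days
p_interval <- 1 - exp(-exp(eta + log(dat$Dt)))

# Fruits present at the start of each interval are those left from the last one
dat$total <- NA_integer_
dat$consumed <- NA_integer_
for (i in seq_len(nrow(dat))) {
  dat$total[i] <- if (dat$count[i] == 1) 300L else
    dat$total[i - 1] - dat$consumed[i - 1]
  dat$consumed[i] <- rbinom(1, dat$total[i], p_interval[i])
}

# Intervals with no fruit left carry no information
dat <- dat[dat$total > 0, ]

# Grouping variables as factors, plus one level per row for overdispersion
dat$count <- factor(dat$count)
dat$plot  <- factor(dat$plot)
dat$obs   <- factor(seq_len(nrow(dat)))

fit <- glmer(cbind(consumed, total - consumed) ~ treatment + geography +
               offset(log(Dt)) + (1 | count) + (1 | plot) +
               (1 | plot:treatment) + (1 | obs),
             family = binomial(link = "cloglog"), data = dat)
summary(fit)
```

With simulated data like this, some variance components may be estimated at or near zero (lme4 may report a singular fit); that would not necessarily happen with the real data.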