[R-sig-ME] Multilevel binomial/survival model with repeated measures

Adam Smith raptorbio at hotmail.com
Fri Oct 7 23:23:04 CEST 2011

> > Fruits are counted over time (14 occasions; every 3-5 days) to
> > evaluate consumption by migratory birds. The fruits are clustered
> > within one of two subplots (a treatment and a control) assigned
> > randomly within a larger plot (12 plots in total). Individual
> > fruits were not marked, but rather the number of fruits consumed
> > during a given interval was noted. Thus, the number of fruits
> > present at the start of a time interval varies among subplots and
> > over time. I'm interested in how the hazard/likelihood of consumption
> > varies with treatment (at the subplot level), plot geography (e.g.,
> > north v south), and count interval. Modeling time as a categorical
> > variable makes more sense biologically (i.e., modeling consumption
> > separately for each count period), as
> > consumption over time is not likely to change linearly or
> > quadratically.
> (or in some not quite as simple but still smooth/deterministic
> way that could be modeled by a spline curve ... ?)

Like the thought, but even splines are not likely to approximate patterns of consumption well in this system...it's that sporadic.

> > I can envision two possible specifications for the
> > response variable: (1) model consumption using a time-to-event type
> > binary response in which each individual fruit provides up to 14 rows of
> > data, with consumption equaling 0 for each count period until
> > consumption (1) occurs (fruits cannot contribute in count periods after
> > they have been consumed); this equates to what Singer and Willett
> > (Applied Longitudinal Data Analysis, 2003) call a discrete-time hazard
> > model, if that's at all familiar; or (2) ignore individual consumption
> > histories and aggregate fruits within subplots and model the proportion
> > of fruits consumed in each count interval.
> >
> > I lack the expertise
> > to decide which is most appropriate. Moreover, I'm stumped as to how to
> > specify the clustering and nesting in glmer, regardless of the approach
> > taken.
> >
> > For (1), the following comes to mind:
> >
> > consumption.survival
> > <- glmer (consumption ~ treatment + geography + count + (1|plot) +
> > (1|treatment:plot), family = binomial, ...)
> >
> > For (2):
> >
> > consumption.binomial
> > <- glmer (cbind(consumed, total) ~ treatment + geography + count +
> > (1|plot) + (1|treatment:plot), family = binomial, ...)
> >
> I would say that #2 is much more natural.

Not to mention less computationally taxing. Have I specified the random effects correctly to account for the nesting present in the design? Additionally, is it fair to explore interactions among factors (e.g., geography:treatment, treatment:count)?
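A minimal sketch of the grouping structure behind those random effects, assuming one row per subplot per count period in a hypothetical data frame called design (the column names mirror the model formula; the 12 x 2 x 14 layout comes from the study description). It fits nothing and only counts the levels each grouping factor would have. In lme4 formula syntax (1|plot/treatment) is shorthand for (1|plot) + (1|plot:treatment), and interactions among fixed factors are written with : or * (for example treatment*geography).

```r
# Hypothetical design grid: 12 plots x 2 subplots (treatments) x 14 counts.
design <- expand.grid(
  plot      = factor(1:12),
  treatment = factor(c("control", "treatment")),
  count     = factor(1:14)
)

# Grouping factor behind (1 | plot): one level per plot.
n_plot <- nlevels(design$plot)

# Grouping factor behind (1 | treatment:plot): one level per subplot.
subplot   <- interaction(design$treatment, design$plot, drop = TRUE)
n_subplot <- nlevels(subplot)

# Each subplot should appear once per count period.
rows_per_subplot <- table(subplot)

print(c(plots = n_plot, subplots = n_subplot))
```

If the real data are laid out this way, the subplot term has one level per treatment-by-plot combination, so it captures the repeated counts taken on the same subplot.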
> * glmer is expecting cbind(consumed,total-consumed) rather than
> cbind(consumed,total)

Noted.  This will also account for the changing "availability" of fruit during a count period over time, yet consider, for example, 2 of 4 fruits consumed equally to, say, 100 of 200 fruits consumed, correct?
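A small sketch of how the two-column response treats sample size, using made-up counts and plain glm() rather than glmer() to keep it self-contained. The data frames small and large are hypothetical. Both fits estimate the same proportion, but the fit based on more fruits has a smaller standard error, so the two cases are not weighted equally.

```r
# Hypothetical counts: the same proportion consumed at two sample sizes.
small <- data.frame(consumed = c(2, 2),     total = c(4, 4))
large <- data.frame(consumed = c(100, 100), total = c(200, 200))

fit_small <- glm(cbind(consumed, total - consumed) ~ 1,
                 family = binomial, data = small)
fit_large <- glm(cbind(consumed, total - consumed) ~ 1,
                 family = binomial, data = large)

# Both intercepts estimate logit(0.5) = 0 ...
est <- c(small = unname(coef(fit_small)), large = unname(coef(fit_large)))

# ... but the standard error shrinks as the number of fruits grows.
se <- c(small = sqrt(vcov(fit_small)[1, 1]),
        large = sqrt(vcov(fit_large)[1, 1]))

print(round(est, 6))
print(se)
```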

> * I think you might additionally want to make 'count'
> a random effect; make sure it is a factor (unless you are
> really interested in the specific effects at particular
> time periods and you don't mind using up all those degrees
> of freedom estimating them ...)

I am interested in estimating the probability of consumption for each count period, as I'd like to relate these estimates to other factors that influence the abundance of birds on the study site, but perhaps I can do this with the estimated random effects just the same? However, I don't necessarily need these estimates to be adjusted for other factors (e.g., treatment, geography); marginal means for each count period might be enough?
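For the unadjusted version of that question, a minimal base R sketch that pools fruits across subplots within each count period. The data frame fruit and its values are made up for illustration. If count were instead fitted as a random effect, lme4's ranef() would return conditional modes per count period, which are deviations conditional on the rest of the model rather than marginal means.

```r
# Hypothetical subplot-level records (two subplots, two count periods).
fruit <- data.frame(
  count    = factor(c(1, 1, 2, 2)),
  consumed = c(2, 3, 6, 6),
  total    = c(10, 10, 20, 10)
)

# Pool fruits across subplots within each count period.
pooled <- aggregate(cbind(consumed, total) ~ count, data = fruit, FUN = sum)

# Unadjusted (marginal) proportion consumed per count period.
pooled$prop <- pooled$consumed / pooled$total
print(pooled)
```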

> * You don't seem to have accounted for the inter-sample
> period (you could do a bit of exploratory data analysis,
> or residual analysis, to see if the difference between
> 3 and 5 days seems to matter). You might try family=binomial(link="cloglog")
> with an offset equal to log(Dt), which will make the probability
> of consumption within Dt days equal to 1-exp(-Dt*p1), where
> p1 is the per-day consumption rate (hazard).

I included Dt as a covariate in some earlier messing, but I will explore the idea of using it as an offset.  
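A quick numerical check of what the offset does, assuming a constant per-day consumption rate (the value 0.08 and the interval lengths are made up). Under the complementary log-log link, the probability of consumption within Dt days is 1 - exp(-Dt * rate), and the linear predictor is log(rate) + log(Dt), which is why log(Dt) enters as an offset rather than as a covariate with a free coefficient.

```r
# Assumed per-day consumption rate (hazard) and interval lengths in days.
rate <- 0.08
Dt   <- c(3, 4, 5)

# Probability that a fruit is consumed within an interval of length Dt.
p_interval <- 1 - exp(-Dt * rate)

# The cloglog link turns that probability into log(rate) + log(Dt).
cloglog <- binomial(link = "cloglog")$linkfun
eta     <- cloglog(p_interval)

print(data.frame(Dt, p_interval, eta, check = log(rate) + log(Dt)))
```

In a model formula this would be written as + offset(log(Dt)) together with family = binomial(link = "cloglog"), where Dt is a hypothetical column holding the interval length in days.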

> * You may want to add a random effect for individual observations (i.e.
> mydata$obs <- seq(nrow(mydata))). #2 seems counterintuitive,
> but it allows for overdispersion (which can be driven either by
> non-independence within samples, or by heterogeneity within
> samples).

I saw this in the Browne et al. (2005, JRoyStatSoc A, 168, 599-613) paper you referenced in an older thread, and thought it worthwhile to pursue. In this case, each observation would be the cbind(consumed, total-consumed) for each treatment by plot by count, correct? Thus, if there were 10 plots, 2 treatments, and 10 count periods, there would be 200 observations?
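A sketch of that bookkeeping, assuming the hypothetical 10 x 2 x 10 layout from the question and entirely made-up fruit counts. The data frame d and all of its values are illustrative only. The observation-level factor obs gets one level per row, and the glmer() call (run only if lme4 is installed) shows where (1 | obs) would sit in the formula; it is one possible specification, not a recommendation.

```r
set.seed(1)

# Hypothetical layout from the question: 10 plots x 2 treatments x 10 counts.
d <- expand.grid(
  plot      = factor(1:10),
  treatment = factor(c("control", "treatment")),
  count     = factor(1:10)
)

# One level per row: the observation-level grouping factor.
d$obs <- factor(seq_len(nrow(d)))

# Made-up fruit counts, only so that the model below has something to fit.
d$total    <- rpois(nrow(d), lambda = 20) + 1
d$consumed <- rbinom(nrow(d), size = d$total, prob = 0.3)

print(c(rows = nrow(d), obs_levels = nlevels(d$obs)))

# Fit only if lme4 is installed; the formula is an illustration, not advice.
if (requireNamespace("lme4", quietly = TRUE)) {
  fit <- lme4::glmer(
    cbind(consumed, total - consumed) ~ treatment +
      (1 | plot) + (1 | treatment:plot) + (1 | count) + (1 | obs),
    data = d, family = binomial
  )
  print(lme4::VarCorr(fit))
}
```

Because the simulated counts have no extra-binomial variation, the estimated variances may be close to zero and lme4 may report a singular fit; with real, overdispersed data the obs variance would absorb that extra variation.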

> I'm not sure either approach is perfect, but the binomial
> approach seems better here.
