[R-sig-ME] model for clustered longitudinal binary data

Wed Oct 9 16:00:14 CEST 2013

Thanks Ben for your reply,

> >
> > Dear list members,
> 
> 
> [snip]
> 
> > I measure a longitudinal binary outcome (correctness of detection,
> > 0: incorrect, 1: correct) with respect to 5 different experimental
> > conditions (1 baseline and 4 treatments). The outcome is always
> > measured at the same 10 time points. Each of the 9 subjects
> > participated in all 5 conditions.  Additionally, for each subject and
> > condition, the experiment was replicated 36 times. I therefore end up
> > with 9*5*36=1620 binary longitudinal series (= trials of 10 points
> > each).
> 
> > My aim is to assess the influence of the experimental condition on my
> > binary outcome. I need to build a model that would take into
> > consideration the correlation along time for a given trial and the
> > correlation among trials for a given subject.
> 
>   Correlation among trials for a given subject should be straightforward,
> correlation along time for a given trial may be difficult (see below).

Yes, this is my main issue.

> 
> > I am considering a 3 levels logistic models where 10 consecutive
> > binary measurements (level 1) are obtained on replicates (level 2)
> > which are clustered into subjects (level 3). My only level 1 covariate
> > would be the time of measurement (ordinal factor, T = 1, ..., 10) and
> > as level 2 covariate, I consider the experimental condition. I don't
> > consider any level 3 covariate per se, but still want the model to
> > account for between-subject variability.
> 
> This all seems reasonable.  If you really want time to be treated as ordinal,
> you'll want to look at the clmm function from the 'ordinal'
> package.  In most R modeling packages you don't need to state explicitly
> which levels the covariates are measured at (but keeping track of it is of
> course useful for thinking about issues of identifiability, etc.)

I am not sure I understand how I could use the clmm function. I am not familiar with it, but from what I have read, it fits cumulative link mixed models for an ordinal response variable, whereas in my case time is not the response variable but a factor (and my response variable is binary).

I preferred to treat time as a discrete factor rather than a continuous variable for two reasons:
1) it represents a number of cycles, which is discrete and ordered by nature
2) on average, the logit of correctness increases with time, but the relationship is nonlinear. This means that, if I used time as a continuous variable, I would have to choose a suitable transformation to obtain a linear relationship, and that choice can be very subjective. Since my main objective is to study the influence of the experimental condition, I didn't really want to go there.
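
To make the trade-off concrete, here is a minimal base-R sketch comparing the two codings. The vector `time` below is made up for illustration only. Coding time as a factor costs one parameter per time point beyond the first but assumes no particular shape, whereas the numeric coding forces a straight line on the logit scale:

```r
# Hypothetical time index: 10 cycles, repeated for 3 made-up trials
time <- rep(1:10, times = 3)

# Factor coding: intercept + 9 dummy columns, no shape assumption
X_factor <- model.matrix(~ factor(time))

# Numeric coding: intercept + 1 slope, linear on the logit scale
X_linear <- model.matrix(~ time)

print(ncol(X_factor))
print(ncol(X_linear))
```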

> 
> A simple model would be something like
> 
>  response ~ time + expcond + (1|rep/sub)

I tried something like that with the lmer function, the only difference being that I specified the random effect as (1|sub/rep). I thought that this was the proper syntax for replicates nested within subjects, giving a random intercept for each subject and for each replicate within subject. Am I missing something?
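
For reference, this is how I understand the two nesting orders to expand. It is a minimal base-R sketch that uses terms() only to expand the `/` operator on the grouping factors; I am assuming the same expansion applies to the grouping part of a random-effect term, so that (1|sub/rep) stands for (1|sub) + (1|sub:rep):

```r
# Expand the nesting operator symbolically (no data needed).
# 'sub' and 'rep' are the subject and replicate identifiers.
nested_in_sub <- attr(terms(~ sub/rep), "term.labels")
nested_in_rep <- attr(terms(~ rep/sub), "term.labels")

print(nested_in_sub)  # "sub" "sub:rep"
print(nested_in_rep)  # "rep" "rep:sub"
```

If that is right, both versions contain the same subject-by-replicate grouping and differ only in the outer term: a random intercept per subject in one case, and one per replicate label in the other.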

> 
> As a more complete model you could consider
> 
>  response ~ time + expcond + (time|rep/sub) + (expcond|sub)

With such a model where expcond is also used to define the random effect structure, can I use the anova function to compare it to the following "null model":
response ~ time + (time|rep/sub) + (expcond|sub)
and make a statement about the significance of the effect of the experimental condition?
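
To be concrete about the comparison I have in mind, here is a rough sketch on simulated toy data. Everything in it is an assumption made for illustration: the data are made up and much smaller than my real design, `trial` is a hypothetical identifier I construct for each subject x condition x replicate series, and I use random intercepts only (rather than the random slopes above) to keep the fit quick. It uses glmer from lme4 with a binomial family:

```r
library(lme4)

set.seed(1)
# Made-up toy design: 9 subjects, 3 conditions, 6 replicates, 10 time points
d <- expand.grid(time = 1:10, rep = 1:6,
                 expcond = c("base", "A", "B"), sub = 1:9)
d$time    <- factor(d$time)
d$sub     <- factor(d$sub)
d$expcond <- factor(d$expcond, levels = c("base", "A", "B"))
# Hypothetical trial identifier, unique per subject x condition x replicate
d$trial   <- interaction(d$sub, d$expcond, d$rep, drop = TRUE)

# Simulate a binary response with subject and trial random intercepts
sub_eff   <- rnorm(nlevels(d$sub), sd = 0.5)
trial_eff <- rnorm(nlevels(d$trial), sd = 0.5)
eta <- -0.5 + 0.2 * as.integer(d$time) +
  c(0, 0.4, 0.8)[as.integer(d$expcond)] +
  sub_eff[as.integer(d$sub)] + trial_eff[as.integer(d$trial)]
d$response <- rbinom(nrow(d), 1, plogis(eta))

# Same random-effect structure in both models; only the fixed effect differs
m_full <- glmer(response ~ time + expcond + (1 | sub/trial),
                data = d, family = binomial)
m_null <- glmer(response ~ time + (1 | sub/trial),
                data = d, family = binomial)

# Likelihood ratio comparison of the two nested models
lrt <- anova(m_null, m_full)
print(lrt)
```

My question is whether this kind of comparison remains valid once expcond also appears in the random-effect part of both models.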
