[R-sig-ME] model for clustered longitudinal binary data

Wed Oct 9 23:46:19 CEST 2013

Adrien Combaz <Adrien.Combaz at ...> writes:

[snip]

> > > I measure a longitudinal binary outcome (correctness of detection,
> > > 0: incorrect, 1: correct) with respect to 5 different experimental
> > > conditions (1 baseline and 4 treatments). The outcome is always
> > > measured at the same 10 time points. Each of the 9 subjects
> > > participated in all 5 conditions.  Additionally, for each subject and
> > > condition, the experiment was replicated 36 times. I therefore end up
> > > with 9*5*36=1620 binary longitudinal series (= trials of 10 points
> > > each).

[snip]

> > Correlation among trials for a given subject should be straightforward,
> > correlation along time for a given trial may be difficult (see below).
> 
> Yes, this is my main issue.

  I forgot to say that unless you are explicitly interested
in the estimated correlation structure, you could hope to get
around this by fitting the model without correlation and then
showing that the temporal autocorrelation in the residuals is
negligible ....
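
A rough sketch of that residual check, on simulated data rather than the real experiment. Everything here is an assumption made for illustration: the toy design (6 subjects, 3 conditions, 8 replicates), the column names `trial`, `ftime` and `fcond`, and the choice of a pooled lag-1 correlation of Pearson residuals as the diagnostic. It also assumes the lme4 package is installed.

```r
library(lme4)  # assumption: lme4 is installed

set.seed(1)
# Hypothetical toy data: 6 subjects x 3 conditions x 8 replicates x 10 times.
d <- expand.grid(time = 1:10, rep = 1:8, cond = 1:3, sub = 1:6)
d$trial <- interaction(d$sub, d$cond, d$rep, drop = TRUE)  # one id per trial
u_sub   <- rnorm(6, sd = 0.5)[d$sub]
u_trial <- rnorm(nlevels(d$trial), sd = 0.5)[as.integer(d$trial)]
eta <- -1 + 0.3 * d$time + 0.3 * (d$cond - 1) + u_sub + u_trial
d$response <- rbinom(nrow(d), 1, plogis(eta))
d$ftime <- factor(d$time)
d$fcond <- factor(d$cond)
d$sub   <- factor(d$sub)

# Mixed logistic model with no temporal correlation structure.
fit <- glmer(response ~ ftime + fcond + (1 | sub) + (1 | trial),
             data = d, family = binomial)

# Pair each Pearson residual with the next one in the same trial,
# then pool the pairs over all trials.
d$res <- residuals(fit, type = "pearson")
d <- d[order(d$trial, d$time), ]
lagged <- do.call(rbind, lapply(split(d$res, d$trial),
                                function(r) cbind(r[-length(r)], r[-1])))
r1 <- cor(lagged[, 1], lagged[, 2])
r1  # a value near 0 suggests little leftover lag-1 dependence
```

This is only a crude diagnostic: the series are short (10 points) and the residuals are conditional on estimated random effects, so small values should be read with some caution.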

> > > I am considering a 3-level logistic model where 10 consecutive
> > > binary measurements (level 1) are obtained on replicates (level 2)
> > > which are clustered into subjects (level 3). My only level 1 covariate
> > > would be the time of measurement (ordinal factor, T = 1, ..., 10) and
> > > as level 2 covariate, I consider the experimental condition. I don't
> > > consider any level 3 covariate per se, but still want the model to
> > > account for between-subject variability.

> > This all seems reasonable.  If you really want time to be treated
> > as ordinal, you'll want to look at the clmm function from the
> > 'ordinal' package.   

[snip]

> I am not sure I understand how I can use the clmm function, I am
> not familiar with it but from what I could read, it is used to fit
> cumulative link models for an ordinal response variable, while in my
> case time is not the response variable but a factor (and my response
> variable is binary).

 You're right, my bad.  The only difference between ordered and
unordered factors in the standard R approach to model-fitting is
that by default, treatment contrasts are used for unordered and
orthogonal polynomial contrasts are used for ordered factors.  Another
perhaps underused option is to specify successive-differences
contrasts, using the contr.sdif() function in the MASS package.
None of these will make a difference in the overall complexity or
fit of the model, just in the interpretation of the parameters.
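
To illustrate the point about contrasts, here is a small sketch with a 4-level factor (instead of 10, to keep the printed matrices small) and an ordinary linear model on made-up data. The variable names are hypothetical; the defaults shown assume `options("contrasts")` has not been changed.

```r
library(MASS)  # provides contr.sdif(); ships with standard R distributions

time_u <- factor(rep(1:4, times = 3))                  # unordered factor
time_o <- factor(rep(1:4, times = 3), ordered = TRUE)  # ordered factor

contrasts(time_u)  # treatment contrasts (default for unordered factors)
contrasts(time_o)  # orthogonal polynomial contrasts (default for ordered)
contr.sdif(4)      # successive differences: 2-1, 3-2, 4-3

# Made-up response; the three codings give the same fitted values.
set.seed(1)
y <- rnorm(12, mean = as.integer(time_u))
fit_trt  <- lm(y ~ time_u)
fit_poly <- lm(y ~ time_o)
fit_sdif <- lm(y ~ time_u, contrasts = list(time_u = contr.sdif(4)))

all.equal(fitted(fit_trt), fitted(fit_poly))
all.equal(fitted(fit_trt), fitted(fit_sdif))
coef(fit_sdif)  # slopes are differences between successive level means
```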

> I preferred to treat time as a discrete factor rather than a
> continuous variable for 2 reasons: 1) it represents a number of
> cycles which is discrete and ordered by nature 2) on average, the
> correctness (logit) increases with time, but the relationship is
> nonlinear. It means that, if I use the time as a continuous
> variable, I should choose an adequate transformation to obtain a
> linear relationship, which can be very subjective. Since my main
> objective is to study the influence of the experimental condition, I
> didn't really want to go there.

> > A simple model would be something like
> > 
> >  response ~ time + expcond + (1|rep/sub)

> I tried something like that with the lmer function, only difference
> is that I had as random effect (1|sub/rep). I thought that it was
> the proper syntax for replicates nested within subjects, giving a
> random intercept for each subject and for each replicate within
> subject. Am I missing something?

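  No, my bad again: it should be sub/rep, i.e. (1|sub/rep), which
gives a random intercept for each subject and for each replicate
within subject.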
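
One thing worth checking with this syntax is how the replicates are labelled. (1|sub/rep) expands to (1|sub) + (1|sub:rep), so it only gives one random intercept per trial if rep is unique within subject. The sketch below assumes (this is a guess about the coding, not something stated in the thread) that replicates are numbered 1..36 within each subject and condition; the column names and the `trial` identifier are hypothetical.

```r
# Assumed coding: replicates numbered 1..36 within each subject x condition.
d <- expand.grid(time = 1:10, rep = 1:36, expcond = 1:5, sub = 1:9)
d[] <- lapply(d, factor)

n_obs <- nrow(d)                                             # 16200
n_sub_rep <- nlevels(interaction(d$sub, d$rep, drop = TRUE)) # 324

# Under that coding sub:rep pools the 5 conditions, so build an
# explicit identifier with one level per trial instead.
d$trial <- interaction(d$sub, d$expcond, d$rep, drop = TRUE)
n_trial <- nlevels(d$trial)                                  # 1620 = 9 * 5 * 36

# Model formula using the explicit trial identifier (not fitted here).
f <- response ~ time + expcond + (1 | sub) + (1 | trial)
```

If rep is already coded with a distinct label for every trial of a subject, (1|sub/rep) and the explicit form above describe the same grouping.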

> 
> > 
> > As a more complete model you could consider
> > 
> >  response ~ time + expcond + (time|rep/sub) + (expcond|sub)

> With such a model where expcond is also used to define the random
> effect structure, can I use the anova function to compare it to the
> following "null model": response ~ time + (time|rep/sub) +
> (expcond|sub) and make a statement on the significance of the effect
> of the experiment condition? 

  Yes.
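
A minimal sketch of that comparison, on simulated data and with a deliberately simplified random-effects part (random intercepts only, not the (time|...) and (expcond|sub) terms discussed above), so that it runs quickly. The toy design, the column names and the `trial` identifier are all assumptions for illustration, and lme4 is assumed to be installed. The point is only that both models share the same random effects and differ in the fixed effect of expcond.

```r
library(lme4)  # assumption: lme4 is installed

set.seed(2)
# Hypothetical toy data: 6 subjects x 3 conditions x 8 replicates x 10 times.
d <- expand.grid(time = 1:10, rep = 1:8, expcond = 1:3, sub = 1:6)
d$trial <- interaction(d$sub, d$expcond, d$rep, drop = TRUE)
eta <- -1 + 0.3 * d$time + 0.4 * (d$expcond - 1) +
  rnorm(6, sd = 0.5)[d$sub] +
  rnorm(nlevels(d$trial), sd = 0.5)[as.integer(d$trial)]
d$response <- rbinom(nrow(d), 1, plogis(eta))
d$time    <- factor(d$time)
d$expcond <- factor(d$expcond)
d$sub     <- factor(d$sub)

# Same random effects in both models; only the fixed part differs.
full <- glmer(response ~ time + expcond + (1 | sub) + (1 | trial),
              data = d, family = binomial)
null <- glmer(response ~ time + (1 | sub) + (1 | trial),
              data = d, family = binomial)

lrt <- anova(null, full)  # likelihood ratio test for the expcond effect
lrt
```

With 5 conditions instead of the 3 used here, the test would have 4 degrees of freedom.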