[R-sig-ME] model for clustered longitudinal binary data

Adrien Combaz Adrien.Combaz at med.kuleuven.be
Fri Oct 11 10:08:13 CEST 2013


> -----Original Message-----
> From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Ben Bolker
> Sent: Wednesday, October 09, 2013 11:46 PM
> To: r-sig-mixed-models at r-project.org
> Subject: Re: [R-sig-ME] model for clustered longitudinal binary data
> 
> Adrien Combaz <Adrien.Combaz at ...> writes:
> 
> 
> [snip]
> 
> > > > I measure a longitudinal binary outcome (correctness of detection,
> > > > 0: incorrect, 1: correct) with respect to 5 different experimental
> > > > conditions (1 baseline and 4 treatments). The outcome is always
> > > > measured at the same 10 time points. Each of the 9 subjects
> > > > participated in all 5 conditions.  Additionally, for each subject
> > > > and condition, the experiment was replicated 36 times. I therefore
> > > > end up with 9*5*36=1620 binary longitudinal series (= trials of 10
> > > > points each).
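
Just to make the layout explicit: the data are in long format, one row per subject, condition, replicate and time point. A sketch of the skeleton (column names and the simulated response below are purely illustrative):

## 9 subjects x 5 conditions x 36 replicates x 10 time points = 16200 rows,
## i.e. 1620 series of 10 binary observations each.
set.seed(1)
dat <- expand.grid(sub     = factor(1:9),
                   expcond = factor(1:5),
                   rep     = factor(1:36),
                   time    = factor(1:10, ordered = TRUE))
dat$response <- rbinom(nrow(dat), size = 1, prob = 0.5)  # placeholder 0/1 outcome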
> 
> [snip]
> 
> > >   Correlation among trials for a given subject should be straightforward,
> > > correlation along time for a given trial may be difficult (see below).
> >
> > Yes, this is my main issue.
> 
>   I forgot to say that unless you are explicitly interested in the estimated
> correlation structure, you could hope to get around this by fitting the model
> without correlation and then showing that the temporal autocorrelation in
> the residuals is negligible ....
> 

That would indeed be nice.
However, I was advised to avoid looking at residuals when fitting logistic mixed models to binary data, and I'm actually not sure what they represent. With a normal mixed model I can recover my observed data by adding up the fitted values and the residuals, but that is not the case with logistic regression.
So I'm wondering what the residuals really represent and whether looking at their autocorrelation will give me the information I expect.
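
For reference, this is roughly the kind of check I have in mind with glmer (a sketch only, using my own variable names; whether the Pearson or deviance residuals are the right quantity to inspect is exactly what I am unsure about):

library(lme4)

## Sketch: fit the simple model and look at within-trial autocorrelation
## of the residuals. For a binomial GLMM, residuals() returns deviance
## residuals by default; type = "pearson" or "response" are alternatives.
fit <- glmer(response ~ time + expcond + (1 | sub/rep),
             data = dat, family = binomial)

dat$res <- residuals(fit, type = "pearson")

## Autocorrelation across the 10 time points of a single trial
## (this could be repeated or averaged over trials):
acf(dat$res[dat$sub == 1 & dat$expcond == 1 & dat$rep == 1])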


> 
> > > > I am considering a 3-level logistic model where 10 consecutive
> > > > binary measurements (level 1) are obtained on replicates (level 2),
> > > > which are clustered into subjects (level 3). My only level-1
> > > > covariate would be the time of measurement (ordinal factor, T = 1,
> > > > ..., 10), and as level-2 covariate I consider the experimental
> > > > condition. I don't consider any level-3 covariate per se, but I
> > > > still want the model to account for between-subject variability.
> 
> > > This all seems reasonable.  If you really want time to be treated as
> > > ordinal, you'll want to look at the clmm function from the
> > > 'ordinal' package.
> 
> [snip]
> 
> > I am not sure I understand how I can use the clmm function; I am not
> > familiar with it, but from what I could read it is used to fit
> > cumulative link models for an ordinal response variable, whereas in my
> > case time is not the response variable but a factor (and my response
> > variable is binary).
> 
>  You're right, my bad.  The only difference between ordered and unordered
> factors in the standard R approach to model-fitting is that by default,
> treatment contrasts are used for unordered and orthogonal polynomial
> contrasts are used for ordered factors.  Another perhaps underused option is
> to specify successive-differences contrasts, using the contr.sdif() function in
> the MASS package.
> None of these will make a difference in the overall complexity or fit of the
> model, just in the interpretation of the parameters.
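
That is good to know. For my own reference, switching time to successive-differences contrasts would look something like this (a sketch, assuming time is stored as a factor with 10 levels):

library(MASS)  # provides contr.sdif()

## Successive-differences contrasts: each coefficient compares one
## time point with the previous one.
dat$time <- factor(dat$time, ordered = FALSE)
contrasts(dat$time) <- contr.sdif(10)

## Leaving time as an ordered factor would give orthogonal polynomial
## contrasts instead; an unordered factor gives treatment contrasts.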
> 
> > I preferred to treat time as a discrete factor rather than a continuous
> > variable for two reasons: (1) it represents a number of cycles, which is
> > discrete and ordered by nature; (2) on average, the correctness (on the
> > logit scale) increases with time, but the relationship is nonlinear.
> > This means that if I used time as a continuous variable, I would have to
> > choose an adequate transformation to obtain a linear relationship, which
> > can be very subjective. Since my main objective is to study the
> > influence of the experimental condition, I didn't really want to go
> > there.
> 
> > > A simple model would be something like
> > >
> > >  response ~ time + expcond + (1|rep/sub)
> 
> > I tried something like that with the lmer function; the only difference
> > is that I had (1|sub/rep) as the random effect. I thought that was the
> > proper syntax for replicates nested within subjects, giving a random
> > intercept for each subject and for each replicate within subject. Am I
> > missing something?
> 
>   No, my bad again.  It should be sub/rep.
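
Thanks for confirming. For my own notes, my understanding is that the nesting shorthand expands as follows (a sketch, using glmer for the binary response):

library(lme4)

## (1 | sub/rep) is shorthand for a random intercept for each subject
## plus a random intercept for each replicate within subject:
##   (1 | sub/rep)  is equivalent to  (1 | sub) + (1 | sub:rep)
fit1 <- glmer(response ~ time + expcond + (1 | sub/rep),
              data = dat, family = binomial)
fit2 <- glmer(response ~ time + expcond + (1 | sub) + (1 | sub:rep),
              data = dat, family = binomial)
## fit1 and fit2 should give the same fit (variance components may just
## be listed in a different order).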
> 
> >
> > >
> > > As a more complete model you could consider
> > >
> > >  response ~ time + expcond + (time|rep/sub) + (expcond|sub)
> 
> > With such a model, where expcond is also used to define the random
> > effect structure, can I use the anova function to compare it to the
> > following "null model": response ~ time + (time|rep/sub) +
> > (expcond|sub), and make a statement about the significance of the
> > effect of the experimental condition?
> 
>   Yes.

Although this model seems nice, I'm reaching the maximum number of iterations without convergence, so I'll probably have to go for something a bit simpler.
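
What I'll probably try next is along these lines (a sketch only; whether it converges on my data remains to be seen): a simpler random-effects structure, a more generous iteration limit for the optimizer, and a likelihood-ratio comparison for the condition effect.

library(lme4)

## Allow more iterations and use the bobyqa optimizer.
ctrl <- glmerControl(optimizer = "bobyqa",
                     optCtrl = list(maxfun = 1e5))

## Random intercepts only, with condition tested as a fixed effect.
m_full <- glmer(response ~ time + expcond + (1 | sub/rep),
                data = dat, family = binomial, control = ctrl)
m_null <- glmer(response ~ time + (1 | sub/rep),
                data = dat, family = binomial, control = ctrl)

## Likelihood-ratio test for the overall effect of experimental condition.
anova(m_null, m_full)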

> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models


