[R-sig-ME] nested cross-sectional model using lme4 or nlme

Fri Jun 26 14:54:37 CEST 2009

Le vendredi 26 juin 2009 à 07:15 +0100, Alan Kelly a écrit :
> Dear all, following a suggestion from one of your contributors, I'm  
> posting the following query to R-sig-mixed having had limited response  
> from R-help.
> 
> I'm attempting to analyse a nested cross-sectional design in which an  
> intervention was offered to a series of randomly selected (small)  
> communities, so the unit of randomisation is the community.  All  
> available individuals in each community were interviewed before the  
> intervention and again at follow-up post-intervention.  The set of  
> available individuals at baseline and at follow-up was far from  
> identical (a common feature of such designs).  Similarly, a series of  
> control communities were interviewed.  This type of design is  used in  
> epidemiological studies particularly in interventions designed to  
> alter lifestyle factors.  Such designs tend to be highly unbalanced.   
> Murray et al. discuss the appropriate analysis of such studies  
> (Analysis of data from group-randomized trials with repeat  
> observations on the same groups, Stats in Med. 17, 1581-1600).  They  
> offer three examples of  SAS code - one of which is as follow:
> 
> proc mixed;
> class cond unit timecat;
> model y=cond timecat cond*timecat/ddfm=res;
> random int timecat/subject=unit(cond);
> run;
> 
> cond is 0/1 corresponding to control/intervention
> timecat is 0/1 corresponding to baseline/follow-up
> unit is 1 to 39 and identifies the communities.
> and y is a continuous score
> 
> Unfortunately I'm not familiar with SAS code.  I would expect random  
> effects for unit and timecat X unit.
> 
> I would much appreciate any suggestions on how to code the above in  
> lme4 or nlme.

Caution note : I have not used proc mixed in a log time, and I might be
(quite) rusty). Caveat emptor...

I assume that cond and timecat are coded either as boolean
(FALSE:control|baseline) or as factors (reference=control|baseline), and
that unit is a factor.

One thing is not clear in your description : is y measured at the
community level (i. e. one measure per community per occasion) or at an
individual level (i. e. n_i measures per community per occasion) ? If
the latter, can you individualize successive measures (responses of the
same individual) ? These three eventualities would, IMHO, lead to three
very different models.

In the first case (community-level answer), an "obvious" coding in lmer
would be :

lmer(y ~ cond * timecat + (timecat | unit), ...)

In lme, that would be

lme(y ~ cond * timecat, random=~timecat | unit, ...)

In this model, the "cond" fixed effect is in fact the difference between
conditions at baseline. Since your study is randomized, you should
expect this effect to be 0. The timecat effect is indeed the "natural
evolution" of you communities without intervention, and the net effect
of your intervention is measured by the cond:timecat interaction.

Since you have only one observation per experimental slot, I doubt that
you will get much precision...

You may also note that, in this case, your plan is balanced by
definition (1 observation per slot). aov() might do the job...

In the second case (individual answers, not identifiable between
occasions), the model is written the same way. Here, lme might be
prefered to lmer because the variance and correlation function might
allow you to model the variability between experimental slots (e. g.
heteroscedasticity, between-occasions correlation, etc...).

In the third case (individual answers identifiable between occasions),
another important coupling appears : the answers given by an individual
after intervention are quite likely related to the answers before
intervention. Assuming that an individual belongs to one community only,
it is tempting to exploit this by writing :

lmer(y ~ cond * timecat + (timecat | unit/individual), ...)

If individuals can "switch communities" between occasions (or belong to
more than one), this simple nesting is not valid. Here, lmer does
probably a better job than lme...

HTH,

					Emmanuel Charpentier