[R-sig-ME] random effects specification

Mon Apr 7 08:10:17 CEST 2008

On 07/04/2008, at 1:04 AM, Douglas Bates wrote:
> On Sat, Apr 5, 2008 at 10:14 PM, Ken Beath <kjbeath at kagi.com> wrote:
>> On 05/04/2008, at 12:05 AM, Sebastian P. Luque wrote:
>>> On Fri, 4 Apr 2008 07:17:36 -0500,
>>> "Douglas Bates" <bates at stat.wisc.edu> wrote:
>>>
>>> [...]
>>>
>>>> I'm not sure that I understand what you mean by "treatment being
>>>> nested within community".  Does this mean that there are really 8
>>>> different treatments because treatment 1 in community A is  
>>>> different
>>>> from treatment 1 in community B?  If so, then it would make sense  
>>>> to
>>>> me to simply create a new factor that is the interaction of  
>>>> treatment
>>>> and community.
>>>
>>> I was not employing the term "nested" properly.  The number of  
>>> levels
>>> for both community and treatment are 2 and 4, respectively, just  
>>> as in
>>> the example.  The same 4 treatments were used in both communties, so
>>> in
>>> fact, treatment is crossed with community, not nested.  However,
>>> subjects are nested within communities because each subject  
>>> belongs to
>>> one community only, yet received all 4 treatments.  Sorry for this
>>> confusion.
>>>
>>
>> Once they are considered fixed effects, concepts of crossing and
>> nesting are irrelevant. They are simply covariates. So a model of the
>> form n ~ treatment + community +(1|id) or if the treatment effect is
>> allowed to vary between communities n ~ treatment *community +(1|id)
>> is appropriate. The main problem is your subject id are not unique.
>> You will need to define a new id.
>
> I agree with everything up to here.
>
>> The easiest way is to add a
>> different large number to id depending on community.
>
> That approach contradicts your later advice to represent a factor
> variable as a factor in R.  If id is a factor (as it should be) you
> can't add  a large number to it.
>

I was thinking of this in terms of the original data, and should have  
mentioned that a conversion may be required in R, as described in  
help(factor).

> The specification (1|treatment:id) generates unique id's.
>
> To me the convention that different experimental units should be given
> the same level of 'id' is just another nonsensical aspect of the
> traditional approaches to random-effects models using observed and
> expected mean squares, for which it makes sense to index the
> observations by group and by unit within group.
>
> If we could manage to unlearn old habits and just give each subject a
> unique id at the start it would make life easier.
>
>>
>>
>>>
>>>> Perhaps I am approaching the community factor incorrectly.  In your
>>>> data there are two communities so, even if it would be reasonable  
>>>> to
>>>> model community effects as random effects, that would be difficult.
>>>> With only two levels I think it is best modeled as a fixed effect,
>>>> which would mean that questions about treatment and community are
>>>> related to the fixed effects.
>>>
>>> Could you please show a formula for the case where each individual  
>>> is
>>> seen at both communities (community and treatment still being  
>>> fixed)?
>>> This would help me understand the syntax better.
>>>
>>
>> Same model as previous, provided a subject only receives a treatment
>> once. If a subject receives the same treatment more than once then
>> there needs to be a random effect that models the correlation between
>> repeated measurements of the same treatment, so the model is
>> y~treatment+community+(1|id/treatment) One problem that may have
>> occurred in your original attempts is that id and treatment need to  
>> be
>> factors.
>
> Yes, that is one way of expressing an interaction between a random
> effect for id and a fixed effect for treatment.
>
> It expands to two random effects terms (1|id) + (1|id:treatment).  The
> first is the effect for person and the second is the effect of
> different individuals having different responses to the levels of
> treatment.
>
> A more general model (and consequently more difficult to estimate on
> occasion) has possible correlations of the random effects for
> different levels of treatment within individual.  The term is written
> (treatment|id).
>

This does give problems fitting although I get sensible results.

The need for id to be a factor is not consistent but depends on the  
model (this is using the version of lme4 on CRAN). In the following  
simulation (for the model with subjects treated in both communities)   
id as a numeric works for (1|id) but needs to be a factor for (1|id/ 
treatment).

library(lme4)

nsubjects <- 100

id <- rep(1:nsubjects,each=8)
treatment <- rep(1:4,times=nsubjects*2)
community <- rep(1:2,each=4,times=nsubjects)

thedata <- data.frame(id,treatment,community)

subject <- data.frame(id=1:nsubjects,subject=rnorm(nsubjects))

thedata <- merge(thedata,subject)

treatment <- data.frame(id=rep(1:nsubjects, 
4),treatment=rep(1:4,each=nsubjects),subtreat=rnorm(4*nsubjects))

thedata <- merge(thedata,treatment)

thedata$y <- thedata$subject+thedata$subtreat+thedata$community+thedata 
$treatment+rnorm(8*nsubjects)/10

thedata$treatment <- factor(thedata$treatment)
thedata$community <- factor(thedata$community)

# ignoring  treatment within subject correlations
lmer0 <- lmer(y~treatment+community+(1|id),data=thedata)

print(lmer0)

thedata$id <- factor(thedata$id)

lmer1 <- lmer(y~treatment+community+(1|id/treatment),data=thedata)

print(lmer1)

Ken