[R-sig-ME] [R] Meaning of /, :, and %in% in lmer
Douglas Bates
bates at stat.wisc.edu
Sat Apr 19 20:58:21 CEST 2008
On 4/18/08, Claus Wilke <cwilke at mail.utexas.edu> wrote:
> > The short answer is that (1|A/B) is expanded to (1|A) + (1|A:B) so you
> > can choose whatever form makes sense to you.
> Thanks, that was what I needed to hear.
> > There are different circumstances where a notation like (1|A/B) would
> > be used. Some are reasonable choices and some are artifacts of
> > artificial ways of assigning labels to factor levels. Rather than my
> > trying to guess what kind of application you have in mind, could you
> > describe a situation where you would want to fit an lmer model with
> > terms like that?
> It's a virology experiment. We have two ancestral strains. From each of those
> we have derived several new strains, and then have made multiple fitness
> measurements on the new strains. We want to know whether the ancestral strain
> has an effect on the fitness of the derived strains. The model I'm using for
> that is
> fitness ~ ancestor + (1|ancestor:strain),
> because strains are nested within ancestors. If I were using
> fitness ~ ancestor + (1|ancestor/strain),
> then ancestor would get both a fixed and a random effect, which doesn't make
> sense.
The labeling question is related to the levels of the strain factor.
To me the sensible way to label strains is to give each unique strain
a unique label. In fact, I would go so far as to say that is the only
sensible way. So suppose the ancestral strains are called "A" and "B"
and there were 8 strains derived from "A" and 12 strains derived from
"B". Then I would give them labels like "A01" up to "A08" and "B01" up
to "B12". Many people feel the strains from ancestor A should be
labeled 1 up to 8 and those from ancestor B labeled 1 up to 12 and
then incorporate the information that strain is nested within ancestor
somewhere in the model description. To me this makes no sense. If
strain 1 from ancestor A is not related in any way to strain 1 from
ancestor B, why call them both "1"?
If the strains are labeled so that each unique strain has a unique
label then the model can be written as
fitness ~ ancestor + (1|strain)
or as
fitness ~ ancestor + (1|ancestor:strain)
whichever one makes sense to you. If the levels of strain reflect an
implicit nesting (that is, you need to know that strain 1 from
ancestor A is not the same as strain 1 from ancestor B, even though
they are given the same level of strain) then you must write the model
in the second form but only because the labels of strain are ambiguous
and the expression ancestor:strain is required to disambiguate the
levels.
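
To make the labeling point concrete, here is a small base-R sketch that
counts the grouping levels lmer would see under each labeling scheme. The
variable names and the one-row-per-strain layout are assumptions made for
illustration only; they are not taken from the actual data set.

```r
# Hypothetical layout: 8 strains derived from ancestor "A" and 12 from "B",
# one row per derived strain (replicate measurements omitted for brevity).
anc <- rep(c("A", "B"), times = c(8, 12))
idx <- c(1:8, 1:12)
ancestor <- factor(anc)

# Scheme 1: every unique strain gets a unique label, "A01".."A08", "B01".."B12".
strain_unique <- factor(sprintf("%s%02d", anc, idx))

# Scheme 2: implicitly nested labels, 1..8 within A and 1..12 within B.
strain_nested <- factor(idx)

# droplevels() discards ancestor-by-strain combinations that never occur.
n_unique       <- nlevels(strain_unique)
n_unique_inter <- nlevels(droplevels(interaction(ancestor, strain_unique)))
n_nested       <- nlevels(strain_nested)
n_nested_inter <- nlevels(droplevels(interaction(ancestor, strain_nested)))

cat("unique labels: strain =", n_unique,
    " ancestor:strain =", n_unique_inter, "\n")   # 20 and 20
cat("nested labels: strain =", n_nested,
    " ancestor:strain =", n_nested_inter, "\n")   # 12 and 20
```

With unique labels both groupings have 20 levels, so (1|strain) and
(1|ancestor:strain) describe the same random effect. With the nested
labels, strain alone has only 12 levels and would pool unrelated strains
that share a number; only ancestor:strain recovers the 20 distinct
strains.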
> I have a second question, related to the hypothesis testing of whether the
> fixed ancestor effect is significant. I've read all the threads about why it
> is problematic to do an F test to calculate a p value, and that it is better
> to do Markov chain Monte Carlo. My question is: Is there a proper reference I
> can cite to substantiate the claim that the standard (i.e., SAS) way of
> calculating significance in this case is problematic, or do I have to refer
> to the mailing list archive?
Harald Baayen's recent book on "Analyzing Linguistic Data" has a good
discussion of some of the issues in determining significance of
fixed-effects terms in a mixed-effects model. I like some of the
explanations in his chapter 7.
To tell the truth, I expect that the standard approach is reasonably
accurate for cases where the only random effects term in the model is
of the form (1|strain); it's in the more complex models that the
simple approximations get off track. The sort of data that Harald and
many others in psychometric areas consider is cross-classified
according to subject and item and the standard approaches get bogged
down there.
> Thanks a lot,
>
> Claus
>
> --
> Claus Wilke
> Section of Integrative Biology
> and Center for Computational Biology and Bioinformatics
> University of Texas at Austin
> 1 University Station C0930
> Austin, TX 78712
> cwilke at mail.utexas.edu
> 512 471 6028
>