[R-sig-ME] lmer formula syntax?

Wed Feb 2 15:20:22 CET 2011

On Tue, Feb 1, 2011 at 4:14 PM, Colin Wahl <biowahl at gmail.com> wrote:
> Dear r-sig-mixed-models List,
>
> I've been digging through the r-help archive and various books in search of
> an answer to my query. I have been thus far unsuccessful. I am not new to R,
> but have primarily focused on multivariate non parametric analyses in the
> past; my experience with lmer has been confined to the last few weeks. The
> model formulas I have written work and I have some idea of which are
> accurate/best based on AIC scores, but I would very much appreciate an
> informed opinion on my syntax.
>
> I am performing an ecological study of stream health using %EPT (An
> aggregate relative abundance variable for invertebrates) as my response
> variable. I am interested in how invertebrate counts vary between different
> watershed types (forested, cultivated, developed, grassland) and reach types
> (forested, non forested). There are a total of 12 streams, 12 watersheds and
> 24 reaches. The wshed factor is unbalanced.
>
> Data structure:
>> str(ept)
> 'data.frame':    72 obs. of  4 variables:
>  $ wshed  : Factor w/ 4 levels "c","d","f","g": 1 1 1 1 1 1 1 1 1 1 ...
>  $ stream : Factor w/ 12 levels "BA","D1","D2",..: 4 4 4 6 6 6 10 10 10 4
> ...
>  $ riparia: Factor w/ 2 levels "F","N": 1 1 1 1 1 1 1 1 1 2 ...
>  $ ept    : num  0.318 0.156 0.194 0.146 0.158 ...

> stream is the only random variable and it is nested (not implicitly) in
> wshed. My standard statistical model is as follows:
> ept(ijkl) = ì + wshedi  + stream(i)j + ripariak + wshed*ripariaik +
> stream*riparia(i)jk + å(ijk)l

> The best option I have come up with thus far is:
>>model<-lmer(ept ~ wshed + riparia + wshed:riparia + (1| wshed/stream) + (1|
> stream:riparia), data=ept, family="gaussian")
> or alternatively:
>>model<-lmer(ept ~ wshed*riparia + (1| wshed/stream) + (1| stream:riparia),
> data=ept, family="gaussian")

Thanks to Toby Marthews and George Wang for their answers.

I'm sorry to say that there is no good comprehensive tutorial on lme4
yet - still working on it but right now I am more immersed in the
code.

The best way to read a term like (1|wshed/stream) is "wshed and stream
within wshed".  I tend to avoid that notation because I prefer to be
explicit about the terms.

So we would get to a formula like

eps ~ wshed*riparia + (1|stream)

before considering the stream:riparia interaction.  When you have two
factors, one with fixed levels like riparia and one with random levels
like stream you can write their interaction in two ways: a
vector-valued random effects term like (riparia|stream) where there
are two, possibly correlated, random effects for each stream; and as
two nested scalar random effects, (1|stream) + (1|riparia:stream).
For either to make sense, you must have several streams with both "F"
and "N" levels of riparia.

I generally prefer the second form because it models only the
combinations that occur in the datal, although it looks like you have
balanced data (3 obs on each stream under each riparia condition) so
that wouldn't be an issue for you.  The first formulation involves
estimating three variance components (two variances and one
covariance) whereas the second involves estimating only two variances.
 The way that you wrote the model in "subscript fest" notation would
correspond to the second form.

Finally, it is best not to include family="gaussian" in a call to
lmer.  Technically, lmer only fits linear mixed models in which all
distributions are Gaussian.  At one point I added code that would
detect a family argument and quietly call glmer instead.  Then I
showed examples of this, which was a bad idea because it is now common
practice to call lmer in place of glmer.  I believe the code checks
for the case of family="gaussian" (with the default link) and goes
ahead with the call to lmer but if I get that wrong then you end up
calling glmer, which is slower, instead of lmer.

I hope this helps.

> Question: Am I specifying the random terms and nesting structure correctly?
>
> Secondarily: I have read a few times that "wshed/stream" is interchangeable
> with  "wshed:stream" which is a meaningless interaction. Also I've seen that
> random effects are specified as (a|b) where a is a covariate and b is a
> grouping factor. Does having 1 as a covariate simply specifying an intercept
> of 1? What is the purpose of placing a factor in the place of 1 as a
> covariate? OR is there a nice complete summary tutorial that I've missed.
>
>
> I'm looking forward to hearing any comments.
>
> Thank you,
> --
> Colin Wahl
> Department of Biology
> Western Washington University
> Bellingham, WA 98225
> ph: 360-391-9881
>
>        [[alternative HTML version deleted]]
>
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>