[R-sig-ME] glmer question family=Gamma(link="identity") error. Re: lmer formula syntax?

Ben Bolker bbolker at gmail.com
Mon Feb 7 04:37:34 CET 2011

Colin Wahl <biowahl at ...> writes:

>
> Generalized mixed model geeks,
>
> I just recently asked a question to this list about lmer formula syntax and
> was fortunate to be flattered by your helpful responses (Re: lmer formula
> syntax). I now have a new question regarding the specification of my error
> distribution.
>
> My previous study description: I am performing an ecological study of stream
> health using %ept (An aggregate relative abundance variable for
> invertebrates) as my response variable. I am interested in how invertebrate
> counts vary between different watershed types (forested, cultivated,
> developed, grassland) and reach types (forested, non forested). There are a
> total of 12 streams, 12 watersheds (wsh) and 24 riparia(rip). I am using OSx
>
> > summary(ept[,2:5])
>  wsh        stream       rip         ept
>  c:18        BA     : 6   F:36      Min.   :0.000000
>  d:18        D1     : 6   N:36     1st Qu.:0.005797
>  f:24         D2     : 6                Median :0.029630
>  g:12        DC     : 6               Mean   :0.117869
>                 FE     : 6               3rd Qu.:0.221027
>                 JO     : 6               Max.   :0.527090
>                (Other):36
>
> The complete data set is detailed below.
> stream is the only random variable and it is nested (not implicitly) in
> wshed. My statistical model is as follows:
> ept(ijkl) = μ + wshedi  + stream(i)j + ripariak + wshed*ripariaik +
> stream*riparia(i)jk + ε(ijk)l
>
> My response variable (%ept) is the proportion of invertebrates sampled that
> belong to sensitive orders (ephemeroptera, plecoptera, tricoptera). This is
> a binomial response ("ept" or "not ept"), but I am using a proportion
> instead of discrete counts. The error and variance structures should be
> better modeled with a binomial family (or Gamma for continuous [or
> proportional] data?) I tried this using the following formula with a Gamma
> distribution and an identity link:
>

My bottom line on this:
if you can get the denominator information (how many total inverts
were sampled in each place?) then you can do this as a binomial GLMM.
Otherwise I would just go for it as a regular LMM; the q-q plots of
residuals based on this approach don't look too bad. Binomial would
probably be slightly better: transforming the response variable won't
work because you have a big pile of zeros in your data.

Is there a reason you are interested in the within-stream,
within-zone variance?  If not, I would be tempted to aggregate
these data to make life simpler (and the residuals a bit more
normal, although it doesn't quite reduce to a balanced split-plot
design)

ax <- aggregate(ept~rip+wsh+stream,data=x,FUN=mean)

library(ggplot2)
ggplot(x,aes(x=rip,y=ept,colour=rip))+
facet_grid(.~wsh)+stat_sum(aes(size=..n..))+
geom_line(data=ax,aes(group=stream),colour="gray")

m1 <- lmer(ept~rip+wsh+(1|stream),data=ax)
qqnorm(residuals(m1))
qqline(residuals(m1),col="gray")

m2 <- lmer(ept~rip+wsh+(1|stream),data=x)
qqnorm(residuals(m2),pch=as.numeric(x\$stream))
qqline(residuals(m1),col="gray")

Murtaugh, Paul A. 2007. Simplicity and Complexity in Ecological Data Analysis.
Ecology 88, no. 1: 56-62.
http://www.esajournals.org/doi/abs/10.1890/0012-9658%282007%2988%5B56%3ASACIED%5D2.0.CO%3B2.