[R-sig-ME] glmer question family=Gamma(link="identity") error. Re: lmer formula syntax?
Ben Bolker
bbolker at gmail.com
Mon Feb 7 04:37:34 CET 2011
Colin Wahl <biowahl at ...> writes:
>
> Generalized mixed model geeks,
>
> I just recently asked a question to this list about lmer formula syntax and
> was fortunate to be flattered by your helpful responses (Re: lmer formula
> syntax). I now have a new question regarding the specification of my error
> distribution.
>
> My previous study description: I am performing an ecological study of stream
> health using %ept (An aggregate relative abundance variable for
> invertebrates) as my response variable. I am interested in how invertebrate
> counts vary between different watershed types (forested, cultivated,
> developed, grassland) and reach types (forested, non forested). There are a
> total of 12 streams, 12 watersheds (wsh) and 24 riparia(rip). I am using OSx
>
> > summary(ept[,2:5])
> wsh stream rip ept
> c:18 BA : 6 F:36 Min. :0.000000
> d:18 D1 : 6 N:36 1st Qu.:0.005797
> f:24 D2 : 6 Median :0.029630
> g:12 DC : 6 Mean :0.117869
> FE : 6 3rd Qu.:0.221027
> JO : 6 Max. :0.527090
> (Other):36
>
> The complete data set is detailed below.
> stream is the only random variable and it is nested (not implicitly) in
> wshed. My statistical model is as follows:
> ept(ijkl) = μ + wshedi + stream(i)j + ripariak + wshed*ripariaik +
> stream*riparia(i)jk + ε(ijk)l
>
> My response variable (%ept) is the proportion of invertebrates sampled that
> belong to sensitive orders (ephemeroptera, plecoptera, tricoptera). This is
> a binomial response ("ept" or "not ept"), but I am using a proportion
> instead of discrete counts. The error and variance structures should be
> better modeled with a binomial family (or Gamma for continuous [or
> proportional] data?) I tried this using the following formula with a Gamma
> distribution and an identity link:
>
My bottom line on this:
if you can get the denominator information (how many total inverts
were sampled in each place?) then you can do this as a binomial GLMM.
Otherwise I would just go for it as a regular LMM; the q-q plots of
residuals based on this approach don't look too bad. Binomial would
probably be slightly better: transforming the response variable won't
work because you have a big pile of zeros in your data.
Is there a reason you are interested in the within-stream,
within-zone variance? If not, I would be tempted to aggregate
these data to make life simpler (and the residuals a bit more
normal, although it doesn't quite reduce to a balanced split-plot
design)
x <- read.table("ept.dat",header=TRUE)
ax <- aggregate(ept~rip+wsh+stream,data=x,FUN=mean)
library(ggplot2)
ggplot(x,aes(x=rip,y=ept,colour=rip))+
facet_grid(.~wsh)+stat_sum(aes(size=..n..))+
geom_line(data=ax,aes(group=stream),colour="gray")
m1 <- lmer(ept~rip+wsh+(1|stream),data=ax)
qqnorm(residuals(m1))
qqline(residuals(m1),col="gray")
m2 <- lmer(ept~rip+wsh+(1|stream),data=x)
qqnorm(residuals(m2),pch=as.numeric(x$stream))
qqline(residuals(m1),col="gray")
Murtaugh, Paul A. 2007. Simplicity and Complexity in Ecological Data Analysis.
Ecology 88, no. 1: 56-62.
http://www.esajournals.org/doi/abs/10.1890/0012-9658%282007%2988%5B56%3ASACIED%5D2.0.CO%3B2.
More information about the R-sig-mixed-models
mailing list