[R-sig-ME] fitting a distribution to zero-inflated catch per unit effort mixed model
Ben Bolker
bbolker at gmail.com
Sat Feb 25 22:41:41 CET 2012
[cc'ing back to r-sig-mixed-models]
On 12-02-25 03:53 PM, Karla Letto wrote:
> Thank you Ben for the detailed response. I greatly appreciate it. I will
> let you know how it goes after I give each a try. In answer to your
> questions. My sample size is 25 Meadow Voles.
Meaning you caught a total of 25 voles in the whole study? Be warned,
you may find that your model is overfitted -- typically you can fit
about/at most 1 parameter per 10 (effective) data points, which is
something between 25 (the number of voles) and 48 (observations) -- so
your fixed-effect parameters (line+habitat+type = 6 parameters) are
already on the verge of more information than you can estimate, even
before you start counting random effects (3 variances, for
site/cycle/observation-level variation).
> I calculated CPUE for each
> line. So I have a total of 48 observations (3 lines for 8 sites and each
> site was visited twice. I thought I had to use cycle as a random factor
> to account for temporal pseudoreplication. I can see how your idea of
> nesting cycle within site as a random factor will work better though.
>
> I tested for normality using a normal q-q plot and homogeneity by
> plotting the residuals and fitted values (I kept getting a cone shape).
Hmm. A cone shape does imply heteroscedasticity that isn't handled by
the model assumptions ... I *think* resid() should give you Pearson
residuals (i.e. already corrected assuming variance=mean). So that's a
little puzzling. I don't expect normality at all in residuals from a
model with a mean of ~ 2 individuals per sample ...
>
>> qqnorm(resid(model))
>
>> qqline(resid(model))
>
>
>> plot(fitted(model),resid(model))
>
> Can you please tell me why I have to use LOG(effort) as an offset
> instead of just effort as the offset? That is not the first time I heard
> that it had to be LOG(effort) but I cannot find a reason for why.
Because the offset is added on the scale of the linear predictor,
which in this case is the log scale -- i.e., the expected mean number of
counts is
exp([fixed effect terms] + [random effect terms] + offset) =
exp([fixed effect terms] + [random effect terms])* exp(offset)
>
> Thank you,
>
> Karla
>
> On Thu, Feb 23, 2012 at 7:40 PM, Ben Bolker <bbolker at gmail.com
> <mailto:bbolker at gmail.com>> wrote:
>
> Karla Letto <karla.letto at ...> writes:
>
> > I am having trouble fitting a distribution to my mixed model for
> meadow
> > vole catch per unit effort (CPUE) data. I have tried several
> families and
> > cannot find one that does not violate both the homogeneity and
> normality
> > assumptions. NOTE: The data set is zero-inflated (no captures).
> > Here is my study design:
> >
> > I am trying to determine if the CPUE of meadow voles differ among
> lines
> > (line = near,mid, or far) at increasing distances from a linear
> feature
> > (type = road, trail or powerline corridor) in two different
> habitat types
> > (habitat=forest or barren).
> > Response variable: catch per unit effort (non-integer values)
> > Fixed explanatory variables: line (3 categories), habitat (2
> categories),
> > type (3 categories)
> > Random explanatory variable: site (8 categories), cycle (2 categories)
>
>
> > The random variable site is for the 8 different sites I sampled in (4
> > barren and 4 forest) and the cycle is there because I visited each
> site
> > twice.
>
> Practically speaking you probably can't use cycle as a random
> effect; you can include cycle as a fixed effect (specifying the
> difference between first & second visits), and possibly
> nesting it within site as a random effect (if you have more than
> one observation per site/sample combination).
>
> What is the total size (number of observations) in your data set?
>
> > Here is an excerpt of my data set:
> > CE2 catch effort site line cycle habitat type
> > 0.000000 0 57.5 A near first forest trail
> > 3.278689 2 61.0 A mid first forest trail
> > 0.000000 0 60.5 A far first forest trail
> > 0.000000 0 66.5 G near first barren road
> > 0.000000 0 74.5 G mid first barren road
> > 0.000000 0 74.0 G far first barren road
> > 1.449275 1 69.0 E near second barren powerline
> > 0.000000 0 73.0 E mid second barren powerline
> > 0.000000 0 71.5 E far second barren powerline
> >
> > I tried the lme4 package using the following syntax:
> >
> [snip]
> >
> > I then tried using a poisson error structure using catch (the
> actual number
> > of animals) as the response and incorporated effort as an offset.
> Effort as
> > an offset is commonly used for analysis of CPUE data.
> >
> > Model2<-
> > glmer(catch~line+habitat+type+(1|site)+(1|cycle)+
> > offset(effort),family=poisson)
>
> This is a good way to do it, but you need to incorporate the
> LOG of effort. You may also need to account for overdispersion
> and/or zero-inflation; the former via incorporating an observation-level
> random effect (in glmer, glmmadmb, or MCMCglmm) or negative binomial
> distribution (in glmmadmb), the latter (if necessary) via zero-inflation
> or hurdle models (in glmmadmb or MCMCglmm).
>
> [snip snip snip]
>
> > Does anyone have any suggestions on how I can analyze
> zero-inflated CPUE
> > data? I have been trying to figure out how to do a Monte Carlo
> permutation
> > test for a mixed model but I am having trouble figuring out the
> syntax. Any
> > help would be greatly appreciated.
>
> What are you using to assess homogeneity of variance and normality?
> Normality of residuals can only be expected approximately (and in
> the case of
> large mean counts) in this case.
>
> I would start off this way:
>
> mydata$obs <- factor(seq(nrow(mydata)))
> glmer(catch~line+habitat+type+cycle+(1|site/cycle)+(1|obs)+
> offset(log(effort)),family=poisson, data=mydata)
>
> You can use the simulate() method to simulate data sets, count
> the proportion of zeros expected, and see if your observed
> proportion of zeros is off ...
>
> _______________________________________________
> R-sig-mixed-models at r-project.org
> <mailto:R-sig-mixed-models at r-project.org> mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>
More information about the R-sig-mixed-models
mailing list