[R-sig-ME] fitting a distribution to zero-inflated catch per unit effort mixed model

Fri Feb 24 00:10:21 CET 2012

Karla Letto <karla.letto at ...> writes:

> I am having trouble fitting a distribution to my mixed model for meadow
> vole catch per unit effort (CPUE) data. I have tried several families and
> cannot find one that does not violate both the homogeneity and normality
> assumptions. NOTE: The data set is zero-inflated (no captures).
> Here is my study design:
> 
> I am trying to determine if the CPUE of meadow voles differs among lines
> (line = near, mid, or far) at increasing distances from a linear feature
> (type = road, trail or powerline corridor) in two different habitat types
> (habitat=forest or barren).
> Response variable: catch per unit effort (non-integer values)
> Fixed explanatory variables: line (3 categories), habitat (2 categories),
> type (3 categories)
> Random explanatory variable: site (8 categories), cycle (2 categories)

> The random variable site is for the 8 different sites I sampled in (4
> barren and 4 forest) and the cycle is there because I visited each site
> twice.

  Practically speaking you probably can't use cycle as a random
effect, because it has only two levels; you can include cycle as a
fixed effect (specifying the difference between first & second
visits), and possibly nest it within site as a random effect (if you
have more than one observation per site/cycle combination).

  What is the total size (number of observations) in your data set?

> Here is an excerpt of my data set:
>    CE2  catch effort site line   cycle habitat      type
> 0.000000     0   57.5    A near  first   forest     trail
> 3.278689     2   61.0    A   mid   first   forest     trail
> 0.000000     0   60.5    A   far   first   forest     trail
> 0.000000     0   66.5    G near  first   barren       road
> 0.000000     0   74.5    G   mid   first   barren       road
> 0.000000     0   74.0    G   far   first   barren       road
> 1.449275     1   69.0    E near second  barren powerline
> 0.000000     0   73.0    E   mid second  barren powerline
> 0.000000     0   71.5    E   far second  barren powerline
> 
> I tried the lme4 package using the following syntax:
> 
 [snip]
> 
> I then tried using a poisson error structure using catch (the actual number
> of animals) as the response and incorporated effort as an offset. Effort as
> an offset is commonly used for analysis of CPUE data.
> 
> Model2<-
> glmer(catch~line+habitat+type+(1|site)+(1|cycle)+
>  offset(effort),family=poisson)

  This is a good way to do it, but you need to incorporate the
LOG of effort, i.e. offset(log(effort)), because the Poisson model
uses a log link.  You may also need to account for overdispersion
and/or zero-inflation: the former by incorporating an observation-level
random effect (in glmer, glmmadmb, or MCMCglmm) or a negative binomial
distribution (in glmmadmb), the latter (if necessary) via zero-inflation
or hurdle models (in glmmadmb or MCMCglmm).
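
  To see why the offset has to be on the log scale, here is a small
sketch with simulated data (the variable names and the "true" rate are
made up for illustration, and there are no random effects). With a log
link, log(mean catch) = intercept + log(effort), so exp(intercept)
estimates captures per unit effort:

```r
# Simulated illustration only: names and the true rate are assumptions.
set.seed(1)
n      <- 500
effort <- runif(n, 50, 80)         # trapping effort per observation
rate   <- 0.02                     # assumed true captures per unit effort
catch  <- rpois(n, rate * effort)  # expected count is proportional to effort

# log(effort) as the offset: the mean is exp(intercept) * effort
fit_log  <- glm(catch ~ 1 + offset(log(effort)), family = poisson)
est_rate <- unname(exp(coef(fit_log)))  # estimated captures per unit effort

# For this intercept-only model the estimate equals total catch / total effort
pooled_rate <- sum(catch) / sum(effort)
print(c(estimated = est_rate, pooled = pooled_rate))
```

With offset(effort) instead, the model would imply
mean = exp(intercept + effort), which is not a rate per unit effort.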

 [snip snip snip]

> Does anyone have any suggestions on how I can analyze zero-inflated CPUE
> data? I have been trying to figure out how to do a Monte Carlo permutation
> test for a mixed model but I am having trouble figuring out the syntax. Any
> help would be greatly appreciated.

  What are you using to assess homogeneity of variance and normality?
With count data like these, the residuals can only be expected to be
approximately normal, and then only when the mean counts are large.

   I would start off this way:

mydata$obs <- factor(seq(nrow(mydata)))
glmer(catch~line+habitat+type+cycle+(1|site/cycle)+(1|obs)+
  offset(log(effort)),family=poisson, data=mydata)

You can use the simulate() method to simulate data sets, count
the proportion of zeros expected, and see if your observed 
proportion of zeros is off ...
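
A rough sketch of that check follows. The helper zero_check() is
hypothetical (not part of any package), and to keep the example
self-contained it is shown on a plain glm() fit to simulated data;
lme4 provides a simulate() method for glmer fits, so the same helper
should apply to the mixed model above, with mydata$catch as the
observed response.

```r
# Hypothetical helper: compare the observed proportion of zeros with the
# proportions in data sets simulated from a fitted model.
zero_check <- function(fit, observed, nsim = 1000) {
  sims     <- simulate(fit, nsim = nsim)  # one simulated response per column
  sim_zero <- colMeans(sims == 0)         # proportion of zeros per simulation
  obs_zero <- mean(observed == 0)
  list(observed       = obs_zero,
       simulated_mean = mean(sim_zero),
       # share of simulations with at least as many zeros as observed
       p_upper        = mean(sim_zero >= obs_zero))
}

# Simulated stand-in data (assumed values, no random effects)
set.seed(2)
effort <- runif(200, 50, 80)
catch  <- rpois(200, 0.01 * effort)
fit    <- glm(catch ~ 1 + offset(log(effort)), family = poisson)

res <- zero_check(fit, catch)
str(res)
```

If the observed proportion of zeros sits far in the upper tail of the
simulated proportions (p_upper close to 0), that suggests the model is
producing too few zeros.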