[R-sig-ME] fitting a distribution to zero-inflated catch per unit effort mixed model

Sat Feb 25 22:41:41 CET 2012

  [cc'ing back to r-sig-mixed-models]

On 12-02-25 03:53 PM, Karla Letto wrote:
> Thank you Ben for the detailed response. I greatly appreciate it. I will
> let you know how it goes after I give each a try. In answer to your
> questions. My sample size is 25 Meadow Voles. 

  Meaning you caught a total of 25 voles in the whole study?  Be warned,
you may find that your model is overfitted -- typically you can fit
about/at most 1 parameter per 10 (effective) data points, which is
something between 25 (the number of voles) and 48 (observations) -- so
your fixed-effect parameters (line+habitat+type = 6 parameters) are
already on the verge of more information than you can estimate, even
before you start counting random effects (3 variances, for
site/cycle/observation-level variation).

> I calculated CPUE for each
> line. So I have a total of 48 observations (3 lines for 8 sites and each
> site was visited twice. I thought I had to use cycle as a random factor
> to account for temporal pseudoreplication.  I can see how your idea of
> nesting cycle within site as a random factor will work better though. 
> 
> I tested for normality using a normal q-q plot and homogeneity by
> plotting the residuals and fitted values (I kept getting a cone shape).

  Hmm.  A cone shape does imply heteroscedasticity that isn't handled by
the model assumptions ... I *think* resid() should give you Pearson
residuals (i.e. already corrected assuming variance=mean). So that's a
little puzzling.  I don't expect normality at all in residuals from a
model with a mean of ~ 2 individuals per sample ...

> 
>> qqnorm(resid(model))
> 
>> qqline(resid(model))
> 
> 
>> plot(fitted(model),resid(model))
> 
> Can you please tell me why I have to use LOG(effort) as an offset
> instead of just effort as the offset? That is not the first time I heard
> that it had to be LOG(effort) but I cannot find a reason for why. 

  Because the offset is added on the scale of the linear predictor,
which in this case is the log scale -- i.e., the expected mean number of
counts is

  exp([fixed effect terms] + [random effect terms] + offset) =
exp([fixed effect terms] + [random effect terms])* exp(offset)

> 
> Thank you, 
> 
> Karla 
> 
> On Thu, Feb 23, 2012 at 7:40 PM, Ben Bolker <bbolker at gmail.com
> <mailto:bbolker at gmail.com>> wrote:
> 
>     Karla Letto <karla.letto at ...> writes:
> 
>     > I am having trouble fitting a distribution to my mixed model for
>     meadow
>     > vole catch per unit effort (CPUE) data. I have tried several
>     families and
>     > cannot find one that does not violate both the homogeneity and
>     normality
>     > assumptions. NOTE: The data set is zero-inflated (no captures).
>     > Here is my study design:
>     >
>     > I am trying to determine if the CPUE of meadow voles differ among
>     lines
>     > (line = near,mid, or far) at increasing distances from a linear
>     feature
>     > (type = road, trail or powerline corridor) in two different
>     habitat types
>     > (habitat=forest or barren).
>     > Response variable: catch per unit effort (non-integer values)
>     > Fixed explanatory variables: line (3 categories), habitat (2
>     categories),
>     > type (3 categories)
>     > Random explanatory variable: site (8 categories), cycle (2 categories)
> 
> 
>     > The random variable site is for the 8 different sites I sampled in (4
>     > barren and 4 forest) and the cycle is there because I visited each
>     site
>     > twice.
> 
>      Practically speaking you probably can't use cycle as a random
>     effect; you can include cycle as a fixed effect (specifying the
>     difference between first & second visits), and possibly
>     nesting it within site as a random effect (if you have more than
>     one observation per site/sample combination).
> 
>      What is the total size (number of observations) in your data set?
> 
>     > Here is an excerpt of my data set:
>     >    CE2  catch effort site line   cycle habitat      type
>     > 0.000000     0   57.5    A near  first   forest     trail
>     > 3.278689     2   61.0    A   mid   first   forest     trail
>     > 0.000000     0   60.5    A   far   first   forest     trail
>     > 0.000000     0   66.5    G near  first   barren       road
>     > 0.000000     0   74.5    G   mid   first   barren       road
>     > 0.000000     0   74.0    G   far   first   barren       road
>     > 1.449275     1   69.0    E near second  barren powerline
>     > 0.000000     0   73.0    E   mid second  barren powerline
>     > 0.000000     0   71.5    E   far second  barren powerline
>     >
>     > I tried the lme4 package using the following syntax:
>     >
>      [snip]
>     >
>     > I then tried using a poisson error structure using catch (the
>     actual number
>     > of animals) as the response and incorporated effort as an offset.
>     Effort as
>     > an offset is commonly used for analysis of CPUE data.
>     >
>     > Model2<-
>     > glmer(catch~line+habitat+type+(1|site)+(1|cycle)+
>     >  offset(effort),family=poisson)
> 
>      This is a good way to do it, but you need to incorporate the
>     LOG of effort.  You may also need to account for overdispersion
>     and/or zero-inflation; the former via incorporating an observation-level
>     random effect (in glmer, glmmadmb, or MCMCglmm) or negative binomial
>     distribution (in glmmadmb), the latter (if necessary) via zero-inflation
>     or hurdle models (in glmmadmb or MCMCglmm).
> 
>      [snip snip snip]
> 
>     > Does anyone have any suggestions on how I can analyze
>     zero-inflated CPUE
>     > data? I have been trying to figure out how to do a Monte Carlo
>     permutation
>     > test for a mixed model but I am having trouble figuring out the
>     syntax. Any
>     > help would be greatly appreciated.
> 
>      What are you using to assess homogeneity of variance and normality?
>     Normality of residuals can only be expected approximately (and in
>     the case of
>     large mean counts) in this case.
> 
>       I would start off this way:
> 
>     mydata$obs <- factor(seq(nrow(mydata)))
>     glmer(catch~line+habitat+type+cycle+(1|site/cycle)+(1|obs)+
>      offset(log(effort)),family=poisson, data=mydata)
> 
>     You can use the simulate() method to simulate data sets, count
>     the proportion of zeros expected, and see if your observed
>     proportion of zeros is off ...
> 
>     _______________________________________________
>     R-sig-mixed-models at r-project.org
>     <mailto:R-sig-mixed-models at r-project.org> mailing list
>     https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 
>