[R-sig-eco] hurdle model for experiment analysis

Ben Bolker bbolker at gmail.com
Wed Sep 1 14:55:05 CEST 2010


Renke Lühken wrote:
> The habitats are replicated, sorry for the confusion!
>
> How can I decide if I really need a zero-inflated model or is it just
> expert judgment?
> I decided that I need it, because more than 50% of the data are true
> zeros (detectability=100%), and e.g. Martin et al. (2005) and
> Zuur et al. (2009) recommend zero-inflated models in that case.
> but:
> "In this study it appeared that zero inflated distributions were not
> usually needed, which implies that observations in which a taxon does
> not occur could usually be distinguished from those where the taxon
> does occur (a priori or using environmental variables), i.e. most
> zeros could be attributed to the systematic component of the model,
> rather than taking the more complicated route and incorporating them
> into the random component of the model." (Warton 2005) [p. 288]
> --> Does this mean that I do not need a zero inflation model if the
> detectability=100% and therefore "all zeros"="true zeros"?
  You can either fit a non-inflated model (negative binomial or Poisson)
and see if the fraction of zeros is predicted reasonably
well, or fit a zero-inflated model and see if it improves the fit over a
non-inflated model (e.g., via likelihood ratio test).
Detectability is not the only source of 'structural' zeros (I really
dislike the terminology of 'true zeros', although I appreciate
that it is standard in the field) -- there are also unsuitable habitats,
etc.
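
  As a rough illustration of the first option (checking whether a
non-inflated model predicts the fraction of zeros reasonably well), here
is a minimal sketch in base R. The data are simulated and the names
'dat', 'count' and 'complexity' are made up for the example; they are
not from your experiment.

```r
# Sketch only: simulated data, hypothetical variable names.
set.seed(1)
n <- 200
dat <- data.frame(complexity = rep(1:4, length.out = n))

# Simulate counts with extra zeros: a Bernoulli "occupied" step,
# followed by Poisson counts in the occupied units.
occupied <- rbinom(n, size = 1, prob = 0.4)
dat$count <- occupied * rpois(n, lambda = exp(0.5 + 0.2 * dat$complexity))

# Non-inflated Poisson GLM
m_pois <- glm(count ~ complexity, family = poisson, data = dat)

# Observed fraction of zeros vs. the fraction expected under the fit:
# average P(Y = 0) over the fitted means.
obs_zero <- mean(dat$count == 0)
exp_zero <- mean(dpois(0, lambda = fitted(m_pois)))
print(c(observed = obs_zero, expected = exp_zero))
```

If the observed fraction is far above the expected one, that suggests
the non-inflated model is missing zeros; the same comparison can be
repeated with a negative binomial fit (e.g. MASS::glm.nb) before
reaching for a zero-inflated model.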
> If I am allowed to use a hurdle model, is something wrong about the
> hurdle-function of the pscl-package?
> I think about something like this:
> hurdle<-
> hurdle(NumberOfIndividuals~HabitatComplexity+Time+Colour+Treatment+HabitatDensity*Zeit+HabitatDensity*Treatment...,
> dist = "poisson", link = "logit")
>
> Response variables
> NumberOfIndividuals: Number of individuals in the habitat
>
> Explanatory variables:
> HabitatComplexity: Complexity of the habitat (4 levels) [continuous
> variable]
> Colour: Colour of the habitat (4 levels)
> Treatment: tap water, tap water + additive 1, tap water + additive 2
> (3 levels)
> Time: Resampling every 5 min (5-30 min) (6 levels) [continuous variable]
>

  The only problem with the hurdle() function is that it doesn't deal
with random effects. If you're willing to treat all
predictors as fixed, and if hurdle() gives you reasonable answers, it
will certainly make your life easier *not* to
delve into the topic of zero-inflated mixed models ....
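
  For what it's worth, a fixed-effects-only hurdle fit might look
something like the sketch below. It uses simulated data with made-up
names ('dat', 'count', 'complexity') and assumes the pscl package is
installed; the AIC comparison against a plain Poisson GLM is just one
informal way to see whether the hurdle part is earning its keep.

```r
# Sketch only: simulated data, hypothetical variable names.
set.seed(2)
n <- 200
dat <- data.frame(complexity = rep(1:4, length.out = n))
occupied <- rbinom(n, size = 1, prob = 0.4)
dat$count <- occupied * rpois(n, lambda = exp(0.5 + 0.2 * dat$complexity))

# Plain (non-inflated) Poisson GLM for comparison
m_pois <- glm(count ~ complexity, family = poisson, data = dat)

# Hurdle model: binomial (logit) part for zero vs. non-zero,
# truncated Poisson part for the positive counts.
# Skipped if pscl is not installed.
if (requireNamespace("pscl", quietly = TRUE)) {
  m_hurdle <- pscl::hurdle(count ~ complexity, data = dat,
                           dist = "poisson", zero.dist = "binomial",
                           link = "logit")
  print(summary(m_hurdle))
  print(AIC(m_pois, m_hurdle))
}
```

All predictors here are treated as fixed, so this does not account for
repeated measurements on the same habitat.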
> Cited literature:
>
> Martin, T.G., Wintle, B.A., Rhodes, J.R., Kuhnert, P.M., Low Choy, S.,
> Field, S.A., Tyre, A.J., & Possingham, H.P. (2005) Zero tolerance
> ecology: improving ecological inference by modelling the source of
> zero observations. /Ecology Letters/, 8, 1235-1246
>
> Warton, D. I. 2005. Many zeros do not mean zero inflation: comparing
> the goodness-of-fit of parametric models to multivariate abundance
> data. Environmetrics 16:275-289. doi: 10.1002/env.702
> <http://dx.doi.org/10.1002/env.702>
>
> A.F. Zuur et al.(2009). Mixed effects models and extensions in ecology
> with R. Chapter 11 Zero-Truncated and Zero-Inflated Models for Count Data.
>
> On 31/08/2010 11:21, Ben Bolker wrote:
>>   A few comments:
>>
>>   are the habitats replicated? If not, you have a fairly serious
>> experimental design problem -- you can't statistically distinguish
>> between the measured covariates and other, unmeasured/unintentional
>> differences among the habitats ...
>>
>>   * are you willing to treat complexity as a continuous variable, or do
>> you not want to assume that the 'distance' between neighboring
>> values of complexity is the same (an ordinal variable)?
>>
>>  * for colour and water, you should probably simply leave these as
>> single categorical variables and let R sort out the construction of
>> dummy variables for you.
>>
>>  * you should think about whether you want to look for regular/monotonic
>> trends with time (i.e. code time as a continuous
>> variable) or allow for any possible temporal pattern (code time as a
>> fixed, or possibly random-effect, categorical variable)
>>   if habitats are indeed replicated you probably want something like
>>
>> MCMCglmm(fixed=number_in_habitat~complexity+colour+water+time,random=~habitat,
>>    family="zipoisson",data=...)
>>
>>   you have a few options for the family depending on how you want to
>> treat the zeros (hurdle vs. zero-inflated)
>>
>>   are you sure the data are zero-inflated  (Warton, D. I. 2005. Many
>> zeros do not mean zero inflation: comparing the goodness-of-fit of
>> parametric models to multivariate abundance data. Environmetrics
>> 16:275-289. doi: 10.1002/env.702 <http://dx.doi.org/10.1002/env.702>.  ) ?
>>
>>
>>
>> Renke Lühken wrote:
>>     
>>>  Hi all,
>>> I want to analyse an experiment in which insects were allowed to
>>> choose between four habitats with different characteristics (see
>>> below). The number of individuals per habitat was recorded six times
>>> (every 5 min). I want to know which variables and which interactions
>>> between the variables influence the number of individuals in the
>>> habitats.
>>>
>>> Problem: The response variable is zero-inflated: more than 50% of the
>>> observations are true zeros (100% detectability).
>>>
>>> Question: Can I use a hurdle model (ZAP) for that or do you recommend
>>> another method for the data analysis?
>>>
>>> Thanks in advance,
>>> Renke Lühken
>>>
>>>
>>> Response variables
>>> Number of individuals in the habitat
>>>
>>> Explanatory variables:
>>> Complexity of the habitat (4 levels)
>>> Habitat has the Colour 1 (yes/no)
>>> Habitat has the Colour 2 (yes/no)
>>> Habitat has the Colour 3 (yes/no)
>>> Habitat has the Colour 4 (yes/no)
>>> Water additive 1 (yes/no)
>>> Water additive 2 (yes/no)
>>> Time (6 levels: 5-30 min [resampling every 5 min])
>>>
>>> _______________________________________________
>>> R-sig-ecology mailing list
>>> R-sig-ecology at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>>>       
>>     
>


