[R-sig-eco] hurdle model

Ben Bolker bbolker at gmail.com
Thu Aug 19 17:26:51 CEST 2010


Yingjie Zhang wrote:
> Dear all,
>
> Thanks for all your perspectives. I agree with Dr. Gavin Simpson that the authors built the model themselves; there is no hurdle function in package 'stats'.
>
> I have a data set of microbial community abundances (some of you will know what it looks like): 1000 different marine microbial species sampled at 45 locations. Each species has many 0s; the non-zero part is quite sparse, with some extreme values such as 2000, while the rest cluster around small positive values, for instance 10. I don't think the extreme values are outliers, because they carry information, and for other reasons data transformation is not my choice either. Can anyone suggest models for fitting this kind of data set? Or what is the classical model for analyzing microbial abundance?
>
> Thanks and Regards,
> Yingjie
>
>   
  I don't know what the "classical" model is.  I would be tempted to fit
this as a three-category ordinal model, i.e. estimating the probabilities
of the categories 0, 1-20, and >20 (putting the breakpoints wherever they
seem to make sense).  It doesn't make much sense to me to try to fit
numbers in the range 1-20 and numbers in the thousands with the same
distribution; presumably (??) three different processes are occurring.
(How informative do you think the range of variation in the middle
(1-20) category is?  In the extreme (100+) category?)  Having 1000
species adds another complication: it is less than optimal to fit 1000
separate responses.  Perhaps a random-effects model, possibly with
phylogenetic or functional groupings included?  Unfortunately, putting
these pieces together (ordinal models, random effects, large data sets)
makes the problem challenging.  I would check out the ordinal and
MCMCglmm packages.
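To make the three-category idea concrete, here is a minimal sketch in plain Python (not the R packages mentioned above, and with no covariates or random effects). It assumes hypothetical breakpoints at 0 and 20 and uses made-up counts for a single species. It bins counts into ordinal categories and fits an intercept-only cumulative-logit (proportional-odds) model, whose maximum-likelihood thresholds are simply the logits of the empirical cumulative proportions. The `eta` argument shows how a linear predictor (e.g. a site or species effect in a fuller model) would shift probability between categories.

```python
import math
from bisect import bisect_left
from collections import Counter

# Hypothetical breakpoints (an assumption; move them to wherever makes sense):
# category 0 = zero, category 1 = 1..20, category 2 = >20.
BREAKS = (0, 20)


def to_ordinal(count, breaks=BREAKS):
    """Map a non-negative count to an ordinal category index 0, 1, or 2."""
    return bisect_left(breaks, count)


def logit(p):
    return math.log(p / (1 - p))


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


def fit_thresholds(categories, k=3):
    """Intercept-only cumulative-logit fit: theta_j = logit(P(Y <= j)).

    With no covariates, the MLE of the k-1 thresholds is the logit of the
    empirical cumulative proportions. Every category must be observed."""
    n = len(categories)
    counts = Counter(categories)
    thresholds = []
    cum = 0
    for j in range(k - 1):
        cum += counts.get(j, 0)
        if cum == 0 or cum == n:
            raise ValueError("each category needs at least one observation")
        thresholds.append(logit(cum / n))
    return thresholds


def category_probs(thresholds, eta=0.0):
    """P(Y = j) under a proportional-odds model: P(Y <= j) = inv_logit(theta_j - eta)."""
    cdf = [inv_logit(t - eta) for t in thresholds] + [1.0]
    return [cdf[0]] + [cdf[j] - cdf[j - 1] for j in range(1, len(cdf))]


if __name__ == "__main__":
    # Made-up counts for one species across 12 sites (illustrative only).
    abundances = [0, 0, 0, 0, 3, 7, 12, 0, 2000, 15, 0, 9]
    cats = [to_ordinal(a) for a in abundances]
    theta = fit_thresholds(cats)
    print(cats)                                           # [0, 0, 0, 0, 1, 1, 1, 0, 2, 1, 0, 1]
    print([round(p, 3) for p in category_probs(theta)])  # [0.5, 0.417, 0.083]
```

Note that the value 2000 contributes only "one observation in the top category" here, which is the point of the binning: the extreme counts are kept, but they no longer have to share a distribution with the small counts.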


