[R] clusters in zero-inflated negative binomial models

Wed May 16 23:26:55 CEST 2012

Lies Durnez <ldurnez <at> itg.be> writes:

> I want to build a model in R based on animal collection data that look like
> the following:
> 
> Nr	Village	District	Site	Survey	Species	Count
> 1	AX	A	F	Dry	B	0
> 2	AY	A	V	Wet	A	5
> 3	BX	B	F	Wet	B	1
> 4	BY	B	V	Dry	B	0

> Each data point shows one collection unit in a certain Village,
> District, Site, and Survey for a certain Species. 'Count' is the
> number of animals collected in that collection unit. It is possible
> that zero animals are collected in that unit because of very low
> densities, but also because of climatic conditions (wind, rain,
> etc), so we would expect an excess in zeroes. I have tested that the
> data are overdispersed (variance much bigger than mean), so a
> zero-inflated negative binomial model seems the most suitable model
> in this case.

 [snip snip snip]

> However, the animal collections were only done in 4 districts, and
> in each district 3 villages were chosen (a total of 12
> villages). This should be included in the design. The package survey
> allows this for the standard negative binomial model, but it seems
> to me that it is not possible for the zero-inflated NB. So, my
> question is two-fold: 1. Is a zero-inflated NB possible in the
> survey package. If yes, how?  2. If no, how can I build a
> zero-inflated NB model that takes into account the clustering of the
> observations (animal counts) in villages and the clustering of the
> villages in districts.

  Treating villages and districts as random effects (clusters)
puts you in the domain of generalized linear mixed models (GLMMs).
You can use the glmmADMB package to fit zero-inflated negative
binomial mixed models.  You can also use the MCMCglmm package to fit
lognormal-Poisson models, which are another way of modelling
overdispersed count data (the choice depends on how strongly you
require that the actual model be NB, as opposed to just a reasonable
model for overdispersed count data).
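
  As a rough illustration of the glmmADMB route, here is a minimal
sketch.  The data are simulated (the real data set was not posted), the
choice of fixed effects (Species, Survey, Site, District) is only my
guess at a plausible model, and the fit is skipped if glmmADMB is not
installed.  Villages are labeled uniquely (AX, AY, ...), as in the
example rows in the question.

```r
# Simulated stand-in for the collection data: 4 districts x 3 villages,
# 2 sites, 2 surveys, 2 species. All values are made up for illustration.
set.seed(1)
dat <- expand.grid(Village  = c("X", "Y", "Z"),
                   District = c("A", "B", "C", "D"),
                   Site     = c("F", "V"),
                   Survey   = c("Dry", "Wet"),
                   Species  = c("A", "B"),
                   stringsAsFactors = TRUE)

# Unique village labels (AX, AY, ...), so Village alone identifies a cluster.
dat$Village <- factor(paste0(dat$District, dat$Village))

# Negative binomial counts, with extra zeros added at random (about 30%).
dat$Count <- rnbinom(nrow(dat), mu = 2, size = 0.5) *
             rbinom(nrow(dat), size = 1, prob = 0.7)

# District as a fixed effect, village as a random intercept.
form <- Count ~ Species + Survey + Site + District + (1 | Village)

# Fit only if glmmADMB is available; try() guards against convergence
# failures, which are quite possible on toy data like these.
if (requireNamespace("glmmADMB", quietly = TRUE)) {
  library("glmmADMB")
  fit <- try(glmmadmb(form, data = dat,
                      family = "nbinom", zeroInflation = TRUE))
  if (!inherits(fit, "try-error")) print(summary(fit))
}
```

  With zeroInflation = TRUE, glmmadmb estimates a single zero-inflation
probability for the whole data set, so this sketch does not let the
excess zeros depend on covariates.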

  Four districts is not very many for estimating an among-district
variance (which is basically what you are doing when you fit a
clustered/mixed model), so I would suggest treating district as a fixed
effect and then using district:village (i.e. the interaction between
district and village, or village alone if the villages are uniquely
labeled) as the random effect.
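
  To see why the labeling matters, here is a small base-R sketch with a
hypothetical design in which the village labels (X, Y, Z) are reused
within every district.  The column names are my own invention.

```r
# Hypothetical design: 4 districts, 3 villages each, labels reused
# across districts.
design <- expand.grid(VillageInDistrict = c("X", "Y", "Z"),
                      District          = c("A", "B", "C", "D"))

# The bare label only distinguishes 3 groups ...
nlevels(design$VillageInDistrict)   # 3

# ... whereas the district:village interaction identifies all 12 villages.
design$VillageID <- interaction(design$District,
                                design$VillageInDistrict, drop = TRUE)
nlevels(design$VillageID)           # 12

# Two equivalent ways of writing the grouping term in a mixed-model formula:
f_nested <- Count ~ District + (1 | District:VillageInDistrict)
f_unique <- Count ~ District + (1 | VillageID)
```

  If the villages are already uniquely labeled (AX, AY, BX, ...), then
(1 | Village) gives the same grouping as the interaction.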

  http://glmm.wikidot.com/faq may be useful.

  I would suggest that you send follow-ups to the
r-sig-mixed-models <at> r-project.org mailing list.