[R] Underdispersion and count data

Fri Nov 8 07:55:41 CET 2013

On Thu, 7 Nov 2013, sv.june at yahoo.ca wrote:

> Hello,
>
> I have count data for 4 groups, 2 of which have a large number of zeroes 
> and are overdispersed, and the other 2 underdispersed with no zeroes.

Are you sure that it's really underdispersion in addition to the lack of
zeros? The missing zeros alone could explain why the variance is smaller
than the mean.
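
To illustrate the point, here is a minimal sketch in plain Python (not
tied to any R package; the function name is made up for this example). It
computes the mean and variance of a Poisson variable conditioned on being
positive and shows that the variance/mean ratio drops below 1 purely
because the zeros are gone.

```python
import math

def zero_truncated_poisson_moments(lam):
    """Return (mean, variance) of a Poisson(lam) variable conditioned on y > 0."""
    p_pos = 1.0 - math.exp(-lam)      # P(Y > 0) under the untruncated Poisson
    mean = lam / p_pos                # E[Y | Y > 0]
    var = mean * (1.0 + lam - mean)   # E[Y^2 | Y > 0] - mean^2
    return mean, var

for lam in (0.5, 1.0, 2.0, 5.0):
    mean, var = zero_truncated_poisson_moments(lam)
    print(f"lambda={lam:>4}: mean={mean:.3f}  var={var:.3f}  var/mean={var / mean:.3f}")
```

The ratio is always below 1 and approaches 1 as lambda grows, so the
apparent underdispersion is strongest when the counts are small.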

> I have two questions about model fitting, which I am quite new to, and 
> have been using mostly the pscl package.
>
> 1 - How do I deal with underdispersion? Almost all the published and 
> online advice is regarding overdispersion, and neither the Poisson nor 
> negative binomial distribution seem appropriate. The COM Poisson comes 
> up sometimes as a suggestion, but it's not clear to me how I can use 
> this, explain my choice of it, or what information I would report for 
> publication purposes.

There are (at least) two packages on CRAN that support this: compoisson
and COMPoissonReg.
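
For intuition about what the COM-Poisson dispersion parameter does, here
is a rough sketch in plain Python. It does not use either of the packages;
the function names and the truncation point max_y are assumptions made for
this example. The probabilities are proportional to lam^y / (y!)^nu, so
nu = 1 gives the ordinary Poisson, nu > 1 gives underdispersion, and
nu < 1 gives overdispersion.

```python
import math

def com_poisson_pmf(lam, nu, max_y=200):
    """P(Y = 0..max_y) with P(Y = y) proportional to lam**y / (y!)**nu."""
    # Work on the log scale to avoid overflow in y!.
    log_w = [y * math.log(lam) - nu * math.lgamma(y + 1) for y in range(max_y + 1)]
    shift = max(log_w)
    w = [math.exp(v - shift) for v in log_w]
    total = sum(w)  # truncated normalising constant; assumes a negligible tail beyond max_y
    return [v / total for v in w]

def mean_and_variance(pmf):
    """Mean and variance of a distribution on 0, 1, ..., len(pmf) - 1."""
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return mean, var

for nu in (0.5, 1.0, 2.0):
    mean, var = mean_and_variance(com_poisson_pmf(3.0, nu))
    print(f"nu={nu:>4}: mean={mean:.3f}  var={var:.3f}  var/mean={var / mean:.3f}")
```

In a regression setting one would report the estimated dispersion
parameter alongside the coefficients, much like theta for the negative
binomial.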

However, I would check first whether this is really needed or maybe a 
zero-truncated Poisson model is already sufficient.

The package "countreg" on R-Forge
(https://R-Forge.R-project.org/R/?group_id=522) has a function zerotrunc()
which is essentially the same code that hurdle() in "pscl" uses for its
count part. So it should be easy for you to use.
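
To show what fitting a zero-truncated Poisson amounts to in the simplest
case (no regressors), here is a sketch in plain Python. This is not the
zerotrunc() code; the function name and the example counts are made up.
Without regressors the maximum likelihood estimate solves
lam / (1 - exp(-lam)) = mean(y), which the sketch finds by bisection.

```python
import math

def fit_zero_truncated_poisson(counts, tol=1e-10):
    """ML estimate of lambda for an intercept-only zero-truncated Poisson."""
    if any(y < 1 for y in counts):
        raise ValueError("zero-truncated data must be strictly positive")
    ybar = sum(counts) / len(counts)
    if ybar <= 1.0:
        raise ValueError("all counts equal 1: the estimate is on the boundary (lambda -> 0)")
    # The truncated mean lam / (1 - exp(-lam)) is increasing in lam and
    # always larger than lam, so the root lies strictly between 0 and ybar.
    lo, hi = 1e-12, ybar
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid / -math.expm1(-mid) < ybar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

counts = [1, 2, 2, 3, 1, 4, 2, 3, 2, 1]   # hypothetical data without zeros
lam_hat = fit_zero_truncated_poisson(counts)
print(f"sample mean = {sum(counts) / len(counts):.3f}, lambda_hat = {lam_hat:.3f}")
```

The estimate is always smaller than the sample mean, because the sample
mean of truncated data overstates the underlying Poisson rate.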

> 2 - For the overdispersed data with lots of zeroes, I've tried 
> zero-inflated Poisson and NegBin and hurdle models, and used the Vuong 
> test to compare. However, I get equal fit for two candidate models that 
> produce quite different coefficient estimates for my predictor 
> variables, and hence different p values. I am unsure how to proceed in 
> choosing one of these models, and how I would justify one over the other 
> given that the Vuong test seems not to discriminate.

Is it just zero-inflated vs. hurdle or are there also differences in the
regressors? If the former: zero-inflated and hurdle models are
parametrized differently but often lead to similar fits. The zero-inflated
model has a count part plus a zero-inflation part, whereas the hurdle
model has a zero-truncated count part and a zero hurdle. Hence the
coefficients have different interpretations and are not directly
comparable.
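
The difference in parametrization can be sketched in plain Python for the
case without regressors (the function names and parameter values are made
up for this example). Choosing the hurdle's zero probability as
pi + (1 - pi) * exp(-lam) reproduces the zero-inflated Poisson exactly, so
the two models describe the same distribution with different parameters.

```python
import math

def poisson_pmf(y, lam):
    """Ordinary Poisson probability P(Y = y)."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def zip_pmf(y, pi, lam):
    """Zero-inflated Poisson: an extra zero with probability pi, else Poisson(lam)."""
    base = (1.0 - pi) * poisson_pmf(y, lam)
    return pi + base if y == 0 else base

def hurdle_pmf(y, p_zero, lam):
    """Hurdle Poisson: zero with probability p_zero, else zero-truncated Poisson(lam)."""
    if y == 0:
        return p_zero
    return (1.0 - p_zero) * poisson_pmf(y, lam) / (1.0 - math.exp(-lam))

pi, lam = 0.3, 2.0                          # hypothetical ZIP parameters
p_zero = pi + (1.0 - pi) * math.exp(-lam)   # matching hurdle zero probability
for y in range(6):
    print(y, round(zip_pmf(y, pi, lam), 6), round(hurdle_pmf(y, p_zero, lam), 6))
```

With regressors in both parts the two models are in general no longer
exactly equivalent, and the hurdle model can additionally capture fewer
zeros than the Poisson would predict, which the zero-inflated model
cannot.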

If the regressors are different, then it's probably a subject-matter 
decision.

hth,
Z

> Thank you and any advice would be much appreciated.
>
> Mo