[R] Lambert (1992) simulation

Sun May 6 16:31:51 CEST 2012

On Sat, 5 May 2012, Christopher Desjardins wrote:

> Hi,
> I am a little confused at the output from predict() for a zeroinfl object.
> 
> Here's my confusion:
> 
> ## zeroinfl() and the bioChemists data come from the pscl package
> library("pscl")
> data("bioChemists", package = "pscl")
> fm_zinb2 <- zeroinfl(art ~ . | ., data = bioChemists, dist = "negbin")
> 
> 
> ## The raw zero-inflated overdispersed data
> > table(bioChemists$art)
> 
>   0   1   2   3   4   5   6   7   8   9  10  11  12  16  19
> 275 246 178  84  67  27  17  12   1   2   1   1   2   1   1
> 
> ## The default output from predict. It looks like it is doing a horrible
> job. Does it really predict 7 zeros?

No, see also this R-help post on "Zero-inflated regression models: 
predicting no 0s":
https://stat.ethz.ch/pipermail/r-help/2011-June/279765.html

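By default, predict() returns the predicted _mean_ for each observation. 
The mean of a negative binomial distribution is generally not its most 
likely outcome (i.e., the _mode_), so rounding the predicted means does 
not give predicted counts. The post above presents some hands-on examples.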
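
To make the difference concrete, here is a minimal sketch in base R. The
parameter values (mu, theta, pi0) are made up for illustration and are not
estimates from fm_zinb2; the formulas assume the usual zero-inflated
negative binomial setup, a point mass at zero mixed with a negative binomial.

```r
## Hypothetical parameters for a single observation (NOT taken from fm_zinb2)
mu    <- 2    # mean of the negative binomial count component
theta <- 1    # negative binomial dispersion (the "size" argument of dnbinom)
pi0   <- 0.1  # probability of a structural zero

## Expected value of the zero-inflated response
ey <- (1 - pi0) * mu

## Probabilities of the outcomes 0, 1, ..., 10 under the zero-inflated model
y <- 0:10
p <- (1 - pi0) * dnbinom(y, size = theta, mu = mu)
p[1] <- p[1] + pi0   # structural zeros add mass at y = 0

round(ey)         # the rounded mean
y[which.max(p)]   # the most likely count (the mode)
p[1]              # probability of observing a zero
```

With these values the rounded mean and the mode disagree, even though a
zero is by far the most likely single outcome. Tabulating rounded means
therefore says little about how many zeros the model expects.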

> > table(round(predict(fm_zinb2)) )
> 
>   0   1   2   3   4   5   6  10
>   7 354 487  45  12   6   3   1
> 
> ##  The output from predict using "count"
> > table(round(predict(fm_zinb2,type="count")))
> 
>   1   2   3   4   5   6  10
> 312 536  45  12   6   3   1
> 
> ## The output from predict using "zero", but here it predicts 24
> "structural" zeros?
> > table(round(predict(fm_zinb2,type="zero")))
> 
>   0   1
> 891  24
> 
> 
> So my question is how do I interpret these different outputs from the
> zeroinfl object? What are the differences? The help page just left me
> confused. I would expect that table(round(predict(fm_zinb2))) would be E(Y)
> and would most accurately track table(bioChemists$art) but I am wrong. How
> can I find the E(Y) that would most closely track the raw data?
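
As far as I can tell from the predict() method for zeroinfl objects,
type = "response" gives the overall mean (1 - pi) * mu, type = "count"
gives the count-component mean mu, and type = "zero" gives the probability
pi of a structural zero. None of these is a predicted count. A sketch of
one way to get expected frequencies that are comparable with
table(bioChemists$art), assuming pscl is installed and using
type = "prob" (the names p, expected, and observed are mine):

```r
## Assumes the pscl package is installed; refits the model from the question
library("pscl")
data("bioChemists", package = "pscl")
fm_zinb2 <- zeroinfl(art ~ . | ., data = bioChemists, dist = "negbin")

## Matrix of predicted probabilities: one row per observation,
## one column per count 0, 1, ..., max(art)
p <- predict(fm_zinb2, type = "prob")

## Expected frequencies: sum the probabilities over all observations
expected <- colSums(p)

## Observed frequencies on the same grid (including counts never observed)
observed <- table(factor(bioChemists$art, levels = 0:max(bioChemists$art)))

round(cbind(observed = as.vector(observed), expected = expected), 1)
```

Summing probabilities instead of rounding means keeps the information
about zeros, because each observation contributes its own probability of
a zero to the first column.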
> 
> Thanks,
> Chris