[R] ZINB for non-integer data

(Ted Harding) Ted.Harding at manchester.ac.uk
Sun Jul 19 13:25:43 CEST 2009


On 18-Jul-09 17:26:36, JPS2009 wrote:
> Sorry bit of a Newbie question, and I promise I have searched the
> forum already, but I'm getting a bit desperate!
> 
> I have over-dispersed, zero inflated data, with variance greater
> than the mean, suggesting Zero-Inflated Negative Binomial - which
> I attempted in R with the pscl package suggested on
> http://www.ats.ucla.edu/stat/R/dae/zinbreg.htm
> 
> However my data is non-integer with some pesky decimals (e.g. 33.12)
> and zinb / pscl doesn't like that - not surprising as zinb is for
> count data, normally whole numbers etc.
> 
> Does anyone know of a different zinb package that will allow
> non-integers, or an equivalent test/model to zinb for non-integer
> data? Or should I try something else like a quasi-Poisson GLM?
> 
> Apologies for the Newbie question! Any help much appreciated!
> Thanks!

The presence of decimals suggests that those data values are records
of quantities which ought to be modelled as continuous variables.
For instance, in answer to a survey question "How much did you spend
on alcoholic drinks yesterday?", the answer would be either a positive
sum of money (with decimals), or zero, depending on whether the
person spent anything at all on alcohol.

So:
With probability p, the amount spent was positive and, conditional
on being positive, has a distribution which can be modelled by a
particular continuous distribution (maybe Log-normal?).

With probability (1-p), the amount spent was zero.

A correct approach therefore first requires you to face the question
of how to model the positive part of the distribution.

Once you have settled that question, it is then possible to see
whether that particular class of problem is covered by some package
in R, or whether you need to develop an approach yourself.
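
As a minimal sketch of the two-part idea described above, here is one
way it could be set up in base R, assuming (purely for illustration)
that the positive part is log-normal. The data are simulated, and the
names x, y, fit_zero and fit_pos, as well as all parameter values, are
made up for the example; this is not a recommendation of any
particular package or final model.

```r
# Simulated data: y is zero with probability 1 - p, otherwise log-normal.
# All names and parameter values here are illustrative assumptions.
set.seed(1)
n <- 500
x <- rnorm(n)                              # a hypothetical covariate
p <- plogis(0.5 + 1.0 * x)                 # P(y > 0) depends on x
positive <- rbinom(n, size = 1, prob = p)  # 1 = positive amount, 0 = zero
y <- positive * rlnorm(n, meanlog = 3 + 0.4 * x, sdlog = 0.5)
dat <- data.frame(x = x, y = y)

# Part 1: logistic regression for whether y is positive at all.
fit_zero <- glm(I(y > 0) ~ x, family = binomial, data = dat)

# Part 2: linear model for log(y), fitted to the positive values only.
dat_pos <- dat[dat$y > 0, ]
fit_pos <- lm(log(y) ~ x, data = dat_pos)

coef(fit_zero)  # intercept and slope on the logit scale for P(y > 0)
coef(fit_pos)   # intercept and slope for E[log(y) | y > 0]
```

The two parts are fitted separately here because, under this kind of
model, the zero/positive split and the size of the positive values
contribute separate factors to the likelihood. Whether the log-normal
is adequate for the positive part would still need checking (for
example by looking at the residuals of fit_pos).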

In any case, if I am barking up the right tree above, neither negative
binomial nor Poisson would, in principle, be correct for such data
since, as you observe, these are intended for count data, not for
data which are essentially continuous.
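
A quick way to see the point in R, as a small sketch: the Poisson and
negative binomial probability functions put no mass on non-integer
values, so a value such as 33.12 gets probability zero (with a
warning, suppressed here) whatever the parameters. The parameter
values below are arbitrary and chosen only for illustration.

```r
# Count distributions assign zero probability to non-integer values.
# The mean, size and sdlog used here are arbitrary illustrative choices.
p_pois <- suppressWarnings(dpois(33.12, lambda = 33))
p_nb   <- suppressWarnings(dnbinom(33.12, size = 2, mu = 33))
c(poisson = p_pois, negbin = p_nb)   # both are 0

# A continuous density, by contrast, is positive at the same value.
d_lnorm <- dlnorm(33.12, meanlog = log(33), sdlog = 0.5)
d_lnorm > 0                          # TRUE
```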

Hoping this helps,
Ted.
