[R-sig-ME] analysis of count data with many zero values
ONKELINX, Thierry
Thierry.ONKELINX at inbo.be
Fri Oct 29 17:27:21 CEST 2010
Hi Steve,
In addition to Chris's comments, I would like to add that a high
number of zeros does not imply a zero-inflated distribution. Have a
look at the example below.
HTH,
Thierry
> set.seed(123)
> # ordinary Poisson with 91% zeros
> counts <- rpois(10000, lambda = 0.1)
> mean(counts == 0)
[1] 0.9105
> table(counts)
counts
   0    1    2    3
9105  855   39    1
> # ordinary Poisson with 99% zeros
> counts <- rpois(10000, lambda = 0.01)
> mean(counts == 0)
[1] 0.9912
> table(counts)
counts
   0    1
9912   88
> # ordinary Poisson without zeros
> counts <- rpois(10000, lambda = 10)
> mean(counts == 0)
[1] 0
> table(counts)
counts
   1    2    3    4    5    6    7    8    9   10
   4   21   86  203  389  606  887 1144 1277 1243
  11   12   13   14   15   16   17   18   19   20
1147  904  707  553  367  218  108   75   36   10
  21   22   24   27
   9    4    1    1
> # zero-inflated Poisson with about 50% zeros
> # about 20% zeros from the inflation
> # about 30% zeros from the Poisson
> # about 50% non-zero from the Poisson
> zi <- rbinom(10000, prob = 0.2, size = 1)
> counts <- rpois(10000, lambda = 1)
> counts[zi == 1] <- 0
> mean(counts == 0)
[1] 0.4961
> table(counts)
counts
   0    1    2    3    4    5    6    7
4961 2946 1463  472  129   25    3    1
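
One rough way to quantify the point of the examples above: an ordinary Poisson with mean lambda produces a fraction exp(-lambda) of zeros, so the observed share of zeros can be compared with exp(-mean(counts)). The following is only a minimal sketch under that assumption. The helper name zero_check and the variable names are illustrative and not part of the original post. It ignores covariates and overdispersion, so it is a heuristic rather than a formal test.

```r
# Compare the observed share of zeros with the share that a plain Poisson
# with the same mean would produce, exp(-mean). Hypothetical helper.
zero_check <- function(counts) {
  observed <- mean(counts == 0)
  expected <- exp(-mean(counts))  # P(Y = 0) for Poisson(lambda = mean)
  c(observed = observed, expected = expected, excess = observed - expected)
}

set.seed(123)

# ordinary Poisson with about 90% zeros
plain <- rpois(10000, lambda = 0.1)

# zero-inflated Poisson: about 20% structural zeros on top of Poisson(1)
zi <- rbinom(10000, prob = 0.2, size = 1)
inflated <- rpois(10000, lambda = 1)
inflated[zi == 1] <- 0

res_plain <- zero_check(plain)
res_inflated <- zero_check(inflated)
print(round(rbind(plain = res_plain, inflated = res_inflated), 3))
```

For the plain Poisson sample the excess should be close to zero even though about 90% of the values are zero, while the zero-inflated sample should show clearly more zeros than exp(-mean) predicts.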
----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium
Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data.
~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey
> -----Original message-----
> From: r-sig-mixed-models-bounces at r-project.org
> [mailto:r-sig-mixed-models-bounces at r-project.org] On behalf of
> Christopher Desjardins
> Sent: Friday 29 October 2010 16:47
> To: Steve Hong
> CC: r-sig-mixed-models at r-project.org
> Subject: Re: [R-sig-ME] analysis of count data with many zero values
>
> Hi Steve,
>
> The MCMCglmm package has several different models that you
> could fit to zero-inflated count data. You can fit
> zero-inflated Poisson models, hurdle models, zero-altered
> and zero-truncated models. I don't believe you can fit
> zero-inflated negative binomials with that package but I
> could be wrong.
> Also I believe that ZINB models work well when the data are
> zero-inflated and the non-zero counts are overdispersed. You
> could also roll your own using rjags or R2WinBUGS, etc.
>
> There are lots of publications out there examining
> zero-inflation especially using MCMC based approaches. (Do a
> quick Google Scholar search for zero-inflated multilevel
> models). In addition, Jarrod Hadfield's CourseNotes (they
> come w/ MCMCglmm) are also quite informative and provide some
> examples of how you might fit such a model. In my experience
> with count data that are highly zero-inflated (86% of all
> data were zeroes), the ZIP model worked well but converged
> very slowly and required about 60,000 MCMC iterations. If
> you'd like to see the code I can share it as well. Also I
> believe this topic has come up several times and I would
> encourage you to search through the archives of R-Sig-Mixed-Models.
>
> HTH,
> Chris
>
>
>
>
> On Fri, Oct 29, 2010 at 9:32 AM, Steve Hong
> <emptican at gmail.com> wrote:
>
> > Dear list,
> >
> > This is the first time I have worked with this type of data. I have
> > count data collected repeatedly from the same plot over multiple years
> > (14 yrs) and have found that the proportion of 'zero' values is very
> > high (average proportion is about 92%, min: 53%, max: 100%). Only one
> > year has 53% zeros; the remaining years each have at least 86% zeros.
> >
> > The objective of the study is to develop predictive models and
> > validate them, for example, using cross validation.
> >
> > Variables collected are: year, insect count, longitude, latitude,
> > soil properties (x1...x4).
> >
> > Since the data have so many zero observations, I am thinking about
> > using a zero-inflated model to fit the data. However, I am very new
> > to this method.
> >
> > My questions are:
> > 1. Is it possible to use a zero-inflated model to fit data with about
> > 90% zeros? I am wondering if the zero proportion is too high to make
> > any inference using statistical methods.
> > 2. If I can use zero-inflated models, can I use either the Poisson
> > distribution or the negative binomial distribution? Or both?
> > 3. Do you have any good reference (paper and/or website) for a good
> > and 'easy' tutorial for me to study?
> >
> > I am wondering if I provided enough information or submitted it to
> > the correct mailing list. Please let me know if you have any comments
> > and suggestions. I would greatly appreciate that.
> >
> > Thank you very much in advance!!!
> >
> > Steve
> >
> > _______________________________________________
> > R-sig-mixed-models at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >
>
>
>
> --
> Christopher David Desjardins
> Ph.D. student, Quantitative Methods in Education
> M.S. student, Statistics
> University of Minnesota
>