[R-sig-ME] analysis of count data with many zero values

ONKELINX, Thierry Thierry.ONKELINX at inbo.be
Fri Oct 29 17:27:21 CEST 2010


Hi Steve,

In addition to Chris's comments, I would like to add that a high
number of zeros does not imply a zero-inflated distribution. Have a
look at the example below.

HTH,

Thierry


> set.seed(123)
> # ordinary Poisson with 91% zeros
> counts <- rpois(10000, lambda = 0.1)
> mean(counts == 0)
[1] 0.9105
> table(counts)
counts
   0    1    2    3 
9105  855   39    1 
> # ordinary Poisson with 99% zeros
> counts <- rpois(10000, lambda = 0.01)
> mean(counts == 0)
[1] 0.9912
> table(counts)
counts
   0    1 
9912   88 
> # ordinary Poisson without zeros
> counts <- rpois(10000, lambda = 10)
> mean(counts == 0)
[1] 0
> table(counts)
counts
   1    2    3    4    5    6    7    8    9   10 
   4   21   86  203  389  606  887 1144 1277 1243 
  11   12   13   14   15   16   17   18   19   20 
1147  904  707  553  367  218  108   75   36   10 
  21   22   24   27 
   9    4    1    1 
> # zero-inflated Poisson with about 50% zeros
> # 20% zeros from the inflation
> # about 30% zeros from the Poisson
> # about 50% non-zero from the Poisson
> zi <- rbinom(10000, prob = 0.2, size = 1)
> counts <- rpois(10000, lambda = 1)
> counts[zi == 1] <- 0
> mean(counts == 0)
[1] 0.4961
> table(counts)
counts
   0    1    2    3    4    5    6    7 
4961 2946 1463  472  129   25    3    1 
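
The zero fractions in the session above can also be compared with what
the distributions predict. The sketch below is only an illustration:
the function names (expected_zero_poisson, expected_zero_zip,
zero_check) are made up for this example and do not come from any
package. It uses P(Y = 0) = exp(-lambda) for a Poisson and
P(Y = 0) = p + (1 - p) * exp(-lambda) for a zero-inflated Poisson with
inflation probability p.

```r
# Zero fraction expected under a plain Poisson: P(Y = 0) = exp(-lambda)
expected_zero_poisson <- function(lambda) exp(-lambda)

# Zero fraction expected under a zero-inflated Poisson with
# inflation probability p: P(Y = 0) = p + (1 - p) * exp(-lambda)
expected_zero_zip <- function(lambda, p) p + (1 - p) * exp(-lambda)

# Hypothetical helper: compare the observed share of zeros with the
# share a Poisson with the same mean would produce.
zero_check <- function(counts) {
  c(observed = mean(counts == 0),
    poisson_expected = exp(-mean(counts)))
}

set.seed(123)
# Plain Poisson with a small mean: many zeros, but no inflation
plain <- rpois(10000, lambda = 0.1)
# Zero-inflated Poisson: 20% structural zeros on top of Poisson(1)
zi <- rbinom(10000, size = 1, prob = 0.2)
inflated <- ifelse(zi == 1, 0L, rpois(10000, lambda = 1))

expected_zero_poisson(c(0.1, 0.01, 10))  # about 0.905, 0.990, 0.00005
expected_zero_zip(lambda = 1, p = 0.2)   # about 0.494
zero_check(plain)     # observed and expected shares should be close
zero_check(inflated)  # observed share should exceed the Poisson share
```

Note that this is a marginal check that ignores covariates. An excess
of zeros relative to exp(-mean) can also come from overdispersion or
from means that differ between observations, so it is a hint rather
than proof of zero inflation.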

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey
  

> -----Original message-----
> From: r-sig-mixed-models-bounces at r-project.org 
> [mailto:r-sig-mixed-models-bounces at r-project.org] On behalf of 
> Christopher Desjardins
> Sent: Friday 29 October 2010 16:47
> To: Steve Hong
> CC: r-sig-mixed-models at r-project.org
> Subject: Re: [R-sig-ME] analysis of count data with many zero values
> 
> Hi Steve,
> 
> The MCMCglmm package has several different models that you 
> could fit to zero-inflated count data. You can fit 
> zero-inflated Poisson models, hurdle models, zero-altered 
> and zero-truncated models. I don't believe you can fit 
> zero-inflated negative binomials with that package but I 
> could be wrong.
> Also I believe that ZINB models work well when you have 
> zero-inflated and non-zero overdispersed data. You could also 
> roll your own using rjags or R2WinBUGS, etc.
> 
> There are lots of publications out there examining 
> zero-inflation especially using MCMC based approaches. (Do a 
> quick Google Scholar search for zero-inflated multilevel 
> models). In addition, Jarrod Hadfield's CourseNotes (they 
> come w/ MCMCglmm) are also quite informative and provide some 
> examples of how you might fit such a model. In my experience 
> with count data that are highly zero-inflated (86% of all 
> data were zeroes), the ZIP model worked well but converged 
> very slowly and required about 60,000 MCMC iterations. If 
> you'd like to see the code I can share it as well. Also I 
> believe this topic has come up several times and I would 
> encourage you to search through the archives of R-Sig-Mixed-Models.
> 
> HTH,
> Chris
> 
> 
> 
> 
> On Fri, Oct 29, 2010 at 9:32 AM, Steve Hong 
> <emptican at gmail.com> wrote:
> 
> > Dear list,
> >
> > This is the first time I have this type of data.  I have count data
> > collected repeatedly from the same plot over multiple years (14 yrs)
> > and have found that the proportion of 'zero' values is very high
> > (average proportion is about 92 %, min: 53 %, max: 100 %).  Only one
> > year has 53% zeros in the data and the rest of the years have at
> > least 86% zero values in the data set.
> >
> > The objective of the study is to develop predictive models and 
> > validate them, for example, using cross validation.
> >
> > Variables collected are: year, insect count, longitude, latitude,
> > soil properties (x1...x4).
> >
> > Since the data have too many zero observations, I am thinking about
> > using a zero-inflated model to fit the data.  However, I am very new
> > to this method.
> >
> > My questions are:
> > 1. Is it possible to use a zero-inflated model to fit data with about
> > 90% zeros?  I am wondering if the zero proportion is too high to make
> > any inference using statistical methods.
> > 2. If I can use zero-inflated models, can I use either a Poisson
> > distribution or a negative binomial distribution?  Or both?
> > 3. Do you have any good reference (paper and/or website) for a good
> > and 'easy' tutorial for me to study?
> >
> > I am wondering if I provided enough information or submitted it to
> > the correct mailing list.  Please let me know if you have any
> > comments and suggestions.
> > I would greatly appreciate that.
> >
> > Thank you very much in advance!!!
> >
> > Steve
> >
> > _______________________________________________
> > R-sig-mixed-models at r-project.org mailing list 
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >
> 
> 
> 
> --
> Christopher David Desjardins
> Ph.D. student, Quantitative Methods in Education M.S. 
> student, Statistics University of Minnesota
> 