[R-sig-eco] Continuous distribution for zero or positive values (in GLM or GLMM)

Rubén Roa rroa at azti.es
Thu May 19 08:37:40 CEST 2011


This is a well-known modeling issue, and several approaches are available.

We have tried the Delta approach with good results. It is essentially a binomial glm for the presence-absence representation of the data and, conditional on the Bernoulli variable being 1, a regular continuous distribution, such as a Gamma glm, for the positive values. This is option 3) in Ben Bolker's list.

See

Aitchison, J. 1955. On the distribution of a positive random variable having a discrete
probability mass at the origin. Journal of the American Statistical Association 50: 901–908.

Pennington, M. 1983. Efficient estimators of abundance, for fish and plankton surveys.
Biometrics 39: 281–286.
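Below is a minimal sketch of the Delta (Bernoulli + Gamma) model. It is written in Python only so that it is self-contained, and the function names and the intercept-only setting are assumptions made for illustration. The likelihood splits into a term for the zeros and a term for the positive values, so the two parts can be fitted separately. That is why the approach amounts to two glms: in R, roughly glm(..., family = binomial) on the presence/absence indicator and glm(..., family = Gamma(link = "log")) on the positive values only. The implied unconditional mean is Pr(Y > 0) times E[Y | Y > 0].

```python
import math


def delta_gamma_fit(y):
    """Intercept-only Delta model: a Bernoulli part for Y > 0 and a Gamma
    part for the positive values. Returns (p_hat, mu_hat, mean_hat)."""
    positives = [v for v in y if v > 0]
    if not positives:
        raise ValueError("the Gamma part needs at least one positive value")
    p_hat = len(positives) / len(y)           # MLE of Pr(Y > 0) (binomial part)
    mu_hat = sum(positives) / len(positives)  # MLE of E[Y | Y > 0] (Gamma part, any shape)
    return p_hat, mu_hat, p_hat * mu_hat      # E[Y] = Pr(Y > 0) * E[Y | Y > 0]


def delta_gamma_loglik(y, p, mu, shape):
    """Log-likelihood of the Delta-Gamma model. Zeros and positives enter
    through separate terms, so the two parts can be maximized separately."""
    rate = shape / mu  # Gamma parameterized by its mean mu and its shape
    ll = 0.0
    for v in y:
        if v == 0:
            ll += math.log(1 - p)  # probability mass at the origin
        else:
            ll += (math.log(p) + shape * math.log(rate) - math.lgamma(shape)
                   + (shape - 1) * math.log(v) - rate * v)
    return ll


biomass = [0.0, 0.0, 1.2, 0.8, 2.5, 0.0, 3.1]  # made-up plot masses (kg)
p_hat, mu_hat, mean_hat = delta_gamma_fit(biomass)
print(p_hat, mu_hat, mean_hat)
```

With covariates, each part gets its own linear predictor, but the separation of the likelihood still holds.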

Another option is to transform your continuous response into counts. For example, if you have 0.745 kg of grass from one plot, how many standard 100 ml containers can you fill with it in a standardized manner? That's a count. Now if you turn your grass biomass into counts, and if you are lucky (no excessive number of zeroes), then maybe a Poisson glm will be good. And the Poisson does not bug you with nuisance parameters, since its variance is fixed by its mean.
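Here is a hedged sketch of the count conversion. It assumes a hypothetical standard unit of 0.1 kg per container; the unit size is an arbitrary choice for illustration. It also compares the observed share of zeros with the share that a Poisson with the same mean would predict, exp(-lambda). This gives a rough check of the "not excessive number of zeroes" condition.

```python
import math
from decimal import Decimal


def to_counts(masses_kg, unit_kg):
    """Count how many whole standard units each mass fills.
    Decimal avoids float artefacts such as 0.3 / 0.1 == 2.9999999999999996."""
    unit = Decimal(str(unit_kg))
    return [int(Decimal(str(m)) // unit) for m in masses_kg]


def poisson_zero_check(counts):
    """Poisson MLE of the mean, plus observed vs Poisson-expected share of zeros."""
    lam = sum(counts) / len(counts)
    observed_zeros = counts.count(0) / len(counts)
    expected_zeros = math.exp(-lam)  # P(N = 0) under Poisson(lam)
    return lam, observed_zeros, expected_zeros


masses = [0.0, 0.745, 0.3, 0.05, 0.0]  # made-up plot masses (kg)
counts = to_counts(masses, 0.1)        # hypothetical 0.1 kg per container
print(counts, poisson_zero_check(counts))
```

In this made-up example the observed share of zeros (0.6) is far above the Poisson expectation (about 0.14). That is exactly the "unlucky" case in which a plain Poisson glm would not be adequate.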

We tried several things like that, and also the Tweedie distribution (option 4 in Ben Bolker's list), in this paper:

Tascheri, R., Saavedra-Nievas, J.C., Roa-Ureta, R. 2010. Statistical models to standardize 
catch rates in the multi-species trawl fishery for Patagonian grenadier (Macruronus magellanicus) 
off Southern Chile. Fisheries Research 105: 200–214.
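For a power parameter 1 < p < 2, the Tweedie distribution is a compound Poisson-Gamma. A Poisson number of Gamma-distributed amounts is summed, so an exact zero occurs with probability exp(-lambda) (when the Poisson count is 0), and the variance is phi * mu^p. The sketch below maps (mu, phi, p) to the Poisson rate and the Gamma shape and scale, then simulates draws. The function names are hypothetical. In R, fitting is usually done by passing the tweedie() family from the statmod package to glm().

```python
import math
import random


def tweedie_to_cpg(mu, phi, p):
    """Map Tweedie (mean mu, dispersion phi, power 1 < p < 2) to the
    compound Poisson-Gamma parameters (lam, alpha, theta)."""
    if not 1 < p < 2:
        raise ValueError("power p must lie in (1, 2)")
    lam = mu ** (2 - p) / (phi * (2 - p))  # Poisson rate of the number of Gamma 'clumps'
    alpha = (2 - p) / (p - 1)              # Gamma shape of each clump
    theta = phi * (p - 1) * mu ** (p - 1)  # Gamma scale of each clump
    return lam, alpha, theta


def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for moderate lam."""
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1


def tweedie_draw(mu, phi, p, rng):
    """One Tweedie draw as a Poisson sum of Gamma amounts (0.0 if the count is 0)."""
    lam, alpha, theta = tweedie_to_cpg(mu, phi, p)
    n = poisson_draw(lam, rng)
    return sum((rng.gammavariate(alpha, theta) for _ in range(n)), 0.0)


rng = random.Random(42)
print([round(tweedie_draw(2.0, 1.5, 1.6, rng), 3) for _ in range(8)])
```

The single distribution handles zeros and positive values jointly, with one mean model and no separate presence/absence component.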

Adding a small constant to the zeroes (or to all values) so that a Gamma glm can be fitted is just not right (see p. 324 of the article cited below for an authoritative statement on this matter):

Venables, W.N., Dichmont, C.M. 2004. GLMs, GAMs and GLMMs: an overview of theory for 
applications in fisheries research. Fisheries Research 70: 319–337.
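The small illustration below uses made-up biomass values and shows why the added constant matters. Each shifted zero contributes log(c) to mean(log y), which is a sufficient statistic of the Gamma. The fitted shape, and with it the variance function and the standard errors, therefore depends on an arbitrary choice of c (1e-4 is the value proposed in the original question). The shape is computed with a well-known closed-form approximation to the Gamma shape MLE rather than an exact fit, which is enough to show the effect.

```python
import math


def gamma_shape_approx(values):
    """Closed-form approximation to the Gamma shape MLE:
    k ~ (3 - s + sqrt((s - 3)**2 + 24 s)) / (12 s), s = log(mean) - mean(log)."""
    n = len(values)
    s = math.log(sum(values) / n) - sum(math.log(v) for v in values) / n
    return (3 - s + math.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)


biomass = [0.0, 0.0, 1.2, 0.8, 2.5]  # made-up plot masses (kg), two true zeros
shapes = {}
for c in (1e-2, 1e-4, 1e-8):
    shifted = [v + c for v in biomass]  # the "add a small constant" trick
    shapes[c] = gamma_shape_approx(shifted)
    print(f"c = {c:g}: approximate Gamma shape = {shapes[c]:.3f}")
```

The estimated shape keeps shrinking as c gets smaller, even though the data have not changed.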

HTH

Rubén

____________________________________________________________________________________ 

Dr. Rubén Roa-Ureta
AZTI - Tecnalia / Marine Research Unit
Txatxarramendi Ugartea z/g
48395 Sukarrieta (Bizkaia)
SPAIN

 

> -----Original message-----
> From: r-sig-ecology-bounces at r-project.org 
> [mailto:r-sig-ecology-bounces at r-project.org] On behalf of 
> Fred Takahashi
> Sent: Wednesday, 18 May 2011 17:47
> To: r-sig-ecology at r-project.org
> Subject: [R-sig-eco] Continuous distribution for zero or 
> positive values (in GLM or GLMM)
> 
> Hello, just a basic question: what distribution should I use 
> to analyze continuous data that can take zero or positive values (e.g.
> mass of grasses in plots, assuming that zero mass is 
> meaningful)? My expected analysis is in the framework of GLM 
> or GLMM (or alternatively, GAM / GAMM).
> 
> One idea I had is to add 0.0001 to all values and use family=Gamma.
> Is that a good approach?
> 
> If a better choice is to use a different distribution, 
> suggestions of packages to do that are welcome.
> 
> Thanks,
> Fred Takahashi
> Universidade de Brasília - Brasil
> 
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
> 

