[R-sig-eco] proportion data with many zeros

Liz Pryde elizabethpryde at gmail.com
Sat Feb 2 20:47:11 CET 2013


Have you plotted the raw data to have a look at the distribution?
You could try another exponential family distribution such as the Tweedie, which has a point mass at zero but otherwise behaves like a Poisson/gamma, so you're modelling the zeros directly. It won't work well if you have a lot of high values, though.
Proportions are tricky. Have a read of the Warton and Hui (2011) paper, "The arcsine is asinine".
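
A minimal sketch of the Tweedie suggestion, on simulated data rather than the real pollen counts. It assumes the mgcv package, whose tw() family estimates the Tweedie power parameter; the variable names (time, grains_x) and all simulation settings are made up for illustration. Note that a Tweedie model ignores the upper bound of 300 grains per sample, which is one reason it suits data with few high values.

```r
# Sketch only: simulated data, hypothetical variable names (time, grains_x).
library(mgcv)  # recommended package shipped with R; provides the tw() family

set.seed(1)
n_per_session <- 15
time <- factor(rep(1:4, each = n_per_session))

# Simulate zero-heavy counts of plant X pollen out of 300 grains per sample:
# the plant is absent from many samples and rare in sessions 1 and 4.
present  <- rbinom(length(time), 1, c(0.2, 0.7, 0.8, 0.3)[time])
grains_x <- present * rbinom(length(time), 300, c(0.05, 0.30, 0.40, 0.10)[time])

# Look at the raw distribution first.
hist(grains_x, breaks = 30, main = "Grains of plant X per sample")

# Tweedie model with log link; tw() estimates the power parameter p
# (1 < p < 2), giving a point mass at zero plus a continuous positive part.
fit <- gam(grains_x ~ time, family = tw(link = "log"), method = "REML")
summary(fit)
```

Plotting residuals(fit) against fitted(fit) afterwards is a quick check on whether the zeros are being handled any better than with quasipoisson.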

Liz



On 02/02/2013, at 6:34 PM, v_coudrain at voila.fr wrote:

> Thank you very much for this suggestion. In fact I reconsidered my question and I am not sure that a zero-inflated model is what I need. If I understood it properly,
> a zero-inflated model is best suited when we don't know whether zero values are true or false absences (right?). In my case all zero values are assumed to be real
> absences and are therefore informative. However, fitting quasipoisson on raw counts or quasibinomial on proportions gives me awful distributions of residuals and
> meaningless results.
> 
> Valérie
> 
> 
>> Message of 01/02/13 at 17:22
>> From: "Cade, Brian"
>> To: v_coudrain at voila.fr
>> Cc: r-sig-ecology at r-project.org
>> Subject: Re: [R-sig-eco] proportion data with many zeros
>> 
>> For a fully parametric approach, you might want to use a zero-inflated
>> beta distribution (e.g., as available in the gamlss package), which is
>> designed for zero-inflated proportions. Or, for a semi-parametric approach,
>> you could estimate a sequence of quantile regressions (e.g., in package
>> quantreg), where some interval (hopefully not too large) of the quantiles
>> will be uninformative because they are massed at the zero values.
>> 
>> Brian
>> 
>> Brian S. Cade, PhD
>> 
>> U. S. Geological Survey
>> Fort Collins Science Center
>> 2150 Centre Ave., Bldg. C
>> Fort Collins, CO 80526-8818
>> 
>> email: brian_cade at usgs.gov
>> tel: 970 226-9326
>> 
>> 
>> 
>> On Fri, Feb 1, 2013 at 1:30 AM,  wrote:
>> 
>>> Dear all, I am trying to test how the proportion of pollen of different
>>> plants found in the brood cells of a wild bee changes over time. I
>>> conducted 4 sampling sessions (thus time is a factor with 4 levels) and
>>> collected several pollen samples for each time point (300 pollen grains
>>> counted for each sample). I thought about applying a quasi-binomial glm:
>>> 
>>> # successes (grains of plant X) first, failures second
>>> y <- cbind(pollen_x, total_pollen - pollen_x)
>>> 
>>> glm(y ~ time, family = quasibinomial)
>>> 
>>> The problem is that I have a lot of zero values, because the pollen of
>>> some plants occurred only rarely or was very clumped in time. I thought
>>> about applying a zero-inflated model, but I have never used one and I am
>>> not sure if it is suitable for proportion data. Additionally, I wondered
>>> whether I have to consider the fact that I don't have the same number of
>>> pollen samples for each date, which makes my design unbalanced.
>>> Thank you in advance for advice.
>>> 
>>> Best wishes
>>> Valérie
>>> 
>>> _______________________________________________
>>> R-sig-ecology mailing list
>>> R-sig-ecology at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


