[R-sig-eco] proportion data with many zeros

Mon Feb 4 13:42:56 CET 2013

Thank you very much for clarifying this point. The fit is certainly poor because, as you say, I am basically looking at zeros. One point I don't really understand: for one pollen type I have a lot of pollen collected at time 1, some at time 2, little at time 3 and none at all at time 4. I get a significant difference between times 1 and 2, but no significant difference between 1 and 3 or between 1 and 4. That seems illogical. Maybe it is a problem with the residuals after all: they are fairly well balanced for time points with fitted values > 0, but for time points with no pollen collected there is no variance at all.

I think that if I had a very large data set, such that the non-zero part of my data looked nicely continuous, I could use zero-inflated models. But with only 4 time points and a positive part that does not fit a continuous distribution well, it is difficult. I would certainly do better to present the data descriptively for the sparse pollen types.
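
One possible reading of the non-significant time 1 vs time 4 contrast, sketched under stated assumptions: when a group contains only zeros, the log-scale estimate for that group runs off towards minus infinity and its Wald standard error grows even faster, so the Wald z statistic shrinks towards zero although the raw difference is large. The sketch below uses a plain Poisson log-link model instead of the Tweedie model from the thread, the counts are invented for illustration, and `floor` is a hypothetical stand-in for the tiny fitted mean at which an iterative fit would stop. It says nothing about the time 1 vs time 3 result.

```python
import math

def wald_vs_reference(ref, grp, floor=1e-10):
    """Wald test for log(mean(grp) / mean(ref)) in a Poisson log-link
    model with a single categorical predictor (treatment coding).

    An all-zero group has no finite maximum-likelihood estimate; `floor`
    is an assumed placeholder for the tiny fitted mean an iterative
    fitting routine would report when it gives up.
    """
    n_ref, n_grp = len(ref), len(grp)
    mu_ref = sum(ref) / n_ref
    mu_grp = max(sum(grp) / n_grp, floor)
    estimate = math.log(mu_grp / mu_ref)
    # Under Poisson, Var(log fitted mean) is about 1 / (n * mu) per group.
    se = math.sqrt(1.0 / (n_ref * mu_ref) + 1.0 / (n_grp * mu_grp))
    return estimate, se, estimate / se

# Hypothetical counts, not the data discussed in this thread.
time1 = [40, 55, 35, 50]   # abundant
time2 = [12, 20, 15, 9]    # fewer
time4 = [0, 0, 0, 0]       # none collected

est2, se2, z2 = wald_vs_reference(time1, time2)
est4, se4, z4 = wald_vs_reference(time1, time4)

print("time 2 vs 1: estimate=%.2f se=%.3f z=%.2f" % (est2, se2, z2))
print("time 4 vs 1: estimate=%.2f se=%.1f z=%.5f" % (est4, se4, z4))
```

In this toy case the time 2 contrast has a large |z|, while the all-zero group gets a huge standard error and a z close to zero. A likelihood-ratio comparison of the models with and without the time factor does not rely on that standard error and may be more informative in such a situation.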

Best wishes
Valérie

> Message of 04/02/13 at 13:15
> From: "Liz Pryde" 
> To: "v_coudrain at voila.fr" 
> Cc: 
> Subject: Re: [R-sig-eco] proportion data with many zeros
> 
> Hi,
> If you're using a categorical predictor, those QQ plots etc. are pretty useless. Just do a residuals vs fitted values plot and make sure the residuals look randomly scattered.
> 
> Is the problem with the less abundant pollen types just that they're very low at all time points? The algorithm won't converge because you're basically fitting zero data, i.e. a vector of zeroes. So you can assume that these are significantly different from the abundant types. This has to do with the way ML estimation works; it's a bit complicated.
> Some people suggest using Bayesian methods for this (and it works well), but that is far too complicated for the question you're trying to answer.
> 
> The mean-variance relationship is specified by the 'family' part of the GLM call. It is essentially the error structure of your data.
> Liz
> 
> 
> On 04/02/2013, at 7:55 PM, v_coudrain at voila.fr wrote:
> 
> > I tried to use tweedie and again it worked very well for the most abundant pollen types, but when trying to fit the less abundant ones I got the error: "glm.fit: algorithm did not converge".
> > I have the impression that it is hopeless to try fitting a model... But anyway, thank you very much for making me aware of tweedie. I still need to go a bit further into the theoretical background. I just wonder about the residuals. For the pollen types that can be modelled, the QQ-plots don't look very nice, but the residuals are fairly homogeneously distributed. It is difficult to judge how good the fit is, but the results make sense with regard to the raw data.
> > 
> > Valérie
> 
