[R-sig-eco] proportion data with many zeros
v_coudrain at voila.fr
Mon Feb 4 13:42:56 CET 2013
Thank you very much for clarifying this point. My algorithm certainly performs badly because, as you say, I am basically looking at zeros. One point I don't really
understand: for one pollen type, a lot of pollen was collected at time 1, some at time 2, little at time 3 and none at all at time 4. I get a significant difference
between time 1 and time 2, but no significant difference between 1 and 3 or between 1 and 4. That seems illogical. Maybe it is a problem with the residuals: they are
fairly well balanced for time points with fitted values > 0, but for time points with no pollen collected there is no variance at all. I think that if I had a very large
number of observations, so that the non-zero part of my data looked nicely continuous, I could use a zero-inflated model. But with only 4 time points and a
positive part that does not fit a continuous distribution well, it is difficult. I would probably do better to present the data for the sparse pollen types
descriptively.
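
The seemingly illogical pattern (significant for time 1 vs 2, not significant for 1 vs 3 or 1 vs 4) is what a Wald test on a log-link coefficient tends to produce when one group's fitted mean approaches zero: the standard error grows faster than the estimate. This is sometimes called the Hauck-Donner effect. Below is a minimal sketch, not the model actually fitted here. It assumes Poisson counts instead of tweedie, a log link, one categorical time factor and a hypothetical 10 observations per time point; `wald_z` and all numbers are made up for illustration.

```python
import math


def wald_z(mean_ref, mean_grp, n):
    """Wald z statistic for the contrast log(mean_grp) - log(mean_ref).

    Assumes a Poisson model with log link and n observations per group,
    where the standard error of log(mean) is 1 / sqrt(n * mean).
    """
    estimate = math.log(mean_grp) - math.log(mean_ref)
    se = math.sqrt(1.0 / (n * mean_ref) + 1.0 / (n * mean_grp))
    return estimate / se


n = 10             # hypothetical number of observations per time point
mean_time1 = 50.0  # hypothetical mean count at time 1

# Let the comparison group's fitted mean shrink towards zero.
for mean_grp in (20.0, 1.0, 0.01, 1e-6):
    z = wald_z(mean_time1, mean_grp, n)
    print(f"fitted mean {mean_grp:g}: Wald z = {z:.3f}")
```

Under these assumptions |z| is large for the moderately abundant group but shrinks towards zero as the fitted mean of the comparison group goes to zero, even though the raw difference from time 1 keeps growing. With a group of exact zeros the estimate itself diverges, which is consistent with the convergence warning.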
Best wishes
Valérie
> Message of 04/02/13, 13:15
> From: "Liz Pryde"
> To: "v_coudrain at voila.fr"
> Cc:
> Subject: Re: [R-sig-eco] proportion data with many zeros
>
> Hi,
> If you're using a categorical predictor, those QQ plots etc. are pretty useless. Just do a residuals vs. fitted values plot and make sure the residuals look
> randomly scattered.
>
> Is the problem with the smaller pollen types just that they're very low across all time points? The algorithm won't fit because you're basically looking at zero
> data, or a vector of zeroes. So you can assume that this is significantly different from the abundant types. This is to do with the way maximum likelihood (ML)
> estimation works; it's a bit complicated.
> Some people suggest using Bayesian methods for this (and it works well), but it's far too complicated for what you're trying to answer.
>
> The mean-variance relationship is specified by the 'family' argument of the GLM call. It is essentially the error structure of your data.
> Liz
>
>
> On 04/02/2013, at 7:55 PM, v_coudrain at voila.fr wrote:
>
> > I tried tweedie and again it worked very well for the most abundant pollen types, but when trying to fit the less abundant ones I got the error "glm.fit:
> > algorithm did not converge".
> > I have the impression that it is hopeless to try fitting a model... But anyway, thank you very much for making me aware of tweedie. I still need to go a bit
> > further into the theoretical background. I just wonder about the residuals. For the pollen types that can be modelled, the QQ-plots don't look very nice, but
> > the residuals are fairly homogeneously distributed. It is difficult to judge how good the fit is, but the results make sense with regard to the raw data.
> >
> > Valérie
>