[R-sig-eco] ZINB or density data models with lots of zeros

tavery trevor.avery at acadiau.ca
Thu Mar 11 22:54:00 CET 2010


Thanks for the replies (including the last one from Scott as I was 
writing this reply). I'll summarize better what I did and what happens 
when I try weights and offsets (both implemented in the zeroinfl() 
function of package pscl). [Scott, no, I do not have to use the 
densities, but I would like to account for the surface area in the 
model. As well, yes, I would like to compare among body parts so I will 
look into the random effects - later.]

I am using AIC for model selection (most of the NB models are quite 
similar, in which case I select the simplest one). I also test each 
model with lrtest() to simplify the models. This works well using the 
counts directly, and I am happy with the results.

The best model for this species is below and the estimates line up 
nicely with the data:
zinb3<-zeroinfl(caligus_elongatus~location_on_body | 1, dist="negbin", 
link="logit", data=sturg)

As suggested, weighting by surface area sounds like a great way to solve 
the issue. When I add weights = surface_area as:
zinb3<-zeroinfl(caligus_elongatus~location_on_body | 1, 
weights=surface_area, dist="negbin", link="logit", data=sturg)

I get the following errors:
Error in solve.default(as.matrix(fit$hessian)) :
  Lapack routine dgesv: system is exactly singular
In addition: Warning messages:
1: In eval(expr, envir, enclos) :
  non-integer #successes in a binomial glm!
2: In glm.fit(Z, as.integer(Y0), weights = weights, family = 
binomial(link = linkstr)) :
  fitted probabilities numerically 0 or 1 occurred

There are two factor levels of location_on_body that have no counts so I 
removed those, but the same error occurs (the errors above are actually 
without those two factor levels - and I made sure to refactor the factor 
to remove empty levels).
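
A sketch of what case weights usually do in a likelihood fit, assuming 
zeroinfl() follows the usual convention (an assumption, not something 
confirmed in this thread). Here w_i, f and theta are notation introduced 
for illustration: w_i is the weight on observation i, f is the ZINB 
density, and theta collects the model parameters.

```latex
\ell_w(\theta) = \sum_{i} w_i \, \log f(y_i \mid \theta)
```

Under that assumption, weights=surface_area makes observation i count 
w_i times in the log-likelihood; it does not rescale the expected count 
by area. The "non-integer #successes" warning is consistent with 
non-integer weights being passed to the glm.fit() call quoted in the 
warning text.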

When I add in an offset, as suggested by Scott, the model runs. It does 
not run if I log-transform the caligus_elongatus counts - possibly I am 
misinterpreting the log( E( count)) = linear predictor + log( size) 
model setup suggested by Scott?

zinb3<-zeroinfl(caligus_elongatus~location_on_body | 1, 
offset=log(surface_area), dist="negbin", link="logit", data=sturg)
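
The offset setup quoted above can be written out as follows. This is a 
sketch for the count part of the model only. The symbols are notation 
introduced here: mu_i is the expected count for observation i, A_i its 
surface area, and x_i^T beta the linear predictor.

```latex
\begin{align}
\log \mu_i &= x_i^\top \beta + \log A_i \\
\mu_i &= A_i \exp(x_i^\top \beta) \\
\frac{\mu_i}{A_i} &= \exp(x_i^\top \beta)
\end{align}
```

The log link acts on the expected count, so the response stays as raw 
counts and only the surface area is logged, inside the offset. Logged 
counts would be non-integer, and -Inf wherever the count is zero, which 
a count model cannot take. Under this setup exp(x_i^T beta) reads as an 
expected count per unit of surface area.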

Finally, I tried surface area as a covariate as it seems reasonable to 
assume that the effect of surface area is related to counts and this 
would be one way to mitigate the relationship. Please correct me if this 
seems out to lunch :-)

The model is below and runs fine, producing sensible coefficient output 
with a significant surface area term. The rest of the coefficients (and 
the multiple comparisons I can build from them by changing the 
baseline) are quite similar to those from the model without the 
covariate, i.e. the significant differences remain, but the p-values 
change.

zinb3<-zeroinfl(caligus_elongatus~surface_area+location_on_body | 1, 
dist="negbin", link="logit", data=sturg)
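
For comparison, the three ways of bringing surface area into the count 
part can be set side by side. This is a sketch, with notation introduced 
here: mu_i is the expected count, A_i the surface area, x_i^T beta the 
location_on_body terms, and gamma the coefficient on the area term.

```latex
\begin{align}
\text{offset:} \quad & \log \mu_i = x_i^\top \beta + 1 \cdot \log A_i \\
\text{covariate } \log A_i\text{:} \quad & \log \mu_i = x_i^\top \beta + \gamma \log A_i \\
\text{covariate } A_i\text{:} \quad & \log \mu_i = x_i^\top \beta + \gamma A_i
\end{align}
```

The offset model is the log(surface_area) covariate model with gamma 
fixed at 1, so those two are nested. An estimated gamma far from 1 would 
suggest that counts do not scale in proportion to area. The raw 
surface_area covariate instead implies that the expected count changes 
exponentially with area.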

All well and good, but the offset model produces no significant 
differences, whereas the covariate model (with either surface_area or 
log(surface_area)) produces significant differences similar to those 
from the model without an offset or covariate. I suspect this may be due 
to an incomplete model, as I have not accounted for the nesting of body 
parts within individuals, but in case it is not, does anyone have ideas 
why the outcomes are so different? Or suggestions on which model to give 
the thumbs up to (the AICs are too close to be of much use: 399-408, 
whereas other models were 500+)?

many thanks!
trevor


