[R] glm and percentage data with many zero values
Christian Kamenik
christian.kamenik at ips.unibe.ch
Thu Jan 20 17:02:35 CET 2005
Dear all,
I am interested in correctly testing effects of continuous environmental
variables and ordered factors on bacterial abundance. Bacterial
abundance is derived from counts and expressed as a percentage. My problem
is that the abundance data contain many zero values:
Bacteria <-
c(2.23,0,0.03,0.71,2.34,0,0.2,0.2,0.02,2.07,0.85,0.12,0,0.59,0.02,2.3,0.29,0.39,1.32,0.07,0.52,1.2,0,0.85,1.09,0,0.5,1.4,0.08,0.11,0.05,0.17,0.31,0,0.12,0,0.99,1.11,1.78,0,0,0,2.33,0.07,0.66,1.03,0.15,0.15,0.59,0,0.03,0.16,2.86,0.2,1.66,0.12,0.09,0.01,0,0.82,0.31,0.2,0.48,0.15)
First I tried transforming the data (e.g., logit), but I was not satisfied
because of the zeros (the logit of zero is undefined). Next I converted the
percentages into integer
values by round(Bacteria*10) or ceiling(Bacteria*10) and calculated a
glm with a Poisson error structure; however, I am not very happy with
this approach because it changes the original percentage data
substantially (e.g., 0.03 becomes either 0 or 1). The same is true for
converting the percentages into factors and calculating a multinomial or
proportional-odds model (anyway, I do not know if this would be a
meaningful approach).
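
To illustrate how much the integer conversion alters the data, here is a minimal sketch. The object names (n_zero, rounded, ceiled, lost_by_round, small) are introduced only for this illustration:

```r
Bacteria <- c(2.23, 0, 0.03, 0.71, 2.34, 0, 0.2, 0.2, 0.02, 2.07, 0.85, 0.12,
              0, 0.59, 0.02, 2.3, 0.29, 0.39, 1.32, 0.07, 0.52, 1.2, 0, 0.85,
              1.09, 0, 0.5, 1.4, 0.08, 0.11, 0.05, 0.17, 0.31, 0, 0.12, 0,
              0.99, 1.11, 1.78, 0, 0, 0, 2.33, 0.07, 0.66, 1.03, 0.15, 0.15,
              0.59, 0, 0.03, 0.16, 2.86, 0.2, 1.66, 0.12, 0.09, 0.01, 0, 0.82,
              0.31, 0.2, 0.48, 0.15)

# Number of exact zeros in the original percentages
n_zero <- sum(Bacteria == 0)

# The two integer conversions described above
rounded <- round(Bacteria * 10)
ceiled  <- ceiling(Bacteria * 10)

# Positive percentages that round() collapses to zero,
# i.e. extra zeros created by the conversion itself
lost_by_round <- Bacteria[Bacteria > 0 & rounded == 0]

# Side-by-side view of the small positive values, where the two
# conversions disagree most (e.g., 0.03 becomes 0 or 1)
small <- Bacteria > 0 & Bacteria < 0.1
print(data.frame(original = Bacteria, rounded, ceiled)[small, ])
```

round() adds further zeros to the data, whereas ceiling() keeps every positive value positive but maps, for example, 0.01 and 0.09 to the same count.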
I was searching the web and the best answer I could get was
http://www.biostat.wustl.edu/archives/html/s-news/1998-12/msg00010.html
in which several people suggested quasi-likelihood. Would it be
reasonable to use a glm with quasipoisson? If yes, how can I find the
appropriate variance function? Any other suggestions?
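
For concreteness, a minimal sketch of the two quasi-likelihood fits I have in mind, applied to the untransformed percentages. The covariate env is a simulated placeholder (my real environmental variables are not included in this message), so the coefficients are meaningless. The object names are likewise only for illustration, and I am not claiming that either variance function is the right one:

```r
Bacteria <- c(2.23, 0, 0.03, 0.71, 2.34, 0, 0.2, 0.2, 0.02, 2.07, 0.85, 0.12,
              0, 0.59, 0.02, 2.3, 0.29, 0.39, 1.32, 0.07, 0.52, 1.2, 0, 0.85,
              1.09, 0, 0.5, 1.4, 0.08, 0.11, 0.05, 0.17, 0.31, 0, 0.12, 0,
              0.99, 1.11, 1.78, 0, 0, 0, 2.33, 0.07, 0.66, 1.03, 0.15, 0.15,
              0.59, 0, 0.03, 0.16, 2.86, 0.2, 1.66, 0.12, 0.09, 0.01, 0, 0.82,
              0.31, 0.2, 0.48, 0.15)

# HYPOTHETICAL placeholder covariate, simulated only so the code runs;
# it stands in for a real continuous environmental variable.
set.seed(1)
env <- rnorm(length(Bacteria))

# Option 1: quasi-Poisson with log link, Var(y) = phi * mu.
# Accepts non-integer responses and exact zeros.
fit_qp <- glm(Bacteria ~ env, family = quasipoisson(link = "log"))

# Option 2: percentages rescaled to proportions in [0, 1],
# quasi-binomial with logit link, Var(y) = phi * mu * (1 - mu).
prop   <- Bacteria / 100
fit_qb <- glm(prop ~ env, family = quasibinomial(link = "logit"))

# Estimated dispersion parameters (phi) of the two fits
phi_qp <- summary(fit_qp)$dispersion
phi_qb <- summary(fit_qb)$dispersion

# Pearson residuals against fitted values: if the assumed variance
# function is adequate, their spread should not trend with the fit.
check_qp <- data.frame(fitted  = fitted(fit_qp),
                       pearson = residuals(fit_qp, type = "pearson"))
```

Plotting check_qp$pearson against check_qp$fitted would be one informal way to judge whether the assumed mean-variance relationship is adequate.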
Many thanks in advance, Christian
================================
Christian Kamenik
Institute of Plant Sciences
University of Bern
Altenbergrain 21
3013 Bern
Switzerland