[R-sig-eco] ZINB or density data models with lots of zeros
Scott Foster
scott.foster at csiro.au
Thu Mar 11 22:25:26 CET 2010
Hi,
I, and many others, would not use the density as the outcome. If you
are using the log-link for the NB part of the model then it is
appropriate to use log( size) as an offset. This is because
    log( E( count / size)) = log( E( count) / size)
                           = log( E( count)) - log( size)
                           = linear predictor,
where the first equality holds because the expectation is w.r.t. counts,
not size (size is treated as a known constant). So
    log( E( count)) = linear predictor + log( size).
This can be done in R by adding + offset( log( size)) to the count part
of your model formula for the NB model. This adds a term to your linear
predictor whose coefficient is fixed at 1. If you think that this may
not be appropriate then you can add log( size) as a covariate instead,
so that its coefficient is estimated (as Peter just suggested).
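A minimal sketch of the offset versus covariate choice, on simulated data rather than the sturgeon data. The names area, site and d are made up for illustration, and glm.nb() from MASS stands in for the count part of a ZINB model so that the example runs without extra packages:

```r
library(MASS)  # glm.nb(); MASS ships with standard R installations

set.seed(1)
n    <- 500
area <- runif(n, 5, 50)                                  # hypothetical surface area
site <- factor(sample(c("A", "B"), n, replace = TRUE))   # hypothetical factor

# Simulated truth: log density per unit area is -1 at site A, -1 + 0.7 at site B
mu <- area * exp(-1 + 0.7 * (site == "B"))
d  <- data.frame(count = rnbinom(n, mu = mu, size = 2), area = area, site = site)

# Offset: log(area) enters with its coefficient fixed at 1, so the
# remaining coefficients are on the log-density scale
fit_offset <- glm.nb(count ~ site + offset(log(area)), data = d)

# Covariate: the coefficient of log(area) is estimated instead;
# an estimate close to 1 supports using the offset
fit_cov <- glm.nb(count ~ site + log(area), data = d)

print(coef(fit_offset))
print(coef(fit_cov)["log(area)"])

# The same idea applied to the model in this thread would look like the
# call below (surface_area is an assumed column name; requires pscl):
# zeroinfl(caligus_elongatus ~ location_on_body + location +
#            offset(log(surface_area)) | 1,
#          dist = "negbin", link = "logit", data = sturg)
```

If the estimated coefficient of log(area) is far from 1, counts are not simply proportional to area and the covariate form is probably the safer choice.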
Just a cautionary note: if you start to look at counts within
particular areas of a fish (i.e. test whether more parasites are on fins
or on body or on ...) then you will need models for nested data, à la
mixed effects, because counts from the same fish are not independent.
After all, if a fish is infected then it is(?) more likely to have
parasites in all areas (and vice versa for a fish that isn't infected).
I've not used random effects in a zero-inflated model myself, but I have
seen a paper that does (Hall 2000). I'm not aware of any software that
will do this automatically for you. Someone else might know...
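As a hedged illustration of the nesting issue: the glmmTMB package, which postdates this thread, accepts a random intercept per fish alongside a zero-inflation formula. The sketch below uses simulated data, all column names are assumptions, and the fit only runs if glmmTMB is installed. It is one possible specification, not a recommendation for the sturgeon data:

```r
set.seed(2)
n_fish <- 80
areas  <- c("body", "fin", "gill")          # hypothetical body areas

# One row per fish x body area (nested / repeated measures layout)
d <- expand.grid(fish = factor(seq_len(n_fish)),
                 location_on_body = factor(areas))
d$surface_area <- runif(nrow(d), 5, 50)     # hypothetical surface areas

# A fish-level random effect makes counts on the same fish correlated
fish_effect <- rnorm(n_fish, sd = 0.7)
mu <- d$surface_area *
  exp(-1.5 + fish_effect[as.integer(d$fish)] +
        0.5 * (d$location_on_body == "fin"))

# Structural ("false") zeros with constant probability 0.3
structural_zero <- rbinom(nrow(d), 1, 0.3)
d$count <- ifelse(structural_zero == 1, 0L,
                  rnbinom(nrow(d), mu = mu, size = 1.5))

if (requireNamespace("glmmTMB", quietly = TRUE)) {
  fit <- glmmTMB::glmmTMB(
    count ~ location_on_body + offset(log(surface_area)) + (1 | fish),
    ziformula = ~ 1,                 # constant zero-inflation, like "| 1"
    family    = glmmTMB::nbinom2(),  # NB with quadratic mean-variance relation
    data      = d
  )
  print(summary(fit))
}
```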
If you really *must* use the ratio as an outcome and do not use the
offset approach then you are left with two options: delta approaches
(often called hurdle models), or Tweedie type models. I won't say any
more now as I suspect that this is probably not exactly what you want to
be doing (I do have particular opinions though).
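For readers unfamiliar with the delta idea, here is a rough two-part sketch in base R on simulated densities: a binomial GLM for presence/absence, and a Gamma GLM for the positive densities only. The Gamma choice and all names are assumptions made for illustration (a lognormal positive part is another common option), and this is not necessarily the formulation I would advocate:

```r
set.seed(42)
n    <- 400
site <- factor(sample(c("A", "B"), n, replace = TRUE))   # hypothetical factor

# Part 1 truth: probability that any parasites are present
present <- rbinom(n, 1, ifelse(site == "A", 0.4, 0.6))
# Part 2 truth: density given presence (Gamma, mean 0.5 at A and 1.0 at B)
pos_density <- rgamma(n, shape = 2, rate = 2 / ifelse(site == "A", 0.5, 1.0))

d <- data.frame(site = site, density = present * pos_density)
d$present <- as.integer(d$density > 0)

# Part 1: presence/absence
occ_fit <- glm(present ~ site, family = binomial, data = d)
# Part 2: positive densities only
pos_fit <- glm(density ~ site, family = Gamma(link = "log"),
               data = subset(d, density > 0))

# Unconditional mean density = P(present) * E(density | present)
newd  <- data.frame(site = factor(c("A", "B"), levels = levels(d$site)))
p_hat  <- predict(occ_fit, newd, type = "response")
mu_hat <- predict(pos_fit, newd, type = "response")
expected_density <- p_hat * mu_hat
print(expected_density)
```

With a single factor in both parts the product simply reproduces the raw group means; the two-part structure earns its keep when the presence and positive parts respond to different covariates.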
HTH,
Scott
Hall, D.B. (2000). Zero-Inflated Poisson and Binomial Regression with
Random Effects: A Case Study. Biometrics 56, 1030-1039.
Peter Solymos wrote:
> Trevor,
>
> You can use weights in the model to provide the surface area (or
> sqrt(surface area) to enhance linearity) and leave the counts as they
> are in the ZINB model. (In the zeroinfl function weights are used to
> weight the log-likelihood and to scale the residuals.)
>
> Cheers,
>
> Peter
>
> Péter Sólymos
> Alberta Biodiversity Monitoring Institute
> and Boreal Avian Modelling project
> Department of Biological Sciences
> CW 405, Biological Sciences Bldg
> University of Alberta
> Edmonton, Alberta, T6G 2E9, Canada
> Phone: 780.492.8534
> Fax: 780.492.7635
>
>
> On Thu, Mar 11, 2010 at 9:03 AM, tavery <trevor.avery at acadiau.ca> wrote:
>
>> I have completed zero-inflated negative binomial (ZINB) models on count
>> data for the absolute counts of ectoparasites on fish where there are lots
>> of zeros (everything worked well using Zuur et al. and a host of other
>> sources). The fish are of different sizes with corresponding differences in
>> surface areas of fins etc. and I would now like to compare density of
>> parasites among each area. Densities were calculated by dividing the counts
>> for each parasite by the surface area of the fin (etc.) and surface areas
>> were different for each individual i.e. scaled for size of fish.
>>
>> The comparisons are then of non-integer values that do not play nice with
>> Poisson or Negative Binomial models. However, the issue of having lots of
>> zeros remains and will affect mean values if I were to use some sort of
>> ANOVA based analysis.
>>
>> Does anyone have any suggestions on how to deal with the many zeros for the
>> density data (assuming I was to use an ANOVA type analysis)? I have also
>> thought to just include fish size as a covariate in the ZINB models, but a)
>> have not seen an example of such, b) do not want to over complicate the
>> analysis, and/or c) this will only scale the counts to overall fish size,
>> not to fin etc. surface areas. Of course, fin surface areas probably scale
>> linearly and, if so, body size might be an appropriate covariate to remove
>> the effect from the ZINB comparisons of counts on fins i.e. essentially the
>> same comparison (density = ZINB with fish size covariate). Does that make
>> sense?
>>
>> This ZINB model works well (yes, the 'false' zeros are independent of
>> factors hence the "| 1"). Where would I insert the fish size covariate?
>> location_on_body = where parasites were found (7 body areas)
>> location = site of fish capture (2 sites)
>>
>> zinb2<-zeroinfl(caligus_elongatus~location_on_body+location | 1,
>> dist="negbin", link="logit", data=sturg)
>>
>> I would assume this:
>>
>> zinb2<-zeroinfl(caligus_elongatus~fish_size+location_on_body+location | 1,
>> dist="negbin", link="logit", data=sturg)
>>
>> thanks in advance,
>> trevor
>>
>> _______________________________________________
>> R-sig-ecology mailing list
>> R-sig-ecology at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
--
Scott Foster
CSIRO Mathematics, Informatics and Statistics
GPO Box 1538
Castray Esplanade
Hobart 7001
Tasmania
Australia
Phone: (03) 6232 5178
Fax: (03) 6232 5000
Email: scott.foster at csiro.au