[R-sig-eco] ZINB or density data models with lots of zeros
Peter Solymos
solymos at ualberta.ca
Fri Mar 12 02:10:28 CET 2010
Scott,
Thanks for pointing out the Hall paper. BTW the Negative Binomial
model itself is a kind of random intercept model, where the random
effect is not Normal on the linear predictor scale, but Gamma on the
response scale. But you are right, it can be tricky to define more
complicated random effects in R for zero-inflated data, or at least I
am not aware of any ready-made solutions.
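
The Gamma random-effect view of the Negative Binomial can be checked with
a small simulation. This is only a sketch on made-up parameter values (mu
and theta below are assumptions chosen for illustration): a Poisson count
whose mean is multiplied by a Gamma random effect with mean 1 should have
the same mean and variance as a direct Negative Binomial draw.

```r
set.seed(1)
n     <- 1e5
mu    <- 3   # assumed mean count (illustrative value)
theta <- 2   # assumed NB dispersion parameter (illustrative value)

# Gamma random effect on the response scale: mean 1, variance 1/theta
u <- rgamma(n, shape = theta, rate = theta)

# Poisson counts whose mean is scaled by the Gamma random effect
y_mix <- rpois(n, lambda = mu * u)

# Direct Negative Binomial draws with the same mean and dispersion
y_nb <- rnbinom(n, size = theta, mu = mu)

# Both should have mean near mu and variance near mu + mu^2 / theta
print(c(mean_mix = mean(y_mix), mean_nb = mean(y_nb), expected = mu))
print(c(var_mix = var(y_mix), var_nb = var(y_nb),
        expected = mu + mu^2 / theta))
```

The two sets of draws agree up to simulation noise, which is the sense in
which the NB model already contains a (Gamma) random intercept.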
Trevor: you can try transforming the surface area for the weights, but
if you are interested in prediction, the simplest route may be to
include surface area as a covariate in the count model, which amounts
to fitting something like an abundance-area relationship.
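
To make the covariate idea concrete, here is a minimal sketch that
compares it with the offset approach Scott describes in the quoted
message below. Everything in it is an assumption for illustration: the
data are simulated, the names dat, area and site are hypothetical, and a
plain NB model (MASS::glm.nb) stands in for the ZINB model to keep the
example short.

```r
library(MASS)  # glm.nb(); MASS is a recommended package shipped with R

set.seed(42)
n    <- 2000
area <- runif(n, 1, 20)                                 # hypothetical surface areas
site <- factor(sample(c("A", "B"), n, replace = TRUE))  # hypothetical capture sites

# Simulated truth: parasite density (count per unit area) depends on site only,
# so the expected count is density * area
dens   <- exp(-1 + 0.5 * (site == "B"))
counts <- rnbinom(n, size = 1.5, mu = dens * area)
dat    <- data.frame(counts, area, site)

# Offset: coefficient of log(area) is fixed at 1 (models density directly)
fit_offset <- glm.nb(counts ~ site + offset(log(area)), data = dat)

# Covariate: coefficient of log(area) is estimated (abundance-area relationship)
fit_covar <- glm.nb(counts ~ site + log(area), data = dat)

print(coef(fit_offset))
print(coef(fit_covar))
```

In this simulation counts are proportional to area, so the estimated
log(area) coefficient in fit_covar should come out close to 1 and the two
fits should nearly agree. An estimate far from 1 on real data would
suggest that counts do not scale proportionally with area, in which case
the covariate form is the safer choice.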
Cheers,
Peter
Péter Sólymos
Alberta Biodiversity Monitoring Institute
and Boreal Avian Modelling project
Department of Biological Sciences
CW 405, Biological Sciences Bldg
University of Alberta
Edmonton, Alberta, T6G 2E9, Canada
Phone: 780.492.8534
Fax: 780.492.7635
On Thu, Mar 11, 2010 at 2:25 PM, Scott Foster <scott.foster at csiro.au> wrote:
> Hi,
>
> I, and many others, would not use the density as the outcome. If you are
> using the log-link for the NB part of the model then it is appropriate to
> use log( size) as an offset. This is because
>
> log( E( count / size)) = log( E( count) / size)
>                        = log( E( count)) - log( size)
>                        = linear predictor
>
> (the first equality holds because the expectation is w.r.t. counts, not size)
>
> So
>
> log( E( count)) = linear predictor + log( size).
>
> This can be done in R by simply appending +offset( log( size)) in your model
> formula for the NB model. This essentially adds a term to your linear
> predictor with a set coefficient of 1. If you think that this may not be
> appropriate then you can add it as a covariate (as Peter just suggested).
>
> Just a cautionary note: If you start to look at counts within particular
> areas of a fish (i.e. test if more parasites are on fins or on body or on
> ...) then you will start to require nested models, à la mixed effects. After
> all, if a fish is infected then it is(?) more likely to have parasites in
> all areas (and vice versa for a fish that isn't infected). I've not used
> random effects in a zero-inflated model but I have seen a paper that
> includes random effects in zero-inflated models (Hall 2000). I'm not aware
> of any software that will do this automatically for you. Someone else might
> know...
>
> If you really *must* use the ratio as an outcome and do not use the offset
> approach then you are left with two options: delta approaches (often called
> hurdle models), or Tweedie type models. I won't say any more now as I
> suspect that this is probably not exactly what you want to be doing (I do
> have particular opinions though).
>
> HTH,
>
> Scott
>
> Hall, D.B. (2000). Zero-Inflated Poisson and Binomial Regression with
> Random Effects: A Case Study. Biometrics 56, 1030-1039.
>
>
> Peter Solymos wrote:
>>
>> Trevor,
>>
>> You can use weights in the model to provide the surface area (or
>> sqrt(surface area) to enhance linearity) and leave the counts as they
>> are in the ZINB model. (In the zeroinfl function weights are used to
>> weight the log-likelihood and to scale the residuals.)
>>
>> Cheers,
>>
>> Peter
>>
>>
>> On Thu, Mar 11, 2010 at 9:03 AM, tavery <trevor.avery at acadiau.ca> wrote:
>>
>>>
>>> I have completed zero-inflated negative binomial (ZINB) models on count
>>> data for the absolute counts of ectoparasites on fish where there are lots
>>> of zeros (everything worked well using Zuur et al. and a host of other
>>> sources). The fish are of different sizes with corresponding differences in
>>> surface areas of fins etc. and I would now like to compare density of
>>> parasites among each area. Densities were calculated by dividing the counts
>>> for each parasite by the surface area of the fin (etc.) and surface areas
>>> were different for each individual i.e. scaled for size of fish.
>>>
>>> The comparisons are then of non-integer values that do not play nice with
>>> Poisson or Negative Binomial models. However, the issue of having lots of
>>> zeros remains and will affect mean values if I were to use some sort of
>>> ANOVA based analysis.
>>>
>>> Does anyone have any suggestions on how to deal with the many zeros for the
>>> density data (assuming I was to use an ANOVA type analysis)? I have also
>>> thought to just include fish size as a covariate in the ZINB models, but a)
>>> have not seen an example of such, b) do not want to over complicate the
>>> analysis, and/or c) this will only scale the counts to overall fish size,
>>> not to fin etc. surface areas. Of course, fin surface areas probably scale
>>> linearly and, if so, body size might be an appropriate covariate to remove
>>> the effect from the ZINB comparisons of counts on fins i.e. essentially the
>>> same comparison (density = ZINB with fish size covariate). Does that make
>>> sense?
>>>
>>> This ZINB model works well (yes, the 'false' zeros are independent of
>>> factors hence the "| 1"). Where would I insert the fish size covariate?
>>> location_on_body = where parasites were found (7 body areas)
>>> location = site of fish capture (2 sites)
>>>
>>> library(pscl)  # provides zeroinfl()
>>> zinb2 <- zeroinfl(caligus_elongatus ~ location_on_body + location | 1,
>>>                   dist = "negbin", link = "logit", data = sturg)
>>>
>>> I would assume this:
>>>
>>> zinb2 <- zeroinfl(caligus_elongatus ~ fish_size + location_on_body + location | 1,
>>>                   dist = "negbin", link = "logit", data = sturg)
>>>
>>> thanks in advance,
>>> trevor
>>>
>>> _______________________________________________
>>> R-sig-ecology mailing list
>>> R-sig-ecology at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>>>
>>>
>>>
>>
>
> --
> Scott Foster
> CSIRO Mathematics, Informatics and Statistics
> GPO Box 1538
> Castray Esplanade
> Hobart 7001
> Tasmania Australia
>
> Phone: (03) 6232 5178
> Fax: (03) 6232 5000
> Email: scott.foster at csiro.au
>