[R-sig-eco] ZINB or density data models with lots of zeros

Scott Foster scott.foster at csiro.au
Thu Mar 11 22:25:26 CET 2010


Hi,

I, and many others, would not use the density as the outcome.  If you 
are using the log-link for the NB part of the model then it is 
appropriate to use log( size) as an offset.  This is because

log( E( count / size)) = log( E( count) / size)
                       = log( E( count)) - log( size)
                       = linear predictor,

where the expectation is taken w.r.t. the counts, not the size.

So

log( E( count)) = linear predictor + log( size).

This can be done in R by simply appending + offset( log( size)) to your 
model formula for the NB model.  This adds log( size) to your linear 
predictor with its coefficient fixed at 1.  If you think that this 
may not be appropriate then you can add log( size) as a covariate instead 
(as Peter just suggested), so that its coefficient is estimated.
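
As a rough sketch of the offset idea, the following uses simulated data rather than Trevor's sturg data.  The names area, site and count, and all of the simulation settings, are made up for illustration.  It uses glm.nb from MASS, which ignores the zero inflation, only to show how the offset and the covariate versions differ.  The same offset term is put before the "|" in a zeroinfl formula, and that fit is run only if pscl is installed.

```r
library(MASS)  # glm.nb(); MASS is a recommended package shipped with R

set.seed(1)
n     <- 1000
area  <- runif(n, 5, 50)                                 # hypothetical surface areas
site  <- factor(sample(c("A", "B"), n, replace = TRUE))  # hypothetical capture sites
dens  <- ifelse(site == "A", 0.10, 0.25)                 # simulated parasites per unit area
extra <- rbinom(n, 1, 0.3)                               # 30% structural ("false") zeros
count <- ifelse(extra == 1, 0, rnbinom(n, mu = dens * area, size = 2))
d <- data.frame(count, area, site)

# Offset: the coefficient of log(area) is fixed at 1, so the remaining
# coefficients describe differences in log density
fit_off <- glm.nb(count ~ site + offset(log(area)), data = d)

# Covariate: the coefficient of log(area) is estimated instead
fit_cov <- glm.nb(count ~ site + log(area), data = d)

exp(coef(fit_off)["siteB"])   # estimated density ratio B/A (simulated ratio is 2.5)
coef(fit_cov)["log(area)"]    # near 1 if counts scale in proportion to area

# The same offset in a zero-inflated NB model, if pscl is available
if (requireNamespace("pscl", quietly = TRUE)) {
  fit_zi <- pscl::zeroinfl(count ~ site + offset(log(area)) | 1,
                           dist = "negbin", data = d)
  print(coef(fit_zi))
}
```

If the estimated coefficient of log( area) in the covariate version is far from 1, that suggests counts are not simply proportional to area, and the covariate form may be the safer choice.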

Just a cautionary note:  If you start to look at counts within 
particular areas of a fish (i.e. test if more parasites are on fins or 
on body or on ...) then you will start to require nested models, à la 
mixed effects.  After all, if a fish is infected then it is(?) more 
likely to have parasites in all areas (and vice versa for a fish that 
isn't infected).  I've not used random effects in a zero-inflated model 
but I have seen a paper that includes random effects in zero-inflated 
models (Hall 2000).  I'm not aware of any software that will do this 
automatically for you.  Someone else might know...

If you really *must* use the ratio as an outcome and do not use the 
offset approach then you are left with two options: delta approaches 
(often called hurdle models), or Tweedie type models.  I won't say any 
more now as I suspect that this is probably not exactly what you want to 
be doing (I do have particular opinions though).
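
For what it is worth, a delta (two-part) approach for a density outcome could be sketched as below.  This is only one possible version: the Gamma distribution for the positive densities, the simulated data and the names part, nonzero and density are all assumptions made for illustration, not something from this thread.

```r
set.seed(2)
n    <- 300
part <- factor(sample(c("fin", "body"), n, replace = TRUE))  # hypothetical body areas
present <- rbinom(n, 1, ifelse(part == "fin", 0.4, 0.6))
# positive densities where parasites are present, exact zeros elsewhere
density <- present * rgamma(n, shape = 2, rate = ifelse(part == "fin", 4, 8))
d <- data.frame(density, part, nonzero = as.integer(density > 0))

# Part 1: probability that the density is non-zero
fit_pres <- glm(nonzero ~ part, family = binomial, data = d)

# Part 2: size of the density, given that it is non-zero
# (Gamma with log link is an assumption; other positive distributions are possible)
fit_pos <- glm(density ~ part, family = Gamma(link = "log"),
               data = subset(d, density > 0))

# Overall expected density = P(non-zero) * E(density | non-zero)
newd     <- data.frame(part = factor(levels(d$part), levels = levels(d$part)))
p_hat    <- predict(fit_pres, newd, type = "response")
mu_hat   <- predict(fit_pos,  newd, type = "response")
expected <- setNames(p_hat * mu_hat, as.character(newd$part))
expected
```

The two parts are fitted separately, so each can have its own covariates; the product in the last step recombines them into an expected density per body area.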

HTH,

Scott

Hall, D.B. (2000). Zero-Inflated Poisson and Binomial Regression with Random 
Effects: A Case Study.  Biometrics 56, 1030-1039.


Peter Solymos wrote:
> Trevor,
>
> You can use weights in the model to provide the surface area (or
> sqrt(surface area) to enhance linearity) and leave the counts as they
> are in the ZINB model. (In the zeroinfl function weights are used to
> weight the log-likelihood and to scale the residuals.)
>
> Cheers,
>
> Peter
>
> Péter Sólymos
> Alberta Biodiversity Monitoring Institute
> and Boreal Avian Modelling project
> Department of Biological Sciences
> CW 405, Biological Sciences Bldg
> University of Alberta
> Edmonton, Alberta, T6G 2E9, Canada
> Phone: 780.492.8534
> Fax: 780.492.7635
>
>
> On Thu, Mar 11, 2010 at 9:03 AM, tavery <trevor.avery at acadiau.ca> wrote:
>   
>> I have completed zero-inflated negative binomial (ZINB) models on count
>> data for the absolute counts of ectoparasites on fish where there are lots
>> of zeros (everything worked well using Zuur et al. and a host of other
>> sources). The fish are of different sizes with corresponding differences in
>> surface areas of fins etc. and I would now like to compare density of
>> parasites among each area. Densities were calculated by dividing the counts
>> for each parasite by the surface area of the fin (etc.) and surface areas
>> were different for each individual i.e. scaled for size of fish.
>>
>> The comparisons are then of non-integer values that do not play nice with
>> Poisson or Negative Binomial models. However, the issue of having lots of
>> zeros remains and will affect mean values if I were to use some sort of
>> ANOVA based analysis.
>>
>> Does anyone have any suggestions on how to deal with the many zeros for the
>> density data (assuming I was to use an ANOVA type analysis)? I have also
>> thought to just include fish size as a covariate in the ZINB models, but a)
>> have not seen an example of such, b) do not want to over complicate the
>> analysis, and/or c) this will only scale the counts to overall fish size,
>> not to fin etc. surface areas. Of course, fin surface areas probably scale
>> linearly and, if so, body size might be an appropriate covariate to remove
>> the effect from the ZINB comparisons of counts on fins i.e. essentially the
>> same comparison (density = ZINB with fish size covariate). Does that make
>> sense?
>>
>> This ZINB model works well (yes, the 'false' zeros are independent of
>> factors hence the "| 1"). Where would I insert the fish size covariate?
>> location_on_body = where parasites were found (7 body areas)
>> location = site of fish capture (2 sites)
>>
>> zinb2<-zeroinfl(caligus_elongatus~location_on_body+location | 1,
>> dist="negbin", link="logit", data=sturg)
>>
>> I would assume this:
>>
>> zinb2<-zeroinfl(caligus_elongatus~fish_size+location_on_body+location | 1,
>> dist="negbin", link="logit", data=sturg)
>>
>> thanks in advance,
>> trevor
>>
>> _______________________________________________
>> R-sig-ecology mailing list
>> R-sig-ecology at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>>
>>
>>     

-- 
Scott Foster
CSIRO Mathematics, Informatics and Statistics
GPO Box 1538
Castray Esplanade
Hobart 7001
Tasmania 
Australia

Phone:     (03) 6232 5178
Fax:       (03) 6232 5000
Email:     scott.foster at csiro.au


