[R-sig-ME] zero inflated and spatial autocorrelation

Highland Statistics Ltd. highstat at highstat.com
Wed Sep 23 19:08:19 CEST 2009

> (sorry if the message was double posted...)
> Good afternoon,
> I am currently trying to estimate the relative contribution of habitat quality 
> and a proxy of conspecific attraction for a bird species (3 years of 
> data: 2000, 2004 and 2008) with a model of the form
> Abundance (year t) ~ Abundance (year t-1) + Habitat_quality + Epsi
> I would like to be able to compare what happened between 2000-2004 with 
> 2004-2008
> and would need some advice.
> Problems are
> 1/ that the distribution of abundance (integer values) is zero inflated, 
> with 80% of my grid cells "tagged" with 0

The negative binomial can cope with some degree of zeros....but 80% is a 
lot. I guess a zero inflated distribution would be better. Besides...I 
can't remember whether gamm in mgcv is estimating the theta, or whether 
it uses a fixed value.
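
A rough way to judge whether a negative binomial copes with the zeros is to compare the observed proportion of zeros with the proportion the fitted model expects. The sketch below is only an illustration: the data are simulated stand-ins (the names abundance and is_structural_zero are hypothetical), and it uses glm.nb from MASS, which estimates theta, rather than gamm.

```r
# Simulated stand-in data (assumption): about 80% zeros from a
# zero-inflated Poisson process, 577 grid cells as in the question.
set.seed(1)
n <- 577
is_structural_zero <- rbinom(n, 1, 0.75)   # hypothetical excess-zero process
abundance <- ifelse(is_structural_zero == 1, 0, rpois(n, lambda = 2))

library(MASS)                              # glm.nb() estimates theta

# Intercept-only negative binomial fit
fit_nb <- glm.nb(abundance ~ 1)

# Observed share of zeros versus the share expected under the fitted NB
observed_zero <- mean(abundance == 0)
expected_zero <- mean(dnbinom(0, mu = fitted(fit_nb), size = fit_nb$theta))

cat("estimated theta:        ", fit_nb$theta, "\n")
cat("observed zero fraction: ", observed_zero, "\n")
cat("expected zero fraction: ", expected_zero, "\n")
```

If the expected fraction falls clearly short of the observed one once the real covariates are in the model, that points towards a zero inflated distribution.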

Have a look how these guys fitted their models....it is similar:


The first author published a couple of similar papers. But I don't think 
that this is going to be a simple call to a gamm function.

> 2/ I need to account for spatial autocorrelation (which is detectable 
> below 10 km, my grid being 3 x 3 km in order to account for between 
> years short dispersal).
> I lack some skills in statistics to fully handle this on my own 
This is not easy..:-)
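
To see over which distances residuals are correlated (the "below 10 km" in the question), one option is an empirical semivariogram of the model residuals. This is a minimal base R sketch under assumptions: the coordinates X and Y (in km) and the residual vector res are simulated placeholders, and the 3 km bins simply mirror the grid size.

```r
# Placeholder data (assumption): random cell centres in a 60 x 60 km window
# and independent stand-in residuals; replace with real residuals.
set.seed(2)
n   <- 200
X   <- runif(n, 0, 60)
Y   <- runif(n, 0, 60)
res <- rnorm(n)

# Pairwise distances and pairwise semivariances, in the same pair order
pair_dist    <- as.vector(dist(cbind(X, Y)))
pair_semivar <- as.vector(dist(res))^2 / 2

# Average the semivariance within 3 km distance classes up to 30 km;
# pairs further apart than 30 km fall outside the breaks and are dropped
bins    <- cut(pair_dist, breaks = seq(0, 30, by = 3))
semivar <- tapply(pair_semivar, bins, mean)

print(round(semivar, 2))
```

With spatially correlated residuals the first few distance classes would show lower semivariance than the later ones; with independent residuals, as simulated here, the values should be roughly flat.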

> Shall I run straight ahead with a gamm model which is the only one, to 
> my knowledge, that can account for both the spatial autocorrelation 
> structure (with for example correlation=corSpher(form=~(X+Y))) and the 
> zero inflated distribution (with family= negbin) ?
> Or is there a way to relax some constraints on the distribution in order 
> to handle overdispersion (induced by the ZI distribution) and thus trust 
> the run of a glmmPQL with quasipoisson ?

Have a look at the VGAM package. Perhaps it can do this type of stuff by 
now. If I remember well it can do smoothing with zero inflation. Not 
sure if it can do correlation. As a quick-and-dirty approach you could 
include s(X,Y) and capture the spatial pattern with such a 2-d smoother. 
But that may be more for larger scale patterns? However....it may cause 
trouble if your habitat stuff is collinear with spatial positions.
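
The quick-and-dirty s(X,Y) idea could look something like the sketch below. It is only an illustration under assumptions: the data frame dat and its columns are simulated, and it uses gam from mgcv with the nb() family, so it handles overdispersion but not zero inflation as such. The concurvity() call is one way to check whether the spatial smoother and the habitat covariate overlap.

```r
library(mgcv)

# Simulated stand-in data (assumption): coordinates in km, one habitat
# covariate, and counts with a smooth large-scale spatial trend.
set.seed(3)
n   <- 400
dat <- data.frame(X = runif(n, 0, 60),
                  Y = runif(n, 0, 60),
                  habitat = rnorm(n))
eta <- -1 + 0.5 * dat$habitat + sin(dat$X / 10)
dat$abundance <- rnbinom(n, mu = exp(eta), size = 1)

# Negative binomial GAM with a 2-d smoother over the spatial coordinates
fit <- gam(abundance ~ habitat + s(X, Y),
           family = nb(), data = dat, method = "REML")

summary(fit)

# How much can the parametric part and the spatial smoother explain each other?
concurvity(fit, full = TRUE)
```

Concurvity values close to 1 would suggest that the habitat effect and the spatial pattern are hard to separate.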

Have fun....this is not easy...but shit happens.


> From what I read, glmm and gamm are at the edge of statistics and need 
> to be examined cautiously... so if I can avoid inserting irrelevant 
> information in it...
> Thanks for any help or link
> Best regards
> Alex
> Tiebreaker: my observation window has an irregular shape, meaning that 
> cells at the edge are truncated. If it is the correct solution to manage 
> this, how do I properly specify weights=...  with the area of cells as 
> argument.
> Do I need to transform the value of areas ?
> Or would it also be correct to exclude those pieces of cells (which are 
> mainly zeros --> 0 would then represent "only" 68% of observations, 
> 392/577 )?


Dr. Alain F. Zuur
