[R-sig-eco] logistic regression and spatial autocorrelation

Pedro Lima Pequeno pacolipe at gmail.com
Thu Aug 25 19:36:19 CEST 2011


Hi Tim,

there are several ways of dealing with spatial autocorrelation in
ecological models (see e.g. Dormann 2007: Methods to account for
spatial autocorrelation in the analysis of species distributional
data: a review; and Beale et al. 2010: Regression analysis of spatial
data). As always, this is an area of active research, so the right or
wrong thing to do is not as clear as it may seem. Some have even
concluded  that "changes in coefficients between spatial and
non-spatial methods depend on the method used and are largely
idiosyncratic", so that "researchers may have little choice but to be
more explicit about the uncertainty of models and more cautious in
their interpretation" (Bini et al. 2009: Coefficient shifts in
geographical ecology: an empirical evaluation of spatial and
non-spatial regression). Thus, new methods are emerging at a faster
rate than people studying and comparing their properties. Nonetheless,
I think some observations are useful:
1) there are methods explicitly designed to detect spatial
autocorrelation such as Moran's autocorrelograms or variograms
(available in several R packages). As already pointed out, the
autocorrelation function is "well behaved" with linear, equally spaced
series in time or space;
2) minimal adequate model selection with the AIC is sensitive to
residual autocorrelation; it tends to generate unstable and overfitted
models. Thus, when applying any model selection procedure, you should
account for the uncertainty in the process by averaging the model set
(or model predictions) with respect to the relative support of each
model (e.g. Akaike weights). Since you have a large sample, you could
account for residual spatial autocorrelation using eigenvector
filtering, which produces synthetic variables that capture spatial
patterns and can be included in linear models as explanatory variables
- a more intuitive approach if you don't want to mess with random
factors (see e.g. Diniz-Filho et al. 2008: Model selection and
information theory in geographical ecology). This can be implemented
with packages spacemakeR and vegan;
3) autocorrelation in model residuals is not the only - nor most
important - problem in biological modeling; model misspecification is
the major issue. Residual autocorrelation often arises due to not
including relevant explanatory variables, interaction terms, assuming
an inappropriate response shape and/or an inadequate variance
structure, or any combination of these. All these things need to be
checked for proper model validation, for instance by partial
regression plots (or added-varialbe plots), which help you see the
shape of the response to each explanatory variable after account for
the variation in the remaining explanatory set.
At the same time, you could plot the residuals againts both model
predictions and explanatory variables. Since residuals are the
stochastic component of the model ("noise"), its relation with the
systematic components should be random; clear patterns in these plots
are indications of misspecification.
Finally, all this "model tinkering" is based on two fundamental
premises: you want to model a mean tendecy of response and the pattern
of variation around it. These are strictly statistical properties of
data - they have nothing to do with biology. If you don't really
believe the biological process you are studying implies a mean
response, but rather e.g. a maximum one (such as in population or
abundance limitation), than all these methods will actually induce you
to misspecify the model, but there are alternatives (see e.g. Cade et
al. 2005 - Quantile regression reveals hidden bias and uncertainty in
habitat models).

2011/8/25, Tim Seipel <t.seipel at env.ethz.ch>:
>
> Dear List,
> I am trying to determine the best environmental predictors of the
> presence of a species along an elevational gradient.
> Elevation ranges from 400 to 2050 m a.s.l. and the ratio of presences to
> absences is low (132 presences out 2800 samples)
>
> So to start I fit the full model of with the variable of interest.
>
> sc.m<-glm(PA~sp.max+su.mmin+su.max+fa.mmin+fa.max+Slope+Haupt4+Pop_density+Dist_G+Growi_sea+,data=sc.pa,'binomial')
>
> First, I performed univariate and backward selection using Akaike
> Information Criteria, and the fit was good and realistic given my
> knowledge of the environment though the D^2 was low 0.08. My final model
> was:
> ---------------------------------
> glm(formula = PA ~ Slope + sp.mmin + su.max + fa.mmin + Haupt4,
>      family = "binomial", data = sc.pa)
>
> Deviance Residuals:
>      Min       1Q   Median       3Q      Max
> -0.5415  -0.3506  -0.2608  -0.1762   3.0768
>
> Coefficients:
>               Estimate Std. Error z value Pr(>|z|)
> (Intercept) -73.45212   23.13842  -3.174  0.00150 **
> Slope        -0.03834    0.01174  -3.265  0.00109 **
> sp.mmin     -15.34594    5.30360  -2.893  0.00381 **
> su.max        5.09712    1.70332   2.992  0.00277 **
> fa.mmin      13.52262    4.64021   2.914  0.00357 **
> Haupt42      -0.72237    0.27710  -2.607  0.00914 **
> Haupt43      -0.95730    0.37762  -2.535  0.01124 *
> Haupt44      -0.25357    0.24330  -1.042  0.29731
> ---
>      Null deviance: 958.21  on 2784  degrees of freedom
> Residual deviance: 896.10  on 2777  degrees of freedom
> AIC: 912.1
>
> ----------------------
>
> I then realized that my residuals were all highly correlated (0.8-0.6)
> when I plotted them using acf() function.
>
> So to account for this I used glmmPQL to fit the full model:
>
> model.sc.c <- glmmPQL(PA ~
> sp.mmin+su.mmin+su.max+fa.mmin+Slope+Haupt4+Pop_density+Dist_G+Growi_sea,
> random=
> ~1|group.sc, data=sc.dat, family=binomial, correlation=corAR1())
>
> However, the algorithm failed to converge and all the p-vaules were
> either 0 or 1 and coefficient estimates approached infinity.
> Additionally the grouping factor of the random effect is slightly
> arbitrary and accounts a tiny amount of variation.
>
> ---
> So know I feel stuck between a rock and a hard place, on the one hand I
> know I have a lot of autocorrelation and on the other hand I don't have
> a clear way to include it in the model.
>
> I would appreciate any advice on the matter.
>
> Sincerely,
>
> Tim
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>


-- 
Pedro A. C. Lima Pequeno
Programa de Pós-graduação em Ecologia
Instituto Nacional de Pesquisas da Amazônia
Manaus, AM, Brasil



More information about the R-sig-ecology mailing list