[R-sig-eco] logistic regression and spatial autocorrelation

Thu Aug 25 13:42:19 CEST 2011

Hi Tim,

If you are interested in model predictions, forward, backward, or stepwise 
predictor selection has many disadvantages (see 
http://www.nesug.org/proceedings/nesug07/sa/sa07.pdf for a summary). In my 
experience with logistic regression applied to species distribution models 
[1], stepwise predictor selection using AIC does not improve model 
performance compared to full models, i.e., it does not help to avoid 
overfitting when there are too many predictors for the available number of 
species occurrences. Including some kind of regularization may help in such 
scenarios. In R you can fit penalized logistic regression models using the 
rms package (for example, lrm() with its penalty argument).
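
As a rough illustration of the penalized fit mentioned above, here is a 
minimal sketch using lrm() and pentrace() from rms. The data are simulated 
and the names dat, x1, x2 and x3 are hypothetical placeholders, not the 
variables from your data set; the penalty grid is also an arbitrary choice.

```r
# Sketch only: simulated data stand in for a real presence/absence table.
library(rms)

set.seed(1)
n <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))  # hypothetical predictors
dat$PA <- rbinom(n, 1, plogis(-2 + 0.8 * dat$x1))               # relatively rare presences

# Unpenalized full model; x = TRUE and y = TRUE store the design matrix
# and the response, which pentrace() needs for refitting.
full <- lrm(PA ~ x1 + x2 + x3, data = dat, x = TRUE, y = TRUE)

# Fit the model over a grid of penalties; by default pentrace() picks the
# penalty with the best corrected AIC and also considers the unpenalized fit.
pt <- pentrace(full, penalty = c(0.5, 1, 2, 5, 10, 20, 50))

# Refit the full model with the selected penalty.
pen <- update(full, penalty = pt$penalty)
print(pen)
```

With real data you would replace the formula with the full set of candidate 
predictors and probably widen or refine the penalty grid. If the selected 
penalty is 0, the penalized and unpenalized fits coincide.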

[1] Gastón A., García-Viñas J.I., 2011. Modelling species distributions with 
penalised logistic regressions: A comparison with maximum entropy models. 
Ecol. Model., 222(13), 2037-2041. 
http://dx.doi.org/10.1016/j.ecolmodel.2011.04.015

Regards,

Aitor

--------------------------------------------------
From: "Tim Seipel" <t.seipel at env.ethz.ch>
Sent: Thursday, August 25, 2011 11:04 AM
To: <r-sig-ecology at r-project.org>
Subject: [R-sig-eco] logistic regression and spatial autocorrelation

>
> Dear List,
> I am trying to determine the best environmental predictors of the
> presence of a species along an elevational gradient.
> Elevation ranges from 400 to 2050 m a.s.l. and the ratio of presences to
> absences is low (132 presences out of 2800 samples).
>
> So to start I fit the full model with the variables of interest.
>
> sc.m<-glm(PA~sp.max+su.mmin+su.max+fa.mmin+fa.max+Slope+Haupt4+Pop_density+Dist_G+Growi_sea,data=sc.pa,family='binomial')
>
> First, I performed univariate and backward selection using the Akaike
> Information Criterion, and the fit was good and realistic given my
> knowledge of the environment, though the D^2 was low (0.08). My final
> model was:
> ---------------------------------
> glm(formula = PA ~ Slope + sp.mmin + su.max + fa.mmin + Haupt4,
>     family = "binomial", data = sc.pa)
>
> Deviance Residuals:
>     Min       1Q   Median       3Q      Max
> -0.5415  -0.3506  -0.2608  -0.1762   3.0768
>
> Coefficients:
>              Estimate Std. Error z value Pr(>|z|)
> (Intercept) -73.45212   23.13842  -3.174  0.00150 **
> Slope        -0.03834    0.01174  -3.265  0.00109 **
> sp.mmin     -15.34594    5.30360  -2.893  0.00381 **
> su.max        5.09712    1.70332   2.992  0.00277 **
> fa.mmin      13.52262    4.64021   2.914  0.00357 **
> Haupt42      -0.72237    0.27710  -2.607  0.00914 **
> Haupt43      -0.95730    0.37762  -2.535  0.01124 *
> Haupt44      -0.25357    0.24330  -1.042  0.29731
> ---
>     Null deviance: 958.21  on 2784  degrees of freedom
> Residual deviance: 896.10  on 2777  degrees of freedom
> AIC: 912.1
>
> ----------------------
>
> I then realized that my residuals were all highly correlated (0.8-0.6)
> when I plotted them using the acf() function.
>
> So to account for this I used glmmPQL to fit the full model:
>
> model.sc.c <- glmmPQL(PA ~ sp.mmin + su.mmin + su.max + fa.mmin + Slope +
>     Haupt4 + Pop_density + Dist_G + Growi_sea,
>     random = ~1|group.sc, data = sc.dat, family = binomial,
>     correlation = corAR1())
>
> However, the algorithm failed to converge: all the p-values were
> either 0 or 1 and the coefficient estimates approached infinity.
> Additionally, the grouping factor of the random effect is slightly
> arbitrary and accounts for a tiny amount of variation.
>
> ---
> So now I feel stuck between a rock and a hard place: on the one hand I
> know I have a lot of autocorrelation, and on the other hand I don't have
> a clear way to include it in the model.
>
> I would appreciate any advice on the matter.
>
> Sincerely,
>
> Tim