[R-sig-eco] logistic regression and spatial autocorrelation
Aitor Gastón
aitor.gaston at upm.es
Thu Aug 25 13:42:19 CEST 2011
Hi Tim,
If you are interested in model predictions, forward, backward, or stepwise
predictor selection has many disadvantages (see
http://www.nesug.org/proceedings/nesug07/sa/sa07.pdf for a summary). My
experience with logistic regression applied to species distribution models
[1] suggests that stepwise predictor selection using AIC does not improve
model performance compared to full models, i.e., it does not help avoid
overfitting when there are too many predictors for the available number of
species occurrences. Including some kind of regularization may help in such
scenarios. In R you can fit penalized logistic regression models using the
rms package.
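
A minimal sketch of what that could look like, assuming simulated data and
hypothetical predictor names (x1 to x4) rather than your variables. It uses
lrm() and pentrace() from rms; the penalty grid is arbitrary and would need
tuning for real data:

```r
# Sketch only: simulated data, hypothetical variable names.
library(rms)

set.seed(1)
n <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
# Rare presences driven by x1 only; x2 to x4 are pure noise predictors
dat$PA <- rbinom(n, 1, plogis(-3 + 0.8 * dat$x1))

# Unpenalized full model; x = TRUE and y = TRUE keep the design matrix
# and response in the fit, which pentrace() requires
full <- lrm(PA ~ x1 + x2 + x3 + x4, data = dat, x = TRUE, y = TRUE)

# Evaluate a grid of penalties (an arbitrary grid, chosen for illustration)
pt <- pentrace(full, penalty = c(0, 1, 2, 5, 10, 20, 50))

# Refit the full model with the penalty that pentrace() selected
pen <- update(full, penalty = pt$penalty)
print(pen)
```

All predictors stay in the model; the penalty shrinks their coefficients
instead of dropping terms one at a time.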
[1] GASTÓN A., GARCÍA-VIÑAS J.I., 2011. Modelling species distributions with
penalised logistic regressions: A comparison with maximum entropy models.
Ecol.Model., 222(13), 2037-2041.
http://dx.doi.org/10.1016/j.ecolmodel.2011.04.015
Regards,
Aitor
--------------------------------------------------
From: "Tim Seipel" <t.seipel at env.ethz.ch>
Sent: Thursday, August 25, 2011 11:04 AM
To: <r-sig-ecology at r-project.org>
Subject: [R-sig-eco] logistic regression and spatial autocorrelation
>
> Dear List,
> I am trying to determine the best environmental predictors of the
> presence of a species along an elevational gradient.
> Elevation ranges from 400 to 2050 m a.s.l. and the ratio of presences to
> absences is low (132 presences out of 2800 samples)
>
> So to start I fit the full model with the variables of interest.
>
> sc.m <- glm(PA ~ sp.max + su.mmin + su.max + fa.mmin + fa.max + Slope + Haupt4 + Pop_density + Dist_G + Growi_sea, data = sc.pa, family = binomial)
>
> First, I performed univariate and backward selection using the Akaike
> Information Criterion, and the fit was good and realistic given my
> knowledge of the environment, though the D^2 was low (0.08). My final
> model was:
> ---------------------------------
> glm(formula = PA ~ Slope + sp.mmin + su.max + fa.mmin + Haupt4,
> family = "binomial", data = sc.pa)
>
> Deviance Residuals:
>     Min       1Q   Median       3Q      Max
> -0.5415  -0.3506  -0.2608  -0.1762   3.0768
>
> Coefficients:
>              Estimate Std. Error z value Pr(>|z|)
> (Intercept) -73.45212   23.13842  -3.174  0.00150 **
> Slope        -0.03834    0.01174  -3.265  0.00109 **
> sp.mmin     -15.34594    5.30360  -2.893  0.00381 **
> su.max        5.09712    1.70332   2.992  0.00277 **
> fa.mmin      13.52262    4.64021   2.914  0.00357 **
> Haupt42      -0.72237    0.27710  -2.607  0.00914 **
> Haupt43      -0.95730    0.37762  -2.535  0.01124 *
> Haupt44      -0.25357    0.24330  -1.042  0.29731
> ---
> Null deviance: 958.21 on 2784 degrees of freedom
> Residual deviance: 896.10 on 2777 degrees of freedom
> AIC: 912.1
>
> ----------------------
>
> I then realized that my residuals were all highly correlated (0.8-0.6)
> when I plotted them using the acf() function.
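
For reference, a minimal sketch of that kind of check, assuming simulated
data whose rows are ordered along a transect (acf() only makes sense when
the row order reflects position). The variables pos, hidden and slope are
hypothetical and are not taken from your data set:

```r
# Sketch only: simulated transect data, hypothetical variable names.
set.seed(42)
n <- 300
pos <- seq_len(n)          # sampling order along the transect
hidden <- sin(pos / 25)    # smooth gradient that the model will omit
slope <- rnorm(n)          # measured predictor
PA <- rbinom(n, 1, plogis(-1 + 2 * hidden - 0.5 * slope))

# Logistic regression that leaves out the smooth gradient
fit <- glm(PA ~ slope, family = binomial)

# Pearson residuals, kept in sampling order
res <- residuals(fit, type = "pearson")

# Autocorrelation of the residuals up to lag 20, without plotting
ac <- acf(res, lag.max = 20, plot = FALSE)
print(round(ac$acf[2:6], 2))   # lags 1 to 5 (element 1 is lag 0)
```

Clearly positive values at short lags point to structure along the transect
that the predictors do not capture.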
>
> So to account for this I used glmmPQL to fit the full model:
>
> library(MASS)  # provides glmmPQL()
> library(nlme)  # provides corAR1()
> model.sc.c <- glmmPQL(PA ~ sp.mmin + su.mmin + su.max + fa.mmin + Slope +
>                         Haupt4 + Pop_density + Dist_G + Growi_sea,
>                       random = ~ 1 | group.sc, data = sc.dat,
>                       family = binomial, correlation = corAR1())
>
> However, the algorithm failed to converge: all the p-values were
> either 0 or 1 and the coefficient estimates approached infinity.
> Additionally, the grouping factor of the random effect is slightly
> arbitrary and accounts for only a tiny amount of variation.
>
> ---
> So now I feel stuck between a rock and a hard place: on the one hand I
> know I have a lot of autocorrelation, and on the other hand I don't have
> a clear way to include it in the model.
>
> I would appreciate any advice on the matter.
>
> Sincerely,
>
> Tim