[R-sig-eco] logistic regression and spatial autocorrelation

cparker at pdx.edu cparker at pdx.edu
Thu Aug 25 20:08:52 CEST 2011


Tim pointed put that he has only 132 samples out of 2800 with a species present and I am curious what people think about how well we can model that with logistic regression.
-Chris




On Aug 25, 2011, at 10:36 AM, Pedro Lima Pequeno <pacolipe at gmail.com> wrote:

> Hi Tim,
> 
> there are several ways of dealing with spatial autocorrelation in
> ecological models (see e.g. Dormann 2007: Methods to account for
> spatial autocorrelation in the analysis of species distributional
> data: a review; and Beale et al. 2010: Regression analysis of spatial
> data). As always, this is an area of active research, so the right or
> wrong thing to do is not as clear as it may seem. Some have even
> concluded  that "changes in coefficients between spatial and
> non-spatial methods depend on the method used and are largely
> idiosyncratic", so that "researchers may have little choice but to be
> more explicit about the uncertainty of models and more cautious in
> their interpretation" (Bini et al. 2009: Coefficient shifts in
> geographical ecology: an empirical evaluation of spatial and
> non-spatial regression). Thus, new methods are emerging at a faster
> rate than people studying and comparing their properties. Nonetheless,
> I think some observations are useful:
> 1) there are methods explicitly designed to detect spatial
> autocorrelation such as Moran's autocorrelograms or variograms
> (available in several R packages). As already pointed out, the
> autocorrelation function is "well behaved" with linear, equally spaced
> series in time or space;
> 2) minimal adequate model selection with the AIC is sensitive to
> residual autocorrelation; it tends to generate unstable and overfitted
> models. Thus, when applying any model selection procedure, you should
> account for the uncertainty in the process by averaging the model set
> (or model predictions) with respect to the relative support of each
> model (e.g. Akaike weights). Since you have a large sample, you could
> account for residual spatial autocorrelation using eigenvector
> filtering, which produces synthetic variables that capture spatial
> patterns and can be included in linear models as explanatory variables
> - a more intuitive approach if you don't want to mess with random
> factors (see e.g. Diniz-Filho et al. 2008: Model selection and
> information theory in geographical ecology). This can be implemented
> with packages spacemakeR and vegan;
> 3) autocorrelation in model residuals is not the only - nor most
> important - problem in biological modeling; model misspecification is
> the major issue. Residual autocorrelation often arises due to not
> including relevant explanatory variables, interaction terms, assuming
> an inappropriate response shape and/or an inadequate variance
> structure, or any combination of these. All these things need to be
> checked for proper model validation, for instance by partial
> regression plots (or added-varialbe plots), which help you see the
> shape of the response to each explanatory variable after account for
> the variation in the remaining explanatory set.
> At the same time, you could plot the residuals againts both model
> predictions and explanatory variables. Since residuals are the
> stochastic component of the model ("noise"), its relation with the
> systematic components should be random; clear patterns in these plots
> are indications of misspecification.
> Finally, all this "model tinkering" is based on two fundamental
> premises: you want to model a mean tendecy of response and the pattern
> of variation around it. These are strictly statistical properties of
> data - they have nothing to do with biology. If you don't really
> believe the biological process you are studying implies a mean
> response, but rather e.g. a maximum one (such as in population or
> abundance limitation), than all these methods will actually induce you
> to misspecify the model, but there are alternatives (see e.g. Cade et
> al. 2005 - Quantile regression reveals hidden bias and uncertainty in
> habitat models).
> 
> 2011/8/25, Tim Seipel <t.seipel at env.ethz.ch>:
>> 
>> Dear List,
>> I am trying to determine the best environmental predictors of the
>> presence of a species along an elevational gradient.
>> Elevation ranges from 400 to 2050 m a.s.l. and the ratio of presences to
>> absences is low (132 presences out 2800 samples)
>> 
>> So to start I fit the full model of with the variable of interest.
>> 
>> sc.m<-glm(PA~sp.max+su.mmin+su.max+fa.mmin+fa.max+Slope+Haupt4+Pop_density+Dist_G+Growi_sea+,data=sc.pa,'binomial')
>> 
>> First, I performed univariate and backward selection using Akaike
>> Information Criteria, and the fit was good and realistic given my
>> knowledge of the environment though the D^2 was low 0.08. My final model
>> was:
>> ---------------------------------
>> glm(formula = PA ~ Slope + sp.mmin + su.max + fa.mmin + Haupt4,
>>     family = "binomial", data = sc.pa)
>> 
>> Deviance Residuals:
>>     Min       1Q   Median       3Q      Max
>> -0.5415  -0.3506  -0.2608  -0.1762   3.0768
>> 
>> Coefficients:
>>              Estimate Std. Error z value Pr(>|z|)
>> (Intercept) -73.45212   23.13842  -3.174  0.00150 **
>> Slope        -0.03834    0.01174  -3.265  0.00109 **
>> sp.mmin     -15.34594    5.30360  -2.893  0.00381 **
>> su.max        5.09712    1.70332   2.992  0.00277 **
>> fa.mmin      13.52262    4.64021   2.914  0.00357 **
>> Haupt42      -0.72237    0.27710  -2.607  0.00914 **
>> Haupt43      -0.95730    0.37762  -2.535  0.01124 *
>> Haupt44      -0.25357    0.24330  -1.042  0.29731
>> ---
>>     Null deviance: 958.21  on 2784  degrees of freedom
>> Residual deviance: 896.10  on 2777  degrees of freedom
>> AIC: 912.1
>> 
>> ----------------------
>> 
>> I then realized that my residuals were all highly correlated (0.8-0.6)
>> when I plotted them using acf() function.
>> 
>> So to account for this I used glmmPQL to fit the full model:
>> 
>> model.sc.c <- glmmPQL(PA ~
>> sp.mmin+su.mmin+su.max+fa.mmin+Slope+Haupt4+Pop_density+Dist_G+Growi_sea,
>> random=
>> ~1|group.sc, data=sc.dat, family=binomial, correlation=corAR1())
>> 
>> However, the algorithm failed to converge and all the p-vaules were
>> either 0 or 1 and coefficient estimates approached infinity.
>> Additionally the grouping factor of the random effect is slightly
>> arbitrary and accounts a tiny amount of variation.
>> 
>> ---
>> So know I feel stuck between a rock and a hard place, on the one hand I
>> know I have a lot of autocorrelation and on the other hand I don't have
>> a clear way to include it in the model.
>> 
>> I would appreciate any advice on the matter.
>> 
>> Sincerely,
>> 
>> Tim
>> 
>>    [[alternative HTML version deleted]]
>> 
>> _______________________________________________
>> R-sig-ecology mailing list
>> R-sig-ecology at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>> 
> 
> 
> -- 
> Pedro A. C. Lima Pequeno
> Programa de Pós-graduação em Ecologia
> Instituto Nacional de Pesquisas da Amazônia
> Manaus, AM, Brasil
> 
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology



More information about the R-sig-ecology mailing list