[R-sig-eco] logistic regression and spatial autocorrelation
cparker at pdx.edu
cparker at pdx.edu
Thu Aug 25 20:08:52 CEST 2011
Tim pointed put that he has only 132 samples out of 2800 with a species present and I am curious what people think about how well we can model that with logistic regression.
-Chris
On Aug 25, 2011, at 10:36 AM, Pedro Lima Pequeno <pacolipe at gmail.com> wrote:
> Hi Tim,
>
> there are several ways of dealing with spatial autocorrelation in
> ecological models (see e.g. Dormann 2007: Methods to account for
> spatial autocorrelation in the analysis of species distributional
> data: a review; and Beale et al. 2010: Regression analysis of spatial
> data). As always, this is an area of active research, so the right or
> wrong thing to do is not as clear as it may seem. Some have even
> concluded that "changes in coefficients between spatial and
> non-spatial methods depend on the method used and are largely
> idiosyncratic", so that "researchers may have little choice but to be
> more explicit about the uncertainty of models and more cautious in
> their interpretation" (Bini et al. 2009: Coefficient shifts in
> geographical ecology: an empirical evaluation of spatial and
> non-spatial regression). Thus, new methods are emerging at a faster
> rate than people studying and comparing their properties. Nonetheless,
> I think some observations are useful:
> 1) there are methods explicitly designed to detect spatial
> autocorrelation such as Moran's autocorrelograms or variograms
> (available in several R packages). As already pointed out, the
> autocorrelation function is "well behaved" with linear, equally spaced
> series in time or space;
> 2) minimal adequate model selection with the AIC is sensitive to
> residual autocorrelation; it tends to generate unstable and overfitted
> models. Thus, when applying any model selection procedure, you should
> account for the uncertainty in the process by averaging the model set
> (or model predictions) with respect to the relative support of each
> model (e.g. Akaike weights). Since you have a large sample, you could
> account for residual spatial autocorrelation using eigenvector
> filtering, which produces synthetic variables that capture spatial
> patterns and can be included in linear models as explanatory variables
> - a more intuitive approach if you don't want to mess with random
> factors (see e.g. Diniz-Filho et al. 2008: Model selection and
> information theory in geographical ecology). This can be implemented
> with packages spacemakeR and vegan;
> 3) autocorrelation in model residuals is not the only - nor most
> important - problem in biological modeling; model misspecification is
> the major issue. Residual autocorrelation often arises due to not
> including relevant explanatory variables, interaction terms, assuming
> an inappropriate response shape and/or an inadequate variance
> structure, or any combination of these. All these things need to be
> checked for proper model validation, for instance by partial
> regression plots (or added-varialbe plots), which help you see the
> shape of the response to each explanatory variable after account for
> the variation in the remaining explanatory set.
> At the same time, you could plot the residuals againts both model
> predictions and explanatory variables. Since residuals are the
> stochastic component of the model ("noise"), its relation with the
> systematic components should be random; clear patterns in these plots
> are indications of misspecification.
> Finally, all this "model tinkering" is based on two fundamental
> premises: you want to model a mean tendecy of response and the pattern
> of variation around it. These are strictly statistical properties of
> data - they have nothing to do with biology. If you don't really
> believe the biological process you are studying implies a mean
> response, but rather e.g. a maximum one (such as in population or
> abundance limitation), than all these methods will actually induce you
> to misspecify the model, but there are alternatives (see e.g. Cade et
> al. 2005 - Quantile regression reveals hidden bias and uncertainty in
> habitat models).
>
> 2011/8/25, Tim Seipel <t.seipel at env.ethz.ch>:
>>
>> Dear List,
>> I am trying to determine the best environmental predictors of the
>> presence of a species along an elevational gradient.
>> Elevation ranges from 400 to 2050 m a.s.l. and the ratio of presences to
>> absences is low (132 presences out 2800 samples)
>>
>> So to start I fit the full model of with the variable of interest.
>>
>> sc.m<-glm(PA~sp.max+su.mmin+su.max+fa.mmin+fa.max+Slope+Haupt4+Pop_density+Dist_G+Growi_sea+,data=sc.pa,'binomial')
>>
>> First, I performed univariate and backward selection using Akaike
>> Information Criteria, and the fit was good and realistic given my
>> knowledge of the environment though the D^2 was low 0.08. My final model
>> was:
>> ---------------------------------
>> glm(formula = PA ~ Slope + sp.mmin + su.max + fa.mmin + Haupt4,
>> family = "binomial", data = sc.pa)
>>
>> Deviance Residuals:
>> Min 1Q Median 3Q Max
>> -0.5415 -0.3506 -0.2608 -0.1762 3.0768
>>
>> Coefficients:
>> Estimate Std. Error z value Pr(>|z|)
>> (Intercept) -73.45212 23.13842 -3.174 0.00150 **
>> Slope -0.03834 0.01174 -3.265 0.00109 **
>> sp.mmin -15.34594 5.30360 -2.893 0.00381 **
>> su.max 5.09712 1.70332 2.992 0.00277 **
>> fa.mmin 13.52262 4.64021 2.914 0.00357 **
>> Haupt42 -0.72237 0.27710 -2.607 0.00914 **
>> Haupt43 -0.95730 0.37762 -2.535 0.01124 *
>> Haupt44 -0.25357 0.24330 -1.042 0.29731
>> ---
>> Null deviance: 958.21 on 2784 degrees of freedom
>> Residual deviance: 896.10 on 2777 degrees of freedom
>> AIC: 912.1
>>
>> ----------------------
>>
>> I then realized that my residuals were all highly correlated (0.8-0.6)
>> when I plotted them using acf() function.
>>
>> So to account for this I used glmmPQL to fit the full model:
>>
>> model.sc.c <- glmmPQL(PA ~
>> sp.mmin+su.mmin+su.max+fa.mmin+Slope+Haupt4+Pop_density+Dist_G+Growi_sea,
>> random=
>> ~1|group.sc, data=sc.dat, family=binomial, correlation=corAR1())
>>
>> However, the algorithm failed to converge and all the p-vaules were
>> either 0 or 1 and coefficient estimates approached infinity.
>> Additionally the grouping factor of the random effect is slightly
>> arbitrary and accounts a tiny amount of variation.
>>
>> ---
>> So know I feel stuck between a rock and a hard place, on the one hand I
>> know I have a lot of autocorrelation and on the other hand I don't have
>> a clear way to include it in the model.
>>
>> I would appreciate any advice on the matter.
>>
>> Sincerely,
>>
>> Tim
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> R-sig-ecology mailing list
>> R-sig-ecology at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>>
>
>
> --
> Pedro A. C. Lima Pequeno
> Programa de Pós-graduação em Ecologia
> Instituto Nacional de Pesquisas da Amazônia
> Manaus, AM, Brasil
>
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
More information about the R-sig-ecology
mailing list