[R-sig-eco] logistic regression and spatial autocorrelation

Pedro Lima Pequeno pacolipe at gmail.com
Thu Aug 25 22:43:08 CEST 2011


Tim and Chris,

I think zero-inflated linear models for presence/absence data could be
implemented with package "vgam", but I have never tried it.

Regards

2011/8/25, cparker at pdx.edu <cparker at pdx.edu>:
> Tim pointed put that he has only 132 samples out of 2800 with a species
> present and I am curious what people think about how well we can model that
> with logistic regression.
> -Chris
>
>
>
>
> On Aug 25, 2011, at 10:36 AM, Pedro Lima Pequeno <pacolipe at gmail.com> wrote:
>
>> Hi Tim,
>>
>> there are several ways of dealing with spatial autocorrelation in
>> ecological models (see e.g. Dormann 2007: Methods to account for
>> spatial autocorrelation in the analysis of species distributional
>> data: a review; and Beale et al. 2010: Regression analysis of spatial
>> data). As always, this is an area of active research, so the right or
>> wrong thing to do is not as clear as it may seem. Some have even
>> concluded  that "changes in coefficients between spatial and
>> non-spatial methods depend on the method used and are largely
>> idiosyncratic", so that "researchers may have little choice but to be
>> more explicit about the uncertainty of models and more cautious in
>> their interpretation" (Bini et al. 2009: Coefficient shifts in
>> geographical ecology: an empirical evaluation of spatial and
>> non-spatial regression). Thus, new methods are emerging at a faster
>> rate than people studying and comparing their properties. Nonetheless,
>> I think some observations are useful:
>> 1) there are methods explicitly designed to detect spatial
>> autocorrelation such as Moran's autocorrelograms or variograms
>> (available in several R packages). As already pointed out, the
>> autocorrelation function is "well behaved" with linear, equally spaced
>> series in time or space;
>> 2) minimal adequate model selection with the AIC is sensitive to
>> residual autocorrelation; it tends to generate unstable and overfitted
>> models. Thus, when applying any model selection procedure, you should
>> account for the uncertainty in the process by averaging the model set
>> (or model predictions) with respect to the relative support of each
>> model (e.g. Akaike weights). Since you have a large sample, you could
>> account for residual spatial autocorrelation using eigenvector
>> filtering, which produces synthetic variables that capture spatial
>> patterns and can be included in linear models as explanatory variables
>> - a more intuitive approach if you don't want to mess with random
>> factors (see e.g. Diniz-Filho et al. 2008: Model selection and
>> information theory in geographical ecology). This can be implemented
>> with packages spacemakeR and vegan;
>> 3) autocorrelation in model residuals is not the only - nor most
>> important - problem in biological modeling; model misspecification is
>> the major issue. Residual autocorrelation often arises due to not
>> including relevant explanatory variables, interaction terms, assuming
>> an inappropriate response shape and/or an inadequate variance
>> structure, or any combination of these. All these things need to be
>> checked for proper model validation, for instance by partial
>> regression plots (or added-varialbe plots), which help you see the
>> shape of the response to each explanatory variable after account for
>> the variation in the remaining explanatory set.
>> At the same time, you could plot the residuals againts both model
>> predictions and explanatory variables. Since residuals are the
>> stochastic component of the model ("noise"), its relation with the
>> systematic components should be random; clear patterns in these plots
>> are indications of misspecification.
>> Finally, all this "model tinkering" is based on two fundamental
>> premises: you want to model a mean tendecy of response and the pattern
>> of variation around it. These are strictly statistical properties of
>> data - they have nothing to do with biology. If you don't really
>> believe the biological process you are studying implies a mean
>> response, but rather e.g. a maximum one (such as in population or
>> abundance limitation), than all these methods will actually induce you
>> to misspecify the model, but there are alternatives (see e.g. Cade et
>> al. 2005 - Quantile regression reveals hidden bias and uncertainty in
>> habitat models).
>>
>> 2011/8/25, Tim Seipel <t.seipel at env.ethz.ch>:
>>>
>>> Dear List,
>>> I am trying to determine the best environmental predictors of the
>>> presence of a species along an elevational gradient.
>>> Elevation ranges from 400 to 2050 m a.s.l. and the ratio of presences to
>>> absences is low (132 presences out 2800 samples)
>>>
>>> So to start I fit the full model of with the variable of interest.
>>>
>>> sc.m<-glm(PA~sp.max+su.mmin+su.max+fa.mmin+fa.max+Slope+Haupt4+Pop_density+Dist_G+Growi_sea+,data=sc.pa,'binomial')
>>>
>>> First, I performed univariate and backward selection using Akaike
>>> Information Criteria, and the fit was good and realistic given my
>>> knowledge of the environment though the D^2 was low 0.08. My final model
>>> was:
>>> ---------------------------------
>>> glm(formula = PA ~ Slope + sp.mmin + su.max + fa.mmin + Haupt4,
>>>     family = "binomial", data = sc.pa)
>>>
>>> Deviance Residuals:
>>>     Min       1Q   Median       3Q      Max
>>> -0.5415  -0.3506  -0.2608  -0.1762   3.0768
>>>
>>> Coefficients:
>>>              Estimate Std. Error z value Pr(>|z|)
>>> (Intercept) -73.45212   23.13842  -3.174  0.00150 **
>>> Slope        -0.03834    0.01174  -3.265  0.00109 **
>>> sp.mmin     -15.34594    5.30360  -2.893  0.00381 **
>>> su.max        5.09712    1.70332   2.992  0.00277 **
>>> fa.mmin      13.52262    4.64021   2.914  0.00357 **
>>> Haupt42      -0.72237    0.27710  -2.607  0.00914 **
>>> Haupt43      -0.95730    0.37762  -2.535  0.01124 *
>>> Haupt44      -0.25357    0.24330  -1.042  0.29731
>>> ---
>>>     Null deviance: 958.21  on 2784  degrees of freedom
>>> Residual deviance: 896.10  on 2777  degrees of freedom
>>> AIC: 912.1
>>>
>>> ----------------------
>>>
>>> I then realized that my residuals were all highly correlated (0.8-0.6)
>>> when I plotted them using acf() function.
>>>
>>> So to account for this I used glmmPQL to fit the full model:
>>>
>>> model.sc.c <- glmmPQL(PA ~
>>> sp.mmin+su.mmin+su.max+fa.mmin+Slope+Haupt4+Pop_density+Dist_G+Growi_sea,
>>> random=
>>> ~1|group.sc, data=sc.dat, family=binomial, correlation=corAR1())
>>>
>>> However, the algorithm failed to converge and all the p-vaules were
>>> either 0 or 1 and coefficient estimates approached infinity.
>>> Additionally the grouping factor of the random effect is slightly
>>> arbitrary and accounts a tiny amount of variation.
>>>
>>> ---
>>> So know I feel stuck between a rock and a hard place, on the one hand I
>>> know I have a lot of autocorrelation and on the other hand I don't have
>>> a clear way to include it in the model.
>>>
>>> I would appreciate any advice on the matter.
>>>
>>> Sincerely,
>>>
>>> Tim
>>>
>>>    [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-ecology mailing list
>>> R-sig-ecology at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>>>
>>
>>
>> --
>> Pedro A. C. Lima Pequeno
>> Programa de Pós-graduação em Ecologia
>> Instituto Nacional de Pesquisas da Amazônia
>> Manaus, AM, Brasil
>>
>> _______________________________________________
>> R-sig-ecology mailing list
>> R-sig-ecology at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>


-- 
Pedro A. C. Lima Pequeno
Programa de Pós-graduação em Ecologia
Instituto Nacional de Pesquisas da Amazônia
Manaus, AM, Brasil



More information about the R-sig-ecology mailing list