[R-sig-eco] Resource selection- correlation between variables

Cade, Brian cadeb at usgs.gov
Tue Jun 7 16:42:25 CEST 2016


Teresa:  There probably are no simple shortcuts here - you need to
investigate the correlation structure for each of your possible
comparisons.  You can use the variance inflation factor function vif() in
the car package for glms; for categorical predictors with more than one
degree of freedom it reports generalized VIFs (GVIFs).  I recommend VIFs
over pairwise correlations because it is the linear correlation among
multiple predictors jointly that creates issues, and a predictor can be
close to a linear combination of several others even when no single
pairwise correlation is large.  Note that really large VIFs (e.g., >10 or
so) are likely to indicate unstable, inflated standard errors for the
regression coefficient estimates.  Smaller VIFs (1-5) largely indicate an
issue with how to interpret regression coefficients as partial effects.
VIFs close to 1 indicate little or no linear correlation with the other
predictors.  You don't necessarily need to eliminate predictor variables
from a model just because there is some multicollinearity, e.g., VIFs in
the range 1-5.  You just need to understand that a regression coefficient,
read as the partial effect of a unit change in the predictor variable,
really applies to a unit change in the part of the predictor that is not
linearly related to the other predictors (see Cade 2015, Ecology
96:2370-2382).  This, of course, is why it is so wonderful to have
perfectly uncorrelated predictors - the interpretation of partial effects
is simpler.  But that is not realistic for most resource selection
analyses.
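A minimal sketch of both points, written in Python with numpy rather than R so it is self-contained. The `vif()` and `ols()` helpers, the study-area names, and all data are hypothetical and simulated. The helper computes the plain, unweighted VIF, 1/(1 - R^2_j), for continuous predictors. car's vif() works from a fitted model's coefficient covariance matrix and reports GVIFs for multi-degree-of-freedom terms, so for a logistic RSF its values can differ somewhat from this version. The first part checks each study area separately and then the pooled data, because pooling can hide correlation that is strong in only some areas. The second part illustrates the partial-effect interpretation in a linear model using the Frisch-Waugh-Lovell result.

```python
import numpy as np


def vif(X):
    """Plain (unweighted) VIF for each column of X (hypothetical helper).

    X is an n x p array of predictors with no intercept column.
    VIF_j = 1 / (1 - R^2_j), where R^2_j comes from an OLS regression
    (with intercept) of column j on the remaining columns.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        xj = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, xj, rcond=None)
        resid = xj - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((xj - xj.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


def ols(y, *cols):
    """OLS coefficients (intercept first) of y on the given columns."""
    A = np.column_stack([np.ones(len(y)), *cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


rng = np.random.default_rng(1)

# Hypothetical simulated data: three continuous predictors in five study
# areas, with the roughness/distance-to-water correlation varying by area.
areas = {}
for k, rho in enumerate([0.0, 0.3, 0.5, 0.7, 0.9]):
    n = 500
    rough = rng.normal(size=n)
    water = rho * rough + np.sqrt(1 - rho**2) * rng.normal(size=n)
    settle = rng.normal(size=n)
    areas[f"area{k + 1}"] = np.column_stack([rough, water, settle])

# Check the correlation structure for each comparison of interest:
# every study area on its own, then all areas pooled.
for name, X in areas.items():
    print(name, np.round(vif(X), 2))
pooled = np.vstack(list(areas.values()))
print("pooled", np.round(vif(pooled), 2))

# Partial-effect interpretation (Frisch-Waugh-Lovell, linear model only):
# the slope for x1 in y ~ x1 + x2 equals the slope of y on the part of x1
# that is not linearly explained by x2.
X = areas["area4"]
x1, x2 = X[:, 0], X[:, 1]
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=len(x1))  # hypothetical response
b_full = ols(y, x1, x2)[1]
a = ols(x1, x2)
x1_resid = x1 - (a[0] + a[1] * x2)
b_resid = ols(y, x1_resid)[1]
print(round(b_full, 6), round(b_resid, 6))  # the two slopes agree
```

In this simulation, area5 gives VIFs of roughly 5 for roughness and distance to water. The pooled VIFs are much smaller because the pooled correlation is close to the average across areas. This is one reason to look at each comparison you intend to model rather than only at the pooled data.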

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  cadeb at usgs.gov <brian_cade at usgs.gov>
tel:  970 226-9326


On Sat, Jun 4, 2016 at 9:54 AM, Teresa Oliveira <mteresaoliveira92 at gmail.com> wrote:

> Dear all,
>
> I have a question about correlation between variables, and I would like to
> hear your opinion on this.
>
> Background:
> I am working with telemetry data of a single species, collected by several
> researchers, from five study areas. I aim to analyse resource selection
> (with resource selection functions, RSF), applying Design II (individual
> locations (used resource units) against study area (available resource
> units)) and Design III (individual locations (used resource units) against
> home range area (available resource units)). I have 13 variables in total:
> 10 binary variables (land cover characteristics) and 3 continuous variables
> (roughness, distance to water and distance to human settlements).
>
> I want to construct models for each study area and also a global model
> including all five study areas, because I want to see if it is possible to
> apply a global model for all areas or if they are very different from each
> other. I'm planning to use a sampling-with-replacement method to understand
> the effect of each area on the global model.
>
> Question:
> Before starting with RSF, I want to check if my variables are independent,
> and I'm not quite sure how I am supposed to conduct the analyses. Should I
> use all data (from all individuals in all study areas) to test correlation
> between all variables? Or should I conduct different analyses for each
> study area (or even each individual)?
>
> Does anyone have any suggestions?
>
> Thank you very much in advance,
> Teresa
>
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>
