[R-sig-eco] Using pcnm to correct for spatial autocorrelation

Sat Nov 6 08:27:48 CET 2010

Kevin,

I'll answer only some of your technical questions. I don't want to imply
that you should use PCNM; I only explain how to use them if you do.
For conceptual issues, you may also check B. Gilbert & J. R. Bennett, Journal
of Applied Ecology, Volume 47, Issue 5, pages 1071-1082, October 2010.

On 5/11/10 22:59 PM, "Kevin McCluney" <Kevin.McCluney at colostate.edu> wrote:

> I am trying to use pcnm in vegan to correct for spatial autocorrelation in
> analyses of influences of environmental factors on multivariate community
> composition, as well as univariate analyses of diversity and abundance.  I
> have several questions.
> 
> 1.  PCNM requires a threshold and I am aware that the value that keeps all
> sites connected is most commonly used.  My data has a group of 4 sites
> spaced <70 m apart that are over 3 km from another group of 30 sites spaced
> less than 105 m apart, and one more site which is 700 m from that group.  It
> seems more reasonable to me to use a threshold of 105 m than one of >3km,
> especially since my study focuses on ground arthropods.  This would create 3
> groupings of sites.  If I use the >3km threshold I get only 8 pcnm axes,
> whereas if I use 105 m I get 23 pcnm axes.  What are the dangers of using
> 105 m for the threshold?
> 
The choice of threshold is arbitrary and it will influence the results. The
standard (which also is the default in vegan) is indeed to use the shortest
threshold that still keeps the data connected, i.e. the longest link in the
minimum spanning tree. I cannot see any particular dangers in using other
thresholds. It is not more dangerous to use a threshold of 700m than to use
a threshold of 3000m. You could just as well ask what the dangers are of
using the default of >3km.

The number of PCNM vectors has no relevance for the choice.
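
As a rough illustration of what the threshold does (a Python sketch, not
vegan code), the snippet below finds the shortest threshold that keeps all
sites connected and counts how many separate groups remain under smaller
thresholds. The site coordinates and the function names are made up for the
example; they only mimic the kind of layout described in the question.

```python
import math
from itertools import combinations

def pairwise(xy):
    """Full matrix of Euclidean distances between (x, y) points."""
    return [[math.dist(p, q) for q in xy] for p in xy]

def connecting_threshold(d):
    """Longest edge of the minimum spanning tree (Prim's algorithm)."""
    n = len(d)
    in_tree = [False] * n
    best = [math.inf] * n  # shortest link from each site to the growing tree
    best[0] = 0.0
    longest = 0.0
    for _ in range(n):
        i = min((k for k in range(n) if not in_tree[k]), key=best.__getitem__)
        in_tree[i] = True
        longest = max(longest, best[i])
        for j in range(n):
            if not in_tree[j] and d[i][j] < best[j]:
                best[j] = d[i][j]
    return longest

def n_groups(d, threshold):
    """Number of connected groups when only links <= threshold are kept."""
    parent = list(range(len(d)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(d)), 2):
        if d[i][j] <= threshold:
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(d))})

# Hypothetical layout in metres: a tight cluster, a second cluster about
# 3 km away, and one isolated site 700 m beyond the second cluster.
sites = [(0, 0), (50, 0), (0, 50), (50, 50),
         (3000, 0), (3100, 0), (3200, 0), (3300, 0),
         (4000, 0)]
d = pairwise(sites)

print(connecting_threshold(d))  # 2950.0
print(n_groups(d, 105))         # 3
print(n_groups(d, 700))         # 2
```

Any threshold below the connecting one simply splits the sites into
separate groups; whether that is acceptable is a judgment about the study,
not something the calculation decides.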

If you start from the Euclidean distances of spatial locations on a plane
(like Earth is for many practical purposes), you would get back two
principal coordinates. In PCNM we put in an arbitrary threshold, and this
makes the matrix non-Euclidean. Therefore you get more than two PCNM vectors
and several negative eigenvalues for locations on a plane. Having a low number
of PCNM vectors and not too many negative eigenvalues is a sign of more
Euclidean space. You cannot have completely Euclidean (=2 dims) space for
PCNM since then you just fall back to a simple linear trend surface. With
PCNM you have trickier surfaces.
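
The effect of truncation on the eigenvalues can be checked numerically. The
following is a minimal sketch in Python with NumPy, not the vegan
implementation: it assumes the conventional PCNM truncation in which
distances above the threshold are replaced by 4 x threshold, and it uses a
made-up 5 x 5 grid of sites. The function names are hypothetical.

```python
import numpy as np

def pcoa_eigenvalues(d):
    """Eigenvalues of the Gower-centred matrix -0.5 * d**2, largest first."""
    n = d.shape[0]
    centre = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centre @ (d ** 2) @ centre
    return np.sort(np.linalg.eigvalsh(b))[::-1]

def truncate(d, threshold):
    """Replace distances above the threshold with 4 * threshold."""
    return np.where(d > threshold, 4.0 * threshold, d)

# Hypothetical sites: a 5 x 5 grid with unit spacing on a plane
xy = np.array([(i, j) for i in range(5) for j in range(5)], dtype=float)
d = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=-1))

tol = 1e-8  # eigenvalues within +/- tol are treated as zero
plain = pcoa_eigenvalues(d)
trunc = pcoa_eigenvalues(truncate(d, threshold=1.0))

for label, ev in (("untruncated", plain), ("truncated", trunc)):
    print(label,
          "positive:", int((ev > tol).sum()),
          "negative:", int((ev < -tol).sum()))
```

Without truncation only two eigenvalues are positive (the two dimensions of
the plane) and none are negative. With truncation the matrix is no longer
Euclidean, so negative eigenvalues appear; only the vectors belonging to
positive eigenvalues would be kept as PCNMs.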

> 2.  I know that the standard procedure for pcnm involves removing or
> ignoring axes with negative eigenvalues.  When I use the pcnm function in
> vegan, I get more eigenvalues than I have axes.  If I assume that the first
> values correspond to the first axes, then all the axes in my analyses have
> positive eigenvalues, but then the "extra" values are all negative.  What
> are these "extra" eigenvalues?
It is a correct assumption that the eigenvalues and corresponding
eigenvectors are ordered similarly. All PCNMs that you get are for those
positive eigenvalues, and the first eigenvalue is for the first axis etc.
You will normally get negative eigenvalues. It is not only a common
procedure to ignore axes with negative eigenvalues, but it is about the only
practical choice (and the only choice you have in vegan).
> 
> 3. The next step in pcnm involves selecting pcnm axes with significant
> effects on responses.  I know that it has been suggested to use forward
> selection routines in cca to select these axes.  I'm also aware of some of
> the limitations of this technique and the suggestion that forward selection
> should have additional criteria. Namely, that forward selection should only
> be used if the full model with all terms is significant and should also
> compare the adjusted R2 with each term added to that of the original full
> model.  If I perform an analysis with all non-negative pcnm terms and the
> model is not significant, does this mean there is no spatial autocorrelation
> and no selection procedure is needed?  If it is significant, then I need
> adjusted R2 values to perform the forward selection, but I don't see how to
> get these using cca in vegan.
>
There is no way of doing this for cca() in vegan. You can do it for rda(),
and the development version of vegan in the repository
http://r-forge.r-project.org/ has an automatic function ordiR2step for rda()
or capscale() to do that forward selection (it is not yet in the release
version, because I have been waiting for Guillaume Blanchet's and Pierre
Legendre's blessing of the function before release). The problem is that
vegan does not have adjusted R2 for CCA. I have seen a paper by Pedro
Peres-Neto on calculating adjusted R2 for CCA, but the calculation is pretty
tricky and slow, and we haven't implemented that in vegan. I guess you refer
to Blanchet et al., Ecology 89, 2623-2632 (2008) when you write about the
recommended procedure: that paper only considered RDA. With CCA you must
trust your own judgment when you decide how to select your PCNMs.
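
To make the adjusted R2 stopping rule concrete for the RDA case, here is a
simplified sketch in Python with NumPy. It is not the ordiR2step code: it
uses the usual Ezekiel adjustment, treats R2 as the share of total variance
explained by a multivariate linear model, and leaves out the permutation
tests (of the full model and of each added term) that the recommended
procedure also requires. The data and all names are invented for the
example.

```python
import numpy as np

def r2(Y, X):
    """Share of the total variance of Y explained by a linear model on X."""
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    fitted = Xc @ np.linalg.lstsq(Xc, Yc, rcond=None)[0]
    return (fitted ** 2).sum() / (Yc ** 2).sum()

def adj_r2(r2_value, n, m):
    """Ezekiel's adjustment for n sites and m explanatory variables."""
    return 1.0 - (1.0 - r2_value) * (n - 1) / (n - m - 1)

def forward_select(Y, X):
    """Add the best column of X at each step; stop when adjusted R2 no
    longer rises or would exceed the adjusted R2 of the full model."""
    n, p = X.shape
    ceiling = adj_r2(r2(Y, X), n, p)
    chosen, current = [], 0.0
    while len(chosen) < p:
        scores = {j: adj_r2(r2(Y, X[:, chosen + [j]]), n, len(chosen) + 1)
                  for j in range(p) if j not in chosen}
        best = max(scores, key=scores.get)
        if scores[best] <= current or scores[best] > ceiling:
            break
        chosen.append(best)
        current = scores[best]
    return chosen, current, ceiling

# Invented data: 40 sites, 6 candidate spatial vectors, 3 response variables;
# only columns 0 and 3 of X carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 6))
Y = (np.outer(X[:, 0], [2.0, -1.0, 1.0])
     + np.outer(X[:, 3], [1.0, 2.0, -1.0])
     + 0.5 * rng.normal(size=(40, 3)))

chosen, current, ceiling = forward_select(Y, X)
print("selected columns:", chosen)
print("adjusted R2 of selection:", round(current, 3),
      "full model:", round(ceiling, 3))
```

Note that the ceiling rule can stop the selection before every informative
vector is in the model, because a candidate that would push the adjusted R2
above that of the full model is not added.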

I hope you remembered to supply weights to the pcnm() function in vegan when
you calculated your PCNM vectors for CCA, since CCA is a weighted method.

Non-significant PCNMs say nothing about spatial autocorrelation. They may
say something about spatial structure that can be expressed with your set of
spatial basis vectors.

> Additionally, for my multivariate analyses, I am using adonis to perform
> statistical tests, not cca.  Therefore, should I also be using adonis to
> look for significant pcnm axes?  I haven't seen this done, but do not see
> potential drawbacks.  Are there any?
> 
I wouldn't mix adonis() and pcnm(). The PCNMs were designed to be used with
RDA or other ordination techniques. adonis() works with dissimilarities. In
PCNM you start with distances, then you change these to basis vectors, and
for adonis() you would change those back to distances. I think this is not
wise. Why not use the PCNM distances directly? One reason is that you cannot
get them directly in vegan or other functions I know without editing vegan
functions to return them or calculating them manually. That would be rather
easy since you can just copy the commands of the pcnm() function (it seems
that one line is enough).

I think CCA and adonis are not a good coupling: CCA is a weighted method
using the Chi-square metric and adonis is an unweighted method using the
Euclidean metric. Coupling RDA and adonis, or capscale() and adonis (for the
non-Euclidean case), is more natural.

Cheers, Jari Oksanen