[R-sig-eco] PCA with vegan

Jari Oksanen jari.oksanen at oulu.fi
Tue Jun 7 09:54:59 CEST 2011


On 6/06/11 22:43 PM, "amelie_can" <amelie_qcan at hotmail.com> wrote:

> Hello all, 
> 
> I am doing a PCA using the function rda in the vegan package on two
> datasets:
> 1) species abundances (sites in rows, species in columns), previously
> Hellinger-transformed;
> 2) community-weighted means (sites in rows, traits in columns). That second
> matrix was calculated by multiplying two matrices: the first contains the
> relative contribution (pi) of each species to its community, calculated from
> the species abundances, and the second contains the quantitative traits of
> each species.
> 
> The first time I ran the PCA on each dataset, I used scale = FALSE, which
> means that my data are centered by column but not scaled to unit variance.
> From what I understood, since all the variables within a matrix are of the
> same kind, we do not need to scale them; otherwise it would be a "double
> standardization". My supervisor was not sure about that method, because the
> first axis explained 84% of the variation in my response variables. They
> thought it was too high to be right.
> 
> If I use scale = TRUE, the percentage explained by my first axis decreases.
> The help page says that it will "Scale species to unit variance (like correlations)".
> 
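The community-weighted mean (CWM) matrix described in the question can be sketched in base R as a single matrix product, CWM = P %*% T, where P holds each species' relative abundance (pi) within each site and T holds the species traits. The toy values and the names `abund`, `traits`, `p` and `cwm` are hypothetical, chosen only for illustration.

```r
# Hypothetical toy data: 3 sites (rows) x 3 species (columns) of raw abundances
abund <- matrix(c(10, 0, 5,
                   2, 8, 0,
                   1, 1, 8),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("site", 1:3), paste0("sp", 1:3)))

# Hypothetical trait table: one row per species (same order as the columns
# of abund), one column per quantitative trait
traits <- matrix(c(1.0, 20,
                   3.0, 35,
                   2.0, 10),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("sp", 1:3), c("height", "sla")))

# Relative contribution p_ij of each species to its site total
# (rowSums() is recycled down the columns, so row i is divided by its own total)
p <- abund / rowSums(abund)

# Community-weighted mean of each trait at each site
cwm <- p %*% traits
cwm
```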
Amelie,

84% is high for community data. Probably your data are not very
multivariate: instead, one or a few species contribute most of the variance,
and PCA is happy to project those extreme species.

You can inspect the equality of variances of species in a data frame or
matrix 'x' using

apply(x, 2, var)                            # variance of each species (column)
sort(apply(x, 2, var), decreasing = TRUE)   # the same, largest first

If most of the variance comes from a couple of species, it is easy to find a
one- or two-dimensional solution that explains just those species.
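A small sketch of this check, assuming a made-up 8-site by 4-species table `x` in which sp1 varies far more than the others. It computes each species' share of the total variance and the first-axis proportion of a centred, unscaled PCA. Base R `prcomp()` is used here in place of vegan's `rda()`; for an unconstrained analysis with scale = FALSE both should give the same eigenvalue proportions. Because the first eigenvalue can never be smaller than the largest single-species variance, one dominant species is enough to push the first-axis proportion up.

```r
# Hypothetical toy data: 8 sites x 4 species; sp1 varies far more than the rest
x <- data.frame(
  sp1 = c(5, 60, 12, 85, 30, 70, 8, 45),
  sp2 = c(2, 3, 1, 4, 2, 3, 2, 1),
  sp3 = c(0, 1, 2, 1, 0, 2, 1, 1),
  sp4 = c(1, 0, 1, 2, 1, 0, 1, 1)
)

# Per-species variance (apply() works for both matrices and data frames)
v <- apply(x, 2, var)

# Share of the total variance contributed by each species, largest first
share <- sort(v / sum(v), decreasing = TRUE)
round(share, 3)

# Eigenvalues of a centred, unscaled PCA; they sum to the total variance
ev <- prcomp(x, center = TRUE, scale. = FALSE)$sdev^2

# Proportion of the total variance on the first axis
pc1 <- ev[1] / sum(ev)
round(pc1, 3)
```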

Having scale = TRUE makes all species contribute the same amount (1) to the
variance, and PCA has more trouble projecting all these species. Also,
having scale = TRUE on Hellinger-transformed data is a bit weird. I assume
you have some reasons for using the Hellinger transformation, but it may not
be the most effective one if you want to reduce the influence of the most
abundant species, which usually contribute most to the total variance.
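To see what scale = TRUE does to Hellinger-transformed data, the sketch below builds a hypothetical abundance table, applies the Hellinger transformation by hand (the square root of the within-site relative abundances, which is what vegan's `decostand(x, "hellinger")` computes), and compares the eigenvalue proportions of centred PCAs with and without scaling using base R `prcomp()`. The helper `prop_axes()` and all data values are illustrative assumptions. With scaling, every species has variance 1, so the eigenvalues sum to the number of species and coincide with those of the correlation matrix.

```r
# Hypothetical toy abundance table: 6 sites x 4 species, sp1 dominant
abund <- matrix(c(50, 2, 1, 0,
                  80, 1, 0, 2,
                  10, 3, 2, 1,
                  60, 0, 1, 1,
                   5, 4, 3, 2,
                  90, 1, 1, 0),
                nrow = 6, byrow = TRUE,
                dimnames = list(NULL, paste0("sp", 1:4)))

# Hellinger transformation: square root of relative abundances within each site
hel <- sqrt(abund / rowSums(abund))

# Hypothetical helper: proportion of variance on each axis of a centred PCA
prop_axes <- function(m, use_scale) {
  ev <- prcomp(m, center = TRUE, scale. = use_scale)$sdev^2
  ev / sum(ev)
}
unscaled <- prop_axes(hel, FALSE)  # analogous to rda(hel, scale = FALSE)
scaled   <- prop_axes(hel, TRUE)   # analogous to rda(hel, scale = TRUE)
round(rbind(unscaled, scaled), 3)

# With scaling, the eigenvalues are those of the correlation matrix
ev_scaled <- prcomp(hel, center = TRUE, scale. = TRUE)$sdev^2
sum(ev_scaled)  # equals ncol(hel)
```

On real data, comparing the two rows shows how much of a high first-axis percentage depends on the scaling choice rather than on the community structure itself.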

Cheers, Jari Oksanen


