[R] bootstrapped eigenvector method following prcomp
Axel Strauß
a.strauss at tu-bs.de
Mon Jan 19 09:53:03 CET 2009
G'Day R users!
Following an ordination using prcomp, I'd like to test which variables
singnificantly contribute to a principal component. There is a method
suggested by Peres-Neto and al. 2003. Ecology 84:2347-2363 called
"bootstrapped eigenvector". It was asked for that in this forum in
January 2005 by Jérôme Lemaître:
"1) Resample 1000 times with replacement entire raws from the original
data sets []
2) Conduct a PCA on each bootstrapped sample
3) To prevent axis reflexion and/or axis reordering in the bootstrap,
here are two more steps for each bootstrapped sample
3a) calculate correlation matrix between the PCA scores of the original
and those of the bootstrapped sample
3b) Examine whether the highest absolute correlation is between the
corresponding axis for the original and bootstrapped samples. When it is
not the case, reorder the eigenvectors. This means that if the highest
correlation is between the first original axis and the second
bootstrapped axis, the loadings for the second bootstrapped axis and use
to estimate the confidence interval for the original first PC axis.
4) Determine the p value for each loading. Obtained as follow: number of
loadings >=0 for loadings that were positive in the original matrix
divided by the number of boostrap samples (1000) and/or number of
loadings =<0 for loadings that were negative in the original matrix
divided by the number of boostrap samples (1000)."
(see https://stat.ethz.ch/pipermail/r-help/2005-January/065139.html ).
The suggested solution (by Jari Oksanen) was
function (x, permutations=1000, ...)
{
pcnull <- princomp(x, ...)
res <- pcnull$loadings
out <- matrix(0, nrow=nrow(res), ncol=ncol(res))
N <- nrow(x)
for (i in 1:permutations) {
pc <- princomp(x[sample(N, replace=TRUE), ], ...)
pred <- predict(pc, newdata = x)
r <- cor(pcnull$scores, pred)
k <- apply(abs(r), 2, which.max)
reve <- sign(diag(r[k,]))
sol <- pc$loadings[ ,k]
sol <- sweep(sol, 2, reve, "*")
out <- out + ifelse(res > 0, sol <= 0, sol >= 0)
}
out/permutations
}
However, in a post from March 2005 ( http://r-help.com/msg/6125.html )
Jari himself mentioned that there is a bug in this method.
I was wondering whether someone could tell me where the bug is or
whether there is a better method in R to test for significance of
loadings (not the significance of the PCs).
Maybe it is not a good idea to do it at all, but I would prefer to have
some guidline for interpretation rather than making decisions
arbitrarily. I tried to look everywhere before posting here.
I would be very thankful for any help,
Axel
--
Gravity is a habit that is hard to shake off.
Terry Pratchett
More information about the R-help
mailing list