[R] bootstrapped eigenvector method following prcomp
Gavin Simpson
gavin.simpson at ucl.ac.uk
Mon Jan 19 10:42:36 CET 2009
Hi Alex,
I presume you've asked Jari what the bug is? He obviously knows and you
are asking a lot for the rest of us to i) know the method you speak of
and ii) debug the code of another.
As an alternative, try function testdim() in the ade4 package. This
implements the method of Dray:
Dray, S (2008) On the number of principal components: A test of
dimensionality based on measurements of similarity between matrices.
Computational Statistics and Data Analysis, 58:2228:2237.
G
On Mon, 2009-01-19 at 09:53 +0100, Axel Strauß wrote:
> G'Day R users!
>
> Following an ordination using prcomp, I'd like to test which variables
> singnificantly contribute to a principal component. There is a method
> suggested by Peres-Neto and al. 2003. Ecology 84:2347-2363 called
> "bootstrapped eigenvector". It was asked for that in this forum in
> January 2005 by Jérôme Lemaître:
> "1) Resample 1000 times with replacement entire raws from the original
> data sets []
> 2) Conduct a PCA on each bootstrapped sample
> 3) To prevent axis reflexion and/or axis reordering in the bootstrap,
> here are two more steps for each bootstrapped sample
> 3a) calculate correlation matrix between the PCA scores of the original
> and those of the bootstrapped sample
> 3b) Examine whether the highest absolute correlation is between the
> corresponding axis for the original and bootstrapped samples. When it is
> not the case, reorder the eigenvectors. This means that if the highest
> correlation is between the first original axis and the second
> bootstrapped axis, the loadings for the second bootstrapped axis and use
> to estimate the confidence interval for the original first PC axis.
> 4) Determine the p value for each loading. Obtained as follow: number of
> loadings >=0 for loadings that were positive in the original matrix
> divided by the number of boostrap samples (1000) and/or number of
> loadings =<0 for loadings that were negative in the original matrix
> divided by the number of boostrap samples (1000)."
>
> (see https://stat.ethz.ch/pipermail/r-help/2005-January/065139.html ).
>
> The suggested solution (by Jari Oksanen) was
>
>
> function (x, permutations=1000, ...)
> {
> pcnull <- princomp(x, ...)
> res <- pcnull$loadings
> out <- matrix(0, nrow=nrow(res), ncol=ncol(res))
> N <- nrow(x)
> for (i in 1:permutations) {
> pc <- princomp(x[sample(N, replace=TRUE), ], ...)
> pred <- predict(pc, newdata = x)
> r <- cor(pcnull$scores, pred)
> k <- apply(abs(r), 2, which.max)
> reve <- sign(diag(r[k,]))
> sol <- pc$loadings[ ,k]
> sol <- sweep(sol, 2, reve, "*")
> out <- out + ifelse(res > 0, sol <= 0, sol >= 0)
> }
> out/permutations
> }
>
> However, in a post from March 2005 ( http://r-help.com/msg/6125.html )
> Jari himself mentioned that there is a bug in this method.
>
> I was wondering whether someone could tell me where the bug is or
> whether there is a better method in R to test for significance of
> loadings (not the significance of the PCs).
> Maybe it is not a good idea to do it at all, but I would prefer to have
> some guidline for interpretation rather than making decisions
> arbitrarily. I tried to look everywhere before posting here.
>
> I would be very thankful for any help,
>
> Axel
>
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: This is a digitally signed message part
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090119/745bc1fe/attachment-0002.bin>
More information about the R-help
mailing list