[R] Bootstrapped eigenvector

Mon Jan 31 14:42:56 CET 2005

Jérôme,

On Sat, 2005-01-29 at 14:14 -0500, Jérôme Lemaître wrote:
> Hello alls,
> 
> I found in the literature a technique that has been evaluated as one of the
> more robust to assess statistically the significance of the loadings in a
> PCA: bootstrapping the eigenvector (Jackson, Ecology 1993, 74: 2204-2214;
> Peres-Neto and al. 2003. Ecology 84:2347-2363). However, I'm not able to
> transform by myself the following steps into a R program, yet?
> 
> Can someone could help me with this?
> 
> I thank you very much by advance.
> 
> Here are the steps that I need to perform:
> 1) Resample 1000 times with replacement entire raws from the original data
> sets (7 variables, 126 raws)
> 2) Conduct a PCA on each bootstrapped sample
> 3) To prevent axis reflexion and/or axis reordering in the bootstrap, here
> are two more steps for each bootstrapped sample
> 3a) calculate correlation matrix between the PCA scores of the original and
> those of the bootstrapped sample
> 3b) Examine whether the highest absolute correlation is between the
> corresponding axis for the original and bootstrapped samples. When it is not
> the case, reorder the eigenvectors. This means that if the highest
> correlation is between the first original axis and the second bootstrapped
> axis, the loadings for the second bootstrapped axis and use to estimate the
> confidence interval for the original first PC axis.
> 4) Determine the p value for each loading. Obtained as follow: number of
> loadings >=0 for loadings that were positive in the original matrix divided
> by the number of boostrap samples (1000) and/or number of loadings =<0 for
> loadings that were negative in the original matrix divided by the number of
> boostrap samples (1000).
> 
The following function seems to run the analysis like Peres-Neto and
others defined:

> netoboot
function (x, permutations=1000, ...)
{
   pcnull <- princomp(x, ...)
   res <- pcnull$loadings
   out <- matrix(0, nrow=nrow(res), ncol=ncol(res))
   N <- nrow(x)
   for (i in 1:permutations) {
       pc <- princomp(x[sample(N, replace=TRUE), ], ...)
       pred <- predict(pc, newdata = x)
       r <-  cor(pcnull$scores, pred)
       k <- apply(abs(r), 2, which.max)
       reve <- sign(diag(r[k,]))
       sol <- pc$loadings[ ,k]
       sol <- sweep(sol, 2, reve, "*")
       out <- out + ifelse(res > 0, sol <=  0, sol >= 0)
   }
   out/permutations
}

With typical chemical data, you should pass option cor = TRUE to
princomp. Another issue is whether you should use this method. Opinions
may be divided here, but I'll let that to the proper Statistician to
comment on.

Best wishes, Jari Oksanen

-- 
Jari Oksanen <jarioksa at sun3.oulu.fi>