[R] efficient code - yet another question
Kevin Wright
kw.statr at gmail.com
Mon Sep 29 19:12:04 CEST 2008
You should also have a look at the pcaMethods package (on
Bioconductor). I did some code optimization for the NIPALS algorithm
in that package, which sped it up by at least a factor of two. A rough
sketch of the kind of loop rearrangement that helps is at the bottom
of this message, below your code.
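
For example, something along these lines should get you the NIPALS
scores and loadings from pcaMethods (a sketch only; check the exact
argument names against the version you install):

library(pcaMethods)              # install from Bioconductor first
res <- pca(X, method = "nipals", nPcs = 10, center = TRUE)
T <- scores(res)                 # score matrix
P <- loadings(res)               # loading matrix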
Kevin Wright
On Wed, Apr 30, 2008 at 10:56 PM, steven wilson <swpt07 at gmail.com> wrote:
> Dear list members;
>
> The code given below corresponds to the PCA-NIPALS (principal
> component analysis) algorithm, adapted from the nipals function in
> the chemometrics package. The reason for using NIPALS instead of SVD
> is this algorithm's ability to handle missing values, but that's a
> different story. I've been trying to find a way to improve (if
> possible) the efficiency of the code, especially when the number of
> components to calculate is higher than 100. I've been working with a
> data set of 500 rows x 700 variables. The code gets really slow when
> the number of PCs to calculate is, for example, 600 (why that many
> components? Because I need them to detect outliers using another
> algorithm). On my old laptop running Win XP and R 2.7.0 (1 GB RAM) it
> takes around 6 or 7 minutes. That wouldn't be a problem for a single
> analysis, but when cross-validation is added the time increases
> severely. Although there are several examples on the R-help list with
> hints for improving efficiency, the truth is that I can't see which
> part of the code can be improved (though it is clear that the
> bottleneck is the for and while loops). I would really appreciate any
> ideas/comments/etc.
>
> Thanks
>
> #################################################################
>
> pca.nipals <- function(X, ncomp, iter = 50, toler = sqrt(.Machine$double.eps))
> # X ....... data matrix
> # ncomp ... number of components
> # iter .... maximal number of iterations per component
> # toler ... precision tolerance for calculation of components
> {
>     Xh <- scale(X, center = TRUE, scale = FALSE)
>     nr <- 0
>     T <- NULL; P <- NULL                 # matrices of scores and loadings
>     for (h in 1:ncomp)
>     {
>         th <- Xh[, 1]                    # starting score vector
>         ende <- FALSE
>         while (!ende)
>         {
>             nr <- nr + 1
>             # loading: t(Xh) %*% th / (t(th) %*% th), then normalised
>             ph <- t(crossprod(th, Xh) * as.vector(1 / crossprod(th)))
>             ph <- ph * as.vector(1 / sqrt(crossprod(ph)))
>             # updated score: Xh %*% ph / (t(ph) %*% ph)
>             thnew <- t(tcrossprod(t(ph), Xh) * as.vector(1 / crossprod(ph)))
>             prec <- crossprod(th - thnew)
>             th <- thnew
>             if (prec <= (toler^2)) ende <- TRUE   # converged
>             if (iter <= nr) ende <- TRUE          # didn't converge
>         }
>         Xh <- Xh - tcrossprod(th, ph)    # deflate: Xh - th %*% t(ph)
>         T <- cbind(T, th); P <- cbind(P, ph)
>         nr <- 0
>     }
>     list(T = T, P = P)
> }
>
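For what it's worth, here is a rough sketch of the same loop with the
explicit transposes removed and the score/loading matrices
preallocated (untested, complete-data only, and not what went into
pcaMethods, just the kind of rearrangement that tends to help):

pca.nipals2 <- function(X, ncomp, iter = 50, toler = sqrt(.Machine$double.eps))
{
    Xh <- scale(X, center = TRUE, scale = FALSE)
    T <- matrix(0, nrow(X), ncomp)               # scores, preallocated
    P <- matrix(0, ncol(X), ncomp)               # loadings, preallocated
    for (h in 1:ncomp) {
        th <- Xh[, 1]                            # starting score vector
        for (nr in 1:iter) {
            ph <- crossprod(Xh, th) / sum(th * th)   # t(Xh) %*% th without t()
            ph <- ph / sqrt(sum(ph * ph))            # unit-length loading
            thnew <- Xh %*% ph                       # ph already normalised
            converged <- sum((th - thnew)^2) <= toler^2
            th <- thnew
            if (converged) break
        }
        Xh <- Xh - tcrossprod(th, ph)            # deflate: Xh - th %*% t(ph)
        T[, h] <- th
        P[, h] <- ph
    }
    list(T = T, P = P)
}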