[Rd] More efficient code?

Gavin Simpson gavin.simpson at ucl.ac.uk
Wed Jul 6 20:54:49 CEST 2005


Dear List,

I have some code (as part of a larger package I intend to submit to
CRAN) that implements a permutation test of a method known as Co-
correspondence Analysis [1]. The code has to perform n permutations
(default n = 99) for each axis of the method. In the example below,
there are 29 axes that need testing, so c. 2800 permutations. The code
uses a number of calls to LA.svd() and "%*%". The particular function
that is called once per permutation is:

coinertiaI <- function(X, Y)
  {
    A <- t(X) %*% Y
    svdA <- La.svd(A)
    Ksi <- X %*% svdA$u
    Psi <- Y %*% t(svdA$vt)
    L <- diag(svdA$d)^2
    retval <- list(weights = list(X = svdA$u, Y = t(svdA$vt)),
                   scores = list(X = Ksi, Y = Psi),
                   lambda = L, call = match.call())
    class(retval) <- c("coinertiaI", "coinertia")
    return(retval)
  }

On my P4 3GHz, 2GB RAM desktop the following problem takes c. 80 seconds
to complete with 99 permutations for each of 29 axis:

beetles <-  read.csv(url
("http://ecrc3.geog.ucl.ac.uk/download/cocorresp/beetles.csv"),
                     strip.white = TRUE, row.names = 1)
beetles <- log(beetles + 1)
plants <- read.csv(url
("http://ecrc3.geog.ucl.ac.uk/download/cocorresp/plants.csv"),
                   strip.white = TRUE, row.names = 1)
source(url
("http://ecrc3.geog.ucl.ac.uk/download/cocorresp/cocorresp.R"))
beetlesPlants.pred <- predcoca(beetles, plants)
# slow!!!!
system.time(beetlesPlants.perm <- predcoca.perm(beetlesPlants.pred),
TRUE)

I would like to speed up this function if feasible given that I
currently know almost zero C and Fortran. I am willing to invest the
time to learn and if it would make a substantial difference to the
timings for this example then this may the be the right time (and
project) to begin that education.

I have profiled the timed task above (results abbreviated below)
If the full Rprof.out file helps then I have placed this here:
http://ecrc3.geog.ucl.ac.uk/download/cocorresp/Rprof.out

...Which to my naive eye would suggest that a large % of the time is
spent calling compiled code - setting up the calls. As such, if I want
to make any decent speed improvements I would have to look into moving
the functions coinertiaI() and teststat() to C or Fortran (teststat()
can be viewed in the file I source above if needed).

Is this interpretation correct?

If anyone is brave enough to take a look at the sourced file (see above)
and suggest how I might improve my code I would, of course be extremely
grateful - but that is not my motivation for posting. I would be more
than happy to have my interpretation of the profiling results confirmed
or corrected.

Many thanks in advance,

Gav

Ref [1] C.J. ter Braak and A.P. Schaffers (2004) Cocorrespondence
Analysis: a new ordination method to relate two community compositions.
Ecology 85(3), 834-846.

summaryRprof(filename = "Rprof.out")
$by.self
                     self.time self.pct total.time total.pct
".Call"                  50.40     59.6      50.40      59.6
"%*%"                    10.50     12.4      10.50      12.4
".Fortran"                2.78      3.3       2.78       3.3
"La.svd"                  2.52      3.0      57.98      68.6
"^"                       2.48      2.9       2.48       2.9
"t.default"               2.30      2.7       2.30       2.7
"matrix"                  1.92      2.3       1.98       2.3
"as.double"               1.78      2.1       2.70       3.2
"list"                    0.88      1.0       0.88       1.0
"as.double.default"       0.86      1.0       0.86       1.0
"rep.default"             0.74      0.9       0.86       1.0
"sum"                     0.72      0.9       2.58       3.1
"!"                       0.50      0.6       0.50       0.6
"diag"                    0.44      0.5       1.46       1.7
"qr.coef"                 0.42      0.5       6.26       7.4
"permtest"                0.38      0.4      83.60      98.9
"is.finite"               0.38      0.4       0.38       0.4
"storage.mode<-"          0.36      0.4       3.26       3.9
"as.integer"              0.34      0.4       0.40       0.5
"t"                       0.32      0.4       2.62       3.1
"$"                       0.32      0.4       0.32       0.4
"paste"                   0.28      0.3       0.54       0.6
"teststat"                0.24      0.3      83.10      98.3
"residualMatrix"          0.22      0.3      10.32      12.2
"qr"                      0.20      0.2       1.24       1.5
...

$by.total
                     total.time total.pct self.time self.pct
"predcoca.perm"           84.50     100.0      0.00      0.0
"permtest"                83.60      98.9      0.38      0.4
"teststat"                83.10      98.3      0.24      0.3
"coinertiaI"              73.24      86.7      0.18      0.2
"La.svd"                  57.98      68.6      2.52      3.0
".Call"                   50.40      59.6     50.40     59.6
"%*%"                     10.50      12.4     10.50     12.4
"residualMatrix"          10.32      12.2      0.22      0.3
"qr.coef"                  6.26       7.4      0.42      0.5
"storage.mode<-"           3.26       3.9      0.36      0.4
".Fortran"                 2.78       3.3      2.78      3.3
"as.double"                2.70       3.2      1.78      2.1
"t"                        2.62       3.1      0.32      0.4
"eval"                     2.62       3.1      0.14      0.2
"sum"                      2.58       3.1      0.72      0.9
...

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson                     [T] +44 (0)20 7679 5522
ENSIS Research Fellow             [F] +44 (0)20 7679 7565
ENSIS Ltd. & ECRC                 [E] gavin.simpsonATNOSPAMucl.ac.uk
UCL Department of Geography       [W] http://www.ucl.ac.uk/~ucfagls/cv/
26 Bedford Way                    [W] http://www.ucl.ac.uk/~ucfagls/
London.  WC1H 0AP.
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



More information about the R-devel mailing list