[R-sig-hpc] Overhead in computation of SVD

Dirk Eddelbuettel edd at debian.org
Sat Jun 4 17:38:29 CEST 2011


Hi Gappy,

On 4 June 2011 at 10:39, Giuseppe Paleologo wrote:
| R uses LAPACK for svd computation, but it is my understanding from a post on
| Radford Neal's blog (
| http://radfordneal.wordpress.com/2011/05/21/slowing-down-matrix-multiplication-in-r/#more-739),
| that additional checks are performed to propagate NA/NaNs and Inf (at least
| on level-3 BLAS calls, not sure about LAPACK). I have to run a large number of
| svds on large dense matrices, and am looking for efficient execution of

How large is large?  

Would a transfer to a GPU be worth it?  My initial explorations in the gcbd
package/vignette (using the gputools package) showed that there is a
crossover point, but it requires 'large' data.  And it will also depend
somewhat on which BLAS implementation this would replace, and so on.  This is
a fast-moving target and new solutions keep appearing --- check e.g. this
(commercial) offering of drop-in BLAS replacements using the GPU:
http://www.culatools.com.  The single precision version is free; double
precision is not.

And what do you actually need?  It seems you can truncate the number of
singular vectors to just a few, so speed gains can be had (as per the help
page of svd() in R itself, and see the sketch below) and also via irlba
(more below).
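
For instance, base R lets you restrict how many singular vectors are
returned via the nu and nv arguments.  A minimal sketch (the matrix X and
the truncation level k are made up for illustration):

  set.seed(42)
  X <- matrix(rnorm(1000 * 200), nrow = 1000)  # example data only
  k <- 5                                       # singular vectors wanted
  s <- svd(X, nu = k, nv = k)                  # return only k left/right vectors
  str(s$u)                                     # 1000 x 5
  str(s$v)                                     # 200 x 5

Note that, as far as I can tell, LAPACK still computes the thin
decomposition underneath and R merely truncates the result, so the savings
are mostly in memory; a genuinely truncated method such as irlba avoids
that work altogether.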

| these decompositions. I think a truncated svd would be acceptable, and R
| does not have a base routine for this. My questions:
| 
| 1. when there are no NAs, is there a benefit in binding directly to DGESVD

I would call that an empirical issue, and would suggest you try it on a few
different matrix sizes and approaches.
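
A minimal timing harness along those lines might look as follows; the
sizes are placeholders, and you can swap in whatever routines you want to
compare:

  sizes <- c(200, 500, 1000)                   # placeholder matrix sizes
  for (n in sizes) {
      X <- matrix(rnorm(n * n), nrow = n)
      t1 <- system.time(svd(X))["elapsed"]     # standard, NA-checked path
      t2 <- system.time(La.svd(X))["elapsed"]  # more direct LAPACK interface
      cat(sprintf("n = %4d  svd: %6.2fs  La.svd: %6.2fs\n", n, t1, t2))
  }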

| 2. has anyone tried packages svd (uses PROPACK routines) and irlba?

Jeff already pointed you to the irlba docs.  Jeff and I are true fanboys of
Bryan Lewis, who has been working on this for a while now.  Given that irlba
is now on CRAN, nothing stops you from including it in the empirical tests
suggested above.  We are all eager to hear about timing comparisons...
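
To be concrete, a call can be as simple as the sketch below; the argument
names follow the irlba documentation, but do check ?irlba for your
installed version:

  library(irlba)                                # install.packages("irlba")
  X <- matrix(rnorm(5000 * 1000), nrow = 5000)  # example data only
  fit <- irlba(X, nu = 5, nv = 5)               # leading 5 singular triplets
  fit$d                                         # singular values
  dim(fit$u); dim(fit$v)                        # 5000 x 5 and 1000 x 5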

If, however, you need all singular vectors, then the gain may be less
drastic.  It all depends.

Lastly, I have a lot of respect for Radford Neal and his critique, but I feel
he forgets what John Chambers (in Chapter 1 of his "Software for Data
Analysis") called the 'prime directive': "trustworthy software".  R has no
choice but to test for NAs, so it is a little unfair to bemoan the fact that
it is rigorous about checking and validating inputs.  You cannot have your
cake ("fastest possible R") and eat it too ("general purpose data programming
environment").  We have seen in some Rcpp benchmarks that skipping tests for
NAs etc. can increase speed.  That said, I think Knuth's "root of all evil"
comment about premature optimisation is important here too.  But if you can
be sure your matrices are all good, then skipping the tests may be an
alternative for a specific solution.
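
In code, that could be a single cheap guard before the hot loop; the
function fast_svd_no_checks below is hypothetical and stands in for
whatever NA-oblivious routine you end up binding to:

  stopifnot(all(is.finite(X)))        # catches NA, NaN and Inf in one pass
  for (i in seq_len(n_reps)) {
      s <- fast_svd_no_checks(X)      # hypothetical unchecked SVD routine
      ## ... use s$d, s$u, s$v ...
  }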

Cheers, Dirk


-- 
Gauss once played himself in a zero-sum game and won $50.
                      -- #11 at http://www.gaussfacts.com


