[Rd] svd for Large Matrix

Radford Neal radford at cs.toronto.edu
Mon Aug 16 17:30:32 CEST 2021


> Dario Strbenac <dstr7320 using uni.sydney.edu.au> writes:
>
> I have a real scenario involving 45 million biological cells
> (samples) and 60 proteins (variables) which leads to a segmentation
> fault for svd. I thought this might be a good example of why it
> might benefit from a long vector upgrade.

Rather than the full SVD of a 45000000x60 X, my guess is that you
may really only be interested in the eigenvalues and eigenvectors of
X^T X (the eigenvalues are the squared singular values of X, and the
eigenvectors are its right singular vectors), in which case
eigen(t(X)%*%X) would probably be much faster. (And
eigen(crossprod(X)) would be even faster.)
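
If even crossprod(X) is a problem at this size, X^T X can be built up
from blocks of rows, since it is the sum of the cross-products of the
blocks. The following is only a rough sketch on a small matrix; the
helper name crossprod_by_blocks and the block size are made up for
illustration, and I have not tried it on data of the size described.

```r
# Hypothetical helper (not part of base R): accumulate X^T X over
# blocks of rows, so only one block is passed to crossprod() at a time.
crossprod_by_blocks <- function(X, block_rows = 1000L) {
  p <- ncol(X)
  acc <- matrix(0, p, p)
  for (start in seq(1, nrow(X), by = block_rows)) {
    end <- min(start + block_rows - 1, nrow(X))
    acc <- acc + crossprod(X[start:end, , drop = FALSE])
  }
  acc
}

# Small check against crossprod() on the whole matrix.
set.seed(3)
X <- matrix(rnorm(50000), 5000, 10)
err_blocks <- max(abs(crossprod_by_blocks(X, 700L) - crossprod(X)))
```

Here err_blocks should be at the level of rounding error. The result
can then be passed to eigen() as above.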

Note that if you instead want the eigenvalues and eigenvectors of
X X^T (which is an enormous matrix), its nonzero eigenvalues are the
same as those of X^T X, and the corresponding eigenvectors are Xv
(up to normalization), where v is an eigenvector of X^T X.
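
The relation for X X^T can be checked numerically on a matrix small
enough that X X^T can actually be formed. This is just an illustrative
sketch; the dimensions and variable names are arbitrary choices.

```r
# Small example so that X X^T (20 x 20) can be formed and checked.
set.seed(1)
X <- matrix(rnorm(200), 20, 10)

e <- eigen(crossprod(X), symmetric = TRUE)   # eigen of X^T X (10 x 10)
lambda <- e$values
V <- e$vectors

# Candidate eigenvectors of X X^T: the columns of X V, rescaled to unit
# length. Each column X v has norm sqrt(lambda), because
# |X v|^2 = v^T X^T X v = lambda.
U <- sweep(X %*% V, 2, sqrt(lambda), "/")

# Check that (X X^T) U = U diag(lambda) and that U has orthonormal columns.
XXt <- tcrossprod(X)
err_eig  <- max(abs(XXt %*% U - sweep(U, 2, lambda, "*")))
err_orth <- max(abs(crossprod(U) - diag(10)))
```

Both err_eig and err_orth should be at the level of rounding error.
The remaining 10 eigenvalues of the 20x20 matrix X X^T are zero.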

For example, with R 4.0.2, and the reference BLAS/LAPACK, I get

  > X<-matrix(rnorm(100000),10000,10)
  > system.time(for(i in 1:1000) rs<-svd(X))
     user  system elapsed
    2.393   0.008   2.403
  > system.time(for(i in 1:1000) re<-eigen(crossprod(X)))
     user  system elapsed
    0.609   0.000   0.609
  > rs$d^2
   [1] 10568.003 10431.864 10318.959 10219.961 10138.025 10068.566  9931.538
   [8]  9813.841  9703.818  9598.532
  > re$values
   [1] 10568.003 10431.864 10318.959 10219.961 10138.025 10068.566  9931.538
   [8]  9813.841  9703.818  9598.532
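
If the singular vectors are wanted as well, they can be recovered from
the eigendecomposition: the eigenvectors of X^T X are the right
singular vectors, and each left singular vector is Xv divided by the
corresponding singular value. A sketch, assuming X has full column
rank so that no singular value is zero:

```r
set.seed(2)
X <- matrix(rnorm(100000), 10000, 10)

re <- eigen(crossprod(X), symmetric = TRUE)
d <- sqrt(re$values)              # singular values of X
V <- re$vectors                   # right singular vectors (up to sign)
U <- sweep(X %*% V, 2, d, "/")    # left singular vectors: u = X v / d

# Compare with svd(). Individual vectors may differ in sign, so check
# the singular values and the reconstruction U diag(d) V^T instead.
rs <- svd(X)
err_d   <- max(abs(d - rs$d))
err_rec <- max(abs(U %*% (d * t(V)) - X))
```

One caveat: forming X^T X squares the condition number, so this route
can be less accurate than svd for very small singular values.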

Possibly some other LAPACK might implement svd better, though I
suspect that R will allocate more big matrices than really necessary
for the svd even aside from whatever LAPACK is doing.

Regards,

   Radford Neal


