[R] Speed up sum of outer products?

AjayT ajaytalati at googlemail.com
Mon Mar 14 22:37:06 CET 2011


Hi Dennis, sorry for the delayed reply, and thanks for the article. I dug
into it and found that if you have a GPU, the CUBLAS library beats the
BLAS/ATLAS implementation in the Matrix package for 'large' problems. Here's
what I mean:

its = 2500
dim = 1750

X = matrix(rnorm(its*dim), its, dim)

# single thread - accumulate the sum of outer products in a loop
system.time({C = matrix(0, dim, dim); for(i in 1:its) C = C + (X[i,] %o% X[i,])})
# single thread - BLAS matrix multiply
system.time({C1 = t(X) %*% X})
# single thread - BLAS crossprod
system.time({C2 = crossprod(X)})
# multithread - CUBLAS cublasSgemm function via gputools
library(gputools)
system.time({C3 = gpuCrossprod(X, X)})
# all.equal() compares only two objects at a time, so chain the comparisons
print(all.equal(C, C1) && all.equal(C1, C2) && all.equal(C2, C3))
   user  system elapsed 
 27.210   6.680  33.342 
   user  system elapsed 
  6.260   0.000   5.982 
   user  system elapsed 
  4.340   0.000   4.284 
   user  system elapsed 
   1.49    0.00    1.48 
[1] TRUE
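For anyone following along without a GPU: the loop and crossprod() really do compute the same matrix, since the sum of outer products of the rows of X is exactly t(X) %*% X. A small base-R sketch of that identity (toy dimensions, no gputools needed):

```r
# Check that sum_i x_i %o% x_i equals crossprod(X) on a small example
set.seed(1)
X <- matrix(rnorm(10 * 4), 10, 4)

C_loop <- matrix(0, 4, 4)
for (i in 1:nrow(X)) C_loop <- C_loop + (X[i, ] %o% X[i, ])

stopifnot(isTRUE(all.equal(C_loop, crossprod(X))))
```

The timings above are just three routes to this same result, with increasing amounts of work pushed into optimized (and, for gpuCrossprod, parallel) library code.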

The last timing shows roughly a 3x speed-up on my dated 16-core graphics
card, compared with my quad-core CPU. I should be able to try this out on a
512-core card in the next few days, and will post the result.

All the best,

Aj 

--
View this message in context: http://r.789695.n4.nabble.com/Speed-up-sum-of-outer-products-tp3330160p3355139.html
Sent from the R help mailing list archive at Nabble.com.


