[R] Speed up sum of outer products?
AjayT
ajaytalati at googlemail.com
Mon Mar 14 22:37:06 CET 2011
Hi Dennis, sorry for the delayed reply, and thanks for the article. I dug
into it and found that if you have a GPU, the CUBLAS library beats the
BLAS/ATLAS implementation used by the Matrix package for 'large' problems.
Here's what I mean:
its = 2500
dim = 1750
X = matrix(rnorm(its * dim), its, dim)
# single thread - explicit loop over outer products
system.time({C = matrix(0, dim, dim); for (i in 1:its) C = C + (X[i, ] %o% X[i, ])})
# single thread - BLAS matrix mult
system.time({C1 = t(X) %*% X})
# single thread - BLAS crossprod
system.time({C2 = crossprod(X)})
# multithread - CUBLAS cublasSgemm function
library(gputools)
system.time({C3 = gpuCrossprod(X, X)})
# all.equal() only compares two objects (a third positional argument is
# taken as the tolerance), so chain the pairwise comparisons
print(isTRUE(all.equal(C, C1)) && isTRUE(all.equal(C1, C2)) && isTRUE(all.equal(C2, C3)))
## loop of outer products
   user  system elapsed
 27.210   6.680  33.342
## t(X) %*% X
   user  system elapsed
  6.260   0.000   5.982
## crossprod(X)
   user  system elapsed
  4.340   0.000   4.284
## gpuCrossprod(X, X)
   user  system elapsed
   1.49    0.00    1.48
[1] TRUE
The last timing shows roughly a 3x speed-up over crossprod(), using my dated
16-core graphics card against my quad-core CPU. I should be able to try this
out on a 512-core card in the next few days, and will post the result.
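For anyone wondering why the four results agree: the loop and the matrix products compute the same thing, since the sum of outer products of the rows of X is exactly t(X) %*% X, which crossprod() evaluates in a single BLAS call. A minimal CPU-only sketch of that identity (small sizes chosen just for illustration):

```r
# Verify that summing the outer products of the rows of X
# reproduces crossprod(X) = t(X) %*% X.
set.seed(1)
its <- 50
d <- 20
X <- matrix(rnorm(its * d), its, d)

# explicit accumulation of rank-1 outer products
C_loop <- matrix(0, d, d)
for (i in 1:its) C_loop <- C_loop + X[i, ] %o% X[i, ]

# one BLAS call
C_blas <- crossprod(X)

stopifnot(isTRUE(all.equal(C_loop, C_blas)))
```

One caveat if you compare against the GPU result: cublasSgemm works in single precision, so a comparison like all.equal(C2, C3) may need a loosened tolerance (e.g. tolerance = 1e-4) rather than the default.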
All the best,
Aj