[R] Using OpenBLAS with R
Prof Brian Ripley
ripley at stats.ox.ac.uk
Sun Nov 16 11:29:04 CET 2014
On 16/11/2014 00:11, Michael Hannon wrote:
> Greetings. I'd like to get some advice about using OpenBLAS with R, rather
> than using the BLAS that comes built in to R.
That was really a topic for the R-devel list: see the posting guide.
> I've tried this on my Fedora 20 system (see the appended for details). I ran
> a simple test -- multiplying two large matrices -- and the results were very
> impressive, i.e., in favor of OpenBLAS, which is consistent with discussions
> I've seen on the web.
If that is all you do, then you should be using an optimized BLAS, and
should choose the one(s) best for your (unstated) machine(s).
> My concern is that maybe this is too good to be true. I.e., the standard R
> configuration is vetted by thousands of people every day. Can I have the same
> degree of confidence with OpenBLAS that I have in the built-in version?
No. And it is 'too good to be true' for most users of R, for whom BLAS
operations take a negligible proportion of their CPU time.
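
One rough way to check whether this applies to a given workload is to
time the BLAS-heavy step separately from the rest. The following is only
a sketch under stated assumptions: the matrix size, the use of sort() as
a stand-in for non-BLAS work, and the helper cpu_seconds() are
illustrative and do not come from the original post.

```r
## Sketch: compare CPU and elapsed time for a BLAS-bound step and a
## non-BLAS step. Sizes and the choice of sort() are arbitrary assumptions.
set.seed(42)
n <- 500
aMat <- matrix(rnorm(n * n), nrow = n)
bMat <- matrix(rnorm(n * n), nrow = n)
x <- rnorm(5e5)

## Hypothetical helper: CPU seconds used by this R process
## (user + system), taken from a system.time() result.
cpu_seconds <- function(t) {
  as.numeric(t["user.self"]) + as.numeric(t["sys.self"])
}

t_blas  <- system.time(cMat <- aMat %*% bMat)  # matrix product, done by the BLAS
t_other <- system.time(sx <- sort(x))          # no BLAS involved

timings <- rbind(
  blas  = c(cpu = cpu_seconds(t_blas),
            elapsed = as.numeric(t_blas["elapsed"])),
  other = c(cpu = cpu_seconds(t_other),
            elapsed = as.numeric(t_other["elapsed"]))
)
print(timings)
```

If the "blas" row is a small share of a real job's total, a faster BLAS
will change little. With a multi-threaded BLAS the cpu figure for the
matrix product can exceed its elapsed figure, because CPU time is summed
over threads.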
> And/or are there other caveats to using OpenBLAS of which I should be aware?
Yes: see the 'R Installation and Administration Manual'. Known issues
include:
1) Optimized BLAS trade accuracy for speed. A surprising amount of
published R code relies on extended-precision FPU registers being used
for intermediate results, which optimized BLAS do much less than the
reference BLAS.
Some packages rely on a particular sign of the solution to svd or eigen
problems: people then report as bugs that optimized BLAS give a
different sign from the reference BLAS.
2) Fast BLAS normally use multi-threading: that usually helps elapsed
time for a single R task at the expense of increased total CPU time.
Fine if you have unused CPU cores, but not advantageous on a fully used
multi-core machine, e.g. one that is running many R sessions in parallel.
3) Many BLAS optimize their use of CPU caches. This works best if the
BLAS-using process is the only task running on a particular core (or on
a particular CPU, where its cores share cache). (It also means that
optimizing on one CPU model and running on another can be disastrous.)
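
The sign issue in point 1 can be illustrated with a small example.
Eigenvectors are only determined up to sign, so code that compares them
across BLAS implementations may want to fix a sign convention first and
compare with a tolerance rather than exactly. This is a minimal sketch:
normalize_sign() is a hypothetical helper (not part of base R), the
convention it uses is one of several possible, and the sign flip is
simulated by negating the vectors rather than by switching BLAS.

```r
## Hypothetical helper: flip each column so that its entry of largest
## absolute value is positive.
normalize_sign <- function(V) {
  idx <- apply(abs(V), 2, which.max)            # row of the largest |entry| per column
  s <- sign(V[cbind(idx, seq_len(ncol(V)))])    # sign of that entry
  s[s == 0] <- 1                                # leave all-zero columns unchanged
  sweep(V, 2, s, `*`)                           # multiply column j by s[j]
}

set.seed(42)
X <- matrix(rnorm(200), nrow = 40)   # 40 x 5 data matrix
S <- crossprod(X)                    # symmetric 5 x 5 matrix
e <- eigen(S, symmetric = TRUE)

V1 <- e$vectors
V2 <- -e$vectors                     # equally valid eigenvectors, opposite sign

## V2 still satisfies S %*% V = V %*% diag(lambda), up to rounding error
stopifnot(isTRUE(all.equal(S %*% V2, V2 %*% diag(e$values))))

## A direct comparison fails; a sign-normalized comparison succeeds
same_naive      <- isTRUE(all.equal(V1, V2))
same_normalized <- isTRUE(all.equal(normalize_sign(V1), normalize_sign(V2)))
```

all.equal() compares with a numerical tolerance, which also absorbs the
small rounding differences between BLAS implementations that
identical() would flag.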
>
> Thanks.
>
> -- Mike
>
> #### Here's the version of R, compiled locally with configuration options:
> #### ./configure --enable-R-shlib --enable-BLAS-shlib
>
> $ R
>
> R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
> Copyright (C) 2014 The R Foundation for Statistical Computing
> Platform: x86_64-unknown-linux-gnu (64-bit)
> .
> .
> .
>
> #### Here's the R source code for this little test:
>
> library(microbenchmark)
>
> mSize <- 10000
> set.seed(42)
>
> aMat <- matrix(rnorm(mSize * mSize), nrow=mSize)
> bMat <- matrix(rnorm(mSize * mSize), nrow=mSize)
>
> cMat <- aMat %*% bMat ## do the calculation once to see that it works
>
> traceCMat <- sum(diag(cMat)) ## a mild sanity check on the calculation
> traceCMat
>
> microbenchmark(aMat %*% bMat, times=5L) ## repeat a few more times
>
> -----
>
> #### Here is the output from the code, running under various conditions:
>
>> traceCMat ###### Using the built-in BLAS from R
> [1] -11367.55
>> microbenchmark(aMat %*% bMat, times=5L)
> Unit: seconds
>           expr      min       lq     mean   median       uq     max neval
>  aMat %*% bMat 675.0064 675.5325 675.4897 675.5857 675.6618 675.662     5
>
> ----------
>
>> traceCMat ###### Using libopenblas.so from Fedora
> [1] -11367.55
>> microbenchmark(aMat %*% bMat, times=5L)
> Unit: seconds
>           expr      min       lq     mean   median       uq      max neval
>  aMat %*% bMat 70.67843 70.70545 70.76365 70.73026 70.83935 70.86475     5
>>
>
> ----------
>
>> traceCMat <- sum(diag(cMat)) ###### libopenblas.so from Fedora with
>> traceCMat ###### export OMP_NUM_THREADS=6
> [1] -11367.55
>> microbenchmark(aMat %*% bMat, times=5L)
> Unit: seconds
>           expr      min       lq    mean   median       uq      max neval
>  aMat %*% bMat 69.99146 70.02426 70.3466 70.08327 70.39537 71.23866     5
>>
>
> ###### Fedora libopenblas.so appears to be single-threaded
>
> ----------
>
>> traceCMat <- sum(diag(cMat)) ###### libopenblas.so compiled locally
>> traceCMat ###### from source w/OMP_NUM_THREADS=6
> [1] -11367.55
>> microbenchmark(aMat %*% bMat, times=5L)
> Unit: seconds
>           expr      min       lq     mean   median       uq      max neval
>  aMat %*% bMat 26.77385 27.10434 27.17862 27.12485 27.16301 27.72705     5
>>
>
> ###### Locally-compiled openblas appears to be multi-threaded
> ###### The microbenchmark appeared to use all 8 processors, even
> ###### though I asked for only 6.
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK