[R] Using OpenBLAS with R

Prof Brian Ripley ripley at stats.ox.ac.uk
Sun Nov 16 11:29:04 CET 2014


On 16/11/2014 00:11, Michael Hannon wrote:
> Greetings.  I'd like to get some advice about using OpenBLAS with R, rather
> than using the BLAS that comes built in to R.

That was really a topic for the R-devel list: see the posting guide.

> I've tried this on my Fedora 20 system (see the appended for details).  I ran
> a simple test -- multiplying two large matrices -- and the results were very
> impressive, i.e., in favor of OpenBLAS, which is consistent with discussions
> I've seen on the web.

If that is all you do, then you should be using an optimized BLAS, 
choosing the one(s) best suited to your (unstated) machine(s).

> My concern is that maybe this is too good to be true.  I.e., the standard R
> configuration is vetted by thousands of people every day.  Can I have the same
> degree of confidence with OpenBLAS that I have in the built-in version?

No.  And it is 'too good to be true' for most users of R, for whom BLAS 
operations take a negligible proportion of their CPU time.

> And/or are there other caveats to using OpenBLAS of which I should be aware?

Yes: see the 'R Installation and Administration Manual'.  Known issues 
include:

1) Optimized BLAS trade accuracy for speed.  A surprising amount of 
published R code relies on extended-precision FPU registers being used 
for intermediate results, which optimized BLAS do much less than the 
reference BLAS.

Some packages rely on a particular sign in the solutions of svd or eigen 
problems: people then report it as a bug when an optimized BLAS returns 
a different (equally valid) sign from the reference BLAS.
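
The sign indeterminacy is not specific to any one BLAS; a minimal R 
sketch (matrix and seed are arbitrary) shows that flipping the sign of 
a matched pair of singular vectors gives an equally valid decomposition:

```r
## If (u, d, v) is a valid SVD, so is the version with the sign of the
## i-th column of BOTH u and v flipped -- different BLAS may return either.
set.seed(1)
m <- matrix(rnorm(9), 3, 3)
s <- svd(m)
u2 <- s$u; v2 <- s$v
u2[, 1] <- -u2[, 1]; v2[, 1] <- -v2[, 1]
## Both reconstruct m, so code comparing s$u element-wise against stored
## values is fragile across BLAS implementations.
all.equal(m, s$u %*% diag(s$d) %*% t(s$v))  # TRUE
all.equal(m, u2 %*% diag(s$d) %*% t(v2))    # TRUE
```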

2) Fast BLAS normally use multi-threading: that usually reduces elapsed 
time for a single R task at the expense of increased total CPU time. 
That is fine if you have unused CPU cores, but not advantageous on a 
fully-used multi-core machine, e.g. one running many R sessions in 
parallel.
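
The threading can be capped per process through environment variables; a 
sketch, assuming an OpenBLAS built with its own threading layer (the 
script name is a placeholder):

```shell
# Limit the BLAS to one thread per process, e.g. when running many
# R jobs in parallel on one machine:
export OPENBLAS_NUM_THREADS=1   # OpenBLAS-specific setting
export OMP_NUM_THREADS=1        # generic OpenMP fallback
R --vanilla -f bench.R          # 'bench.R' is a hypothetical script
```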

3) Many BLAS optimize their use of CPU caches.  This works best if the 
BLAS-using process is the only task running on a particular core (or on 
a particular CPU, where cores share a cache).  (It also means that a 
BLAS tuned on one CPU model can perform disastrously on another.)
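
On Linux the core affinity can be fixed explicitly; a sketch using 
'taskset' from util-linux (the core number is arbitrary and the script 
name is a placeholder):

```shell
# Pin an R session to core 0 so the BLAS's cache-blocking is not
# disturbed by other tasks migrating onto the same core:
taskset -c 0 R --vanilla -f bench.R   # 'bench.R' is hypothetical
```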


>
> Thanks.
>
> -- Mike
>
> #### Here's the version of R, compiled locally with configuration options:
> #### ./configure --enable-R-shlib --enable-BLAS-shlib
>
> $ R
>
> R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
> Copyright (C) 2014 The R Foundation for Statistical Computing
> Platform: x86_64-unknown-linux-gnu (64-bit)
> .
> .
> .
>
> #### Here's the R source code for this little test:
>
> library(microbenchmark)
>
> mSize <- 10000
> set.seed(42)
>
> aMat <- matrix(rnorm(mSize * mSize), nrow=mSize)
> bMat <- matrix(rnorm(mSize * mSize), nrow=mSize)
>
> cMat <- aMat %*% bMat  ## do the calculation once to see that it works
>
> traceCMat <- sum(diag(cMat))  ## a mild sanity check on the calculation
> traceCMat
>
> microbenchmark(aMat %*% bMat, times=5L)  ## repeat a few more times
>
> -----
>
> #### Here is the output from code, running under various conditions:
>
>> traceCMat ###### Using the built-in BLAS from R
> [1] -11367.55
>> microbenchmark(aMat %*% bMat, times=5L)
> Unit: seconds
>            expr      min       lq     mean   median       uq     max neval
>   aMat %*% bMat 675.0064 675.5325 675.4897 675.5857 675.6618 675.662     5
>
> ----------
>
>> traceCMat  ###### Using libopenblas.so from Fedora
> [1] -11367.55
>> microbenchmark(aMat %*% bMat, times=5L)
> Unit: seconds
>            expr      min       lq     mean   median       uq      max neval
>   aMat %*% bMat 70.67843 70.70545 70.76365 70.73026 70.83935 70.86475     5
>>
>
> ----------
>
>> traceCMat <- sum(diag(cMat))  ###### libopenblas.so from Fedora with
>> traceCMat                     ###### export OMP_NUM_THREADS=6
> [1] -11367.55
>> microbenchmark(aMat %*% bMat, times=5L)
> Unit: seconds
>            expr      min       lq    mean   median       uq      max neval
>   aMat %*% bMat 69.99146 70.02426 70.3466 70.08327 70.39537 71.23866     5
>>
>
> ###### Fedora libopenblas.so appears to be single-threaded
>
> ----------
>
>> traceCMat <- sum(diag(cMat))  ###### libopenblas.so compiled locally
>> traceCMat                     ###### from source w/OMP_NUM_THREADS=6
> [1] -11367.55
>> microbenchmark(aMat %*% bMat, times=5L)
> Unit: seconds
>            expr      min       lq     mean   median       uq      max neval
>   aMat %*% bMat 26.77385 27.10434 27.17862 27.12485 27.16301 27.72705     5
>>
>
> ###### Locally-compiled openblas appears to be multi-threaded
> ###### The microbenchmark appeared to use all 8 processors, even
> ###### though I asked for only 6.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK
