[R] Compiling R with multi-threaded BLAS math libraries - why not actually ?

Sat Jun 12 15:39:13 CEST 2010

On Sat, Jun 12, 2010 at 6:18 AM, Tal Galili <tal.galili at gmail.com> wrote:
> Hello Gabor, Matt, Dirk.
>
> Thank you all for clarifying the situation.
>
> So if I understand correctly then:
> 1) Changing the BLAST would require specific BLAST per computer
> configuration (OS/chipset).

It's BLAS (Basic Linear Algebra Subprograms), not BLAST.  Normally I
wouldn't be picky like this, but if you plan to use a search engine you
won't find anything helpful under BLAST.

> 2) The advantage would be available only when doing  _lots_ of linear
> algebra

You need to be working with large matrices and doing very specific
kinds of operations before the time savings from multiple threads
overcome the communication overhead.  In fact, an accelerated BLAS can
sometimes slow down numerical linear algebra calculations, such as
sparse matrix operations.
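
One rough way to see where the crossover lies on a particular machine
is to time the same dense operation at two sizes.  This is only a
sketch: the helper name time_crossprod and the matrix dimensions are
arbitrary choices for illustration, and the timings depend entirely on
the BLAS that R happens to be linked against.

```r
# Hypothetical helper: time crossprod(A) for a dense n-by-p Gaussian matrix.
time_crossprod <- function(n, p) {
  A <- matrix(rnorm(n * p), nrow = n, ncol = p)
  # "elapsed" is wall-clock time, which is what multiple threads can reduce
  system.time(crossprod(A))[["elapsed"]]
}

set.seed(1)
small <- time_crossprod(200, 50)    # small problem: overhead dominates
large <- time_crossprod(2000, 500)  # larger problem: more room for a speed-up

cat("200 x 50  :", small, "seconds\n")
cat("2000 x 500:", large, "seconds\n")
```

Running this once with the reference BLAS and once with an accelerated
BLAS gives a crude picture of whether the difference matters for
matrices of the size you actually use.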

> So I am left wondering for each item:
> 1) How do you find a "better" (e.g: more suited) BLAST for your system? (I
> am sure there are tutorials for that, but if someone here has
> a recommendation on one - it would be nice)

As Dirk has pointed out, it is a simple process.

Step 1: Install Ubuntu or some other Debian-based Linux system.
Step 2: type
sudo apt-get install r-base-core libatlas3gf-base
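
After installing, it may be worth confirming which BLAS R actually
picked up.  A minimal sketch, assuming a version of R recent enough
(3.4.0 or later, as far as I know) that sessionInfo() records the
BLAS and LAPACK library paths; on older versions these components are
simply NULL and the code reports that.

```r
si <- sessionInfo()

# Paths of the shared libraries in use, if this version of R records them
blas_path   <- si$BLAS
lapack_path <- si$LAPACK

# Version string of the LAPACK library R is using
lapack_version <- La_version()

report <- function(x) if (is.null(x) || !nzchar(x)) "not reported" else x
cat("BLAS library:   ", report(blas_path), "\n")
cat("LAPACK library: ", report(lapack_path), "\n")
cat("LAPACK version: ", lapack_version, "\n")
```

If ATLAS is in use, the BLAS path would normally point at an ATLAS
library rather than at R's own libRblas.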

> 2) In what situations do we use _lots_ of linear algebra?  For example, I
> have cases where I performed many linear regressions on a problem, would
> that be a case the BLAST engine would be affecting?

Re-read David's posting.  The lm and glm functions do not benefit
substantially from accelerated BLAS because the underlying
computational methods only use level-1 BLAS. (David said they don't
use BLAS but that is not quite correct.  I posted a follow-up comment
describing why lm and glm don't benefit from accelerated BLAS.)
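
To make the distinction concrete, here is a small sketch contrasting
the QR-based fit that lm relies on (via lm.fit) with a least-squares
solve through the normal equations, whose crossprod and chol steps are
the kind of dense operation an accelerated BLAS can speed up.  The
data are simulated and the sizes arbitrary.  The normal equations are
numerically less stable than QR, so this is an illustration and not a
recommendation.

```r
set.seed(1)
n <- 2000
p <- 50
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))  # intercept + predictors
y <- drop(X %*% rnorm(p) + rnorm(n))

# Route 1: the QR decomposition used by lm() and glm()
fit_qr <- lm.fit(X, y)

# Route 2: normal equations X'X b = X'y solved through a Cholesky factor
R <- chol(crossprod(X))                  # X'X = R'R, with R upper triangular
z <- forwardsolve(t(R), crossprod(X, y)) # solve R'z = X'y
beta_chol <- drop(backsolve(R, z))       # solve R b = z

# The two routes agree on well-conditioned data
print(all.equal(unname(fit_qr$coefficients), beta_chol))
```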

> I am trying to understand if REvolution emphasis on this is a
> marketing gimmick, or are they insisting on something that some R users
> might wish to take into account.  In which case I would, naturally (for many
> reasons), prefer to be able to tweak the native R system instead of needing
> to work with REvolution distribution.

As those who, in Duncan Murdoch's phrase, found the situation
sufficiently extreme to cause them to read the documentation would
know, descriptions of using accelerated BLAS with R have been in the
"R Installation and Administration" manual for years.  Admittedly it
is not a straightforward process, but that is because, like so many
other things, it needs to be handled differently on each operating system.
In fact it is even worse because the procedure can be specific to the
operating system and the processor architecture and, sometimes, even
the task.  Again, re-read David's posting where he says that you
probably don't want to combine multiple MKL threads with explicit
parallel programming in R using doSMP.

David's posting (appropriately) shows very specific examples that
benefit greatly from accelerated BLAS.   Notice that these examples
incorporate very large matrices.  The first two examples involve
forming chol(crossprod(A)) where A is 10000 by 5000.  If you have very
specific structure in A this calculation might be meaningful.  In
general, it is meaningless because crossprod(A) is almost certainly
singular.  (I am vague on the details but perhaps someone who is
familiar with the distribution of singular values of matrices can
explain the theoretical results.  There is a whole field of statistics
research dealing with sparsity in the estimation of covariance
matrices that attacks exactly this "large n, large p" rank deficiency
problem.)
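
A toy illustration of the rank problem, using a small made-up matrix
rather than David's benchmark: a single exactly collinear column is
enough to make crossprod(A) singular, at which point its Cholesky
factor is not meaningful.  The dimensions and the collinear column are
assumptions for the sake of the example.

```r
set.seed(1)
n <- 200
p <- 50
A <- matrix(rnorm(n * p), n, p)
A[, p] <- A[, 1] + A[, 2]   # last column is a linear combination of two others

# Numerical rank of A: one less than the number of columns
rank_A <- qr(A)$rank
cat("columns:", p, " numerical rank:", rank_A, "\n")

# crossprod(A) is therefore singular.  chol() may stop with an error or
# return a factor with a tiny, meaningless pivot, depending on rounding.
res <- tryCatch(chol(crossprod(A)), error = function(e) conditionMessage(e))
if (is.character(res)) {
  cat("chol() failed:", res, "\n")
} else {
  cat("chol() returned; smallest diagonal element:", min(diag(res)), "\n")
}
```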

> Lastly, following on Matt's suggestion, if anyone has a tutorial on the
> subject, I'd be more than glad to publish it on r-statistics/r-bloggers.
>
> Thanks again to everyone for the detailed replies.
>
> Best,
> Tal
>
>
>
>
> ----------------Contact
> Details:-------------------------------------------------------
> Contact me: Tal.Galili at gmail.com |  972-52-7275845
> Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
> www.r-statistics.com (English)
> ----------------------------------------------------------------------------------------------
>
>
>
>
> On Sat, Jun 12, 2010 at 6:01 AM, Matt Shotwell <shotwelm at musc.edu> wrote:
>
>> In the case of REvolution R, David mentioned using the Intel MKL, a
>> proprietary library which may not be distributed in the way R is
>> distributed. Maybe REvolution has a license to redistribute the library.
>> For the others, I suspect Gabor has the right idea, that the R-core team
>> would rather not keep architecture dependent code in the sources,
>> although there is a very small amount already (`grep -R __asm__`).
>>
>> However, I know that on Linux (Debian in particular) it is fairly
>> straightforward to build R with `enhanced' BLAS libraries. The R
>> Installation and Administration manual has a pretty good section on
>> linking with enhanced BLAS and LAPACK libs, including the Intel MKL, if
>> you are willing to cough up $399, or swear not to use the library
>> commercially or academically.
>>
>> Maybe a short tutorial using free software, such as ATLAS would be
>> suitable content for an r-bloggers post :) ?
>>
>> Matt Shotwell
>> Graduate Student
>> Div. Biostatistics and Epidemiology
>> Medical University of South Carolina
>>
>> On Fri, 2010-06-11 at 19:21 -0400, Tal Galili wrote:
>> > Hello all,
>> > I came across
>> > <http://www.r-bloggers.com/performance-benefits-of-linking-r-to-multithreaded-math-libraries/>
>> > David Smith's new post, "Performance benefits of linking R to
>> > multithreaded math libraries"
>> > <http://blog.revolutionanalytics.com/2010/06/performance-benefits-of-multithreaded-r.html>
>> > Which explains how (and why) REvolution distribution of R uses
>> > different BLAS math libraries for R, so to
>> > allow multi-threaded mathematical computation.
>> > What the post doesn't explain is why it is that native R distribution
>> > doesn't use the multi-threaded version of the libraries.  Is it because
>> > R-devel team didn't get to it yet or is it for some technical reason.
>> > Could someone please help to explain the situation?
>> >
>> > Thanks in advance,
>> > Tal
>> >
>> > p.s: I wasn't sure if to send the question here or to R-devel, I decided
>> > to send it here.  If I am in the wrong - please let me know.
>> >
>> >
>> >
>> >
>> >       [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>>
>