[R-sig-hpc] Matrix multiplication

Simon Urbanek simon.urbanek at r-project.org
Wed Mar 14 02:59:24 CET 2012


On Mar 13, 2012, at 3:05 PM, Paul Gilbert wrote:

> 
> 
> On 12-03-13 12:50 PM, Brian G. Peterson wrote:
>> On Tue, 2012-03-13 at 12:40 -0400, Paul Gilbert wrote:
>>> Brian
>>> 
>>> Thanks for spelling this out for those of us that are a bit slow.
>>> (Newbie questions below)
>> 
>> <... snip ...>
>> 
>>>> So, if your BLAS does multithreaded matrix multiplication, it will use
>>>> multiple threads 'implicitly', as Simon pointed out.
>>> 
>>> Is there an easy way to know if the R I am using has been compiled with
>>> multi-thread BLAS support?
>> 
>> BLAS should be 'plug and play', as R is usually compiled to use a shared
>> object BLAS.  As such, installing the BLAS on your machine (and
>> appropriately configuring it) should 'just work' with the new BLAS when
>> you restart R.
>> 
>> Dirk et al. wrote a paper, now a bit dated, that benchmarked some of
>> the BLAS libraries and should have some additional details.
> 
> (I have a long history of getting things that should 'just work' to 'just not work'.) But I didn't really state my question very well. I'm really wondering about two related situations. First, how can I confirm, after a change to the underlying system, that R is using the new configuration? Second, if I am running benchmarks in R, is there an easy way to record the underlying configuration that is being used?
> 

You can check whether you're leveraging multiple cores simply via system.time:

> m=matrix(rnorm(4e6),2000)
> system.time(m %*% m)
   user  system elapsed 
  6.860   0.020   0.584 

The above is clearly using a threaded BLAS (here I'm using ATLAS): the elapsed time is much smaller than the CPU time, so the product was computed in parallel. In contrast, this is what you get using the single-threaded R BLAS on the same machine:

> system.time(m %*% m)
   user  system elapsed 
 10.480   0.020  10.505 

It takes about 18x longer in elapsed time - this is a combination of the number of cores and the less optimized BLAS - and the elapsed time is greater than or equal to the CPU time, which is what you expect from single-threaded execution.
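
The comparison above can be wrapped in a small function. This is only a sketch: the helper name cpu_elapsed_ratio and the matrix size are my own choices for illustration, and the ratio is a rule of thumb rather than a definitive test. For the two timings shown above the ratio would be about 11.8 (threaded) and about 1.0 (single-threaded).

```r
# Hypothetical helper (not part of R): ratio of CPU time to wall-clock
# time for a dense matrix product. A ratio well above 1 suggests the
# BLAS ran on several cores; a ratio near 1 suggests a single thread.
cpu_elapsed_ratio <- function(n = 1000L) {
  m <- matrix(rnorm(n * n), n)
  tm <- system.time(m %*% m)
  cpu <- tm[["user.self"]] + tm[["sys.self"]]  # CPU time of this process
  elapsed <- tm[["elapsed"]]                   # wall-clock time
  list(cpu = cpu, elapsed = elapsed, ratio = cpu / elapsed)
}

r <- cpu_elapsed_ratio()
print(r)
```

On a fast machine a small matrix finishes too quickly for the timer resolution, so increase n if the elapsed time comes out close to zero.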

As for recording the underlying configuration - that is not really possible in general - you have to know what you enabled/compiled. In the case of a shared BLAS implementation you may be able to infer it from the library name, but for a static BLAS it is close to impossible to figure out.
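
For the shared-library case, the following sketch records what R itself can report alongside a benchmark. It rests on an assumption about your installation: newer R releases include BLAS and LAPACK library paths in the result of sessionInfo(), while older ones do not, so the code falls back to NA when the fields are missing or empty. The helper name blas_config is hypothetical, and a path only tells you the library name, not how it was built.

```r
# Hypothetical helper (not part of R): collect whatever the running R
# session can say about its BLAS/LAPACK, for storing with benchmark results.
# Assumption: sessionInfo() has BLAS and LAPACK components (newer R only).
blas_config <- function() {
  si <- sessionInfo()
  # NA when the field is absent (older R) or empty (e.g. not identifiable)
  or_na <- function(x) if (length(x) == 1L && nzchar(x)) x else NA_character_
  c(r_version = R.version.string,
    platform  = R.version$platform,
    blas      = or_na(si$BLAS),
    lapack    = or_na(si$LAPACK))
}

cfg <- blas_config()
print(cfg)
```

Saving cfg next to each set of timings at least makes it visible when the library path changed between runs.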

Cheers,
Simon



> Thanks again,
> Paul
>> 
>> <...snip...>
>> 
>>>> Be aware that there can be unintended negative interactions between
>>>> implicit and explicit parallelization.  On cluster nodes I tend to
>>>> configure the BLAS to use only one thread to avoid resource contention
>>>> when all cores are doing explicit parallelization.
>>> 
>>> How do you do this? Does it need to be done when you are compiling R, or
>>> can it be done on the fly while running R processes?
>> 
>> Some BLAS, like gotoblas, support an environment variable to change the
>> number of cores to be used.  This can be changed at run-time.  Others,
>> like the mkl, are always multithreaded.  Others, like ATLAS, can be
>> compiled in either single threaded or multi-threaded modes.
>> 
>> So, for me, on my cluster nodes, I use a single threaded BLAS, assuming
>> that *explicit* parallelization will be the primary driver of CPU load,
>> and not wanting to over-commit the processor when 12 calculations each
>> try to spawn 12 threads in the BLAS.  On other machines, I might use a
>> multithreaded BLAS like gotoblas so that I have some flexibility (though
>> apparently unlike Claudia, I rarely change it in practice).
>> 
>> Regards,
>> 
>>    - Brian
>> 
> 
> _______________________________________________
> R-sig-hpc mailing list
> R-sig-hpc at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
> 
> 


