[R-sig-hpc] Matrix multiplication

Paul Gilbert pgilbert902 at gmail.com
Wed Mar 14 17:53:53 CET 2012



On 12-03-13 09:59 PM, Simon Urbanek wrote:
>
> On Mar 13, 2012, at 3:05 PM, Paul Gilbert wrote:
>
>>
>>
>> On 12-03-13 12:50 PM, Brian G. Peterson wrote:
>>> On Tue, 2012-03-13 at 12:40 -0400, Paul Gilbert wrote:
>>>> Brian
>>>>
>>>> Thanks for spelling this out for those of us that are a bit slow.
>>>> (Newbie questions below)
>>>
>>> <... snip ...>
>>>
>>>>> So, if your BLAS does multithreaded matrix multiplication, it will use
>>>>> multiple threads 'implicitly', as Simon pointed out.
>>>>
>>>> Is there an easy way to know if the R I am using has been compiled with
>>>> multi-thread BLAS support?
>>>
>>> BLAS should be 'plug and play', as R is usually compiled to use a shared
>>> object BLAS.  As such, installing the BLAS on your machine (and
>>> appropriately configuring it) should 'just work' with the new BLAS when
>>> you restart R.
>>>
>>> Dirk et al. wrote a paper, now a bit dated, that benchmarked some of
>>> the BLAS libraries, that should have some additional details.
>>
>> (I have a long history of getting things that should 'just work' to
>> 'just not work'.) But I didn't really state my question very well. I'm
>> really wondering about two related situations. First, how can I confirm,
>> after a change to the underlying system, that R is using the new
>> configuration? Second, if I am running benchmarks in R, is there an easy
>> way to record the underlying configuration that is being used?
>>
>
> You can check whether you're leveraging multiple cores simply via system.time:
>
>> m=matrix(rnorm(4e6),2000)
>> system.time(m %*% m)
>     user  system elapsed
>    6.860   0.020   0.584
>
> The above is clearly using threaded BLAS (here I'm using ATLAS), because
> the elapsed time is much smaller than the CPU time so it was computed in parallel.

Perhaps I am misreading something. I don't see elapsed < CPU, so it does
not seem quite as obvious as you suggest, but I certainly see the 
difference with the single-thread below.
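
For reference, the comparison Simon describes can be wrapped in a small
helper. This is only a minimal sketch: the function name cpu_to_elapsed and
the default matrix size are hypothetical choices for illustration, not
anything supplied by R. A ratio well above 1 suggests the product ran on
several threads; a ratio near 1 suggests a single-threaded BLAS.

```r
# Hypothetical helper: compare CPU time with wall-clock time for m %*% m.
cpu_to_elapsed <- function(n = 1000) {
  m <- matrix(rnorm(n * n), n)
  tm <- system.time(m %*% m)
  # user.self + sys.self is the CPU time charged to this R process,
  # summed over all of its threads.
  cpu <- tm[["user.self"]] + tm[["sys.self"]]
  elapsed <- tm[["elapsed"]]
  # Guard against a zero elapsed time on very small problems.
  ratio <- if (elapsed > 0) cpu / elapsed else NA_real_
  c(cpu = cpu, elapsed = elapsed, ratio = ratio)
}

print(cpu_to_elapsed())
```

The timings are noisy for small matrices, so a larger n (such as the 2000
used above) gives a clearer picture.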

> In contrast, this is what you get using single-threaded R BLAS on the same machine:
>
>> system.time(m %*% m)
>     user  system elapsed
>   10.480   0.020  10.505
>
> It takes about 18x longer (a combination of the number of cores and the
> less optimized BLAS), and the elapsed time is greater than or equal to
> the CPU time, i.e. single-threaded.
>
> As for recording the underlying configuration - that is not really
> possible in general - you have to know what you enabled/compiled. In the
> case of a shared BLAS implementation you may be able to infer that from
> the library name, but for a static BLAS it is close to impossible to
> figure it out.

I was afraid this would be the case. It is often hard to keep track even 
when I'm compiling R myself, and I guess if you don't compile yourself 
there is not much hope of knowing what you really have.
(Food for thought when considering timing comparisons.)
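
For what it is worth, the session itself can at least be asked what it
knows, and that can be saved next to the benchmark results. The following is
a minimal sketch, not a complete answer: the function name blas_config is
hypothetical, and the BLAS and LAPACK elements of sessionInfo() are, to the
best of my knowledge, only filled in by R releases considerably newer than
the ones discussed in this thread (they come back as NULL otherwise). As
Simon notes, a statically linked BLAS may still not be identifiable this way.

```r
# Hypothetical helper: gather what the running session reports about
# its linear algebra libraries, for saving alongside benchmark results.
blas_config <- function() {
  si <- sessionInfo()
  list(
    r_version      = R.version.string,
    platform       = si$platform,
    # Paths to the BLAS/LAPACK in use; NULL if this R does not report them.
    blas           = si$BLAS,
    lapack         = si$LAPACK,
    # Version string of the LAPACK that R is using.
    lapack_version = La_version()
  )
}

str(blas_config())
```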

Thanks,
Paul

> Cheers,
> Simon
>
>
>
>> Thanks again,
>> Paul
>>>
>>> <...snip...>
>>>
>>>>> Be aware that there can be unintended negative interactions between
>>>>> implicit and explicit parallelization.  On cluster nodes I tend to
>>>>> configure the BLAS to use only one thread to avoid resource contention
>>>>> when all cores are doing explicit parallelization.
>>>>
>>>> How do you do this? Does it need to be done when you are compiling R, or
>>>> can it be done on the fly while running R processes?
>>>
>>> Some BLAS, like gotoblas, support an environment variable to change the
>>> number of cores to be used.  This can be changed at run-time.  Others,
>>> like the mkl, are always multithreaded.  Others, like ATLAS, can be
>>> compiled in either single threaded or multi-threaded modes.
>>>
>>> So, for me, on my cluster nodes, I use a single threaded BLAS, assuming
>>> that *explicit* parallelization will be the primary driver of CPU load,
>>> and not wanting to over-commit the processor when 12 calculations each
>>> try to spawn 12 threads in the BLAS.  On other machines, I might use a
>>> multithreaded BLAS like gotoblas so that I have some flexibility (though
>>> apparently unlike Claudia, I rarely change it in practice).
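
The run-time approach Brian describes might look roughly like the sketch
below, which starts a child R process with the thread count capped through
the environment. The function name run_with_blas_threads is hypothetical,
and the variable names GOTO_NUM_THREADS and OMP_NUM_THREADS are an
assumption about what the installed BLAS reads; a BLAS that ignores them
will be unaffected. The variables are set for a fresh process because a BLAS
may read them only once, when it is loaded.

```r
# Hypothetical helper: run R code in a child process with the BLAS thread
# count capped via environment variables (honoured only by BLAS builds
# that actually read these variables).
run_with_blas_threads <- function(expr_text, threads = 1L) {
  rscript <- file.path(R.home("bin"), "Rscript")
  env <- c(paste0("GOTO_NUM_THREADS=", threads),
           paste0("OMP_NUM_THREADS=", threads))
  # stdout = TRUE returns the child's output as a character vector.
  system2(rscript, c("--vanilla", "-e", shQuote(expr_text)),
          env = env, stdout = TRUE)
}

# Confirm that the child process sees the setting.
run_with_blas_threads('cat(Sys.getenv("GOTO_NUM_THREADS"))', threads = 1L)
```

With 12 explicit workers on a 12-core node, capping each worker at one BLAS
thread keeps the total at 12 threads rather than 144.
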
>>>
>>> Regards,
>>>
>>>     - Brian
>>>
>>
>> _______________________________________________
>> R-sig-hpc mailing list
>> R-sig-hpc at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>>
>>
>


