[R-sig-hpc] Matrix multiplication

Paul Gilbert pgilbert902 at gmail.com
Wed Mar 14 18:09:14 CET 2012



On 12-03-14 10:47 AM, Claudia Beleites wrote:
>>   On other machines, I might use a
>> multithreaded BLAS like gotoblas so that I have some flexibility (though
>> apparently unlike Claudia, I rarely change it in practice).
>
> :-) Yes, I do change it in practice, because I have steps where I use
> explicit parallelization via multicore or snow, and I switch between
> the three different kinds of parallel computation. Our server has 2
> hex-core CPUs but only 8 GB RAM. The spectroscopic data analysis I do
> usually isn't really hard computationally, but the data sets are often
> uncomfortably large for the server. With explicit parallelization, RAM
> often restricts me to 2 or 3 parallel workers.
>
> Here's what I observe and why I switch back and forth:
>
> If the calculation is implicitly parallel with the optimized BLAS,
> that's the way to go: easiest on RAM, fast, and no coding effort
> whatsoever. Just lean back and enjoy seeing all cores hard at work.
> Functions like %*%, crossprod() and tcrossprod() use all 12 cores (or
> whatever I restrict GOTO_NUM_THREADS to).
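>
> For example (a minimal sketch; this assumes R was started with
> GotoBLAS linked in and GOTO_NUM_THREADS exported beforehand):
>
>     ## started as: GOTO_NUM_THREADS=12 R
>     n <- 4000
>     A <- matrix(rnorm(n * n), n, n)
>     system.time(crossprod(A))   # t(A) %*% A, runs on all 12 cores
>     system.time(A %*% A)        # likewise multithreaded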
>
> Other functions, e.g. loess(), never seem to use more than the 6 cores
> of one CPU. For those, I'm better off with explicit parallelization:
> 2 snow nodes, each with GOTO_NUM_THREADS = 6 (I have to run taskset on
> each node). However, snow (and multicore) need more RAM, as the data
> must be loaded on each node. So in practice that means e.g.
> GOTO_NUM_THREADS = 11 in the main R session (to leave an "alibi core"
> for my colleague), and either 2 nodes with GOTO_NUM_THREADS = 6 or
> 3 nodes with GOTO_NUM_THREADS = 4.
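>
> Roughly, the two-node variant looks like this (a sketch; the taskset
> pinning happens outside R, and mydata stands for a data frame with
> columns x and y):
>
>     library(snow)
>     ## set before spawning so the local nodes inherit it (safest is
>     ## to export it before R itself starts)
>     Sys.setenv(GOTO_NUM_THREADS = "6")
>     cl <- makeCluster(2, type = "SOCK")   # 2 nodes x 6 BLAS threads
>     chunks <- split(mydata, rep(1:2, length.out = nrow(mydata)))
>     fits <- parLapply(cl, chunks, function(d) loess(y ~ x, data = d))
>     stopCluster(cl)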

How does this work?  I can imagine problems where I could use
Sys.setenv() within an R function to speed up different parts of a
calculation in different ways, but if goto reads an environment
variable every time it does a calculation, that would slow things down
a whole lot.
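
Concretely, I have in mind something like this (purely hypothetical --
it only pays off if goto actually re-reads the variable after startup,
which is what I am asking):

    f <- function(X) {
      Sys.setenv(GOTO_NUM_THREADS = "12")
      G <- crossprod(X)                # BLAS-heavy step: many threads
      Sys.setenv(GOTO_NUM_THREADS = "2")
      chol(G)                          # a step where fewer threads win
    }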

Thanks,
Paul
>
> Multicore doesn't make use of the implicit parallelization of the
> BLAS. But it is easier to use than snow: no cluster setup required, no
> hassle with exporting all the variables, and so on. So if the function
> doesn't have any implicit parallelization anyway, I just change
> lapply() to mclapply(), and that's it.
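>
> In code that's literally a one-word change (sketch; fit_one() stands
> for whatever I apply to each spectrum):
>
>     library(multicore)
>     ## before: results <- lapply(spectra, fit_one)
>     results <- mclapply(spectra, fit_one, mc.cores = 3)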
>
> Best,
>
> Claudia


