[Rd] Can you share a working example of R program aided by fast BLAS?
Paul Johnson
pauljohn32 at gmail.com
Thu Aug 19 20:49:19 CEST 2010
Can one of you give me an R program that displays the benefits of an
accelerated BLAS in R?
Here's why I ask, in case you wonder:
In a linux cluster, I've hit some bumps in the road. The worst one by
far was that I installed R, then GotoBLAS2 with default settings, and
after that, jobs using Rmpi were *really* *really* slow. I mean
horrible. If a job took 15 minutes when run by itself, outside of
MPI, it took 1 full day when run inside MPI. Literally the same job.
I learned later that GotoBLAS2 defaults to allow threads equal to the
number of cores, and that the threads are not compatible with MPI.
This latter point is not clearly stated in the GotoBLAS2 documents, so
far as I can find, but after I realized that was the problem, I did
find one other cluster website that mentioned the same problem. "If
your application uses GotoBLAS and all cores as MPI threads, setting
GOTO_NUM_THREADS larger than one will usually result in drastically
slower performance."
(http://hpc.uark.edu/hpc/support/software/numerical.html#gotoblas).
In the GotoBLAS2 documentation, it warns of weird thread-related
delays, but it implies that the slowdown--if it happens--is a result
of bad user code, rather than this more fundamental mismatch between
OpenMPI (or MPI in general) and GotoBLAS2.
In the process of diagnosing the big slowdown, I've been making many
time comparisons. When I installed GotoBLAS2 in the first place, it
was because so many people (and the R admin manual) said that R's
ordinary BLAS is rudimentary/slow. In the test cases I've tried, R's
BLAS is not that bad. In fact, in the test programs we run, the time
is not substantially different with GotoBLAS2 and R's BLAS. I also
compared the Intel Math Kernel Library (MKL) BLAS and didn't notice a huge
difference.
So, well, I think that means I'm running bad test cases for R and GotoBLAS2.
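One kind of test case that should be sensitive to the BLAS is one that
spends nearly all of its time in level-3 BLAS routines, such as large
dense matrix products. The sketch below is only an illustration, not a
measured result from this cluster: the matrix size n and the choice of
operations are assumptions, and the scalar loop is included purely as a
contrast that should not depend on the BLAS at all.

```r
# Timing sketch: operations that are usually dominated by BLAS calls,
# plus one interpreted loop that does not use the BLAS.
set.seed(1)
n <- 1000                          # assumed size; larger n widens any gap
A <- matrix(rnorm(n * n), n, n)
B <- matrix(rnorm(n * n), n, n)

timings <- c(
  matmul    = system.time(A %*% B)[["elapsed"]],       # dense product (DGEMM)
  crossprod = system.time(crossprod(A))[["elapsed"]],  # t(A) %*% A (DSYRK)
  solve     = system.time(solve(A, B))[["elapsed"]],   # LAPACK solve, built on BLAS
  loop      = system.time(for (i in 1:1e6) sqrt(i))[["elapsed"]]  # no BLAS involved
)
print(round(timings, 3))
```

If the first three timings barely change between BLAS builds while n is
large, that would suggest R is not actually linked against the library
being tested.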
Oh, and one more thing. I have not been able to find an example R
program that benefitted at all from allowing threads > 1 in GotoBLAS2
environment settings. In fact, if a one-thread job takes 15 minutes,
the same job allowed 2 or more threads takes 21 minutes. And the more
threads I allow, the longer the job takes. This is literally the
same job, same cluster node, the only difference is changing the
environment variable that adjusts the GotoBLAS2 threads allowed.
So if you know whether your example depends on threads or not, I would
appreciate the warning.
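The thread comparison described above can be sketched as follows. This
is a hedged illustration, not the actual job: it assumes a Unix-like
system, that Rscript is available under R.home("bin"), and that the
linked BLAS reads GOTO_NUM_THREADS (the variable named in the quote
above). The helper time_with_threads and the matrix size are made up
for the example. Each run starts a fresh R process because the thread
limit is read from the environment when the library starts.

```r
# Time one BLAS-heavy expression in fresh R processes, varying the
# thread limit through an environment variable.
expr <- paste(
  "n <- 1000; A <- matrix(rnorm(n * n), n, n);",
  "writeLines(format(system.time(A %*% A)[[3]]))"   # [[3]] is elapsed time
)
rscript <- file.path(R.home("bin"), "Rscript")

# Hypothetical helper: run expr with k BLAS threads allowed, return seconds.
time_with_threads <- function(k) {
  out <- system2(rscript, c("-e", shQuote(expr)),
                 env = paste0("GOTO_NUM_THREADS=", k), stdout = TRUE)
  as.numeric(out[length(out)])
}

threads <- c(1, 2, 4)
res <- sapply(threads, time_with_threads)
names(res) <- paste0("threads_", threads)
print(res)
```

If R is linked against its own reference BLAS, the variable is ignored
and the three timings should be roughly equal.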
pj
--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas