[R-sig-hpc] socket cluster/gotoblas2 configuration confusion
Claudia Beleites
claudia.beleites at ipht-jena.de
Wed Nov 23 15:44:45 CET 2011
Dear all,
I'm just doing my first steps with parallelized calculations and got
quite confused.
Here's what I want, what I have and what I did:
- I want to parallelize calculations on a CentOS server with 2 x 6 cores
and 8 GB RAM (it is actually part of a cluster, but I have access only
to this node, and the other nodes do not (yet) have R installed).
- My data is too large to work with in one piece.
But it comes in separate files of suitable size: I can work comfortably
with 2 to 3 samples in memory at the same time.
- So my idea was to start up a snow socket cluster with 2 or 3 workers.
- In addition I want to use an optimized BLAS. Linear algebra is
only a small part of the analysis, so it makes sense to have the
socket cluster with as many workers as possible and have the linear
algebra parts use up to n / nworkers cores each.
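The n / nworkers split is simple arithmetic. A minimal sketch, assuming the 12 cores of this node are divided evenly among the workers (the variable names are only for illustration):

```r
# Cores available on the node: 2 sockets x 6 cores
n_cores <- 2 * 6

# Candidate cluster sizes: 2 or 3 samples fit into memory at once
n_workers <- c(2, 3)

# BLAS threads each worker could use without oversubscribing the node
blas_threads <- n_cores %/% n_workers
blas_threads
```

With 2 workers this leaves 6 threads per worker, with 3 workers 4 threads.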
So I built R 2.14.0 using gotoblas2 and set $GOTO_NUM_THREADS to 6.
Matrix multiplication in a fresh R session now is much faster and CPU
usage shows the expected 6 cores working:
> system.time ({m <- matrix (1:9e6, 3e3); m%*%m; NULL})
   user  system elapsed
  5.219   0.126   1.111
However, the socket cluster workers do not seem to use GOTO_NUM_THREADS:
> library (snow)
> cl <- makeCluster(2,type="SOCK")
> tm <- snow.time(clusterEvalQ(cl, {m <- matrix (1:9e6, 3e3); m%*%m; NULL}))
> tm
elapsed send receive node 1 node 2
9.553 0.001 0.010 9.510 9.543
> tm$data
[[1]]
send_start send_end recv_start recv_end exec
[1,] 0 0.001 9.511 9.512 9.51
[[2]]
send_start send_end recv_start recv_end exec
[1,] 0.001 0.001 9.544 9.553 9.543
> tm$elapsed
elapsed
9.553
>
CPU usage shows 2 cores working, and the times correspond to that.
What configuration do I need in order to make the BLAS use more
threads in the worker processes? Is there anything else I should do
differently?
> sessionInfo ()
R version 2.14.0 (2011-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
[3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
[5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] snow_0.3-8
Additional questions:
- Is there some command like sessionInfo () that yields information
about the BLAS (particularly NUM_THREADS)?
- Is there some command that I can use to tell the BLAS how many threads
to use during an R session? Can I set environment variables from within
R? (Searching didn't help, as I only found information about R
environments.) Would that actually help here?
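For reference, base R does provide Sys.getenv() and Sys.setenv() for environment variables. The sketch below shows how the value seen by the master session and by each worker could be inspected. It uses the parallel package that ships with R 2.14.0 rather than snow, and the value "4" is only an example. Whether the BLAS picks up a value set after R has started is an assumption that is not verified here; the library may read the variable only once, when it is loaded.

```r
library(parallel)

# Value in the current (master) session; "" means the variable is unset
Sys.getenv("GOTO_NUM_THREADS")

# Set the variable from within R.
# Assumption: this affects processes started afterwards; a BLAS that is
# already loaded in this session may ignore the change.
Sys.setenv(GOTO_NUM_THREADS = "4")
threads_seen <- Sys.getenv("GOTO_NUM_THREADS")

# Start two socket workers and ask each one which value it sees
cl <- makeCluster(2)
worker_threads <- clusterEvalQ(cl, Sys.getenv("GOTO_NUM_THREADS"))
stopCluster(cl)

worker_threads
```

If the workers report an empty string or a different value than the master, the variable is not reaching them and would have to be set in the environment the workers are started from.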
Thanks a lot for your help.
Claudia
--
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology
Albert-Einstein-Str. 9
07745 Jena
Germany
email: claudia.beleites at ipht-jena.de
phone: +49 3641 206-133
fax: +49 2641 206-399