[R-sig-hpc] socket cluster/gotoblas2 configuration confusion
Claudia Beleites
claudia.beleites at ipht-jena.de
Wed Nov 23 17:09:01 CET 2011
Steve,
> You don't say how you set GOTO_NUM_THREADS to 6.
Sorry, I forgot to mention that.
I used
export GOTO_NUM_THREADS=6
in the shell before starting R,
and I checked the workers with
> clusterEvalQ(cl, system ("echo $GOTO_NUM_THREADS"))
which gave me 6 for both workers,
as does:
> clusterEvalQ(cl, Sys.getenv('GOTO_NUM_THREADS'))
[[1]]
[1] "6"
[[2]]
[1] "6"
I did not know the Sys.getenv/Sys.setenv functions, though.
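
For reference, a minimal sketch of the set-and-verify round trip on a socket cluster. It assumes the `parallel` package (shipped with R since 2.14.0) as a stand-in for snow, and two local workers. It only shows that the variable is visible in each worker's environment; whether the BLAS honours a value set after it has been loaded is not something this sketch establishes.

```r
library(parallel)  # assumption: 'parallel' stands in for snow here

cl <- makePSOCKcluster(2)  # two local socket workers

# Set the variable inside each worker process, then read it back.
clusterEvalQ(cl, Sys.setenv(GOTO_NUM_THREADS = "6"))
vals <- unlist(clusterEvalQ(cl, Sys.getenv("GOTO_NUM_THREADS")))
print(vals)

# An empty string would mean the variable is unset in that worker.
unset <- which(vals == "")

stopCluster(cl)
```

If `unset` is non-empty, the corresponding workers did not inherit or receive the variable.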
Thanks,
Claudia
> You might want to verify that it did get set in each of the snow worker
> processes by using the command:
>
> clusterEvalQ(cl, Sys.getenv('GOTO_NUM_THREADS'))
>
> If it returns any empty strings in the resulting list, then the
> environment variable is not set in the corresponding worker.
>
> You probably should set this variable through an appropriate
> shell startup file, but you could at least temporarily use:
>
> clusterEvalQ(cl, Sys.setenv(GOTO_NUM_THREADS=6))
>
> - Steve
>
>
> On Wed, Nov 23, 2011 at 9:44 AM, Claudia Beleites
> <claudia.beleites at ipht-jena.de> wrote:
>> Dear all,
>>
>> I'm just doing my first steps with parallelized calculations and got quite
>> confused.
>>
>> Here's what I want, what I have and what I did:
>>
>> - I want to parallelize calculations on a CentOS server with 2 x 6 cores and
>> 8 GB RAM (it is actually part of a cluster, but I have access only to this
>> node, and the other nodes do not (yet) have R installed).
>>
>> - My data is too large to work with in one piece,
>> but it comes in separate files of suitable size: I can work nicely with 2 to
>> 3 samples in memory at the same time.
>>
>> - So my idea was to start up a snow socket cluster with 2 or 3 workers.
>>
>> - In addition I want to use an optimized BLAS. Linear algebra is only a
>> small part of the analysis so it does make sense to have the socket cluster
>> with as many workers as possible and have the linear algebra parts use up to
>> n / nworkers cores.
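
The n / nworkers split described above can be sketched as a small calculation. The helper name `threads_per_worker` is hypothetical and used only for illustration; the core and worker counts are the ones mentioned in this thread.

```r
# Hypothetical helper: divide the machine's cores evenly among the workers,
# never going below one BLAS thread per worker.
threads_per_worker <- function(n_cores, n_workers) {
  stopifnot(n_cores >= 1, n_workers >= 1)
  max(1L, as.integer(n_cores %/% n_workers))
}

threads_per_worker(12, 2)  # 6
threads_per_worker(12, 3)  # 4
```

With 12 cores, 2 workers would each get 6 BLAS threads and 3 workers would each get 4.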
>>
>> So I built R 2.14.0 using gotoblas2 and set $GOTO_NUM_THREADS to 6. Matrix
>> multiplication in a fresh R session now is much faster and CPU usage shows
>> the expected 6 cores working:
>>
>>> system.time ({m<- matrix (1:9e6, 3e3); m%*%m; NULL})
>> user system elapsed
>> 5.219 0.126 1.111
>>
>> However, the socket clusters seem not to use the GOTO_NUM_THREADS:
>>> library (snow)
>>> cl<- makeCluster(2,type="SOCK")
>>> tm<- snow.time(clusterEvalQ(cl, {m<- matrix (1:9e6, 3e3); m%*%m; NULL}))
>>> tm
>> elapsed send receive node 1 node 2
>> 9.553 0.001 0.010 9.510 9.543
>>> tm$data
>> [[1]]
>> send_start send_end recv_start recv_end exec
>> [1,] 0 0.001 9.511 9.512 9.51
>>
>> [[2]]
>> send_start send_end recv_start recv_end exec
>> [1,] 0.001 0.001 9.544 9.553 9.543
>>
>>> tm$elapsed
>> elapsed
>> 9.553
>>>
>>
>> CPU usage shows 2 cores working, and the times correspond to that.
>>
>> What configuration do I need to do in order to make the blas use more
>> threads for the worker processes? Anything else I should do differently?
>>
>>
>>> sessionInfo ()
>> R version 2.14.0 (2011-10-31)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
>> [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8
>> [7] LC_PAPER=C LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] snow_0.3-8
>>
>> Additional questions:
>> - Is there some command like sessionInfo () that yields information about
>> the blas (particularly NUM_THREADS)?
>> - Is there some command that I can use to tell the blas how many threads to
>> use during an R session? Can I set environment variables from within R?
>> (Searching didn't help, as I only found information about R environments.)
>> Would that actually help here?
>>
>> Thanks a lot for your help.
>>
>> Claudia
>>
>> --
>> Claudia Beleites
>> Spectroscopy/Imaging
>> Institute of Photonic Technology
>> Albert-Einstein-Str. 9
>> 07745 Jena
>> Germany
>>
>> email: claudia.beleites at ipht-jena.de
>> phone: +49 3641 206-133
>> fax: +49 2641 206-399
>>