[R-sig-hpc] socket cluster/gotoblas2 configuration confusion

Claudia Beleites claudia.beleites at ipht-jena.de
Wed Nov 23 17:09:01 CET 2011


Steve,

> You don't say how you set GOTO_NUM_THREADS to 6.
Sorry, I forgot to mention that.

I used
export GOTO_NUM_THREADS=6
in the shell before starting R,

and I checked it with
 >      clusterEvalQ(cl, system ("echo $GOTO_NUM_THREADS"))
which gave me 6 for both workers.

So does:
 >   clusterEvalQ(cl, Sys.getenv('GOTO_NUM_THREADS'))
[[1]]
[1] "6"

[[2]]
[1] "6"

I did not know about the Sys.getenv/Sys.setenv functions, though.

Thanks,

Claudia

> You
> might want to verify that it did get set in each of the snow worker
> processes by using the command:
>
>      clusterEvalQ(cl, Sys.getenv('GOTO_NUM_THREADS'))
>
> If it returns any empty strings in the resulting list, then the
> environment variable is not set in the corresponding worker.
>
> You probably should set this variable through an appropriate
> shell startup file, but you could at least temporarily use:
>
>      clusterEvalQ(cl, Sys.setenv(GOTO_NUM_THREADS=6))
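
A minimal sketch of the set-then-verify sequence suggested above, assuming two local socket workers. It uses the parallel package bundled with R (whose makeCluster, clusterEvalQ and stopCluster mirror the snow functions) only so that it runs without extra packages; with snow, the cluster would be created with type="SOCK" as in the original post. Note that this only changes the environment of the worker processes: a BLAS that reads GOTO_NUM_THREADS once at load time may not pick up a value set this late, which is one reason to prefer a shell startup file.

```r
library(parallel)  # assumption: the bundled 'parallel' package stands in for snow

# Start two local socket workers
cl <- makeCluster(2, type = "PSOCK")

# Set the variable in every worker process (temporary: lost when the workers exit)
invisible(clusterEvalQ(cl, Sys.setenv(GOTO_NUM_THREADS = "6")))

# Read it back; an empty string means the variable is not set in that worker
vals <- unlist(clusterEvalQ(cl, Sys.getenv("GOTO_NUM_THREADS")))
unset_workers <- which(vals == "")

stopCluster(cl)
```

If unset_workers is non-empty, its entries are the indices of the workers where the variable is missing.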


>
> - Steve
>
>
> On Wed, Nov 23, 2011 at 9:44 AM, Claudia Beleites
> <claudia.beleites at ipht-jena.de>  wrote:
>> Dear all,
>>
>> I'm just doing my first steps with parallelized calculations and got quite
>> confused.
>>
>> Here's what I want, what I have and what I did:
>>
>> - I want to parallelize calculations on a CentOS server with 2 x 6 cores and
>> 8 GB RAM (it is actually part of a cluster, but I have access only to this
>> node, and the other nodes do not (yet) have R installed).
>>
>> - My data is too large to work with in one piece.
>> But it comes in separate files of suitable size: I can work nicely with 2 to
>> 3 samples in memory at the same time.
>>
>> - So my idea was to start up a snow socket cluster with 2 or 3 workers.
>>
>> - In addition I want to use an optimized BLAS. Linear algebra is only a
>> small part of the analysis, so it makes sense to have the socket cluster
>> with as many workers as possible and have the linear algebra parts use up to
>> n / nworkers cores.
>>
>> So I built R 2.14.0 using gotoblas2 and set $GOTO_NUM_THREADS to 6. Matrix
>> multiplication in a fresh R session now is much faster and CPU usage shows
>> the expected 6 cores working:
>>
>>> system.time ({m<- matrix (1:9e6, 3e3); m%*%m; NULL})
>>        User      System     elapsed
>>       5.219       0.126       1.111
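
As a rough cross-check of the timing above, dividing CPU time (user + system) by elapsed time gives an estimate of how many cores were busy on average. The sketch below only redoes that arithmetic on the reported numbers; the variable names are illustrative and not part of the original session.

```r
# Timings reported by system.time() above, in seconds
user_time   <- 5.219
system_time <- 0.126
elapsed     <- 1.111

# CPU seconds consumed per wall-clock second: a rough estimate of the
# average number of busy cores (it ignores idle waits and thread overhead)
busy_cores <- (user_time + system_time) / elapsed
print(round(busy_cores, 1))
```

The result is about 4.8, which is at least consistent with several of the 6 BLAS threads doing work in the single-session run.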
>>
>> However, the socket cluster workers do not seem to use GOTO_NUM_THREADS:
>>> library (snow)
>>> cl<- makeCluster(2,type="SOCK")
>>> tm<- snow.time(clusterEvalQ(cl, {m<- matrix (1:9e6, 3e3); m%*%m; NULL}))
>>> tm
>> elapsed    send receive  node 1  node 2
>>   9.553   0.001   0.010   9.510   9.543
>>> tm$data
>> [[1]]
>>      send_start send_end recv_start recv_end exec
>> [1,]          0    0.001      9.511    9.512 9.51
>>
>> [[2]]
>>      send_start send_end recv_start recv_end  exec
>> [1,]      0.001    0.001      9.544    9.553 9.543
>>
>>> tm$elapsed
>> elapsed
>>   9.553
>>>
>>
>> CPU usage shows 2 cores working, and the times correspond to that.
>>
>> What configuration do I need to do in order to make the blas use more
>> threads for the worker processes? Anything else I should do differently?
>>
>>
>>> sessionInfo ()
>> R version 2.14.0 (2011-10-31)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C
>>   [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=de_DE.UTF-8
>>   [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=de_DE.UTF-8
>>   [7] LC_PAPER=C                 LC_NAME=C
>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] snow_0.3-8
>>
>> Additional questions:
>> - Is there some command like sessionInfo () that yields information about
>> the blas (particularly NUM_THREADS)?
>> - Is there some command that I can use to tell the blas how many threads to
>> use during an R session? Can I set environment variables from within R?
>> (Searching didn't help, as I only got info about R environments...) Would that
>> actually help here?
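
On the question of environment variables: base R can set and read them with Sys.setenv and Sys.getenv, as sketched below. The last line assumes a more recent R than the 2.14.0 used in this thread, where extSoftVersion() can name the BLAS library in use; it does not report a thread count, and HYPOTHETICAL_UNSET_VAR is a made-up name for illustration.

```r
# Set an environment variable for the current R process and its children
Sys.setenv(GOTO_NUM_THREADS = "6")

# Read it back; 'unset = NA' distinguishes "not set" from an empty value
threads <- Sys.getenv("GOTO_NUM_THREADS", unset = NA)

# A variable that does not exist comes back as NA when 'unset = NA' is given
missing_var <- Sys.getenv("HYPOTHETICAL_UNSET_VAR", unset = NA)

# Recent R versions can report which BLAS library is loaded; the guard keeps
# the sketch runnable on older versions where this function is absent
blas_lib <- if (exists("extSoftVersion")) {
  unname(extSoftVersion()["BLAS"])
} else {
  NA_character_
}
```

Whether changing the variable inside a running session affects the BLAS depends on when the library reads it, so this is not guaranteed to change the number of threads actually used.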
>>
>> Thanks a lot for your help.
>>
>> Claudia
>>
>> --
>> Claudia Beleites
>> Spectroscopy/Imaging
>> Institute of Photonic Technology
>> Albert-Einstein-Str. 9
>> 07745 Jena
>> Germany
>>
>> email: claudia.beleites at ipht-jena.de
>> phone: +49 3641 206-133
>> fax:   +49 2641 206-399
>>
>> _______________________________________________
>> R-sig-hpc mailing list
>> R-sig-hpc at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>>




